Pandas

keshav
Published in Analytics Vidhya
3 min read · Jan 10, 2021


Introduction

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open-source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.

DataFrame

A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating-point values, categorical data, and more) in columns. It is similar to a spreadsheet, a SQL table, or the data.frame in R.
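As a rough illustration of columns holding different types, here is a minimal sketch. The column names and values are made up for this example and do not come from a real dataset.

```python
import pandas as pd

# Hypothetical sample data: each dict key becomes a column,
# and each column keeps its own data type.
people = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen"],            # text
        "age": [31, 25, 47],                        # integers
        "height_m": [1.62, 1.80, 1.75],             # floating-point values
        "group": pd.Categorical(["a", "b", "a"]),   # categorical data
    }
)

# One dtype per column, much like typed columns in a SQL table.
print(people.dtypes)
```

Printing `dtypes` is a quick way to check what pandas inferred for each column.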

Creating a DataFrame

First we create an array of random integers with NumPy, then build a DataFrame from it.

import numpy as np
import pandas as pd
arr = np.random.randint(0, 10, (5, 3))
df = pd.DataFrame(arr)

Next we set the column names and the row index labels of the DataFrame.

df.columns = ["C1", "C2", "C3"]
df.index = ["R1", "R2", "R3", "R4", "R5"]
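The same DataFrame can also be built in a single call by passing the labels to the constructor. The sketch below additionally seeds NumPy's random generator so the values repeat between runs. The seed value 0 is an arbitrary choice for this example, and the numbers it produces will not match the outputs shown in this article.

```python
import numpy as np
import pandas as pd

# Seeding the generator makes the "random" values repeatable between runs.
rng = np.random.default_rng(seed=0)      # seed 0 is an arbitrary assumption
arr = rng.integers(0, 10, size=(5, 3))   # integers in [0, 10), shape 5 x 3

# Pass the labels directly instead of assigning them afterwards.
df = pd.DataFrame(
    arr,
    index=["R1", "R2", "R3", "R4", "R5"],
    columns=["C1", "C2", "C3"],
)
print(df)
```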

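The DataFrame now has labeled rows (R1 to R5) and labeled columns (C1 to C3). Because the values are random, your output will differ from the examples below.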

Reading and selecting data

.iloc

Purely integer-location based indexing for selection by position.

.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

df.iloc[2]
>> C1    9
   C2    8
   C3    6

df.iloc[2] returns the values of row R3, because positions start at 0, so position 2 is the third row.

df.iloc[2,1]
>> 8

The output is 8 because the first argument selects the row at position 2 (R3) and the second argument selects the column at position 1 (C2). Column positions also start at 0.

df.iloc[0]
>> C1    0
   C2    0
   C3    4

This reads every value in the first row, R1.

df.iloc[:,0]
>> R1    0
   R2    1
   R3    9
   R4    7
   R5    8
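The positional selections above, plus the boolean-array form mentioned earlier, can be sketched as follows. The DataFrame here uses fixed stand-in values so the results are repeatable. Rows R1 to R3 and column C1 follow the outputs shown in this article, while the remaining values of R4 and R5 are made up for illustration.

```python
import pandas as pd

# Fixed stand-in values (the C2 and C3 entries of R4 and R5 are assumptions).
df = pd.DataFrame(
    [[0, 0, 4], [1, 1, 2], [9, 8, 6], [7, 3, 5], [8, 2, 1]],
    index=["R1", "R2", "R3", "R4", "R5"],
    columns=["C1", "C2", "C3"],
)

first_col = df.iloc[:, 0]       # all rows, column position 0 (C1)
top_left = df.iloc[0:2, 0:2]    # rows 0-1 and columns 0-1; the end position is excluded
mask = [True, False, True, False, False]
picked = df.iloc[mask]          # boolean array: keeps rows where the mask is True

print(first_col.tolist())       # [0, 1, 9, 7, 8]
print(top_left.shape)           # (2, 2)
print(picked.index.tolist())    # ['R1', 'R3']
```

The colon in `df.iloc[:, 0]` means "every row", so the result is the whole first column as a Series.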

.loc

Access a group of rows and columns by label(s) or a boolean array.

.loc[] is primarily label based, but may also be used with a boolean array.

df.loc["R2"]
>> C1    1
   C2    1
   C3    2

df.loc["R2"] selects the row by its index label, R2, rather than by its position.

df.loc["R2","C1"]
>> 1

df.loc["R2","C1"] returns the single value at the intersection of row R2 and column C1. Labels are case-sensitive, so "c1" would not match the column C1.

df.loc["R1":"R3"]

This reads every row from R1 through R3. Unlike positional slices, label slices with .loc include both endpoints.
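The difference between label slices and positional slices is easy to trip over, so here is a small sketch comparing the two. The DataFrame uses fixed stand-in values, and some of them are made up for illustration.

```python
import pandas as pd

# Fixed stand-in values (partly assumed) so the comparison is repeatable.
df = pd.DataFrame(
    [[0, 0, 4], [1, 1, 2], [9, 8, 6], [7, 3, 5], [8, 2, 1]],
    index=["R1", "R2", "R3", "R4", "R5"],
    columns=["C1", "C2", "C3"],
)

by_label = df.loc["R1":"R3"]     # label slice: R3 is included
by_position = df.iloc[0:3]       # positional slice: position 3 is excluded
shorter = df.iloc[0:2]           # stops before position 2, so only R1 and R2

print(by_label.index.tolist())   # ['R1', 'R2', 'R3']
print(by_label.equals(by_position))  # True
print(shorter.index.tolist())    # ['R1', 'R2']
```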

df.loc["R1":"R3" , "C1":"C2"]

This reads the values in rows R1 through R3 and columns C1 through C2, with both endpoints included on each axis.
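As noted earlier, .loc[] also accepts a boolean array. A minimal sketch of that usage follows, again with fixed stand-in values (partly made up) and an arbitrary threshold of 5 chosen for this example.

```python
import pandas as pd

# Fixed stand-in values (partly assumed) so the result is repeatable.
df = pd.DataFrame(
    [[0, 0, 4], [1, 1, 2], [9, 8, 6], [7, 3, 5], [8, 2, 1]],
    index=["R1", "R2", "R3", "R4", "R5"],
    columns=["C1", "C2", "C3"],
)

# A comparison on a column yields a boolean Series aligned with the row index.
mask = df["C1"] > 5              # the threshold 5 is an arbitrary choice

# Keep only rows where the mask is True, and pick two columns by label.
big = df.loc[mask, ["C1", "C3"]]

print(big.index.tolist())        # ['R3', 'R4', 'R5']
print(big["C3"].tolist())        # [6, 5, 1]
```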

Thank you!
