Quick Introduction to Python Pandas
Pandas is the Python Data Analysis Library. It was developed by Wes McKinney.
Python Pandas is an open-source, fast, easy-to-use tool for data manipulation and data analysis. It provides rich data structures and functions to make data analysis easy and fast.
Install Pandas
The Pandas module can be installed with the pip tool -
pip install pandas
On successful installation, pip prints something like this (the version number will differ) -
Installing collected packages: pandas
Successfully installed pandas-1.0.1
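As a quick sanity check after installation, the sketch below imports the module and prints the version that Python actually picked up. It assumes pandas was installed into the same environment that runs the script.

```python
import pandas as pd

# The version string depends on the environment; it is printed, not assumed
installed_version = pd.__version__
print(installed_version)
```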
Pandas Data Structures
Pandas has two workhorse data structures that are used most often - Series and DataFrame.
Series
A Series is a one-dimensional data structure containing an array of data and an associated array of data labels, called its index.
pandas.Series( data, index, dtype, copy)
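Here, data can be an ndarray, a list, a dict or a scalar constant. index holds the labels, which must be hashable and have the same length as data (they do not have to be unique). dtype is the data type, and copy controls whether the input data is copied; it is False by default in the version installed above.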
Example-
import pandas as pd
getData = pd.Series(['a','b','c','d','e'],
                    index=[1,2,3,4,5])
print(getData)
# retrieve the element with label 1 (here, the first element)
print(getData[1])
# retrieve the first three elements
print(getData[:3])
Output of the above code-
1    a
2    b
3    c
4    d
5    e
dtype: object
a
1    a
2    b
3    c
dtype: object
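Because this Series uses integer labels that start at 1, a label and a position with the same number point at different elements. The sketch below uses the standard .loc (label-based) and .iloc (position-based) accessors to make the difference explicit. The variable names by_label, by_position and first_three are chosen only for this illustration.

```python
import pandas as pd

# Same Series as in the example above: integer labels starting at 1
getData = pd.Series(['a', 'b', 'c', 'd', 'e'], index=[1, 2, 3, 4, 5])

by_label = getData.loc[1]       # element whose label is 1
by_position = getData.iloc[1]   # element at position 1 (the second element)
first_three = getData.iloc[:3]  # positions 0, 1 and 2

print(by_label)              # a
print(by_position)           # b
print(first_three.tolist())  # ['a', 'b', 'c']
```

Using .loc and .iloc instead of plain square brackets avoids any ambiguity about whether an integer means a label or a position.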
Data Frame
A DataFrame is a tabular data structure containing an ordered collection of columns. Each column can have a different data type, such as numeric, Boolean or string. It has both a row index and a column index.
pandas.DataFrame( data, index, columns, dtype, copy)
Here, data can be an ndarray, a Series, a dict, a list, a constant or another DataFrame. index supplies the row labels of the resulting frame, columns supplies the column labels, dtype forces a single data type on the data, and copy controls whether the input data is copied.
Example-
import pandas as pd
data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'])
print(getData)
Output of the above code -
      Item  Quantity
0  Biscuit         6
1    Chips         3
2    Candy        20
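The constructor also accepts a dict, in which case the keys become the column labels. The following is a minimal sketch that builds the same table that way and inspects it. The checks at the end are only one possible way to look at the result.

```python
import pandas as pd

# Keys become column labels, each list becomes one column
data = {'Item': ['Biscuit', 'Chips', 'Candy'], 'Quantity': [6, 3, 20]}
getData = pd.DataFrame(data)

print(getData.shape)              # (3, 2): three rows, two columns
print(list(getData.columns))      # ['Item', 'Quantity']
print(getData['Quantity'].sum())  # 29
```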
Pandas Iteration
Python Pandas provides three methods for iteration - items() (called iteritems() before pandas 2.0, where the old name was removed), iterrows() and itertuples().
items()
It iterates over the (column name, column) pairs of a DataFrame. Each column is returned as a Series.
import pandas as pd
data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'])
for key,value in getData.items():
    print(key, value)
Output of the above code-
Item 0    Biscuit
1      Chips
2      Candy
Name: Item, dtype: object
Quantity 0     6
1     3
2    20
Name: Quantity, dtype: int64
iterrows()
It yields each index value along with the data contained in that row, as a Series.
import pandas as pd
data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'])
for key,value in getData.iterrows():
    print(key, value)
Output of the above code-
0 Item        Biscuit
Quantity          6
Name: 0, dtype: object
1 Item        Chips
Quantity        3
Name: 1, dtype: object
2 Item        Candy
Quantity       20
Name: 2, dtype: object
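Each row returned by iterrows() is a Series, so individual values can be read by column name. The sketch below adds up the quantities row by row and compares the result with the column's own sum() method, which is usually the simpler choice when no per-row logic is needed. The names total and vectorized_total are illustrative.

```python
import pandas as pd

data = [['Biscuit', 6], ['Chips', 3], ['Candy', 20]]
getData = pd.DataFrame(data, columns=['Item', 'Quantity'])

# Row-by-row accumulation: each row is a Series indexed by column name
total = 0
for index, row in getData.iterrows():
    total += row['Quantity']

# Same result computed on the whole column at once
vectorized_total = getData['Quantity'].sum()

print(total, vectorized_total)  # 29 29
```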
itertuples()
It returns an iterator yielding a named tuple for each row in the DataFrame.
import pandas as pd
data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'])
for row in getData.itertuples():
    print(row)
Output of the above code-
Pandas(Index=0, Item='Biscuit', Quantity=6)
Pandas(Index=1, Item='Chips', Quantity=3)
Pandas(Index=2, Item='Candy', Quantity=20)
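The fields of each named tuple can be read as attributes. The sketch below, which assumes the same sample data, passes index=False to leave the index out of the tuple and builds a small list of labels. The list name labels is chosen only for this illustration.

```python
import pandas as pd

data = [['Biscuit', 6], ['Chips', 3], ['Candy', 20]]
getData = pd.DataFrame(data, columns=['Item', 'Quantity'])

labels = []
# index=False drops the Index field, so each tuple holds only the columns
for row in getData.itertuples(index=False):
    labels.append(f"{row.Item}: {row.Quantity}")

print(labels)  # ['Biscuit: 6', 'Chips: 3', 'Candy: 20']
```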
Pandas Sorting
Python Pandas provides the following sorting methods -
sort_index()
The sort_index() method sorts a Series or DataFrame by its index labels.
import pandas as pd
data = [['Chips',3],['Candy',20],['Biscuit',6],['Jelly',4]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],index=[5,4,1,3])
print(getData)
sorted_df = getData.sort_index()
print(sorted_df)
Output of the above code-
      Item  Quantity
5    Chips         3
4    Candy        20
1  Biscuit         6
3    Jelly         4
      Item  Quantity
1  Biscuit         6
3    Jelly         4
4    Candy        20
5    Chips         3
To sort the data in descending order, pass ascending=False -
import pandas as pd
data = [['Chips',3],['Candy',20],['Biscuit',6],['Jelly',4]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],index=[5,4,1,3])
sorted_df = getData.sort_index(ascending=False)
print(sorted_df)
Output of the above code-
      Item  Quantity
5    Chips         3
4    Candy        20
3    Jelly         4
1  Biscuit         6
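sort_index() can also sort the column labels instead of the row labels when axis=1 is passed. The following is a small sketch using the same sample data. It also shows that the method returns a new object and leaves the original unchanged.

```python
import pandas as pd

data = [['Chips', 3], ['Candy', 20], ['Biscuit', 6], ['Jelly', 4]]
getData = pd.DataFrame(data, columns=['Item', 'Quantity'], index=[5, 4, 1, 3])

# axis=1 sorts the column labels; ascending=False puts 'Quantity' first
by_columns = getData.sort_index(axis=1, ascending=False)
print(list(by_columns.columns))  # ['Quantity', 'Item']

# The original DataFrame keeps its row order
print(list(getData.index))       # [5, 4, 1, 3]
```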
sort_values()
The sort_values() method sorts a Series or DataFrame by its values. In the example below, the rows are sorted by the 'Quantity' column.
import pandas as pd
data = [['Chips',3],['Candy',20],['Biscuit',6],['Jelly',4]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],index=[5,4,1,3])
sorted_df = getData.sort_values(by='Quantity')
print(sorted_df)
Output of the above code-
      Item  Quantity
5    Chips         3
3    Jelly         4
1  Biscuit         6
4    Candy        20
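sort_values() accepts the same ascending argument as sort_index() and works on any column. The sketch below sorts the sample data by 'Quantity' in descending order and, separately, by 'Item' alphabetically. The result names by_quantity_desc and by_item are illustrative.

```python
import pandas as pd

data = [['Chips', 3], ['Candy', 20], ['Biscuit', 6], ['Jelly', 4]]
getData = pd.DataFrame(data, columns=['Item', 'Quantity'], index=[5, 4, 1, 3])

# Largest quantity first
by_quantity_desc = getData.sort_values(by='Quantity', ascending=False)
print(by_quantity_desc['Item'].tolist())  # ['Candy', 'Biscuit', 'Jelly', 'Chips']

# Alphabetical by item name; the original index labels travel with the rows
by_item = getData.sort_values(by='Item')
print(by_item.index.tolist())             # [1, 4, 5, 3]
```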