Python Pandas DataFrame
A DataFrame is a two-dimensional, tabular data structure made up of an ordered collection of columns. Each column can hold a different data type, such as numeric, boolean, or string values. A DataFrame has both a row index and a column index, and it is used mainly for data analysis.
Create DataFrame
A DataFrame is created by calling the pandas.DataFrame constructor. Called with no arguments, it produces an empty DataFrame.
import pandas as pd
df = pd.DataFrame()
print(df)
The above code returns an empty dataframe -
Empty DataFrame
Columns: []
Index: []
DataFrame Syntax
pandas.DataFrame(data, index, columns, dtype, copy)
Here, data can be an ndarray, a Series, a dict, a list, a scalar constant, or another DataFrame. index sets the row labels of the resulting frame, columns sets the column labels, dtype forces a data type on the data, and copy controls whether the input data is copied. The following example creates a DataFrame from a single list -
import pandas as pd
data = ["Rose","Tulip","Sunflower","Lilly"]
getData = pd.DataFrame(data,columns=['FLOWERS'])
print(getData)
Output of the above code-
FLOWERS
0 Rose
1 Tulip
2 Sunflower
3 Lilly
Similarly, we can create a DataFrame from a list of lists, where each inner list becomes one row -
import pandas as pd
data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],
index=[101,102,103])
print(getData)
Output of the above code-
Item Quantity
101 Biscuit 6
102 Chips 3
103 Candy 20
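The data argument also accepts a dict, as listed in the syntax description above. The following is a minimal sketch of that form, reusing the item data from the previous example; the variable names items and quantities are only illustrative.

```python
import pandas as pd

# Each dict key becomes a column label; each list holds that column's values.
data = {"Item": ["Biscuit", "Chips", "Candy"], "Quantity": [6, 3, 20]}

# index supplies the row labels, as in the list-of-lists example.
items = pd.DataFrame(data, index=[101, 102, 103])

# dtype forces one data type on the data; here the integer counts become floats.
quantities = pd.DataFrame({"Quantity": [6, 3, 20]}, dtype="float64")

print(items)
print(quantities.dtypes)
```

With a dict, the columns argument is not needed because the keys already name the columns.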
DataFrame Information
Pandas provides several attributes and methods for inspecting a DataFrame, such as its index dtype, column dtypes, and memory usage.
shape
The shape attribute returns the dimensions of the DataFrame as a (rows, columns) tuple.
import pandas as pd
data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],
index=[101,102,103])
print(getData.shape)
Output of the above code-
(3, 2)
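Because shape is an ordinary tuple, it can be unpacked into separate row and column counts. The short sketch below shows this alongside len() and the size attribute, which give related information; the names rows and cols are illustrative.

```python
import pandas as pd

data = [['Biscuit', 6], ['Chips', 3], ['Candy', 20]]
df = pd.DataFrame(data, columns=['Item', 'Quantity'], index=[101, 102, 103])

# Unpack the (rows, columns) tuple into two variables.
rows, cols = df.shape
print(rows, cols)

# len() gives the number of rows; size gives the total number of cells.
print(len(df))
print(df.size)
```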
index
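The index attribute returns the row labels of the DataFrame. The output shown below comes from pandas versions before 2.0; pandas 2.0 and later print Index([101, 102, 103], dtype='int64') instead, because the Int64Index class was removed.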
import pandas as pd
data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],
index=[101,102,103])
print(getData.index)
Output of the above code-
Int64Index([101, 102, 103], dtype='int64')
columns
The columns attribute returns the column labels of the DataFrame.
import pandas as pd
data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],
index=[101,102,103])
print(getData.columns)
Output of the above code-
Index(['Item', 'Quantity'], dtype='object')
info()
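The info() method prints a summary of a DataFrame, including the index dtype, column dtypes, non-null counts, and memory usage. It writes this summary directly to standard output and returns None, which is why wrapping the call in print() adds a final None line to the output below.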
import pandas as pd
data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],
index=[101,102,103])
print(getData.info())
Output of the above code-
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 101 to 103
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Item 3 non-null object
1 Quantity 3 non-null int64
dtypes: int64(1), object(1)
memory usage: 72.0+ bytes
None
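When the summary needs to be stored rather than printed, info() can write to a text buffer through its buf parameter. The following is a small sketch of that approach, assuming the same item data as above; the names buffer, returned, and report are illustrative.

```python
import io

import pandas as pd

data = [['Biscuit', 6], ['Chips', 3], ['Candy', 20]]
df = pd.DataFrame(data, columns=['Item', 'Quantity'], index=[101, 102, 103])

# info() writes to the buffer passed as buf instead of standard output.
buffer = io.StringIO()
returned = df.info(buf=buffer)
report = buffer.getvalue()

print(returned)                 # info() itself has no return value
print(report.splitlines()[0])   # first line of the captured summary
```

Calling getData.info() on its own, without print(), avoids the trailing None in the earlier output.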
count()
The count() method counts the non-NA cells in each column (the default, axis=0) or in each row (axis=1).
import pandas as pd
data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],
index=[101,102,103])
print(getData.count())
Output of the above code-
Item 3
Quantity 3
dtype: int64
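The example above has no missing values, so every count equals the number of rows. The sketch below uses a hypothetical variant of the item data with two missing cells to show how count() behaves per column and per row.

```python
import pandas as pd

# None marks a missing cell; in the numeric column pandas stores it as NaN.
df = pd.DataFrame(
    {"Item": ["Biscuit", "Chips", None], "Quantity": [6, None, 20]},
    index=[101, 102, 103],
)

per_column = df.count()        # non-missing cells in each column
per_row = df.count(axis=1)     # non-missing cells in each row

print(per_column)
print(per_row)
```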
DataFrame Functions
Pandas provides methods for performing mathematical operations on the data -
sum()
The sum() method returns the sum of the values along the requested axis. In the syntax below, axis is either 0 ('index') or 1 ('columns'), skipna excludes NA/null values, level names the MultiIndex level to sum along, numeric_only restricts the operation to float, int, and boolean columns, and min_count is the number of valid values required to perform the operation. The level parameter was removed in pandas 2.0, so this signature reflects older releases. The syntax is -
DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0)
Example
import pandas as pd
data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.sum())
Output of the above code-
0 131
dtype: int64
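The axis, skipna, and min_count parameters described above are easier to follow on a frame with more than one column and some missing values. The following sketch uses made-up monthly figures; the column and variable names are illustrative.

```python
import pandas as pd

# Row "c" is entirely missing so that min_count and skipna have a visible effect.
df = pd.DataFrame(
    {"Jan": [20, 12, None], "Feb": [3, 30, None]},
    index=["a", "b", "c"],
)

col_totals = df.sum()                   # axis=0: one total per column
row_totals = df.sum(axis=1)             # axis=1: one total per row
strict = df.sum(axis=1, min_count=1)    # require at least one valid value
with_nan = df.sum(skipna=False)         # missing values propagate

print(col_totals)
print(row_totals)
print(strict)
print(with_nan)
```

With the default min_count=0, the all-missing row sums to 0; with min_count=1 it yields NaN instead.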
cumsum()
The cumsum() method returns the cumulative sum of the values along the requested axis, as a DataFrame of the same shape. In the syntax below, axis is either 0 ('index') or 1 ('columns'), and skipna excludes NA/null values. The syntax is -
DataFrame.cumsum(axis=None, skipna=True)
Example
import pandas as pd
data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.cumsum())
Output of the above code-
0
0 20
1 32
2 35
3 65
4 108
5 131
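With a single column, the axis parameter makes no visible difference. The sketch below splits the same six numbers into two hypothetical columns, Q1 and Q2, to show the running total going down the rows and across the columns.

```python
import pandas as pd

df = pd.DataFrame({"Q1": [20, 12, 3], "Q2": [30, 43, 23]})

# Default axis=0: accumulate down each column.
down = df.cumsum()

# axis=1: accumulate across each row, left to right.
across = df.cumsum(axis=1)

print(down)
print(across)
```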
min() and max()
The min() and max() methods return the minimum and the maximum of the values along the requested axis. The syntax is -
DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None)
DataFrame.max(axis=None, skipna=None, level=None, numeric_only=None)
Example
import pandas as pd
data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.min())
print(getData.max())
Output of the above code-
0 3
dtype: int64
0 43
dtype: int64
idxmin() and idxmax()
The idxmin() and idxmax() methods return the index label of the first occurrence of the minimum and the maximum value along the requested axis. The syntax is -
DataFrame.idxmin(axis=0, skipna=True)
DataFrame.idxmax(axis=0, skipna=True)
Example
import pandas as pd
data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.idxmin())
print(getData.idxmax())
Output of the above code-
0 2
dtype: int64
0 4
dtype: int64
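In the example above the default index makes the result look like a position, but idxmin() and idxmax() return index labels. The following sketch uses the item names as a hypothetical index to make that visible.

```python
import pandas as pd

df = pd.DataFrame(
    {"Quantity": [6, 3, 20]},
    index=["Biscuit", "Chips", "Candy"],
)

# On a single column (a Series) the result is one label rather than a Series.
lowest = df["Quantity"].idxmin()
highest = df["Quantity"].idxmax()

print(lowest, highest)

# The label can be passed to loc to fetch the matching row.
print(df.loc[highest])
```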
describe()
The describe() method returns summary statistics for a DataFrame, such as count, mean, standard deviation, min, max, and percentiles. The syntax is -
DataFrame.describe(percentiles=None, include=None, exclude=None)
Example
import pandas as pd
data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.describe())
Output of the above code-
0
count 6.000000
mean 21.833333
std 13.934370
min 3.000000
25% 14.000000
50% 21.500000
75% 28.250000
max 43.000000
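The percentiles and include parameters in the syntax above change which rows and columns appear in the summary. The sketch below shows one possible use of each; the frame named mixed is a hypothetical example with a text column.

```python
import pandas as pd

numbers = pd.DataFrame([20, 12, 3, 30, 43, 23])

# Ask for the 10th and 90th percentiles instead of the default quartiles.
custom = numbers.describe(percentiles=[0.1, 0.9])
print(custom)

mixed = pd.DataFrame(
    {"Item": ["Biscuit", "Chips", "Candy"], "Quantity": [6, 3, 20]}
)

# By default only numeric columns are summarised; include="all" adds the rest.
everything = mixed.describe(include="all")
print(everything)
```

For text columns, describe() reports count, unique, top, and freq rather than numeric statistics.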
mean()
The mean() method returns the mean of the values along the requested axis. The syntax is -
DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None)
Example
import pandas as pd
data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.mean())
Output of the above code-
0 21.833333
dtype: float64
median()
The median() method returns the median of the values along the requested axis. The syntax is -
DataFrame.median(axis=None, skipna=None, level=None, numeric_only=None)
Example
import pandas as pd
data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.median())
Output of the above code-
0 21.5
dtype: float64
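When a DataFrame mixes text and numeric columns, the numeric_only parameter from the syntax above keeps mean() and median() to the numeric ones. The following is a minimal sketch using a small, made-up name and age table.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Priska", "Smith", "Alex"], "Age": [16, 13, 20]}
)

# numeric_only=True skips the text column instead of trying to average it.
means = df.mean(numeric_only=True)
medians = df.median(numeric_only=True)

print(means)
print(medians)
```

Passing numeric_only=True explicitly is the safer choice, since recent pandas versions raise an error rather than silently dropping text columns.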
DataFrame Iteration Methods
The DataFrame iteration methods covered here are items() and iterrows(). Older tutorials use iteritems(), which was deprecated in pandas 1.5 and removed in pandas 2.0; items() is the direct replacement.
items()
The items() method yields (column label, Series) pairs, one per column.
import pandas as pd
data = [['Priska',16],['Smith',13],['Alex',20]]
getData = pd.DataFrame(data,columns=['Name','Age'],
index=[101,102,103])
for key,value in getData.items():
    print(key, value)
Output of the above code-
Name 101 Priska
102 Smith
103 Alex
Name: Name, dtype: object
Age 101 16
102 13
103 20
Name: Age, dtype: int64
iterrows()
The iterrows() method yields (index label, Series) pairs, one per row.
import pandas as pd
data = [['Priska',16],['Smith',13],['Alex',20]]
getData = pd.DataFrame(data,columns=['Name','Age'],
index=[101,102,103])
for key,value in getData.iterrows():
    print(key, value)
Output of the above code-
101 Name Priska
Age 16
Name: 101, dtype: object
102 Name Smith
Age 13
Name: 102, dtype: object
103 Name Alex
Age 20
Name: 103, dtype: object
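As the output above shows, each row arrives as a Series with dtype object, because one Series has to hold both the name and the age. The sketch below makes that visible and shows itertuples(), another pandas iteration method not covered above, as a possible alternative; the name ages_by_name is illustrative.

```python
import pandas as pd

data = [['Priska', 16], ['Smith', 13], ['Alex', 20]]
df = pd.DataFrame(data, columns=['Name', 'Age'], index=[101, 102, 103])

# iterrows() packs each row into one Series, so mixed columns share the object dtype.
first_label, first_row = next(df.iterrows())
print(first_label, first_row.dtype)

# itertuples() yields namedtuples; the row label is available as .Index.
ages_by_name = {}
for row in df.itertuples():
    ages_by_name[row.Name] = row.Age
print(ages_by_name)
```

For large frames, column-wise operations such as sum() or mean() are usually a better fit than looping over rows.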