Python Pandas DataFrame

A DataFrame is a two-dimensional, table-like data structure containing an ordered collection of columns. Each column can hold a different data type, such as numeric, boolean or string values. It has both a row index and a column index, and it is mainly used for data analysis.

Create DataFrame

A DataFrame is created by calling the DataFrame constructor.

import pandas as pd

df = pd.DataFrame()
print(df)

The above code returns an empty dataframe -

Empty DataFrame
Columns: []
Index: []

DataFrame Syntax

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

Here, data can be an ndarray, Series, dict, list, constant or another DataFrame. index supplies the row labels of the resulting frame, columns supplies the column labels, dtype forces a single data type on the data, and copy controls whether the input data is copied. In the example below, a dataframe is created from a single list -

import pandas as pd

data = ["Rose","Tulip","Sunflower","Lilly"]
getData = pd.DataFrame(data,columns=['FLOWERS'])
print(getData)

Output of the above code-

     FLOWERS
0       Rose
1      Tulip
2  Sunflower
3      Lilly

Similarly, we can create a dataframe from a list of lists, where each inner list becomes one row -

import pandas as pd

data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],
                       index=[101,102,103])
print(getData)

Output of the above code-

        Item  Quantity
101  Biscuit         6
102    Chips         3
103    Candy        20
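The syntax section above also lists dicts and ndarrays as valid input. The following is a minimal sketch of both; the variable names from_dict and from_array, and the small sample array, are illustrative assumptions rather than part of the original examples.

```python
import numpy as np
import pandas as pd

# From a dict: keys become column labels, values become the column data.
from_dict = pd.DataFrame(
    {"Item": ["Biscuit", "Chips", "Candy"], "Quantity": [6, 3, 20]},
    index=[101, 102, 103],
)

# From a 2-D NumPy array: dtype forces every column to float64.
from_array = pd.DataFrame(
    np.array([[1, 2], [3, 4]]),
    columns=["a", "b"],
    dtype="float64",
)

print(from_dict)
print(from_array.dtypes)
```

The dict form is usually the easiest to read when the data is already organised by column, while the list-of-lists form shown earlier suits data organised by row.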

DataFrame Information

Pandas provides various attributes and methods to get information about a dataframe, such as the index dtype, column dtypes and memory usage.

shape

The shape attribute returns the dimensions of the dataframe as a tuple (rows, columns).

import pandas as pd

data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],
                       index=[101,102,103])
print(getData.shape)

Output of the above code-

(3, 2)
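Because shape is an ordinary tuple, it can be unpacked into separate row and column counts. A short sketch, reusing the same sample data (the names rows and cols are illustrative):

```python
import pandas as pd

data = [['Biscuit', 6], ['Chips', 3], ['Candy', 20]]
getData = pd.DataFrame(data, columns=['Item', 'Quantity'],
                       index=[101, 102, 103])

# shape is a plain tuple, so it can be unpacked directly.
rows, cols = getData.shape
print(f"{rows} rows x {cols} columns")

# len() gives the row count; size gives rows * columns.
print(len(getData), getData.size)
```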

index

The index attribute returns the index (row labels) of the dataframe.

import pandas as pd

data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],
                       index=[101,102,103])
print(getData.index)

Output of the above code (pandas 2.0 and later print Index instead of Int64Index)-

Int64Index([101, 102, 103], dtype='int64')

columns

The columns attribute returns the column labels of the dataframe.

import pandas as pd

data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],
                       index=[101,102,103])
print(getData.columns)

Output of the above code-

Index(['Item', 'Quantity'], dtype='object')

info()

The info() method prints a concise summary of a dataframe, including the index dtype, column dtypes, non-null counts and memory usage. It writes directly to standard output and returns None.

import pandas as pd

data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],
                       index=[101,102,103])
getData.info()  # info() prints the summary itself and returns None

Output of the above code-

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 101 to 103
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Item      3 non-null      object
 1   Quantity  3 non-null      int64
dtypes: int64(1), object(1)
memory usage: 72.0+ bytes
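Since info() writes its summary rather than returning it, capturing the text needs a writable buffer passed through the buf parameter. A minimal sketch, assuming the summary is wanted as a string (the names buffer and summary are illustrative):

```python
import io

import pandas as pd

data = [['Biscuit', 6], ['Chips', 3], ['Candy', 20]]
getData = pd.DataFrame(data, columns=['Item', 'Quantity'],
                       index=[101, 102, 103])

# Write the summary into an in-memory buffer instead of standard output.
buffer = io.StringIO()
getData.info(buf=buffer)
summary = buffer.getvalue()

# The captured text can now be logged, saved or searched like any string.
print(summary.splitlines()[0])
```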

count()

The count() method counts the non-NA cells in each column (the default) or, with axis=1, in each row.

import pandas as pd

data = [['Biscuit',6],['Chips',3],['Candy',20]]
getData = pd.DataFrame(data,columns=['Item','Quantity'],
                       index=[101,102,103])
print(getData.count())

Output of the above code-

Item        3
Quantity    3
dtype: int64
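The sample data above has no missing values, so every count equals the number of rows. The sketch below uses a hypothetical frame named stock with two missing cells to show how count() differs per column and per row.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing item name and one missing quantity.
stock = pd.DataFrame(
    {"Item": ["Biscuit", "Chips", None], "Quantity": [6, np.nan, 20]},
    index=[101, 102, 103],
)

per_column = stock.count()        # non-NA cells in each column
per_row = stock.count(axis=1)     # non-NA cells in each row

print(per_column)
print(per_row)
```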

DataFrame Functions

Pandas provides methods to perform mathematical operations on data -

sum()

It returns the sum of the values along the requested axis. In the syntax below, axis is {index (0), columns (1)}, skipna excludes NA/null values, numeric_only restricts the calculation to float, int and boolean columns, and min_count is the number of valid values required to perform the operation (if fewer are present, the result is NA). Older pandas releases also accepted a level argument for summing along one level of a MultiIndex; it was removed in pandas 2.0. The syntax is -

DataFrame.sum(axis=0, skipna=True, numeric_only=False, min_count=0)

Example

import pandas as pd

data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.sum())

Output of the above code-

0    131
dtype: int64
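The single-column example does not show what axis and min_count change. The following sketch uses a hypothetical two-column frame named sales with one missing value to illustrate both parameters.

```python
import numpy as np
import pandas as pd

# Hypothetical sales figures; Chips has no value recorded for Tue.
sales = pd.DataFrame(
    {"Mon": [6, 3, 20], "Tue": [4, np.nan, 10]},
    index=["Biscuit", "Chips", "Candy"],
)

column_totals = sales.sum()                # axis=0: one total per column
row_totals = sales.sum(axis=1)             # one total per row, NaN skipped
strict = sales.sum(axis=1, min_count=2)    # NaN unless 2 valid values exist

print(column_totals)
print(row_totals)
print(strict)
```

With min_count=2 the Chips row becomes NaN because only one of its two cells holds a valid value.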

cumsum()

The cumsum() method returns the cumulative sum of values along an axis of a dataframe. In the syntax below, axis is {index (0), columns (1)} and skipna excludes NA/null values. The syntax is -

DataFrame.cumsum(axis=None, skipna=True)

Example

import pandas as pd

data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.cumsum())

Output of the above code-

     0
0   20
1   32
2   35
3   65
4  108
5  131

min() and max()

They return the minimum and the maximum of the values along the requested axis. The syntax is -

DataFrame.min(axis=0, skipna=True, numeric_only=False)
DataFrame.max(axis=0, skipna=True, numeric_only=False)

Example

import pandas as pd

data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.min())
print(getData.max())

Output of the above code-

0    3
dtype: int64
0    43
dtype: int64
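On a frame that mixes text and numbers, numeric_only=True limits the result to the numeric columns. A brief sketch using the Item/Quantity data from earlier sections (the names numeric_min and numeric_max are illustrative):

```python
import pandas as pd

data = [['Biscuit', 6], ['Chips', 3], ['Candy', 20]]
getData = pd.DataFrame(data, columns=['Item', 'Quantity'],
                       index=[101, 102, 103])

# Only the Quantity column is numeric, so only it appears in the result.
numeric_min = getData.min(numeric_only=True)
numeric_max = getData.max(numeric_only=True)

print(numeric_min)
print(numeric_max)
```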

idxmin() and idxmax()

The idxmin() and idxmax() methods return the index label of the first occurrence of the minimum and the maximum values along the requested axis. The syntax is -

DataFrame.idxmin(axis=0, skipna=True)

DataFrame.idxmax(axis=0, skipna=True)

Example

import pandas as pd

data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.idxmin())
print(getData.idxmax())

Output of the above code-

0    2
dtype: int64
0    4
dtype: int64
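With the default integer index, the result looks like a position, but idxmin() and idxmax() return index labels. The sketch below uses a hypothetical frame named stock with string labels to make that visible.

```python
import pandas as pd

# Hypothetical data indexed by item name instead of 0, 1, 2.
stock = pd.DataFrame({"Quantity": [6, 3, 20]},
                     index=["Biscuit", "Chips", "Candy"])

lowest = stock.idxmin()["Quantity"]     # label of the smallest quantity
highest = stock.idxmax()["Quantity"]    # label of the largest quantity

# The labels can be passed straight to loc to fetch the matching rows.
print(lowest, stock.loc[lowest, "Quantity"])
print(highest, stock.loc[highest, "Quantity"])
```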

describe()

It returns summary statistics of the dataframe, such as count, mean, min and max. The syntax is -

DataFrame.describe(percentiles=None, include=None, exclude=None)

Example

import pandas as pd

data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.describe())

Output of the above code-

               0
count   6.000000
mean   21.833333
std    13.934370
min     3.000000
25%    14.000000
50%    21.500000
75%    28.250000
max    43.000000
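The percentiles parameter replaces the default 25% and 75% rows with other quantiles. A small sketch on the same data, requesting the 10th and 90th percentiles (the name summary is illustrative):

```python
import pandas as pd

data = [20, 12, 3, 30, 43, 23]
getData = pd.DataFrame(data)

# Ask for the 10th and 90th percentiles instead of the default quartiles.
summary = getData.describe(percentiles=[0.1, 0.9])
print(summary)

# Individual statistics can be read by row label and column label.
print(summary.loc["10%", 0], summary.loc["90%", 0])
```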

mean()

The mean() method returns the mean of the values along the requested axis. The syntax is -

DataFrame.mean(axis=0, skipna=True, numeric_only=False)

Example

import pandas as pd

data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.mean())

Output of the above code-

0    21.833333
dtype: float64

median()

The median() method returns the median of the values along the requested axis. The syntax is -

DataFrame.median(axis=0, skipna=True, numeric_only=False)

Example

import pandas as pd

data = [20,12,3,30,43,23]
getData = pd.DataFrame(data)
print(getData.median())

Output of the above code-

0    21.5
dtype: float64
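Both mean() and median() skip missing values unless skipna=False is passed. The sketch below uses a hypothetical frame named ages with one missing entry to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing entry.
ages = pd.DataFrame({"Age": [16, 13, np.nan, 20]})

mean_default = ages.mean()               # NaN ignored: (16 + 13 + 20) / 3
median_default = ages.median()           # NaN ignored: middle of 13, 16, 20
mean_strict = ages.mean(skipna=False)    # NaN propagates to the result

print(mean_default)
print(median_default)
print(mean_strict)
```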

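DataFrame Iteration Methods

The dataframe iteration methods covered here are items() and iterrows().

items()

The items() method iterates over the columns, yielding (column label, Series) pairs. It was called iteritems() in older releases; that name was removed in pandas 2.0.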

import pandas as pd

data = [['Priska',16],['Smith',13],['Alex',20]]
getData = pd.DataFrame(data,columns=['Name','Age'],
                       index=[101,102,103])

for key,value in getData.items():  # items() replaces iteritems(), removed in pandas 2.0
    print(key, value)

Output of the above code-

Name 101    Priska
102     Smith
103      Alex
Name: Name, dtype: object
Age 101    16
102    13
103    20
Name: Age, dtype: int64
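Each value produced by column iteration is a full Series, so the usual Series methods are available inside the loop. A minimal sketch using items(), the name current pandas releases use for this iterator (the dict as_lists is an illustrative assumption):

```python
import pandas as pd

data = [['Priska', 16], ['Smith', 13], ['Alex', 20]]
getData = pd.DataFrame(data, columns=['Name', 'Age'],
                       index=[101, 102, 103])

# Collect each column as a plain Python list, keyed by its label.
as_lists = {}
for column, values in getData.items():
    # values is a Series indexed by the row labels 101, 102, 103
    as_lists[column] = values.tolist()

print(as_lists)
```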

iterrows()

The iterrows() method iterates over the rows, yielding (index label, Series) pairs.

import pandas as pd

data = [['Priska',16],['Smith',13],['Alex',20]]
getData = pd.DataFrame(data,columns=['Name','Age'],
                       index=[101,102,103])

for key,value in getData.iterrows():
    print(key, value)

Output of the above code-

101 Name    Priska
Age         16
Name: 101, dtype: object
102 Name    Smith
Age        13
Name: 102, dtype: object
103 Name    Alex
Age       20
Name: 103, dtype: object
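The output above shows dtype: object for every row, because iterrows() packs each row into a single Series and mixed columns are upcast to a common type. Where per-column types matter, itertuples() is a commonly used alternative; the sketch below compares the two (the list rows is an illustrative assumption).

```python
import pandas as pd

data = [['Priska', 16], ['Smith', 13], ['Alex', 20]]
getData = pd.DataFrame(data, columns=['Name', 'Age'],
                       index=[101, 102, 103])

# iterrows(): each row is a Series, accessed by column label.
for label, row in getData.iterrows():
    print(label, row['Name'], row['Age'])

# itertuples(): each row is a namedtuple; Index holds the row label.
rows = []
for row in getData.itertuples():
    rows.append((row.Index, row.Name, row.Age))

print(rows)
```

For large frames, both loops are usually much slower than vectorised column operations, so they are best kept for cases where row-by-row logic is genuinely needed.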
