Pandas GroupBy Function in Python
The pandas groupby() function splits data into groups based on some criteria.
A GroupBy operation involves one or more of the following steps on the original object:
-Splitting the object
-Applying a function
-Combining the result
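The three steps above can be sketched by hand and compared with what groupby() does in a single call. This is a minimal illustration, not how pandas implements grouping internally. The DataFrame is rebuilt inline from the student table used in this article (an assumption, so the example runs without the CSV file), and the names pieces, means, manual and grouped are hypothetical.

```python
import pandas as pd

# Sample data mirroring the student table used in this article
# (built inline as an assumption, instead of reading the CSV file).
df = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 1004, 1005],
    "Section": ["A", "B", "A", "C", "C"],
    "Class": [10, 10, 10, 11, 12],
    "Study hrs": [2, 6, 3, 0, 5],
    "Percentage": [50, 80, 60, 45, 75],
})

# 1. Split: one sub-frame per distinct Section value.
pieces = {key: df[df["Section"] == key] for key in sorted(df["Section"].unique())}

# 2. Apply: compute the mean Percentage of each piece.
means = {key: piece["Percentage"].mean() for key, piece in pieces.items()}

# 3. Combine: collect the per-group results into one Series.
manual = pd.Series(means, name="Percentage")

# groupby() performs the same split, apply and combine steps in one expression.
grouped = df.groupby("Section")["Percentage"].mean()

print(manual)
print(grouped)
```

Both Series hold the same mean Percentage per Section; groupby() just spares you the manual bookkeeping.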
Syntax: DataFrame.groupby()
import pandas as pd
df = pd.read_csv('D:\\DataSet\\student_result1.csv')
df
Output >>>
   Student ID Section  Class  Study hrs  Percentage
0        1001       A     10          2          50
1        1002       B     10          6          80
2        1003       A     10          3          60
3        1004       C     11          0          45
4        1005       C     12          5          75
gr1 = df.groupby(by = 'Section')
gr1
Output >>> <pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001878749E780>
gr1.groups
Output >>>
{'A': Int64Index([0, 2], dtype='int64'),
'B': Int64Index([1], dtype='int64'),
'C': Int64Index([3, 4], dtype='int64')}
df.groupby(['Section', 'Class']).groups
Output >>>
{('A', 10): Int64Index([0, 2], dtype='int64'),
('B', 10): Int64Index([1], dtype='int64'),
('C', 11): Int64Index([3], dtype='int64'),
('C', 12): Int64Index([4], dtype='int64')}
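When grouping by several columns, each group key is a tuple with one value per column. The sketch below counts the groups and the rows in each group. It rebuilds the student table inline (an assumption, so it runs without the CSV file), and by_section_class is a hypothetical name.

```python
import pandas as pd

# Inline copy of the student table (assumption: same values as the CSV).
df = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 1004, 1005],
    "Section": ["A", "B", "A", "C", "C"],
    "Class": [10, 10, 10, 11, 12],
    "Study hrs": [2, 6, 3, 0, 5],
    "Percentage": [50, 80, 60, 45, 75],
})

by_section_class = df.groupby(["Section", "Class"])

# Number of distinct (Section, Class) combinations present in the data.
print(by_section_class.ngroups)

# Number of rows in each group, indexed by the (Section, Class) pair.
print(by_section_class.size())
```

Only combinations that actually occur in the data form a group, so there is no ('A', 11) or ('B', 12) entry here.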
for section, df_1 in gr1:  # gr1 is grouped by Section, so each key is a section label
    print(section)
    print(df_1)
Output >>>
A
   Student ID Section  Class  Study hrs  Percentage
0        1001       A     10          2          50
2        1003       A     10          3          60
B
   Student ID Section  Class  Study hrs  Percentage
1        1002       B     10          6          80
C
   Student ID Section  Class  Study hrs  Percentage
3        1004       C     11          0          45
4        1005       C     12          5          75
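Iterating over a GroupBy object is useful when each group needs custom handling rather than a built-in aggregation. As a small sketch, the loop below records the highest Percentage in each section. The DataFrame is rebuilt inline (an assumption) and top_scores is a hypothetical name.

```python
import pandas as pd

# Inline copy of the student table (assumption: same values as the CSV).
df = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 1004, 1005],
    "Section": ["A", "B", "A", "C", "C"],
    "Class": [10, 10, 10, 11, 12],
    "Study hrs": [2, 6, 3, 0, 5],
    "Percentage": [50, 80, 60, 45, 75],
})

top_scores = {}
for section, group in df.groupby("Section"):
    # 'section' is the group label, 'group' is the sub-DataFrame for that label.
    top_scores[section] = int(group["Percentage"].max())

print(top_scores)
```

For a simple maximum like this, df.groupby("Section")["Percentage"].max() gives the same numbers without a loop; the explicit loop pays off when the per-group logic is more involved.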
list(gr1) # convert to list
Output >>>
[('A',
     Student ID Section  Class  Study hrs  Percentage
  0        1001       A     10          2          50
  2        1003       A     10          3          60),
 ('B',
     Student ID Section  Class  Study hrs  Percentage
  1        1002       B     10          6          80),
 ('C',
     Student ID Section  Class  Study hrs  Percentage
  3        1004       C     11          0          45
  4        1005       C     12          5          75)]
dict(list(gr1)) # convert to dict
Output >>>
{'A':    Student ID Section  Class  Study hrs  Percentage
 0        1001       A     10          2          50
 2        1003       A     10          3          60,
 'B':    Student ID Section  Class  Study hrs  Percentage
 1        1002       B     10          6          80,
 'C':    Student ID Section  Class  Study hrs  Percentage
 3        1004       C     11          0          45
 4        1005       C     12          5          75}
# Selecting a group
# A single group can be selected using get_group():
gr3 = df.groupby('Class').get_group(10)
gr3
Output >>>
   Student ID Section  Class  Study hrs  Percentage
0        1001       A     10          2          50
1        1002       B     10          6          80
2        1003       A     10          3          60
gr3 = df.groupby('Section').get_group('A')
gr3
Output >>>
   Student ID Section  Class  Study hrs  Percentage
0        1001       A     10          2          50
2        1003       A     10          3          60
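get_group() raises a KeyError when the requested key does not exist, so it can help to check the groups mapping first. The sketch below also selects a group from a multi-column grouping, where the key is a tuple. The DataFrame is rebuilt inline (an assumption); by_section, wanted, section_a and pair are hypothetical names.

```python
import pandas as pd

# Inline copy of the student table (assumption: same values as the CSV).
df = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 1004, 1005],
    "Section": ["A", "B", "A", "C", "C"],
    "Class": [10, 10, 10, 11, 12],
    "Study hrs": [2, 6, 3, 0, 5],
    "Percentage": [50, 80, 60, 45, 75],
})

by_section = df.groupby("Section")
section_a = by_section.get_group("A")

# Guard against a missing key instead of letting get_group() raise KeyError.
wanted = "D"
if wanted in by_section.groups:
    print(by_section.get_group(wanted))
else:
    print(f"No rows for section {wanted!r}")

# With several grouping columns, the key is a tuple in the same column order.
pair = df.groupby(["Section", "Class"]).get_group(("C", 12))
print(pair)
```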
# Applying a function to each group
gr1.sum()
Output >>>
         Student ID  Class  Study hrs  Percentage
Section
A              2004     20          5         110
B              1002     10          6          80
C              2009     23          5         120
gr1.mean()
Output >>>
         Student ID  Class  Study hrs  Percentage
Section
A            1002.0   10.0        2.5        55.0
B            1002.0   10.0        6.0        80.0
C            1004.5   11.5        2.5        60.0
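Summing or averaging identifier columns such as Student ID rarely means anything, so it is common to pick the columns of interest before aggregating. A minimal sketch, with the DataFrame rebuilt inline (an assumption) and avg as a hypothetical name:

```python
import pandas as pd

# Inline copy of the student table (assumption: same values as the CSV).
df = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 1004, 1005],
    "Section": ["A", "B", "A", "C", "C"],
    "Class": [10, 10, 10, 11, 12],
    "Study hrs": [2, 6, 3, 0, 5],
    "Percentage": [50, 80, 60, 45, 75],
})

# Select only the meaningful numeric columns, then aggregate per Section.
avg = df.groupby("Section")[["Study hrs", "Percentage"]].mean()
print(avg)
```

Selecting with a list of column names returns a DataFrame; selecting a single name, as in df.groupby("Section")["Percentage"], returns a Series instead.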
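gr1.describe()  # count, mean, std, min, quartiles and max of each numeric column, per Section
gr1.agg(['sum', 'max', 'mean'])  # several aggregations at once; one result column per (column, function) pair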
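agg() can also apply a different function to each column and give the results readable names, using pandas named aggregation (keyword = (column, function)). The sketch below is one possible summary, not the only way to do it. The DataFrame is rebuilt inline (an assumption), and the output column names students, avg_percentage and max_study_hrs are hypothetical.

```python
import pandas as pd

# Inline copy of the student table (assumption: same values as the CSV).
df = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 1004, 1005],
    "Section": ["A", "B", "A", "C", "C"],
    "Class": [10, 10, 10, 11, 12],
    "Study hrs": [2, 6, 3, 0, 5],
    "Percentage": [50, 80, 60, 45, 75],
})

# Named aggregation: each keyword becomes an output column,
# built from an (input column, aggregation function) pair.
summary = df.groupby("Section").agg(
    students=("Student ID", "count"),
    avg_percentage=("Percentage", "mean"),
    max_study_hrs=("Study hrs", "max"),
)
print(summary)
```

Compared with passing a list of function names, this form yields flat column names instead of a two-level column index.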
For more details, see the official pandas documentation for DataFrame.groupby.