Pandas GroupBy | Mastering the Python Pandas Library

Pandas GroupBy Function in Python

The Pandas GroupBy function splits data into groups based on some criteria.
A GroupBy operation involves one or more of the following steps on the original object:
-Splitting the object into groups
-Applying a function to each group
-Combining the results
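
The three steps above can be sketched by hand and compared with a single groupby() call. This is only an illustrative sketch: the small DataFrame and the names pieces, means, manual and grouped are assumptions made for the example, not part of the original data file.

```python
import pandas as pd

# Small hand-built frame, used only for illustration.
df = pd.DataFrame({
    "Section": ["A", "B", "A", "C", "C"],
    "Percentage": [50, 80, 60, 45, 75],
})

# Split: one sub-frame per distinct Section value.
pieces = {key: df[df["Section"] == key] for key in df["Section"].unique()}

# Apply: compute the mean Percentage of each piece.
means = {key: piece["Percentage"].mean() for key, piece in pieces.items()}

# Combine: gather the per-group results into one Series.
manual = pd.Series(means, name="Percentage").sort_index()

# The same three steps expressed as one groupby call.
grouped = df.groupby("Section")["Percentage"].mean()

print(manual)
print(grouped)
```

Both results hold one mean per section, which is why groupby() is usually preferred over writing the loop by hand.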

Syntax: DataFrame.groupby()

import pandas as pd
df = pd.read_csv('D:\\DataSet\\student_result1.csv')
df

Output >>>
   Student ID Section  Class  Study hrs  Percentage
0        1001       A     10          2          50
1        1002       B     10          6          80
2        1003       A     10          3          60
3        1004       C     11          0          45
4        1005       C     12          5          75
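
If the CSV file is not available, the same table can be rebuilt by hand so the remaining examples can be followed. This is a sketch that assumes the five rows shown above are the whole data set; the values are copied from that output.

```python
import pandas as pd

# Rebuild the sample table by hand; values copied from the output above.
df = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 1004, 1005],
    "Section": ["A", "B", "A", "C", "C"],
    "Class": [10, 10, 10, 11, 12],
    "Study hrs": [2, 6, 3, 0, 5],
    "Percentage": [50, 80, 60, 45, 75],
})

print(df)
```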

gr1 = df.groupby(by = 'Section')
gr1

Output >>>   <pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001878749E780>

gr1.groups

Output >>>
          {'A': Int64Index([0, 2], dtype='int64'),
           'B': Int64Index([1], dtype='int64'),
           'C': Int64Index([3, 4], dtype='int64')}

df.groupby(['Section', 'Class']).groups

Output >>>
          {('A', 10): Int64Index([0, 2], dtype='int64'),
           ('B', 10): Int64Index([1], dtype='int64'),
           ('C', 11): Int64Index([3], dtype='int64'),
           ('C', 12): Int64Index([4], dtype='int64')}
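
When only the number of groups or the number of rows per group is needed, ngroups and size() may be more convenient than reading the groups mapping. The sketch below assumes the sample table has been rebuilt by hand; the names multi, n_groups and counts are illustrative.

```python
import pandas as pd

# Sample table rebuilt by hand (assumed to match the CSV used in this article).
df = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 1004, 1005],
    "Section": ["A", "B", "A", "C", "C"],
    "Class": [10, 10, 10, 11, 12],
    "Study hrs": [2, 6, 3, 0, 5],
    "Percentage": [50, 80, 60, 45, 75],
})

multi = df.groupby(["Section", "Class"])

# Number of distinct (Section, Class) combinations.
n_groups = multi.ngroups

# Number of rows in each combination, indexed by (Section, Class).
counts = multi.size()

print(n_groups)
print(counts)
```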

# gr1 is grouped by 'Section', so each key is a section name
for section, df_1 in gr1:
    print(section)
    print(df_1)

Output >>>
          A
             Student ID Section  Class  Study hrs  Percentage
          0        1001       A     10          2          50
          2        1003       A     10          3          60
          B
             Student ID Section  Class  Study hrs  Percentage
          1        1002       B     10          6          80
          C
             Student ID Section  Class  Study hrs  Percentage
          3        1004       C     11          0          45
          4        1005       C     12          5          75
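
The same loop can do work on each group instead of printing it. As a sketch, the example below records the top-scoring student of every section; the sample table is rebuilt by hand and the name top is an assumption made for illustration.

```python
import pandas as pd

# Sample table rebuilt by hand (assumed to match the CSV used in this article).
df = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 1004, 1005],
    "Section": ["A", "B", "A", "C", "C"],
    "Class": [10, 10, 10, 11, 12],
    "Study hrs": [2, 6, 3, 0, 5],
    "Percentage": [50, 80, 60, 45, 75],
})

top = {}
for section, group in df.groupby("Section"):
    # idxmax() returns the index label of the highest Percentage in this group.
    best = group.loc[group["Percentage"].idxmax()]
    top[section] = int(best["Student ID"])

print(top)
```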

list(gr1)              # convert to list

Output >>>
         [('A',
             Student ID Section  Class  Study hrs  Percentage
          0        1001       A     10          2          50
          2        1003       A     10          3          60),
          ('B',
             Student ID Section  Class  Study hrs  Percentage
          1        1002       B     10          6          80),
          ('C',
             Student ID Section  Class  Study hrs  Percentage
          3        1004       C     11          0          45
          4        1005       C     12          5          75)]

dict(list(gr1))        # convert to dict

Output >>>
          {'A':    Student ID Section  Class  Study hrs  Percentage
           0        1001       A     10          2          50
           2        1003       A     10          3          60,
           'B':    Student ID Section  Class  Study hrs  Percentage
           1        1002       B     10          6          80,
           'C':    Student ID Section  Class  Study hrs  Percentage
           3        1004       C     11          0          45
           4        1005       C     12          5          75}
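
A dictionary comprehension over the GroupBy object gives the same kind of mapping and makes it easy to look up one group's frame by key. This is a sketch on a hand-built copy of the sample table; the name frames is illustrative.

```python
import pandas as pd

# Sample table rebuilt by hand (assumed to match the CSV used in this article).
df = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 1004, 1005],
    "Section": ["A", "B", "A", "C", "C"],
    "Class": [10, 10, 10, 11, 12],
    "Study hrs": [2, 6, 3, 0, 5],
    "Percentage": [50, 80, 60, 45, 75],
})

# Map each section name to the sub-frame holding its rows.
frames = {section: group for section, group in df.groupby("Section")}

# Look up a single group by its key.
print(frames["A"])
```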

#Selecting a group

# A single group can be selected using get_group():

gr3 = df.groupby('Class').get_group(10)
gr3

Output >>>
             Student ID Section  Class  Study hrs  Percentage
          0        1001       A     10          2          50
          1        1002       B     10          6          80
          2        1003       A     10          3          60

gr3 = df.groupby('Section').get_group('A')
gr3

Output >>>
             Student ID Section  Class  Study hrs  Percentage
          0        1001       A     10          2          50
          2        1003       A     10          3          60
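
When grouping by more than one column, get_group() takes a tuple with one value per key, and asking for a key that does not exist raises KeyError. The sketch below uses a hand-built copy of the sample table; the section 'Z' is a deliberately missing key chosen for illustration.

```python
import pandas as pd

# Sample table rebuilt by hand (assumed to match the CSV used in this article).
df = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 1004, 1005],
    "Section": ["A", "B", "A", "C", "C"],
    "Class": [10, 10, 10, 11, 12],
    "Study hrs": [2, 6, 3, 0, 5],
    "Percentage": [50, 80, 60, 45, 75],
})

# Two grouping keys: pass the group name as a (Section, Class) tuple.
c12 = df.groupby(["Section", "Class"]).get_group(("C", 12))
print(c12)

# A key that is not present raises KeyError, so guard the lookup if needed.
missing = False
try:
    df.groupby("Section").get_group("Z")
except KeyError:
    missing = True
print(missing)
```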

# Applying a function to each group
gr1.sum()

Output >>>
                   Student ID  Class  Study hrs  Percentage
          Section
          A              2004     20          5         110
          B              1002     10          6          80
          C              2009     23          5         120

gr1.mean()

Output >>>
                   Student ID  Class  Study hrs  Percentage
          Section
          A            1002.0   10.0        2.5        55.0
          B            1002.0   10.0        6.0        80.0
          C            1004.5   11.5        2.5        60.0
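
Summing or averaging every column also aggregates identifiers such as Student ID, which is rarely meaningful. One possible alternative is to pick the columns of interest and pass them to agg(). The sketch below uses a hand-built copy of the sample table; the output column names avg_pct and total_hrs are chosen here for illustration.

```python
import pandas as pd

# Sample table rebuilt by hand (assumed to match the CSV used in this article).
df = pd.DataFrame({
    "Student ID": [1001, 1002, 1003, 1004, 1005],
    "Section": ["A", "B", "A", "C", "C"],
    "Class": [10, 10, 10, 11, 12],
    "Study hrs": [2, 6, 3, 0, 5],
    "Percentage": [50, 80, 60, 45, 75],
})

# Several aggregations of one column, one result column per function.
summary = df.groupby("Section")["Percentage"].agg(["mean", "max", "count"])
print(summary)

# Named aggregation: output column = (input column, function).
named = df.groupby("Section").agg(
    avg_pct=("Percentage", "mean"),
    total_hrs=("Study hrs", "sum"),
)
print(named)
```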