
Pandas GroupBy | Mastering in Python Pandas Library

Pandas GroupBy Function in Python

The Pandas GroupBy function is used to split the data into groups based on some criteria.
Any GroupBy operation involves some combination of the following steps on the original object:
- Splitting the object into groups
- Applying a function to each group
- Combining the results
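As a rough illustration of those three steps, the sketch below performs them by hand and then lets groupby do the same work in one call. It rebuilds the five rows of student_result1.csv inline (an assumption, so the snippet runs without the CSV file); the names `pieces`, `manual`, and `auto` are illustrative only.

```python
import pandas as pd

# Assumed copy of the tutorial's sample data, built inline instead of read from CSV
df = pd.DataFrame({
    'Student ID': [1001, 1002, 1003, 1004, 1005],
    'Section': ['A', 'B', 'A', 'C', 'C'],
    'Class': [10, 10, 10, 11, 12],
    'Study hrs': [2, 6, 3, 0, 5],
    'Percentage': [50, 80, 60, 45, 75],
})

# 1. Split: one sub-DataFrame per distinct Section value
pieces = {key: df[df['Section'] == key] for key in df['Section'].unique()}

# 2. Apply: compute the mean Percentage of each piece
means = {key: piece['Percentage'].mean() for key, piece in pieces.items()}

# 3. Combine: gather the per-group results into one Series, sorted by key
manual = pd.Series(means).sort_index()

# groupby performs split, apply and combine in a single expression
auto = df.groupby('Section')['Percentage'].mean()

print(manual)
print(auto)
```

Both Series hold the same values; groupby simply hides the bookkeeping and sorts the group keys by default.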

Syntax: DataFrame.groupby()

import pandas as pd
df = pd.read_csv('D:\\DataSet\\student_result1.csv')
df
Output >>>
         Student ID  Section  Class  Study hrs  Percentage
0        1001              A     10          2          50
1        1002              B     10          6          80
2        1003              A     10          3          60
3        1004              C     11          0          45
4        1005              C     12          5          75
gr1 = df.groupby(by='Section')
gr1
Output >>>   <pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001878749E780>
gr1.groups
Output >>>
          {'A': Int64Index([0, 2], dtype='int64'),
           'B': Int64Index([1], dtype='int64'),
           'C': Int64Index([3, 4], dtype='int64')}
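Besides `.groups`, a GroupBy object exposes a few other attributes for inspecting the split. The sketch below (again rebuilding the sample data inline, as an assumption) uses `ngroups`, `size()` and `indices`. Note that `indices` holds row positions, which match the index labels here only because the DataFrame has the default 0..n-1 index.

```python
import pandas as pd

# Assumed copy of the tutorial's sample data
df = pd.DataFrame({
    'Student ID': [1001, 1002, 1003, 1004, 1005],
    'Section': ['A', 'B', 'A', 'C', 'C'],
    'Class': [10, 10, 10, 11, 12],
    'Study hrs': [2, 6, 3, 0, 5],
    'Percentage': [50, 80, 60, 45, 75],
})

gr1 = df.groupby('Section')

print(gr1.ngroups)    # number of distinct groups
print(gr1.size())     # number of rows in each group
print(gr1.indices)    # group key -> NumPy array of row positions
```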
df.groupby(['Section', 'Class']).groups
Output >>>
          {('A', 10): Int64Index([0, 2], dtype='int64'),
           ('B', 10): Int64Index([1], dtype='int64'),
           ('C', 11): Int64Index([3], dtype='int64'),
           ('C', 12): Int64Index([4], dtype='int64')}
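When grouping by more than one column, every group key is a tuple, so both iteration and `get_group()` work with tuples. A minimal sketch, assuming the same sample data built inline; `gr2`, `keys` and `c11` are illustrative names.

```python
import pandas as pd

# Assumed copy of the tutorial's sample data
df = pd.DataFrame({
    'Student ID': [1001, 1002, 1003, 1004, 1005],
    'Section': ['A', 'B', 'A', 'C', 'C'],
    'Class': [10, 10, 10, 11, 12],
    'Study hrs': [2, 6, 3, 0, 5],
    'Percentage': [50, 80, 60, 45, 75],
})

gr2 = df.groupby(['Section', 'Class'])

keys = []
# Each key unpacks into (section, class); sub is that group's rows
for (section, cls), sub in gr2:
    keys.append((section, cls))
    print(section, cls, sub['Student ID'].tolist())

# Select one multi-key group by passing the full tuple
c11 = gr2.get_group(('C', 11))
print(c11)
```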
for section, df_1 in gr1:
    print(section)
    print(df_1)
Output >>>
          A
             Student ID Section  Class  Study hrs  Percentage
          0        1001       A     10          2          50
          2        1003       A     10          3          60
          B
             Student ID Section  Class  Study hrs  Percentage
          1        1002       B     10          6          80
          C
             Student ID Section  Class  Study hrs  Percentage
          3        1004       C     11          0          45
          4        1005       C     12          5          75
list(gr1)              # convert to list
Output >>>
         [('A',
             Student ID Section  Class  Study hrs  Percentage
          0        1001       A     10          2          50
          2        1003       A     10          3          60),
          ('B',
             Student ID Section  Class  Study hrs  Percentage
          1        1002       B     10          6          80),
          ('C',
             Student ID Section  Class  Study hrs  Percentage
          3        1004       C     11          0          45
          4        1005       C     12          5          75)]

dict(list(gr1))        # convert to dict
Output >>>
          {'A':    Student ID Section  Class  Study hrs  Percentage
           0            1001       A     10          2          50
           2            1003       A     10          3          60,
           'B':    Student ID Section  Class  Study hrs  Percentage
           1            1002       B     10          6          80,
           'C':    Student ID Section  Class  Study hrs  Percentage
           3            1004       C     11          0          45
           4            1005       C     12          5          75}
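The `dict(list(gr1))` idiom builds the intermediate list first; a dict comprehension over the GroupBy object gives the same key-to-DataFrame mapping directly. A minimal sketch, assuming the sample data is built inline.

```python
import pandas as pd

# Assumed copy of the tutorial's sample data
df = pd.DataFrame({
    'Student ID': [1001, 1002, 1003, 1004, 1005],
    'Section': ['A', 'B', 'A', 'C', 'C'],
    'Class': [10, 10, 10, 11, 12],
    'Study hrs': [2, 6, 3, 0, 5],
    'Percentage': [50, 80, 60, 45, 75],
})

gr1 = df.groupby('Section')

# Same result as dict(list(gr1)): group key -> sub-DataFrame
groups = {key: frame for key, frame in gr1}
print(groups['B'])
```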
# Selecting a group

# A single group can be selected using get_group():

gr3 = df.groupby('Class').get_group(10)
gr3
Output >>>
            Student ID  Section  Class  Study hrs  Percentage
         0        1001        A     10          2          50
         1        1002        B     10          6          80
         2        1003        A     10          3          60
gr3 = df.groupby('Section').get_group('A')
gr3
Output >>>
             Student ID  Section Class  Study hrs  Percentage
          0        1001        A    10          2          50
          2        1003        A    10          3          60
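For a single column key, `get_group()` selects the same rows as a plain boolean filter on that column. The sketch below compares the two, assuming the sample data is built inline; `via_get_group` and `via_mask` are illustrative names.

```python
import pandas as pd

# Assumed copy of the tutorial's sample data
df = pd.DataFrame({
    'Student ID': [1001, 1002, 1003, 1004, 1005],
    'Section': ['A', 'B', 'A', 'C', 'C'],
    'Class': [10, 10, 10, 11, 12],
    'Study hrs': [2, 6, 3, 0, 5],
    'Percentage': [50, 80, 60, 45, 75],
})

via_get_group = df.groupby('Section').get_group('A')
via_mask = df[df['Section'] == 'A']   # boolean-mask equivalent

# Both keep the original row labels 0 and 2
same_rows = via_get_group.index.equals(via_mask.index)
print(same_rows)
```

The boolean filter is handy for a one-off selection; `get_group()` pays off when the GroupBy object is already built and reused.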
   
# Applying a function to each group
gr1.sum()
Output >>>
                    Student ID  Class Study hrs  Percentage
          Section
                A         2004     20         5         110
                B         1002     10         6          80
                C         2009     23         5         120
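Summing every column also adds up the Student ID values (2004 for Section A), which is not a meaningful number. Selecting only the columns of interest before aggregating avoids that; one column gives a Series, a list of columns gives a DataFrame. A sketch, assuming the sample data is built inline.

```python
import pandas as pd

# Assumed copy of the tutorial's sample data
df = pd.DataFrame({
    'Student ID': [1001, 1002, 1003, 1004, 1005],
    'Section': ['A', 'B', 'A', 'C', 'C'],
    'Class': [10, 10, 10, 11, 12],
    'Study hrs': [2, 6, 3, 0, 5],
    'Percentage': [50, 80, 60, 45, 75],
})

gr1 = df.groupby('Section')

total_pct = gr1['Percentage'].sum()                # Series indexed by Section
subset = gr1[['Study hrs', 'Percentage']].sum()    # DataFrame with two columns
print(total_pct)
print(subset)
```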

gr1.mean()
Output >>>
                    Student ID  Class  Study hrs  Percentage
          Section
                A       1002.0   10.0        2.5        55.0
                B       1002.0   10.0        6.0        80.0
                C       1004.5   11.5        2.5        60.0
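By default the group keys become the index of the result, as in the outputs above. Passing `as_index=False` (or calling `.reset_index()` afterwards) keeps Section as an ordinary column, which is often easier to merge or export. A sketch, assuming the sample data is built inline.

```python
import pandas as pd

# Assumed copy of the tutorial's sample data
df = pd.DataFrame({
    'Student ID': [1001, 1002, 1003, 1004, 1005],
    'Section': ['A', 'B', 'A', 'C', 'C'],
    'Class': [10, 10, 10, 11, 12],
    'Study hrs': [2, 6, 3, 0, 5],
    'Percentage': [50, 80, 60, 45, 75],
})

# Section stays a regular column instead of becoming the index
means = df.groupby('Section', as_index=False)[['Study hrs', 'Percentage']].mean()
print(means)
```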
gr1.describe()
Output >>>
gr1.agg(['sum', 'max', 'mean'])
Output >>>

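`agg()` with a list applies every function to every column, which produces MultiIndex columns. It also accepts a dict mapping columns to functions, or named aggregation (`new_name=(column, function)`) for flat, readable column names. A sketch, assuming the sample data is built inline; the output names `total_hrs`, `best_pct` and `avg_pct` are illustrative.

```python
import pandas as pd

# Assumed copy of the tutorial's sample data
df = pd.DataFrame({
    'Student ID': [1001, 1002, 1003, 1004, 1005],
    'Section': ['A', 'B', 'A', 'C', 'C'],
    'Class': [10, 10, 10, 11, 12],
    'Study hrs': [2, 6, 3, 0, 5],
    'Percentage': [50, 80, 60, 45, 75],
})

gr1 = df.groupby('Section')

# Different functions per column; the result has MultiIndex columns
per_column = gr1.agg({'Study hrs': 'sum', 'Percentage': ['max', 'mean']})

# Named aggregation: one flat output column per (column, function) pair
named = gr1.agg(
    total_hrs=('Study hrs', 'sum'),
    best_pct=('Percentage', 'max'),
    avg_pct=('Percentage', 'mean'),
)
print(per_column)
print(named)
```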
Download the dataset: student_result1

Download the Jupyter notebook: pandas groupby source code

Visit the official pandas documentation for DataFrame.groupby
