Python Pandas Tutorial Archives - Indian AI Production

Pandas Handling Missing Values|Mastering in Python Pandas Library

Indian AI Production — Mon, 29 Jul 2019 05:30:48 +0000

isnull Function in Pandas

To Download dataset click here – Fortune_10

The post Pandas Handling Missing Values|Mastering in Python Pandas Library appeared first on Indian AI Production.

Pandas Write CSV File | Mastering in Python Pandas Library

Indian AI Production — Sat, 20 Jul 2019 16:58:24 +0000

Writing a CSV file usually comes after some data preprocessing or data cleaning. Data preprocessing is a data mining technique that transforms raw data into an understandable format.
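
As a minimal sketch of cleaning followed by writing, the snippet below builds a two-row frame shaped like the Fortune_10 data used in this post, strips the currency formatting from Revenue, and writes the result with DataFrame.to_csv. The in-memory buffer and the variable names are assumptions made for illustration so that nothing is written to disk.

```python
import io
import pandas as pd

# Two rows shaped like the Fortune_10 data used in this post.
raw = pd.DataFrame({
    'ID': [1, 2],
    'Name': ['Lamtone', 'Stripfind'],
    'Revenue': ['$11,757,018', '$12,329,371'],
})

# Cleaning step: drop '$' and ',' so Revenue becomes an integer column.
clean = raw.copy()
clean['Revenue'] = (
    clean['Revenue']
    .str.replace('$', '', regex=False)
    .str.replace(',', '', regex=False)
    .astype('int64')
)

# Write to an in-memory buffer; pass a file path instead to write to disk.
# index=False leaves the row labels out of the file.
buffer = io.StringIO()
clean.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing index = False is a common choice when the row labels are just 0, 1, 2, ... and carry no information.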

How to Write CSV File in Python

Here we will discuss the parameters of the pd.read_csv function

import pandas as pd
df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv')
df

Output >>>
           
    ID	Name	        Industry	        Inception	Revenue	     Expenses	        Profit	    Growth
0	1	Lamtone	        IT Services	        2009	    $11,757,018	 6,482,465 Dollars	5274553	    30%
1	2	Stripfind	    Financial	        2010	    $12,329,371	 916,455   Dollars	11412916	20%
2	3	Canecorporation	Health	            2012	    $10,597,009	 7,591,189 Dollars	3005820	    7%
3	4	Mattouch	    IT Services	        2013	    $14,026,934	 7,429,377 Dollars	6597557	    26%
4	5	Techdrill	    Health	            2009	    $10,573,990	 7,435,363 Dollars	3138627	    8%
5	6	Techline	    Health	            2006	    $13,898,119	 5,470,303 Dollars	8427816	    23%
6	7	Cityace	        Health	            2010	    $9,254,614	 6,249,498 Dollars	3005116	    6%
7	8	Kayelectro	    Health	            2009	    $9,451,943	 3,878,113 Dollars	5573830	    4%
8	9	Ganzlax	        IT Services	        2011	    $14,001,180	 3,878,153 Dollars	11901180	18%
9	10	Trantraxlax	    Government Services	2011	    $11,088,336	 5,635,276 Dollars	5453060	    7%

To check the type of the loaded object, use the type function

type(df)

Output >>>   pandas.core.frame.DataFrame

The dataset is loaded as a DataFrame

To list all the column names

df.columns

Output >>>   Index(['ID', 'Name', 'Industry', 'Inception', 'Revenue', 'Expenses', 'Profit',
       'Growth'],
      dtype='object')

To read only the first n rows of the dataset, use the nrows parameter

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', nrows = 1)
df

Output >>>
    ID	Name	Industry	Inception	Revenue	     Expenses	        Profit	 Growth
0	1	Lamtone	IT Services	2009	    $11,757,018	 6,482,465 Dollars	5274553	 30%

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', nrows = 5)
df

Output >>>
    ID	Name	        Industry	        Inception	Revenue	     Expenses	        Profit	    Growth
0	1	Lamtone	        IT Services	        2009	    $11,757,018	 6,482,465 Dollars	5274553	    30%
1	2	Stripfind	    Financial	        2010	    $12,329,371	 916,455   Dollars	11412916	20%
2	3	Canecorporation	Health	            2012	    $10,597,009	 7,591,189 Dollars	3005820	    7%
3	4	Mattouch	    IT Services	        2013	    $14,026,934	 7,429,377 Dollars	6597557	    26%
4	5	Techdrill	    Health	            2009	    $10,573,990	 7,435,363 Dollars	3138627	    8%
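
The nrows behaviour can be checked without the original file. The sketch below is an illustration under assumptions: it reads a three-row CSV from an in-memory string (the string and variable names are made up for this example) and keeps the first two data rows.

```python
import io
import pandas as pd

# A small stand-in for Fortune_10.csv, held in memory.
csv_text = """ID,Name,Industry
1,Lamtone,IT Services
2,Stripfind,Financial Services
3,Canecorporation,Health
"""

# nrows counts data rows; the header line is not part of the count.
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(first_two)
```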

To read only some of the columns, pass their positions to the usecols parameter

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', usecols = [0])
df

df2 = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', usecols = [0,1])
df2

Output >>>
    ID	Name
0	1	Lamtone
1	2	Stripfind
2	3	Canecorporation
3	4	Mattouch
4	5	Techdrill
5	6	Techline
6	7	Cityace
7	8	Kayelectronics
8	9	Ganzlax
9	10	Trantraxlax

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', usecols = [1,2])
df

Output >>>
           
    Name	        Industry
0	Lamtone	        IT Services
1	Stripfind	    Financial Services
2	Canecorporation	Health
3	Mattouch	    IT Services
4	Techdrill	    Health
5	Techline	    Health
6	Cityace	        Health
7	Kayelectronics	Health
8	Ganzlax	        IT Services
9	Trantraxlax	    Government Services

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', usecols = [2,4,6])
df

Output >>>
          
    Industry	        Revenue	      Profit
0	IT Services	        $11,757,018	  5274553
1	Financial Services	$12,329,371	  11412916
2	Health	            $10,597,009	  3005820
3	IT Services	        $14,026,934	  6597557
4	Health	            $10,573,990	  3138627
5	Health	            $13,898,119	  8427816
6	Health	            $9,254,614	  3005116
7	Health	            $9,451,943	  5573830
8	IT Services	        $14,001,180	  11901180
9	Government Services	$11,088,336	  5453060
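
usecols also accepts column names, which tends to be easier to read than positions. A small sketch on an in-memory CSV (the string and variable names are assumptions for illustration):

```python
import io
import pandas as pd

csv_text = """ID,Name,Industry
1,Lamtone,IT Services
2,Stripfind,Financial Services
3,Canecorporation,Health
"""

# Select the same two columns by position and by name.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1])
by_name = pd.read_csv(io.StringIO(csv_text), usecols=['ID', 'Name'])

print(by_position.equals(by_name))
```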

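The skiprows examples below appear to use a copy of the file that has one extra line of column numbers (0, 1, 2, ...) above the header row, which is why the plain read shows those numbers as the column names

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv')
df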

Output >>>
    0   1               2                   3           4            5                  6           7
    ID	Name	        Industry	        Inception	Revenue	     Expenses	        Profit	    Growth
0	1	Lamtone	        IT Services	        2009	    $11,757,018	 6,482,465 Dollars	5274553	    30%
1	2	Stripfind	    Financial	        2010	    $12,329,371	 916,455   Dollars	11412916	20%
2	3	Canecorporation	Health	            2012	    $10,597,009	 7,591,189 Dollars	3005820	    7%
3	4	Mattouch	    IT Services	        2013	    $14,026,934	 7,429,377 Dollars	6597557	    26%
4	5	Techdrill	    Health	            2009	    $10,573,990	 7,435,363 Dollars	3138627	    8%
5	6	Techline	    Health	            2006	    $13,898,119	 5,470,303 Dollars	8427816	    23%
6	7	Cityace	        Health	            2010	    $9,254,614	 6,249,498 Dollars	3005116	    6%
7	8	Kayelectro	    Health	            2009	    $9,451,943	 3,878,113 Dollars	5573830	    4%
8	9	Ganzlax	        IT Services	        2011	    $14,001,180	 3,878,153 Dollars	11901180	18%
9	10	Trantraxlax	    Government Services	2011	    $11,088,336	 5,635,276 Dollars	5453060	    7%

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', skiprows = 1)
df

Output >>>
          
    ID	Name	        Industry	        Inception	Employees	Revenue	     Expenses	        Profit	    Growth
0	1	Lamtone	        IT Services	        2009	    55	        $11,757,018	 6,482,465 Dollars	5274553	    30%
1	2	Stripfind	    Financial	        2010	    25	        $12,329,371	 916,455   Dollars	11412916	20%
2	3	Canecorporation	Health	            2012	    6	        $10,597,009	 7,591,189 Dollars	3005820	    7%
3	4	Mattouch	    IT Services	        2013	    6	        $14,026,934	 7,429,377 Dollars	6597557	    26%
4	5	Techdrill	    Health	            2009	    9	        $10,573,990	 7,435,363 Dollars	3138627	    8%
5	6	Techline	    Health	            2006	    65	        $13,898,119	 5,470,303 Dollars	8427816	    23%
6	7	Cityace	        Health	            2010	    25	        $9,254,614	 6,249,498 Dollars	3005116	    6%
7	8	Kayelectro	    Health	            2009	    687	        $9,451,943	 3,878,113 Dollars	5573830	    4%
8	9	Ganzlax	        IT Services	        2011	    75	        $14,001,180	 3,878,153 Dollars	11901180	18%
9	10	Trantraxlax	    Government Services	2011	    35	        $11,088,336	 5,635,276 Dollars	5453060	    7%

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', skiprows = 2)
df

Output >>>
    1	Lamtone	        IT Services	        2009	$11,757,018	 6,482,465 Dollars	5274553	    30%
0	2	Stripfind	    Financial Services	2010	$12,329,371	 916,455 Dollars	11412916	20%
1	3	Canecorporation	Health	            2012	$10,597,009	 7,591,189 Dollars	3005820	    7%
2	4	Mattouch	    IT Services	        2013	$14,026,934	 7,429,377 Dollars	6597557	    26%
3	5	Techdrill	    Health	            2009	$10,573,990	 7,435,363 Dollars	3138627	    8%
4	6	Techline	    Health	            2006	$13,898,119	 5,470,303 Dollars	8427816	    23%
5	7	Cityace	        Health	            2010	$9,254,614	 6,249,498 Dollars	3005116	    6%
6	8	Kayelectronics	Health	            2009	$9,451,943	 3,878,113 Dollars	5573830	    4%
7	9	Ganzlax	        IT Services	        2011	$14,001,180	 3,878,113 Dollars	11901180	18%
8	10	Trantraxlax	    Government Services	2011	$11,088,336	 5,635,276 Dollars	5453060	    7%

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', skiprows = 0)
df

Output >>>
    0   1               2                   3           4           5            6                  7           8
0   ID	Name	        Industry	        Inception	Employees	Revenue	     Expenses	        Profit	    Growth
1	1	Lamtone	        IT Services	        2009	    55	        $11,757,018	 6,482,465 Dollars	5274553	    30%
2	2	Stripfind	    Financial	        2010	    25	        $12,329,371	 916,455   Dollars	11412916	20%
3	3	Canecorporation	Health	            2012	    6	        $10,597,009	 7,591,189 Dollars	3005820	    7%
4	4	Mattouch	    IT Services	        2013	    6	        $14,026,934	 7,429,377 Dollars	6597557	    26%
5	5	Techdrill	    Health	            2009	    9	        $10,573,990	 7,435,363 Dollars	3138627	    8%
6	6	Techline	    Health	            2006	    65	        $13,898,119	 5,470,303 Dollars	8427816	    23%
7	7	Cityace	        Health	            2010	    25	        $9,254,614	 6,249,498 Dollars	3005116	    6%
8	8	Kayelectro	    Health	            2009	    687	        $9,451,943	 3,878,113 Dollars	5573830	    4%
9	9	Ganzlax	        IT Services	        2011	    75	        $14,001,180	 3,878,153 Dollars	11901180	18%
10	10	Trantraxlax	    Government Services	2011	    35	        $11,088,336	 5,635,276 Dollars	5453060	    7%

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', skiprows = [0])
df

 Output >>>
    ID	Name	        Industry	        Inception	Employees	Revenue	     Expenses	        Profit	    Growth
0	1	Lamtone	        IT Services	        2009	    55	        $11,757,018	 6,482,465 Dollars	5274553	    30%
1	2	Stripfind	    Financial	        2010	    25	        $12,329,371	 916,455   Dollars	11412916	20%
2	3	Canecorporation	Health	            2012	    6	        $10,597,009	 7,591,189 Dollars	3005820	    7%
3	4	Mattouch	    IT Services	        2013	    6	        $14,026,934	 7,429,377 Dollars	6597557	    26%
4	5	Techdrill	    Health	            2009	    9	        $10,573,990	 7,435,363 Dollars	3138627	    8%
5	6	Techline	    Health	            2006	    65	        $13,898,119	 5,470,303 Dollars	8427816	    23%
6	7	Cityace	        Health	            2010	    25	        $9,254,614	 6,249,498 Dollars	3005116	    6%
7	8	Kayelectro	    Health	            2009	    687	        $9,451,943	 3,878,113 Dollars	5573830	    4%
8	9	Ganzlax	        IT Services	        2011	    75	        $14,001,180	 3,878,153 Dollars	11901180	18%
9	10	Trantraxlax	    Government Services	2011	    35	        $11,088,336	 5,635,276 Dollars	5453060	    7%

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', skiprows = [1])
df

Output >>>
    0	1	            2	                3	    4	         5	                6	        7
0	1	Lamtone	        IT Services	        2009	$11,757,018	 6,482,465 Dollars	5274553	    30%
1	2	Stripfind	    Financial Services	2010	$12,329,371	 916,455 Dollars	11412916	20%
2	3	Canecorporation	Health	            2012	$10,597,009	 7,591,189 Dollars	3005820	    7%
3	4	Mattouch	    IT Services	        2013	$14,026,934	 7,429,377 Dollars	6597557	    26%
4	5	Techdrill	    Health	            2009	$10,573,990	 7,435,363 Dollars	3138627	    8%
5	6	Techline	    Health	            2006	$13,898,119	 5,470,303 Dollars	8427816	    23%
6	7	Cityace	        Health	            2010	$9,254,614	 6,249,498 Dollars	3005116	    6%
7	8	Kayelectronics	Health	            2009	$9,451,943	 3,878,113 Dollars	5573830	    4%
8	9	Ganzlax	        IT Services	        2011	$14,001,180	 3,878,113 Dollars	11901180	18%
9	10	Trantraxlax	    Government Services	2011	$11,088,336	 5,635,276 Dollars	5453060	    7%

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', skiprows = [0,2,3])
df

Output >>>
    ID	Name	        Industry	        Inception	Revenue	    Expenses	        Profit	 Growth
0	3	Canecorporation	Health	            2012	    $10,597,009	7,591,189 Dollars	3005820	 7%
1	4	Mattouch	    IT Services	        2013	    $14,026,934	7,429,377 Dollars	6597557	 26%
2	5	Techdrill	    Health	            2009	    $10,573,990	7,435,363 Dollars	3138627	 8%
3	6	Techline	    Health	            2006	    $13,898,119	5,470,303 Dollars	8427816	 23%
4	7	Cityace	        Health	            2010	    $9,254,614	6,249,498 Dollars	3005116	 6%
5	8	Kayelectronics	Health	            2009	    $9,451,943	3,878,113 Dollars	5573830	 4%
6	9	Ganzlax	        IT Services	        2011	    $14,001,180	3,878,113 Dollars	11901180 18%
7	10	Trantraxlax	    Government Services	2011	    $11,088,336	5,635,276 Dollars	5453060	 7%
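
The difference between an integer and a list for skiprows can be reproduced on a small in-memory CSV. The sketch assumes, as the outputs above suggest, a file whose first line holds column numbers above the real header; the string and variable names are made up for this example.

```python
import io
import pandas as pd

# Line 0 holds column numbers, line 1 is the real header.
csv_text = """0,1
ID,Name
1,Lamtone
2,Stripfind
3,Canecorporation
"""

# An integer skips that many lines from the top of the file.
# Skipping one line makes 'ID,Name' the header.
skip_one = pd.read_csv(io.StringIO(csv_text), skiprows=1)

# Skipping two lines promotes the first data line to the header.
skip_two = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# A list skips the lines at those 0-based positions and keeps the rest,
# so 'ID,Name' stays the header and the first two data lines are dropped.
skip_list = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 2, 3])

print(skip_one, skip_two, skip_list, sep='\n\n')
```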

df1 = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv')
df1

Output >>>
    
    ID	Name	        Industry	        Inception	Revenue	     Expenses	        Profit	    Growth
0	1	Lamtone	        IT Services	        2009	    $11,757,018	 6,482,465 Dollars	5274553	    30%
1	2	Stripfind	    Financial	        2010	    $12,329,371	 916,455   Dollars	11412916	20%
2	3	Canecorporation	Health	            2012	    $10,597,009	 7,591,189 Dollars	3005820	    7%
3	4	Mattouch	    IT Services	        2013	    $14,026,934	 7,429,377 Dollars	6597557	    26%
4	5	Techdrill	    Health	            2009	    $10,573,990	 7,435,363 Dollars	3138627	    8%
5	6	Techline	    Health	            2006	    $13,898,119	 5,470,303 Dollars	8427816	    23%
6	7	Cityace	        Health	            2010	    $9,254,614	 6,249,498 Dollars	3005116	    6%
7	8	Kayelectro	    Health	            2009	    $9,451,943	 3,878,113 Dollars	5573830	    4%
8	9	Ganzlax	        IT Services	        2011	    $14,001,180	 3,878,153 Dollars	11901180	18%
9	10	Trantraxlax	    Government Services	2011	    $11,088,336	 5,635,276 Dollars	5453060	    7%

To use one of the columns as the row index, pass its name or position to the index_col parameter

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', index_col = 'ID')
df

Output >>>
    Name	        Industry	        Inception	Employees	Revenue	     Expenses	        Profit	    Growth

ID	
1	Lamtone	        IT Services	        2009	    55	        $11,757,018	 6,482,465 Dollars	5274553	    30%
2	Stripfind	    Financial	        2010	    25	        $12,329,371	 916,455   Dollars	11412916	20%
3	Canecorporation	Health	            2012	    6	        $10,597,009	 7,591,189 Dollars	3005820	    7%
4	Mattouch	    IT Services	        2013	    6	        $14,026,934	 7,429,377 Dollars	6597557	    26%
5	Techdrill	    Health	            2009	    9	        $10,573,990	 7,435,363 Dollars	3138627	    8%
6	Techline	    Health	            2006	    65	        $13,898,119	 5,470,303 Dollars	8427816	    23%
7	Cityace	        Health	            2010	    25	        $9,254,614	 6,249,498 Dollars	3005116	    6%
8	Kayelectro	    Health	            2009	    687	        $9,451,943	 3,878,113 Dollars	5573830	    4%
9	Ganzlax	        IT Services	        2011	    75	        $14,001,180	 3,878,153 Dollars	11901180	18%
10	Trantraxlax	    Government Services	2011	    35	        $11,088,336	 5,635,276 Dollars	5453060	    7%

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', index_col = 0)
df

Output >>>
    Name	        Industry	        Inception	Employees	Revenue	     Expenses	        Profit	    Growth

ID	
1	Lamtone	        IT Services	        2009	    55	        $11,757,018	 6,482,465 Dollars	5274553	    30%
2	Stripfind	    Financial	        2010	    25	        $12,329,371	 916,455   Dollars	11412916	20%
3	Canecorporation	Health	            2012	    6	        $10,597,009	 7,591,189 Dollars	3005820	    7%
4	Mattouch	    IT Services	        2013	    6	        $14,026,934	 7,429,377 Dollars	6597557	    26%
5	Techdrill	    Health	            2009	    9	        $10,573,990	 7,435,363 Dollars	3138627	    8%
6	Techline	    Health	            2006	    65	        $13,898,119	 5,470,303 Dollars	8427816	    23%
7	Cityace	        Health	            2010	    25	        $9,254,614	 6,249,498 Dollars	3005116	    6%
8	Kayelectro	    Health	            2009	    687	        $9,451,943	 3,878,113 Dollars	5573830	    4%
9	Ganzlax	        IT Services	        2011	    75	        $14,001,180	 3,878,153 Dollars	11901180	18%
10	Trantraxlax	    Government Services	2011	    35	        $11,088,336	 5,635,276 Dollars	5453060	    7%

df = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', index_col = 'Name')
df

Output >>>
                 ID	 Industry	            Inception	Revenue	     Expenses	Profit	            Growth
Name							
Lamtone	         1	 IT Services	        2009	    $11,757,018	 6,482,465 Dollars	5274553	    30%
Stripfind	     2	 Financial Services	    2010	    $12,329,371	 916,455 Dollars	11412916	20%
Canecorporation	 3	 Health	                2012	    $10,597,009	 7,591,189 Dollars	3005820	    7%
Mattouch	     4	 IT Services	        2013	    $14,026,934	 7,429,377 Dollars	6597557	    26%
Techdrill	     5	 Health	                2009	    $10,573,990	 7,435,363 Dollars	3138627	    8%
Techline	     6	 Health	                2006	    $13,898,119	 5,470,303 Dollars	8427816	    23%
Cityace	         7	 Health	                2010	    $9,254,614	 6,249,498 Dollars	3005116	    6%
Kayelectronics	 8	 Health	                2009	    $9,451,943	 3,878,113 Dollars	5573830	    4%
Ganzlax	         9	 IT Services	        2011	    $14,001,180	 3,878,113 Dollars	11901180	18%
Trantraxlax	     10	 Government Services	2011	    $11,088,336	 5,635,276 Dollars	5453060	    7%

df1 = pd.read_csv('F:\\Machine Learning\\DataSet\\Fortune_10.csv', index_col = 2)
df1

Output >>>
                    ID	Name	         Inception	Revenue	     Expenses	        Profit	    Growth
Industry							
IT Services	        1	Lamtone	         2009	    $11,757,018	 6,482,465 Dollars	5274553	    30%
Financial Services	2	Stripfind	     2010	    $12,329,371	 916,455 Dollars	11412916	20%
Health	            3	Canecorporation	 2012	    $10,597,009	 7,591,189 Dollars	3005820	    7%
IT Services	        4	Mattouch	     2013	    $14,026,934	 7,429,377 Dollars	6597557	    26%
Health	            5	Techdrill	     2009	    $10,573,990	 7,435,363 Dollars	3138627	    8%
Health	            6	Techline	     2006	    $13,898,119	 5,470,303 Dollars	8427816	    23%
Health	            7	Cityace	         2010	    $9,254,614	 6,249,498 Dollars	3005116	    6%
Health	            8	Kayelectronics	 2009	    $9,451,943	 3,878,113 Dollars	5573830	    4%
IT Services	        9	Ganzlax	         2011	    $14,001,180	 3,878,113 Dollars	11901180	18%
Government Services	10	Trantraxlax	     2011	    $11,088,336	 5,635,276 Dollars	5453060	    7%
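
Once a column is the index, rows can be looked up by label with .loc. A small sketch on an in-memory CSV (the string and variable names are assumptions for illustration):

```python
import io
import pandas as pd

csv_text = """ID,Name,Industry
1,Lamtone,IT Services
2,Stripfind,Financial Services
3,Canecorporation,Health
"""

# index_col accepts a column name or a 0-based position.
indexed = pd.read_csv(io.StringIO(csv_text), index_col='Name')
by_position = pd.read_csv(io.StringIO(csv_text), index_col=1)

# Label-based lookup on the new index.
industry = indexed.loc['Stripfind', 'Industry']
print(industry)
```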


Pandas read_csv | Mastering in Python Pandas Library

Indian AI Production — Sat, 20 Jul 2019 14:39:56 +0000

Pandas Read CSV File in Python

What is CSV File

A CSV (comma-separated values) file stores tabular data as plain text, with one record per line and commas between the fields. The data can include both numbers and text. CSV files usually carry the .csv extension and can be opened by most spreadsheet programs.

Advantages of CSV File
1. Universally used
2. Easy to read
3. Easy to understand
4. Quick to create

How to Read or Import CSV File in Python IDLE or IDE

import pandas as pd

The pandas.read_csv function is used to import a CSV file.
For more information about pandas.read_csv (pd.read_csv), use the help function

help(pd.read_csv)

pd.read_csv('F:\\Machine Learning\\DataSet\\student_results.csv')

Output >>>
    StudentID	Class	StudyHrs	SleepingHrs	SocialMedia	MobileGames	Percantege
0	1001	    10	    2	        9	        3	        5	        50
1	1002	    10	    6	        8	        2	        0	        80
2	1003	    10	    3	        8	        2	        4	        60
3	1004	    11	    0	        10	        1	        5	        45
4	1005	    11	    4	        7	        2	        0	        75
5	1006	    11	    10	        7	        0	        0	        96
6	1007	    12	    4	        6	        0	        0	        80
7	1008	    12	    10	        6	        2	        0	        90
8	1009	    12	    2	        8	        2	        4	        60
9	1010	    12	    6	        9	        1	        0	        85

This data structure is a DataFrame. pd.read_csv can also load a file that has a single column

pd.read_csv('F:\\Machine Learning\\DataSet\\Top 10 IT Companies in India.csv')

Output >>>
          
    Top 10 IT Companies in India
0	TCS
1	Infosys
2	Tech Mahindra
3	Wipro
4	HCL Technologies
5	L&T Infotech
6	Mindtree
7	Mphasis
8	Oracle Financial Services
9	Rotla India

pd.read_csv still returns a DataFrame here, with one column. To get a Series, call .squeeze('columns') on the result
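
A quick way to see which structure comes back from a single-column file is to check the type before and after squeezing. The sketch below uses a shortened, in-memory version of the company list; the string and variable names are assumptions for illustration.

```python
import io
import pandas as pd

csv_text = """Top 10 IT Companies in India
TCS
Infosys
Wipro
"""

# read_csv returns a DataFrame even when the file has one column.
companies_df = pd.read_csv(io.StringIO(csv_text))
print(type(companies_df))

# squeeze('columns') collapses a one-column DataFrame into a Series.
companies = companies_df.squeeze('columns')
print(type(companies))
```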

To find the current working directory, which is where relative file paths are resolved from, use the os library

import os
print(os.getcwd())

C:\Users\T-Rex\Pandas Practical
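
The working directory can be combined with a file name to build a full path and to check that the file is there before reading it. This is a sketch only; the file name is taken from this post and may not exist on your machine.

```python
import os

# Build an absolute path from the working directory and a file name.
cwd = os.getcwd()
csv_path = os.path.join(cwd, 'student_results.csv')

# Check for the file before handing the path to pd.read_csv.
print(csv_path)
print(os.path.exists(csv_path))
```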

Download dataset click here – student_results

Download dataset click here – Top 10 IT Companies in India


Pandas GroupBy | Mastering in Python Pandas Library

Indian AI Production — Sat, 20 Jul 2019 13:26:04 +0000

Pandas GroupBy Function in Python

Pandas GroupBy function is used to split the data into groups based on some criteria.
Any GroupBy operation involves one or more of the following steps on the original object:
-Splitting the object
-Applying a function
-Combining the result

Syntax: DataFrame.groupby()
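
The three steps can be sketched in one line. The frame below is typed in by hand from the student_result1 rows used in this post, so it runs without the CSV file; the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Student ID': [1001, 1002, 1003, 1004, 1005],
    'Section': ['A', 'B', 'A', 'C', 'C'],
    'Class': [10, 10, 10, 11, 12],
    'Study hrs': [2, 6, 3, 0, 5],
    'Percentage': [50, 80, 60, 45, 75],
})

# Split the rows by Section, apply mean to Percentage,
# and combine the per-group results into one Series.
mean_pct = df.groupby('Section')['Percentage'].mean()
print(mean_pct)
```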

import pandas as pd
df = pd.read_csv('D:\\DataSet\\student_result1.csv')
df

Output >>>
         Student ID  Section  Class  Study hrs  Percentage
0        1001              A     10          2          50
1        1002              B     10          6          80
2        1003              A     10          3          60
3        1004              C     11          0          45
4        1005              C     12          5          75

gr1 = df.groupby(by = 'Section')
gr1

Output >>>

gr1.groups

Output >>>
          {'A': Int64Index([0, 2], dtype='int64'),
           'B': Int64Index([1], dtype='int64'),
           'C': Int64Index([3, 4], dtype='int64')}

df.groupby(['Section', 'Class']).groups

Output >>>
          {('A', 10): Int64Index([0, 2], dtype='int64'),
           ('B', 10): Int64Index([1], dtype='int64'),
           ('C', 11): Int64Index([3], dtype='int64'),
           ('C', 12): Int64Index([4], dtype='int64')}

for section, df_1 in gr1:
    print(section)
    print(df_1)

Output >>>
          A
             Student ID Section  Class  Study hrs  Percentage
          0        1001       A     10          2          50
          2        1003       A     10          3          60
          B
             Student ID Section  Class  Study hrs  Percentage
          1        1002       B     10          6          80
          C
             Student ID Section  Class  Study hrs  Percentage
          3        1004       C     11          0          45
          4        1005       C     12          5          75

list(gr1)              # convert to list

Output >>>
         [('A',
             Student ID Section  Class  Study hrs  Percentage
          0        1001       A     10          2          50
          2        1003       A     10          3          60),
 ('B',
             Student ID Section  Class  Study hrs  Percentage
          1        1002       B     10          6          80),
          ('C',
             Student ID Section  Class  Study hrs  Percentage
          3        1004       C     11          0          45
          4        1005       C     12          5          75)]

dict(list(gr1))        # convert to dict

Output >>>
          {'A':    Student ID Section  Class  Study hrs  Percentage
           0            1001       A     10          2          50
           2            1003       A     10          3          60,
           'B':    Student ID Section  Class  Study hrs  Percentage
           1            1002       B     10          6          80,
           'C':    Student ID Section  Class  Study hrs  Percentage
           3            1004       C     11          0          45
           4            1005       C     12          5          75}

#Selecting a group

# A single group can be selected using get_group():

gr3 = df.groupby('Class').get_group(10)
gr3

Output >>>
            Student ID  Section  Class  Study hrs  Percentage
         0        1001        A     10          2          50
         1        1002        B     10          6          80
         2        1003        A     10          3          60

gr3 = df.groupby('Section').get_group('A')
gr3

Output >>>
             Student ID  Section Class  Study hrs  Percentage
          0        1001        A    10          2          50
          2        1003        A    10          3          60

# Applying a function into group
gr1.sum()

Output >>>
                    Student ID  Class Study hrs  Percentage
          Section
                A         2004     20         5         110
                B         1002     10         6          80
                C         2009     23         5         120

gr1.mean()

Output >>>
                    Student ID  Class  Study hrs  Percentage
          Section
                A       1002.0   10.0        2.5        55.0
                B       1002.0   10.0        6.0        80.0
                C       1004.5   11.5        2.5        60.0

gr1.describe()

Output >>>

gr1.agg(['sum', 'max', 'mean'])

Output >>>
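
When only a few statistics are needed, agg can also take named aggregations, which produce flat, readable column names. This is a sketch on the same hand-typed student rows; the output column names (total_hrs, best_pct, mean_pct) are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Student ID': [1001, 1002, 1003, 1004, 1005],
    'Section': ['A', 'B', 'A', 'C', 'C'],
    'Class': [10, 10, 10, 11, 12],
    'Study hrs': [2, 6, 3, 0, 5],
    'Percentage': [50, 80, 60, 45, 75],
})

# Each keyword is an output column: (input column, aggregation).
summary = df.groupby('Section').agg(
    total_hrs=('Study hrs', 'sum'),
    best_pct=('Percentage', 'max'),
    mean_pct=('Percentage', 'mean'),
)
print(summary)
```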

Download dataset click here – student_result1


Pandas Append() | Mastering in Python Pandas Library

Indian AI Production — Thu, 18 Jul 2019 05:24:00 +0000

Pandas Append() Function in Python

import pandas as pd

df1 = pd.DataFrame({'A': [1,2,3],
                   'B': [10,20,30]})


df2 = pd.DataFrame({'A': [4,5,6],
                   'B': [40,50,60]})

display(df1 ,df2)

df1.append(df2)

df1.append(df2, ignore_index = True)

df2.append(df1, ignore_index = True)

df1 = pd.DataFrame({'A': [1,2,3],
                   'B': [10,20,30]})


df2 = pd.DataFrame({'C': [4,5,6],
                   'B': [40,50,60]})

display(df1 ,df2)

df1.append(df2, ignore_index = True)

Output >>>
          C:\Users\Shubham Matiyara\Anaconda3\lib\site- 
          packages\pandas\core\frame.py:6211: FutureWarning: Sorting because 
          non-concatenation axis is not aligned. A future version
          of pandas will change to not sort by default.

          To accept the future behavior, pass 'sort=False'.

          To retain the current behavior and silence the warning, pass 
          'sort=True'.

            sort=sort)

                A    B     C
          0   1.0   10   NaN
          1   2.0   20   NaN
          2   3.0   30   NaN
          3   NaN   40   4.0
          4   NaN   50   5.0
          5   NaN   60   6.0

df1.append(df2, ignore_index = True, sort = False)

Output >>>
                A    B     C
          0   1.0   10   NaN
          1   2.0   20   NaN
          2   3.0   30   NaN
          3   NaN   40   4.0
          4   NaN   50   5.0
          5   NaN   60   6.0
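
DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a recent install the calls above raise AttributeError. A sketch of the same result with pd.concat, using the two frames from this section:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3],
                    'B': [10, 20, 30]})

df2 = pd.DataFrame({'C': [4, 5, 6],
                    'B': [40, 50, 60]})

# Stack df2 under df1 and renumber the rows from 0.
# sort=False keeps the columns in order of first appearance.
combined = pd.concat([df1, df2], ignore_index=True, sort=False)
print(combined)
```

Columns that exist in only one frame are filled with NaN, just as in the append output.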


Pandas Join() | Mastering in Python Pandas Library

Indian AI Production — Thu, 18 Jul 2019 04:31:09 +0000

Pandas Join() Method in Python

import pandas as pd

df1 = pd.DataFrame({'A': [1,2,3],
                   'B': [10,20,30]})


df2 = pd.DataFrame({'C': [4,5,6],
                   'D': [40,50,60]})

display(df1, df2)

df1.join(df2)

Output >>>
             A    B   C    D
         0   1   10   4   40
         1   2   20   5   50
         2   3   30   6   60

df2.join(df1)

Output >>>
             C    D   A    B
         0   4   40   1   10
         1   5   50   2   20
         2   6   60   3   30

df1 = pd.DataFrame({'A': [1,2,3],
                   'B': [10,20,30]},
                   index = ['a','b','c'])

df2 = pd.DataFrame({'C': [4,5,6],
                   'D': [40,50,60]},
                  index = ['a','b','c'])

display(df1, df2)

df1.join(df2)

Output >>>
             A    B   C   D
         a   1   10   4  40
         b   2   20   5  50
         c   3   30   6  60

df1 = pd.DataFrame({'A': [1,2,3],
                   'B': [10,20,30]},
                   index = ['a','b','c'])

df2 = pd.DataFrame({'C': [4,5],
                   'D': [40,50]},
                  index = ['a','b'])

display(df1, df2)

df1.join(df2)

Output >>>
              A    B     C      D
          a   1   10   4.0   40.0
          b   2   20   5.0   50.0
          c   3   30   NaN    NaN

df1.join(df2, how = 'right')

Output >>>          
             A    B   C    D
         a   1   10   4   40
         b   2   20   5   50

df1.join(df2, how = 'inner')

Output >>>
              A    B   C    D
          a   1   10   4   40
          b   2   20   5   50

df1.join(df2, how = 'outer')

Output >>>
              A    B    C       D
          a   1   10   4.0   40.0
          b   2   20   5.0   50.0
          c   3   30   NaN    NaN
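
The four values of how differ only in which index labels survive. A small sketch that counts the rows for each option, using the same two frames (the row_counts name is an assumption for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3],
                    'B': [10, 20, 30]},
                   index=['a', 'b', 'c'])

df2 = pd.DataFrame({'C': [4, 5],
                    'D': [40, 50]},
                   index=['a', 'b'])

# left keeps df1's labels, right keeps df2's,
# inner keeps the shared labels, outer keeps all of them.
row_counts = {how: len(df1.join(df2, how=how))
              for how in ['left', 'right', 'inner', 'outer']}
print(row_counts)
```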

df1 = pd.DataFrame({'A': [1,2,3],
                   'B': [10,20,30]})

df2 = pd.DataFrame({'A': [4,5,6],
                   'D': [40,50,60]})

display(df1, df2)

df1.join(df2, lsuffix = '_1')

Output >>>
            A_1    B   A    D
          0   1   10   4   40
          1   2   20   5   50
          2   3   30   6   60

df1.join(df2, rsuffix = '_1')

Output >>>                   
              A    B   A_1    D
          0   1   10     4   40
          1   2   20     5   50
          2   3   30     6   60
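
A suffix is required whenever both frames share a column name. The sketch below shows the error raised without one and then supplies both suffixes; the suffix strings and the overlap_error variable are chosen for this example only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3],
                    'B': [10, 20, 30]})

df2 = pd.DataFrame({'A': [4, 5, 6],
                    'D': [40, 50, 60]})

# Without a suffix, the shared column name 'A' makes join raise ValueError.
overlap_error = None
try:
    df1.join(df2)
except ValueError as err:
    overlap_error = str(err)
    print('join failed:', overlap_error)

# With both suffixes, each copy of 'A' gets its own name.
joined = df1.join(df2, lsuffix='_left', rsuffix='_right')
print(joined)
```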


Pandas Concat() | Mastering in Python Pandas Library

Indian AI Production — Sun, 14 Jul 2019 10:02:49 +0000

Pandas concat() Function in Python

import pandas as pd

sr1 = pd.Series([0,1,2])
sr1

Output >>>
          0    0
          1    1
          2    2
          dtype: int64

sr2 = pd.Series([3,4,5,6,7])
sr2

Output >>>
          0    3
          1    4
          2    5
          3    6
          4    7
          dtype: int64

pd.concat([sr1, sr2])

Output >>>
          0    0
          1    1
          2    2
          0    3
          1    4
          2    5
          3    6
          4    7
          dtype: int64

df1 = pd.DataFrame({'ID': [1,2,3,4],
                  'Name': ['A', 'B', 'C','D'],
                  'Class': [5,6,7,8]})
df1

Output >>>
             ID  Name  Class
          0   1     A      5
          1   2     B      6
          2   3     C      7
          3   4     D      8

df2 = pd.DataFrame({'ID': [5,6,7,8],
                  'Name': ['E', 'F', 'G', 'H'],
                  'Class': [9,10,11,12]})
df2

Output >>>
             ID  Name  Class
          0   5     E      9
          1   6     F     10
          2   7     G     11
          3   8     H     12

pd.concat([df2, df1])

Output >>>
             ID  Name  Class
          0   5     E      9
          1   6     F     10
          2   7     G     11
          3   8     H     12
          0   1     A      5
          1   2     B      6
          2   3     C      7
          3   4     D      8

pd.concat([df2, df1], axis = 1)

Output >>>
          
            ID  Name  Class  ID  Name  Class
         0   5     E      9   1     A      5
         1   6     F     10   2     B      6
         2   7     G     11   3     C      7
         3   8     H     12   4     D      8

pd.concat([df1, df2], axis = 0, ignore_index = True)

Output >>>
          
            ID  Name  Class
         0   1     A      5
         1   2     B      6
         2   3     C      7
         3   4     D      8
         4   5     E      9
         5   6     F     10
         6   7     G     11
         7   8     H     12
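
Instead of discarding the original row labels with ignore_index, the keys parameter can record which frame each row came from. A sketch with the same two frames; the key labels 'first' and 'second' are made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'Name': ['A', 'B', 'C', 'D'],
                    'Class': [5, 6, 7, 8]})

df2 = pd.DataFrame({'ID': [5, 6, 7, 8],
                    'Name': ['E', 'F', 'G', 'H'],
                    'Class': [9, 10, 11, 12]})

# keys adds an outer index level naming the source frame.
labelled = pd.concat([df1, df2], keys=['first', 'second'])
print(labelled)

# Rows from one source can then be selected by its key.
second_rows = labelled.loc['second']
print(second_rows)
```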

df1 = pd.DataFrame({'ID': [1,2,3,4],
                  'Name': ['A', 'B', 'C','D'],
                  'Class': [5,6,7,8]})
df1

Output >>>
          	ID	Name	Class
         0	 1	   A	    5
         1	 2	   B	    6
         2	 3	   C	    7
         3	 4	   D	    8

df2 = pd.DataFrame({'ID': [3,4],
                  'Name': ['C','D'],
                  'Class': [7,8]})
df2

Output >>>        
            ID	Name  Class
         0   3	   C      7
         1   4	   D	  8

pd.concat([df1, df2])

Output >>>
          	ID	Name  Class
          0	 1	   A	  5
          1	 2	   B	  6
          2	 3	   C	  7
          3	 4	   D	  8
          0	 3	   C	  7
          1	 4	   D	  8

pd.concat([df1, df2], axis = 1)

Output >>>
          	ID	Name  Class	  ID	Name  Class
          0	 1	   A	  5	 3.0	   C	7.0
          1	 2	   B	  6	 4.0	   D	8.0
          2	 3	   C	  7	 NaN	 NaN	NaN
          3	 4	   D	  8	 NaN	 NaN	NaN

pd.concat([df1, df2], axis = 1, join = 'inner')

Output >>>
               ID  Name  Class	 ID  Name  Class
          0	1     A	     5	  3	C      7
          1	2     B	     6	  4     D      8

pd.concat([df1, df2], axis = 1, join_axes = [df1.index])

Output >>>
               ID  Name	 Class	  ID	 Name	Class
          0	1     A	     5	 3.0	    C	  7.0
          1	2     B	     6	 4.0	    D	  8.0
          2	3     C	     7	 NaN	  NaN	  NaN
          3	4     D	     8	 NaN	  NaN	  NaN

pd.concat([df1, df2], axis = 1, join_axes = [df2.index])

Output >>>
             ID  Name  Class	ID  Name  Class
          0   1     A	   5	 3     C      7
          1   2     B	   6	 4     D      8
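
The join_axes argument used in the two calls above was deprecated in pandas 0.25 and removed in pandas 1.0, so on current versions those calls fail. A minimal sketch of an equivalent, assuming a recent pandas release: concatenate first, then call reindex with the row index you want to keep. The variable names keep_df1_rows and keep_df2_rows are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'Name': ['A', 'B', 'C', 'D'],
                    'Class': [5, 6, 7, 8]})
df2 = pd.DataFrame({'ID': [3, 4],
                    'Name': ['C', 'D'],
                    'Class': [7, 8]})

# Outer concat along the columns, then keep the rows labelled in df1.index
keep_df1_rows = pd.concat([df1, df2], axis=1).reindex(df1.index)

# Same idea, but keep only the rows labelled in df2.index
keep_df2_rows = pd.concat([df1, df2], axis=1).reindex(df2.index)

print(keep_df1_rows.shape)   # (4, 6)
print(keep_df2_rows.shape)   # (2, 6)
```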

df1 = pd.DataFrame({'ID': [1,2,3,4],
                  'Name': ['A', 'B', 'C','D'],
                  'Class': [5,6,7,8]})
df1

Output >>>
          	ID	Name	Class
          0	 1	   A	    5
          1	 2	   B	    6
          2	 3	   C	    7
          3	 4	   D	    8

df2 = pd.DataFrame({'ID': [5,6,7,8],
                  'Name': ['E', 'F', 'G', 'H'],
                  'Class': [9,10,11,12]})
df2

Output >>>
             ID  Name  Class
         0    5	    E	   9
         1    6	    F	  10
         2    7	    G	  11
         3    8	    H     12

pd.concat([df1, df2], keys = ['df1','df2'])

Output >>>
                 ID Name  Class
          df1 0   1    A      5
              1   2    B      6
              2   3    C      7
              3   4    D      8
          df2 0   5    E      9
              1   6    F     10
              2   7    G     11
              3   8    H     12

pd.concat([df1, df2], keys = ['First df','Second df'])

Output >>>
                       ID Name  Class
          First df  0   1    A      5
                    1   2    B      6
                    2   3    C      7
                    3   4    D      8
          Second df 0   5    E      9
                    1   6    F     10
                    2   7    G     11
                    3   8    H     12

pd.concat([df1, df2], axis = 1,  keys = ['First df','Second df'])

Output >>>
            First df            Second df
                  ID Name Class        ID Name Class
          0        1    A     5         5    E     9
          1        2    B     6         6    F    10
          2        3    C     7         7    G    11
          3        4    D     8         8    H    12

df1 = pd.DataFrame({'ID': [1,2,3,4],
                  'Name': ['A', 'B', 'C','D'],
                  'Class': [5,6,7,8]})
df1

Output >>>
          	ID	Name	Class
          0	 1	   A	    5
          1	 2	   B	    6
          2	 3	   C	    7
          3	 4	   D	    8

df2 = pd.DataFrame({'Marks': [40, 63, 91, 34]})
df2

pd.concat([df1, df2])

C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  """Entry point for launching an IPython kernel.


Output >>>
            ID	Name	Class	Marks
         0	1.0	   A	  5.0	  NaN
         1	2.0	   B	  6.0	  NaN
         2	3.0	   C	  7.0	  NaN
         3	4.0	   D	  8.0	  NaN
         0	NaN	 NaN	  NaN	  40.0
         1	NaN	 NaN	  NaN	  63.0
         2	NaN	 NaN	  NaN	  91.0
         3	NaN	 NaN	  NaN	  34.0

pd.concat([df1, df2], sort = False)

Output >>>
            ID	Name	Class	Marks
         0	1.0	   A	  5.0	  NaN
         1	2.0	   B	  6.0	  NaN
         2	3.0	   C	  7.0	  NaN
         3	4.0	   D	  8.0	  NaN
         0	NaN	 NaN	  NaN	  40.0
         1	NaN	 NaN	  NaN	  63.0
         2	NaN	 NaN	  NaN	  91.0
         3	NaN	 NaN	  NaN	  34.0
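
The sort argument controls only the order of the columns when the inputs do not share the same columns; it does not reorder the rows. A minimal sketch, assuming a recent pandas release, that compares the column order for both settings (the names unsorted_cols and sorted_cols are made up for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'Name': ['A', 'B', 'C', 'D'],
                    'Class': [5, 6, 7, 8]})
df2 = pd.DataFrame({'Marks': [40, 63, 91, 34]})

# sort=False keeps the columns in order of first appearance
unsorted_cols = pd.concat([df1, df2], sort=False)

# sort=True orders the column labels alphabetically
sorted_cols = pd.concat([df1, df2], sort=True)

print(list(unsorted_cols.columns))   # ['ID', 'Name', 'Class', 'Marks']
print(list(sorted_cols.columns))     # ['Class', 'ID', 'Marks', 'Name']
```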


Pandas DataFrame | Mastering in Python Pandas Library

Indian AI Production — Tue, 09 Jul 2019 18:22:02 +0000

Python Pandas DataFrame

A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

The examples below show in practice how a DataFrame works.

Different ways to create a DataFrame

1. Creating empty dataframe

import pandas as pd

emt_df = pd.DataFrame()
print(emt_df)

Output >>>
          Empty DataFrame
          Columns: []
          Index: []

2. Creating dataframe from list

lst = ['a', 'b', 'c']   # First creating a list
print(lst)

Output >>>   ['a', 'b', 'c']

df1 = pd.DataFrame(lst)    # Creating dataframe from above list
print(df1)

In a Jupyter notebook we can also display the dataframe by writing just the variable name on the last line of a cell, without calling the print function.

df1

Here the first row (0) is the column index/label of the data values. The first column is the row index (which starts from 0), and the second column holds the data values.

3. Creating dataframe from list of list

ls_of_ls = [[1,2,3], [2,3,4], [4,5,6]]   # Creating list of list
print(ls_of_ls)

Output >>>   [[1, 2, 3], [2, 3, 4], [4, 5, 6]]

df2 = pd.DataFrame(ls_of_ls)   # Creating dataframe from above list of list
df2

Output >>>	
               0     1     2
          0	   1	 2     3
          1	   2	 3     4
          2	   4	 5     6

Here the first row (0, 1, 2) holds the column index/labels, followed by three columns of data values. Each inner list becomes one row.
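
The default labels 0, 1, 2 can be replaced at creation time. A small sketch using the columns and index arguments of pd.DataFrame; the label names x, y, z and r1, r2, r3 are invented for illustration:

```python
import pandas as pd

ls_of_ls = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]

# Give the columns and the rows explicit labels instead of 0, 1, 2
labelled_df = pd.DataFrame(ls_of_ls,
                           columns=['x', 'y', 'z'],
                           index=['r1', 'r2', 'r3'])

print(labelled_df)
print(labelled_df.loc['r2', 'y'])   # 3
```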

4. Creating dataframe from a Python dictionary (dict)

dict1 = {'ID': [11,22,33,44]}   # Creating dict
dict1

Output >>>   {'ID': [11, 22, 33, 44]}

df3 = pd.DataFrame(dict1)    # Creating dataframe from above dict
df3

For more data columns, add more keys to the dictionary

dict2 = {'ID': [11,22,33,44], 'SN': [1,2,3,4]}
dict2

Output >>>   {'ID': [11, 22, 33, 44], 'SN': [1, 2, 3, 4]}

df4 = pd.DataFrame(dict2)
df4

Here the dataframe has two columns, ID and SN

5. Creating dataframe from list of dict

ls_dict = [{'a':1, 'b':2}, {'a':3, 'b':4}]   # Creating list of dict
df5 = pd.DataFrame(ls_dict)   # Creating dataframe from list of dict
df5

# Creating dataframe from list of dict whose keys differ

ls_dict = [{'a':1, 'b':2}, {'a':3, 'b':4, 'c':5}]   
df6 = pd.DataFrame(ls_dict)
df6

Output >>>
             a	b	c
          0	 1	2	NaN
          1	 3	4	5.0

Here 'c' is not defined in the first dictionary, but the command does not raise an error because pandas marks missing values with NaN.
NaN means "not a number".
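
To find out which cells were filled in as missing, isna() returns a boolean dataframe of the same shape. A short sketch on the same data; note that column 'c' is stored as float because NaN is a floating-point value:

```python
import pandas as pd

ls_dict = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4, 'c': 5}]
df6 = pd.DataFrame(ls_dict)

print(df6.isna())         # True only where a key was missing
print(df6.isna().sum())   # number of missing values per column
print(df6['c'].dtype)     # float64
```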

6. Creating dataframe from dict of series

dict_sr = {'ID': pd.Series([1,2,3]), 'SN': pd.Series([111,222,333])}
df7 = pd.DataFrame(dict_sr)
df7
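
When the series in the dictionary have different indexes, pandas aligns them on the union of the index labels and fills the gaps with NaN. A minimal sketch; the labels and the name aligned_df are made up for illustration:

```python
import pandas as pd

dict_sr = {'ID': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
           'SN': pd.Series([111, 222], index=['a', 'b'])}

# Row 'c' exists only in the 'ID' series, so 'SN' gets NaN there
aligned_df = pd.DataFrame(dict_sr)
print(aligned_df)
```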


Pandas Series | Mastering in Python Pandas Library

Indian AI Production — Tue, 09 Jul 2019 11:55:52 +0000

pandas.Series

A pandas Series is a one-dimensional indexed array. It is most similar to a one-dimensional NumPy array. pandas.Series is the class we call to create a series.

The examples below show in practice how a Series works.
To use the pandas library in Jupyter Notebook or any other Python IDE or IDLE, we first need to import it using the import keyword.

import pandas as pd

Here we use the as keyword to shorten the name pandas to “pd”.

At the time of writing, the latest version of the pandas library is 0.24.2, released on 12 March 2019. To check which version of pandas is installed:

pd.__version__

Output >>>  '0.24.2'

A series is similar to a Python list, but it has additional functionality, methods, and operators, which makes it more powerful than a list.

Methods of Creating a Series

1. Creating series from list

First, we create a list

list_1 = [1, 2, -3, 4.5, 'indian']
print(list_1)

Output >>>   [1, 2, -3, 4.5, 'indian']

A Python list can store values of different data types, such as int, float, and string

Creating series using the above list

series1 = pd.Series(list_1)
print(series1)

Output >>>
          0         1
          1         2
          2        -3
          3       4.5
          4    indian
          dtype: object

Here 0 1 2 3 4 is the index and 1 2 -3 4.5 indian are the data values. The dtype is object because the list mixes numbers and a string.

type(series1)

Output >>>   pandas.core.series.Series

pandas.core.series.Series is the Series class: a one-dimensional array that stores indexed data

2. Creating Empty Series

An empty series is like an empty list. We can create one by passing an empty list

empty_s = pd.Series([])
print(empty_s)

Output >>>   Series([], dtype: float64)

3. Creating Series by passing a list directly
Here the list is written inside the call to pd.Series

series2 = pd.Series([1,2,3,4,5])
print(series2)

Output >>>
          0    1
          1    2
          2    3
          3    4
          4    5
          dtype: int64

When the index parameter is not given, the default index runs from 0 to n-1 (0, 1, 2, ..., n-1) for n data values.
Here we create a series with the index parameter.
The index length must equal the number of data values, otherwise pandas raises an error.

series2 = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c'])
print(series2)

Output >>>
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in 
----> 1 series2 = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c'])
      2 print(series2)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    247                             'Length of passed values is {val}, '
    248                             'index implies {ind}'
--> 249                             .format(val=len(data), ind=len(index)))
    250                 except TypeError:
    251                     pass

ValueError: Length of passed values is 5, index implies 3

We got a ValueError because we passed 3 index labels for 5 data values.
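
A small sketch of guarding against this mismatch by catching the exception. The exact wording of the error message differs between pandas versions, so the code relies only on the exception type; the names values, labels and bad_series are made up for illustration:

```python
import pandas as pd

values = [1, 2, 3, 4, 5]
labels = ['a', 'b', 'c']   # deliberately too short: 3 labels for 5 values

try:
    bad_series = pd.Series(values, index=labels)
except ValueError as err:
    # The series was not created, so fall back to None
    bad_series = None
    print('Could not build the series:', err)
```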

series2 = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
print(series2)

Output >>>
          a    1
          b    2
          c    3
          d    4
          e    5
          dtype: int64

We can use any numbers, letters, names, etc. as index labels.

Above you can see dtype: int64, which means the data is stored as 64-bit integers.
We can change the data type of a series with the dtype parameter.
Changing the data type of a series (converting int to float)

series2 = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'], dtype = float)
print(series2)

Output >>>
          a    1.0
          b    2.0
          c    3.0
          d    4.0
          e    5.0
          dtype: float64
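
The data type can also be changed after the series exists. A minimal sketch using astype, which returns a new series and leaves the original untouched (the names int_series and float_series are made up for illustration):

```python
import pandas as pd

int_series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# astype returns a converted copy; int_series keeps its integer dtype
float_series = int_series.astype(float)

print(int_series.dtype)     # int64
print(float_series.dtype)   # float64
```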

4. Creating series from a scalar value
A scalar value is a single value,
e.g. 1, 0.5, ‘indian’

s3_scalar = pd.Series(2)
print(s3_scalar)

Output >>>
          0    2
          dtype: int64

To repeat the scalar for more data values, pass an index; the value is copied to every index label.

s3_scalar = pd.Series(2, index = [1,2,3,4,5])
print(s3_scalar)

Output >>>
          1    2
          2    2
          3    2
          4    2
          5    2
          dtype: int64

5. Creating series from python dictionary

s4_dict = pd.Series({'a':1, 'b':2, 'c':3})
print(s4_dict)

Output >>>
          a    1
          b    2
          c    3
          dtype: int64
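
When an index is passed together with a dictionary, pandas looks up each index label among the dictionary keys, in the order given, and a label that is not a key gets NaN. A minimal sketch; the label 'z' and the name picked are made up for illustration:

```python
import pandas as pd

# 'c' and 'a' are keys of the dictionary, 'z' is not
picked = pd.Series({'a': 1, 'b': 2, 'c': 3}, index=['c', 'a', 'z'])
print(picked)

print(picked['c'])           # 3.0
print(pd.isna(picked['z']))  # True
```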

Accessing element from series

A Pandas Series supports indexing and slicing much like a Python list, and it works with many built-in functions such as min() and max().
Now we access elements of series2.

print(series2)

Output >>>
          a    1.0
          b    2.0
          c    3.0
          d    4.0
          e    5.0
          dtype: float64

We can access any value in the series by putting its position (or its index label) inside square brackets

series2[3]

Output >>>
          4.0

series2[4]

Output >>>
          5.0
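
An integer inside plain square brackets works here because pandas falls back to positional access when the index labels are not integers. Newer pandas releases deprecate that fallback, so a more explicit sketch uses .iloc for positions and .loc for labels:

```python
import pandas as pd

series2 = pd.Series([1, 2, 3, 4, 5],
                    index=['a', 'b', 'c', 'd', 'e'], dtype=float)

print(series2.iloc[3])    # 4.0, the fourth element by position
print(series2.loc['d'])   # 4.0, the element with index label 'd'
```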

Slicing series

Here we slice the series from position 1 to 4. Position 1 is inclusive (it is taken) and position 4 is exclusive (it is not taken).

series2[1:4]

Output >>>
          b    2.0
          c    3.0
          d    4.0
          dtype: float64
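
Slicing by label behaves differently from slicing by position: with .loc both end labels are included. A short sketch on the same data (the names by_position and by_label are made up for illustration):

```python
import pandas as pd

series2 = pd.Series([1, 2, 3, 4, 5],
                    index=['a', 'b', 'c', 'd', 'e'], dtype=float)

by_position = series2.iloc[1:4]    # positions 1, 2, 3; position 4 is excluded
by_label = series2.loc['b':'d']    # labels 'b' through 'd'; 'd' is included

print(by_position.equals(by_label))   # True
```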

Mathematical operators can be applied to series

Adding two series

s5 = pd.Series([1,2,3,4,5])
s6 = pd.Series([1,2,3,4,5])

a = s5 + s6
print(a)

Output >>>
          0     2
          1     4
          2     6
          3     8
          4    10
          dtype: int64

We can also add series using the add method

s5.add(s6)

Output >>>
          0     2
          1     4
          2     6
          3     8
          4    10
          dtype: int64

The built-in min() function gives the minimum value of a series

min(a)

Output >>>   2

The built-in max() function gives the maximum value

max(a)

Output >>>   10
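
A series also has its own min() and max() methods. Unlike the built-in functions, they skip NaN values by default. A minimal sketch; the series with_gap is made up for illustration:

```python
import pandas as pd

a = pd.Series([2, 4, 6, 8, 10])
print(a.min(), a.max())   # 2 10

# None is stored as NaN; the Series methods ignore it
with_gap = pd.Series([2.0, None, 10.0])
print(with_gap.min())     # 2.0
print(with_gap.max())     # 10.0
```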

Conditional operator

To select only the values less than 8

a[a < 8]

Output >>>
          0    2
          1    4
          2    6
          dtype: int64
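
The comparison on its own produces a boolean series, and indexing with it keeps the rows where it is True. A sketch that also combines two conditions with &, where each condition needs its own parentheses (the name mask is made up for illustration):

```python
import pandas as pd

a = pd.Series([2, 4, 6, 8, 10])

mask = a < 8
print(mask.tolist())                   # [True, True, True, False, False]

# Keep values strictly between 2 and 8
print(a[(a > 2) & (a < 8)].tolist())   # [4, 6]
```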

Using the drop() method we can remove the entry with a given index label

a.drop(4)

Output >>>
          0    2
          1    4
          2    6
          3    8
          dtype: int64
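
Note that drop() returns a new series; the original is unchanged unless the result is assigned back. A minimal sketch, which also drops several labels at once by passing a list (the names shorter and without_first_two are made up for illustration):

```python
import pandas as pd

a = pd.Series([2, 4, 6, 8, 10])

shorter = a.drop(4)                # remove the entry with index label 4
without_first_two = a.drop([0, 1]) # remove the entries labelled 0 and 1

print(len(a), len(shorter))        # 5 4
print(without_first_two.tolist())  # [6, 8, 10]
```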

Now we print series 6 (s6)

print(s6)

Output >>>
          0    1
          1    2
          2    3
          3    4
          4    5
          dtype: int64

s7 = pd.Series([1,2,3])
print(s7)

Output >>>
          0    1
          1    2
          2    3
          dtype: int64

Pandas handles missing values without raising an error. Missing values are shown as NaN.
See the example below:

In pandas, we can add series that have different numbers of data values. Here s6 has 5 data values and s7 has 3. The addition succeeds: values are matched by index label, and labels present in only one series give NaN.

s6 + s7

Output >>>
          0    2.0
          1    4.0
          2    6.0
          3    NaN
          4    NaN
          dtype: float64
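
If the NaN results are not wanted, the add method accepts a fill_value that stands in for a missing entry on either side. A minimal sketch, where the name filled is made up for illustration:

```python
import pandas as pd

s6 = pd.Series([1, 2, 3, 4, 5])
s7 = pd.Series([1, 2, 3])

# Index labels 3 and 4 are missing in s7, so 0 is used in their place
filled = s6.add(s7, fill_value=0)
print(filled.tolist())   # [2.0, 4.0, 6.0, 4.0, 5.0]
```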