
Pandas DataFrame | Mastering the Python Pandas Library

Python Pandas DataFrame

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

Below is a practical walkthrough of the DataFrame.

Creating a DataFrame in different ways

1. Creating an empty dataframe

import pandas as pd
emt_df = pd.DataFrame()
print(emt_df)
Output >>>
          Empty DataFrame
          Columns: []
          Index: []
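
The printed output above can also be checked programmatically. A minimal sketch, using the standard `empty` and `shape` attributes of a DataFrame:

```python
import pandas as pd

emt_df = pd.DataFrame()

# .empty is True when the DataFrame holds no elements
print(emt_df.empty)   # True

# .shape is a (rows, columns) tuple
print(emt_df.shape)   # (0, 0)
```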

2. Creating a dataframe from a list

lst = ['a', 'b', 'c']   # First creating a list
print(lst)
Output >>>   ['a', 'b', 'c']
df1 = pd.DataFrame(lst)    # Creating dataframe from above list
print(df1)
  Output >>>
                0
            0   a
            1   b
            2   c

In an interactive environment such as a Jupyter notebook or the IPython shell, we can also display the dataframe by writing just the variable name, without using the print function.

df1
Output >>>
             0
          0  a
          1  b
          2  c

Here the first row (0) is the column label of the data values, the first column is the index (which starts from 0), and the second column holds the data values.
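
The index and the column label described here can also be read directly from the dataframe. A minimal sketch, assuming the same list as in the example above:

```python
import pandas as pd

df1 = pd.DataFrame(['a', 'b', 'c'])

# Row labels (the index), generated automatically starting from 0
print(list(df1.index))     # [0, 1, 2]

# Column labels; a plain list produces a single column labeled 0
print(list(df1.columns))   # [0]

# The data values stored in column 0
print(df1[0].tolist())     # ['a', 'b', 'c']
```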

3. Creating a dataframe from a list of lists

ls_of_ls = [[1,2,3], [2,3,4], [4,5,6]]   # Creating list of list
print(ls_of_ls)
Output >>>   [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
df2 = pd.DataFrame(ls_of_ls)   # Creating dataframe from above list of lists
df2
Output >>>
             0  1  2
          0  1  2  3
          1  2  3  4
          2  4  5  6

Here the first row (0, 1, 2) holds the column labels, followed by three columns of data values. Each inner list becomes one row.
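
The default labels 0, 1, 2 can be replaced with readable names through the `columns` parameter of `pd.DataFrame`. A small sketch; the names 'x', 'y' and 'z' are hypothetical and chosen only for illustration:

```python
import pandas as pd

ls_of_ls = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]

# 'x', 'y', 'z' are illustrative column names, not part of the original example
df2_named = pd.DataFrame(ls_of_ls, columns=['x', 'y', 'z'])

print(list(df2_named.columns))    # ['x', 'y', 'z']

# Each inner list is a row, so column 'y' holds the second item of every list
print(df2_named['y'].tolist())    # [2, 3, 5]
```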

4. Creating a dataframe from a dict (Python dictionary)

dict1 = {'ID': [11,22,33,44]}   # Creating dict
dict1
Output >>>   {'ID': [11, 22, 33, 44]}
df3 = pd.DataFrame(dict1)    # Creating dataframe from above dict
df3
Output >>>
             ID
          0  11
          1  22
          2  33
          3  44

A dict with more keys gives more data columns:

dict2 = {'ID': [11,22,33,44], 'SN': [1,2,3,4]}
dict2
Output >>>   {'ID': [11, 22, 33, 44], 'SN': [1, 2, 3, 4]}
df4 = pd.DataFrame(dict2)    # Creating dataframe from dict2 (two keys)
df4
Output >>>
             ID  SN
          0  11   1
          1  22   2
          2  33   3
          3  44   4

Here the dataframe has two columns, one for each key of the dict.

5. Creating a dataframe from a list of dicts

ls_dict = [{'a':1, 'b':2}, {'a':3, 'b':4}]   # Creating list of dict
df5 = pd.DataFrame(ls_dict)   # Creating dataframe from list of dict
df5
Output >>>
             a  b
          0  1  2
          1  3  4

# Creating dataframe from a list of dicts that have different keys

ls_dict = [{'a':1, 'b':2}, {'a':3, 'b':4, 'c':5}]   
df6 = pd.DataFrame(ls_dict)
df6
Output >>>
             a  b    c
          0  1  2  NaN
          1  3  4  5.0

Here 'c' is not defined in the first dictionary, but the command does not raise an error, because pandas fills the missing value with NaN.
NaN means "Not a Number". Because NaN is a floating-point value, the other value in column 'c' is shown as 5.0.
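
Once a dataframe contains NaN, the missing cells can be located and replaced. A minimal sketch using the standard `isna` and `fillna` methods; the fill value 0 is an arbitrary placeholder chosen for illustration, not a recommendation:

```python
import pandas as pd

ls_dict = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4, 'c': 5}]
df6 = pd.DataFrame(ls_dict)

# isna() marks each missing cell with True
print(df6['c'].isna().tolist())   # [True, False]

# Column 'c' is stored as float64 because NaN is a floating-point value
print(df6['c'].dtype)             # float64

# fillna() returns a new dataframe with NaN replaced (0 is an assumed placeholder)
filled = df6.fillna(0)
print(filled['c'].tolist())       # [0.0, 5.0]
```

The original `df6` is left unchanged, since `fillna` returns a copy by default.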

6. Creating a dataframe from a dict of Series

dict_sr = {'ID': pd.Series([1,2,3]), 'SN': pd.Series([111,222,333])}
df7 = pd.DataFrame(dict_sr)
df7
Output >>>
             ID   SN
          0   1  111
          1   2  222
          2   3  333
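
When the Series in the dict do not share the same index labels, pandas aligns them by label and fills the gaps with NaN, just as with the list of dicts in section 5. A small sketch; the index labels 'x', 'y', 'z' and the name `df8` are hypothetical and used only for illustration:

```python
import pandas as pd

# Illustrative data: 'SN' has no value for the label 'z'
dict_sr = {
    'ID': pd.Series([1, 2, 3], index=['x', 'y', 'z']),
    'SN': pd.Series([111, 222], index=['x', 'y']),
}
df8 = pd.DataFrame(dict_sr)

# The row index is the union of the labels of both Series
print(list(df8.index))             # ['x', 'y', 'z']

# The missing 'SN' value for row 'z' is filled with NaN
print(pd.isna(df8.loc['z', 'SN'])) # True
```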
