Pandas DataFrame | Mastering in Python Pandas Library
Python Pandas DataFrame
Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).
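Each property in this definition can be checked directly. The sketch below is a minimal illustration, assuming a small made-up table (the column names 'name', 'score', 'count' and 'passed' are hypothetical). It shows columns with different dtypes (heterogeneous), labels on both axes (`index` and `columns`), and a column being added after creation (size-mutable).

```python
import pandas as pd

# Columns may hold different types (potentially heterogeneous);
# the column names are hypothetical, chosen only for illustration
df = pd.DataFrame({'name': ['x', 'y'], 'score': [1.5, 2.0], 'count': [3, 4]})

# Both axes carry labels: the row index and the column labels
print(df.index.tolist())    # row labels: [0, 1]
print(df.columns.tolist())  # column labels: ['name', 'score', 'count']
print(df.dtypes)            # one dtype per column

# Size-mutable: a new column can be added after creation
df['passed'] = [True, False]
print(df.shape)             # (rows, columns): (2, 4)
```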
Here is a practical explanation of the DataFrame.
Creating a DataFrame in different ways
1. Creating an empty DataFrame
import pandas as pd
emt_df = pd.DataFrame()
print(emt_df)
Output >>>
Empty DataFrame
Columns: []
Index: []
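An empty DataFrame can be recognised programmatically through its `empty` attribute and its `shape`. The sketch below also builds an empty DataFrame that already has column labels, which is handy when rows will be added later; the labels 'ID' and 'SN' are only illustrative.

```python
import pandas as pd

emt_df = pd.DataFrame()
print(emt_df.empty)   # True: no rows and no columns
print(emt_df.shape)   # (0, 0)

# Empty DataFrame with column labels already in place
# ('ID' and 'SN' are illustrative labels)
cols_df = pd.DataFrame(columns=['ID', 'SN'])
print(cols_df.empty)             # still True: there are no rows
print(cols_df.columns.tolist())  # ['ID', 'SN']
```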
2. Creating a DataFrame from a list
lst = ['a', 'b', 'c']  # First, create a list
print(lst)
Output >>> ['a', 'b', 'c']
df1 = pd.DataFrame(lst)  # Create a DataFrame from the list above
print(df1)
Output >>>
   0
0  a
1  b
2  c
In an interactive session (the Python shell, IPython or Jupyter Notebook), we can also display the DataFrame just by typing its variable name, without using the print function:
df1
Output >>>
   0
0  a
1  b
2  c
Here the first row (0) is the label of the data column, the first column is the index (which starts from 0), and the second column holds the data values.
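The default labels (0 for the column, 0, 1, 2 for the rows) can be replaced when the DataFrame is created, using the `columns` and `index` parameters of `pd.DataFrame`. In this sketch the column name 'letter' and the row labels 'x', 'y', 'z' are hypothetical choices.

```python
import pandas as pd

lst = ['a', 'b', 'c']
# 'letter' and the x/y/z row labels are illustrative, not required names
df1 = pd.DataFrame(lst, columns=['letter'], index=['x', 'y', 'z'])
print(df1)

# Values can now be looked up by label
print(df1.loc['y', 'letter'])  # b
```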
3. Creating a DataFrame from a list of lists
ls_of_ls = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]  # Create a list of lists
print(ls_of_ls)
Output >>> [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
df2 = pd.DataFrame(ls_of_ls)  # Create a DataFrame from the list of lists above
df2
Output >>>
   0  1  2
0  1  2  3
1  2  3  4
2  4  5  6
Here the first row (0, 1, 2) holds the column labels, and each inner list becomes one row across the three data columns.
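To confirm that each inner list becomes a row (not a column), the sketch below inspects the shape, the first row and the first column. The column names 'A', 'B', 'C' are illustrative and passed through the `columns` parameter.

```python
import pandas as pd

ls_of_ls = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
# Each inner list becomes one row; the column names are illustrative
df2 = pd.DataFrame(ls_of_ls, columns=['A', 'B', 'C'])

print(df2.shape)             # (3, 3): 3 rows, 3 columns
print(df2.iloc[0].tolist())  # first row: [1, 2, 3]
print(df2['A'].tolist())     # first column: [1, 2, 4]
```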
4. Creating a DataFrame from a Python dictionary
dict1 = {'ID': [11, 22, 33, 44]}  # Create a dict
dict1
Output >>> {'ID': [11, 22, 33, 44]}
df3 = pd.DataFrame(dict1)  # Create a DataFrame from the dict above
df3
Output >>>
   ID
0  11
1  22
2  33
3  44
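With a dict, the key becomes the column label, while the row index still defaults to 0, 1, 2, ... . The sketch below passes a custom `index` instead; the labels 'r1' to 'r4' are hypothetical.

```python
import pandas as pd

dict1 = {'ID': [11, 22, 33, 44]}
# The dict key becomes the column label; the row labels are illustrative
df3 = pd.DataFrame(dict1, index=['r1', 'r2', 'r3', 'r4'])

print(df3.columns.tolist())  # ['ID']
print(df3.loc['r3', 'ID'])   # 33
```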
For more data columns, add more keys to the dict:
dict2 = {'ID': [11, 22, 33, 44], 'SN': [1, 2, 3, 4]}
dict2
Output >>> {'ID': [11, 22, 33, 44], 'SN': [1, 2, 3, 4]}
df4 = pd.DataFrame(dict2)  # Use dict2, which has both keys
df4
Output >>>
   ID  SN
0  11   1
1  22   2
2  33   3
3  44   4
Here the DataFrame has two columns, one per dict key.
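When building from a dict of lists, every list has to have the same length, because each position becomes one row. The sketch below shows that pandas raises a `ValueError` when the lengths differ; the `raised` flag is only a helper for the demonstration.

```python
import pandas as pd

# 'SN' is one element shorter than 'ID', so the rows cannot be formed
try:
    pd.DataFrame({'ID': [11, 22, 33, 44], 'SN': [1, 2, 3]})
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)
```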
5. Creating a DataFrame from a list of dicts
ls_dict = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]  # Create a list of dicts
df5 = pd.DataFrame(ls_dict)  # Create a DataFrame from the list of dicts
df5
Output >>>
   a  b
0  1  2
1  3  4
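In a list of dicts, each dict becomes one row and its keys become the column labels. The reverse conversion is available through `DataFrame.to_dict(orient='records')`, which returns one dict per row; the sketch below uses it to check the round trip.

```python
import pandas as pd

ls_dict = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
df5 = pd.DataFrame(ls_dict)

# Each dict became one row and its keys became the column labels
print(df5.columns.tolist())  # ['a', 'b']

# to_dict(orient='records') goes the other way: one dict per row
records = df5.to_dict(orient='records')
print(records)
```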
# Creating a DataFrame from a list of dicts whose keys differ
ls_dict = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4, 'c': 5}]
df6 = pd.DataFrame(ls_dict)
df6
Output >>>
   a  b    c
0  1  2  NaN
1  3  4  5.0
Here the first dictionary has no key 'c', but the command does not raise an error: pandas handles the missing value by filling that cell with NaN. Because NaN is a floating-point value, column c is stored as floats, which is why 5 is shown as 5.0.
NaN stands for "Not a Number".
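Missing values can be located and replaced once the DataFrame exists. This sketch uses `isna()` to mark the missing cells, checks the dtype of column c, and uses `fillna()` to replace the gap; the fill value 0 is only an example.

```python
import pandas as pd

ls_dict = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4, 'c': 5}]
df6 = pd.DataFrame(ls_dict)

print(df6.isna())      # True where a value is missing
print(df6['c'].dtype)  # float64, because NaN is a float

# Replace missing values with 0 (an illustrative fill value)
filled = df6.fillna(0)
print(filled)
```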
6. Creating a DataFrame from a dict of Series
dict_sr = {'ID': pd.Series([1, 2, 3]), 'SN': pd.Series([111, 222, 333])}
df7 = pd.DataFrame(dict_sr)
df7
Output >>>
   ID   SN
0   1  111
1   2  222
2   3  333
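Unlike plain lists, Series carry their own index, and pandas lines them up by index label when building the DataFrame. So Series of different lengths are accepted, and any label missing from one Series becomes NaN. In this sketch the shorter 'SN' Series and the name `df8` are hypothetical.

```python
import pandas as pd

# 'SN' has no entry for index label 2, so pandas aligns on the index
# and fills that cell with NaN instead of raising an error
dict_sr = {'ID': pd.Series([1, 2, 3]), 'SN': pd.Series([111, 222])}
df8 = pd.DataFrame(dict_sr)
print(df8)
```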