Data is a powerful tool that puts the world at your fingertips. The customer is the king of business, but the customer's data matters even more: if you hold good customer data in your database, you can understand your customers and serve them the way your business needs.
Now let's talk about data cleaning. In this tutorial, we look at how to handle missing values with different methods, so that a Machine Learning model can make more accurate predictions.
What are the Methods to Handle Missing Values/Data?
In general, there are six common methods to handle missing data or values. Handling missing data is part of Data Preprocessing, which is one of the most important steps in building a Machine Learning/Data Science project. The following are the most popular methods to handle missing data:
• Ignore missing values row / Delete row
• Fill missing value manually
• Use global constant
• Measure of central tendency (Mean, Median & Mode)
• Measure of central tendency for each class
• Most probable value (ML Algorithms)
1. Ignore Missing Values Row / Delete Row & Columns
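This is the simplest method: clean missing data by deleting the rows or columns that contain it. Because every deleted row or column also removes valid data, it works best when only a small share of the dataset is affected.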
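Deleting rows and columns can be sketched as follows. This is a minimal example that assumes the data is held in a pandas DataFrame (the tutorial text does not name pandas); the toy data, the column names, and the "at least half present" cutoff are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["Pune", "Delhi", None, "Mumbai"],
    "notes": [None, None, None, "vip"],
})

# Delete rows: keep only the rows that have no missing value at all.
rows_dropped = df.dropna(axis=0)

# Delete columns: keep only the columns in which at least half of the
# values are present. `thresh` is the minimum number of non-missing
# values a column needs in order to be kept (an assumed cutoff).
min_non_missing = len(df) // 2
cols_dropped = df.dropna(axis=1, thresh=min_non_missing)

print(rows_dropped)
print(cols_dropped)
```

Here only the last row survives the row deletion, while the column deletion removes just the mostly empty "notes" column and keeps all four rows.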
Download the Jupyter Notebook file given below, and watch the premium video for a better understanding. The video link is in the Jupyter Notebook file.
2. Measure of Central Tendency (Mean, Median & Mode)
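A minimal sketch of filling missing values with the mean, median, or mode, assuming the data lives in pandas Series. The "salary" and "rating" columns and their values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical numerical column with two missing entries.
salary = pd.Series([30.0, 40.0, np.nan, 50.0, 200.0, np.nan], name="salary")

# Mean and median are computed from the observed values only;
# pandas skips missing values by default.
mean_filled = salary.fillna(salary.mean())
median_filled = salary.fillna(salary.median())

# Hypothetical column with a repeated value, to illustrate the mode.
rating = pd.Series([3.0, 5.0, 5.0, np.nan], name="rating")

# mode() returns a Series because ties are possible; take the first entry.
mode_filled = rating.fillna(rating.mode()[0])

print(mean_filled.tolist())
print(median_filled.tolist())
print(mode_filled.tolist())
```

The single large value (200.0) pulls the mean well above the median, so for skewed columns the median is often the safer fill value.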
3. Measure of Central Tendency for Each Class
Numerical Missing Value Imputation By Class
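Imputing a numerical column by class means filling each missing value with the mean (or median) of its own class rather than of the whole column. The sketch below assumes a pandas DataFrame; the "species" class column and the "weight" values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "species" is the class, "weight" has missing values.
df = pd.DataFrame({
    "species": ["cat", "cat", "cat", "dog", "dog", "dog"],
    "weight": [4.0, np.nan, 6.0, 20.0, 30.0, np.nan],
})

# For each class, compute the mean of the observed weights and
# broadcast it back to every row of that class.
class_mean = df.groupby("species")["weight"].transform("mean")

# Fill each missing weight with the mean of its own class.
df["weight_filled"] = df["weight"].fillna(class_mean)

print(df)
```

Passing "median" instead of "mean" to transform gives the per-class median in the same way.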
4. Measure of Central Tendency using Mode
Categorical Missing Value Imputation By Mode, Global Constant, Manually
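The three options for a categorical column can be sketched as follows, assuming a pandas Series. The "color" column, the "Unknown" constant, and the manually entered values are hypothetical.

```python
import pandas as pd

# Hypothetical categorical column with two missing entries.
color = pd.Series(["red", "blue", None, "red", None], name="color")

# Mode: fill with the most frequent observed category.
mode_filled = color.fillna(color.mode()[0])

# Global constant: fill every gap with the same placeholder label.
constant_filled = color.fillna("Unknown")

# Manually: set individual entries when the true value is known
# from another source (the values here are made up).
manual_filled = color.copy()
manual_filled.loc[2] = "green"
manual_filled.loc[4] = "blue"

print(mode_filled.tolist())
print(constant_filled.tolist())
print(manual_filled.tolist())
```

A global constant such as "Unknown" keeps the fact that the value was missing visible to the model, whereas the mode hides it.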
5. Missing Value Imputation using Scikit-Learn
We explain how to clean categorical and numerical data using the powerful Scikit-Learn library, with the following methods to handle missing values:
- Missing value imputation using the measure of central tendency (Mean, Median, Mode)
- Fill missing value manually
- Fill missing value by a global constant
Related resources:
- Scikit-Learn official site
- Installation of Scikit-Learn
- Imputing missing values using sklearn.impute.SimpleImputer
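A minimal sketch of these methods with sklearn.impute.SimpleImputer. The toy arrays and the "Unknown" fill value are hypothetical; the strategies shown ("mean", "most_frequent", "constant") are the ones SimpleImputer provides, and "median" works the same way as "mean".

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numerical data with one gap in each column.
X_num = np.array([[1.0, 10.0],
                  [np.nan, 20.0],
                  [3.0, np.nan]])

# Measure of central tendency: replace each gap with its column mean.
mean_imputer = SimpleImputer(strategy="mean")
X_mean = mean_imputer.fit_transform(X_num)

# Hypothetical categorical data stored as an object array.
X_cat = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)

# Mode: replace each gap with the most frequent category in the column.
mode_imputer = SimpleImputer(strategy="most_frequent")
X_mode = mode_imputer.fit_transform(X_cat)

# Global constant: replace each gap with a fixed label.
const_imputer = SimpleImputer(strategy="constant", fill_value="Unknown")
X_const = const_imputer.fit_transform(X_cat)

print(X_mean)
print(X_mode.ravel())
print(X_const.ravel())
```

A fitted imputer stores the learned fill values, so calling transform on new data reuses the statistics from the training data instead of recomputing them.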
Different Strategies for Different Variables
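One possible way to apply a different imputation strategy to each variable is to combine several SimpleImputer objects in a sklearn.compose.ColumnTransformer. This is only a sketch of that idea, not necessarily the approach this section will take; the column names, values, and the choice of strategy per column are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical data: two numerical columns and one categorical column.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "income": [100.0, 200.0, np.nan],
    "city": pd.Series(["Pune", np.nan, "Pune"], dtype=object),
})

# Each entry is (name, imputer, columns it applies to).
imputer = ColumnTransformer(transformers=[
    ("age_mean", SimpleImputer(strategy="mean"), ["age"]),
    ("income_median", SimpleImputer(strategy="median"), ["income"]),
    ("city_mode", SimpleImputer(strategy="most_frequent"), ["city"]),
])

# The output columns follow the order of the transformers list.
filled = imputer.fit_transform(df)
df_filled = pd.DataFrame(filled, columns=["age", "income", "city"])

print(df_filled)
```

Because the result mixes numbers and strings, the combined array has object dtype; convert the numerical columns back with astype(float) if later steps need numeric types.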
This tutorial is under construction.