Bangalore House Price Prediction App:
In the Machine Learning/Data Science End-to-End Project in Python tutorial (in Hindi), we explain every step of a machine learning / data science project in detail.
Project name: Bangalore house price prediction machine learning project
Project Prerequisites
Steps of Machine Learning Project
Project Journey Start
Project Demo
How to use google colab?
How to import libraries?
How to load dataset?
Exploratory data analysis (EDA)
Prepare data for ML Model
Data Cleaning
How to split data?
Feature Scaling
Model selection and train
How to test Machine Learning Model?
Hyperparameter tuning
Cross validation
Calculate the accuracy of ML Model
Present your solution
How to save model?
How to load model?
How to launch, monitor and maintain model?
Project Source Code
Data Preprocessing
Data Source: https://www.kaggle.com/amitabhajoy/bengaluru-house-price-data
"""
# Bangalore House Price Prediction - Supervised Regression Problem
## Data Preprocessing
**** Project Steps*****
-----------------------------
1. Look at the big picture.
2. Get the data.
3. Discover and visualize the data to gain insights.
4. Prepare the data for Machine Learning algorithms.
5. Select a model and train it.
6. Fine-tune your model.
7. Present your solution.
8. Launch, monitor, and maintain your system.
# 1. Business Problem
The main goal of this project is to predict the price of a house in Bangalore from its features.
# Import Libraries
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
"""# 2. Load dataset
Load csv file from google drive
<br>
Main Source: https://www.kaggle.com/amitabhajoy/bengaluru-house-price-data
"""
path = "https://drive.google.com/uc?export=download&id=13mP8FeMX09L3utbPcCDp-U2fXnf53gwx"
df_raw = pd.read_csv(path)
df_raw.shape
df_raw.head()
df_raw.tail()
"""## 3. Exploratory Data Analysis"""
df = df_raw.copy() # get the copy of raw data
# get the information of data
df.info()
# We have only 3 numerical features - bath, balcony and price
# 6 categorical features - area_type, availability, location, size, society and total_sqft
# Target feature =======>>>>>> price >>>>>>
# Price is in lakh
df.describe()
# compare the 75% and max values: the large gap between them suggests outliers
sns.pairplot(df)
# bath and price have slightly linear correlation with some outliers
# value count of each feature
def value_count(df):
    for var in df.columns:
        print(df[var].value_counts())
        print("--------------------------------")
value_count(df)
# correlation heatmap
num_vars = ["bath", "balcony", "price"]
sns.heatmap(df[num_vars].corr(),cmap="coolwarm", annot=True)
# bath is more strongly correlated with price than balcony is
"""# 4. Prepare Data for Machine Learning Model
## Data cleaning
"""
df.isnull().sum() # count the missing values in each column
df.isnull().mean()*100 # % of missing values
# society has 41.3% missing values (needs to be dropped)
# visualize missing values with a heatmap to see where they occur
plt.figure(figsize=(16,9))
sns.heatmap(df.isnull())
# Drop ----------> society feature
# because 41.3% missing value
df2 = df.drop('society', axis='columns')
df2.shape
# fill the mean value in --------> balcony feature
# because it contains 4.5% missing values
df2['balcony'] = df2['balcony'].fillna(df2['balcony'].mean())
df2.isnull().sum()
# drop rows with na values from df2
# because only a very small % of values is missing
df3 = df2.dropna()
df3.shape
df3.isnull().sum()
df3.head()
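The three cleaning rules used above (drop a mostly empty column, fill one column with its mean, drop the remaining incomplete rows) can be sketched without pandas. This is only an illustration on made-up rows: the function names, the 40% threshold, and the toy data are assumptions, not part of the original notebook.

```python
# Pure-Python sketch of the three missing-value rules (toy rows, not the real dataset).
rows = [
    {"society": None, "balcony": 1.0, "bath": 2.0},
    {"society": "ABC", "balcony": None, "bath": 2.0},
    {"society": None, "balcony": 3.0, "bath": None},
    {"society": None, "balcony": 2.0, "bath": 1.0},
]

def missing_percent(rows, column):
    """Percentage of rows in which `column` is missing (None)."""
    return 100 * sum(r[column] is None for r in rows) / len(rows)

def clean(rows, drop_threshold=40.0, mean_fill=("balcony",)):
    columns = list(rows[0])
    # 1. drop columns whose missing share exceeds the (assumed) threshold
    keep = [c for c in columns if missing_percent(rows, c) <= drop_threshold]
    rows = [{c: r[c] for c in keep} for r in rows]
    # 2. fill the selected columns with the mean of their observed values
    for c in mean_fill:
        if c not in keep:
            continue
        observed = [r[c] for r in rows if r[c] is not None]
        mean = sum(observed) / len(observed)
        for r in rows:
            if r[c] is None:
                r[c] = mean
    # 3. drop any row that still has a missing value
    return [r for r in rows if all(v is not None for v in r.values())]

cleaned = clean(rows)
```

In the toy data, `society` is 75% missing and is dropped, the missing `balcony` becomes 2.0, and the row with a missing `bath` is removed.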
"""## Feature Engineering"""
# to show all the columns and rows
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
"""### Converting 'total_sqft' cat feature in numeric"""
df3['total_sqft'].value_counts()
# here we observe that 'total_sqft' contains string values in different formats
# float or int like values: 1689.28, 817
# range values: 540 - 740
# number plus unit string: 142.84Sq. Meter, 117Sq. Yards, 1Grounds
# best strategy is to convert it into a number by splitting it
total_sqft_int = []
for str_val in df3['total_sqft']:
    try:
        total_sqft_int.append(float(str_val)) # a value like '123.4' converts directly to float
    except (ValueError, TypeError):
        try:
            temp = str_val.split('-')
            total_sqft_int.append((float(temp[0])+float(temp[-1]))/2) # split a range like '123 - 534' and take the mean
        except (ValueError, AttributeError):
            total_sqft_int.append(np.nan) # any other format is treated as nan
# reset the index of dataframe
df4 = df3.reset_index(drop=True) # drop=True - don't add index column in df
# join df4 and the total_sqft_int list
df5 = df4.join(pd.DataFrame({'total_sqft_int':total_sqft_int}))
df5.head()
df5.tail()
df5.isnull().sum()
# drop na value
df6 = df5.dropna()
df6.shape
df6.info()
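The conversion rule described above (plain numbers kept, 'a - b' ranges replaced by their midpoint, everything else treated as missing) can be written as a small standalone function. This is a minimal sketch: the name `parse_total_sqft` is hypothetical, and it returns `None` where the notebook uses `np.nan`.

```python
def parse_total_sqft(value):
    """Return the area as a float, the midpoint of an 'a - b' range, or None."""
    text = str(value).strip()
    try:
        return float(text)              # plain numbers such as '1689.28' or '817'
    except ValueError:
        pass
    parts = text.split("-")
    if len(parts) == 2:
        try:
            low, high = float(parts[0]), float(parts[1])
            return (low + high) / 2     # ranges such as '540 - 740'
        except ValueError:
            pass
    return None                         # anything else, e.g. '142.84Sq. Meter'
```

Unit-suffixed values such as '117Sq. Yards' are discarded here, as in the notebook; converting them to square feet would need a unit table that the source does not provide.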
"""## Working on <<<< Size >>>> feature"""
df6['size'].value_counts()
# size feature shows the number of rooms
"""
in the size feature we assume that
2 BHK == 2 Bedroom == 2 RK
so we take only the number and remove the suffix text
"""
size_int = []
for str_val in df6['size']:
    temp = str_val.split(" ")
    try:
        size_int.append(int(temp[0]))
    except ValueError:
        size_int.append(np.nan)
        print("Noise = ",str_val)
df6 = df6.reset_index(drop=True)
# join df6 and list size_int
df7 = df6.join(pd.DataFrame({'bhk':size_int}))
df7.shape
df7.tail()
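The same idea, keeping only the leading number of the `size` string, can be checked in isolation with a small helper. The name `parse_bhk` is hypothetical and the function returns `None` instead of `np.nan` for unparseable values.

```python
def parse_bhk(size):
    """Return the leading integer of strings like '2 BHK', or None if absent."""
    tokens = str(size).split()
    if not tokens:
        return None
    try:
        return int(tokens[0])
    except ValueError:
        return None
```

For example, `parse_bhk("4 Bedroom")` and `parse_bhk("4 BHK")` both give 4, which is the assumption stated above that BHK, Bedroom and RK counts are interchangeable.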
"""## Finding Outlier and Removing"""
# function to create histogram, Q-Q plot and boxplot
# scipy.stats is needed for the Q-Q plot
import scipy.stats as stats
def diagnostic_plots(df, variable):
    # function takes a dataframe (df) and
    # the variable of interest as arguments
    # define figure size
    plt.figure(figsize=(16, 4))
    # histogram
    plt.subplot(1, 3, 1)
    sns.histplot(df[variable], bins=30, kde=True) # histplot replaces the deprecated distplot
    plt.title('Histogram')
    # Q-Q plot
    plt.subplot(1, 3, 2)
    stats.probplot(df[variable], dist="norm", plot=plt)
    plt.ylabel('Variable quantiles')
    # boxplot
    plt.subplot(1, 3, 3)
    sns.boxplot(y=df[variable])
    plt.title('Boxplot')
    plt.show()
num_var = ["bath","balcony","total_sqft_int","bhk","price"]
for var in num_var:
    print("******* {} *******".format(var))
    diagnostic_plots(df7, var)
# the histogram, Q-Q plot and boxplot above reveal outliers
# here we assume that 1 BHK requires at least 350 sqft of area
df7[df7['total_sqft_int']/df7['bhk'] < 350].head()
# these rows are outliers
# if the area per BHK is < 350 sqft then we remove the row
df8 = df7[~(df7['total_sqft_int']/df7['bhk'] < 350)].copy() # copy so that a new column can be added safely
df8.shape
# create a new feature: price per square foot
# it helps to find the outliers
# price is in lakh, so convert it into rupees and then divide by total_sqft_int
df8['price_per_sqft'] = df8['price']*100000 / df8['total_sqft_int']
df8.head()
df8.price_per_sqft.describe()
# here we can see a huge difference between the min and max price_per_sqft
# min 6308.502826 max 176470.588235
# Remove outliers using 'price per sqft': keep rows within one std of the mean for each location
def remove_pps_outliers(df):
    df_out = pd.DataFrame()
    for key, subdf in df.groupby('location'):
        m=np.mean(subdf.price_per_sqft)
        st=np.std(subdf.price_per_sqft)
        reduced_df = subdf[(subdf.price_per_sqft>(m-st))&(subdf.price_per_sqft<=(m+st))]
        df_out = pd.concat([df_out, reduced_df], ignore_index = True)
    return df_out
df9 = remove_pps_outliers(df8)
df9.shape
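To see what the per-location "mean plus or minus one standard deviation" rule does, here is a small pure-Python sketch on made-up numbers. The data and the tuple layout are assumptions; the population standard deviation (`pstdev`) is used on the assumption that it matches `np.std` with its default settings.

```python
from collections import defaultdict
from statistics import fmean, pstdev

def remove_pps_outliers(rows):
    """rows: (location, price_per_sqft) pairs; keep values in (mean - std, mean + std]."""
    by_location = defaultdict(list)
    for location, pps in rows:
        by_location[location].append(pps)
    kept = []
    for location, values in by_location.items():
        m = fmean(values)
        st = pstdev(values)  # population std (ddof=0), assumed to match np.std
        kept.extend((location, v) for v in values if m - st < v <= m + st)
    return kept

rows = [("A", 100), ("A", 110), ("A", 120), ("A", 500), ("B", 80)]
result = remove_pps_outliers(rows)
```

In this toy example the value 500 is removed from location A. Location B has a single listing, so its std is 0 and the strict lower bound filters it out as well; the same comparison in the notebook would treat single-listing locations the same way.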
def plot_scatter_chart(df,location):
    bhk2 = df[(df.location==location) & (df.bhk==2)]
    bhk3 = df[(df.location==location) & (df.bhk==3)]
    plt.figure(figsize=(16,9))
    plt.scatter(bhk2.total_sqft_int, bhk2.price, color='Blue', label='2 BHK', s=50)
    plt.scatter(bhk3.total_sqft_int, bhk3.price, color='Red', label='3 BHK', s=50, marker="+")
    plt.xlabel("Total Square Feet Area")
    plt.ylabel("Price")
    plt.title(location)
    plt.legend()
plot_scatter_chart(df9, "Rajaji Nagar")
# in the scatter plot we observe that, at the same location, the price of some
# 2 BHK houses is greater than that of 3 BHK houses, so those points are outliers
plot_scatter_chart(df9, "Hebbal")
# in the scatter plot we observe that, at the same location, the price of some
# 3 BHK houses is less than that of 2 BHK houses, so those points are outliers
# Removing BHK outliers
def remove_bhk_outliers(df):
    exclude_indices = np.array([])
    for location, location_df in df.groupby('location'):
        # price-per-sqft statistics for every BHK count in this location
        bhk_stats = {}
        for bhk, bhk_df in location_df.groupby('bhk'):
            bhk_stats[bhk]={
                'mean':np.mean(bhk_df.price_per_sqft),
                'std':np.std(bhk_df.price_per_sqft),
                'count':bhk_df.shape[0]}
        for bhk, bhk_df in location_df.groupby('bhk'):
            # statistics of the next smaller BHK in the same location
            smaller_bhk_stats=bhk_stats.get(bhk-1)
            if smaller_bhk_stats and smaller_bhk_stats['count']>5:
                # exclude houses whose price per sqft is below the mean of houses with one BHK less
                exclude_indices = np.append(exclude_indices, bhk_df[bhk_df.price_per_sqft<(smaller_bhk_stats['mean'])].index.values)
    return df.drop(exclude_indices, axis='index')
df10 = remove_bhk_outliers(df9)
df10.shape
plot_scatter_chart(df10, "Hebbal")
# in the scatter plot, most of the red (3 BHK) points that were mixed in with the blue (2 BHK) points have been removed
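The BHK rule above can be illustrated without pandas: within one location, a listing is flagged when its price per sqft is below the mean price per sqft of listings with one BHK less, provided that smaller group has more than 5 listings. The function name, the dict layout and the toy rows below are assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import fmean

def bhk_outlier_indices(rows, min_count=5):
    """rows: dicts with 'location', 'bhk', 'price_per_sqft'. Return row indices to drop."""
    groups = defaultdict(list)
    for i, r in enumerate(rows):
        groups[(r["location"], r["bhk"])].append(i)
    means = {key: fmean(rows[i]["price_per_sqft"] for i in idx)
             for key, idx in groups.items()}
    drop = []
    for (location, bhk), idx in groups.items():
        smaller = (location, bhk - 1)
        # only compare against a smaller-BHK group with more than `min_count` listings
        if smaller in groups and len(groups[smaller]) > min_count:
            drop.extend(i for i in idx if rows[i]["price_per_sqft"] < means[smaller])
    return sorted(drop)

rows = [{"location": "Hebbal", "bhk": 2, "price_per_sqft": 6000.0} for _ in range(6)]
rows.append({"location": "Hebbal", "bhk": 3, "price_per_sqft": 5000.0})  # cheaper than 2 BHK mean
rows.append({"location": "Hebbal", "bhk": 3, "price_per_sqft": 7000.0})
to_drop = bhk_outlier_indices(rows)
```

Only the 3 BHK listing at 5000 per sqft is flagged, because it is cheaper per sqft than the average 2 BHK home in the same location.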
"""### Remove outliers using the help of 'bath' feature"""
df10.bath.unique()
df10[df10.bath > df10.bhk+2]
# keep only the rows where the number of bathrooms is at most bhk + 1
df11 = df10[df10.bath < df10.bhk+2]
df11.shape
plt.figure(figsize=(16,9))
for i,var in enumerate(num_var):
    plt.subplot(3,2,i+1)
    sns.boxplot(df11[var])
df11.head()
df12 = df11.drop(['area_type', 'availability',"location","size","total_sqft"], axis =1)
df12.head()
df12.to_csv("clean_data.csv", index=False) # test ml model on this data
# ML models were trained on this data; best score >>>> XGBoost = 0.914710
"""# Categorical Variable Encoding"""
df13 = df11.drop(["size","total_sqft"], axis =1)
df13.head()
df14 = pd.get_dummies(df13, drop_first=True, columns=['area_type','availability','location'])
df14.shape
df14.head()
df14.to_csv('oh_encoded_data.csv', index=False) # test ml model on this data
"""The ['area_type','availability','location'] features contain many classes, and one-hot encoding (OHE) all of them greatly increases the size of the DataFrame,
so we encode only the classes that occur *frequently* in each categorical variable.
## Working on <<<<<< area_type >>>>> feature
"""
df13['area_type'].value_counts()
df15 = df13.copy()
# apply one-hot encoding on the 'area_type' feature
for cat_var in ["Super built-up Area","Built-up Area","Plot Area"]:
    df15["area_type"+cat_var] = np.where(df15['area_type']==cat_var, 1,0)
df15.shape
df15.head(2)
"""## Working with <<<<< availability >>>>> Feature"""
df15["availability"].value_counts()
# in the availability feature, 10525 houses are 'Ready To Move' and the remaining ones will be ready on a particular date
# so we create a new feature 'availability_Ready To Move' with value 1 if availability is Ready To Move, else 0
df15["availability_Ready To Move"] = np.where(df15["availability"]=="Ready To Move",1,0)
df15.shape
df15.tail()
"""## Working on <<<< Location >>>> feature"""
location_value_count = df15['location'].value_counts()
location_value_count
location_gert_20 = location_value_count[location_value_count>=20].index
location_gert_20
# if a location occurs more than 19 times, we create a column for it
# and set its value to 1 where that location is present in the location feature, else 0 (one-hot encoding)
df16 = df15.copy()
for cat_var in location_gert_20:
    df16['location_'+cat_var]=np.where(df16['location']==cat_var, 1,0)
df16.shape
df16.head()
"""## Drop categorical variable"""
df17 = df16.drop(["area_type","availability",'location'], axis =1)
df17.shape
df17.head()
df17.to_csv('ohe_data_reduce_cat_class.csv', index=False)
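The "encode only frequent classes" strategy used for `location` can be sketched with the standard library alone. The helper name, the `min_count` value and the example locations below are illustrative assumptions; the notebook itself uses a threshold of 20 and `np.where`.

```python
from collections import Counter

def one_hot_frequent(values, min_count, prefix):
    """One-hot encode only the categories seen at least `min_count` times."""
    counts = Counter(values)
    frequent = sorted(c for c, n in counts.items() if n >= min_count)
    columns = [prefix + c for c in frequent]
    rows = [[1 if v == c else 0 for c in frequent] for v in values]
    return columns, rows

locations = ["Hebbal", "Hebbal", "Hebbal", "Whitefield", "Whitefield", "Yelahanka"]
columns, encoded = one_hot_frequent(locations, min_count=2, prefix="location_")
```

A rare location such as "Yelahanka" in this toy list gets an all-zero row, so the model sees it as "some other location" rather than receiving its own column.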
Machine Learning Model Training and Testing & Save
# -*- coding: utf-8 -*-
"""v2-ML Model-bangalore_house_price_prediction.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1XWPm8WX4DN7ig9rUPXwC1lV2azVn44O2
# Bangalore House Price Prediction - Outlier Detection
This notebook only trains ML models with different ML algorithms
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
"""from google.colab import files
files=files.upload()
df = pd.read_csv('oh_encoded_data.csv')"""
# Get clean data
path = r"https://drive.google.com/uc?export=download&id=1P49POlAk27uRzWKXoR2WaEfb1lyyfiRJ" # oh_encoded_data.csv from drive
# This file contains [area_type availability location bath balcony price total_sqft_int bhk price_per_sqft]
# of which ['area_type','availability','location'] are categorical variables
# We one-hot encoded a few classes from the above categorical variables
df = pd.read_csv(path)
df.shape
df.head()
df = df.drop(['Unnamed: 0'], axis=1)
df.head()
df.shape
"""## Split Dataset in train and test"""
X = df.drop("price", axis=1)
y = df['price']
print('Shape of X = ', X.shape)
print('Shape of y = ', y.shape)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 51)
print('Shape of X_train = ', X_train.shape)
print('Shape of y_train = ', y_train.shape)
print('Shape of X_test = ', X_test.shape)
print('Shape of y_test = ', y_test.shape)
"""## Feature Scaling"""
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(X_train)
X_train= sc.transform(X_train)
X_test = sc.transform(X_test)
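The scaling step above fits the scaler on the training set only and then applies the same statistics to the test set. A minimal pure-Python sketch of that idea follows; the helper names and numbers are made up, and the population standard deviation is used on the assumption that it mirrors what `StandardScaler` computes.

```python
from statistics import fmean, pstdev

def fit_scaler(train_rows):
    """Learn per-column mean and population std from the training rows only."""
    columns = list(zip(*train_rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against zero variance
    return means, stds

def transform(rows, means, stds):
    """Standardize rows with statistics learned elsewhere (never refit on test data)."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test = [[2.0, 40.0]]
means, stds = fit_scaler(train)
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)
```

The test value 40.0 lies outside the training range and scales to roughly 2.45; that is expected, because the test set must not influence the mean and std.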
"""## Machine Learning Model Training
## Linear Regression
"""
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
lr = LinearRegression()
lr_lasso = Lasso()
lr_ridge = Ridge()
def rmse(y_test, y_pred):
    return np.sqrt(mean_squared_error(y_test, y_pred))
lr.fit(X_train, y_train)
lr_score = lr.score(X_test, y_test) # with all num var 0.7842744111909903
lr_rmse = rmse(y_test, lr.predict(X_test))
lr_score, lr_rmse
# Lasso
lr_lasso.fit(X_train, y_train)
lr_lasso_score=lr_lasso.score(X_test, y_test) # with balcony 0.5162364637824872
lr_lasso_rmse = rmse(y_test, lr_lasso.predict(X_test))
lr_lasso_score, lr_lasso_rmse
"""## Support Vector Machine"""
from sklearn.svm import SVR
svr = SVR()
svr.fit(X_train,y_train)
svr_score=svr.score(X_test,y_test) # with 0.2630802200711362
svr_rmse = rmse(y_test, svr.predict(X_test))
svr_score, svr_rmse
"""## Random Forest Regressor"""
from sklearn.ensemble import RandomForestRegressor
rfr = RandomForestRegressor()
rfr.fit(X_train,y_train)
rfr_score=rfr.score(X_test,y_test) # with 0.8863376025408044
rfr_rmse = rmse(y_test, rfr.predict(X_test))
rfr_score, rfr_rmse
"""## XGBoost"""
import xgboost
xgb_reg = xgboost.XGBRegressor()
xgb_reg.fit(X_train,y_train)
xgb_reg_score=xgb_reg.score(X_test,y_test) # with 0.8838865742273464
xgb_reg_rmse = rmse(y_test, xgb_reg.predict(X_test))
xgb_reg_score, xgb_reg_rmse
print(pd.DataFrame([{'Model': 'Linear Regression','Score':lr_score, "RMSE":lr_rmse},
{'Model': 'Lasso','Score':lr_lasso_score, "RMSE":lr_lasso_rmse},
{'Model': 'Support Vector Machine','Score':svr_score, "RMSE":svr_rmse},
{'Model': 'Random Forest','Score':rfr_score, "RMSE":rfr_rmse},
{'Model': 'XGBoost','Score':xgb_reg_score, "RMSE":xgb_reg_rmse}],
columns=['Model','Score','RMSE']))
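The two numbers in the comparison table are the R² value returned by each regressor's `score` method and the RMSE. As a rough sketch of what they measure, both can be computed by hand; the function names and the three toy values are assumptions for illustration only.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: typical size of the prediction error, in target units."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 8.0]
```

Since the target here is the price in lakh, the RMSE is also in lakh, while R² has no unit and is at most 1.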
"""## Cross Validation"""
'''from sklearn.model_selection import KFold,cross_val_score
cvs = cross_val_score(xgb_reg, X_train,y_train, cv = 10)
cvs, cvs.mean() # 0.9845963377450353)'''
'''cvs_rfr = cross_val_score(rfr, X_train,y_train, cv = 10)
cvs_rfr, cvs_rfr.mean() # 0.9652425691235843)'''
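With an integer `cv` and a regressor, `cross_val_score` splits the training data into K unshuffled folds and scores the model once per fold. The splitting itself can be sketched as follows; the function name is hypothetical and the fold-size rule (the first `n % k` folds get one extra sample) is written to follow the usual K-fold convention.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)  # the first `extra` folds get one more sample
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(kfold_indices(10, 3))
```

Every sample appears in exactly one test fold, so the mean of the K scores uses all of the training data for validation without ever scoring a model on rows it was fitted on.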
from sklearn.model_selection import cross_val_score
cvs_rfr2 = cross_val_score(RandomForestRegressor(), X_train,y_train, cv = 10)
cvs_rfr2, cvs_rfr2.mean() # 0.9652425691235843
"""# Hyperparameter Tuning"""
from sklearn.model_selection import GridSearchCV
from xgboost.sklearn import XGBRegressor
'''
# Various hyper-parameters to tune
xgb1 = XGBRegressor()
parameters = {'learning_rate': [0.1,0.03, 0.05, 0.07], #so called `eta` value, # [default=0.3] Analogous to learning rate in GBM
'min_child_weight': [1,3,5], #[default=1] Defines the minimum sum of weights of all observations required in a child.
'max_depth': [4, 6, 8], #[default=6] The maximum depth of a tree,
'gamma':[0,0.1,0.001,0.2], #Gamma specifies the minimum loss reduction required to make a split.
'subsample': [0.7,1], #Denotes the fraction of observations to be randomly sampled for each tree; must be in (0, 1].
'colsample_bytree': [0.7,1], #Denotes the fraction of columns to be randomly sampled for each tree; must be in (0, 1].
'objective':['reg:squarederror'], #This defines the loss function to be minimized ('reg:linear' is the deprecated name).
'n_estimators': [100,300,500]}
xgb_grid = GridSearchCV(xgb1,
parameters,
cv = 2,
n_jobs = -1,
verbose=True)
xgb_grid.fit(X_train, y_train)
print(xgb_grid.best_score_) # 0.9397345161940295
print(xgb_grid.best_params_)'''
'''xgb_tune = xgb_grid.best_estimator_ # the tuned model; xgb_grid.estimator is only the untuned base model
xgb_tune.fit(X_train,y_train) # 0.9117591385438816
xgb_tune.score(X_test,y_test)'''
'''cvs = cross_val_score(xgb_tune, X_train,y_train, cv = 10)
cvs, cvs.mean() # 0.9645582338461773)'''
#[i/10.0 for i in range(1,6)]
#xgb_grid.estimator
xgb_tune2 = XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                         colsample_bynode=0.6, colsample_bytree=1, gamma=0,
                         importance_type='gain', learning_rate=0.25, max_delta_step=0,
                         max_depth=4, min_child_weight=1, n_estimators=400,
                         n_jobs=1, objective='reg:squarederror', random_state=0, # 'reg:squarederror' is the current name of 'reg:linear'
                         reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
                         subsample=1, verbosity=1)
xgb_tune2.fit(X_train,y_train) # 0.9412851220926807
xgb_tune2.score(X_test,y_test)
'''parameters = {'learning_rate': [0.1,0.03, 0.05, 0.07], #so called `eta` value, # [default=0.3] Analogous to learning rate in GBM
'min_child_weight': [1,3,5], #[default=1] Defines the minimum sum of weights of all observations required in a child.
'max_depth': [4, 6, 8], #[default=6] The maximum depth of a tree,
'gamma':[0,0.1,0.001,0.2], #Gamma specifies the minimum loss reduction required to make a split.
'subsample': [0.7,1], #Denotes the fraction of observations to be randomly sampled for each tree; must be in (0, 1].
'colsample_bytree': [0.7,1], #Denotes the fraction of columns to be randomly sampled for each tree; must be in (0, 1].
'objective':['reg:squarederror'], #This defines the loss function to be minimized ('reg:linear' is the deprecated name).
'n_estimators': [100,300,500]}'''
xgb_tune2 = XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                         colsample_bynode=0.9, colsample_bytree=1, gamma=0,
                         importance_type='gain', learning_rate=0.05, max_delta_step=0,
                         max_depth=4, min_child_weight=5, n_estimators=100,
                         n_jobs=1, objective='reg:squarederror', random_state=0, # 'reg:squarederror' is the current name of 'reg:linear'
                         reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
                         subsample=1, verbosity=1)
xgb_tune2.fit(X_train,y_train) # 0.9412851220926807
xgb_tune2.score(X_test,y_test)
cvs = cross_val_score(xgb_tune2, X_train,y_train, cv = 5)
cvs, cvs.mean() # 0.9706000326331659
np.sqrt(mean_squared_error(y_test, xgb_tune2.predict(X_test)))
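A grid search tries every combination of the listed parameter values, so the cost grows multiplicatively with each parameter added. The sketch below expands a small grid by hand to make that visible; the grid shown is a hypothetical, reduced one and is not the grid used in this notebook.

```python
from itertools import product

# hypothetical, reduced grid for illustration only
param_grid = {
    "learning_rate": [0.03, 0.05, 0.1],
    "max_depth": [4, 6],
    "n_estimators": [100, 300],
}

def expand_grid(grid):
    """Return one dict per parameter combination, like an exhaustive grid search would try."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]

candidates = expand_grid(param_grid)
cv_folds = 2
total_fits = len(candidates) * cv_folds  # one model fit per candidate per fold
```

Here 3 x 2 x 2 = 12 candidates with 2 folds already mean 24 model fits, which is why a grid with eight parameters can take a long time to run.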
"""## Test Model"""
list(X.columns)
# helper to get the predicted price of a house from its feature values
def predict_house_price(model,bath,balcony,total_sqft_int,bhk,price_per_sqft,area_type,availability,location):
    x =np.zeros(len(X.columns)) # create zero numpy array, len = 107 as input value for model
    # add each feature value at its column index
    x[0]=bath
    x[1]=balcony
    x[2]=total_sqft_int
    x[3]=bhk
    x[4]=price_per_sqft
    if availability=="Ready To Move": # compare the argument, not the literal string "availability"
        x[8]=1
    if 'area_type'+area_type in X.columns:
        area_type_index = np.where(X.columns=="area_type"+area_type)[0][0]
        x[area_type_index] =1
    if 'location_'+location in X.columns:
        loc_index = np.where(X.columns=="location_"+location)[0][0]
        x[loc_index] =1
    # feature scaling
    x = sc.transform([x])[0] # pass a 2d array to the scaler and take back the 1d scaled row
    return model.predict([x])[0] # return the value predicted by the trained model passed in
predict_house_price(model=xgb_tune2, bath=3,balcony=2,total_sqft_int=1672,bhk=3,price_per_sqft=8971.291866,area_type="Plot Area",availability="Ready To Move",location="Devarabeesana Halli")
##test sample
#area_type availability location bath balcony price total_sqft_int bhk price_per_sqft
#2 Super built-up Area Ready To Move Devarabeesana Halli 3.0 2.0 150.0 1750.0 3 8571.428571
predict_house_price(model=xgb_tune2, bath=3,balcony=2,total_sqft_int=1750,bhk=3,price_per_sqft=8571.428571,area_type="Super built-up Area",availability="Ready To Move",location="Devarabeesana Halli")
##test sample
#area_type availability location bath balcony price total_sqft_int bhk price_per_sqft
#1 Built-up Area Ready To Move Devarabeesana Halli 3.0 3.0 149.0 1750.0 3 8514.285714
predict_house_price(model=xgb_tune2,bath=3,balcony=3,total_sqft_int=1750,bhk=3,price_per_sqft=8514.285714,area_type="Built-up Area",availability="Ready To Move",location="Devarabeesana Halli")
"""# Save model & load model"""
import joblib
# save model
joblib.dump(xgb_tune2, 'bangalore_house_price_prediction_model.pkl')
joblib.dump(rfr, 'bangalore_house_price_prediction_rfr_model.pkl')
# load model
bangalore_house_price_prediction_model = joblib.load("bangalore_house_price_prediction_model.pkl")
# predict house price
predict_house_price(bangalore_house_price_prediction_model,bath=3,balcony=3,total_sqft_int=150,bhk=3,price_per_sqft=8514.285714,area_type="Built-up Area",availability="Ready To Move",location="Devarabeesana Halli")
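The prediction helper needs more than the model: it also depends on the column order and the fitted scaler. One option is to save these together in a single file so the web app does not have to rebuild them. The sketch below shows the round trip with the standard-library `pickle` module as a stand-in for `joblib`; the bundle keys, file name and numbers are made up for illustration.

```python
import os
import pickle
import tempfile

# placeholder objects standing in for the fitted estimator and scaler statistics
bundle = {
    "model": {"kind": "placeholder"},
    "columns": ["bath", "balcony", "total_sqft_int", "bhk", "price_per_sqft"],
    "scaler_mean": [2.5, 1.5, 1500.0, 2.7, 6000.0],  # made-up numbers
}

# write the bundle to a temporary file and read it back
path = os.path.join(tempfile.mkdtemp(), "house_price_bundle.pkl")
with open(path, "wb") as f:
    pickle.dump(bundle, f)
with open(path, "rb") as f:
    restored = pickle.load(f)
```

As with any pickle-based format, such a file should only be loaded from a trusted source, and ideally with the same library versions that were used to save it.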
ML Model Deployment Code
HTML File
<!-- Bangalore House Price Predictor -->
<!DOCTYPE html>
<html >
<head>
<meta charset="UTF-8">
<title>Bangalore House Price Predictor ML App</title>
<style>
body {
background-image: url('static/images/house3.jpg');
background-repeat: no-repeat;
background-attachment: fixed;
background-size: cover;
}
h1 {color: red;} /* CSS code for heading h1 */
p {color: yellow;} /* CSS code for paragraph p */
/* CSS code for button */
.button_css {
color: #494949 !important;
text-transform: uppercase;
text-decoration: none;
background: #ffffff;
padding: 20px;
border: 4px solid #494949 !important;
display: inline-block;
transition: all 0.4s ease 0s;
}
.button_css:hover {
color: #ffffff !important;
background: #f6b93b;
border-color: #f6b93b !important;
transition: all 0.4s ease 0s;
}
.footer {
position: fixed;
left: 0;
bottom: 0;
width: 100%;
background-color: #203864;
color: white;
text-align: center;
}
/* unvisited link */
.footer a:link { color: White; }
/* visited link */
.footer a:visited { color: green; }
</style>
</head>
<body>
<!-- Show Bangalore house banner -->
<div>
<img src="static/images/bangalore house banner.png" class="w3-border w3-padding" alt="Indian AI Production" style="width:100%">
</div>
<div class="login">
<!-- Form: get input to predict the house price -->
<center>
<form action="{{ url_for('predict')}}" method="post">
<h1>*Enter the Information of House to Predict the Price*</h1>
<input align="center" type="number" name="bathrooms" placeholder="Bathrooms" required="required" width="48" height="10" step=".01"/><br>
<input align="center" type="number" name="balcony" placeholder="Balcony" required="required" width="48" height="10" step=".01"/><br>
<input align="center" type="number" name="total_sqft_int" placeholder="Total Square Feet" required="required" width="48" height="10" step=".01"/><br>
<input align="center" type="number" name="bhk" placeholder="BHK" required="required" width="48" height="10" step=".01"/><br>
<input align="center" type="number" name="price_per_sqft" placeholder="Price Per Square Foot" required="required" width="48" height="10" step=".01"/><br>
<input type="text" name="area_type" placeholder="Area Type" required="required" /><br>
<input type="text" name="availability" placeholder="House Availability" required="required" /><br>
<input type="text" name="location" placeholder="House Location" required="required" />
<br>
<br>
<!-- Show button -->
<div class="button_cont" align="center">
<button type="submit" class="button_css btn btn-primary btn-block btn-large"><strong>Predict House Price</strong></button>
</div>
</form>
</center>
<!-- Show predicted output using ML model -->
<div>
<center>
<h2>{{ prediction_text }}</h2>
</center>
</div>
</div>
<div class="footer">
<p>Indian AI Production<br>
<a href="http://youtube.com/indianaiproduction">Channel |</a>
<a href="https://indianaiproduction.com/">| Website</a>
</p>
</div>
</body>
</html>
Model File
#Import Libraries
import numpy as np
import pandas as pd
import joblib
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
#load data
df = pd.read_csv("data/ohe_data_reduce_cat_class.csv")
# Split data
X= df.drop('price', axis=1)
y= df['price']
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=51)
# feature scaling
sc = StandardScaler()
sc.fit(X_train)
X_train = sc.transform(X_train)
X_test = sc.transform(X_test)
###### Load Model
model = joblib.load('bangalore_house_price_prediction_rfr_model.pkl')
# helper to get the predicted price of a house from its feature values
def predict_house_price(bath,balcony,total_sqft_int,bhk,price_per_sqft,area_type,availability,location):
    x =np.zeros(len(X.columns)) # create zero numpy array, len = 107 as input value for model
    # add each feature value at its column index
    x[0]=bath
    x[1]=balcony
    x[2]=total_sqft_int
    x[3]=bhk
    x[4]=price_per_sqft
    if availability=="Ready To Move": # compare the argument, not the literal string "availability"
        x[8]=1
    if 'area_type'+area_type in X.columns:
        area_type_index = np.where(X.columns=="area_type"+area_type)[0][0]
        x[area_type_index] =1
    if 'location_'+location in X.columns:
        loc_index = np.where(X.columns=="location_"+location)[0][0]
        x[loc_index] =1
    # feature scaling
    x = sc.transform([x])[0] # pass a 2d array to the scaler and take back the 1d scaled row
    return model.predict([x])[0] # return the value predicted by the model loaded above (the Random Forest pickle)
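The helper above fills the input vector partly by fixed position (`x[0]` to `x[4]`, `x[8]`), which silently breaks if the column order of the CSV changes. A possible alternative is to look every feature up by column name. The sketch below is stdlib-only; the function name, the argument layout and the shortened column list are assumptions for illustration, not the project's code.

```python
def build_feature_vector(columns, numeric, area_type, availability, location):
    """Return a list aligned with `columns`: numeric values plus one-hot flags."""
    index = {name: i for i, name in enumerate(columns)}
    x = [0.0] * len(columns)
    for name, value in numeric.items():
        x[index[name]] = float(value)
    # set a flag only if the matching one-hot column exists
    for flag in ("area_type" + area_type,
                 "availability_" + availability,
                 "location_" + location):
        if flag in index:
            x[index[flag]] = 1.0
    return x

# shortened, hypothetical column list in the naming style used by the encoded CSV
columns = ["bath", "balcony", "total_sqft_int", "bhk", "price_per_sqft",
           "area_typeBuilt-up Area", "availability_Ready To Move", "location_Hebbal"]
numeric = {"bath": 3, "balcony": 2, "total_sqft_int": 1750, "bhk": 3, "price_per_sqft": 8571.43}
x = build_feature_vector(columns, numeric, "Built-up Area", "Ready To Move", "Unknown Place")
```

An unknown location simply leaves all location flags at 0, which matches how rare locations were handled during encoding.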
App File
#Import Libraries
from flask import Flask, request, render_template
import model # load model.py
app = Flask(__name__)
# render html page
@app.route('/')
def home():
    return render_template('index.html')
# get user input, predict the output and return it to the user
@app.route('/predict',methods=['POST'])
def predict():
    # read each form field by name; numeric fields arrive as strings, so convert them to float
    bath = float(request.form['bathrooms'])
    balcony = float(request.form['balcony'])
    total_sqft_int = float(request.form['total_sqft_int'])
    bhk = float(request.form['bhk'])
    price_per_sqft = float(request.form['price_per_sqft'])
    area_type = request.form['area_type']
    availability = request.form['availability']
    location = request.form['location']
    # predict the price of house by calling model.py
    predicted_price = model.predict_house_price(bath,balcony,total_sqft_int,bhk,price_per_sqft,area_type,availability,location)
    # render the html page and show the output
    return render_template('index.html', prediction_text='Predicted Price of Bangalore House is {}'.format(predicted_price))
# if __name__ == "__main__":
# app.run(host="0.0.0.0", port="8080")
if __name__ == "__main__":
    app.run()
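Form values reach the Flask view as strings, so it can help to validate them before calling the model. The sketch below works on a plain dict so it can be tried without Flask; the helper name and error messages are hypothetical, while the field names are taken from the HTML form above.

```python
NUMERIC_FIELDS = ("bathrooms", "balcony", "total_sqft_int", "bhk", "price_per_sqft")
TEXT_FIELDS = ("area_type", "availability", "location")

def parse_form(form):
    """Return (values, errors) from a dict of raw form strings."""
    values, errors = {}, []
    for name in NUMERIC_FIELDS:
        raw = form.get(name, "").strip()
        try:
            values[name] = float(raw)
        except ValueError:
            errors.append(name + " must be a number")
    for name in TEXT_FIELDS:
        raw = form.get(name, "").strip()
        if raw:
            values[name] = raw
        else:
            errors.append(name + " is required")
    return values, errors

good = {"bathrooms": "3", "balcony": "2", "total_sqft_int": "1750", "bhk": "3",
        "price_per_sqft": "8571.43", "area_type": "Built-up Area",
        "availability": "Ready To Move", "location": "Hebbal"}
values, errors = parse_form(good)
```

In a view, a non-empty `errors` list could be shown back to the user through `prediction_text` instead of letting the request fail with a server error.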