In this ML course tutorial, we are going to learn the "Linear Regression" machine learning algorithm in detail. We cover simple linear regression and multiple linear regression, both supervised regression algorithms, through theoretical intuition and hands-on practice.
"""
# Business Problem - Predict the Price of Bangalore House
# Using Linear Regression - Supervised Machine Learning Algorithm
### Load Libraries
"""
import pandas as pd
"""### Load Data"""
path = r"https://drive.google.com/uc?export=download&id=1xxDtrZKfuWQfl-6KA9XEd_eatitNPnkB"
df = pd.read_csv(path)
df.head()
"""### Split Data"""
X = df.drop('price', axis=1)
y = df['price']
print('Shape of X = ', X.shape)
print('Shape of y = ', y.shape)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=51)
print('Shape of X_train = ', X_train.shape)
print('Shape of y_train = ', y_train.shape)
print('Shape of X_test = ', X_test.shape)
print('Shape of y_test = ', y_test.shape)
"""### Feature Scaling"""
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(X_train)
X_train = sc.transform(X_train)
X_test = sc.transform(X_test)
"""## Linear Regression - ML Model Training"""
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train, y_train)
lr.coef_
lr.intercept_
"""## Predict the value of Home and Test"""
X_test[0, :]
lr.predict([X_test[0, :]])
lr.predict(X_test)
y_test
lr.score(X_test, y_test)
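For a `LinearRegression` model, `score` returns the coefficient of determination (R²) on the data passed to it, so a value closer to 1 means the predictions explain more of the variance in `y_test`. The sketch below checks that against `r2_score` and also reports MAE and RMSE, which are in the same units as the target. It runs on synthetic stand-in data, not the Bangalore file, and the names `X_demo`, `y_demo`, and `model` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Bangalore dataset): 200 rows, 3 features.
rng = np.random.default_rng(51)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5]) + 10.0 + rng.normal(scale=0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=51
)

model = LinearRegression().fit(X_tr, y_tr)
y_pred = model.predict(X_te)

# score() and r2_score() compute the same quantity for a regressor.
r2 = r2_score(y_te, y_pred)
mae = mean_absolute_error(y_te, y_pred)           # average absolute error
rmse = np.sqrt(mean_squared_error(y_te, y_pred))  # penalises large errors more

print("R^2 via score():", model.score(X_te, y_te))
print("R^2 via r2_score:", r2)
print("MAE:", mae)
print("RMSE:", rmse)
```

The same three calls should work on the tutorial's `y_test` and `lr.predict(X_test)`; reporting MAE or RMSE next to R² makes the error easier to read in price units.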
I am using this dataset for Bangalore house price prediction, but it contains categorical values, and an ML algorithm requires the data in numeric format.
You have to preprocess the data, or you can use the path provided above.
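One common way to turn a text column into numbers is one-hot encoding with `pd.get_dummies`. This is a minimal sketch on a few hypothetical rows; the column names `area_type` and `total_sqft` and all values are assumptions for illustration, not taken from the shared file.

```python
import pandas as pd

# Hypothetical raw rows; column names and values are illustrative only.
raw = pd.DataFrame({
    "area_type": ["Super built-up", "Plot", "Built-up", "Plot"],
    "total_sqft": [1056.0, 2600.0, 1440.0, 1200.0],
    "price": [39.0, 120.0, 62.0, 51.0],
})

# One-hot encode the text column. drop_first=True removes one category
# ("Built-up" here, the first in sorted order) because it is implied
# when all the other indicator columns are 0.
encoded = pd.get_dummies(raw, columns=["area_type"], drop_first=True, dtype=int)

print(encoded)
print(encoded.dtypes)
```

After this step every column is numeric, so the frame can go through the same `drop('price', axis=1)` split used in the tutorial.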
Sir, please provide the dataset link.
Just copy and paste the provided path:
path = r"https://drive.google.com/uc?export=download&id=1xxDtrZKfuWQfl-6KA9XEd_eatitNPnkB"
Use it.
What if I have a JSON dataset?
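For a JSON file, `pd.read_json` can take the place of `pd.read_csv`, and the rest of the workflow stays the same. A minimal sketch, assuming the JSON is a flat list of records; the inline string and its field names are hypothetical stand-ins for a real file.

```python
from io import StringIO

import pandas as pd

# Hypothetical JSON content: a flat list of records.
json_text = """
[
  {"bath": 2, "total_sqft": 1056, "price": 39.5},
  {"bath": 3, "total_sqft": 2600, "price": 120.0}
]
"""

# read_json also accepts a file path or URL; StringIO keeps the sketch self-contained.
df_json = pd.read_json(StringIO(json_text))

# Same feature/target split as in the tutorial.
X_json = df_json.drop("price", axis=1)
y_json = df_json["price"]

print(df_json)
```

If the records are nested rather than flat, `pd.json_normalize` can flatten them into columns first.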
Sir, this machine learning code is very useful for us. Thank you so much.
Can you explain how you created the dummy variables for location? There are more than 1,000 locations in the dataset, but the processed file you shared contains barely 98. What about the size column, where only the number of bedrooms is given, such as 1 bedroom, 2 bedrooms, or 5 bedrooms? A few records have blank size, bath, and colony values; do we need to remove those records?
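The post does not show how the processed file was built, so the following is only one plausible approach, not the author's confirmed method. It drops rows with blanks in the needed columns, extracts the bedroom count from the size text, groups locations that appear fewer than a threshold number of times into a single `other` bucket, and then one-hot encodes what remains. The sample rows, the `MIN_COUNT` threshold, and the `bedrooms` column name are all hypothetical.

```python
import pandas as pd

# Hypothetical raw rows with the kinds of problems described above.
raw = pd.DataFrame({
    "location": ["Whitefield", "Whitefield", "Hebbal", "Whitefield", "Yelahanka", None],
    "size": ["2 BHK", "4 Bedroom", "3 BHK", None, "1 BHK", "2 BHK"],
    "bath": [2.0, 4.0, 3.0, 2.0, None, 2.0],
    "price": [39.0, 120.0, 95.0, 51.0, 25.0, 40.0],
})

# 1) Remove records with a blank in any column the model needs.
clean = raw.dropna(subset=["location", "size", "bath"]).copy()

# 2) Pull the leading number out of the size text ("4 Bedroom" -> 4).
clean["bedrooms"] = clean["size"].str.extract(r"(\d+)", expand=False).astype(int)

# 3) Replace rarely seen locations with "other" (threshold is an assumption).
MIN_COUNT = 2
counts = clean["location"].value_counts()
rare = counts[counts < MIN_COUNT].index
clean["location"] = clean["location"].where(~clean["location"].isin(rare), "other")

# 4) One-hot encode the reduced set of locations.
features = pd.get_dummies(
    clean.drop(columns=["size"]), columns=["location"], dtype=int
)

print(features)
```

Grouping rare categories this way is a common reason a column with over a thousand distinct values ends up with far fewer dummy columns. Whether to drop or fill blank records depends on how many there are; dropping is the simplest option when they are few.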