House Prices: Advanced Regression Techniques

Goal of the Project

Predict the price of a house from its features. If you are buying or selling a house but do not know what it is worth, a supervised machine learning regression model can estimate the price from the features of that house.

Import essential libraries

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Load Data Set

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

print("Shape of train: ", train.shape)
print("Shape of test: ", test.shape)

Shape of train:  (1460, 81)
Shape of test:  (1459, 80)

train.head(10)

test.head(10)

# Concatenate train and test
# sort=True keeps the alphabetical column order shown in the outputs below
# and silences the pandas FutureWarning about the changing default.
df = pd.concat((train, test), sort=True)
temp_df = df
print("Shape of df: ", df.shape)

Shape of df:  (2919, 81)

df.head(6)

df.tail(6)
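
Because the test set has no SalePrice column, the combined frame can be separated again later by checking where SalePrice is missing. The following is a minimal sketch on toy data, not part of the original notebook: the names `train_toy`, `test_toy`, `combined`, `train_part` and `test_part` and all values are made up for illustration, and `ignore_index=True` is an optional choice that the notebook itself does not use.

```python
import pandas as pd

# Toy stand-ins for train.csv / test.csv (hypothetical values, not the real data).
train_toy = pd.DataFrame({
    "Id": [1, 2, 3],
    "LotArea": [8450, 9600, 11250],
    "SalePrice": [208500.0, 181500.0, 223500.0],
})
test_toy = pd.DataFrame({"Id": [4, 5], "LotArea": [11622, 14267]})

# sort=False keeps the columns in order of appearance;
# ignore_index=True gives the combined frame unique row labels.
combined = pd.concat([train_toy, test_toy], sort=False, ignore_index=True)

# Rows that came from the test set have no SalePrice, so the null mask
# is enough to separate the two parts again.
is_train = combined["SalePrice"].notnull()
train_part = combined[is_train]
test_part = combined[~is_train].drop(columns="SalePrice")

print(train_part.shape, test_part.shape)
```

This only works if SalePrice is never missing in the training rows, which matches the `df.info()` output further down (1460 non-null SalePrice values for 1460 training rows).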

Exploratory Data Analysis (EDA)

# Show all columns when displaying a DataFrame
pd.set_option("display.max_columns", 2000)
pd.set_option("display.max_rows", 85)

df.head(6)

df.tail(6)

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2919 entries, 0 to 1458
Data columns (total 81 columns):
1stFlrSF         2919 non-null int64
2ndFlrSF         2919 non-null int64
3SsnPorch        2919 non-null int64
Alley            198 non-null object
BedroomAbvGr     2919 non-null int64
BldgType         2919 non-null object
BsmtCond         2837 non-null object
BsmtExposure     2837 non-null object
BsmtFinSF1       2918 non-null float64
BsmtFinSF2       2918 non-null float64
BsmtFinType1     2840 non-null object
BsmtFinType2     2839 non-null object
BsmtFullBath     2917 non-null float64
BsmtHalfBath     2917 non-null float64
BsmtQual         2838 non-null object
BsmtUnfSF        2918 non-null float64
CentralAir       2919 non-null object
Condition1       2919 non-null object
Condition2       2919 non-null object
Electrical       2918 non-null object
EnclosedPorch    2919 non-null int64
ExterCond        2919 non-null object
ExterQual        2919 non-null object
Exterior1st      2918 non-null object
Exterior2nd      2918 non-null object
Fence            571 non-null object
FireplaceQu      1499 non-null object
Fireplaces       2919 non-null int64
Foundation       2919 non-null object
FullBath         2919 non-null int64
Functional       2917 non-null object
GarageArea       2918 non-null float64
GarageCars       2918 non-null float64
GarageCond       2760 non-null object
GarageFinish     2760 non-null object
GarageQual       2760 non-null object
GarageType       2762 non-null object
GarageYrBlt      2760 non-null float64
GrLivArea        2919 non-null int64
HalfBath         2919 non-null int64
Heating          2919 non-null object
HeatingQC        2919 non-null object
HouseStyle       2919 non-null object
Id               2919 non-null int64
KitchenAbvGr     2919 non-null int64
KitchenQual      2918 non-null object
LandContour      2919 non-null object
LandSlope        2919 non-null object
LotArea          2919 non-null int64
LotConfig        2919 non-null object
LotFrontage      2433 non-null float64
LotShape         2919 non-null object
LowQualFinSF     2919 non-null int64
MSSubClass       2919 non-null int64
MSZoning         2915 non-null object
MasVnrArea       2896 non-null float64
MasVnrType       2895 non-null object
MiscFeature      105 non-null object
MiscVal          2919 non-null int64
MoSold           2919 non-null int64
Neighborhood     2919 non-null object
OpenPorchSF      2919 non-null int64
OverallCond      2919 non-null int64
OverallQual      2919 non-null int64
PavedDrive       2919 non-null object
PoolArea         2919 non-null int64
PoolQC           10 non-null object
RoofMatl         2919 non-null object
RoofStyle        2919 non-null object
SaleCondition    2919 non-null object
SalePrice        1460 non-null float64
SaleType         2918 non-null object
ScreenPorch      2919 non-null int64
Street           2919 non-null object
TotRmsAbvGrd     2919 non-null int64
TotalBsmtSF      2918 non-null float64
Utilities        2917 non-null object
WoodDeckSF       2919 non-null int64
YearBuilt        2919 non-null int64
YearRemodAdd     2919 non-null int64
YrSold           2919 non-null int64
dtypes: float64(12), int64(26), object(43)
memory usage: 1.8+ MB

df.describe()

df.select_dtypes(include=['int64', 'float64']).columns

Index(['1stFlrSF', '2ndFlrSF', '3SsnPorch', 'BedroomAbvGr', 'BsmtFinSF1',
       'BsmtFinSF2', 'BsmtFullBath', 'BsmtHalfBath', 'BsmtUnfSF',
       'EnclosedPorch', 'Fireplaces', 'FullBath', 'GarageArea', 'GarageCars',
       'GarageYrBlt', 'GrLivArea', 'HalfBath', 'Id', 'KitchenAbvGr', 'LotArea',
       'LotFrontage', 'LowQualFinSF', 'MSSubClass', 'MasVnrArea', 'MiscVal',
       'MoSold', 'OpenPorchSF', 'OverallCond', 'OverallQual', 'PoolArea',
       'SalePrice', 'ScreenPorch', 'TotRmsAbvGrd', 'TotalBsmtSF', 'WoodDeckSF',
       'YearBuilt', 'YearRemodAdd', 'YrSold'],
      dtype='object')

df.select_dtypes(include=['object']).columns

Index(['Alley', 'BldgType', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1',
       'BsmtFinType2', 'BsmtQual', 'CentralAir', 'Condition1', 'Condition2',
       'Electrical', 'ExterCond', 'ExterQual', 'Exterior1st', 'Exterior2nd',
       'Fence', 'FireplaceQu', 'Foundation', 'Functional', 'GarageCond',
       'GarageFinish', 'GarageQual', 'GarageType', 'Heating', 'HeatingQC',
       'HouseStyle', 'KitchenQual', 'LandContour', 'LandSlope', 'LotConfig',
       'LotShape', 'MSZoning', 'MasVnrType', 'MiscFeature', 'Neighborhood',
       'PavedDrive', 'PoolQC', 'RoofMatl', 'RoofStyle', 'SaleCondition',
       'SaleType', 'Street', 'Utilities'],
      dtype='object')
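
The two column lists above can also be captured as Python lists for later preprocessing steps. This is a small sketch on toy data; `toy`, `numeric_cols` and `categorical_cols` are illustrative names that do not appear in the notebook, and `include="number"` is used here as a shorthand that covers both integer and float columns.

```python
import pandas as pd

# Toy frame with hypothetical values; only the dtypes matter here.
toy = pd.DataFrame({
    "LotArea": [8450, 9600],
    "LotFrontage": [65.0, 80.0],
    "Street": ["Pave", "Pave"],
})

# "number" matches every numeric dtype (int64, float64, ...).
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Everything that is not numeric is treated as categorical.
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```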

# Set index as Id column
df = df.set_index("Id")

df.head(6)

# Show the null values using heatmap
plt.figure(figsize=(16,9))
sns.heatmap(df.isnull())


# Get the percentages of null value
null_percent = df.isnull().sum()/df.shape[0]*100
null_percent

1stFlrSF          0.000000
2ndFlrSF          0.000000
3SsnPorch         0.000000
Alley            93.216855
BedroomAbvGr      0.000000
BldgType          0.000000
BsmtCond          2.809181
BsmtExposure      2.809181
BsmtFinSF1        0.034258
BsmtFinSF2        0.034258
BsmtFinType1      2.706406
BsmtFinType2      2.740665
BsmtFullBath      0.068517
BsmtHalfBath      0.068517
BsmtQual          2.774923
BsmtUnfSF         0.034258
CentralAir        0.000000
Condition1        0.000000
Condition2        0.000000
Electrical        0.034258
EnclosedPorch     0.000000
ExterCond         0.000000
ExterQual         0.000000
Exterior1st       0.034258
Exterior2nd       0.034258
Fence            80.438506
FireplaceQu      48.646797
Fireplaces        0.000000
Foundation        0.000000
FullBath          0.000000
Functional        0.068517
GarageArea        0.034258
GarageCars        0.034258
GarageCond        5.447071
GarageFinish      5.447071
GarageQual        5.447071
GarageType        5.378554
GarageYrBlt       5.447071
GrLivArea         0.000000
HalfBath          0.000000
Heating           0.000000
HeatingQC         0.000000
HouseStyle        0.000000
KitchenAbvGr      0.000000
KitchenQual       0.034258
LandContour       0.000000
LandSlope         0.000000
LotArea           0.000000
LotConfig         0.000000
LotFrontage      16.649538
LotShape          0.000000
LowQualFinSF      0.000000
MSSubClass        0.000000
MSZoning          0.137033
MasVnrArea        0.787941
MasVnrType        0.822199
MiscFeature      96.402878
MiscVal           0.000000
MoSold            0.000000
Neighborhood      0.000000
OpenPorchSF       0.000000
OverallCond       0.000000
OverallQual       0.000000
PavedDrive        0.000000
PoolArea          0.000000
PoolQC           99.657417
RoofMatl          0.000000
RoofStyle         0.000000
SaleCondition     0.000000
SalePrice        49.982871
SaleType          0.034258
ScreenPorch       0.000000
Street            0.000000
TotRmsAbvGrd      0.000000
TotalBsmtSF       0.034258
Utilities         0.068517
WoodDeckSF        0.000000
YearBuilt         0.000000
YearRemodAdd      0.000000
YrSold            0.000000
dtype: float64
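
Most columns in the listing above have no missing values, so it can help to show only the affected columns, largest share first. A minimal sketch on toy data follows; the frame `toy` and the names `toy_null_percent` and `missing_only` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values; only the missing-value pattern matters.
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "Alley": [np.nan, np.nan, np.nan, "Pave"],
})

# isnull().mean() is the fraction of missing values per column,
# equivalent to isnull().sum() / number_of_rows.
toy_null_percent = toy.isnull().mean() * 100

# Keep only columns that actually have missing values, largest first.
missing_only = toy_null_percent[toy_null_percent > 0].sort_values(ascending=False)
print(missing_only)
```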

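# Columns with more than 20% missing values will be dropped.
# Note: SalePrice is about 50% null here (it is missing for every test row),
# so it is included in this list as well.
col_for_drop = null_percent[null_percent > 20].keys()

# Drop the selected columns
df = df.drop(columns=col_for_drop)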
df.shape

(2919, 74)
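
The drop above removes six columns (80 columns after setting the index, 74 afterwards), and SalePrice is one of them. If the target should stay in the combined frame, one option is to exclude it from the drop list. The sketch below shows that alternative on toy data; it is not what the notebook does, and `toy`, `threshold`, `target`, `to_drop` and `reduced` are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy combined frame (hypothetical values): SalePrice is missing for the "test" rows.
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
    "SalePrice": [208500.0, 181500.0, np.nan, np.nan],
})

threshold = 20        # percent, same cut-off as in the notebook
target = "SalePrice"  # column that must survive the drop

toy_null_percent = toy.isnull().mean() * 100

# Columns above the threshold, with the target explicitly excluded.
to_drop = [
    col
    for col in toy_null_percent[toy_null_percent > threshold].index
    if col != target
]

reduced = toy.drop(columns=to_drop)
print(to_drop)
print(reduced.columns.tolist())
```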

# find the unique value count
for i in df.columns:
    print(i + "\t" + str(len(df[i].unique())))

1stFlrSF	1083
2ndFlrSF	635
3SsnPorch	31
BedroomAbvGr	8
BldgType	5
BsmtCond	5
BsmtExposure	5
BsmtFinSF1	992
BsmtFinSF2	273
BsmtFinType1	7
BsmtFinType2	7
BsmtFullBath	5
BsmtHalfBath	4
BsmtQual	5
BsmtUnfSF	1136
CentralAir	2
Condition1	9
Condition2	8
Electrical	6
EnclosedPorch	183
ExterCond	5
ExterQual	4
Exterior1st	16
Exterior2nd	17
Fireplaces	5
Foundation	6
FullBath	5
Functional	8
GarageArea	604
GarageCars	7
GarageCond	6
GarageFinish	4
GarageQual	6
GarageType	7
GarageYrBlt	104
GrLivArea	1292
HalfBath	3
Heating	6
HeatingQC	5
HouseStyle	8
KitchenAbvGr	4
KitchenQual	5
LandContour	4
LandSlope	3
LotArea	1951
LotConfig	5
LotFrontage	129
LotShape	4
LowQualFinSF	36
MSSubClass	16
MSZoning	6
MasVnrArea	445
MasVnrType	5
MiscVal	38
MoSold	12
Neighborhood	25
OpenPorchSF	252
OverallCond	9
OverallQual	10
PavedDrive	3
PoolArea	14
RoofMatl	8
RoofStyle	6
SaleCondition	6
SaleType	10
ScreenPorch	121
Street	2
TotRmsAbvGrd	14
TotalBsmtSF	1059
Utilities	3
WoodDeckSF	379
YearBuilt	118
YearRemodAdd	61
YrSold	5
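
The same counts can be produced without a loop. Note that `len(df[i].unique())` counts NaN as a value, so `nunique` needs `dropna=False` to give matching numbers. A small sketch on toy data, with `toy` and `unique_counts` as illustrative names:

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values, including some missing entries.
toy = pd.DataFrame({
    "CentralAir": ["Y", "N", "Y", "Y"],
    "BsmtQual": ["Gd", "TA", np.nan, "Gd"],
    "GarageCars": [2.0, 3.0, np.nan, 2.0],
})

# dropna=False counts NaN as its own value, matching len(Series.unique()).
unique_counts = toy.nunique(dropna=False).sort_values()
print(unique_counts)
```

Sorting the result makes it easier to spot low-cardinality columns, which are usually the categorical ones.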

# find unique values of each column
for i in df.columns:
    print("Unique value of:>>> {} ({})\n{}\n".format(i, len(df[i].unique()), df[i].unique()))

Unique value of:>>> 1stFlrSF (1083)
[ 856 1262  920 ... 1778 1650 1960]

Unique value of:>>> 2ndFlrSF (635)
[ 854    0  866  756 1053  566  983  752 1142 1218  668 1320  631  716
  676  860 1519  530  808  977 1330  833  765  462  213  548  960  670
 1116  876  612 1031  881  790  755  592  939  520  639  656 1414  884
  729 1523  728  351  688  941 1032  848  836  475  739 1151  448  896
  524 1194  956 1070 1096  467  547  551  880  703  901  720  316 1518
  704 1178  754  601 1360  929  445  564  882  920  518  817 1257  741
  672 1306  504 1304 1100  730  689  591  888 1020  828  700  842 1286
  864  829 1092  709  844 1106  596  807  625  649  698  840  780  568
  795  648  975  702 1242 1818 1121  371  804  325  809 1200  871 1274
 1347 1332 1177 1080  695  167  915  576  605  862  495  403  838  517
 1427  784  711  468 1081  886  793  665  858  874  526  590  406 1157
  299  936  438 1098  766 1101 1028 1017 1254  378 1160  682  110  600
  678  834  384  512  930  868  224 1103  560  811  878  574  910  620
  687  546  902 1000  846 1067  914  660 1538 1015 1237  611  707  527
 1288  832  806 1182 1040  439  717  511 1129 1370  636  533  745  584
  812  684  595  988  800  677  573 1066  778  661 1440  872  788  843
  713  567  651  762  482  738  586  679  644  900  887 1872 1281  472
 1312  319  978 1093  473  664 1540 1276  441  348 1060  714  744 1203
  783 1097  734  767 1589  742  686 1128 1111 1174  787 1072 1088 1063
  545  966  623  432  581  540  769 1051  761  779  514  455 1426  785
  521  252  813 1120 1037 1169 1001 1215  928 1140 1243  571 1196 1038
  561  979  701  332  368  883 1336 1141  634  912  798  985  826  831
  750  456  602  855  336  408  980  998 1168 1208  797  850  898 1054
  895  954  772 1230  727  454  370  628  304  582 1122 1134  885  640
  580 1112  653  220  240 1362  534  539  650  918  933  712 1796  971
 1175  743  523 1216 2065  272  685  776  630  984  875  913  464 1039
 1259  940  892  725  924  764  925 1479  192  589  992  903  430  748
  587  994  950 1323  732 1357  557 1296  390 1185  873 1611  457  796
  908  550  989  932  358 1392  349  691 1349  768  208  622  857  556
 1044  708  626  904  510 1104  830  981  870  694 1152  563  823  604
  715  532  537  505  424  606  185  498  492  608 1074  662  499  180
  942  558  614  328 1788 1075  380  615  645  663 1275  816  839 1325
 1012 1295  683 1126 1089 1221  967  841 1209  897  786 1629  782 1369
  972 1315  726  322  760  629  496  690  646  917  624  320  588  425
  747 1114 1619  718  815  926  444  436 1240  516 1420 1158 1162 1139
 1285 1061 1250  919  861  794  825  893 1319  959  792 1345  453  412
  182  501  375  680  658  552  396  308  973  363  594  554  428  536
  486 1721 1099  735  899 1198  343  673  442  890  943  330  420  770
 1342 1377  845 1402 1036  570 1238  923  757 1048 1131 1407 1171 1277
  995  528  863 1232  976 1008 1309  228  500  544 1778  616  494  642
  659  671  144  525  423 1164  356  245 1042  477 1005 1087  638  400
  376  916  927  869  753  450 1133  674  125  531  585  775  851  957
 1340  955  990 1384 1862 1371 1405 1358  465  466 1335  814  488 1321
 1029 1368 1567 1189 1234 1248  821 1007  476  502  867  297  810  434
  583  341 1836  541 1246 1124 1045  827 1150  312  218  493  736  818
  610  549  697  360 1004]

Unique value of:>>> 3SsnPorch (31)
[  0 320 407 130 180 168 140 508 238 245 196 144 182 162  23 216  96 153
 290 304 224 255 225 360 150 174 120 219 176  86 323]

Unique value of:>>> BedroomAbvGr (8)
[3 4 1 2 0 5 6 8]

Unique value of:>>> BldgType (5)
['1Fam' '2fmCon' 'Duplex' 'TwnhsE' 'Twnhs']

Unique value of:>>> BsmtCond (5)
['TA' 'Gd' nan 'Fa' 'Po']

Unique value of:>>> BsmtExposure (5)
['No' 'Gd' 'Mn' 'Av' nan]

Unique value of:>>> BsmtFinSF1 (992)
[7.060e+02 9.780e+02 4.860e+02 2.160e+02 6.550e+02 7.320e+02 1.369e+03
 8.590e+02 0.000e+00 8.510e+02 9.060e+02 9.980e+02 7.370e+02 7.330e+02
 5.780e+02 6.460e+02 5.040e+02 8.400e+02 1.880e+02 2.340e+02 1.218e+03
 1.277e+03 1.018e+03 1.153e+03 1.213e+03 7.310e+02 6.430e+02 9.670e+02
 7.470e+02 2.800e+02 1.790e+02 4.560e+02 1.351e+03 2.400e+01 7.630e+02
 1.820e+02 1.040e+02 1.810e+03 3.840e+02 4.900e+02 6.490e+02 6.320e+02
 9.410e+02 7.390e+02 9.120e+02 1.013e+03 6.030e+02 1.880e+03 5.650e+02
 3.200e+02 4.620e+02 2.280e+02 3.360e+02 4.480e+02 1.201e+03 3.300e+01
 5.880e+02 6.000e+02 7.130e+02 1.046e+03 6.480e+02 3.100e+02 1.162e+03
 5.200e+02 1.080e+02 5.690e+02 1.200e+03 2.240e+02 7.050e+02 4.440e+02
 2.500e+02 9.840e+02 3.500e+01 7.740e+02 4.190e+02 1.700e+02 1.470e+03
 9.380e+02 5.700e+02 3.000e+02 1.200e+02 1.160e+02 5.120e+02 5.670e+02
 4.450e+02 6.950e+02 4.050e+02 1.005e+03 6.680e+02 8.210e+02 4.320e+02
 1.300e+03 5.070e+02 6.790e+02 1.332e+03 2.090e+02 6.800e+02 7.160e+02
 1.400e+03 4.160e+02 4.290e+02 2.220e+02 5.700e+01 6.600e+02 1.016e+03
 3.700e+02 3.510e+02 3.790e+02 1.288e+03 3.600e+02 6.390e+02 4.950e+02
 2.880e+02 1.398e+03 4.770e+02 8.310e+02 1.904e+03 4.360e+02 3.520e+02
 6.110e+02 1.086e+03 2.970e+02 6.260e+02 5.600e+02 3.900e+02 5.660e+02
 1.126e+03 1.036e+03 1.088e+03 6.410e+02 6.170e+02 6.620e+02 3.120e+02
 1.065e+03 7.870e+02 4.680e+02 3.600e+01 8.220e+02 3.780e+02 9.460e+02
 3.410e+02 1.600e+01 5.500e+02 5.240e+02 5.600e+01 3.210e+02 8.420e+02
 6.890e+02 6.250e+02 3.580e+02 4.020e+02 9.400e+01 1.078e+03 3.290e+02
 9.290e+02 6.970e+02 1.573e+03 2.700e+02 9.220e+02 5.030e+02 1.334e+03
 3.610e+02 6.720e+02 5.060e+02 7.140e+02 4.030e+02 7.510e+02 2.260e+02
 6.200e+02 5.460e+02 3.920e+02 4.210e+02 9.050e+02 9.040e+02 4.300e+02
 6.140e+02 4.500e+02 2.100e+02 2.920e+02 7.950e+02 1.285e+03 8.190e+02
 4.200e+02 8.410e+02 2.810e+02 8.940e+02 1.464e+03 7.000e+02 2.620e+02
 1.274e+03 5.180e+02 1.236e+03 4.250e+02 6.920e+02 9.870e+02 9.700e+02
 2.800e+01 2.560e+02 1.619e+03 4.000e+01 8.460e+02 1.124e+03 7.200e+02
 8.280e+02 1.249e+03 8.100e+02 2.130e+02 5.850e+02 1.290e+02 4.980e+02
 1.270e+03 5.730e+02 1.410e+03 1.082e+03 2.360e+02 3.880e+02 3.340e+02
 8.740e+02 9.560e+02 7.730e+02 3.990e+02 1.620e+02 7.120e+02 6.090e+02
 3.710e+02 5.400e+02 7.200e+01 6.230e+02 4.280e+02 3.500e+02 2.980e+02
 1.445e+03 2.180e+02 9.850e+02 6.310e+02 1.280e+03 2.410e+02 6.900e+02
 2.660e+02 7.770e+02 8.120e+02 7.860e+02 1.116e+03 7.890e+02 1.056e+03
 5.000e+01 1.128e+03 7.750e+02 1.309e+03 1.246e+03 9.860e+02 6.160e+02
 1.518e+03 6.640e+02 3.870e+02 4.710e+02 3.850e+02 3.650e+02 1.767e+03
 1.330e+02 6.420e+02 2.470e+02 3.310e+02 7.420e+02 1.606e+03 9.160e+02
 1.850e+02 5.440e+02 5.530e+02 3.260e+02 7.780e+02 3.860e+02 4.260e+02
 3.680e+02 4.590e+02 1.350e+03 1.196e+03 6.300e+02 9.940e+02 1.680e+02
 1.261e+03 1.567e+03 2.990e+02 8.970e+02 6.070e+02 8.360e+02 5.150e+02
 3.740e+02 1.231e+03 1.110e+02 3.560e+02 4.000e+02 6.980e+02 1.247e+03
 2.570e+02 3.800e+02 2.700e+01 1.410e+02 9.910e+02 6.500e+02 5.210e+02
 1.436e+03 2.260e+03 7.190e+02 3.770e+02 1.330e+03 3.480e+02 1.219e+03
 7.830e+02 9.690e+02 6.730e+02 1.358e+03 1.260e+03 1.440e+02 5.840e+02
 5.540e+02 1.002e+03 6.190e+02 1.800e+02 5.590e+02 3.080e+02 8.660e+02
 8.950e+02 6.370e+02 6.040e+02 1.302e+03 1.071e+03 2.900e+02 7.280e+02
 2.000e+00 1.441e+03 9.430e+02 2.310e+02 4.140e+02 3.490e+02 4.420e+02
 3.280e+02 5.940e+02 8.160e+02 1.460e+03 1.324e+03 1.338e+03 6.850e+02
 1.422e+03 1.283e+03 8.100e+01 4.540e+02 9.030e+02 6.050e+02 9.900e+02
 2.060e+02 1.500e+02 4.570e+02 4.800e+01 8.710e+02 4.100e+01 6.740e+02
 6.240e+02 4.800e+02 1.154e+03 7.380e+02 4.930e+02 1.121e+03 2.820e+02
 5.000e+02 1.310e+02 1.696e+03 8.060e+02 1.361e+03 9.200e+02 1.721e+03
 1.870e+02 1.138e+03 9.880e+02 1.930e+02 5.510e+02 7.670e+02 1.186e+03
 8.920e+02 3.110e+02 8.270e+02 5.430e+02 1.003e+03 1.059e+03 2.390e+02
 9.450e+02 2.000e+01 1.455e+03 9.650e+02 9.800e+02 8.630e+02 5.330e+02
 1.084e+03 1.173e+03 5.230e+02 1.148e+03 1.910e+02 1.234e+03 3.750e+02
 8.080e+02 7.240e+02 1.520e+02 1.180e+03 2.520e+02 8.320e+02 5.750e+02
 9.190e+02 4.390e+02 3.810e+02 4.380e+02 5.490e+02 6.120e+02 1.163e+03
 4.370e+02 3.940e+02 1.416e+03 4.220e+02 7.620e+02 9.750e+02 1.097e+03
 2.510e+02 6.860e+02 6.560e+02 5.680e+02 5.390e+02 8.620e+02 1.970e+02
 5.160e+02 6.630e+02 6.080e+02 1.636e+03 7.840e+02 2.490e+02 1.040e+03
 4.830e+02 1.960e+02 5.720e+02 3.380e+02 3.300e+02 1.560e+02 1.390e+03
 5.130e+02 4.600e+02 6.590e+02 3.640e+02 5.640e+02 3.060e+02 5.050e+02
 9.320e+02 7.500e+02 6.400e+01 6.330e+02 1.170e+03 8.990e+02 9.020e+02
 1.238e+03 5.280e+02 1.024e+03 1.064e+03 2.850e+02 2.188e+03 4.650e+02
 3.220e+02 8.600e+02 5.990e+02 3.540e+02 6.300e+01 2.230e+02 3.010e+02
 4.430e+02 4.890e+02 2.840e+02 2.940e+02 8.140e+02 1.650e+02 5.520e+02
 8.330e+02 4.640e+02 9.360e+02 7.720e+02 1.440e+03 7.480e+02 9.820e+02
 3.980e+02 5.620e+02 4.840e+02 4.170e+02 6.990e+02 6.960e+02 8.960e+02
 5.560e+02 1.106e+03 6.510e+02 8.670e+02 8.540e+02 1.646e+03 1.074e+03
 5.360e+02 1.172e+03 9.150e+02 5.950e+02 1.237e+03 2.730e+02 6.840e+02
 3.240e+02 1.165e+03 1.380e+02 1.513e+03 3.170e+02 1.012e+03 1.022e+03
 5.090e+02 9.000e+02 1.085e+03 1.104e+03 2.400e+02 3.830e+02 6.440e+02
 3.970e+02 7.400e+02 8.370e+02 2.200e+02 5.860e+02 5.350e+02 4.100e+02
 7.500e+01 8.240e+02 5.920e+02 1.039e+03 5.100e+02 4.230e+02 6.610e+02
 2.480e+02 7.040e+02 4.120e+02 1.032e+03 2.190e+02 7.080e+02 4.150e+02
 1.004e+03 3.530e+02 7.020e+02 3.690e+02 6.220e+02 2.120e+02 6.450e+02
 8.520e+02 1.150e+03 1.258e+03 2.750e+02 1.760e+02 2.960e+02 5.380e+02
 1.157e+03 4.920e+02 1.198e+03 1.387e+03 5.220e+02 6.580e+02 1.216e+03
 1.480e+03 2.096e+03 1.159e+03 4.400e+02 1.456e+03 8.830e+02 5.470e+02
 7.880e+02 4.850e+02 3.400e+02 1.220e+03 4.270e+02 3.440e+02 7.560e+02
 1.540e+03 6.660e+02 8.030e+02 1.000e+03 8.850e+02 1.386e+03 3.190e+02
 5.340e+02 1.250e+02 1.314e+03 6.020e+02 1.920e+02 5.930e+02 8.040e+02
 1.053e+03 5.320e+02 1.158e+03 1.014e+03 1.940e+02 1.670e+02 7.760e+02
 5.644e+03 6.940e+02 1.572e+03 7.460e+02 1.406e+03 9.250e+02 4.820e+02
 1.890e+02 7.650e+02 8.000e+01 1.443e+03 2.590e+02 7.350e+02 7.340e+02
 1.447e+03 5.480e+02 3.150e+02 1.282e+03 4.080e+02 3.090e+02 2.030e+02
 8.650e+02 2.040e+02 7.900e+02 1.320e+03 7.690e+02 1.070e+03 2.640e+02
 7.590e+02 1.373e+03 9.760e+02 7.810e+02 2.500e+01 1.110e+03 4.040e+02
 5.800e+02 6.780e+02 9.580e+02 1.336e+03 1.079e+03 4.900e+01 8.300e+02
 9.230e+02 7.910e+02 2.630e+02 9.350e+02 1.051e+03 5.140e+02 1.100e+02
 1.414e+03 1.260e+02 1.129e+03 1.298e+03 3.760e+02 4.660e+02 2.440e+02
 1.137e+03 6.870e+02 1.010e+03 1.500e+03 6.700e+02 9.440e+02 1.188e+03
 8.560e+02 3.390e+02 4.810e+02 7.170e+02 5.790e+02 2.740e+02 7.800e+02
 2.830e+02 4.740e+02 4.520e+02 2.760e+02 9.600e+02 7.660e+02 1.026e+03
 7.300e+01 7.360e+02 1.319e+03 2.670e+02 1.092e+03 9.640e+02 9.540e+02
 1.346e+03 1.433e+03 8.700e+02 1.980e+02 1.682e+03 2.380e+02 3.430e+02
 7.600e+01 6.150e+02 7.800e+01 4.200e+01 4.690e+02 2.070e+02 4.580e+02
 4.760e+02 1.341e+03 8.440e+02 8.470e+02 8.500e+02 1.965e+03 7.410e+02
 3.630e+02 2.250e+02 1.333e+03 8.880e+02 6.360e+02 7.260e+02 2.540e+02
 4.350e+02 3.890e+02 2.790e+02 1.360e+03 1.232e+03 2.288e+03 1.531e+03
 1.230e+03 1.015e+03 1.037e+03 1.142e+03 1.262e+03 1.972e+03 8.810e+02
 8.760e+02 2.146e+03 1.557e+03 8.000e+02 6.520e+02 4.940e+02 6.830e+02
 9.130e+02 1.294e+03 2.158e+03 6.820e+02 1.430e+03 7.710e+02 5.400e+01
 5.200e+01 6.800e+01 8.640e+02 1.400e+02 1.733e+03 6.010e+02 9.620e+02
 1.252e+03 1.210e+02 9.550e+02 1.000e+02 1.312e+03 1.720e+02 1.550e+02
 9.310e+02 8.720e+02 7.450e+02 6.210e+02 4.330e+02 8.260e+02 1.340e+02
 1.690e+02 7.490e+02 1.152e+03 5.270e+02 3.420e+02 1.730e+02 7.000e+01
 1.094e+03 8.200e+02 1.021e+03 1.359e+03 7.550e+02 9.500e+02 6.060e+02
 1.259e+03 7.100e+02 1.111e+03 1.478e+03 3.320e+02 7.930e+02 2.460e+02
 1.540e+02 6.500e+01 1.476e+03 5.500e+01 1.758e+03 1.115e+03 1.640e+03
 1.140e+02 7.180e+02 4.960e+02 1.337e+03 1.034e+03 9.830e+02 1.206e+03
 8.900e+02 1.023e+03 1.190e+02 2.860e+02 1.728e+03 1.375e+03 1.420e+03
 2.257e+03 1.149e+03 1.075e+03 3.720e+02 1.204e+03 1.073e+03 1.087e+03
 1.660e+03 1.096e+03 7.290e+02 3.620e+02 5.370e+02 4.720e+02 5.300e+01
 7.640e+02 1.900e+02 1.027e+03 1.141e+03 6.810e+02 8.130e+02 1.280e+02
 1.044e+03 2.600e+02 5.830e+02 3.200e+01 5.310e+02 1.480e+02 7.440e+02
 9.600e+01 5.900e+02 2.000e+02 4.060e+02 1.750e+02 2.010e+02       nan
 7.580e+02 2.210e+02 6.340e+02 1.035e+03 7.790e+02 1.271e+03 3.550e+02
 2.085e+03 7.700e+02 7.220e+02 1.308e+03 6.880e+02 8.800e+01 1.194e+03
 1.538e+03 1.593e+03 1.033e+03 3.660e+02 1.474e+03 1.383e+03 8.930e+02
 1.029e+03 1.223e+03 1.011e+03 1.571e+03 3.180e+02 5.010e+02 7.850e+02
 6.380e+02 6.470e+02 8.380e+02 1.860e+02 9.260e+02 1.101e+03 1.047e+03
 7.970e+02 1.558e+03 1.328e+03 3.140e+02 9.300e+02 7.250e+02 1.151e+03
 1.304e+03 1.812e+03 1.684e+03 6.690e+02 1.178e+03 1.030e+03 8.480e+02
 9.180e+02 5.740e+02 1.181e+03 1.048e+03 3.350e+02 1.225e+03 7.270e+02
 9.680e+02 6.000e+01 9.370e+02 9.010e+02 1.732e+03 1.632e+03 9.730e+02
 9.100e+02 3.460e+02 7.920e+02 6.540e+02 1.300e+02 8.730e+02 9.080e+02
 4.410e+02 8.500e+01 2.420e+02 9.520e+02 1.098e+03 7.820e+02 1.220e+02
 3.160e+02 2.580e+02 5.870e+02 4.910e+02 4.530e+02 5.570e+02 1.080e+03
 4.970e+02 5.100e+01 5.020e+02 6.710e+02 1.412e+03 7.090e+02 1.320e+02
 4.010e+03 4.670e+02 7.700e+01 1.130e+02 5.770e+02 4.340e+02 1.001e+03
 1.392e+03 1.239e+03 9.240e+02 9.490e+02 2.150e+02 1.329e+03 1.112e+03
 7.960e+02 8.110e+02 1.090e+03 5.960e+02 1.127e+03 2.050e+02 1.191e+03
 9.510e+02 3.820e+02 3.730e+02 1.505e+03 1.290e+03 8.800e+02 1.038e+03
 1.182e+03 1.562e+03 1.836e+03 2.780e+02 1.810e+02 1.118e+03 7.600e+02
 7.990e+02 9.960e+02 9.390e+02 9.140e+02 2.710e+02 4.880e+02 7.010e+02
 4.550e+02 8.090e+02 9.530e+02 2.080e+02 1.430e+02 5.760e+02 3.470e+02
 7.940e+02 2.300e+02 2.610e+02 3.930e+02 1.576e+03 1.122e+03 8.530e+02
 4.750e+02 6.910e+02 4.240e+02 3.050e+02 5.260e+02 1.564e+03 9.090e+02
 1.136e+03 1.243e+03 1.490e+02 1.224e+03 3.370e+02]

Unique value of:>>> BsmtFinSF2 (273)
[   0.   32.  668.  486.   93.  491.  506.  712.  362.   41.  169.  869.
  150.  670.   28. 1080.  181.  768.  215.  374.  208.  441.  184.  279.
  306.  180.  580.  690.  692.  228.  125. 1063.  620.  175.  820. 1474.
  264.  479.  147.  232.  380.  544.  294.  258.  121.  391.  531.  344.
  539.  713.  210.  311. 1120.  165.  532.   96.  495.  174. 1127.  139.
  202.  645.  123.  551.  219.  606.  612.  480.  182.  132.  336.  468.
  287.   35.  499.  723.  119.   40.  117.  239.   80.  472.   64. 1057.
  127.  630.  128.  377.  764.  345. 1085.  435.  823.  500.  290.  324.
  634.  411.  841. 1061.  466.  396.  354.  149.  193.  273.  465.  400.
  682.  557.  230.  106.  791.  240.  547.  469.  177.  108.  600.  492.
  211.  168. 1031.  438.  375.  144.   81.  906.  608.  276.  661.   68.
  173.  972.  105.  420.  546.  334.  352.  872.  110.  627.  163. 1029.
   78.  859.  981.   42.   46.  162.  350.  263. 1073.   12.  159.  474.
  453.  684.  387.  688.  252.  590.  284.  622.  113. 1526.  360.  774.
  364.  596.  884.   92.  216.  136.  201.  512.  247.  483.  750.   60.
  102.   95.   63.  262.  393.  286.  450.   72.  243.  694.  875.  507.
  419.  250.  116.  624.   76.  270.  288.  186.  449.   48.  613.  852.
  555.  799.  811.  842.  382.  456.  308.   52.  196.  488.  319.   nan
  956.  120.  679.  604.  153.  619.    6.  351. 1037.  829.   38.  206.
  167.  543.  259.  404.  138.  955.  691.   66.  154.  442.  448.  227.
  398.  722.  761.  529.  522.  873.  891.  755.  321.  915.  417.  432.
  831.  278. 1020.  530.  904.  156. 1393. 1039.  497.  402.  748.  281.
  912.  373.  982.  826.  850. 1164. 1083.  337.  297.]

Unique value of:>>> BsmtFinType1 (7)
['GLQ' 'ALQ' 'Unf' 'Rec' 'BLQ' nan 'LwQ']

Unique value of:>>> BsmtFinType2 (7)
['Unf' 'BLQ' nan 'ALQ' 'Rec' 'LwQ' 'GLQ']

Unique value of:>>> BsmtFullBath (5)
[ 1.  0.  2.  3. nan]

Unique value of:>>> BsmtHalfBath (4)
[ 0.  1.  2. nan]

Unique value of:>>> BsmtQual (5)
['Gd' 'TA' 'Ex' nan 'Fa']

Unique value of:>>> BsmtUnfSF (1136)
[ 150.  284.  434. ...  129.   45. 1503.]

Unique value of:>>> CentralAir (2)
['Y' 'N']

Unique value of:>>> Condition1 (9)
['Norm' 'Feedr' 'PosN' 'Artery' 'RRAe' 'RRNn' 'RRAn' 'PosA' 'RRNe']

Unique value of:>>> Condition2 (8)
['Norm' 'Artery' 'RRNn' 'Feedr' 'PosN' 'PosA' 'RRAn' 'RRAe']

Unique value of:>>> Electrical (6)
['SBrkr' 'FuseF' 'FuseA' 'FuseP' 'Mix' nan]

Unique value of:>>> EnclosedPorch (183)
[   0  272  228  205  176   87  172  102   37  144   64  114  202  128
  156   44   77  192  140  180  183   39  184   40  552   30  126   96
   60  150  120  112  252   52  224  234  244  268  137   24  108  294
  177  218  242   91  160  130  169  105   34  248  236   32   80  115
  291  116  158  210   36  200   84  148  136  240   54  100  189  293
  164  216  239   67   90   56  129   98  143   70  386  154  185  134
  196  264  275  230  254   68  194  318   48   94  138  226  174   19
  170  220  214  280  190  330  208  145  259   81   42  123  162  286
  168   20  301  198  221  212   50   99  186  113  135  334  246   18
   41   35  364   45   86  265  222  209  260  203  432   25  238   51
  213  288  211   55   57   78   72  368  165   92   16   66  109  139
  219  101  117  204  122  231  121  207  249  290  175   26   88 1012
   43  584  133  324  161   75  167   28  104  296  256  225  429  132
   23]

Unique value of:>>> ExterCond (5)
['TA' 'Gd' 'Fa' 'Po' 'Ex']

Unique value of:>>> ExterQual (4)
['Gd' 'TA' 'Ex' 'Fa']

Unique value of:>>> Exterior1st (16)
['VinylSd' 'MetalSd' 'Wd Sdng' 'HdBoard' 'BrkFace' 'WdShing' 'CemntBd'
 'Plywood' 'AsbShng' 'Stucco' 'BrkComm' 'AsphShn' 'Stone' 'ImStucc'
 'CBlock' nan]

Unique value of:>>> Exterior2nd (17)
['VinylSd' 'MetalSd' 'Wd Shng' 'HdBoard' 'Plywood' 'Wd Sdng' 'CmentBd'
 'BrkFace' 'Stucco' 'AsbShng' 'Brk Cmn' 'ImStucc' 'AsphShn' 'Stone'
 'Other' 'CBlock' nan]

Unique value of:>>> Fireplaces (5)
[0 1 2 3 4]

Unique value of:>>> Foundation (6)
['PConc' 'CBlock' 'BrkTil' 'Wood' 'Slab' 'Stone']

Unique value of:>>> FullBath (5)
[2 1 3 0 4]

Unique value of:>>> Functional (8)
['Typ' 'Min1' 'Maj1' 'Min2' 'Mod' 'Maj2' 'Sev' nan]

Unique value of:>>> GarageArea (604)
[ 548.  460.  608.  642.  836.  480.  636.  484.  468.  205.  384.  736.
  352.  840.  576.  516.  294.  853.  280.  534.  572.  270.  890.  772.
  319.  240.  250.  271.  447.  556.  691.  672.  498.  246.    0.  440.
  308.  504.  300.  670.  826.  386.  388.  528.  894.  565.  641.  288.
  645.  852.  558.  220.  667.  360.  427.  490.  379.  297.  283.  509.
  405.  758.  461.  400.  462.  420.  432.  506.  684.  472.  366.  476.
  410.  740.  648.  273.  546.  325.  792.  450.  180.  430.  594.  390.
  540.  264.  530.  435.  453.  750.  487.  624.  471.  318.  766.  660.
  470.  720.  577.  380.  434.  866.  495.  564.  312.  625.  680.  678.
  726.  532.  216.  303.  789.  511.  616.  521.  451. 1166.  252.  497.
  682.  666.  786.  795.  856.  473.  398.  500.  349.  454.  644.  299.
  210.  431.  438.  675.  968.  721.  336.  810.  494.  457.  818.  463.
  604.  389.  538.  520.  309.  429.  673.  884.  868.  492.  413.  924.
 1053.  439.  671.  338.  573.  732.  505.  575.  626.  898.  529.  685.
  281.  539.  418.  588.  282.  375.  683.  843.  552.  870.  888.  746.
  708.  513. 1025.  656.  872.  292.  441.  189.  880.  676.  301.  474.
  706.  617.  445.  200.  592.  566.  514.  296.  244.  610.  834.  639.
  501.  846.  560.  596.  600.  373.  947.  350.  396.  864.  304.  784.
  696.  569.  628.  550.  493.  578.  198.  422.  228.  526.  525.  908.
  499.  508.  694.  874.  164.  402.  515.  286.  603.  900.  583.  889.
  858.  502.  392.  403.  527.  765.  367.  426.  615.  871.  570.  406.
  590.  612.  650. 1390.  275.  452.  842.  816.  621.  544.  486.  230.
  261.  531.  393.  774.  749.  364.  627.  260.  256.  478.  442.  562.
  512.  839.  330.  711. 1134.  416.  779.  702.  567.  832.  326.  551.
  606.  739.  408.  475.  704.  983.  768.  632.  541.  320.  800.  831.
  554.  878.  752.  614.  481.  496.  423.  841.  895.  412.  865.  630.
  605.  602.  618.  444.  397.  455.  409.  820. 1020.  598.  857.  595.
  433.  776. 1220.  458.  613.  456.  436.  812.  686.  611.  425.  343.
  479.  619.  902.  574.  523.  414.  738.  354.  483.  327.  756.  690.
  284.  833.  601.  533.  522.  788.  555.  689.  796.  808.  510.  255.
  424.  305.  368.  824.  328.  160.  437.  665.  290.  912.  905.  542.
  716.  586.  467.  582. 1248. 1043.  254.  712.  719.  862.  928.  782.
  466.  714. 1052.  225.  234.  324.  306.  830.  807.  358.  186.  693.
  482.  813.  995.  757. 1356.  459.  701.  322.  315.  668.  404.  543.
  954.  850.  477.  276.  518. 1014.  753. 1418.  213.  844.  860.  748.
  248.  287.  825.  647.  342.  770.  663.  377.  804.  936.  722.  208.
  662.  754.  622.  620.  370. 1069.  372.  923.  192.  730.  751.  958.
  962.  762.  713.  535.  517.  263.  780.  363.  365.  231.  591.  209.
 1017.  580.  399.  741.  253.  581.  345.  896.  932.  640.  927.  700.
  886.  949.  649.  394.  658.  815.  623.  972.  984.  692.  845.  559.
  465.  524.  561.  549.  907.  162.  357.  207. 1184.  316.  226.  340.
  266. 1138.  904. 1231.  195.  313.  215.  307.  295.  351.  885.  920.
  698.  557.  489. 1314.  787. 1150. 1003.  944.  428.  687.  938.  783.
  851.  545.  469.  464.  267. 1488.  401.  311.  828.  869.  355.  249.
 1348.  811.  725.  715.  814.  369.  599.  344.  356.  185.  892.  257.
  729. 1110.  724.  585.  488. 1040. 1174.  728.  916.  876.  631.  925.
  806.  933. 1092.  859.  744. 1105.  310.  293.  371. 1200.  184.  374.
  331.  224.  217.  323.  638.  332.  674.  747.  242.  597.  579. 1154.
   nan  100.  571. 1041.  963.  443.  773.  485. 1085.  899.  959.  803.
  760.  584.  449.  688.  568.  353.  791. 1008.  378.  258.  848.  317.
  646.  265.  609.  272.]

Unique value of:>>> GarageCars (7)
[ 2.  3.  1.  0.  4.  5. nan]

Unique value of:>>> GarageCond (6)
['TA' 'Fa' nan 'Gd' 'Po' 'Ex']

Unique value of:>>> GarageFinish (4)
['RFn' 'Unf' 'Fin' nan]

Unique value of:>>> GarageQual (6)
['TA' 'Fa' 'Gd' nan 'Ex' 'Po']

Unique value of:>>> GarageType (7)
['Attchd' 'Detchd' 'BuiltIn' 'CarPort' nan 'Basment' '2Types']

Unique value of:>>> GarageYrBlt (104)
[2003. 1976. 2001. 1998. 2000. 1993. 2004. 1973. 1931. 1939. 1965. 2005.
 1962. 2006. 1960. 1991. 1970. 1967. 1958. 1930. 2002. 1968. 2007. 2008.
 1957. 1920. 1966. 1959. 1995. 1954. 1953.   nan 1983. 1977. 1997. 1985.
 1963. 1981. 1964. 1999. 1935. 1990. 1945. 1987. 1989. 1915. 1956. 1948.
 1974. 2009. 1950. 1961. 1921. 1900. 1979. 1951. 1969. 1936. 1975. 1971.
 1923. 1984. 1926. 1955. 1986. 1988. 1916. 1932. 1972. 1918. 1980. 1924.
 1996. 1940. 1949. 1994. 1910. 1978. 1982. 1992. 1925. 1941. 2010. 1927.
 1947. 1937. 1942. 1938. 1952. 1928. 1922. 1934. 1906. 1914. 1946. 1908.
 1929. 1933. 1917. 1896. 1895. 2207. 1943. 1919.]

Unique value of:>>> GrLivArea (1292)
[1710 1262 1786 ... 2315  641 1778]

Unique value of:>>> HalfBath (3)
[1 0 2]

Unique value of:>>> Heating (6)
['GasA' 'GasW' 'Grav' 'Wall' 'OthW' 'Floor']

Unique value of:>>> HeatingQC (5)
['Ex' 'Gd' 'TA' 'Fa' 'Po']

Unique value of:>>> HouseStyle (8)
['2Story' '1Story' '1.5Fin' '1.5Unf' 'SFoyer' 'SLvl' '2.5Unf' '2.5Fin']

Unique value of:>>> KitchenAbvGr (4)
[1 2 3 0]

Unique value of:>>> KitchenQual (5)
['Gd' 'TA' 'Ex' 'Fa' nan]

Unique value of:>>> LandContour (4)
['Lvl' 'Bnk' 'Low' 'HLS']

Unique value of:>>> LandSlope (3)
['Gtl' 'Mod' 'Sev']

Unique value of:>>> LotArea (1951)
[ 8450  9600 11250 ...  1894 20000 10441]

Unique value of:>>> LotConfig (5)
['Inside' 'FR2' 'Corner' 'CulDSac' 'FR3']

Unique value of:>>> LotFrontage (129)
[ 65.  80.  68.  60.  84.  85.  75.  nan  51.  50.  70.  91.  72.  66.
 101.  57.  44. 110.  98.  47. 108. 112.  74. 115.  61.  48.  33.  52.
 100.  24.  89.  63.  76.  81.  95.  69.  21.  32.  78. 121. 122.  40.
 105.  73.  77.  64.  94.  34.  90.  55.  88.  82.  71. 120. 107.  92.
 134.  62.  86. 141.  97.  54.  41.  79. 174.  99.  67.  83.  43. 103.
  93.  30. 129. 140.  35.  37. 118.  87. 116. 150. 111.  49.  96.  59.
  36.  56. 102.  58.  38. 109. 130.  53. 137.  45. 106. 104.  42.  39.
 144. 114. 128. 149. 313. 168. 182. 138. 160. 152. 124. 153.  46.  26.
  25. 119.  31.  28. 117. 113. 125. 135. 136.  22. 123. 195. 155. 126.
 200. 131. 133.]

Unique value of:>>> LotShape (4)
['Reg' 'IR1' 'IR2' 'IR3']

Unique value of:>>> LowQualFinSF (36)
[   0  360  513  234  528  572  144  392  371  390  420  473  156  515
   80   53  232  481  120  514  397  479  205  384  362 1064  431  436
  259  312  108  697  512  114  140  450]

Unique value of:>>> MSSubClass (16)
[ 60  20  70  50 190  45  90 120  30  85  80 160  75 180  40 150]

Unique value of:>>> MSZoning (6)
['RL' 'RM' 'C (all)' 'FV' 'RH' nan]

Unique value of:>>> MasVnrArea (445)
[1.960e+02 0.000e+00 1.620e+02 3.500e+02 1.860e+02 2.400e+02 2.860e+02
 3.060e+02 2.120e+02 1.800e+02 3.800e+02 2.810e+02 6.400e+02 2.000e+02
 2.460e+02 1.320e+02 6.500e+02 1.010e+02 4.120e+02 2.720e+02 4.560e+02
 1.031e+03 1.780e+02 5.730e+02 3.440e+02 2.870e+02 1.670e+02 1.115e+03
 4.000e+01 1.040e+02 5.760e+02 4.430e+02 4.680e+02 6.600e+01 2.200e+01
 2.840e+02 7.600e+01 2.030e+02 6.800e+01 1.830e+02 4.800e+01 2.800e+01
 3.360e+02 6.000e+02 7.680e+02 4.800e+02 2.200e+02 1.840e+02 1.129e+03
 1.160e+02 1.350e+02 2.660e+02 8.500e+01 3.090e+02 1.360e+02 2.880e+02
 7.000e+01 3.200e+02 5.000e+01 1.200e+02 4.360e+02 2.520e+02 8.400e+01
 6.640e+02 2.260e+02 3.000e+02 6.530e+02 1.120e+02 4.910e+02 2.680e+02
 7.480e+02 9.800e+01 2.750e+02 1.380e+02 2.050e+02 2.620e+02 1.280e+02
 2.600e+02 1.530e+02 6.400e+01 3.120e+02 1.600e+01 9.220e+02 1.420e+02
 2.900e+02 1.270e+02 5.060e+02 2.970e+02       nan 6.040e+02 2.540e+02
 3.600e+01 1.020e+02 4.720e+02 4.810e+02 1.080e+02 3.020e+02 1.720e+02
 3.990e+02 2.700e+02 4.600e+01 2.100e+02 1.740e+02 3.480e+02 3.150e+02
 2.990e+02 3.400e+02 1.660e+02 7.200e+01 3.100e+01 3.400e+01 2.380e+02
 1.600e+03 3.650e+02 5.600e+01 1.500e+02 2.780e+02 2.560e+02 2.250e+02
 3.700e+02 3.880e+02 1.750e+02 2.960e+02 1.460e+02 1.130e+02 1.760e+02
 6.160e+02 3.000e+01 1.060e+02 8.700e+02 3.620e+02 5.300e+02 5.000e+02
 5.100e+02 2.470e+02 3.050e+02 2.550e+02 1.250e+02 1.000e+02 4.320e+02
 1.260e+02 4.730e+02 7.400e+01 1.450e+02 2.320e+02 3.760e+02 4.200e+01
 1.610e+02 1.100e+02 1.800e+01 2.240e+02 2.480e+02 8.000e+01 3.040e+02
 2.150e+02 7.720e+02 4.350e+02 3.780e+02 5.620e+02 1.680e+02 8.900e+01
 2.850e+02 3.600e+02 9.400e+01 3.330e+02 9.210e+02 7.620e+02 5.940e+02
 2.190e+02 1.880e+02 4.790e+02 5.840e+02 1.820e+02 2.500e+02 2.920e+02
 2.450e+02 2.070e+02 8.200e+01 9.700e+01 3.350e+02 2.080e+02 4.200e+02
 1.700e+02 4.590e+02 2.800e+02 9.900e+01 1.920e+02 2.040e+02 2.330e+02
 1.560e+02 4.520e+02 5.130e+02 2.610e+02 1.640e+02 2.590e+02 2.090e+02
 2.630e+02 2.160e+02 3.510e+02 6.600e+02 3.810e+02 5.400e+01 5.280e+02
 2.580e+02 4.640e+02 5.700e+01 1.470e+02 1.170e+03 2.930e+02 6.300e+02
 4.660e+02 1.090e+02 4.100e+01 1.600e+02 2.890e+02 6.510e+02 1.690e+02
 9.500e+01 4.420e+02 2.020e+02 3.380e+02 8.940e+02 3.280e+02 6.730e+02
 6.030e+02 1.000e+00 3.750e+02 9.000e+01 3.800e+01 1.570e+02 1.100e+01
 1.400e+02 1.300e+02 1.480e+02 8.600e+02 4.240e+02 1.047e+03 2.430e+02
 8.160e+02 3.870e+02 2.230e+02 1.580e+02 1.370e+02 1.150e+02 1.890e+02
 2.740e+02 1.170e+02 6.000e+01 1.220e+02 9.200e+01 4.150e+02 7.600e+02
 2.700e+01 7.500e+01 3.610e+02 1.050e+02 3.420e+02 2.980e+02 5.410e+02
 2.360e+02 1.440e+02 4.230e+02 4.400e+01 1.510e+02 9.750e+02 4.500e+02
 2.300e+02 5.710e+02 2.400e+01 5.300e+01 2.060e+02 1.400e+01 3.240e+02
 2.950e+02 3.960e+02 6.700e+01 1.540e+02 4.250e+02 4.500e+01 1.378e+03
 3.370e+02 1.490e+02 1.430e+02 5.100e+01 1.710e+02 2.340e+02 6.300e+01
 7.660e+02 3.200e+01 8.100e+01 1.630e+02 5.540e+02 2.180e+02 6.320e+02
 1.140e+02 5.670e+02 3.590e+02 4.510e+02 6.210e+02 7.880e+02 8.600e+01
 7.960e+02 3.910e+02 2.280e+02 8.800e+01 1.650e+02 4.280e+02 4.100e+02
 5.640e+02 3.680e+02 3.180e+02 5.790e+02 6.500e+01 7.050e+02 4.080e+02
 2.440e+02 1.230e+02 3.660e+02 7.310e+02 4.480e+02 2.940e+02 3.100e+02
 2.370e+02 4.260e+02 9.600e+01 4.380e+02 1.940e+02 1.190e+02 2.000e+01
 5.040e+02 4.920e+02 6.150e+02 1.095e+03 1.159e+03 2.650e+02 9.100e+01
 7.710e+02 4.700e+01 1.770e+02 3.710e+02 4.300e+02 4.400e+02 2.290e+02
 7.260e+02 4.180e+02 7.240e+02 3.830e+02 7.300e+02 4.700e+02 3.080e+02
 6.340e+02 3.720e+02 1.980e+02 1.210e+02 2.640e+02 1.410e+02 2.830e+02
 5.090e+02 2.170e+02 3.000e+00 6.570e+02 1.240e+02 4.440e+02 2.300e+01
 2.420e+02 3.640e+02 3.520e+02 4.060e+02 4.020e+02 4.220e+02 3.560e+02
 6.800e+02 1.110e+03 2.210e+02 7.140e+02 6.470e+02 1.290e+03 4.950e+02
 5.680e+02 1.790e+02 1.050e+03 1.870e+02 5.200e+01 2.760e+02 3.900e+01
 1.900e+02 2.510e+02 2.270e+02 1.340e+02 2.220e+02 5.800e+01 6.680e+02
 6.740e+02 1.970e+02 7.100e+02 9.450e+02 5.490e+02 2.530e+02 4.000e+02
 9.700e+02 5.020e+02 3.940e+02 2.350e+02 5.150e+02 5.260e+02 7.540e+02
 3.530e+02 5.250e+02 8.700e+01 2.910e+02 6.900e+01 2.790e+02 3.230e+02
 2.140e+02 5.190e+02 1.224e+03 6.520e+02 8.860e+02 9.020e+02 4.340e+02
 6.620e+02 7.340e+02 5.500e+02 5.140e+02 3.850e+02 5.180e+02 5.720e+02
 3.220e+02 8.770e+02 3.970e+02 7.380e+02 5.010e+02 1.180e+02 6.920e+02
 3.320e+02 5.220e+02 3.790e+02 5.320e+02 6.200e+01 1.990e+02 3.550e+02
 4.050e+02 3.270e+02 2.570e+02 3.820e+02]

Unique value of:>>> MasVnrType (5)
['BrkFace' 'None' 'Stone' 'BrkCmn' nan]

Unique value of:>>> MiscVal (38)
[    0   700   350   500   400   480   450 15500  1200   800  2000   600
  3500  1300    54   620   560  1400  8300  1150  2500 12500  1500   300
    80   490   650   900   750  6500  1000  4500  3000 17000  1512   455
   460   420]

Unique value of:>>> MoSold (12)
[ 2  5  9 12 10  8 11  4  1  7  3  6]

Unique value of:>>> Neighborhood (25)
['CollgCr' 'Veenker' 'Crawfor' 'NoRidge' 'Mitchel' 'Somerst' 'NWAmes'
 'OldTown' 'BrkSide' 'Sawyer' 'NridgHt' 'NAmes' 'SawyerW' 'IDOTRR'
 'MeadowV' 'Edwards' 'Timber' 'Gilbert' 'StoneBr' 'ClearCr' 'NPkVill'
 'Blmngtn' 'BrDale' 'SWISU' 'Blueste']

Unique value of:>>> OpenPorchSF (252)
[ 61   0  42  35  84  30  57 204   4  21  33 213 112 102 154 159 110  90
  56  32  50 258  54  65  38  47  64  52 138 104  82  43 146  75  72  70
  49  11  36 151  29  94 101 199  99 234 162  63  68  46  45 122 184 120
  20  24 130 205 108  80  66  48  25  96 111 106  40 114   8 136 132  62
 228  60 238 260  27  74  16 198  26  83  34  55  22  98 172 119 208 105
 140 168  28  39 148  12  51 150 117 250  10  81  44 144 175 195 128  76
  17  59 214 121  53 231 134 192 123  78 187  85 133 176 113 137 125 523
 100 285  88 406 155  73 182 502 274 158 142 243 235 312 124 267 265  87
 288  23 152 341 116 160 174 247 291  18 170 156 166 129 418 240  77 364
 188 207  67  69 131 191  41 118 252 189 282 135  95 224 169 319  58  93
 244 185 200  92 180 263 304 229 103 211 287 292 241 547  91  86 262 210
 141  15 126 236 278 197 273 190 183 165 226 178 177 254 215 222 193 201
 173 153 251 230 299 365 139 216  89 372 217 276 164 368 203 127 256 194
 324 171 570 484 742 444 266  97  37 246  31 382   6 115 253 245 107 225]

Unique value of:>>> OverallCond (9)
[5 8 6 7 4 2 3 9 1]

Unique value of:>>> OverallQual (10)
[ 7  6  8  5  9  4 10  3  1  2]

Unique value of:>>> PavedDrive (3)
['Y' 'N' 'P']

Unique value of:>>> PoolArea (14)
[  0 512 648 576 555 480 519 738 144 368 444 228 561 800]

Unique value of:>>> RoofMatl (8)
['CompShg' 'WdShngl' 'Metal' 'WdShake' 'Membran' 'Tar&Grv' 'Roll'
 'ClyTile']

Unique value of:>>> RoofStyle (6)
['Gable' 'Hip' 'Gambrel' 'Mansard' 'Flat' 'Shed']

Unique value of:>>> SaleCondition (6)
['Normal' 'Abnorml' 'Partial' 'AdjLand' 'Alloca' 'Family']

Unique value of:>>> SaleType (10)
['WD' 'New' 'COD' 'ConLD' 'ConLI' 'CWD' 'ConLw' 'Con' 'Oth' nan]

Unique value of:>>> ScreenPorch (121)
[  0 176 198 291 252  99 184 168 130 142 192 410 224 266 170 154 153 144
 128 259 160 271 234 374 185 182  90 396 140 276 180 161 145 200 122  95
 120  60 126 189 260 147 385 287 156 100 216 210 197 204 225 152 175 312
 222 265 322 190 233  63  53 143 273 288 263  80 163 116 480 178 440 155
 220 119 165  40 256 240 148 166 108 490 196 121  92 342 255 111 112 231
 110 117 195 115 141 208  94 164  64 576 227 221 171 135 174 217 201 109
 150  84 228 138  88 280 123 264 270 162 348 113 104]

Unique value of:>>> Street (2)
['Pave' 'Grvl']

Unique value of:>>> TotRmsAbvGrd (14)
[ 8  6  7  9  5 11  4 10 12  3  2 14 13 15]

Unique value of:>>> TotalBsmtSF (1059)
[ 856. 1262.  920. ...  498.  432. 1381.]

Unique value of:>>> Utilities (3)
['AllPub' 'NoSeWa' nan]

Unique value of:>>> WoodDeckSF (379)
[   0  298  192   40  255  235   90  147  140  160   48  240  171  100
  406  222  288   49  203  113  392  145  196  168  112  106  857  115
  120   12  576  301  144  300   74  127  232  158  352  182  180  166
  224   80  367   53  188  105   24   98  276  200  409  239  400  476
  178  574  237  210  441  116  280  104   87  132  238  149  355   60
  139  108  351  209  216  248  143  365  370   58  197  263  123  138
  333  250  292   95  262   81  289  124  172  110  208  468  256  302
  190  340  233  184  201  142  122  155  670  135  495  536  306   64
  364  353   66  159  146  296  125   44  215  264   88   89   96  414
  519  206  141  260  324  156  220   38  261  126   85  466  270   78
  169  320  268   72  349   42   35  326  382  161  179  103  253  148
  335  176  390  328  312  185  269  195   57  236  517  304  198  426
   28  316  322  307  257  219  416  344  380   68  114  327  165  187
  181   92  228  245  503  315  241  303  133  403   36   52  265  207
  150  290  486  278   70  418  234   26  342   97  272  121  243  511
  154  164  173  384  202   56  321   86  194  421  305  117  550  509
  153  394  371   63  252  136  186  170  474  214  199  728  436   55
  431  448  361  362  162  229  439  379  356   84  635  325   33  212
  314  242  294   30  128   45  177  227  218  309  404  500  668  402
  283  183  175  586  295   32  366  736  393  360  157  483  275   23
  277  657   51   54  221  226  496  336  450   71  331  375  174   22
  287  129  225  319   99  230  231  297  205  462  502  501  266  244
  189  131   73  329  279  467  119  308  152   16  411  358  385   20
   25  490   76  204  311  102   50  424  339  211  259  134  213  318
  428  282  167  407  130  460  286  193  455  284  285   14  521  646
  386  405  546  118  291  274 1424  690  330  246  444  354  247  870
  432    4  641   94  191   75  631  345  520   27   77  684  453  413
  530]

Unique value of:>>> YearBuilt (118)
[2003 1976 2001 1915 2000 1993 2004 1973 1931 1939 1965 2005 1962 2006
 1960 1929 1970 1967 1958 1930 2002 1968 2007 1951 1957 1927 1920 1966
 1959 1994 1954 1953 1955 1983 1975 1997 1934 1963 1981 1964 1999 1972
 1921 1945 1982 1998 1956 1948 1910 1995 1991 2009 1950 1961 1977 1985
 1979 1885 1919 1990 1969 1935 1988 1971 1952 1936 1923 1924 1984 1926
 1940 1941 1987 1986 2008 1908 1892 1916 1932 1918 1912 1947 1925 1900
 1980 1989 1992 1949 1880 1928 1978 1922 1996 2010 1946 1913 1937 1942
 1938 1974 1893 1914 1906 1890 1898 1904 1882 1875 1911 1917 1872 1905
 1907 1896 1902 1895 1879 1901]

Unique value of:>>> YearRemodAdd (61)
[2003 1976 2002 1970 2000 1995 2005 1973 1950 1965 2006 1962 2007 1960
 2001 1967 2004 2008 1997 1959 1990 1955 1983 1980 1966 1963 1987 1964
 1972 1996 1998 1989 1953 1956 1968 1981 1992 2009 1982 1961 1993 1999
 1985 1979 1977 1969 1958 1991 1971 1952 1975 2010 1984 1986 1994 1988
 1954 1957 1951 1978 1974]

Unique value of:>>> YrSold (5)
[2008 2007 2006 2009 2010]

# Describe the target 
train["SalePrice"].describe()

count      1460.000000
mean     180921.195890
std       79442.502883
min       34900.000000
25%      129975.000000
50%      163000.000000
75%      214000.000000
max      755000.000000
Name: SalePrice, dtype: float64

# Plot the distplot of target
plt.figure(figsize=(10,8))
bar = sns.distplot(train["SalePrice"])
bar.legend(["Skewness: {:.2f}".format(train['SalePrice'].skew())])

<matplotlib.legend.Legend at 0x1bb7d6acbe0>

# correlation heatmap
plt.figure(figsize=(25,25))
ax = sns.heatmap(train.corr(), cmap = "coolwarm", annot=True, linewidth=2)

# to fix the bug "first and last row cut in half of heatmap plot"
bottom, top = ax.get_ylim()
ax.set_ylim(bottom + 0.5, top - 0.5)

(38.0, 0.0)

# correlation heatmap of features highly correlated with SalePrice
hig_corr = train.corr()
hig_corr_features = hig_corr.index[abs(hig_corr["SalePrice"]) >= 0.5]
hig_corr_features

Index(['OverallQual', 'YearBuilt', 'YearRemodAdd', 'TotalBsmtSF', '1stFlrSF',
       'GrLivArea', 'FullBath', 'TotRmsAbvGrd', 'GarageCars', 'GarageArea',
       'SalePrice'],
      dtype='object')
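
The selection above keeps every column whose absolute correlation with the target is at least 0.5. A minimal sketch of the same idea on toy data follows; the column names `Area`, `Noise` and `Price` and all values are hypothetical and not taken from the Ames data.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): Price depends on Area, Noise is unrelated
rng = np.random.default_rng(0)
area = rng.uniform(500, 3000, size=200)
toy = pd.DataFrame({
    "Area": area,
    "Noise": rng.normal(size=200),
    "Price": 100 * area + rng.normal(scale=5000, size=200),
})

corr = toy.corr()
# Keep features whose absolute correlation with the target is at least 0.5
strong = corr.index[corr["Price"].abs() >= 0.5]
# Rank them and drop the target's trivial self-correlation
ranked = corr.loc[strong, "Price"].drop("Price").sort_values(ascending=False)
print(ranked)
```

Dropping the target from the ranked list avoids plotting `SalePrice` against itself later on.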

plt.figure(figsize=(10,8))
ax = sns.heatmap(train[hig_corr_features].corr(), cmap = "coolwarm", annot=True, linewidth=3)
# to fix the bug "first and last row cut in half of heatmap plot"
bottom, top = ax.get_ylim()
ax.set_ylim(bottom + 0.5, top - 0.5)

(11.0, 0.0)

# Plot regplot to get the nature of highly correlated data
plt.figure(figsize=(16,9))
for i in range(len(hig_corr_features)):
    if i <= 9:
        plt.subplot(3,4,i+1)
        plt.subplots_adjust(hspace = 0.5, wspace = 0.5)
        sns.regplot(data=train, x = hig_corr_features[i], y = 'SalePrice')

Handling Missing Values¶

missing_col = df.columns[df.isnull().any()]
missing_col

Index(['BsmtCond', 'BsmtExposure', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtFinType1',
       'BsmtFinType2', 'BsmtFullBath', 'BsmtHalfBath', 'BsmtQual', 'BsmtUnfSF',
       'Electrical', 'Exterior1st', 'Exterior2nd', 'Functional', 'GarageArea',
       'GarageCars', 'GarageCond', 'GarageFinish', 'GarageQual', 'GarageType',
       'GarageYrBlt', 'KitchenQual', 'LotFrontage', 'MSZoning', 'MasVnrArea',
       'MasVnrType', 'SaleType', 'TotalBsmtSF', 'Utilities'],
      dtype='object')

Handling missing values of Bsmt features¶

bsmt_col = ['BsmtCond', 'BsmtExposure', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtFinType1',
       'BsmtFinType2', 'BsmtFullBath', 'BsmtHalfBath', 'BsmtQual', 'BsmtUnfSF', 'TotalBsmtSF']
bsmt_feat = df[bsmt_col]
bsmt_feat

bsmt_feat.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2919 entries, 1 to 2919
Data columns (total 11 columns):
BsmtCond        2837 non-null object
BsmtExposure    2837 non-null object
BsmtFinSF1      2918 non-null float64
BsmtFinSF2      2918 non-null float64
BsmtFinType1    2840 non-null object
BsmtFinType2    2839 non-null object
BsmtFullBath    2917 non-null float64
BsmtHalfBath    2917 non-null float64
BsmtQual        2838 non-null object
BsmtUnfSF       2918 non-null float64
TotalBsmtSF     2918 non-null float64
dtypes: float64(6), object(5)
memory usage: 273.7+ KB

bsmt_feat.isnull().sum()

BsmtCond        82
BsmtExposure    82
BsmtFinSF1       1
BsmtFinSF2       1
BsmtFinType1    79
BsmtFinType2    80
BsmtFullBath     2
BsmtHalfBath     2
BsmtQual        81
BsmtUnfSF        1
TotalBsmtSF      1
dtype: int64

bsmt_feat = bsmt_feat[bsmt_feat.isnull().any(axis=1)]
bsmt_feat

bsmt_feat_all_nan = bsmt_feat[(bsmt_feat.isnull() | bsmt_feat.isin([0])).all(axis=1)]
bsmt_feat_all_nan

bsmt_feat_all_nan.shape

(79, 11)
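
The mask used above treats a row as "no basement" when every basement column is either missing or zero. A small sketch on a hypothetical three-row frame (the index labels and values are made up for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "BsmtQual": [np.nan, "Gd", np.nan],
        "BsmtFinSF1": [0.0, 0.0, 250.0],
        "TotalBsmtSF": [np.nan, 800.0, 250.0],
    },
    index=[10, 11, 12],
)

# True where a cell is missing or exactly zero
empty_cell = toy.isnull() | toy.isin([0])
# Rows in which every basement column is empty
no_basement = toy[empty_cell.all(axis=1)]
print(no_basement.index.tolist())
```

Row 12 has a missing quality label but a non-zero area, so it is not treated as "no basement" and needs a different fill strategy.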

qual = list(df.loc[:, df.dtypes == 'object'].columns.values)
qual

['BldgType',
 'BsmtCond',
 'BsmtExposure',
 'BsmtFinType1',
 'BsmtFinType2',
 'BsmtQual',
 'CentralAir',
 'Condition1',
 'Condition2',
 'Electrical',
 'ExterCond',
 'ExterQual',
 'Exterior1st',
 'Exterior2nd',
 'Foundation',
 'Functional',
 'GarageCond',
 'GarageFinish',
 'GarageQual',
 'GarageType',
 'Heating',
 'HeatingQC',
 'HouseStyle',
 'KitchenQual',
 'LandContour',
 'LandSlope',
 'LotConfig',
 'LotShape',
 'MSZoning',
 'MasVnrType',
 'Neighborhood',
 'PavedDrive',
 'RoofMatl',
 'RoofStyle',
 'SaleCondition',
 'SaleType',
 'Street',
 'Utilities']

# Fill the missing values in the rows that have no basement at all
for i in bsmt_col:
    if i in qual:
        bsmt_feat_all_nan[i] = bsmt_feat_all_nan[i].replace(np.nan, 'NA') # categorical column: replace NaN with 'NA'
    else:
        bsmt_feat_all_nan[i] = bsmt_feat_all_nan[i].replace(np.nan, 0) # numeric column: replace NaN with 0

bsmt_feat.update(bsmt_feat_all_nan) # update bsmt_feat with the filled rows
df.update(bsmt_feat_all_nan) # update df with the filled rows

"""
>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, 5, 6],
...                        'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df
   A  B
0  1  4
1  2  5
2  3  6
"""

C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  after removing the cwd from sys.path.
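
The SettingWithCopyWarning appears because values are assigned to a frame that was itself produced by slicing another frame. One common way to make the intent explicit is to take an independent copy with `.copy()` before assigning, and then write the result back with `update()`. A minimal sketch, assuming a hypothetical three-row frame:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"BsmtQual": [np.nan, "TA", np.nan], "TotalBsmtSF": [np.nan, 900.0, 0.0]},
    index=[1, 2, 3],
)

# .copy() makes the subset an independent frame, so assigning to it is unambiguous
subset = frame[frame.isnull().any(axis=1)].copy()
subset["BsmtQual"] = subset["BsmtQual"].fillna("NA")
subset["TotalBsmtSF"] = subset["TotalBsmtSF"].fillna(0)

# update() aligns on index and column labels and overwrites with non-NA values
frame.update(subset)
print(frame)
```

Because `update()` aligns on the index, this pattern relies on the index labels being unique.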

bsmt_feat = bsmt_feat[bsmt_feat.isin([np.nan]).any(axis=1)]
bsmt_feat

bsmt_feat.shape

(9, 11)

print(df['BsmtFinSF2'].max())
print(df['BsmtFinSF2'].min())

1526.0
0.0

pd.cut(range(0,1526),5) # split the BsmtFinSF2 range into 5 equal-width buckets

[(-1.525, 305.0], (-1.525, 305.0], (-1.525, 305.0], (-1.525, 305.0], (-1.525, 305.0], ..., (1220.0, 1525.0], (1220.0, 1525.0], (1220.0, 1525.0], (1220.0, 1525.0], (1220.0, 1525.0]]
Length: 1526
Categories (5, interval[float64]): [(-1.525, 305.0] < (305.0, 610.0] < (610.0, 915.0] < (915.0, 1220.0] < (1220.0, 1525.0]]

df_slice = df[(df['BsmtFinSF2'] >= 305) & (df['BsmtFinSF2'] <= 610)]
df_slice

bsmt_feat.at[333,'BsmtFinType2'] = df_slice['BsmtFinType2'].mode()[0] # replace the NaN in BsmtFinType2 with the mode of the bucket (305.0, 610.0]
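
The bucket-based fill above can be sketched end to end on toy data: bin the numeric column, find the bucket of the row with the missing label, and use the most frequent label among rows in that bucket. All values below are hypothetical, and the explicit `edges` array is an assumption made so that the bins do not depend on the toy data's own minimum and maximum.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "BsmtFinSF2": [100.0, 350.0, 400.0, 479.0, 590.0, 900.0],
        "BsmtFinType2": ["LwQ", "Rec", "Rec", np.nan, "BLQ", "ALQ"],
    }
)

# Five equal-width bins over an assumed range of 0..1526
edges = np.linspace(0, 1526, 6)
# labels=False returns the integer position of the bin for each value
toy["bucket"] = pd.cut(toy["BsmtFinSF2"], bins=edges, labels=False, include_lowest=True)

# Bucket of the row whose label is missing
target_bucket = toy.loc[toy["BsmtFinType2"].isnull(), "bucket"].iloc[0]
peers = toy[toy["bucket"] == target_bucket]
# mode() ignores NaN, so the missing row does not vote
fill_value = peers["BsmtFinType2"].mode()[0]
toy["BsmtFinType2"] = toy["BsmtFinType2"].fillna(fill_value)
print(fill_value)
```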

bsmt_feat

bsmt_feat['BsmtExposure'] = bsmt_feat['BsmtExposure'].replace(np.nan, df[df['BsmtQual'] =='Gd']['BsmtExposure'].mode()[0])

bsmt_feat['BsmtCond'] = bsmt_feat['BsmtCond'].replace(np.nan, df['BsmtCond'].mode()[0])
bsmt_feat['BsmtQual'] = bsmt_feat['BsmtQual'].replace(np.nan, df['BsmtQual'].mode()[0])

df.update(bsmt_feat)

bsmt_feat.isnull().sum()

BsmtCond        0
BsmtExposure    0
BsmtFinSF1      0
BsmtFinSF2      0
BsmtFinType1    0
BsmtFinType2    0
BsmtFullBath    0
BsmtHalfBath    0
BsmtQual        0
BsmtUnfSF       0
TotalBsmtSF     0
dtype: int64

Handling missing values of Garage features¶

df.columns[df.isnull().any()]

Index(['Electrical', 'Exterior1st', 'Exterior2nd', 'Functional', 'GarageArea',
       'GarageCars', 'GarageCond', 'GarageFinish', 'GarageQual', 'GarageType',
       'GarageYrBlt', 'KitchenQual', 'LotFrontage', 'MSZoning', 'MasVnrArea',
       'MasVnrType', 'SaleType', 'Utilities'],
      dtype='object')

garage_col = ['GarageArea', 'GarageCars', 'GarageCond', 'GarageFinish', 'GarageQual', 'GarageType', 'GarageYrBlt',]
garage_feat = df[garage_col]
garage_feat = garage_feat[garage_feat.isnull().any(axis=1)]
garage_feat

garage_feat.shape

(159, 7)

garage_feat_all_nan = garage_feat[(garage_feat.isnull() | garage_feat.isin([0])).all(1)]
garage_feat_all_nan.shape

(157, 7)

for i in garage_feat:
    if i in qual:
        garage_feat_all_nan[i] = garage_feat_all_nan[i].replace(np.nan, 'NA')
    else:
        garage_feat_all_nan[i] = garage_feat_all_nan[i].replace(np.nan, 0)
        
garage_feat.update(garage_feat_all_nan)
df.update(garage_feat_all_nan)

garage_feat = garage_feat[garage_feat.isnull().any(axis=1)]
garage_feat

for i in garage_col:
    garage_feat[i] = garage_feat[i].replace(np.nan, df[df['GarageType'] == 'Detchd'][i].mode()[0])

garage_feat.isnull().any()

GarageArea      False
GarageCars      False
GarageCond      False
GarageFinish    False
GarageQual      False
GarageType      False
GarageYrBlt     False
dtype: bool

df.update(garage_feat)

Handling missing values of the remaining features¶

df.columns[df.isnull().any()]

Index(['Electrical', 'Exterior1st', 'Exterior2nd', 'Functional', 'KitchenQual',
       'LotFrontage', 'MSZoning', 'MasVnrArea', 'MasVnrType', 'SaleType',
       'Utilities'],
      dtype='object')

# Fill the remaining categorical columns with their most frequent value
mode_fill_cols = ['Electrical', 'Exterior1st', 'Exterior2nd', 'Functional', 'KitchenQual',
                  'MSZoning', 'SaleType', 'Utilities', 'MasVnrType']
for col in mode_fill_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

df.columns[df.isnull().any()]

Index(['LotFrontage', 'MasVnrArea'], dtype='object')

df[df['MasVnrArea'].isnull()]['MasVnrType'].unique()

array(['None'], dtype=object)

# Houses with no masonry veneer get a veneer area of 0
df.loc[(df['MasVnrType'] == 'None') & df['MasVnrArea'].isnull(), 'MasVnrArea'] = 0

df.isnull().sum()/df.shape[0] * 100

1stFlrSF          0.000000
2ndFlrSF          0.000000
3SsnPorch         0.000000
BedroomAbvGr      0.000000
BldgType          0.000000
BsmtCond          0.000000
BsmtExposure      0.000000
BsmtFinSF1        0.000000
BsmtFinSF2        0.000000
BsmtFinType1      0.000000
BsmtFinType2      0.000000
BsmtFullBath      0.000000
BsmtHalfBath      0.000000
BsmtQual          0.000000
BsmtUnfSF         0.000000
CentralAir        0.000000
Condition1        0.000000
Condition2        0.000000
Electrical        0.000000
EnclosedPorch     0.000000
ExterCond         0.000000
ExterQual         0.000000
Exterior1st       0.000000
Exterior2nd       0.000000
Fireplaces        0.000000
Foundation        0.000000
FullBath          0.000000
Functional        0.000000
GarageArea        0.000000
GarageCars        0.000000
GarageCond        0.000000
GarageFinish      0.000000
GarageQual        0.000000
GarageType        0.000000
GarageYrBlt       0.000000
GrLivArea         0.000000
HalfBath          0.000000
Heating           0.000000
HeatingQC         0.000000
HouseStyle        0.000000
KitchenAbvGr      0.000000
KitchenQual       0.000000
LandContour       0.000000
LandSlope         0.000000
LotArea           0.000000
LotConfig         0.000000
LotFrontage      16.649538
LotShape          0.000000
LowQualFinSF      0.000000
MSSubClass        0.000000
MSZoning          0.000000
MasVnrArea        0.000000
MasVnrType        0.000000
MiscVal           0.000000
MoSold            0.000000
Neighborhood      0.000000
OpenPorchSF       0.000000
OverallCond       0.000000
OverallQual       0.000000
PavedDrive        0.000000
PoolArea          0.000000
RoofMatl          0.000000
RoofStyle         0.000000
SaleCondition     0.000000
SaleType          0.000000
ScreenPorch       0.000000
Street            0.000000
TotRmsAbvGrd      0.000000
TotalBsmtSF       0.000000
Utilities         0.000000
WoodDeckSF        0.000000
YearBuilt         0.000000
YearRemodAdd      0.000000
YrSold            0.000000
dtype: float64

Handling missing values of the LotFrontage feature¶

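# Fill missing LotFrontage with the mean LotFrontage of houses that share the same LotConfig
# (pd.np was removed from pandas; use the numpy import directly)
lotconfig = ['Corner', 'Inside', 'CulDSac', 'FR2', 'FR3']
for i in lotconfig:
    df['LotFrontage'] = np.where(df['LotFrontage'].isnull() & (df['LotConfig'] == i), df[df['LotConfig'] == i]['LotFrontage'].mean(), df['LotFrontage'])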
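
The per-group mean fill above can also be written in one pass with `groupby(...).transform('mean')`. This is a sketch on a hypothetical five-row frame, not a claim about how the notebook must be written; it gives the same values as the loop as long as every group has at least one known frontage.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "LotConfig": ["Inside", "Inside", "Inside", "Corner", "Corner"],
        "LotFrontage": [60.0, 80.0, np.nan, 100.0, np.nan],
    }
)

# Mean frontage of each LotConfig group, broadcast back to every row
group_mean = toy.groupby("LotConfig")["LotFrontage"].transform("mean")
# Only the missing entries are replaced; known values are left untouched
toy["LotFrontage"] = toy["LotFrontage"].fillna(group_mean)
print(toy)
```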

df.isnull().sum()

1stFlrSF         0
2ndFlrSF         0
3SsnPorch        0
BedroomAbvGr     0
BldgType         0
BsmtCond         0
BsmtExposure     0
BsmtFinSF1       0
BsmtFinSF2       0
BsmtFinType1     0
BsmtFinType2     0
BsmtFullBath     0
BsmtHalfBath     0
BsmtQual         0
BsmtUnfSF        0
CentralAir       0
Condition1       0
Condition2       0
Electrical       0
EnclosedPorch    0
ExterCond        0
ExterQual        0
Exterior1st      0
Exterior2nd      0
Fireplaces       0
Foundation       0
FullBath         0
Functional       0
GarageArea       0
GarageCars       0
GarageCond       0
GarageFinish     0
GarageQual       0
GarageType       0
GarageYrBlt      0
GrLivArea        0
HalfBath         0
Heating          0
HeatingQC        0
HouseStyle       0
KitchenAbvGr     0
KitchenQual      0
LandContour      0
LandSlope        0
LotArea          0
LotConfig        0
LotFrontage      0
LotShape         0
LowQualFinSF     0
MSSubClass       0
MSZoning         0
MasVnrArea       0
MasVnrType       0
MiscVal          0
MoSold           0
Neighborhood     0
OpenPorchSF      0
OverallCond      0
OverallQual      0
PavedDrive       0
PoolArea         0
RoofMatl         0
RoofStyle        0
SaleCondition    0
SaleType         0
ScreenPorch      0
Street           0
TotRmsAbvGrd     0
TotalBsmtSF      0
Utilities        0
WoodDeckSF       0
YearBuilt        0
YearRemodAdd     0
YrSold           0
dtype: int64

Feature Transformation¶

df.columns

Index(['1stFlrSF', '2ndFlrSF', '3SsnPorch', 'BedroomAbvGr', 'BldgType',
       'BsmtCond', 'BsmtExposure', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtFinType1',
       'BsmtFinType2', 'BsmtFullBath', 'BsmtHalfBath', 'BsmtQual', 'BsmtUnfSF',
       'CentralAir', 'Condition1', 'Condition2', 'Electrical', 'EnclosedPorch',
       'ExterCond', 'ExterQual', 'Exterior1st', 'Exterior2nd', 'Fireplaces',
       'Foundation', 'FullBath', 'Functional', 'GarageArea', 'GarageCars',
       'GarageCond', 'GarageFinish', 'GarageQual', 'GarageType', 'GarageYrBlt',
       'GrLivArea', 'HalfBath', 'Heating', 'HeatingQC', 'HouseStyle',
       'KitchenAbvGr', 'KitchenQual', 'LandContour', 'LandSlope', 'LotArea',
       'LotConfig', 'LotFrontage', 'LotShape', 'LowQualFinSF', 'MSSubClass',
       'MSZoning', 'MasVnrArea', 'MasVnrType', 'MiscVal', 'MoSold',
       'Neighborhood', 'OpenPorchSF', 'OverallCond', 'OverallQual',
       'PavedDrive', 'PoolArea', 'RoofMatl', 'RoofStyle', 'SaleCondition',
       'SaleType', 'ScreenPorch', 'Street', 'TotRmsAbvGrd', 'TotalBsmtSF',
       'Utilities', 'WoodDeckSF', 'YearBuilt', 'YearRemodAdd', 'YrSold'],
      dtype='object')

# Convert columns that are stored as numbers but are categorical in nature to str
feat_dtype_convert = ['MSSubClass', 'YearBuilt', 'YearRemodAdd', 'GarageYrBlt', 'YrSold']
for i in feat_dtype_convert:
    df[i] = df[i].astype(str)

df['MoSold'].unique() # MoSold = month in which the house was sold

array([ 2,  5,  9, 12, 10,  8, 11,  4,  1,  7,  3,  6], dtype=int64)

# convert month numbers to month abbreviations
import calendar
df['MoSold'] = df['MoSold'].apply(lambda x : calendar.month_abbr[x])

df['MoSold'].unique()

array(['Feb', 'May', 'Sep', 'Dec', 'Oct', 'Aug', 'Nov', 'Apr', 'Jan',
       'Jul', 'Mar', 'Jun'], dtype=object)
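
As a small illustration of the conversion above: `calendar.month_abbr` can be indexed by month number (1 to 12; index 0 is an empty string), and its entries follow the current locale. The sketch below builds an explicit lookup dictionary and applies it with `Series.map`; the name `month_map` and the sample values are hypothetical.

```python
import calendar
import pandas as pd

months = pd.Series([2, 5, 12])

# Explicit lookup table from month number to abbreviation
month_map = {i: calendar.month_abbr[i] for i in range(1, 13)}
abbr = months.map(month_map)
print(abbr.tolist())
```

With `map`, a month number outside 1 to 12 becomes NaN instead of raising an error, which makes bad values easy to spot with `isnull()`.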

quan = list(df.loc[:, df.dtypes != 'object'].columns.values)

quan

['1stFlrSF',
 '2ndFlrSF',
 '3SsnPorch',
 'BedroomAbvGr',
 'BsmtFinSF1',
 'BsmtFinSF2',
 'BsmtFullBath',
 'BsmtHalfBath',
 'BsmtUnfSF',
 'EnclosedPorch',
 'Fireplaces',
 'FullBath',
 'GarageArea',
 'GarageCars',
 'GrLivArea',
 'HalfBath',
 'KitchenAbvGr',
 'LotArea',
 'LotFrontage',
 'LowQualFinSF',
 'MasVnrArea',
 'MiscVal',
 'OpenPorchSF',
 'OverallCond',
 'OverallQual',
 'PoolArea',
 'ScreenPorch',
 'TotRmsAbvGrd',
 'TotalBsmtSF',
 'WoodDeckSF']

len(quan)

30

obj_feat = list(df.loc[:, df.dtypes == 'object'].columns.values)
obj_feat

['BldgType',
 'BsmtCond',
 'BsmtExposure',
 'BsmtFinType1',
 'BsmtFinType2',
 'BsmtQual',
 'CentralAir',
 'Condition1',
 'Condition2',
 'Electrical',
 'ExterCond',
 'ExterQual',
 'Exterior1st',
 'Exterior2nd',
 'Foundation',
 'Functional',
 'GarageCond',
 'GarageFinish',
 'GarageQual',
 'GarageType',
 'GarageYrBlt',
 'Heating',
 'HeatingQC',
 'HouseStyle',
 'KitchenQual',
 'LandContour',
 'LandSlope',
 'LotConfig',
 'LotShape',
 'MSSubClass',
 'MSZoning',
 'MasVnrType',
 'MoSold',
 'Neighborhood',
 'PavedDrive',
 'RoofMatl',
 'RoofStyle',
 'SaleCondition',
 'SaleType',
 'Street',
 'Utilities',
 'YearBuilt',
 'YearRemodAdd',
 'YrSold']

Convert categorical codes into ordered integers¶

from pandas.api.types import CategoricalDtype
df['BsmtCond'] = df['BsmtCond'].astype(CategoricalDtype(categories=['NA', 'Po', 'Fa', 'TA', 'Gd', 'Ex'], ordered = True)).cat.codes

df['BsmtCond'].unique()

array([3, 4, 0, 2, 1], dtype=int64)

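# Note: the level 'No' (no exposure) is not in this category list, so those rows are coded -1 in the output below
df['BsmtExposure'] = df['BsmtExposure'].astype(CategoricalDtype(categories=['NA', 'Mn', 'Av', 'Gd'], ordered = True)).cat.codes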

df['BsmtExposure'].unique()

array([-1,  3,  1,  2,  0], dtype=int64)

df['BsmtFinType1'] = df['BsmtFinType1'].astype(CategoricalDtype(categories=['NA', 'Unf', 'LwQ', 'Rec', 'BLQ','ALQ', 'GLQ'], ordered = True)).cat.codes
df['BsmtFinType2'] = df['BsmtFinType2'].astype(CategoricalDtype(categories=['NA', 'Unf', 'LwQ', 'Rec', 'BLQ','ALQ', 'GLQ'], ordered = True)).cat.codes
df['BsmtQual'] = df['BsmtQual'].astype(CategoricalDtype(categories=['NA', 'Po', 'Fa', 'TA', 'Gd', 'Ex'], ordered = True)).cat.codes
df['ExterQual'] = df['ExterQual'].astype(CategoricalDtype(categories=['Po', 'Fa', 'TA', 'Gd', 'Ex'], ordered = True)).cat.codes
df['ExterCond'] = df['ExterCond'].astype(CategoricalDtype(categories=['Po', 'Fa', 'TA', 'Gd', 'Ex'], ordered = True)).cat.codes
df['Functional'] = df['Functional'].astype(CategoricalDtype(categories=['Sal', 'Sev', 'Maj2', 'Maj1', 'Mod','Min2','Min1', 'Typ'], ordered = True)).cat.codes
df['GarageCond'] = df['GarageCond'].astype(CategoricalDtype(categories=['NA', 'Po', 'Fa', 'TA', 'Gd', 'Ex'], ordered = True)).cat.codes
df['GarageQual'] = df['GarageQual'].astype(CategoricalDtype(categories=['NA', 'Po', 'Fa', 'TA', 'Gd', 'Ex'], ordered = True)).cat.codes
df['GarageFinish'] = df['GarageFinish'].astype(CategoricalDtype(categories=['NA', 'Unf', 'RFn', 'Fin'], ordered = True)).cat.codes
df['HeatingQC'] = df['HeatingQC'].astype(CategoricalDtype(categories=['Po', 'Fa', 'TA', 'Gd', 'Ex'], ordered = True)).cat.codes
df['KitchenQual'] = df['KitchenQual'].astype(CategoricalDtype(categories=['Po', 'Fa', 'TA', 'Gd', 'Ex'], ordered = True)).cat.codes
df['PavedDrive'] = df['PavedDrive'].astype(CategoricalDtype(categories=['N', 'P', 'Y'], ordered = True)).cat.codes
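# Note: any value not in this list is coded -1 (the data description spells these levels 'NoSeWa' and 'NoSewr', not 'NASeWa' and 'NASeWr')
df['Utilities'] = df['Utilities'].astype(CategoricalDtype(categories=['ELO', 'NASeWa', 'NASeWr', 'AllPub'], ordered = True)).cat.codes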

df['Utilities'].unique()

array([ 3, -1], dtype=int64)
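
The `-1` values in the outputs above come from how `.cat.codes` works: any value that is missing, or not listed in `categories`, gets the code -1. A minimal sketch on a hypothetical toy column (the name `toy` and its values are illustrative, not taken from the data set):

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical toy column; 'No' is deliberately absent from the first category list.
toy = pd.Series(['NA', 'Mn', 'No', 'Gd', 'Av'])

partial_dtype = CategoricalDtype(categories=['NA', 'Mn', 'Av', 'Gd'], ordered=True)
codes = toy.astype(partial_dtype).cat.codes
print(codes.tolist())       # [0, 1, -1, 3, 2] : the unlisted 'No' becomes -1

# Listing every level that occurs keeps each value on the ordinal scale.
full_dtype = CategoricalDtype(categories=['NA', 'No', 'Mn', 'Av', 'Gd'], ordered=True)
full_codes = toy.astype(full_dtype).cat.codes
print(full_codes.tolist())  # [0, 2, 1, 4, 3]
```

Checking `(codes == -1).sum()` after each conversion is a cheap way to spot levels that were left out of a category list.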

Show skewness of features with distplot¶

skewed_features = ['1stFlrSF',
 '2ndFlrSF',
 '3SsnPorch',
 'BedroomAbvGr',
 'BsmtFinSF1',
 'BsmtFinSF2',
 'BsmtFullBath',
 'BsmtHalfBath',
 'BsmtUnfSF',
 'EnclosedPorch',
 'Fireplaces',
 'FullBath',
 'GarageArea',
 'GarageCars',
 'GrLivArea',
 'HalfBath',
 'KitchenAbvGr',
 'LotArea',
 'LotFrontage',
 'LowQualFinSF',
 'MasVnrArea',
 'MiscVal',
 'OpenPorchSF',
 'PoolArea',
 'ScreenPorch',
 'TotRmsAbvGrd',
 'TotalBsmtSF',
 'WoodDeckSF']

quan == skewed_features

False

plt.figure(figsize=(25,20))
plt.subplots_adjust(hspace = 0.5, wspace = 0.5)
# one panel per feature; the 28 features fill the 7x4 grid exactly
for i, feat in enumerate(skewed_features):
    plt.subplot(7,4,i+1)
    ax = sns.distplot(df[feat])
    ax.legend(["Skewness: {:.2f}".format(df[feat].skew())], fontsize = 'xx-large')

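# keep an independent copy; plain assignment would only alias df, so the log transform below would also change the backup
df_back = df.copy()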

# reduce the skewness of the data with log(1 + x)
for i in skewed_features:
    df[i] = np.log1p(df[i])

plt.figure(figsize=(25,20))
plt.subplots_adjust(hspace = 0.5, wspace = 0.5)
# same grid as before, now on the log-transformed features
for i, feat in enumerate(skewed_features):
    plt.subplot(7,4,i+1)
    ax = sns.distplot(df[feat])
    ax.legend(["Skewness: {:.2f}".format(df[feat].skew())], fontsize = 'xx-large')
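
The effect of the log transform on skewness can also be checked numerically. The sketch below uses a synthetic right-skewed series (a lognormal sample standing in for an area-like feature; the name `areas` and the distribution parameters are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed feature (hypothetical stand-in for an area column).
rng = np.random.default_rng(0)
areas = pd.Series(rng.lognormal(mean=9.0, sigma=0.8, size=2000))

skew_before = areas.skew()            # strongly positive for a long right tail
skew_after = np.log1p(areas).skew()   # close to 0 after log(1 + x)

print("Skewness before: {:.2f}".format(skew_before))
print("Skewness after:  {:.2f}".format(skew_after))
```

Features that are mostly zeros (for example pool or porch areas) may stay skewed after the transform, so it is worth comparing both numbers per column rather than assuming the transform always helps.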

SalePrice = np.log(train['SalePrice'] + 1)
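
Because the target is stored as log(SalePrice + 1), predictions made on this scale need the matching inverse to get back to prices. A small sketch with made-up prices (the values in `prices` are illustrative only):

```python
import numpy as np

prices = np.array([34900.0, 163000.0, 755000.0])  # hypothetical sale prices

log_prices = np.log1p(prices)      # same transform as np.log(prices + 1)
recovered = np.expm1(log_prices)   # exact inverse of log1p
off_by_one = np.exp(log_prices)    # plain exp returns prices + 1

print(recovered)
print(off_by_one - prices)         # approximately 1 for every entry
```

At house-price magnitudes the difference of 1 is negligible for the score, but `np.expm1` keeps the round trip exact.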

# get the object features to convert to numeric using dummy variables
obj_feat = list(df.loc[:,df.dtypes == 'object'].columns.values)
len(obj_feat)

29

# dummy variables (one level per feature is dropped to avoid redundant columns)
dummy_drop = []
clean_df = df
for i in obj_feat:
    dummy_drop += [i + '_' + str(df[i].unique()[-1])]

df = pd.get_dummies(df, columns = obj_feat)
df = df.drop(dummy_drop, axis = 1)

df.shape

(2919, 500)
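
The loop above drops one dummy column per feature by hand. pandas also offers `drop_first=True`, which does a similar job but always drops the first level in sorted order rather than the last one seen. A minimal sketch on a hypothetical two-column frame (`toy` is illustrative, not the notebook's `df`):

```python
import pandas as pd

# Hypothetical toy frame with one categorical and one numeric column.
toy = pd.DataFrame({'Street': ['Pave', 'Grvl', 'Pave'],
                    'LotArea': [8450, 9600, 11250]})

# 'Street' has two levels; dropping the first ('Grvl') leaves a single indicator.
encoded = pd.get_dummies(toy, columns=['Street'], drop_first=True)
print(encoded.columns.tolist())  # ['LotArea', 'Street_Pave']
```

Either approach leaves k - 1 indicator columns for a feature with k levels, which is what keeps the encoded columns from being perfectly collinear.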

#sns.pairplot(df)

# scaling dataset with robust scaler
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
scaler.fit(df)
df = scaler.transform(df)
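
With its default settings, `RobustScaler` centres each column on its median and divides by the interquartile range, so a few extreme values do not dominate the scale. A sketch on a hypothetical single column with one outlier (the array `x` is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical column with one large outlier.
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaled = RobustScaler().fit_transform(x)

# Manual equivalent of the default behaviour: (x - median) / (Q3 - Q1).
median = np.median(x)
q1, q3 = np.percentile(x, [25, 75])
manual = (x - median) / (q3 - q1)

print(scaled.ravel())  # [-1.  -0.5  0.   0.5 48.5]
```

Note that `transform` returns a NumPy array rather than a DataFrame, which is why the rows are later split by position.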

Machine Learning Model Building¶

train_len = len(train)

X_train = df[:train_len]
X_test = df[train_len:]
y_train = SalePrice

print(X_train.shape)
print(X_test.shape)
print(len(y_train))

(1460, 500)
(1459, 500)
1460

Cross Validation¶

from sklearn.model_selection import KFold, cross_val_score
from sklearn.metrics import make_scorer, r2_score

def test_model(model, X_train=X_train, y_train=y_train):
    cv = KFold(n_splits = 3, shuffle=True, random_state = 45)
    r2 = make_scorer(r2_score)
    r2_val_score = cross_val_score(model, X_train, y_train, cv=cv, scoring = r2)
    score = [r2_val_score.mean()]
    return score
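
`test_model` reports mean R² only. Since the competition is scored on the RMSE of log prices, it can be useful to look at RMSE on the same folds as well. The sketch below does this on synthetic data (`X_demo`, `y_demo` and the choice of `Ridge` are assumptions for illustration; they are not the notebook's data or chosen model):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression problem standing in for X_train / y_train.
X_demo, y_demo = make_regression(n_samples=120, n_features=5,
                                 noise=5.0, random_state=0)

cv = KFold(n_splits=3, shuffle=True, random_state=45)

# scikit-learn maximises scores, so MSE is returned negated.
neg_mse = cross_val_score(Ridge(), X_demo, y_demo, cv=cv,
                          scoring='neg_mean_squared_error')
rmse_per_fold = np.sqrt(-neg_mse)

r2_per_fold = cross_val_score(Ridge(), X_demo, y_demo, cv=cv, scoring='r2')

print("RMSE per fold:", rmse_per_fold)
print("R^2 per fold: ", r2_per_fold)
```

When the target is already log-transformed, as `y_train` is here, the RMSE computed this way is on the log scale and is therefore comparable in kind to the leaderboard metric.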

Linear Regression¶

import sklearn.linear_model as linear_model
LR = linear_model.LinearRegression()
test_model(LR)

[-4.499253758245961e+19]

# Cross validation (the default scorer for a regressor is R^2, not accuracy)
cross_validation = cross_val_score(estimator = LR, X = X_train, y = y_train, cv = 10)
print("Cross validation R^2 of LR model = ", cross_validation)
print("\nCross validation mean R^2 of LR model = ", cross_validation.mean())

Cross validation R^2 of LR model =  [-3.59049263e+18 -2.69794256e+16 -5.01430840e+20 -1.24195688e+20
 -1.56157918e+20 -3.80303041e+20 -6.92624737e+20 -1.81535501e+20
 -1.18431954e+19 -8.96500637e+19]

Cross validation mean R^2 of LR model =  -2.1413584565212335e+20

rdg = linear_model.Ridge()
test_model(rdg)

[0.8646898178967032]

lasso = linear_model.Lasso(alpha=1e-4)
test_model(lasso)

[0.8677128206058571]

Fitting Polynomial Regression to the dataset¶

# These cells appear to have been commented out in the notebook (no output was recorded).
# from sklearn.preprocessing import PolynomialFeatures
# import sklearn.linear_model as linear_model
# poly_reg = PolynomialFeatures(degree = 2)
# X_poly = poly_reg.fit_transform(X_train)
# lin_reg_2 = linear_model.LinearRegression()
# lin_reg_2.fit(X_poly, y_train)
# test_model(lin_reg_2, X_poly)

Support Vector Machine¶

from sklearn.svm import SVR
svr_reg = SVR(kernel='rbf')
test_model(svr_reg)

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\svm\base.py:193: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.
  "avoid this warning.", FutureWarning)

[0.8897490696206058]

Decision Tree Regressor¶

from sklearn.tree import DecisionTreeRegressor
dt_reg = DecisionTreeRegressor(random_state=21)
test_model(dt_reg)

[0.6977699373506714]

Random Forest Regressor¶

from sklearn.ensemble import RandomForestRegressor
rf_reg = RandomForestRegressor(n_estimators = 1000, random_state=51)
test_model(rf_reg)

[0.8562626036810235]

Bagging & boosting¶

from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
br_reg = BaggingRegressor(n_estimators=1000, random_state=51)
gbr_reg = GradientBoostingRegressor(n_estimators=1000, learning_rate=0.1, loss='ls', random_state=51)

test_model(br_reg)

[0.8566634227077645]

test_model(gbr_reg)

[0.8814693894754249]

XGBoost¶

import xgboost
#xgb_reg=xgboost.XGBRegressor()
xgb_reg = xgboost.XGBRegressor(booster='gbtree', random_state=51)
test_model(xgb_reg)

C:\ProgramData\Anaconda3\lib\site-packages\xgboost\core.py:587: FutureWarning: Series.base is deprecated and will be removed in a future version
  if getattr(data, 'base', None) is not None and \

[18:12:43] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.

[0.8841700820661896]

SVM Model Building¶

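svr_reg.fit(X_train,y_train)
# back-transform from the log scale; the target was log(SalePrice + 1), so np.exp returns SalePrice + 1 (np.expm1 is the exact inverse)
y_pred = np.exp(svr_reg.predict(X_test)).round(2)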

y_pred

array([116235.81, 159145.5 , 184603.73, ..., 175995.91, 115960.91,
       228554.78])

submit_test1 = pd.concat([test['Id'],pd.DataFrame(y_pred)], axis=1)
submit_test1.columns=['Id', 'SalePrice']

submit_test1

submit_test1.to_csv('sample_submission.csv', index=False)

SVM Model Building: Hyperparameter Tuning¶

Hyperparameter Tuning¶

# Earlier search over several kernels (left commented out):
# from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
# params = {'kernel': ['linear', 'rbf', 'sigmoid'],
#           'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
#           'C': [0.1, 1, 10, 100, 1000],
#           'epsilon': [1, 0.2, 0.1, 0.01, 0.001, 0.0001]}
# rand_search = RandomizedSearchCV(svr_reg, param_distributions=params, n_jobs=-1, cv=11)
# rand_search.fit(X_train, y_train)
# rand_search.best_params_

from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
params = {'kernel': ['rbf'],
         'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
         'C': [0.1, 1, 10, 100, 1000],
         'epsilon': [1, 0.2, 0.1, 0.01, 0.001, 0.0001]}
rand_search = RandomizedSearchCV(svr_reg, param_distributions=params, n_jobs=-1, cv=11)
rand_search.fit(X_train, y_train)
rand_search.best_score_

0.8931459336116102
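
The search above samples parameter combinations at random, so its result can differ from run to run unless `random_state` is fixed. The sketch below shows a reproducible search and how to read back the winning settings through `best_params_` and `best_estimator_`. It runs on synthetic data with a reduced grid (`X_demo`, `y_demo`, `param_grid` and `n_iter=5` are assumptions for illustration, not the notebook's settings):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVR

# Synthetic data standing in for X_train / y_train.
X_demo, y_demo = make_regression(n_samples=80, n_features=4,
                                 noise=1.0, random_state=0)

# Reduced, hypothetical grid so the sketch runs quickly.
param_grid = {'kernel': ['rbf'],
              'gamma': [0.1, 0.01, 0.001],
              'C': [1, 10, 100],
              'epsilon': [0.1, 0.01]}

search = RandomizedSearchCV(SVR(), param_distributions=param_grid,
                            n_iter=5, cv=3, random_state=51)
search.fit(X_demo, y_demo)

best = search.best_params_       # dict of the winning parameter values
tuned = search.best_estimator_   # SVR refitted on all data with those values
print(best)
```

Using `best_estimator_` directly avoids copying the winning values by hand into a new `SVR(...)` call.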

svr_reg= SVR(C=100, cache_size=200, coef0=0.0, degree=3, epsilon=0.01, gamma=0.0001,
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
test_model(svr_reg)

[0.8937335862549801]

svr_reg.fit(X_train,y_train)
y_pred = np.exp(svr_reg.predict(X_test)).round(2)

y_pred

array([113161.6 , 161976.13, 183930.61, ..., 175456.87, 118566.68,
       213315.75])

submit_test3 = pd.concat([test['Id'],pd.DataFrame(y_pred)], axis=1)
submit_test3.columns=['Id', 'SalePrice']

submit_test3.to_csv('sample_submission.csv', index=False)
submit_test3

Kaggle submission result for sample_submission.csv: score 0.12612.

XGBoost parameter tuning¶

# Randomized search used to pick the parameters below (left commented out):
# xgb2_reg = xgboost.XGBRegressor()
# params_xgb = {
#     'max_depth': range(2, 20, 2),
#     'n_estimators': range(99, 2001, 80),
#     'learning_rate': [0.2, 0.1, 0.01, 0.05],
#     'booster': ['gbtree'],
#     'min_child_weight': range(1, 8, 1)   # spelled 'mon_child_weight' in the original
# }
# rand_search_xgb = RandomizedSearchCV(estimator = xgb2_reg, param_distributions=params_xgb, n_iter=100,
#                                      n_jobs=-1, cv=11, verbose=11, random_state=51,
#                                      return_train_score=True, scoring='neg_mean_absolute_error')
# rand_search_xgb.fit(X_train, y_train)
# rand_search_xgb.best_score_
# rand_search_xgb.best_params_

xgb2_reg = xgboost.XGBRegressor(n_estimators=899,
                                min_child_weight=2,  # was misspelled 'mon_child_weight' when the scores below were recorded
                                max_depth=4,
                                learning_rate=0.05,
                                booster='gbtree')

test_model(xgb2_reg)

[0.8899316609591396]

xgb2_reg.fit(X_train,y_train)
y_pred_xgb_rs=xgb2_reg.predict(X_test)

np.exp(y_pred_xgb_rs).round(2)

array([123535.19, 169676.48, 190203.95, ..., 154335.52, 118554.99,
       211244.77], dtype=float32)

y_pred_xgb_rs = np.exp(xgb2_reg.predict(X_test)).round(2)
xgb_rs_solution = pd.concat([test['Id'], pd.DataFrame(y_pred_xgb_rs)], axis=1)
xgb_rs_solution.columns=['Id', 'SalePrice']
xgb_rs_solution.to_csv('sample_submission.csv', index=False)

xgb_rs_solution

Kaggle leaderboard result: position 1603 with a score of 0.12484 (2 entries), an improvement on the previous score of 0.12612.

Feature Engineering / Selection to improve accuracy¶

# correlation Barplot
plt.figure(figsize=(9,16))
corr_feat_series = train.corrwith(train.SalePrice).sort_values()
sns.barplot(x=corr_feat_series, y=corr_feat_series.index, orient='h')

df_back1 = df_back

df_back1.to_csv('df_for_feature_engineering.csv', index=False)

list(corr_feat_series.index)

['KitchenAbvGr',
 'EnclosedPorch',
 'MSSubClass',
 'OverallCond',
 'YrSold',
 'LowQualFinSF',
 'Id',
 'MiscVal',
 'BsmtHalfBath',
 'BsmtFinSF2',
 '3SsnPorch',
 'MoSold',
 'PoolArea',
 'ScreenPorch',
 'BedroomAbvGr',
 'BsmtUnfSF',
 'BsmtFullBath',
 'LotArea',
 'HalfBath',
 'OpenPorchSF',
 '2ndFlrSF',
 'WoodDeckSF',
 'LotFrontage',
 'BsmtFinSF1',
 'Fireplaces',
 'MasVnrArea',
 'GarageYrBlt',
 'YearRemodAdd',
 'YearBuilt',
 'TotRmsAbvGrd',
 'FullBath',
 '1stFlrSF',
 'TotalBsmtSF',
 'GarageArea',
 'GarageCars',
 'GrLivArea',
 'OverallQual',
 'SalePrice']
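
One simple way to use a ranking like the one above is to sort features by the absolute value of their correlation with the target and keep those above a cut-off. The sketch below does this on a hypothetical three-column frame (the values in `toy` and the 0.5 threshold are assumptions for illustration, not a recommendation from the notebook):

```python
import pandas as pd

# Hypothetical miniature of the training data.
toy = pd.DataFrame({'GrLivArea': [1000, 1500, 2000, 2500],
                    'KitchenAbvGr': [2, 1, 1, 1],
                    'SalePrice': [100000, 150000, 210000, 250000]})

# Correlation of every feature with the target, excluding the target itself.
corr = toy.corr()['SalePrice'].drop('SalePrice')

# Rank by strength, regardless of sign, and keep features above the cut-off.
ranked = corr.abs().sort_values(ascending=False)
selected = ranked[ranked > 0.5].index.tolist()

print(ranked)
print(selected)
```

Ranking on the absolute value matters because a strongly negative correlation is as informative as a strongly positive one. Pearson correlation only captures linear relationships, so features with a low value here are not necessarily useless.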