Unable to build the regression model.

Sneha Sunil on 21 Oct 2020
Edited: Sneha Sunil on 21 Oct 2020
The error is in the last line.
Goal of the Project

To predict the price of a house based on its features. Not everyone keeps up to date with house prices. Suppose someone has to sell a house and has no idea what it is worth. This is where machine learning algorithms come in: they can predict the price from the features alone.

Importing Training and Testing Data.
train = readtable('train.csv');
test = readtable('test.csv');
disp(train)
disp(test)
Now let us see how much data there is in the training set and the test set. We use height for the number of rows and width for the number of columns.
height(train) % For Training Data
width(train) % For Training Data
height(test) % For Testing Data
width(test) % For Testing Data
Here, the training set has 1460 rows and 81 columns, and the test set has 1459 rows and 80 columns (features). So the training set has one extra column, which is our target variable: the sale price (SalePrice). We will use the 80 features to predict the price.
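The comparison of the two tables described above can also be done programmatically. The following is a minimal sketch on small toy tables (trainToy and testToy are hypothetical stand-ins with made-up values, not the contents of train.csv or test.csv); it uses setdiff on the variable names to find the column that exists only in the training table.

```matlab
% Toy stand-ins for the real tables (hypothetical data, not from train.csv).
trainToy = table([1;2;3], [60;80;70], [200000;250000;180000], ...
    'VariableNames', {'Id','LotFrontage','SalePrice'});
testToy  = table([4;5], [65;75], 'VariableNames', {'Id','LotFrontage'});

% Columns present in the training table but absent from the test table.
targetName = setdiff(trainToy.Properties.VariableNames, ...
                     testToy.Properties.VariableNames);
disp(targetName)

% size returns the number of rows and columns in one call.
[nRows, nCols] = size(trainToy);
fprintf('%d rows, %d columns\n', nRows, nCols);
```

With the real tables, the same setdiff call on train and test should return the name of the target column.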

Let's take a look at the training set.

head(train,10)
We can see that our dataset is dirty: it has null values, categorical variables, and numerical values. So, we have to clean the data.
head(test,10)
Here too, we have to clean the data. To do that, we merge the test and train datasets, clean the merged data, and then split it back into train and test sets.
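The merge, clean, and split plan described above could look roughly like the sketch below. It is only an illustration on toy tables with made-up values; the IsTrain marker column and the NaN placeholder for SalePrice in the test rows are assumptions introduced here, not something from the original script.

```matlab
% Toy stand-ins for the real tables (hypothetical data).
trainToy = table([1;2;3], {'Grvl';'NA';'Pave'}, [200000;250000;180000], ...
    'VariableNames', {'Id','Alley','SalePrice'});
testToy  = table([4;5], {'NA';'Grvl'}, 'VariableNames', {'Id','Alley'});

% Give the test table a placeholder target so both tables share the same variables.
testToy.SalePrice = NaN(height(testToy), 1);

% Remember which rows came from the training set (IsTrain is a hypothetical helper column).
trainToy.IsTrain = true(height(trainToy), 1);
testToy.IsTrain  = false(height(testToy), 1);

% Vertical concatenation requires the two tables to have the same variable names.
allData = [trainToy; testToy];

% ... the cleaning steps would be applied to allData here ...

% Split back into the original pieces.
trainClean = allData(allData.IsTrain, :);
testClean  = allData(~allData.IsTrain, :);
testClean.SalePrice = [];   % drop the placeholder target from the test rows
```

Cleaning the merged table once keeps both sets consistent, for example in the labels used for missing categories.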
Data Analysis
In data analysis, we look for the following:
  1. Missing values.
  2. All the numerical variables.
  3. Distribution of the numerical variables.
  4. Categorical variables.
  5. Cardinality of the categorical variables.
  6. Outliers.
  7. Relationship between independent and dependent features.
numericVars = varfun(@isnumeric,train,'OutputFormat','uniform')
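Several items in the list above (numerical variables, categorical variables, and their cardinality) can be checked with a few lines. The sketch below runs on a small toy table with made-up values; it assumes the text columns are cell arrays of character vectors, which is how they appear to be used elsewhere in this post.

```matlab
% Toy table (hypothetical values) with one numeric and two text columns.
toy = table([60;80;70;65], {'RL';'RM';'RL';'FV'}, {'Pave';'Pave';'Grvl';'Pave'}, ...
    'VariableNames', {'LotFrontage','MSZoning','Street'});

% Logical row vectors marking numeric columns and cell-array-of-text columns.
isNum  = varfun(@isnumeric, toy, 'OutputFormat', 'uniform');
isText = varfun(@iscellstr, toy, 'OutputFormat', 'uniform');

numericNames = toy.Properties.VariableNames(isNum);
textNames    = toy.Properties.VariableNames(isText);

% Cardinality: number of distinct labels in each text column.
cardinality = zeros(1, numel(textNames));
for c = 1:numel(textNames)
    cardinality(c) = numel(unique(toy.(textNames{c})));
end
disp(table(textNames', cardinality', 'VariableNames', {'Variable','NumLevels'}))
```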

Missing Values

% Here we check which features have null values.
% Step 1: Make the list of features that have missing values.
TF = ismissing(train,{'NA' NaN});
colWithMissing = train(:,any(TF));
width(colWithMissing)
numericVars1 = varfun(@isnumeric,colWithMissing,'OutputFormat','uniform')
k = 0;
for i = 1:width(colWithMissing)   % loop over every column with missing values
    if numericVars1(i)
        k = k+1;
        disp(colWithMissing.Properties.VariableNames(i))
    end
end
fprintf('number of numeric missing value column = %g\n',k)

I computed the percentage of missing values in Python.

LotFrontage 17.74 % missing values
Alley 93.77 % missing values
MasVnrType 0.55 % missing values
MasVnrArea 0.55 % missing values
BsmtQual 2.53 % missing values
BsmtCond 2.53 % missing values
BsmtExposure 2.6 % missing values
BsmtFinType1 2.53 % missing values
BsmtFinType2 2.6 % missing values
FireplaceQu 47.26 % missing values
GarageType 5.55 % missing values
GarageYrBlt 5.55 % missing values
GarageFinish 5.55 % missing values
GarageQual 5.55 % missing values
GarageCond 5.55 % missing values
PoolQC 99.52 % missing values
Fence 80.75 % missing values
MiscFeature 96.3 % missing values
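These percentages could probably also be computed in MATLAB from the logical matrix that ismissing returns, without switching to Python. The sketch below uses a toy table with made-up values and treats both NaN and the literal text 'NA' as missing, as the ismissing call earlier in the post does.

```matlab
% Toy table (hypothetical values): NaN and the text 'NA' both count as missing.
toy = table([60;NaN;70;NaN], {'Grvl';'NA';'NA';'NA'}, [1;2;3;4], ...
    'VariableNames', {'LotFrontage','Alley','Id'});

% One logical column per table variable, true where the entry is missing.
TF = ismissing(toy, {'NA', NaN});

% The mean of a logical column is the fraction of true entries.
pctMissing = 100 * mean(TF, 1);

% Report only the variables that actually have missing entries.
hasMissing = pctMissing > 0;
names = toy.Properties.VariableNames(hasMissing);
vals  = pctMissing(hasMissing);
for c = 1:numel(names)
    fprintf('%s %.2f %% missing values\n', names{c}, vals(c));
end
```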
numericVars1 = varfun(@isnumeric,train,'OutputFormat','uniform')
k = 0;
for i = 1:width(train)   % loop over every column of the raw dataset
    if numericVars1(i)
        k = k+1;
    end
end
fprintf('number of numeric column in raw dataset = %g\n',k)
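% Text columns are stored as cell arrays of character vectors, so test with iscellstr.
strVars1 = varfun(@iscellstr,train,'OutputFormat','uniform')
k = 0;
for i = 1:width(train)
    if strVars1(i)
        k = k+1;
    end
end
fprintf('number of text column = %g\n',k)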
figure; boxplot(train.LotFrontage)
figure; boxplot(train.MasVnrArea)
figure; boxplot(train.GarageYrBlt)

Cleaning

% Median of each numeric column, ignoring NaN entries.
m1 = median(train.LotFrontage,'omitnan')
m2 = median(train.MasVnrArea,'omitnan')
m3 = median(train.GarageYrBlt,'omitnan')
% Replace the missing entries with the column median.
train.LotFrontage = fillmissing(train.LotFrontage,'constant',m1);
train.MasVnrArea = fillmissing(train.MasVnrArea,'constant',m2);
train.GarageYrBlt = fillmissing(train.GarageYrBlt,'constant',m3);
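The median imputation above repeats the same two steps for each column, so it could be written as a loop over column names. This is a minimal sketch on a toy table with made-up values; the numCols list is a hypothetical helper introduced for illustration.

```matlab
% Toy table (hypothetical values) with NaN marking the missing entries.
toy = table([60;NaN;80], [0;100;NaN], ...
    'VariableNames', {'LotFrontage','MasVnrArea'});

% Names of the numeric columns to impute (hypothetical helper list).
numCols = {'LotFrontage','MasVnrArea'};
for c = 1:numel(numCols)
    name = numCols{c};
    med = median(toy.(name), 'omitnan');                    % median ignoring NaN
    toy.(name) = fillmissing(toy.(name), 'constant', med);  % fill NaN with the median
end
disp(toy)
```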
% Replace the literal text 'NA' with 'missing' in each text column listed below.
naCols = {'Alley','MasVnrType','BsmtQual','BsmtCond','BsmtExposure', ...
    'BsmtFinType1','BsmtFinType2','FireplaceQu','GarageType', ...
    'GarageFinish','GarageCond','PoolQC','Fence','MiscFeature'};
for c = 1:numel(naCols)
    name = naCols{c};
    train.(name)(strcmp(train.(name),'NA')) = {'missing'};
end
head(train,50)
train.MSZoning = onehotencode(categorical(train.MSZoning),8, 'double')
train.Street = onehotencode(categorical(train.Street), 2, 'double')
train.Alley = onehotencode(categorical(train.Alley), 3, 'double')
train.LotShape = onehotencode(categorical(train.LotShape), 4, 'double')
train.LandContour = onehotencode(categorical(train.LandContour), 3, 'double')
train.Utilities = onehotencode(categorical(train.Utilities), 4, 'double')
train.LotConfig = onehotencode(categorical(train.LotConfig), 5, 'double')
train.LandSlope = onehotencode(categorical(train.LandSlope), 3, 'double')
train.Neighborhood = onehotencode(categorical(train.Neighborhood), 25, 'double')
train.Condition1 = onehotencode(categorical(train.Condition1), 9, 'double')
train.Condition2 = onehotencode(categorical(train.Condition2), 9, 'double')
train.BldgType = onehotencode(categorical(train.BldgType), 5, 'double')
train.HouseStyle = onehotencode(categorical(train.HouseStyle), 8, 'double')
train.RoofStyle = onehotencode(categorical(train.RoofStyle), 6, 'double')
train.RoofMatl = onehotencode(categorical(train.RoofMatl), 8, 'double')
train.Exterior1st = onehotencode(categorical(train.Exterior1st), 17, 'double')
train.Exterior2nd = onehotencode(categorical(train.Exterior2nd), 17, 'double')
train.MasVnrType = onehotencode(categorical(train.MasVnrType), 5, 'double')
train.ExterQual = onehotencode(categorical(train.ExterQual), 5, 'double')
train.ExterCond = onehotencode(categorical(train.ExterCond), 5, 'double')
train.Foundation = onehotencode(categorical(train.Foundation), 6, 'double')
train.BsmtQual = onehotencode(categorical(train.BsmtQual), 6, 'double')
train.BsmtCond = onehotencode(categorical(train.BsmtCond), 6, 'double')
train.BsmtExposure = onehotencode(categorical(train.BsmtExposure), 5, 'double')
train.BsmtFinType1 = onehotencode(categorical(train.BsmtFinType1), 7, 'double')
train.BsmtFinType2 = onehotencode(categorical(train.BsmtFinType2), 7, 'double')
train.Heating = onehotencode(categorical(train.Heating), 6, 'double')
train.HeatingQC = onehotencode(categorical(train.HeatingQC), 5, 'double')
train.CentralAir = onehotencode(categorical(train.CentralAir), 2, 'double')
train.Electrical = onehotencode(categorical(train.Electrical), 5, 'double')
train.KitchenQual = onehotencode(categorical(train.KitchenQual), 5, 'double')
train.Functional = onehotencode(categorical(train.Functional), 5, 'double')
train.FireplaceQu = onehotencode(categorical(train.FireplaceQu), 6, 'double')
train.GarageFinish = onehotencode(categorical(train.GarageFinish), 4, 'double')
train.GarageQual = onehotencode(categorical(train.GarageQual), 6, 'double')
train.GarageCond = onehotencode(categorical(train.GarageCond), 6, 'double')
train.PavedDrive = onehotencode(categorical(train.PavedDrive), 3, 'double')
train.PoolQC = onehotencode(categorical(train.PoolQC), 5, 'double')
train.Fence = onehotencode(categorical(train.Fence), 5, 'double')
train.MiscFeature = onehotencode(categorical(train.MiscFeature), 6, 'double')
train.SaleType = onehotencode(categorical(train.SaleType), 10, 'double')
train.SaleCondition = onehotencode(categorical(train.SaleCondition), 6, 'double')
train.GarageType = onehotencode(categorical(train.GarageType), 7, 'double')
lm = fitlm(train,'SalePrice~YrSold+MoSold')
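One possibility worth checking, offered as a hedged sketch rather than a confirmed diagnosis: the second argument of onehotencode is the dimension along which the classes are expanded, not the number of classes. A call such as onehotencode(categorical(x), 8, 'double') on a column vector therefore produces an array whose classes lie along the 8th dimension, and a table holding such variables may be what fitlm objects to. fitlm also accepts categorical table variables directly and creates the dummy variables itself, so the manual encoding may not be needed. The toy table and its values below are assumptions for illustration only, and fitlm requires Statistics and Machine Learning Toolbox.

```matlab
% Toy data (hypothetical values, not taken from train.csv).
toy = table( ...
    [2006;2007;2008;2009;2010;2006;2007;2008], ...
    [1;3;5;7;9;11;2;4], ...
    {'RL';'RM';'RL';'RM';'RL';'RM';'RL';'RM'}, ...
    [200;150;210;155;205;160;215;148], ...
    'VariableNames', {'YrSold','MoSold','MSZoning','SalePrice'});

% Convert the text column to categorical instead of one-hot encoding it.
toy.MSZoning = categorical(toy.MSZoning);

% fitlm expands the categorical predictor into dummy variables internally.
lm = fitlm(toy, 'SalePrice ~ YrSold + MoSold + MSZoning');
disp(lm.CoefficientNames)
```

If this approach suits the real data, the text columns of train could be converted with categorical in a loop and passed to fitlm without the onehotencode step.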
2 Comments
Sneha Sunil on 21 Oct 2020
Edited: Sneha Sunil on 21 Oct 2020
lm = fitlm(train,'SalePrice~YrSold+MoSold')
I am getting an error in this line.
I used this code to fit the regression model. Because I was getting an error, I decided to try it with two variables first and then add the remaining variables.
The dataset is attached.
Sneha Sunil on 21 Oct 2020
Please help me with this.


Answers (0)
