Regression tree and prediction equation

Danish Nasir on 3 Nov 2022
Commented: the cyclist on 5 Nov 2022
Suppose I have three independent variables A, B, and C and a dependent variable T. A is discrete, while B and C are continuous. The output variable T is also continuous. In this situation we need to build a regression tree. How can we generate a prediction equation for such a regression tree in MATLAB?
E.g.
A=[ 50 75 100 125 150 175 ];
B=[ 0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C=[3 4 5 6 7 8 9 10 11 12 13 14 15 16 ];
T= [ 1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];
  5 Comments
Danish Nasir on 4 Nov 2022
Yes, each variable should have the same length. Say it is 6 for each variable (take the first 6 values). I want to treat A as a categorical variable. When one of the inputs is categorical, we can't use multiple regression and should use a regression tree instead. How can I predict T using MATLAB?
dpb on 4 Nov 2022
As noted above, MATLAB's fitlm knows how to handle categorical variables automatically.
A=[ 50 75 100 125 150 175 ];
B=[ 0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C=[3 4 5 6 7 8 9 10 11 12 13 14 15 16 ];
T= [ 1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];
tABC=array2table([A;B(1:numel(A));C(1:numel(A));T(1:numel(A))].','VariableNames',{'A','B','C','T'})
tABC = 6×4 table
     A      B      C     T
    ___    ____    _    ___
     50    0.45    3    1.2
     75    0.55    4    1.8
    100    0.75    5    2.1
    125     0.8    6    2.3
    150     0.9    7    2.5
    175       1    8    2.7
mdl=fitlm(tABC,'categorical',{'A'})
Warning: Regression design matrix is rank deficient to within machine precision.
mdl =
Linear regression model:
    T ~ 1 + A + B + C

Estimated Coefficients:
                   Estimate    SE    tStat    pValue
                   ________    __    _____    ______
    (Intercept)      0.3       0      Inf      NaN
    A_75             0.3       0      Inf      NaN
    A_100            0.3       0      Inf      NaN
    A_125            0.2       0      Inf      NaN
    A_150            0.1       0      Inf      NaN
    A_175              0       0      NaN      NaN
    B                  0       0      NaN      NaN
    C                0.3       0      Inf      NaN

Number of observations: 6, Error degrees of freedom: 0
R-squared: 1, Adjusted R-Squared: NaN
F-statistic vs. constant model: NaN, p-value = NaN
Although it runs, the toy dataset is deficient: the three independent variables are almost exact linear functions of one another (A and C have a correlation of 1.0000), so only one of the three is estimable. Observe:
corrcoef(tABC{:,:})
ans = 4×4
    1.0000    0.9876    1.0000    0.9694
    0.9876    1.0000    0.9876    0.9770
    1.0000    0.9876    1.0000    0.9694
    0.9694    0.9770    0.9694    1.0000
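The dependency can also be checked directly rather than read off the correlation matrix. The lines below are only a rough sketch using the first six values of each variable. The relation C = A/25 + 1 is my own reading of the toy numbers and is not something stated in the question.

```matlab
% First six values of each variable, as in the table above.
A = [50 75 100 125 150 175]';
B = [0.45 0.55 0.75 0.8 0.9 1]';
C = [3 4 5 6 7 8]';

% Assumed relation (inferred from the toy numbers): C = A/25 + 1.
maxDev = max(abs(C - (A/25 + 1)));

% Design matrix with an intercept and A, B, C as plain numeric columns.
X = [ones(numel(A),1) A B C];
r = rank(X);   % a rank below the column count means linearly dependent columns

fprintf('max |C - (A/25 + 1)| = %g, rank(X) = %d of %d columns\n', ...
    maxDev, r, size(X,2));
```

If the rank is lower than the number of columns, at least one predictor carries no information beyond the others, which is what the fitlm warning reports.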


Accepted Answer

the cyclist on 5 Nov 2022
This model is probably nonsense, because of the linear dependencies that @dpb points out. But perhaps your real data will yield a useful model. (Note that I transposed all your variables before putting them in a table.)
A = [ 50 75 100 125 150 175 ]';
Acat = categorical(A);
B = [ 0.45 0.55 0.75 0.8 0.9 1]';
C = [3 4 5 6 7 8]';
T= [ 1.2 1.8 2.1 2.3 2.5 2.7]';
tbl = table(Acat,B,C,T);
mdl=fitrtree(tbl,"T ~ Acat + B + C")
mdl =
  RegressionTree
           PredictorNames: {'Acat'  'B'  'C'}
             ResponseName: 'T'
    CategoricalPredictors: 1
        ResponseTransform: 'none'
          NumObservations: 6
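On the "prediction equation" part of the question: a regression tree does not give a single closed-form equation the way fitlm does. Its prediction is piecewise constant, with each leaf returning the mean response of the training rows that reach it. The closest equivalent is the set of split rules, which view prints. The sketch below assumes Statistics and Machine Learning Toolbox is installed. The 'MinParentSize',2 setting and the names mdlSmall and rules are my own choices for illustration. As far as I know, the default settings will not split a node holding only six rows, so without that option the toy tree would be a single root node.

```matlab
% Toy data from the answer above.
A = [50 75 100 125 150 175]';
Acat = categorical(A);
B = [0.45 0.55 0.75 0.8 0.9 1]';
C = [3 4 5 6 7 8]';
T = [1.2 1.8 2.1 2.3 2.5 2.7]';
tbl = table(Acat,B,C,T);

% Illustration only: allow very small nodes so the six-row toy set can split.
mdlSmall = fitrtree(tbl,"T ~ Acat + B + C",'MinParentSize',2);

% Print the split rules as text (each leaf shows its fitted value).
view(mdlSmall)
% view(mdlSmall,'Mode','graph')   % same tree as a figure

% The same information as data: one row per node.
rules = table((1:mdlSmall.NumNodes)', mdlSmall.CutPredictor, ...
    mdlSmall.CutPoint, mdlSmall.NodeMean, ...
    'VariableNames', {'Node','CutPredictor','CutPoint','NodeMean'});
disp(rules)
```

For leaf nodes CutPredictor is empty and CutPoint is NaN. NodeMean is the value predict returns for rows that end up in that leaf.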
  2 Comments
Danish Nasir on 5 Nov 2022
Yes, the data set has 400 elements for each variable. A (the categorical variable) takes the 6 values mentioned above, repeated. B ranges from 0.5 to 2, C ranges from 3 to 23, and T ranges from 2 to 7.
A=400x1, B=400x1, C=400x1, T=400x1
Now I need a prediction model that can predict T using a regression tree in MATLAB.
the cyclist on 5 Nov 2022
The model the way I specified it should do what you want. You can then use that model's predict method to predict T for new values.
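A minimal sketch of that workflow at the 400-row size described above. I do not have the real data, so everything here is a synthetic placeholder: the random B and C values, the formula used to generate T, the new observations, and the names Avals and newTbl. Only the A levels and the B and C ranges come from the thread.

```matlab
rng(0);                                  % reproducible placeholder data
n = 400;
Avals = [50 75 100 125 150 175];         % the six levels from the question
A = Avals(mod(0:n-1,6) + 1)';            % levels repeated down the column
Acat = categorical(A);
B = 0.5 + 1.5*rand(n,1);                 % placeholder values in [0.5, 2]
C = 3 + 20*rand(n,1);                    % placeholder values in [3, 23]
T = 2 + 0.01*A + B + 0.1*C + 0.1*randn(n,1);   % made-up response, NOT the real T

tbl = table(Acat,B,C,T);
mdl = fitrtree(tbl,"T ~ Acat + B + C");

% New observations: variable names must match the training table, and the
% categorical column is built with the same set of levels.
newTbl = table(categorical([75; 150], Avals), [0.8; 1.6], [6; 18], ...
    'VariableNames', {'Acat','B','C'});
Tpred = predict(mdl,newTbl);
disp(Tpred)
```

With the real data, replace the placeholder columns with your own 400x1 vectors and keep the rest unchanged.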


More Answers (0)

Release

R2022a
