Nonlinear regression with categorical predictor?

Question

wesleynotwise, 24 May 2017

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/341807-nonlinear-regression-with-categorical-predictor

Commented: wesleynotwise, 25 May 2017

Accepted answer: Michelangelo Ricciulli

Suppose I have the following equation

y = (k1|x1)*(k2*x2)*(k3*x3^(1/k4))

where k1 is a coefficient that depends on the variable x1, k2 and k3 are also coefficients, and k4 is a power term. None of the ks are known to me.

x1 is categorical (say adult male, young male, adult female and young female), while x2 and x3 are numeric, say x2 = weight (75 kg, 62 kg, 89 kg, ...) and x3 = height (180 cm, 172 cm, 170 cm, ...).

Does anyone know how to perform a regression on such a combination of data to find all the ks, so that the final model has one value of k1 per category? For example: if x1 = male, k1 = 2.5; if x1 = female, k1 = 1.5.

2 Comments

the cyclist, 24 May 2017

I haven't thought about how to model this whole thing, but the term

k1*x1

is problematic, I think, when x1 is categorical. For example, what does "6 times male" mean?

Since you didn't say so explicitly, I assume that x2 and x3 are interval data?

wesleynotwise, 24 May 2017

Edited: wesleynotwise, 24 May 2017

Ah. Good question. Let me modify the first term and my question. x2 and x3 are interval data, for example weight and height in this case.

Answer 1

Michelangelo Ricciulli, 24 May 2017

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/341807-nonlinear-regression-with-categorical-predictor#answer_268314

Ok, a very simple way to do it is the following.

You can use the function fminunc, which finds the minimum of a function. Which function? You want your model to predict y well, so the quantity to minimize is the mean squared error between y and the model. Let's define it as errFunc, depending on a parameter vector param:

errFunc=@(param) mean((y - k1(x1).*(param(1)*x2).*(param(2)*x3.^(1/param(3)))).^2);

This works if your workspace already contains the data x1, x2, x3 and y (as vectors of equal length), and also a function k1 that returns the right coefficient for each value of x1.

Then you just need to call fminunc with the function you just created and an initial guess for the 3 values you are searching for (let's just use random numbers as the guess):

fminunc(errFunc,randn(3,1))

This will output the value of param you are searching for.

You will probably need to add another coefficient, let's call it param(4), that is added to your model to fit the data better.
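A minimal sketch of the approach above, under a few assumptions that are not in the original answer: the data are synthetic, the categorical x1 is turned into a level index g with unique so that one scale factor is fitted per level, and fminsearch (base MATLAB) stands in for fminunc so the script runs without the Optimization Toolbox. Because k1, k2 and k3 only ever appear as a product, the sketch fits that product per level (p(1) to p(K)) plus k4 (p(K+1)) instead of the separate coefficients.

```matlab
% Synthetic data (hypothetical values, for illustration only)
rng(0);                                        % reproducible random numbers
n  = 200;
x1 = categorical(randi(2, n, 1), [1 2], {'male', 'female'});
x2 = 60 + 30*rand(n, 1);                       % weight in kg
x3 = 160 + 30*rand(n, 1);                      % height in cm

% Map each observation to the index of its category level
[levels, ~, g] = unique(x1);                   % g(i) = level index of x1(i)
K = numel(levels);

% Assumed "true" model: per-level scale, exponent 1/k4 with k4 = 2
scaleTrue = [2.5; 1.5];
y = scaleTrue(g) .* x2 .* x3.^(1/2) .* exp(0.01*randn(n, 1));

% Mean squared error; p = [scale for each level; k4]
errFunc = @(p) mean((y - p(g) .* x2 .* x3.^(1/p(K+1))).^2);

p0   = ones(K+1, 1);                           % deliberately crude start
pHat = fminsearch(errFunc, p0);                % derivative-free minimizer

disp(pHat.');                                  % [scale_1 ... scale_K, k4]
```

Both fminsearch and fminunc accept (fun, x0), so with the Optimization Toolbox installed the call can be swapped back to fminunc. A crude starting point may leave the search far from the best fit, so it is worth comparing errFunc(pHat) across a few different starts.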

12 Comments (10 older comments not shown)

Michelangelo Ricciulli, 25 May 2017

Check the other answer, it makes a very good point. Sorry I didn't notice that before.

wesleynotwise, 25 May 2017

I tried your suggestion on a small sample, and... IT WORKS!!! You've brightened up my day. I did receive the following warning, though; I assume it is due to my data size or the variables I used, which should be easy to resolve.

Warning: The Jacobian at the solution is ill-conditioned, and some model parameters may not be estimated well
(they are not identifiable).  Use caution in making predictions. 
> In nlinfit (line 376)
In NonLinearModel/fitter (line 1123)
In classreg.regr.FitObject/doFit (line 94)
In NonLinearModel.fit (line 1430)
In fitnlm (line 94)

Answer 2

Ilya, 25 May 2017

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/341807-nonlinear-regression-with-categorical-predictor#answer_268441

Unless I misunderstood your dot notation, the problem is ill-defined. It has an infinite number of solutions. Rewrite it in this form:

y = ((k1|x1)*k2*k3) * x2*x3^(1/k4)

Observe that you can only fit for ((k1|x1)*k2*k3), but not for separate coefficients k1, k2 and k3.

Generally, the best way to handle multiplicative models is to turn them into additive models by taking the log of both sides. If you do that, you get

q = (c1|x1) + c4*z3

where

q = log(y) - log(x2)

c1 = log(k123|x1)

k123|x1 = (k1|x1)*k2*k3

c4 = 1/k4

z3 = log(x3)

This model can be easily fitted by fitlme. (If you have only two levels in x1, you can easily get away without fitlme.) The formula would be something like 'q ~ (1|x1) + z3'.
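As a rough illustration of the log-linear form, here is a sketch that fits q = (c1|x1) + c4*z3 by ordinary least squares with one indicator column per level of x1. It uses the backslash operator from base MATLAB rather than fitlme, so the level-specific intercepts are treated as fixed effects (in a fitlme formula, (1|x1) denotes a random intercept). The data, the level index g and the "true" values are assumptions made up for the example, and the data are noise-free so the recovery can be checked.

```matlab
% Noise-free synthetic data (hypothetical values)
rng(1);
n  = 100;
g  = repmat([1; 2], n/2, 1);             % level index of x1: 1 = male, 2 = female
x2 = 60 + 30*rand(n, 1);                 % weight in kg
x3 = 160 + 30*rand(n, 1);                % height in cm
k123True = [2.5; 1.5];                   % product (k1|x1)*k2*k3 per level
k4True   = 2;
y  = k123True(g) .* x2 .* x3.^(1/k4True);

% Log transform: q = c1(x1) + c4*z3
q  = log(y) - log(x2);
z3 = log(x3);

% Design matrix: one indicator column per level, then z3
K  = max(g);
D  = double(bsxfun(@eq, g, 1:K));        % n-by-K, D(i,j) = 1 if g(i) == j
b  = [D, z3] \ q;                        % least-squares solution

c1Hat   = b(1:K);                        % intercept per level
c4Hat   = b(K+1);                        % slope on log(x3)
k123Hat = exp(c1Hat);                    % back to the multiplicative scale
k4Hat   = 1 / c4Hat;
```

Here k123Hat and k4Hat reproduce the assumed values, but only the product per level is recovered, not k1, k2 and k3 individually.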

3 Comments (1 older comment not shown)

wesleynotwise, 25 May 2017

Hi Ilya. Sorry I didn't see your message earlier. Yes, the dots and asterisks both mean multiplication in my example. I just realised that mistake and will correct it. The expression should look like this

y = (k1|x1)*(k2*x2)*(k3*x3^(1/k4))

where k1 depends on x1, which is a categorical variable, e.g. adult male, young male, adult female and young female. So the equation will have 4 different values of k1. I think solving my problem with linear regression is another possible method, which I want to try! So, if I convert the equation to log terms, it becomes

q = (c1|x1) + c4*z3
where q = log (y) - log (x2);
c1|x1 = log (k1|x1) + log (k2) +log (k3);
c4 = 1/k4;
z3 = log (x3);

After running the linear regression, MATLAB is likely to give me values of c1 for the different levels of x1, plus c4. Does that mean I still have to do a separate regression to find k1, k2 and k3? My intention is to have the equation in its original (non-log) form, so that in future one just needs to feed in x1, x2 and x3 to find y. Any thoughts?

Ilya, 25 May 2017

You can't separate k1, k2 and k3. The form of your problem does not allow that. You can feed your problem into some minimizer and get a numeric answer (likely with some warnings about ill-conditioning). But the numeric answer such as {k1a,k1b,k2,k3} (a and b are used to enumerate categorical levels) is not in any way different from {k1a,k1b,k2/2,2*k3} or {2*k1a,2*k1b,k2/2,k3} etc. If you run fitlme, you will get K values for the intercept c1=log(k1*k2*k3), where K is the number of categorical levels in x1, plus one linear coefficient for c4. That's all you can do with this problem.
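The point about indistinguishable parameter sets can be checked numerically. The sketch below uses made-up coefficient values and three made-up observations; it evaluates the model for {k1, k2, k3}, {k1, k2/2, 2*k3} and {2*k1, k2/2, k3} and compares the predictions.

```matlab
% Three hypothetical observations
x2 = [75; 62; 89];                        % weight in kg
x3 = [180; 172; 170];                     % height in cm
k1 = [2.5; 2.5; 1.5];                     % k1 already looked up per observation

% Hypothetical coefficient values
k2 = 0.8;  k3 = 1.2;  k4 = 2;

% Model from the question, written with elementwise operations
model = @(k1, k2, k3, k4) k1 .* (k2*x2) .* (k3*x3.^(1/k4));

yA = model(k1,   k2,   k3,   k4);         % original parameters
yB = model(k1,   k2/2, 2*k3, k4);         % k2 halved, k3 doubled
yC = model(2*k1, k2/2, k3,   k4);         % k1 doubled, k2 halved

% Largest difference between the three sets of predictions
maxDiff = max(abs([yA - yB; yA - yC]));
disp(maxDiff);
```

Since the three parameter sets give the same predictions, no fit to y can prefer one over the others, which is consistent with the ill-conditioning warning quoted earlier in the thread.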

wesleynotwise, 25 May 2017

Ah. If that's the case, the equation can at most be rewritten as

log (y) = (c1|x1) + c4*log(x3) + log (x2)

This means that once one has fed in the data for x1, x2 and x3, one needs to invert the log of y to obtain the real y value?
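Inverting the log amounts to exponentiating the right-hand side, which gives back the multiplicative form y = exp(c1|x1) * x2 * x3^c4. A small sketch of such a prediction step, where the coefficient values, the level order and the helper name predictY are hypothetical and stand in for whatever a fit of the log model returns:

```matlab
% Hypothetical fitted coefficients of the log model
levels = {'male'; 'female'};              % level order assumed for c1
c1 = log([2.5; 1.5]);                     % intercept per level of x1
c4 = 0.5;                                 % slope on log(x3), i.e. 1/k4

% Prediction on the original scale: exponentiate the fitted log(y)
% g is the level index of x1 (1 = male, 2 = female in this sketch)
predictY = @(g, x2, x3) exp(c1(g) + c4*log(x3) + log(x2));

% Example: a male (g = 1) with weight 75 kg and height 180 cm
yNew = predictY(1, 75, 180);

% Same value computed directly from the multiplicative form
yDirect = exp(c1(1)) * 75 * 180^c4;
disp([yNew, yDirect]);
```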
