How to replicate Regression Learner app-based training using a MATLAB script?

47 views (last 30 days)
Quazi Hussain
Quazi Hussain on 15 Aug 2025 at 18:29
Edited: dpb on 18 Aug 2025 at 17:38
I have trained an ML model in the Regression Learner app using an optimizable GPR model with the default settings (5-fold validation, 30 iterations, etc.). Now I am trying to do the same using a MATLAB script, with the following call, where X are the regressors and Y is the response variable.
>> ML_mdl=fitrgp(X,Y,'OptimizeHyperparameters','all','HyperparameterOptimizationOptions',struct('KFold',5))
Are the two resulting models more or less equivalent? I know there will be some difference due to the probabilistic nature of the algorithm. When I test the model on the entire training set, the R-squared value is practically 1.0. Is it overfitting even with k-fold cross-validation? The prediction on the unseen testing set is not that good. Any suggestions?
3 Comments
dpb
dpb on 16 Aug 2025 at 19:29
Edited: 17 Aug 2025 at 16:56
"Is it overfitting even with K-fold cross-correlation? The prediction on unseen testing set is not that good. Any suggestions?"
Possibly. Depends on how much data you've got although the other possibility is that the other dataset simply is different from the dataset used for training.
Without the data to look at, we're shooting in the dark.
As an aside, regarding @Umar's comment "slightly different": a recent thread here in the forum illustrated that the randomized selection of the training dataset occasionally produced a grossly different result from the same overall dataset. That indicated there were subsets of the total dataset with markedly different characteristics than other random subsets. One cannot naively assume that recalculating with a different training subset will always produce similar model estimates; that will be true only if all random subsets of the overall data are similar to each other in their pertinent characteristics.

In particular, different models are sensitive to different things; for example, some may be very susceptible to outliers, in which case a training set that happens to pick up a single outlier may result in a very different model from a training set without any such extreme values. Unfortunately, "it all depends", and about the only way to know with such algorithms is to run a number of times and observe just how stable (or unstable) the results are.
OLS, on the other hand, uses the entire dataset and so is deterministic, although again the results may be affected by the presence of outliers, and just how strongly still depends upon the particular model chosen.
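The "run a number of times and observe stability" suggestion above could be sketched roughly as follows. This is a hypothetical illustration, not code from the thread; it assumes X and Y are the questioner's regressor matrix and response vector, and uses the same fitrgp call from the question with different seeds:

```matlab
% Sketch: refit the optimizable GPR several times under different RNG
% seeds and compare the resulting cross-validation losses. Large spread
% across seeds would indicate the dataset sensitivity discussed above.
seeds  = 1:5;
cvLoss = zeros(numel(seeds),1);
for k = 1:numel(seeds)
    rng(seeds(k));                     % different seed -> different CV partitions
    mdl = fitrgp(X, Y, ...
        'OptimizeHyperparameters','all', ...
        'HyperparameterOptimizationOptions', ...
            struct('KFold',5,'ShowPlots',false,'Verbose',0));
    % k-fold MSE on a fresh partition of the *same* data
    cvLoss(k) = kfoldLoss(crossval(mdl,'KFold',5));
end
fprintf('CV MSE across seeds: min %.4g, max %.4g\n', min(cvLoss), max(cvLoss));
```

A tight min/max range would suggest the fit is stable; a wide one would point to heterogeneous subsets of the kind dpb describes.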
Umar
Umar on 17 Aug 2025 at 17:49

@dpb - You're absolutely right, I oversimplified that. The "slightly different" comment assumes well-behaved data, but as you point out, some datasets can produce dramatically different models depending on the random subset selection.

For @Quazi Hussain's case, this variability could actually explain the overfitting issue. If CV folds are inconsistent due to data heterogeneity, the hyperparameter optimization might be fitting noise rather than signal.

Good suggestion to run multiple times with different seeds to check stability - high variability would indicate the dataset sensitivity you mentioned.

Thanks for the clarification.


Accepted Answer

dpb
dpb on 15 Aug 2025 at 20:50
To replicate the fit, generate the training function from the Learner app.
To produce identical results, set the random number seed before doing the fit calculation in both the Learner app and at the command line.
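At the command line, that would look something like the following (a minimal sketch assuming the X and Y from the question; the seed value 1 is arbitrary, but must match whatever was set before training in the app):

```matlab
% Fix the RNG state immediately before fitting so the command-line run
% starts from the same point in the random stream as the app-based run.
rng(1);
ML_mdl = fitrgp(X, Y, ...
    'OptimizeHyperparameters','all', ...
    'HyperparameterOptimizationOptions', struct('KFold',5));
```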
2 Comments
Quazi Hussain
Quazi Hussain on 18 Aug 2025 at 13:43
In a script, I can set the random number generator to a seed, say 1, by calling rng(1) right before the fit command. How do I do that in Regression Learner? Do I do that in the MATLAB command window prior to invoking the app?
>> rng(1)
>> regressionLearner
or is there somewhere in the app settings I can do that? Thanks.
dpb
dpb on 18 Aug 2025 at 14:28
Edited: 18 Aug 2025 at 17:38
Yes, set it in the MATLAB command window prior to invoking(*) the app; the random number generator stream is global in MATLAB, so it will pick up from the last invocation/reset.
This means, of course, that you can't call anything else that generates another random number between setting the seed and the fit evaluation, or the two runs won't be at the same point in the stream.
It probably would not be a bad enhancement request to ask for there to be a way to set the seed inside the app to facilitate such use.
ADDENDUM:
(*) Actually, you should be able to just go to the command line while in the app and reset the seed...that would be easy enough to check that if set first, then run a fit that if then reset the seed to the same value that can replicate the fit.
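The check described in the addendum could be sketched as below. This is a hypothetical illustration with an arbitrary seed, assuming X and Y as in the question; it verifies that resetting the seed to the same value reproduces the same optimized fit:

```matlab
% Fit twice from the same RNG state and confirm the Bayesian optimization
% lands on the same hyperparameters both times.
opts = struct('KFold',5,'ShowPlots',false,'Verbose',0);
rng(42);
mdl1 = fitrgp(X, Y, 'OptimizeHyperparameters','all', ...
    'HyperparameterOptimizationOptions', opts);
rng(42);                               % reset to the same point in the stream
mdl2 = fitrgp(X, Y, 'OptimizeHyperparameters','all', ...
    'HyperparameterOptimizationOptions', opts);
% Should report identical optimized hyperparameters if replication worked
isequal(mdl1.HyperparameterOptimizationResults.XAtMinObjective, ...
        mdl2.HyperparameterOptimizationResults.XAtMinObjective)
```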


More Answers (0)

Categories

Find more on Support Vector Machine Regression in Help Center and File Exchange

Release

R2023b
