Main Content

ブースティング回帰アンサンブル回帰の最適化

この例では、ブースティング回帰アンサンブルのハイパーパラメーターを最適化する方法を示します。最適化によりモデルの交差検証損失を最小化します。

問題は、自動車の加速度、エンジン排気量、馬力および重量に基づいてガロンあたりの走行マイル数で燃費をモデル化することです。これらの予測子および他の予測子が含まれている carsmall データを読み込みます。

load carsmall
X = [Acceleration Displacement Horsepower Weight];
Y = MPG;

LSBoost アルゴリズムと代理分岐を使用してアンサンブル回帰をデータに当てはめます。学習サイクル数、代理分岐の最大数および学習率を変化させることにより、生成されたモデルを最適化します。さらに、最適化で反復ごとに交差検証を再分割できるようにします。

再現性を得るために、乱数シードを設定し、'expected-improvement-plus' の獲得関数を使用します。

rng('default')
Mdl = fitrensemble(X,Y, ...
    'Method','LSBoost', ...
    'Learner',templateTree('Surrogate','on'), ...
    'OptimizeHyperparameters',{'NumLearningCycles','MaxNumSplits','LearnRate'}, ...
    'HyperparameterOptimizationOptions',struct('Repartition',true, ...
    'AcquisitionFunctionName','expected-improvement-plus'))
|====================================================================================================================|
| Iter | Eval   | Objective:  | Objective   | BestSoFar   | BestSoFar   | NumLearningC-|    LearnRate | MaxNumSplits |
|      | result | log(1+loss) | runtime     | (observed)  | (estim.)    | ycles        |              |              |
|====================================================================================================================|
|    1 | Best   |      3.5219 |      16.451 |      3.5219 |      3.5219 |          383 |      0.51519 |            4 |
|    2 | Best   |      3.4752 |     0.72328 |      3.4752 |      3.4777 |           16 |      0.66503 |            7 |
|    3 | Best   |      3.1575 |      1.3444 |      3.1575 |      3.1575 |           33 |       0.2556 |           92 |
|    4 | Accept |      6.3076 |     0.45946 |      3.1575 |      3.1579 |           13 |    0.0053227 |            5 |
|    5 | Accept |      3.4449 |      7.9374 |      3.1575 |      3.1579 |          277 |      0.45891 |           99 |
|    6 | Accept |      3.9806 |     0.46745 |      3.1575 |      3.1584 |           10 |      0.13017 |           33 |
|    7 | Best   |       3.059 |     0.70593 |       3.059 |        3.06 |           10 |      0.30126 |            3 |
|    8 | Accept |      3.1707 |      0.4324 |       3.059 |      3.1144 |           10 |      0.28991 |           16 |
|    9 | Accept |      3.0937 |      1.0461 |       3.059 |      3.1046 |           10 |      0.31488 |           13 |
|   10 | Accept |       3.196 |     0.64573 |       3.059 |      3.1233 |           10 |      0.32005 |           11 |
|   11 | Best   |      3.0495 |     0.44804 |      3.0495 |      3.1083 |           10 |      0.27882 |           85 |
|   12 | Best   |       2.946 |     0.89551 |       2.946 |      3.0774 |           10 |      0.27157 |            7 |
|   13 | Accept |      3.2026 |      0.4742 |       2.946 |      3.0995 |           10 |      0.25734 |           20 |
|   14 | Accept |      5.7151 |      10.185 |       2.946 |      3.0996 |          376 |     0.001001 |           43 |
|   15 | Accept |       3.207 |      12.683 |       2.946 |      3.0937 |          499 |     0.027394 |           18 |
|   16 | Accept |      3.8606 |      2.0742 |       2.946 |      3.0937 |           36 |     0.041427 |           12 |
|   17 | Accept |      3.2026 |      13.304 |       2.946 |       3.095 |          443 |     0.019836 |           76 |
|   18 | Accept |      3.4832 |      5.1018 |       2.946 |      3.0956 |          205 |      0.99989 |            8 |
|   19 | Accept |      5.6285 |      5.3813 |       2.946 |      3.0942 |          192 |    0.0022197 |            2 |
|   20 | Accept |      3.0896 |      5.2907 |       2.946 |      3.0938 |          188 |     0.023227 |           93 |
|====================================================================================================================|
| Iter | Eval   | Objective:  | Objective   | BestSoFar   | BestSoFar   | NumLearningC-|    LearnRate | MaxNumSplits |
|      | result | log(1+loss) | runtime     | (observed)  | (estim.)    | ycles        |              |              |
|====================================================================================================================|
|   21 | Accept |      3.2654 |       4.418 |       2.946 |      3.0951 |          167 |     0.023242 |           86 |
|   22 | Accept |      6.1202 |     0.65228 |       2.946 |      3.0904 |           16 |     0.010203 |           42 |
|   23 | Accept |      2.9963 |      9.1459 |       2.946 |      3.0985 |          440 |     0.076162 |            1 |
|   24 | Accept |      3.2801 |      4.0515 |       2.946 |       3.097 |          171 |     0.067074 |           69 |
|   25 | Accept |        3.47 |      10.895 |       2.946 |      3.0968 |          497 |      0.13969 |            6 |
|   26 | Accept |      3.4413 |      13.028 |       2.946 |      3.0945 |          497 |     0.051993 |           50 |
|   27 | Best   |      2.9095 |      4.8378 |      2.9095 |      2.9126 |          216 |     0.036052 |            1 |
|   28 | Accept |      3.0866 |      1.7361 |      2.9095 |      2.9153 |           78 |      0.24579 |            1 |
|   29 | Accept |      3.0473 |      5.7134 |      2.9095 |      2.9713 |          239 |     0.032173 |            1 |
|   30 | Accept |      3.0383 |     0.77333 |      2.9095 |       2.972 |           25 |      0.31894 |            1 |

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 171.5486 seconds
Total objective function evaluation time: 141.3017

Best observed feasible point:
    NumLearningCycles    LearnRate    MaxNumSplits
    _________________    _________    ____________

           216           0.036052          1      

Observed objective function value = 2.9095
Estimated objective function value = 2.972
Function evaluation time = 4.8378

Best estimated feasible point (according to models):
    NumLearningCycles    LearnRate    MaxNumSplits
    _________________    _________    ____________

           216           0.036052          1      

Estimated objective function value = 2.972
Estimated function evaluation time = 5.7064

Figure contains an axes object. The axes object with title Min objective vs. Number of function evaluations, xlabel Function evaluations, ylabel Min objective contains 2 objects of type line. These objects represent Min observed objective, Estimated min objective.

Mdl = 
  RegressionEnsemble
                         ResponseName: 'Y'
                CategoricalPredictors: []
                    ResponseTransform: 'none'
                      NumObservations: 94
    HyperparameterOptimizationResults: [1x1 BayesianOptimization]
                           NumTrained: 216
                               Method: 'LSBoost'
                         LearnerNames: {'Tree'}
                 ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                              FitInfo: [216x1 double]
                   FitInfoDescription: {2x1 cell}
                       Regularization: []


この損失を、最適化されていないブースティングされたモデルの損失および既定のアンサンブルの損失と比較します。

loss = kfoldLoss(crossval(Mdl,'kfold',10))
loss = 18.2889
Mdl2 = fitrensemble(X,Y, ...
    'Method','LSBoost', ...
    'Learner',templateTree('Surrogate','on'));
loss2 = kfoldLoss(crossval(Mdl2,'kfold',10))
loss2 = 29.4663
Mdl3 = fitrensemble(X,Y);
loss3 = kfoldLoss(crossval(Mdl3,'kfold',10))
loss3 = 37.7424

このアンサンブルを最適化する別の方法については、交差検証の使用によるアンサンブル回帰の最適化を参照してください。