What does it mean for a tree in a TreeBagger ensemble to have >80% error? What is the best way to reduce error?

Question

Austin Jordan on 22 Apr 2017

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/336732-what-does-it-mean-for-a-tree-in-a-treebagger-ensemble-to-have-to-have-80-error-what-is-the-best-w

Edited: Ilya on 23 Apr 2017

I originally ran my data through the code at this link and got an average error rate of ~3%. When I realized I couldn't easily calculate variable importance with that code, I switched over to TreeBagger.

RF_ensemble = TreeBagger(ntrees,meanValuesPerPitcher,string(pitcherClusters),'Method','classification',...
    'OOBPredictorImportance','on');

oobError(RF_ensemble,'Mode','individual') returns a vector with values ranging from 0.7 to 0.94.

oobError(RF_ensemble,'Mode','ensemble') returns 0.44.

I would rather go with the TreeBagger function since I'm more confident it is correct, but I don't understand how or why the error rate is so high.

My data is a 50x14 matrix (50 observations with 14 variables), and my labels vector is a 50x1 numeric vector with a cluster number 1-10 for each observation.
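For reference, the setup described above can be reproduced roughly as follows. This is only a sketch on synthetic data of the same shape: the names X, Y, and B, the random predictors, the balanced labels, and the value of ntrees are all assumptions, since the actual data is not shown in the post. It requires Statistics and Machine Learning Toolbox.

```matlab
% Synthetic stand-in for the data described above (assumption: random
% predictors and balanced labels; the real data is not shown in the post).
rng(1);                                  % reproducible random numbers
X = randn(50,14);                        % 50 observations, 14 variables
Y = repmat((1:10)',5,1);                 % cluster labels 1-10, 5 per class
ntrees = 100;                            % assumed ensemble size

% Same call pattern as in the question.
B = TreeBagger(ntrees,X,string(Y),'Method','classification',...
    'OOBPredictorImportance','on');

errIndividual = oobError(B,'Mode','individual');  % one OOB error per tree
errEnsemble   = oobError(B,'Mode','ensemble');    % one OOB error for the whole ensemble
errCumulative = oobError(B);                      % default 'cumulative': error vs. number of trees

fprintf('Per-tree OOB error: min %.2f, max %.2f\n', ...
    min(errIndividual), max(errIndividual));
fprintf('Ensemble OOB error: %.2f\n', errEnsemble);
```

Because the predictors here are pure noise, the errors from this sketch say nothing about the real data; it only shows how the three oobError modes relate to one another.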

I must be doing something wrong because there is no way the error is this high, but I don't know what to do. Let me know if more information would be helpful.


Answer 1

Ilya on 23 Apr 2017

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/336732-what-does-it-mean-for-a-tree-in-a-treebagger-ensemble-to-have-to-have-80-error-what-is-the-best-w#answer_264065

It's hard to identify the source of the discrepancy without understanding what the package at that link does and how you used it. However, a 3% OOB error for 50 observations with 10 classes seems unlikely. A bootstrap replica contains, on average, about 63% of the distinct observations in the input set. With such a low observation-per-class ratio, you grow trees on datasets in which some, possibly many, classes have no observations or only one observation. Correct prediction for a class not represented in the training set is impossible, and correct prediction for a class with only one observation in the training set is possible but rather unlikely. Given that a random guess among 10 classes is correct with probability 10% (that is, 90% error), a 44% error for such a small set is not necessarily bad.

My guess is that with the other package you computed something other than the classification OOB error: perhaps the training error, or perhaps you somehow performed regression instead of classification. You could also experiment with the 'NumPredictorsToSample' option and see if it improves your results.
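The bootstrap argument above can be checked with a small simulation in base MATLAB. This is a sketch under an assumption: the 50 observations are spread evenly over the 10 classes (5 per class), which the thread does not state. The names labels, nRep, fracInBag, and nWeakClass are illustrative only.

```matlab
% Assumption: 50 observations spread evenly over 10 classes (5 per class);
% the actual class sizes are not given in the thread.
rng(0);                                   % reproducible random numbers
n = 50; k = 10;
labels = repmat((1:k)',n/k,1);            % hypothetical balanced labels
nRep = 2000;                              % number of simulated bootstrap replicas

fracInBag  = zeros(nRep,1);               % fraction of distinct observations drawn
nWeakClass = zeros(nRep,1);               % classes with 0 or 1 distinct in-bag observation
for r = 1:nRep
    idx = unique(randi(n,n,1));           % distinct observations in this replica
    fracInBag(r)  = numel(idx)/n;
    counts        = accumarray(labels(idx),1,[k 1]);  % distinct in-bag count per class
    nWeakClass(r) = sum(counts <= 1);
end

fprintf('Mean in-bag fraction: %.3f (theory: %.3f)\n', ...
    mean(fracInBag), 1-(1-1/n)^n);
fprintf('Mean number of classes with at most one in-bag observation: %.2f\n', ...
    mean(nWeakClass));
fprintf('Error of a uniform random guess over %d classes: %.2f\n', k, 1-1/k);
```

With unbalanced classes the number of poorly represented classes per replica would likely be higher, so the balanced case is, if anything, the optimistic one.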

2 Comments

Austin Jordan on 23 Apr 2017

I was able to address this issue but have come across another, which I listed as a separate post. Thank you for your help!

Ilya on 23 Apr 2017

Edited: Ilya on 23 Apr 2017

I don't normally answer on Stack Exchange. It would help if you posted questions here. The answer to your question is in the doc for the OOBPermutedPredictorDeltaError property (boldface is mine):

This measure is computed for every tree, then averaged over the entire ensemble and divided by the standard deviation over the entire ensemble.

Because of that division, the range is from -inf to +inf.
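To illustrate why the division removes any bound on the range, here is a toy calculation in base MATLAB. The per-tree values are made-up numbers, not output from TreeBagger, and normalizedScore is a hypothetical helper that simply follows the quoted description (mean over trees divided by standard deviation over trees).

```matlab
% Hypothetical per-tree changes in OOB error after permuting one predictor
% (made-up numbers for illustration, not output from TreeBagger).
deltaA = [0.020 0.021 0.019 0.020];    % small but very consistent increase
deltaB = [0.30 -0.25 0.35 -0.20];      % large but inconsistent changes
deltaC = -deltaA;                      % small but very consistent decrease

% Average over trees divided by the standard deviation over trees,
% following the quoted description.
normalizedScore = @(d) mean(d)/std(d);

fprintf('A: %.2f  B: %.2f  C: %.2f\n', ...
    normalizedScore(deltaA), normalizedScore(deltaB), normalizedScore(deltaC));
```

A predictor whose per-tree effect is tiny but nearly identical across trees gets a large score in magnitude, positive or negative, because the standard deviation in the denominator is close to zero.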
