MATLAB TreeBagger 'cost' argument not working as it does with the similar function fitensemble
The cost matrices of my TreeBagger model and of fitensemble (Bag method) are both [0 8;1 0] for binary classification. The confusion matrix from fitensemble shows that the classification tends to turn in favor of the costly class (like [100 0; 20 80], favoring false negatives), but the same does not hold for TreeBagger. I have tried it on 3 datasets, and TreeBagger seems to ignore the cost: the confusion matrix is the same as when I omit the cost argument entirely. Is there a problem with my code, or is it TreeBagger? (BTW, the problem is not a misinterpretation of the confusion matrix; it just doesn't work.)
My Code
(TREEBAGGER WHICH DOES NOT WORK)
RF=TreeBagger(150,Xtrain,Ytrain,'oobpred','on','cost',[0 8;1 0])
(FITENSEMBLE WHICH WORKS)
Bag = fitensemble(Xtrain,Ytrain,'Bag',150,'Tree','type','classification','cost',[0 8;1 0]);
Thank you for your help.
0 Comments
Answers (1)
Ilya
27 Aug 2012
fitensemble and TreeBagger use different approaches to incorporating the cost matrix.
fitensemble uses the cost matrix to update the class prior probabilities. This scheme is described here: http://www.mathworks.com/help/toolbox/stats/bsvjye9.html#bsw73lr. The data are then bootstrapped for every tree using these class priors as multinomial sampling probabilities. That is, every bootstrap replica is dominated by observations of the class with the higher misclassification cost.
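For the binary case, that adjustment reweights each prior by the off-diagonal cost in its row and renormalizes. A minimal sketch of the idea (the cost and prior values below are illustrative assumptions, not taken from that page):

% Fold misclassification costs into the class priors (binary case)
cost  = [0 8; 1 0];   % cost(i,j): cost of predicting class j for a true class-i observation
prior = [0.5 0.5];    % original class priors (assumed equal here)
w = [cost(1,2)*prior(1), cost(2,1)*prior(2)];
tildePrior = w/sum(w) % -> [0.8889 0.1111]; class 1 dominates the bootstrap sampling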
TreeBagger bootstraps data using an unweighted scheme. Every replica has the same proportions, on average, of the classes as the original data. The prior probabilities and the cost matrix are passed to the individual trees.
Both approaches are sensitive to the misclassification cost. Because the bagged trees are typically split to the finest level (minleaf=1), the predicted posterior probabilities from each tree are either 0 or 1. The scheme in which bootstrap replicas are dominated by observations of a certain class then shows more sensitivity to the class priors and misclassification cost, simply because there are more leaves with observations from the class with the higher misclassification cost.
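To see the difference in practice, here is a small self-contained comparison on synthetic data (the data, the number of trees, and the cost values below are illustrative assumptions; exact counts will vary by release and random seed):

% Compare how the two functions respond to the same cost matrix
rng(1);                                   % reproducible synthetic data
N = 500;
X = [randn(N,2); randn(N,2)+1.25];        % two overlapping Gaussian classes
Y = [zeros(N,1); ones(N,1)];
cost = [0 8; 1 0];
% fitensemble: the cost is folded into the priors before bootstrapping
ens = fitensemble(X,Y,'Bag',150,'Tree','type','classification','cost',cost);
confusionmat(Y,predict(ens,X))            % errors shift strongly away from the costly class
% TreeBagger: uniform bootstrap; the cost is passed to the individual trees
tb = TreeBagger(150,X,Y,'cost',cost);
confusionmat(Y,str2double(predict(tb,X))) % typically a much weaker shift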
If you want to have ultimate control of how misclassification costs are accounted for, you could train a TreeBagger without passing a cost matrix or class prior probabilities and then adjust the predicted posterior probabilities to include the knowledge of cost. One recipe for such an adjustment is described here: http://www.mathworks.com/help/toolbox/stats/bq_u8tb.html#bs31lmr
3 Comments
Ilya
27 Aug 2012
The cost matrix does function in TreeBagger, but its effect is less noticeable than in fitensemble. It may be small for your data but not so small for other datasets. The random forest algorithm amounts to bagging plus random selection of predictors for every split; within that framework, and for many other learning algorithms, the cost matrix can be accommodated in a number of ways. What I suggested is another popular solution.
The formula shown in the 2nd link in my reply finds the class with minimal expected misclassification cost:
% load X and Y
b = TreeBagger(100,X,Y,'oobpred','on');
[~,posterior] = oobPredict(b);
cost = [0 8; 1 0]; % misclassification cost matrix: cost(i,j) = cost of predicting class j for a true class-i observation
expCost = posterior*cost;
Every row in expCost now holds the expected misclassification cost of predicting each class for that observation. You can assign the class label by choosing the class with the minimal cost, that is
[~,classIndex] = min(expCost,[],2);
Yfit = b.ClassNames(classIndex);
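As a quick sanity check, you can compare the cost-adjusted labels against the default out-of-bag labels; this assumes Y holds the same label type as b.ClassNames (a cell array of character vectors):

% Confusion matrices before and after the cost adjustment
confusionmat(Y,oobPredict(b)) % default labels, minimal classification error
confusionmat(Y,Yfit)          % cost-adjusted labels; errors move away from the costly class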