MATLAB TreeBagger 'cost' argument not working as it does with the similar function fitensemble
The cost matrices of my TreeBagger model and of fitensemble (Bag method) are both [0 8;1 0] for binary classification. The confusion matrix from fitensemble shows that the classification tends to turn in favor of the costly class (like [100 0; 20 80], favoring false negatives), but the same does not hold for TreeBagger. I have tried it on 3 datasets, and TreeBagger seems to ignore the cost: the confusion matrix is the same as when I omit the cost argument entirely. Is there a problem with my code, or is it TreeBagger? (BTW, the problem is not a misinterpretation of the confusion matrix; it just doesn't work.)
My Code
(TREEBAGGER WHICH DOES NOT WORK)
RF=TreeBagger(150,Xtrain,Ytrain,'oobpred','on','cost',[0 8;1 0])
(FITENSEMBLE WHICH WORKS)
Bag = fitensemble(Xtrain,Ytrain,'Bag',150,'Tree','type','classification','cost',[0 8;1 0]);
Thank you for your help.
0 Comments
Answers (1)
Ilya
27 Aug 2012
fitensemble and TreeBagger use different approaches to incorporating the cost matrix.
fitensemble uses the cost matrix to update the class prior probabilities. This scheme is described here: http://www.mathworks.com/help/toolbox/stats/bsvjye9.html#bsw73lr. The data are then bootstrapped for every tree using these class priors as multinomial sampling probabilities. That is, every bootstrap replica is dominated by observations of the class with the higher misclassification cost.
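For the binary case, that adjustment reweights each prior by the off-diagonal cost in its row and renormalizes. A minimal sketch of the idea (the cost and prior values below are illustrative assumptions, not taken from that page):

% Fold misclassification costs into the class priors (binary case)
cost  = [0 8; 1 0];   % cost(i,j): cost of predicting class j for a true class-i observation
prior = [0.5 0.5];    % original class priors (assumed equal here)
w = [cost(1,2)*prior(1), cost(2,1)*prior(2)];
tildePrior = w/sum(w) % -> [0.8889 0.1111]; class 1 dominates the bootstrap sampling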
TreeBagger bootstraps data using an unweighted scheme. Every replica has the same proportions, on average, of the classes as the original data. The prior probabilities and the cost matrix are passed to the individual trees.
Both approaches are sensitive to the misclassification cost. Because the bagged trees are typically split to the finest level (minleaf=1), the predicted posterior probabilities from each tree are either 0 or 1. The scheme in which bootstrap replicas are dominated by observations of a certain class then shows more sensitivity to the class priors and misclassification cost, simply because there are more leaves with observations from the class with the higher misclassification cost.
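To see the difference in practice, here is a small self-contained comparison on synthetic data (the data, the number of trees, and the cost values below are illustrative assumptions; exact counts will vary by release and random seed):

% Compare how the two functions respond to the same cost matrix
rng(1);                                   % reproducible synthetic data
N = 500;
X = [randn(N,2); randn(N,2)+1.25];        % two overlapping Gaussian classes
Y = [zeros(N,1); ones(N,1)];
cost = [0 8; 1 0];
% fitensemble: the cost is folded into the priors before bootstrapping
ens = fitensemble(X,Y,'Bag',150,'Tree','type','classification','cost',cost);
confusionmat(Y,predict(ens,X))            % errors shift strongly away from the costly class
% TreeBagger: uniform bootstrap; the cost is passed to the individual trees
tb = TreeBagger(150,X,Y,'cost',cost);
confusionmat(Y,str2double(predict(tb,X))) % typically a much weaker shift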
If you want to have ultimate control of how misclassification costs are accounted for, you could train a TreeBagger without passing a cost matrix or class prior probabilities and then adjust the predicted posterior probabilities to include the knowledge of cost. One recipe for such an adjustment is described here: http://www.mathworks.com/help/toolbox/stats/bq_u8tb.html#bs31lmr
3 Comments
Ilya
27 Aug 2012
The cost matrix does function in TreeBagger, but its effect is less noticeable than in fitensemble. It may be small for your data but not so small for other datasets. The random forest algorithm amounts to bagging plus random selection of predictors for every split; within that framework, and for many other learning algorithms, the cost matrix can be accommodated in a number of ways. What I suggested is another popular solution.
The formula shown in the 2nd link in my reply finds the class with minimal expected misclassification cost:
% load X and Y
b = TreeBagger(100,X,Y,'oobpred','on');
[~,posterior] = oobPredict(b);
cost = [0 8; 1 0]; % misclassification cost matrix: cost(i,j) = cost of predicting class j for a true class-i observation
expCost = posterior*cost;
Every row in expCost now holds the expected misclassification cost of predicting each class for that observation. You can assign the class label by choosing the class with the minimal cost, that is
[~,classIndex] = min(expCost,[],2);
Yfit = b.ClassNames(classIndex);
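As a quick sanity check, you can compare the cost-adjusted labels against the default out-of-bag labels; this assumes Y holds the same label type as b.ClassNames (a cell array of character vectors):

% Confusion matrices before and after the cost adjustment
confusionmat(Y,oobPredict(b)) % default labels, minimal classification error
confusionmat(Y,Yfit)          % cost-adjusted labels; errors move away from the costly class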