Imbalanced data classification with boosting algorithm

soudeh on 10 Nov 2013
Commented: Ilya on 10 Nov 2013
I am working on a binary classification problem. The dataset is imbalanced: 92% of the labels are 'false' and 8% are 'true'. There are 18 features and only 650 data points. I want to use a boosting algorithm in MATLAB, such as 'GentleBoost', to solve this problem. I set the prior to uniform as follows:
ada = fitensemble(Xtrain,Ytrain,'GentleBoost',10,'Tree','LearnRate',0.1, 'prior', 'uniform');
but the performance is consistently poor. How should I set the parameters? Is it necessary to set a cost, and if so, how can I do this? Is there any classifier that performs better than this?
1 Comment
Ilya on 10 Nov 2013
Was my answer on stack exchange helpful?


Accepted Answer

the cyclist on 10 Nov 2013
I think it is very difficult to assess this, because it is possible that your dataset is simply difficult to classify. (For example, maybe it is actually almost random, independent of the features.)
My suggestion would be to create an artificial dataset in which you know that the features determine the response. Test the syntax with that dataset, to ensure that you are coding it correctly.
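A minimal sketch of such a check, assuming the same size and class balance as in the question (650 points, 18 features, about 8% 'true'). The labeling rule, the 70/30 hold-out split, and the random seed are hypothetical choices made only for illustration; the fitensemble call is the one from the question.

```matlab
% Synthetic check: the labels are a known function of the features.
rng(1);                                   % reproducible random numbers
n = 650;  p = 18;
X = randn(n, p);

% Hypothetical rule: the response depends only on features 1 and 2.
score = X(:,1) + 0.5*X(:,2);
sortedScore = sort(score, 'descend');
threshold = sortedScore(round(0.08*n));   % top 8% of scores become true
Y = score >= threshold;                   % logical labels, about 8% true

% Stratified hold-out split (assumed 70/30) so both classes appear in each part.
cv = cvpartition(Y, 'HoldOut', 0.3);
Xtrain = X(training(cv), :);  Ytrain = Y(training(cv));
Xtest  = X(test(cv), :);      Ytest  = Y(test(cv));

% Same call as in the question.
ada = fitensemble(Xtrain, Ytrain, 'GentleBoost', 10, 'Tree', ...
    'LearnRate', 0.1, 'prior', 'uniform');

% Look at per-class results; overall accuracy is misleading at 92/8 imbalance.
Ypred = predict(ada, Xtest);
C = confusionmat(Ytest, Ypred, 'order', [false true]);
recallTrue = C(2,2) / sum(C(2,:));        % fraction of true cases recovered
disp(C);
fprintf('Recall on the true class: %.2f\n', recallTrue);
```

If the recall on the 'true' class is poor even here, the problem is more likely in the code or the parameter settings than in the real data. If it is good here but poor on the real data, the features may simply carry little information about the response.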
