Cannot test model using cross validation using crossval and kFoldLoss

Question

I am very new to machine learning, but by following my course materials I have been able to fit a random forest to my data and get an error rate that makes sense (it beats a naive prediction and improves with better-chosen features).

My predictor matrix (z-scored; this is a subset) is:

  -0.0767889379600161 1.43666113298993    4.83220576535887    4.59650550158967
  -0.0767889379600161 -0.114493297876403  -0.217229093905045  -0.187718580390875
  -0.0767889379600161 -0.114493297876403  -0.217229093905045  -0.187718580390875
  -0.0767889379600161 -0.114493297876403  -0.187208672625236  -0.00955946380486005
  -0.0767889379600161 -0.114493297876403  -0.217229093905045  -0.187718580390875
  -0.0767889379600161 -0.114493297876403  -0.217229093905045  -0.187718580390875
  7.39424877391969    1.12643024681666    -0.145180082833503  -0.187718580390875
  -0.0767889379600161 2.05712290533646    -0.211225009649084  -0.187718580390875
  -0.0767889379600161 0.195737588296863   1.35584098115696    0.229434473078818

And my response is:

  'Highly Active'
  'Inactive'
  'Inactive'
  'Inactive'
  'Inactive'
  'Highly Active'
  'Highly Active'
  'Highly Active'
  'Inactive'
  'Highly Active'
  'Inactive'
  'Highly Active'

My previous method was:

  rng default
  c = cvpartition(catresponse, 'HoldOut', 0.3);
  
  % Extract the indices of the training and test sets.
  trainIdx = training(c);
  testIdx = test(c);
  % Create the training and test data sets.
  XTrain = predictormatrix(trainIdx, :);
  XTest = predictormatrix(testIdx, :);
  yTrain = catresponse(trainIdx);
  yTest = catresponse(testIdx);
  
  % Create an ensemble of 100 trees.
  forestModel = fitensemble(XTrain, yTrain, 'Bag', 100,...
                              'Tree', 'Type', 'Classification'); 
  
  % Predict and evaluate the ensemble model.
  forestPred = predict(forestModel, XTest);
  % errs = forestPred ~= yTest;
  % testErrRateForest = 100*sum(errs)/numel(errs);
  % display(testErrRateForest)
  
  % Perform 10-fold cross validation.
  cvModel = crossval(forestModel); % 10-fold is default 
  cvErrorForest = 100*kfoldLoss(cvModel);
  display(cvErrorForest)
  
  % Confusion matrix.
  C = confusionmat(yTest, forestPred);
  figure(figOpts{:})  % figOpts: cell array of figure options, not defined in this snippet
  imagesc(C)
  colorbar
  colormap('cool')
  [Xgrid, Ygrid] = meshgrid(1:size(C, 1));
  Ctext = num2str(C(:));
  text(Xgrid(:), Ygrid(:), Ctext)
  labels = categories(catresponse);
  set(gca, 'XTick', 1:size(C, 1), 'XTickLabel', labels, ...
           'YTick', 1:size(C, 1), 'YTickLabel', labels, ...
           'XTickLabelRotation', 30, ...
           'TickLabelInterpreter', 'none')
  xlabel('Predicted Class')
  ylabel('Known Class')
  title('Forest Confusion Matrix')


Questions:

* Am I doing my cross-validation the right way? My kfoldLoss call is based on a model built on the training part of the 30% holdout split, not on something like cvpartition with KFold, so I am unsure what kfoldLoss is actually calculating here.
* With the above code, is my confusion matrix based on the cross-validation, or on the simpler holdout split?
* How can I alter my code so that the whole model is "cross-validated"?
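
One possible way to approach the third question, offered as a minimal sketch rather than a definitive answer: fit the ensemble on all observations, cross-validate that model, and build the confusion matrix from the out-of-fold predictions returned by kfoldPredict. In the posted code, crossval(forestModel) appears to partition only the data stored in forestModel (the 70% training split), while confusionmat(yTest, forestPred) uses the holdout predictions. The synthetic data and the names forestAll, cvAll and cvPred below are assumptions made for illustration; predictormatrix and catresponse stand in for the real variables.

```matlab
% Synthetic stand-in data (hypothetical; replace with your own variables).
rng default
n = 120;
predictormatrix = zscore(randn(n, 4));
labels = repmat({'Inactive'}, n, 1);
labels(predictormatrix(:, 1) + predictormatrix(:, 2) > 0) = {'Highly Active'};
catresponse = categorical(labels);

% Fit the ensemble on ALL observations (no holdout split).
forestAll = fitensemble(predictormatrix, catresponse, 'Bag', 100, ...
                        'Tree', 'Type', 'Classification');

% 10-fold cross-validation over the data stored in forestAll.
cvAll = crossval(forestAll, 'KFold', 10);
cvErrorForest = 100 * kfoldLoss(cvAll);   % misclassification rate in percent
disp(cvErrorForest)

% Out-of-fold predictions: each observation is predicted by the fold model
% that did not see it during training.
cvPred = kfoldPredict(cvAll);

% Confusion matrix based on the cross-validation rather than a holdout set.
C = confusionmat(catresponse, cvPred);
disp(C)
```

With this layout, C can be passed to the same imagesc plotting code as before, and every observation contributes exactly one count to the matrix.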
