Matlab: Error using classreg.learning.FitTemplate/fit with hyperparameter optimization of SVM

I am using Bayesian optimization (the bayesopt function) in MATLAB to tune the hyperparameters of an SVM classifier. The goal is to minimize the 10-fold cross-validation error. Here is the code that I use:
KernelFlag = 1;
c = cvpartition(size(XTrain,1),'KFold',10);
sigma = optimizableVariable('sigma',[1e-5,1e5],'Transform','log');
box = optimizableVariable('box',[1e-5,1e5],'Transform','log');
polyOrder = optimizableVariable('polyOrder',[2,4]);
fun = @(z)mysvmfunTest(z,XTrain,yTrain,c,classNames,KernelFlag);
results = bayesopt(fun,[sigma,box,polyOrder],'IsObjectiveDeterministic',true,...
    'PlotFcn',{@plotMinObjective},...
    'AcquisitionFunctionName','expected-improvement-plus');
and mysvmfunTest:
function objective = mysvmfunTest(z,X,Y,c,classNames,KernelFlag)
% Objective for bayesopt: k-fold loss of an ECOC model with SVM learners.
if KernelFlag == 1        % RBF kernel
    t = templateSVM('Standardize',1,'KernelFunction','RBF',...
        'BoxConstraint',z.box,'KernelScale',z.sigma,'RemoveDuplicates',true);
elseif KernelFlag == 2    % polynomial kernel
    t = templateSVM('Standardize',1,'KernelFunction','polynomial',...
        'BoxConstraint',z.box,'KernelScale',z.sigma,'PolynomialOrder',z.polyOrder,...
        'RemoveDuplicates',true);
else                      % linear kernel
    t = templateSVM('Standardize',1,'KernelFunction','linear',...
        'BoxConstraint',z.box,'KernelScale',z.sigma,...
        'RemoveDuplicates',true);
end
% Train on all data, then cross-validate with the precomputed partition c.
SVMModel = fitcecoc(X,Y,'Learners',t,'ClassNames',classNames);
cvModel = crossval(SVMModel,'CVPartition',c);
objective = kfoldLoss(cvModel);
end
I have used this code before with different datasets. However, when I run it on a new dataset, it throws an error:
Error using classreg.learning.FitTemplate/fit (line 249) You passed a cvpartition object for 27152 observations, but the input data have only 10395 observations. Some observations may have been removed because they have NaN values for all predictors, missing response values or zero weights. When cross-validating an existing object, consider using the RowsUsed property to determine what size partition is required.
I checked all the data, and there are no NaN or missing values. I even removed every sample that has any feature between 0 and 0.01 (all my features are positive), but I still get the same error. My guess is that some samples are too close to each other, which causes many observations to be removed, but I am not sure that is the case. Any idea where this error might come from, or any suggestion for how to solve it?

Accepted Answer

Ilya on 26 Oct 2018
You are passing ClassNames to fitcecoc - are your ClassNames a subset of all class names you have in yTrain?
Train one ECOC model using
SVMModel = fitcecoc(XTrain,yTrain,'Learners',t,'ClassNames',classNames);
and look at the size of property X in SVMModel. Does it have as many rows as XTrain does?
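The check suggested above can be sketched as follows. This is a minimal example on synthetic data: the labels 'a' to 'd' and the deliberately incomplete classNames are assumptions for illustration, not the asker's data. It compares the number of rows passed to fitcecoc with the number stored in the trained model and lists any labels that are missing from classNames.

```matlab
% Synthetic stand-in for XTrain/yTrain (assumption: not the asker's data).
rng(0);
labels = {'a';'b';'c';'d'};
XTrain = rand(400,5);
yTrain = labels(randi(4,400,1));
classNames = {'a','b','c'};            % deliberately omits 'd'

t = templateSVM('Standardize',true,'KernelFunction','RBF');
SVMModel = fitcecoc(XTrain,yTrain,'Learners',t,'ClassNames',classNames);

nPassed = size(XTrain,1);              % rows handed to fitcecoc
nStored = size(SVMModel.X,1);          % rows stored in the trained model
missingClasses = setdiff(unique(yTrain),classNames);  % labels absent from classNames
nOutside = sum(~ismember(yTrain,classNames));         % rows carrying those labels

fprintf('passed: %d, stored: %d, rows with unlisted labels: %d\n', ...
    nPassed,nStored,nOutside);
disp(missingClasses)
```

If nStored turns out smaller than nPassed, a partition built with size(XTrain,1) no longer matches the data the model holds, which is the kind of mismatch the error message describes.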

More Answers (1)

Don Mathis on 26 Oct 2018
Edited: Don Mathis on 26 Oct 2018
Maybe your use of 'RemoveDuplicates' is causing observations to be removed?
I ran your code on some synthetic data that has no duplicates in XTrain and it works fine:
XTrain = rand(1000,10);
yTrain = categorical(round(XTrain(:,1)*3));
classNames = categories(yTrain);
KernelFlag = 1;
c = cvpartition(size(XTrain,1),'KFold',10);
sigma = optimizableVariable('sigma',[1e-5,1e5],'Transform','log');
box = optimizableVariable('box',[1e-5,1e5],'Transform','log');
polyOrder = optimizableVariable('polyOrder',[2,4]);
fun = @(z)mysvmfunTest(z,XTrain,yTrain,c,classNames,KernelFlag);
results = bayesopt(fun,[sigma,box,polyOrder],'IsObjectiveDeterministic',true,...
    'PlotFcn',{@plotMinObjective},...
    'AcquisitionFunctionName','expected-improvement-plus');

function [objective] = mysvmfunTest(z,X,Y,c,classNames,KernelFlag)
if KernelFlag == 1
    t = templateSVM('Standardize',1,'KernelFunction','RBF',...
        'BoxConstraint',z.box,'KernelScale',z.sigma,'RemoveDuplicates',true);
elseif KernelFlag == 2
    t = templateSVM('Standardize',1,'KernelFunction','polynomial',...
        'BoxConstraint',z.box,'KernelScale',z.sigma,'PolynomialOrder',z.polyOrder,...
        'RemoveDuplicates',true);
else
    t = templateSVM('Standardize',1,'KernelFunction','linear',...
        'BoxConstraint',z.box,'KernelScale',z.sigma,...
        'RemoveDuplicates',true);
end
SVMModel = fitcecoc(X,Y,'Learners',t,'ClassNames',classNames);
cvModel = crossval(SVMModel,'CVPartition',c);
objective = kfoldLoss(cvModel);
end
By the way, it's probably best to declare polyOrder to be an integer:
polyOrder = optimizableVariable('polyOrder',[2,4],'Type','integer');
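To see whether duplicates are even present before suspecting 'RemoveDuplicates', one could count repeated rows directly. The sketch below uses synthetic data with five planted duplicate rows (an assumption for illustration); on real data, only the last three statements would be needed. Whether duplicate removal changes the observation count reported in the error is not established in this thread.

```matlab
% Synthetic stand-in for XTrain with five planted duplicate rows (assumption).
rng(1);
XTrain = rand(1000,10);
XTrain(end-4:end,:) = XTrain(1:5,:);

% unique(...,'rows') keeps one copy of each distinct row;
% firstIdx holds the index of the first occurrence of each.
[~,firstIdx] = unique(XTrain,'rows','stable');
nDuplicates = size(XTrain,1) - numel(firstIdx);
fprintf('%d of %d rows duplicate an earlier row\n', ...
    nDuplicates,size(XTrain,1));
```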
1 Comment
Nick Zadeh on 26 Oct 2018
Edited: Nick Zadeh on 26 Oct 2018
Thank you for your response, Don! I tried removing 'RemoveDuplicates',true, but I still get the same error message. As I mentioned, I have used this code on different datasets and never got an error. Do you have any idea what might cause the problem in this case? Is there any part of this code that automatically removes samples that are too close? Or do you know of any existing bug?
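One possible workaround, assuming the check from the accepted answer shows that classNames covers only part of the labels in yTrain (the thread does not confirm this), is to restrict the data to the listed classes before building the partition, so that the cvpartition and the fitted model see the same number of observations. The synthetic data and the names keep, XUsed and yUsed are hypothetical.

```matlab
% Synthetic stand-in for XTrain/yTrain (assumption: not the asker's data).
rng(0);
labels = {'a';'b';'c';'d'};
XTrain = rand(400,5);
yTrain = labels(randi(4,400,1));
classNames = {'a','b','c'};            % subset of the labels in yTrain

% Keep only the rows whose label is listed in classNames.
keep  = ismember(yTrain,classNames);
XUsed = XTrain(keep,:);
yUsed = yTrain(keep);

% Build the partition from the filtered data, not from size(XTrain,1).
c = cvpartition(numel(yUsed),'KFold',10);

t = templateSVM('Standardize',true,'KernelFunction','RBF');
SVMModel  = fitcecoc(XUsed,yUsed,'Learners',t,'ClassNames',classNames);
cvModel   = crossval(SVMModel,'CVPartition',c);
objective = kfoldLoss(cvModel);
fprintf('observations: %d, 10-fold loss: %.3f\n',numel(yUsed),objective);
```

In the original script, this would mean passing the filtered data and the new partition to mysvmfunTest in place of XTrain, yTrain and c.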
