How can I make this piece of code faster?
Sudipta Banerjee
23 March 2018
Commented: Sudipta Banerjee
31 March 2018
I am new to machine learning. I am trying to select 300 features from my dataset (77×7071) using sequentialfs. I have tried scaling the data with a scaling function, but the selection takes a very long time. Can anybody help me, please? Here, checkknn is a function handle that implements a KNN classifier and returns the classifier's correctRate. Here's the code:
clc;
clear;                                       % 'clear' suffices; 'clear all' also purges cached functions
x = load('dataset.txt');
y = x(:,end);                                % class labels are in the last column
scaled_data = Scale(x(:,1:end-1), -1, 1);    % user-defined scaling to [-1, 1]
x = scaled_data;
x(:,end+1) = y;
c = cvpartition(y, 'KFold', 10);             % 10-fold cross-validation partition
opts = statset('display', 'iter');
fun = @checkknn;
[fs, history] = sequentialfs(fun, x, y, 'cv', c, 'options', opts, ...
    'direction', 'backward', 'nfeatures', 300);
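For reference, sequentialfs calls its criterion function with four arguments (training predictors, training labels, test predictors, test labels) and minimizes the returned value, so checkknn should return a count of misclassifications rather than correctRate. The sketch below is only an assumption about what checkknn might look like, built on fitcknn; the original checkknn is not shown in the question:

```matlab
function crit = checkknn(XTRAIN, ytrain, XTEST, ytest)
% Criterion for sequentialfs: number of misclassified test observations.
% sequentialfs MINIMIZES this value, so return errors, not an accuracy rate.
mdl  = fitcknn(XTRAIN, ytrain);   % KNN with the default (Euclidean) metric
yhat = predict(mdl, XTEST);
crit = sum(yhat ~= ytest);        % misclassification count for this fold
end
```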
Thanks in advance!
Accepted Answer
Bernhard Suhm
27 March 2018
Sequential feature selection is expected to take a long time, especially with this many features. If your hardware has multiple cores, you can speed up the computation by running the cross-validation folds in parallel, but that requires the Parallel Computing Toolbox.
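Concretely, parallel execution is switched on through the same statset options structure already passed to sequentialfs. A sketch, assuming the Parallel Computing Toolbox is installed and a local pool can be started:

```matlab
% Start a pool of workers (once per session); requires Parallel Computing Toolbox.
parpool;

% 'UseParallel' tells sequentialfs to distribute the cross-validation folds.
opts = statset('display', 'iter', 'UseParallel', true);
[fs, history] = sequentialfs(fun, x, y, 'cv', c, 'options', opts, ...
    'direction', 'backward', 'nfeatures', 300);
```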
The right choice of feature-selection method depends on the classification or regression technique you apply. Here you use nearest neighbors with the default distance metric, which is Euclidean. With 77 observations and 7000+ features, Euclidean distance is a poor choice; cosine distance could perform much better in this setting.
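If the classifier inside checkknn is built with fitcknn, switching the metric is a one-line change; XTRAIN and ytrain below stand in for whatever training data the criterion function receives:

```matlab
% Sketch: KNN with cosine distance instead of the Euclidean default.
mdl  = fitcknn(XTRAIN, ytrain, 'Distance', 'cosine');
yhat = predict(mdl, XTEST);
```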
Or do you really have only 77 observations and 7000+ features? Don't expect a solid model from such a small, feature-heavy dataset!
If your dataset is indeed that "wide", linear techniques are best. A regularized linear discriminant could work well, and we have a documentation example showing how to use it for feature selection (see LDA-with-variable-selection for more information). Another good option would be a linear model with a lasso penalty, fit by fitclinear using the SpaRSA solver. Both of these are linear techniques, however. Did you make a deliberate choice to avoid linear techniques?
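The lasso route doubles as feature selection, since the penalty drives most coefficients to exactly zero. A sketch of that approach; the Lambda value here is an arbitrary illustration and would normally be tuned by cross-validation:

```matlab
% Lasso-penalized linear classifier via the SpaRSA solver.
% x: scaled predictor matrix (observations in rows), y: class labels.
mdl = fitclinear(x, y, 'Learner', 'logistic', ...
    'Regularization', 'lasso', 'Solver', 'sparsa', 'Lambda', 1e-2);

% Nonzero coefficients mark the features the model retained.
selected = find(mdl.Beta ~= 0);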
Finally, there is no need to copy the class-label variable into the last column of the predictor matrix before running sequentialfs.
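In other words, the setup can keep predictors and labels separate throughout; a sketch of the simplified call, where data stands for the matrix loaded from dataset.txt:

```matlab
data = load('dataset.txt');
y = data(:,end);                       % labels stay in their own variable
x = Scale(data(:,1:end-1), -1, 1);     % predictors only, no label column appended
c = cvpartition(y, 'KFold', 10);
[fs, history] = sequentialfs(@checkknn, x, y, 'cv', c, ...
    'options', statset('display', 'iter'), ...
    'direction', 'backward', 'nfeatures', 300);
```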