How to decide the value of 'ndim' when using [residuals,reconstructed] = pcares(X,ndim) for feature size reduction using PCA?

Question

ipwork, 27 June 2015

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/225530-how-to-decide-value-of-ndim-when-using-residuals-reconstructed-pcares-x-ndim-for-feature-s

Answered: Ayush Aniket, 20 January 2025

I am working on feature classification using KNN and SVM. The data set is 2000 images, and the features for each image are large histograms (of various bin sizes). I read that the above function can be used, but my question is: how do I choose the value of ndim so that the best features are retained?


Answer 1

Ayush Aniket, 20 January 2025

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/225530-how-to-decide-value-of-ndim-when-using-residuals-reconstructed-pcares-x-ndim-for-feature-s#answer_1557767

In MATLAB, you can experiment with different values of ndim (the number of retained principal components) to see which works best for your classification task. A suitable ndim captures enough of the variance in the data while keeping the feature set small enough to avoid overfitting.

To determine the value, apply PCA to the feature set and plot the cumulative variance explained by the principal components:

% Assuming 'features' is a matrix of size [num_samples, num_features]
% where each row corresponds to an image, and each column is a feature
% Perform PCA on the feature set
[coeff, score, latent, ~, explained] = pca(features);
% Plot the cumulative explained variance
cumulative_variance = cumsum(explained);  % Cumulative sum of explained variance
figure;
plot(1:length(cumulative_variance), cumulative_variance, 'b-', 'LineWidth', 2);
xlabel('Number of Principal Components');
ylabel('Cumulative Explained Variance (%)');
title('Explained Variance vs. Number of Principal Components');
grid on;
% Decide on the number of components to keep, based on a variance threshold
threshold = 95;  % Retain 95% of the variance
% Smallest number of components whose cumulative variance reaches the threshold
ndim = find(cumulative_variance >= threshold, 1);

A typical choice of ndim retains enough components to explain 95% or 99% of the variance. The best value ultimately depends on your classification performance (e.g., using KNN or SVM) and on the trade-off between reducing dimensionality and retaining useful information.
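Once ndim is chosen, the reduced features for the classifier are the first ndim columns of the PCA scores. The output of pcares(X,ndim) has the same number of columns as X (it is a reconstruction in the original feature space), so it does not shrink the feature size by itself. The following is only a rough sketch on synthetic data: the matrix sizes, the 95% threshold, and the variable names reducedFeatures, newFeatures, and newReduced are assumptions made for illustration, not part of the original question.

```matlab
rng(1);                                        % reproducible synthetic data
features = randn(100, 20) * randn(20, 20);     % hypothetical [num_samples x num_features]

% PCA; the sixth output mu is the column mean that pca subtracts
[coeff, score, ~, ~, explained, mu] = pca(features);
ndim = find(cumsum(explained) >= 95, 1);       % 95% threshold, as in the answer

% Reduced feature set to pass to KNN or SVM: one row per sample, ndim columns
reducedFeatures = score(:, 1:ndim);

% pcares keeps the original number of columns (rank-ndim approximation of X)
[residuals, reconstructed] = pcares(features, ndim);

% As I understand it, this should agree with 'reconstructed' up to rounding
manualReconstruction = reducedFeatures * coeff(:, 1:ndim)' + mu;
maxDiff = max(abs(reconstructed(:) - manualReconstruction(:)));

% Project new (e.g. test) observations with the same mean and coefficients
newFeatures = features(1:5, :) + 0.1 * randn(5, 20);   % hypothetical new samples
newReduced = (newFeatures - mu) * coeff(:, 1:ndim);
```

The expression newFeatures - mu relies on implicit expansion, which needs R2016b or later; on older releases bsxfun(@minus, newFeatures, mu) does the same thing.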

After determining ndim, evaluate the performance of your classifier (KNN or SVM) using cross-validation. This helps ensure that the reduced feature set doesn't hurt your model's performance. See the following documentation page for the steps to perform cross-validation:

https://www.mathworks.com/help/stats/fitcknn.html#buecexu-1
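One possible way to combine the two steps is to cross-validate the classifier over a few candidate values of ndim and keep the one with the lowest error. The sketch below uses synthetic two-class data and a KNN classifier; the data, the candidate list, the 5 folds, and the 5 neighbors are all assumptions chosen for illustration. PCA is fitted on the training part of each fold only, so the held-out samples do not influence the projection.

```matlab
rng(0);                                        % reproducible synthetic data
numSamples = 200; numFeatures = 50;
labels = [ones(numSamples/2, 1); 2 * ones(numSamples/2, 1)];
features = randn(numSamples, numFeatures);
% Shift the first five features of class 2 so the classes are separable
features(labels == 2, 1:5) = features(labels == 2, 1:5) + 2;

candidates = [2 5 10 20];                      % hypothetical ndim values to try
cvp = cvpartition(labels, 'KFold', 5);         % stratified 5-fold partition
cvError = zeros(size(candidates));

for c = 1:numel(candidates)
    k = candidates(c);
    foldErr = zeros(cvp.NumTestSets, 1);
    for f = 1:cvp.NumTestSets
        trainIdx = training(cvp, f);
        testIdx = test(cvp, f);
        % Fit PCA on the training fold only
        [coeff, scoreTrain, ~, ~, ~, mu] = pca(features(trainIdx, :));
        % Project the held-out fold with the training mean and coefficients
        scoreTest = (features(testIdx, :) - mu) * coeff(:, 1:k);
        mdl = fitcknn(scoreTrain(:, 1:k), labels(trainIdx), 'NumNeighbors', 5);
        pred = predict(mdl, scoreTest);
        foldErr(f) = mean(pred ~= labels(testIdx));   % misclassification rate
    end
    cvError(c) = mean(foldErr);
end

[~, best] = min(cvError);
bestNdim = candidates(best);                   % candidate with lowest CV error
```

Swapping fitcknn for fitcsvm (or fitcecoc for more than two classes) gives the same comparison for an SVM.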
