How to decide the value of 'ndim' when using [residuals,reconstructed] = pcares(X,ndim) for feature size reduction using PCA?

ipwork on 27 June 2015
Answered: Ayush Aniket on 20 January 2025
I am working on feature classification using KNN and SVM. The data set has 2000 images, and the features for each image are large histograms (with various bin sizes). I read that the above function can be used, but my question is: how do I decide the value of ndim so that the best features are retained?

Answers (1)

Ayush Aniket on 20 January 2025
In MATLAB, you can experiment with different values of ndim (the number of retained principal components) to see which works best for your classification task. When choosing ndim, aim to capture enough of the variance in the data while keeping the dimensionality low enough to avoid overfitting.
To determine the value, apply PCA to the feature set and plot the cumulative percentage of variance explained by the principal components:
```matlab
% Assuming 'features' is a matrix of size [num_samples, num_features]
% where each row corresponds to an image, and each column is a feature
% Perform PCA on the feature set
[coeff, score, latent, ~, explained] = pca(features);
% Plot the cumulative explained variance
cumulative_variance = cumsum(explained); % Cumulative sum of explained variance
figure;
plot(1:length(cumulative_variance), cumulative_variance, 'b-', 'LineWidth', 2);
xlabel('Number of Principal Components');
ylabel('Cumulative Explained Variance (%)');
title('Explained Variance vs. Number of Principal Components');
grid on;
% Decide on the number of components to keep, based on a threshold of variance
threshold = 95; % Retain 95% of the variance
ndim = find(cumulative_variance >= threshold, 1);
```
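Note that pcares does not itself shrink the feature matrix: it returns the residuals and the reconstructed observations, and both have the same size as X (n-by-p). The minimal sketch below uses hypothetical random data in place of the histogram features to show the difference. Taking the first ndim columns of the score output of pca is one common way to obtain an n-by-ndim reduced feature matrix for KNN or SVM.

```matlab
% Minimal sketch: what pcares returns versus a reduced feature matrix.
% 'features' is HYPOTHETICAL random data standing in for the real
% 2000-by-p histogram matrix from the question.
rng(0);                                      % reproducible example
features = randn(200, 40) * randn(40, 40);   % 200 samples, 40 correlated features

% Same 95% cumulative-variance rule as in the code above
[coeff, score, ~, ~, explained] = pca(features);
ndim = find(cumsum(explained) >= 95, 1);

% pcares works in the ORIGINAL feature space:
% residuals and reconstructed are both 200-by-40, not 200-by-ndim
[residuals, reconstructed] = pcares(features, ndim);

% Reduced features for a classifier: the first ndim principal component scores
reduced = score(:, 1:ndim);                  % 200-by-ndim

fprintf('ndim                = %d\n', ndim);
fprintf('size(reconstructed) = %s\n', mat2str(size(reconstructed)));
fprintf('size(reduced)       = %s\n', mat2str(size(reduced)));
```

With the real data, replace features with your own [num_samples, num_features] matrix and pass reduced (not reconstructed) to the classifier if the goal is a smaller feature set.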
A common rule of thumb is to retain enough components to explain 95% or 99% of the variance. The exact choice of ndim, however, depends on your classification performance (for example, with KNN or SVM) and on the trade-off between reducing dimensionality and retaining useful information.
After determining ndim, evaluate the performance of your classifier (KNN or SVM) using cross-validation. This helps ensure that the reduced feature set doesn't hurt your model's performance. Refer to the following documentation link for the steps of performing cross-validation:
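A minimal sketch of such an evaluation, assuming hypothetical stand-in data (300 samples, 60 features, 3 classes) and a hypothetical list of candidate ndim values. For each candidate, PCA is refit on the training folds only, the held-out fold is projected onto those components, and the 5-fold classification error is averaged for a KNN model (fitcknn) and a multiclass SVM (fitcecoc, which uses SVM binary learners by default). Fitting PCA inside each fold is a design choice made here to keep test data out of the projection; it is not part of the answer above.

```matlab
% HYPOTHETICAL stand-in data: 300 samples, 60 features, 3 classes
rng(1);
numClasses = 3;
labels = repelem((1:numClasses)', 100);       % 300-by-1 class labels
features = randn(300, 60) + 0.5 * labels;     % class-dependent shift (illustrative only)

candidates = [2 5 10 20 40];                  % hypothetical ndim values to try
cvp = cvpartition(labels, 'KFold', 5);        % stratified folds, reused for every candidate
knnLoss = zeros(size(candidates));
svmLoss = zeros(size(candidates));

for i = 1:numel(candidates)
    k = candidates(i);
    foldKnn = zeros(cvp.NumTestSets, 1);
    foldSvm = zeros(cvp.NumTestSets, 1);
    for f = 1:cvp.NumTestSets
        trIdx = training(cvp, f);
        teIdx = test(cvp, f);
        % Fit PCA on the training fold only (coeff is p-by-k, mu is 1-by-p)
        [coeff, scoreTr, ~, ~, ~, mu] = pca(features(trIdx, :), 'NumComponents', k);
        % Project the held-out fold with the training mean and loadings
        scoreTe = (features(teIdx, :) - mu) * coeff;
        knn = fitcknn(scoreTr, labels(trIdx), 'NumNeighbors', 5);
        svm = fitcecoc(scoreTr, labels(trIdx));   % multiclass SVM via ECOC
        % loss returns the misclassification rate on the held-out fold
        foldKnn(f) = loss(knn, scoreTe, labels(teIdx));
        foldSvm(f) = loss(svm, scoreTe, labels(teIdx));
    end
    knnLoss(i) = mean(foldKnn);
    svmLoss(i) = mean(foldSvm);
end

disp(table(candidates', knnLoss', svmLoss', ...
    'VariableNames', {'ndim', 'KNNError', 'SVMError'}));
```

One reasonable way to read the resulting table is to pick the smallest ndim whose cross-validated error is close to the minimum. With the real 2000-image data, replace features and labels with your own feature matrix and class vector.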
