Reducing dimensionality of features with PCA

Question

Sepp, 4 June 2015

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/222546-reducing-dimensionality-of-features-with-pca

Commented: Clinton Kayson, 27 May 2021

I'm totally confused regarding PCA. I have a 4D image of size 90 x 60 x 12 x 350. That means that each voxel is a vector of size 350 (time series).

Now I divide the 3D image (90 x 60 x 12) into cubes. So let's say a cube contains n voxels, so I have n vectors of size 350. I want to reduce these n vectors to a single vector and then calculate the correlations between the resulting vectors of all cubes.

So for a cube I can construct the matrix M where I just put each voxel after each other, i.e. M = [v1 v2 v3 ... vn] and each v is of size 350.

Now I can apply PCA in Matlab by using [coeff, score, latent, ~, explained] = pca(M); and taking the first component. And now my confusion begins.

1. Should I transpose the matrix M, i.e. use pca(M')?
2. Should I take the first column of coeff or of score?
3. This third question is somewhat unrelated. Let's assume we have a matrix A = rand(30,100) where the rows are the data points and the columns are the features. Now I want to reduce the dimensionality of the feature vectors while keeping all data points. How can I do this with PCA? When I do [coeff, score, latent, ~, explained] = pca(A); then coeff is of size 100 x 29 and score is of size 30 x 29. I'm totally confused.

Answer 1

Matlaber, 19 February 2019

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/222546-reducing-dimensionality-of-features-with-pca#answer_361811

Is there an input argument, in any of these syntaxes,

coeff = pca(X)
coeff = pca(X,Name,Value)
[coeff,score,latent] = pca(___)
[coeff,score,latent,tsquared] = pca(___)
[coeff,score,latent,tsquared,explained,mu] = pca(___)

for reducing a matrix of size (400 x 40) to (400 x 20)?

Thanks
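
One possible way to get a (400 x 20) result is the 'NumComponents' name-value argument of pca. The following is only a minimal sketch: it assumes the Statistics and Machine Learning Toolbox is installed, and X is random placeholder data rather than anything from this thread.

```matlab
% Placeholder data: 400 observations (rows) x 40 features (columns).
rng(0);                      % fixed seed so the sketch is repeatable
X = randn(400, 40);

% Ask pca to return only the first 20 principal components.
[coeff, score] = pca(X, 'NumComponents', 20);

coeffSize = size(coeff);     % 40 x 20: one column of loadings per retained component
scoreSize = size(score);     % 400 x 20: the reduced representation of X
```

Here score is the reduced matrix. The same call with 'NumComponents', 4 would keep four columns instead.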

1 comment

Clinton Kayson, 27 May 2021

Can I get the code for 4 columns?

Answer 2

Alfonso Nieto-Castanon, 5 June 2015

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/222546-reducing-dimensionality-of-features-with-pca#answer_181637

If you use:

 [coeff,score] = pca(M);
 Comp_PCA1 = score(:,1);

where M is a (350 by n) matrix of voxel timeseries, and you keep the first column of the resulting matrix score, that column will contain the (350 by 1) timeseries/vector of component scores most representative of the timeseries variance within your cube.

Note that pca(X) first subtracts the mean effect mean(X,1) from X and then performs SVD on the residuals to decompose the resulting covariance into its principal components. You do not want to use pca(M') because then you would be disregarding the average timeseries across all your voxels within each cube (which often contains useful information). Using pca(M) will instead disregard the average signal across all your timepoints for each voxel, which is fine if you are planning to use this for correlation analyses (since correlations are invariant to the average value of the timeseries).
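
As a rough illustration of the points above, the sketch below builds two hypothetical cubes from random placeholder data (M1 and M2 are not real voxel data), takes the first column of score for each, and checks that the scores are the column-centered data projected onto coeff. It assumes the Statistics and Machine Learning Toolbox, and R2016b or later for the implicit expansion in M1 - mu1.

```matlab
% Two hypothetical cubes with different voxel counts (random placeholder data).
rng(1);
T  = 350;                    % number of time points, as in the question
M1 = randn(T, 18);           % a cube with 18 voxels
M2 = randn(T, 4);            % a cube with only 4 voxels

% Rows are time points (observations), columns are voxels (variables).
[coeff1, score1, ~, ~, ~, mu1] = pca(M1);
[~, score2] = pca(M2);

% First-component time series: T x 1 regardless of the voxel count.
ts1 = score1(:, 1);
ts2 = score2(:, 1);

% pca centers each column; the scores are the centered data projected on coeff.
projectionError = max(max(abs((M1 - mu1) * coeff1 - score1)));

% Correlation between the two cube time series.
R   = corrcoef(ts1, ts2);
r12 = R(1, 2);
```

Because score has one row per time point, its first column has the same length for every cube, so the vectors can be correlated directly even when cubes contain different numbers of voxels.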

3 comments

Sepp, 5 June 2015 (edited 5 June 2015)

Thank you so much, it is of great help.

In the end, after applying PCA, I have one vector of size 350 by 1 for each cube. Then I will compute the correlations between all these vectors, which gives me a feature vector (each correlation is a feature). I will do this for all of my, e.g., 30 images, which will give me 30 feature vectors.

Should I apply z score (zero mean, unit variance) before applying PCA (z score on the columns of M) or should I apply it before feeding the feature vectors to my machine learning classifier (z score on each feature vector)?

Second, does it make sense to do a second PCA step on my resulting feature vectors? That means in the end I have a matrix A where the columns are the features and the rows are my 30 feature vectors. I thought of this:

[coeff, score] = pca(A);
reducedDimension = coeff(:,1:5);
reducedData = A * reducedDimension;

With this I could reduce the dimensionality of the feature vectors.

Or should I just take the first, e.g., 10 columns of score?

Alfonso Nieto-Castanon, 5 June 2015

Regarding normalization of your features, that depends on the classifier that you are planning to use. Some classifiers (e.g. tree-based methods such as random forests) are invariant to this form of per-feature scaling, while many others (e.g. SVMs with common kernels, k-nearest neighbors, regularized logistic regression) are not.
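
If z-scoring is wanted, one way to standardize each feature (column) is the zscore function. This is only a sketch on a hypothetical 30 x 100 feature matrix A filled with random numbers, and it assumes the Statistics and Machine Learning Toolbox.

```matlab
rng(4);
A = 5 + 3 * randn(30, 100);      % hypothetical: 30 feature vectors x 100 features

% zscore standardizes each column to zero mean and unit standard deviation.
% mu and sigma are the per-column mean and standard deviation that were used.
[Z, mu, sigma] = zscore(A);

colMeans = mean(Z, 1);           % approximately 0 for every feature
colStds  = std(Z, 0, 1);         % approximately 1 for every feature
```

The returned mu and sigma can be kept so that the same shift and scale are applied to any data processed later.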

Regarding your second question, assuming that your feature vectors are mean-centered, the two methods (keeping the first 10 columns of score or multiplying A by the first 10 columns of coeff) are exactly equivalent.
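
The equivalence can be checked numerically. The sketch below uses A = rand(30,100) as in the question and assumes the Statistics and Machine Learning Toolbox, plus R2016b or later for the implicit expansion in A - mu. It also shows why coeff is 100 x 29: with 30 observations, centering leaves at most 29 directions with nonzero variance.

```matlab
rng(2);
A = rand(30, 100);                     % 30 data points x 100 features

[coeff, score, ~, ~, ~, mu] = pca(A);  % mu is the 1 x 100 vector of column means

coeffSize = size(coeff);               % 100 x 29
scoreSize = size(score);               % 30 x 29

k = 10;
viaScore = score(:, 1:k);              % first k columns of score
viaCoeff = (A - mu) * coeff(:, 1:k);   % center first, then project
maxDiff  = max(abs(viaScore(:) - viaCoeff(:)));

% Without centering, A * coeff differs from score by a constant row offset.
offset    = A * coeff(:, 1:k) - viaScore;
offsetErr = max(max(abs(offset - mu * coeff(:, 1:k))));
```

So A * coeff(:,1:k) matches score(:,1:k) only when A has been mean-centered. Otherwise every row is shifted by mu * coeff(:,1:k).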

Last, if you are planning to use all this in a machine learning context, please be aware that you need to define your features consistently across your training and validation datasets. That typically means that you do not want to apply PCA on the validation dataset but rather store the coeff matrices computed from the training set and use those to project the validation data. This is particularly important when using PCA since the resulting coefficients/scores are only determined up to a reflection (e.g. you could arbitrarily get -coeff and -score instead of coeff and score as your coefficients/scores resulting from PCA).
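
A minimal sketch of that workflow, with hypothetical Xtrain and Xval matrices of random placeholder data and an arbitrary k = 5, might look as follows. It assumes the Statistics and Machine Learning Toolbox, and R2016b or later for implicit expansion.

```matlab
rng(3);
Xtrain = randn(30, 100);     % hypothetical training feature vectors (rows)
Xval   = randn(10, 100);     % hypothetical validation feature vectors (rows)
k = 5;                       % number of components to keep (arbitrary choice)

% Fit PCA on the training set only; keep the loadings and the training mean.
[coeff, scoreTrain, ~, ~, ~, mu] = pca(Xtrain, 'NumComponents', k);

% Project the validation data with the stored training mean and loadings.
scoreVal = (Xval - mu) * coeff;

% Sanity check: the same projection reproduces the training scores.
trainErr = max(max(abs((Xtrain - mu) * coeff - scoreTrain)));
```

Since both sets are projected with the same coeff and mu, a sign flip of a component affects training and validation features identically.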

Sepp, 6 June 2015

Thank you for the answer.

I have now also tried taking the first column of "coeff" instead of "score" and the result is much better (66% compared to 54% for a classification with 4 classes).

But I now have problems with the size of the vectors. Let me explain.

What I'm currently doing is taking only cubes which lie fully inside the brain, so my "coeff" vectors all have the same size, but this way I'm losing information from the border of the brain.

The problem is the following. Let's say a cube has 18 voxels in it; then I get a matrix M of (time x voxels) of size (350 x 18). When I do PCA and extract the first column of coeff, I get a vector of size 18.

Now, let's say we have a border cube with only 4 voxels (all other voxels of the cube are outside of the brain). Then my matrix M is of size (350 x 4) and the first column of coeff is of size 4.

To be able to calculate the correlations I need of course same vector sizes.

How would you solve this problem?

Answer 3

Bhuvana P, 25 January 2018

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/222546-reducing-dimensionality-of-features-with-pca#answer_301650

I need MATLAB code for converting a 2D image into a 1D image.
