Issues in PCA transformation
While applying a PCA transformation to a test set based on the training set in MATLAB R2019b on a Windows 10 machine, I found that the transformed matrix differs across three conditions that I expected to give identical results. I started testing the behavior of the PCA transformation after noticing that the SVM classification accuracy was not exactly 50% when the test set consisted of two identical copies of the data, one labeled 1 and the other labeled 0. That made me suspicious, and I later traced the problem to the PCA transformation I performed.
% Generate the training set, normalizing each observation (row)
X = rand(100,50000);
train = zeros(100,50000);
for i = 1:100
    train(i,:) = normalize(X(i,:));
end
% Generate the test set, normalizing each observation (row)
x = rand(10,50000);
test = zeros(10,50000);
for i = 1:10
    test(i,:) = normalize(x(i,:));
end
% Compute the PCA coefficients from the training set
[coeff,~,latent] = pca(train);
% Component counts at which a difference is found
allDiff = [];  % counts where one, two and three are not all identical
tempDiff = []; % counts where two and three are not identical
% Loop over the number of PCA components included in the transformation
for count = 1:size(coeff,2)
    matrix = coeff(:,1:count);
    % The three cases
    one = test*matrix;
    temp = [test;test]*matrix;
    two = temp(1:10,:);
    three = temp(11:20,:);
    % Check whether the three matrices, expected to be identical, really are
    if ~isequal(one,two,three) % any difference among one, two and three
        allDiff = [allDiff,count];
    end
    if ~isequal(two,three) % any difference between two and three
        tempDiff = [tempDiff,count];
    end
end
The differences recorded in allDiff start at 2 PCA components, while those recorded in tempDiff start at 20 PCA components. Occasionally, some component counts yield identical matrices for one, two and three.
Is this issue related to rounding error in matrix multiplication? And more importantly, which is the correct matrix after the PCA transformation? (I guess it is one.) Thanks.
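One way to check the rounding-error hypothesis is to measure how large the differences are rather than testing for bit-exact equality with isequal. The following is only a minimal sketch: the matrix sizes are reduced from those in the post so it runs quickly, and the tolerance factor 1e-10 is an arbitrary assumption, not a recommended value. It requires the Statistics and Machine Learning Toolbox for pca.

```matlab
% Sketch: compare the three projections with a tolerance instead of isequal.
% Sizes are smaller than in the original post (assumption, for speed only).
rng(0);                               % fixed seed for repeatability
train = normalize(rand(100,500),2);   % z-score each row (observation)
test  = normalize(rand(10,500),2);
coeff = pca(train);

one   = test*coeff;
temp  = [test;test]*coeff;
two   = temp(1:10,:);
three = temp(11:20,:);

% Largest absolute discrepancy between the three versions
maxDiff = max([max(abs(one(:)-two(:))), max(abs(two(:)-three(:)))]);

% Tolerance scaled to the magnitude of the result (hypothetical choice)
tol = 1e-10*max(abs(one(:)));
sameWithinTol = maxDiff <= tol;
fprintf('max difference = %g, equal within tolerance: %d\n', maxDiff, sameWithinTol);
```

If maxDiff comes out many orders of magnitude smaller than the entries of one, the discrepancy is consistent with floating-point rounding (for example, a different summation order being used for differently sized matrix products), and none of the three results is then more "correct" than the others.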
Answers (1)
Image Analyst
10 Aug 2020
Since you're using random numbers, why do you think that exactly 50% of your points should fall into each of two classes? Your numbers are continuously valued. It's not like they're in two distinct, well-separated clusters. So of course there may not be exactly 50% in each class.
10 Comments
Image Analyst
14 Aug 2020
Either I don't know what you're doing or you don't, because something doesn't make sense to me. With the pca() function, you pass it data and it gives you back the data in the new PC coordinate system. It figures out what the transform is, not you. So I don't understand it when you say "the same PCA coeff is used in all pca transformations." If you start with different sets of data, you will not end up with the same coefficients. They may be close, but they will not be the same. It almost sounds like you're transforming your data, like rotating your coordinate system, getting new coordinates, and expecting or hoping that pca() will give you the same transform you used.
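For reference, the relationship between the outputs of pca() can be sketched as follows. This is an illustration under stated assumptions, not either poster's code: the small random matrices train and test are hypothetical, and the sketch relies on pca() centering the data by default and returning the removed column means as its sixth output, mu.

```matlab
% Sketch: how score, coeff and mu relate, and how a training-set
% transform can be reused on new data (hypothetical small matrices).
rng(0);
train = rand(100,20);                  % hypothetical training set
test  = rand(10,20);                   % hypothetical test set
[coeff,score,~,~,~,mu] = pca(train);   % mu: column means removed by pca

% score is the centered training data expressed in the PC basis
reconScore = (train - mu)*coeff;
scoreErr = max(abs(score(:) - reconScore(:)));
fprintf('max |score - (train-mu)*coeff| = %g\n', scoreErr);

% Reusing the training transform on the test set:
% same centering (mu) and same coefficients (coeff)
testScore = (test - mu)*coeff;
disp(size(testScore));
```

Calling pca(test) instead would estimate a new set of coefficients from the test data, which is the situation described in the comment above; multiplying by the training coeff keeps both sets in the same coordinate system.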