The pca() documentation says that the raw data is automatically centered at the start of the process. If that is true, then pca(X) should equal pca(Y), where Y is the centered data. But they are not equal (the data are below). Additionally, when I use either eig() or svd() to compute the principal components, I can only get them to match the pca() output when I manually center the data before calling pca(). Ultimately, my question is simply: how do I correctly calculate the principal components of raw data? That is, do I need to manually center and scale it first? Only center it? Only scale it?
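For comparison outside MATLAB, here is a minimal NumPy sketch of the usual covariance-based recipe. It centers each column, optionally divides each column by its standard deviation (which amounts to PCA on the correlation matrix), and takes the right singular vectors of the result. The helper name `principal_components`, the `scale` flag, and the sign convention are illustrative assumptions, not MATLAB's pca() internals. Because the helper centers internally, and centering an already-centered matrix changes nothing, it returns the same coefficients for X and Y.

```python
import numpy as np


def principal_components(data, scale=False):
    """Hypothetical helper: PCA of `data` (rows = observations, columns = variables).

    Returns (coeff, variances): the columns of `coeff` are the principal
    directions, and `variances` are the matching eigenvalues of the covariance
    matrix (or of the correlation matrix when scale=True), in decreasing order.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)  # always center each column
    if scale:
        # Optional standardization; assumes no column is constant.
        centered = centered / centered.std(axis=0, ddof=1)
    # Right singular vectors of the centered data are the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coeff = vt.T
    # Each direction is only defined up to sign; as an assumed convention,
    # make the largest-magnitude entry of every column positive.
    rows = np.argmax(np.abs(coeff), axis=0)
    coeff = coeff * np.sign(coeff[rows, np.arange(coeff.shape[1])])
    variances = s**2 / (data.shape[0] - 1)
    return coeff, variances


X = np.array([[1.0, -3.0, -1.0],
              [2.0, -2.0, -0.5],
              [3.0, -0.5, 0.25],
              [4.0, 2.0, 1.0],
              [5.0, 5.0, 2.5]])
Y = X - X.mean(axis=0)

coeff_X, var_X = principal_components(X)
coeff_Y, var_Y = principal_components(Y)
print(np.allclose(coeff_X, coeff_Y))  # True: pre-centering makes no difference
print(np.round(coeff_X, 4))
print(np.round(var_X, 4))
```

The first column of `coeff_X` can be compared with the first column of the svd(Y) result in the question, up to rounding. Scaling is usually treated as a modeling choice rather than a requirement. Without it, the variable with the largest variance (here the second column) dominates the first component.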
Sample data:
X =
1.0000 -3.0000 -1.0000
2.0000 -2.0000 -0.5000
3.0000 -0.5000 0.2500
4.0000 2.0000 1.0000
5.0000 5.0000 2.5000
Centering X gives Y =
-2.0000 -3.3000 -1.4500
-1.0000 -2.3000 -0.9500
0 -0.8000 -0.2000
1.0000 1.7000 0.5500
2.0000 4.7000 2.0500
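The centering step itself just subtracts each column's mean. The short NumPy check below is an illustration and not part of the original workflow. It reproduces Y from X and shows that centering Y a second time leaves it unchanged. That is why a routine that centers internally should give the same result for X and Y.

```python
import numpy as np

X = np.array([[1.0, -3.0, -1.0],
              [2.0, -2.0, -0.5],
              [3.0, -0.5, 0.25],
              [4.0, 2.0, 1.0],
              [5.0, 5.0, 2.5]])

col_means = X.mean(axis=0)  # per-variable means: 3, 0.3, 0.45
Y = X - col_means           # broadcasting subtracts the mean row from every row

# Centering is idempotent: the columns of Y already have zero mean,
# so subtracting their means again changes nothing (up to rounding).
Y_again = Y - Y.mean(axis=0)
print(np.allclose(Y_again, Y))  # True
print(Y)
```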
pca(X) =
-0.7360 -0.6037 -0.3062
-0.6688 0.7186 0.1907
-0.1049 -0.3452 0.9327
pca(Y) =
0.4058 0.8414 0.3569
0.9124 -0.3960 -0.1036
0.0542 0.3676 -0.9284
V from [U,S,V] = svd(Y):
0.4058 0.9124 0.0542
0.8414 -0.3960 0.3676
0.3569 -0.1036 -0.9284
V from [V,D] = eig(cov(Y)):
0.0542 0.9124 0.4058
0.3676 -0.3960 0.8414
-0.9284 -0.1036 0.3569
This is the same set of vectors as the SVD result, just with the columns in reverse order.
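The agreement between the SVD and eigendecomposition results follows from Y = U S V'. This gives cov(Y) = Y'Y/(n-1) = V (S^2/(n-1)) V', so the columns of V are eigenvectors of cov(Y) with eigenvalues s_i^2/(n-1). The NumPy sketch below is an illustrative stand-in for the MATLAB calls and checks this on the centered data. It reverses the ascending order that `numpy.linalg.eigh` documents, which matches the ascending order the eig(cov(Y)) columns show above. It also aligns signs, because each vector is only determined up to a factor of -1.

```python
import numpy as np

X = np.array([[1.0, -3.0, -1.0],
              [2.0, -2.0, -0.5],
              [3.0, -0.5, 0.25],
              [4.0, 2.0, 1.0],
              [5.0, 5.0, 2.5]])
Y = X - X.mean(axis=0)  # centered data
n = Y.shape[0]

# SVD route: columns of V are ordered by decreasing singular value.
_, s, vt = np.linalg.svd(Y, full_matrices=False)
V_svd = vt.T

# Eigen route: eigh returns eigenvalues of a symmetric matrix in ascending
# order, so reverse both the eigenvalues and the eigenvector columns.
evals, V_eig = np.linalg.eigh(np.cov(Y, rowvar=False))  # np.cov divides by n-1
evals = evals[::-1]
V_eig = V_eig[:, ::-1]

# Each vector is determined only up to sign: flip eig columns to match svd columns.
signs = np.sign(np.sum(V_svd * V_eig, axis=0))
print(np.allclose(V_svd, V_eig * signs))   # same directions
print(np.allclose(evals, s**2 / (n - 1)))  # eigenvalues equal s^2/(n-1)
```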