I would like to compute PCA on a large amount of data, and thus use the tall array feature of recent versions of MATLAB. My data consists of multiple blocks of features gathered from big images, i.e. blocks Di of size (Ni, d).
Let's say I have M such blocks and I want to compute PCA for all of them, i.e. something like
[coeff, score, latent] = pca([D0; D1; D2; ... ; Dn])
but the concatenated data array [D0; D1; D2; ... ; Dn] does not fit into memory (several GB of data). Each block Di fits in memory on its own, but their concatenation does not.
What is the best way to generate a datastore from these multiple blocks of data?
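For concreteness, this is the kind of pipeline I have in mind — a sketch only: the folder layout, the per-file variable name `D`, and the `'UniformRead'` flag are assumptions on my part, and I am not sure this is the intended way to feed multiple in-memory-sized blocks to `tall`:

```matlab
% Assumption: each block Di has been saved to its own MAT-file
% (blocks/block0.mat, blocks/block1.mat, ...), each containing one
% (Ni x d) numeric variable named D.
ds = fileDatastore('blocks/*.mat', ...
    'ReadFcn', @(f) getfield(load(f), 'D'), ...
    'UniformRead', true);          % each read returns an (Ni x d) array

tD = tall(ds);                     % lazy tall array: all blocks stacked vertically
[coeff, score, latent] = pca(tD);  % tall-aware pca, if supported in your release
score = gather(score);             % trigger evaluation only for what is needed
```

Is something along these lines the recommended approach, or is there a better way to build the datastore directly from the blocks?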
Note: I could do this manually via pcacov, i.e. an eigendecomposition of the covariance matrix, since the covariance can be accumulated as a sum of outer products block by block, which is easy to compute regardless of the total size of the data matrix. However, I have read that pca (which works on the data matrix itself) is numerically more stable.
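The manual alternative I am referring to would look roughly like this — a sketch, where `loadBlock` is a hypothetical helper that loads one block at a time, and `M` and `d` are assumed known:

```matlab
% Accumulate sum(Di' * Di) and the column sums block by block, then form
% the sample covariance and feed it to pcacov.
S  = zeros(d);            % running sum of outer products, sum_i Di' * Di
mu = zeros(1, d);         % running sum of rows
N  = 0;                   % total number of rows seen so far
for k = 0:M-1
    Dk = loadBlock(k);    % hypothetical: returns the (Nk x d) block Dk
    S  = S + Dk' * Dk;
    mu = mu + sum(Dk, 1);
    N  = N + size(Dk, 1);
end
mu = mu / N;
C  = (S - N * (mu' * mu)) / (N - 1);   % sample covariance of the stacked data
[coeff, latent] = pcacov(C);
```

My concern is precisely that forming C explicitly squares the condition number compared to an SVD of the data matrix, which is why I would prefer the tall-array pca route if it is practical.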