MATLAB Answers


How to create tall datastore from multiple data parts?

Asked by Alexandre Kaspar on 14 Nov 2016
Latest activity: answered by Rick Amos on 29 Nov 2016
I would like to compute PCA on a large amount of data, and so I want to use the tall array feature of the newest versions of MATLAB. My data consists of multiple blocks of features that I gather from big images, i.e. blocks Di of size Ni-by-d.
Let's say I have M such blocks and I want to compute PCA over all of them, i.e. something like
[coeff, score, latent] = pca([D1; D2; ... ; DM]);
but the data array [D1; D2; ... ; DM] does not fit into memory (several GB of data). Every block Di fits in memory by itself, but their concatenation does not.
What is the best way to generate a datastore from these multiple blocks of data?
Note: I could accumulate the covariance matrix manually as a sum of outer products, which is easy to compute whatever the size of the data matrix, and then run pcacov (eigendecomposition) on it, but I have read that pca is more stable.
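For reference, the manual route mentioned in the note could look roughly like the sketch below. It is only an illustration under assumptions: the cell array blocks and the sizes in blockSizes are made-up stand-ins for the real blocks Di, which would be loaded one at a time. Only the running sums are kept in memory.

```matlab
% Sketch: accumulate the covariance block by block, then call pcacov.
% blocks is a hypothetical stand-in for the real blocks Di (each Ni-by-d).
rng(0);
d = 4;
blockSizes = [50 80 30];
blocks = arrayfun(@(n) randn(n, d), blockSizes, 'UniformOutput', false);

n = 0;              % total number of rows seen so far
s = zeros(1, d);    % running sum of rows
S = zeros(d, d);    % running sum of outer products D' * D
for i = 1:numel(blocks)
    D = blocks{i};  % in practice: load block i here instead
    n = n + size(D, 1);
    s = s + sum(D, 1);
    S = S + D' * D;
end

mu = s / n;                           % column means over all blocks
C = (S - n * (mu' * mu)) / (n - 1);   % sample covariance
[coeff, latent] = pcacov(C);          % principal axes and variances
```

This one-pass formula subtracts two potentially large quantities, so it can lose precision when the column means are large compared with the spread of the data, which is one reason to be wary of it.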


1 Answer

Answer by Rick Amos on 29 Nov 2016

A datastore can be created from a collection of folders, so the easiest way to achieve this is to write each block of data into its own folder using tall/write. The following code writes the blocks and then creates the datastore:
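baseFolder = fullfile(pwd, 'MyFolder');
for ii = 1 : numBlocks
    % calculateBlock stands for whatever produces block ii in memory
    block = calculateBlock(ii);
    % One zero-padded subfolder per block, e.g. 00001
    subfolder = fullfile(baseFolder, num2str(ii, '%05i'));
    write(subfolder, tall(block));
end
% A wildcard over the subfolders collects all blocks into one datastore
wildcardPattern = fullfile(baseFolder, '*');
ds = datastore(wildcardPattern);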
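To connect this back to the question, here is a minimal end-to-end sketch that follows the same pattern and then passes the datastore to pca through a tall array. The random blocks, their sizes, and the temporary folder are assumptions made only so the example runs on its own; in real use calculateBlock(ii) would supply the data. It also assumes a release in which pca accepts tall arrays (Statistics and Machine Learning Toolbox).

```matlab
% Sketch only: small random blocks stand in for the real image features.
rng(0);
d = 3;
numBlocks = 3;
baseFolder = tempname;   % scratch location, assumed to be writable
for ii = 1 : numBlocks
    block = randn(20 * ii, d);   % hypothetical stand-in for calculateBlock(ii)
    subfolder = fullfile(baseFolder, num2str(ii, '%05i'));
    write(subfolder, tall(block));
end

% Same wildcard pattern as in the answer above
ds = datastore(fullfile(baseFolder, '*'));

% Wrap the datastore in a tall array and run pca on it
tX = tall(ds);
[coeff, score, latent] = pca(tX);
```

With tall input, score would be expected to stay tall (one row per observation), so it should only be gathered if it fits in memory.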
