How to calculate the contingency table of a large dataset without memory issues?

pietro on 13 Nov 2017
Commented: pietro on 13 Nov 2017
Hi all,
I am calculating a contingency table (using the crosstab function) from a large array, and the result will be a matrix of size [175473x175473]. I am running the script on a laptop with 32 GB of RAM, and unfortunately that is not enough. I have noticed that MATLAB allocates memory for double variables by default, which require several times more memory than integer types (8 bytes per element, versus 1 to 4 bytes for int8 through int32). The contingency table should be an integer variable, not a double. Is there any way to force crosstab to compute the result as integer data, so that I can save memory when running the calculation?

Accepted Answer

the cyclist on 13 Nov 2017
I don't think you can do that with the built-in crosstab function.
Depending on the number of data elements (specifically, how sparse the resulting cross table is), you might be able to fill in the cross table "manually", either as a single-precision array or as a sparse matrix.
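For a rough sense of scale, here is a back-of-the-envelope estimate of what a dense table of the size given in the question would need for a few element types. This is only a sketch: the variable names are made up for illustration, and it counts raw element storage only.

```matlab
% Dense storage estimate for an n-by-n table (n taken from the question)
n = 175473;
numElements = n^2;                          % number of cells in the table

typeNames = {'double', 'single', 'uint16', 'uint8'};
bytesPer  = [8 4 2 1];                      % bytes per element for each type

gib = numElements .* bytesPer ./ 2^30;      % required memory in GiB
for k = 1:numel(typeNames)
    fprintf('%-6s %7.1f GiB\n', typeNames{k}, gib(k));
end
```

By this estimate a dense double table needs roughly 229 GiB, and even uint8 needs about 28.7 GiB, so a smaller data type alone is unlikely to fit comfortably in 32 GB. A sparse table stores only the nonzero cells, and there can be at most as many of those as there are observations, so it is probably the more realistic option here.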
Three approaches (the built-in crosstab, a manual single-precision table, and a manual sparse table) are illustrated below:
x = single([1 1 2 3 1]);
y = single([1 2 5 3 1]);
% jx and jy map each observation to the index of its unique value
[unique_x,~,jx] = unique(x);
[unique_y,~,jy] = unique(y);
% Memory-intensive double-precision cross table
% (named table_double to avoid shadowing the built-in table type)
table_double = crosstab(x,y)
% "Manually" create single-precision cross table
table_single = zeros(numel(unique_x),numel(unique_y),'single');
for ii = 1:numel(x)
    table_single(jx(ii),jy(ii)) = table_single(jx(ii),jy(ii)) + 1;
end
% "Manually" create sparse cross table (all-zero sparse matrix to start)
table_sparse = sparse(numel(unique_x),numel(unique_y));
for ii = 1:numel(x)
    table_sparse(jx(ii),jy(ii)) = table_sparse(jx(ii),jy(ii)) + 1;
end
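As a possible refinement, the two loops can be replaced by vectorized calls. The sketch below uses the same toy data; the names table_sparse_vec and table_dense_vec are made up for illustration. It relies on sparse(i,j,v,m,n) summing the values of repeated (i,j) pairs and on accumarray doing the same for a dense result.

```matlab
x = single([1 1 2 3 1]);
y = single([1 2 5 3 1]);
[unique_x, ~, jx] = unique(x);
[unique_y, ~, jy] = unique(y);
nx = numel(unique_x);
ny = numel(unique_y);

% Sparse table: contribute a 1 per observation; sparse() adds up the
% contributions that land on the same (row, column) pair.
table_sparse_vec = sparse(jx(:), jy(:), 1, nx, ny);

% Dense table built the same way with accumarray (double precision, so
% only suitable when the full nx-by-ny table fits in memory).
table_dense_vec = accumarray([jx(:) jy(:)], 1, [nx ny]);

disp(full(table_sparse_vec))
```

Building the sparse matrix in one call avoids inserting new nonzeros one element at a time, which tends to be slow for large inputs. If a dense integer table is preferred and fits in memory, the loop version should also work with an integer class, for example zeros(nx, ny, 'uint32'). Note that single can only represent integer counts exactly up to 2^24.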
1 Comment
pietro on 13 Nov 2017
Thanks a lot! Your suggestion of using a for loop in combination with unique is superbly clever.
