How to calculate the contingency table of a large dataset without memory issues?
5 ビュー (過去 30 日間)
古いコメントを表示
Hi all,
I am calculating a contingency table (using crosstab function) from a large array and the result will be a matrix of a size of [175473x175473]. I am running the script with a laptop with 32GB and unfortunately the RAM is not enough. I have noticed that Matlab by default, allocates memory for double variables which requice 4 times more memory than double. The contingency table should be an integer variable and not double. Is there any way to force crosstab to compute the result on integer data so that I can save memory for running the calculation?
0 件のコメント
採用された回答
the cyclist
2017 年 11 月 13 日
I don't think you can do that with the built-in crosstab function.
Depending on the number of data elements (specifically how sparse the resulting cross table is), you might just be able to "manually" fill in the crosstable, as either a single-precision array, or as a sparse matrix.
All three techniques illustrated below:
x = single([1 1 2 3 1]);
y = single([1 2 5 3 1]);
[unique_x,ix,jx] = unique(x);
[unique_y,iy,jy] = unique(y);
% Memory-intensive double-precision cross table
table = crosstab(x,y)
% "Manually" create single-precision cross table
table_single = single(zeros(numel(unique_x),numel(unique_y)));
for ii = 1:numel(x)
table_single(jx(ii),jy(ii)) = table_single(jx(ii),jy(ii)) + 1;
end
% "Manually" create sparse cross table
table_sparse = sparse(numel(unique_x),numel(unique_y));
for ii = 1:numel(x)
table_sparse(jx(ii),jy(ii)) = table_sparse(jx(ii),jy(ii)) + 1;
end
その他の回答 (0 件)
参考
カテゴリ
Help Center および File Exchange で Hypothesis Tests についてさらに検索
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!