How to calculate the contingency table of a large dataset without memory issues?

Hi all,
I am calculating a contingency table (using the crosstab function) from a large array, and the result will be a 175473x175473 matrix. I am running the script on a laptop with 32 GB of RAM, and unfortunately that is not enough. I have noticed that MATLAB by default allocates memory for double variables, which need 8 bytes per element, several times more than integer types (4 bytes for int32, 2 bytes for int16). The contingency table only holds integer counts, so it does not need to be double. Is there any way to force crosstab to compute the result as integer data, so that I can save memory when running the calculation?
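For scale, here is a quick estimate of the dense table's size for several numeric classes. This check is not part of the original question. It assumes a dense n-by-n array with n = 175473 and uses MATLAB's standard element sizes. The variable names are illustrative only.

```matlab
% Rough memory estimate for a dense n-by-n contingency table.
% n comes from the question; element sizes are MATLAB's standard ones.
n = 175473;
numCells = n^2;                          % about 3.08e10 cells

classes = {'double', 'single', 'uint32', 'uint16', 'uint8'};
bytesPerElement = [8, 4, 4, 2, 1];

for k = 1:numel(classes)
    gb = numCells * bytesPerElement(k) / 1e9;   % decimal gigabytes
    fprintf('%-7s %7.1f GB\n', classes{k}, gb);
end
```

This gives about 246.3 GB for double, 123.2 GB for single or uint32, 61.6 GB for uint16 and 30.8 GB for uint8. A dense table therefore does not realistically fit in 32 GB, even with integer storage. Integer classes also saturate (for example, uint8 stops counting at 255), so the smallest usable class depends on the largest cell count.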


the cyclist, 13 Nov 2017
I don't think you can do that with the built-in crosstab function.
Depending on the number of data elements (specifically, how sparse the resulting cross table is), you might be able to fill in the cross table "manually", either as a single-precision array or as a sparse matrix.
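You can check whether the sparse route is viable before building any table. Each distinct (x, y) pair becomes exactly one nonzero cell, so the number of nonzeros can never exceed numel(x). The sketch below counts the distinct pairs for the toy data used in the answer. It assumes a 64-bit MATLAB sparse layout of roughly 16 bytes per nonzero (an 8-byte double value plus an 8-byte row index) plus 8 bytes per column pointer. Treat the byte figures as an approximation, not an exact `whos` result.

```matlab
% Example data (same as in the answer below); replace with the real vectors.
x = single([1 1 2 3 1]);
y = single([1 2 5 3 1]);

% Each distinct (x, y) pair becomes exactly one nonzero cell of the table.
pairs = unique([x(:) y(:)], 'rows');
nonzeroCells = size(pairs, 1);
numCols = numel(unique(y));

% Approximate size of a double sparse table (assumed 64-bit layout:
% 8-byte value + 8-byte row index per nonzero, 8 bytes per column pointer).
sparseBytes = 16 * nonzeroCells + 8 * (numCols + 1);

% Size of the equivalent dense double table, for comparison.
denseBytes = 8 * numel(unique(x)) * numCols;

fprintf('nonzero cells: %d\n', nonzeroCells);
fprintf('sparse ~%d bytes, dense %d bytes\n', sparseBytes, denseBytes);
```

For this toy data, the dense table (96 bytes) is actually smaller than the sparse estimate (104 bytes). Sparse storage only pays off when most cells are empty, which is what the answer relies on for a 175473x175473 table.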
All three techniques are illustrated below:
x = single([1 1 2 3 1]);
y = single([1 2 5 3 1]);
% jx and jy map each observation to its row/column index in the table
[unique_x,~,jx] = unique(x);
[unique_y,~,jy] = unique(y);
% Memory-intensive double-precision cross table
% (crosstab needs the Statistics and Machine Learning Toolbox;
% named table_double so it does not shadow MATLAB's table function)
table_double = crosstab(x,y)
% "Manually" create single-precision cross table
table_single = zeros(numel(unique_x),numel(unique_y),'single');
for ii = 1:numel(x)
    table_single(jx(ii),jy(ii)) = table_single(jx(ii),jy(ii)) + 1;
end
% "Manually" create sparse cross table (sparse matrices hold double values)
table_sparse = sparse(numel(unique_x),numel(unique_y));
for ii = 1:numel(x)
    table_sparse(jx(ii),jy(ii)) = table_sparse(jx(ii),jy(ii)) + 1;
end
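Growing a sparse matrix one element at a time inside a loop tends to be slow for large inputs, because each new nonzero can force MATLAB to reallocate the sparse storage. The sketch below swaps in a vectorized technique that is not part of the original answer. `sparse(i, j, v, m, n)` adds together values that share the same subscripts, so one call produces all the counts. `accumarray` with its `issparse` flag set gives the same result. Neither call needs crosstab.

```matlab
% Example data from the answer above.
x = single([1 1 2 3 1]);
y = single([1 2 5 3 1]);

% Map each value to its row/column index in the table.
[unique_x, ~, jx] = unique(x);
[unique_y, ~, jy] = unique(y);

% sparse(i, j, v, m, n) sums entries that share the same (i, j),
% so passing a 1 for every observation yields the counts in one call.
counts = ones(numel(x), 1);
table_sparse = sparse(jx(:), jy(:), counts, numel(unique_x), numel(unique_y));

% Equivalent alternative: accumarray with issparse = true.
table_sparse2 = accumarray([jx(:) jy(:)], counts, ...
    [numel(unique_x) numel(unique_y)], [], 0, true);

disp(full(table_sparse))
```

For the example data, this displays the 3x4 table [2 1 0 0; 0 0 0 1; 0 0 1 0]. Rows correspond to x = 1, 2, 3 and columns to y = 1, 2, 3, 5.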
1 comment
pietro, 13 Nov 2017
Thanks a lot! Your suggestion of using the for loop in combination with unique is superbly clever.

