Internal problem while evaluating tall expression (requested 40.5 GB array)
Hi, I'm working with a large data set of approximately 500k rows and 6k columns. I'm using a datastore and a tall array to handle the loading. The file is comma-separated, and most of its values are coded as integers or strings. I have a dictionary for decoding these values. What I am trying to do is replace the codes with their actual meanings and save the decoded file locally.
Below is the structure of my program:
classdef myTable < handle
    % ...
    methods
        function this = myTable
        end
        % ...
    end
    methods
        function loadCsv(this)
            % ...
            ds = datastore(this.csvSource);
            ds.SelectedFormats = repmat({'%q'}, 1, length(ds.VariableNames));
            this.csvTable = tall(ds);
        end
        % ...
        function decoding(this)
            % ...
        end
        function export(this)
            % ...
            write([this.outputDir '/' this.csvTableName '_decoded_*.csv'], this.csvTable, 'WriteFcn', @myWriter);
        end
    end
end
%% helper
function myWriter(info, data)
    filename = info.SuggestedFilename;
    writetable(data, filename, 'FileType', 'text', 'Delimiter', ',')
end
The error occurred at this.export:
Error using digraph/distances
Internal problem while evaluating tall expression. The problem was:
Requested 73733x73733 (40.5GB) array exceeds maximum array size preference. Creation of arrays greater than this limit
may take a long time and cause MATLAB to become unresponsive.
Question: I thought the write function would partition the data while exporting. Isn't that the case? Why did MATLAB still try to create such a big array?
I am using a Windows machine with 16 GB of RAM and MATLAB R2020a (I tried R2019a first and just upgraded to R2020a).
Thank you!
16 Comments
Peng Li
23 Mar 2020
Peng Li
23 Mar 2020
Peng Li
24 Mar 2020
Peng Li
24 Mar 2020
per isakson
24 Mar 2020
Edited: per isakson
24 Mar 2020
You are asking for too much. I have looked at your code and made a working example based on an example in the documentation. It seems to work. I fail to understand what's going wrong for you. Your code includes a lot of irrelevant stuff.
Proposal
- present an MWE (minimal working example) that produces this error
- upload one (or a few) rows of your data set.
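A minimal example along the lines proposed above might look like the following sketch. It is not the original poster's code: the file names, the three-row sample table, and the `mapreducer(0)` call (added here to keep evaluation in the local session) are all assumptions made for illustration. The datastore, `tall`, and `write` calls mirror the ones in the question.

```matlab
% Hypothetical MWE: all file names and sample values below are made up.
workDir = tempname;
mkdir(workDir);
csvFile = fullfile(workDir, 'sample.csv');

% Small coded table with integer and string codes, as described in the question.
T = table([1; 2; 3], ["A"; "B"; "C"], 'VariableNames', {'code1', 'code2'});
writetable(T, csvFile);

% Assumption: evaluate tall expressions serially in the local session.
mapreducer(0);

% Read every column as quoted text, as in loadCsv.
ds = datastore(csvFile);
ds.SelectedFormats = repmat({'%q'}, 1, numel(ds.SelectedVariableNames));
tt = tall(ds);

% Custom writer equivalent to myWriter, written as an anonymous function
% so the whole sketch runs as a single script.
writer = @(info, data) writetable(data, info.SuggestedFilename, ...
    'FileType', 'text', 'Delimiter', ',');

% Export with the custom writer, as in export.
outPattern = fullfile(workDir, 'sample_decoded_*.csv');
write(outPattern, tt, 'WriteFcn', writer);

% List what was written.
outFiles = dir(outPattern);
disp({outFiles.name});
```

If this runs cleanly on a few real rows, the next step would be to add the decoding step back in piece by piece until the error reappears.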
Sean de Wolski
24 Mar 2020
Yes, please provide a few sample rows.
Peng Li
24 Mar 2020
Peng Li
25 Mar 2020
Sean de Wolski
25 Mar 2020
Your understanding is correct.
But we need to know why digraph is trying to create a 73733x73733 array. It could be that something is shadowed so a built-in is not being called, or it could be expected behavior and you need to partition differently. I don't know.
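One way to check the shadowing possibility is to ask MATLAB which definitions of the relevant names are on the path. The sketch below uses `which`; the choice of names to inspect and the comparison against `matlabroot` are assumptions for illustration, not a complete diagnostic.

```matlab
% List every definition of the names involved. The first entry printed is
% the one MATLAB will call; later entries are marked as shadowed.
which -all digraph
which -all distances
which -all write

% Programmatic check (assumption: a shipped file lives under matlabroot).
p = which('digraph');
isShipped = startsWith(p, matlabroot);
fprintf('digraph resolves to: %s (under matlabroot: %d)\n', p, isShipped);
```

If `isShipped` is false, or the first `which -all` entry points into a user folder, a file on the path is probably shadowing the shipped function.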
Peng Li
25 Mar 2020
Peng Li
25 Mar 2020
Walter Roberson
25 Mar 2020
A complete error message showing traceback would help.
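A sketch of how the full traceback could be captured as text for posting: wrap the failing call in `try`/`catch` and use `getReport` on the `MException`. The `failingCall` handle below is a stand-in that simulates an error so the snippet runs on its own; in the real program it would be replaced by the call that fails (for example the `export` method from the question).

```matlab
% Stand-in for the failing call (hypothetical); replace with e.g. @() obj.export()
failingCall = @() error('demo:fail', 'Simulated failure');

try
    failingCall();
catch ME
    % Extended report: message plus the full call stack, without hyperlinks
    % so it can be pasted as plain text.
    report = getReport(ME, 'extended', 'hyperlinks', 'off');
    disp(report);

    % Nested causes, if any, may carry the underlying error.
    for k = 1:numel(ME.cause)
        disp(getReport(ME.cause{k}, 'extended', 'hyperlinks', 'off'));
    end
end
```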
Peng Li
25 Mar 2020
Sean de Wolski
26 Mar 2020
Tall uses a digraph to work out the smallest number of lower-level operations that need to be done, so that it can traverse the data set as few times as possible and without repetition.
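The size in the error message is consistent with calling `distances` on a graph of 73,733 nodes, because `distances` returns a full node-by-node matrix of doubles. The sketch below shows this on a made-up four-node graph and then repeats the arithmetic for the size reported in the error. It does not explain why the graph grew to that many nodes, which the thread leaves open.

```matlab
% Toy graph: four operations chained together (made-up example).
G = digraph([1 2 3], [2 3 4]);
d = distances(G);            % full numnodes-by-numnodes double matrix
disp(size(d));               % one row and one column per node

% Same computation scaled to the size in the error message.
n = 73733;
bytesNeeded = n^2 * 8;       % 8 bytes per double
gib = bytesNeeded / 2^30;    % convert bytes to GB (binary)
fprintf('%d-by-%d doubles need about %.1f GB\n', n, n, gib);
```

The result matches the 40.5 GB in the error, well above the 16 GB of RAM on the machine described in the question.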
Peng Li
26 Mar 2020
Peng Li
27 Mar 2020
Answers (0)