Building tall table from tall arrays generates error
clear
dataFile = 'data.csv';
ds = tabularTextDatastore(dataFile, FileExtensions='.csv');
ds.ReadVariableNames = true;
ds.Delimiter = ',';
ds.SelectedVariableNames = ["hash", "count"];
ds.SelectedFormats = {'%s', '%f'};
data = tall(ds);
[g, THash] = findgroups(data.hash);
TCount = splitapply(@(x) {x}, data.count, g);
%% This works but cannot use it because actual data file is far larger than memory
hash = gather(THash);
count = gather(TCount);
T1 = table(hash, count);
%% This is the intended code but doesn't work
TT = table(THash,TCount);
write(fullfile(pwd,'data'),TT,FileType="parquet");
Answers (1)
Oguz Kaan Hancioglu
15 March 2023
Your code didn't work because gather(TCount) returns a cell array in which each element holds the count values of one group. You are therefore trying to write a double array into a single cell. Instead, you can compute the length of the array stored in each cell. I hope this solves your problem.
%% This works but cannot use it because actual data file is far larger than memory
hash = gather(THash);
count = gather(TCount);
cellsz = cellfun(@size,count,'uni',false);                   % size of the array in each cell
newCount = cellfun(@(x) x(1),cellsz,'UniformOutput',false);  % number of rows in each group
T1 = table(hash, newCount);
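To see why each element of the result is a cell, here is a small in-memory sketch of the same findgroups/splitapply steps. The hash and count values below are hypothetical stand-ins for data.csv, and ordinary in-memory arrays are used instead of tall arrays so it runs without a datastore. cellfun(@numel, ...) is used here as a shorter equivalent of the size-based step above for column vectors; it is an illustration, not the only way to do it.

```matlab
% Hypothetical in-memory stand-in for the "hash" and "count" columns of data.csv
hash  = ["a"; "b"; "a"; "c"; "b"; "a"];
count = [1; 2; 3; 4; 5; 6];

% Group by hash: g is the group number of each row, ID the unique (sorted) hashes
[g, ID] = findgroups(hash);

% Collect the count values of each group into one cell per group,
% e.g. C{1} holds all counts whose hash is "a"
C = splitapply(@(x) {x}, count, g);

% Number of values in each group (one row per group)
n = cellfun(@numel, C);

% One row per unique hash: the hash, its values (cell), and how many there are
T = table(ID, C, n);
disp(T)
```

With tall arrays the same findgroups and splitapply calls return tall results, which is why the question's code needs gather before building the table.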