Big Data tall array gathering a part of tall array does not work

TOSA2016 on 1 Oct 2019
Commented: TOSA2016 on 1 Oct 2019
I have a huge data set that I saved as a text file on my hard disk.
I want to do some calculations on the data that tall arrays do not support.
My solution was to gather a chunk of data each time, do the calculations, and then print the results to a text file.
To get each chunk, I used a window of 10000 rows. For instance, I gather rows 10001 to 20000 from the tall array, do the calculations, and then save/print the data to another file. Here is an example of what I want to do:
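cost_temp = tall(something);
window_pan = 10000; % number of rows per window
for i = 1:10
    % Select the i-th window of rows from the tall array
    Temp = cost_temp((i-1)*window_pan+1 : i*window_pan, :);
    % Bring the window into memory and take the row-wise minimum
    [best_Cost, index_cost] = min(gather(Temp), [], 2);
end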
The method works up to row 70000, and after that I get this error:
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 2: Completed in 6.1 sec
- Pass 2 of 2: Completed in 2.4 sec
Evaluation completed in 11 sec
Error using tall/gather>iAssertAdaptorMatches (line 126)
Internal problem while evaluating tall expression. The problem was:
An internal consistency error occurred. Details:
SIZE of output incorrect. Expected: [10000 NaN], actual: [1018 21].
Error in tall/gather>iGather (line 73)
cellfun(@iAssertAdaptorMatches, gatheredTalls, varargin(isArgTall));
Error in tall/gather (line 50)
[varargout{:}, readFailureSummary] = iGather(varargin{:});
Error in ...
Error in tall/gather (line 50)
[varargout{:}, readFailureSummary] = iGather(varargin{:});
Error in second_attempt (line 241)
[best_Cost, index_cost] = min(gather(Temp),[], 2);
Caused by:
Error using tall/gather>iAssertAdaptorMatches (line 126)
An internal consistency error occurred. Details:
SIZE of output incorrect. Expected: [10000 NaN], actual: [1018 21].
Interestingly, when I gather the whole data set, there is no problem. However, I will need to apply this method to a larger data set, and I cannot gather that one in full.
Thanks!
2 Comments
Guillaume on 1 Oct 2019
Edited: Guillaume on 1 Oct 2019
What is the height of the array? Does the error occur on the last window, which may not have a height of 10000? At present, your code will only work if the number of rows in the whole file is an exact multiple of window_pan.
Maybe your workflow is more suited for mapreduce?
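To illustrate the point about the last window, here is a minimal sketch that clamps the final window to the array height. The random 25000-by-21 array is a hypothetical stand-in for the real data, and n_rows, n_windows, first_row and last_row are names introduced only for this example. It only addresses a short last window and is not a confirmed fix for the reported error.

```matlab
% Hypothetical stand-in data; the original cost_temp comes from a text file.
cost_temp = tall(rand(25000, 21));
window_pan = 10000;                       % rows per window

% The height of a tall array is itself tall, so evaluate it once up front.
n_rows = gather(size(cost_temp, 1));
n_windows = ceil(n_rows / window_pan);    % includes a shorter final window

for i = 1:n_windows
    first_row = (i-1) * window_pan + 1;
    last_row = min(i * window_pan, n_rows);   % clamp the final window
    % Bring only this window into memory
    Temp = gather(cost_temp(first_row:last_row, :));
    % Row-wise minimum and its column index for the in-memory chunk
    [best_Cost, index_cost] = min(Temp, [], 2);
end
```

With this stand-in data the loop runs three times, and the last window holds the remaining 5000 rows instead of indexing past the end of the array.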
TOSA2016 on 1 Oct 2019
Hi Guillaume,
The height of the array is 273000 rows. I intentionally chose 10 as the maximum number of iterations in the for loop so that this would not affect the question. Thanks for pointing it out.
Let me look into mapreduce to see if it helps.
Thanks for your response.


Answers (0)