Gather Tall Array Error - No Workers are available for Queue execution

6 views (last 30 days)
Abderrahim. B on 21 Oct 2023
Commented: Abderrahim. B on 26 Oct 2023
Hi everyone,
I hope you are all doing well!
I need help fixing an issue I am facing with the gather function, or a workaround. Let me describe it further:
  1. I am working on a sound classification problem and decided to start with the MathWorks example Acoustic Scene Recognition Using Late Fusion.
  2. I have not made any modifications yet, just tried to reproduce the same results; however, I am getting an error saying (Error using parallel.internal.bigdata.ParallelPoolBroadcastMap>iRunOnAll Internal problem while evaluating tall expression. The problem was One or more futures resulted in an error.) More details about the error can be found in the attached file.
  3. I did extensive debugging to understand why gathering the tall array resulted in such an error, and my conclusion is that I suspect it has something to do with running out of memory ... but I am still not sure, as I am quite new to tall arrays and the functions that apply to them, such as gather.
  4. I set speedupExample to true and was able to run the example without any issue; however, the validation and test accuracy is poor, because of the small amount of data used for training:
speedupExample = true;
if speedupExample
    % Keep only a small subset of files per label to shorten the run
    adsTrain = splitEachLabel(adsTrain,20);  % 20 files per label for training
    adsTest  = splitEachLabel(adsTest,10);   % 10 files per label for testing
end
Your help is highly appreciated.
Many thanks,
Abderrahim

Accepted Answer

Walter Roberson on 21 Oct 2023
In a discussion not long ago, some people, including some MathWorkers, were talking about what happens when an error is detected on the workers. In at least some cases, Parallel Computing Toolbox removes the erroring worker so that it can no longer execute jobs, reducing the number of workers (tasks that were assigned to that worker but not yet done get requeued).
In cases where having multiple workers leads to too much total memory being requested for processing to succeed, killing off workers reduces the total amount of memory in use simultaneously. If any single worker needs only an acceptable amount of memory but multiple workers together need too much, this acts to prune the number of workers until the total memory used by those that remain fits into what the system can handle.
But if every worker errors (for example, if each one individually needs more memory than the system can supply), then you might be left with no workers at all to process the queue.
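If total memory across workers is the constraint, one pragmatic mitigation (a sketch of the idea above, not part of the original answer) is to start a pool with fewer workers before evaluating the tall expression, so fewer copies of the data are resident at once:

```matlab
% Assumption: fewer simultaneous workers means less total memory in use,
% at the cost of longer run time.
delete(gcp("nocreate"));          % shut down the current pool, if any
pool = parpool("Processes", 4);   % start a smaller pool than the default
% ... build the tall array and evaluate it on the smaller pool ...
% result = gather(tallExpression);
```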
  5 Comments
Abderrahim. B on 21 Oct 2023
So I was able to fix the worker issue, but then ran into an out-of-memory issue.
How did I work around the first issue?
I started the parallel pool programmatically as below:
parpool("Processes", 14, "SpmdEnabled", false)
Any tips & tricks for working around the out-of-memory issue?
Thanks
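One further option worth noting (a sketch, not suggested in the thread itself): tall expressions can be evaluated serially in the client MATLAB session, which avoids per-worker memory copies entirely at the cost of losing parallel speedup:

```matlab
% Assumption: trading run time for memory is acceptable here.
% mapreducer(0) sets the tall-array execution environment to the
% local MATLAB session, so no parallel pool memory is needed.
mapreducer(0);
t = tall(adsTrain);   % adsTrain: the audioDatastore from the example
% ... build the tall feature expression as in the example, then:
% features = gather(featureExpression);   % evaluated serially
```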
Abderrahim. B on 26 Oct 2023
Just want to share that I have solved the memory issue as well. Below is how I did it:
  • Increased the MATLAB workspace memory
  • Increased the Java heap memory
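For reference when sizing those limits (a sketch, not part of the poster's solution): on Windows, the memory function reports how much memory MATLAB can actually use, which helps when deciding how large a pool the machine can support:

```matlab
% Assumption: running on Windows; the memory function is Windows-only.
[userview, systemview] = memory;
fprintf("Largest possible array: %.1f GB\n", ...
    userview.MaxPossibleArrayBytes / 1e9);
fprintf("Physical memory available: %.1f GB\n", ...
    systemview.PhysicalMemory.Available / 1e9);
```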

More Answers (0)


Release

R2023b
