Why does TreeBagger in Matlab 2014a/b only use few workers from a parallel pool?

3 views (last 30 days)
Dylan Muir on 5 Dec 2014
Commented: Ilya on 9 Dec 2014
I'm using the TreeBagger class provided by MATLAB (R2014a/b), together with the Parallel Computing Toolbox. I have a local cluster running with 30 workers, on a Windows 7 machine with 40 cores.
I call the TreeBagger constructor to generate a regression forest (an ensemble containing 32 trees), passing an options structure with 'UseParallel' set to 'always'.
However, TreeBagger seems to use only about 8 of the 30 available workers (judging by per-process CPU usage in the Task Manager). When I test the pool with a simple parfor loop:

parfor i = 1:30
    a = fft(rand(20000));
end

all 30 workers are engaged.
My question is: (How) can I force TreeBagger to use all available resources?
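
For reference, a minimal sketch of the call described above (X, Y, and the variable names are placeholders, not taken from the original post):

```matlab
% Hypothetical reconstruction of the TreeBagger call described above.
% X is the predictor matrix, Y the response vector (placeholders).
paropts = statset('UseParallel', 'always');
forest = TreeBagger(32, X, Y, ...
    'Method', 'regression', ...
    'Options', paropts);
```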

Answer (1)

Ilya on 5 Dec 2014
TreeBagger does not limit the number of cores it uses in any way; that is determined entirely by your parpool configuration.
The answer may lie in the data you pass to TreeBagger. Make sure all trees in the returned TreeBagger object are deep (which means training actually took place). If growing these 32 trees takes little time, increase the number of trees and see whether the load changes.
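
One way to check both points might be the following sketch (it assumes a trained TreeBagger object named forest and placeholder data X, Y; the NumNodes property is an assumption and the tree class in Trees varies by release):

```matlab
% Inspect the size of each grown tree: very shallow trees suggest
% training finished almost instantly and the workers had little to do.
numNodes = cellfun(@(t) t.NumNodes, forest.Trees);
disp(numNodes)

% Re-run with a larger ensemble and compare wall-clock time / CPU load.
paropts = statset('UseParallel', 'always');
tic
forestBig = TreeBagger(256, X, Y, 'Method', 'regression', ...
    'Options', paropts);
toc
```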
4 comments
Dylan Muir on 9 Dec 2014
As far as I can tell, training has taken place: each tree contains a long list of split conditions.
There seems to be a performance issue with running TreeBagger on a parallel pool. Internally, TreeBagger uses "internal.stats.parallel.smartForSliceout" to run the nested function "TreeBagger>localGrowTrees>loopbody". If I modify the TreeBagger code to call parfor directly, inlining the relevant lines from "internal.stats.parallel.smartForSliceout" and "TreeBagger>localGrowTrees>loopbody", the training step runs twice as fast with the same parallel configuration.
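
A hedged sketch of the kind of direct-parfor approach described (this is not the actual TreeBagger internals; it only illustrates the same idea, using fitrtree as a stand-in for the internal tree-growing call and omitting per-split feature subsampling):

```matlab
% Grow an ensemble of regression trees with a plain parfor, bypassing
% the smartForSliceout dispatch layer. X, Y are placeholder data.
nTrees = 32;
n = size(X, 1);
trees = cell(nTrees, 1);
parfor t = 1:nTrees
    idx = randi(n, n, 1);                  % bootstrap sample with replacement
    trees{t} = fitrtree(X(idx, :), Y(idx));
end
% Ensemble predictions would then be averaged across trees{1..nTrees}.
```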
Ilya on 9 Dec 2014
Any help I could provide from this point on would depend on various technical details, such as the size of your data relative to the memory on the head node, the size of the grown trees, and your exact parpool configuration, to name a few. If you are content with this solution, use it. Otherwise, please get in touch with MathWorks technical support and work with them to produce reproducible steps.
Keep in mind that a speed-up or slow-down observed on one dataset does not necessarily hold for a different dataset. The data size and the average size of the grown trees would be factors. It's possible your data are fairly small, so dispatching through smartForSliceout adds noticeable overhead. But I don't want to hypothesize too much.

