
No parallel computing when using parfor

Joe on 10 Nov 2013
Commented: Joe on 12 Nov 2013
Hello,
I have some code that uses parfor to run a calculation in parallel. The code runs without errors and produces the correct output. However, parallel execution matters here because the calculation is slow (currently about 2 minutes per iteration) and needs to be run thousands of times.
Some context:
  • temperature_v9 is the function that does the actual analysis: it takes temperature measurements and targets as inputs and produces predictions. The inputs have tens of millions of rows.
  • temperature_wrapper_v9 takes the same inputs, but split into tranches (the split is done by another function), and then runs temperature_v9 on each tranche. The idea is to run the tranches in parallel to save time: for example, split the data into 10 tranches, run 10 instances of temperature_v9 in parallel, and concatenate the 10 results at the end.
  • Both approaches give the same results.
  • The second approach does not actually parallelize, and its processing time is slightly higher than that of the first approach.
  • In both approaches only one core is at 100% and the other 11 cores are at very low load.
  • In both approaches there is plenty of RAM available.
  • I have used the profiler, and the time is spread across many different tasks; nothing accounts for more than 5%-10%. The two biggest items are two calls to the std function, which I cannot avoid. So I want to focus on solving the problem with parallel computing if possible.
Can anyone shed some light on how to parallelize this calculation (code pasted below)? I must be missing something here.
Thanks in advance,
Joe
function [statisticTemp, totalTemp, tempFunction, AccTempFunction, PF, increasePercent, avgInstantTemp, instantsNumber, decreaseFunction, increaseFunction, PercentTempFunction] = temperature_wrapper_v9(lowTempSizeMatrixTranches, lowTempMatrixTranches, highTempSizeMatrixTranches, highTempMatrixTranches, trueMidPointsTranches, trueLowsTranches, trueHighsTranches, trueSpreadsTranches, window, refreshRate, expectedIncrease, depthOfMeasure, numDevMaxEntry, numDevMinEntry, numDevExit, changeMin, changeMax, numDevMinSpread, maxSpread, alpha, SLT, print, graph)
sizeData=size(lowTempMatrixTranches);
tranches=sizeData(1,1);
statisticTempTranches= zeros(tranches, 1);
totalTempTranches= zeros(tranches, 1);
tempFunctionTranches= zeros(tranches, sizeData(1,2));
AccTempFunctionTranches= zeros(tranches, sizeData(1,2));
PFTranches= zeros(tranches, 1);
increasePercentTranches= zeros(tranches, 1);
avgInstantTempTranches= zeros(tranches, 1);
instantsNumberTranches= zeros(tranches, 1);
decreaseFunctionTranches= zeros(tranches, sizeData(1,2));
increaseFunctionTranches= zeros(tranches, sizeData(1,2));
PercentTempFunctionTranches= zeros(tranches, sizeData(1,2));
statisticTempAux= 0;
totalTempAux= 0;
tempFunctionAux= zeros(1, sizeData(1,2));
AccTempFunctionAux= zeros(1, sizeData(1,2));
PFAux= 0;
increasePercentAux= 0;
avgInstantTempAux= 0;
instantsNumberAux= 0;
decreaseFunctionAux= zeros(1, sizeData(1,2));
increaseFunctionAux= zeros(1, sizeData(1,2));
PercentTempFunctionAux= zeros(1, sizeData(1,2));
parfor i=1:tranches
    [statisticTempAux, totalTempAux, tempFunctionAux, AccTempFunctionAux, PFAux, increasePercentAux, avgInstantTempAux, instantsNumberAux, decreaseFunctionAux, increaseFunctionAux, PercentTempFunctionAux] = ...
        temperature_v9(squeeze(lowTempSizeMatrixTranches(i,:,:)), squeeze(lowTempMatrixTranches(i,:,:)), ...
        squeeze(highTempSizeMatrixTranches(i,:,:)), squeeze(highTempMatrixTranches(i,:,:)), ...
        squeeze(trueMidPointsTranches(:,i)), squeeze(trueLowsTranches(:,i)), squeeze(trueHighsTranches(:,i)), squeeze(trueSpreadsTranches(:,i)), ...
        window, refreshRate, expectedIncrease, depthOfMeasure, numDevMaxEntry, numDevMinEntry, numDevExit, ...
        changeMin, changeMax, numDevMinSpread, maxSpread, alpha, SLT, 0, 0);
    statisticTempTranches(i,:)=statisticTempAux;
    totalTempTranches(i,:)=totalTempAux;
    tempFunctionTranches(i,:)=tempFunctionAux;
    AccTempFunctionTranches(i,:)=AccTempFunctionAux;
    PFTranches(i,:)=PFAux;
    increasePercentTranches(i,:)=increasePercentAux;
    avgInstantTempTranches(i,:)=avgInstantTempAux;
    instantsNumberTranches(i,:)=instantsNumberAux;
    decreaseFunctionTranches(i,:)=decreaseFunctionAux;
    increaseFunctionTranches(i,:)=increaseFunctionAux;
    PercentTempFunctionTranches(i,:)=PercentTempFunctionAux;
end
%Postprocessing: I reassemble all the outputs of the different tranches in a single one
  2 Comments
Walter Roberson on 10 Nov 2013
Could you rearrange the order of the dimensions of lowTempSizeMatrixTranches, perhaps at an outer level, and likewise for your other variables?
permute(lowTempSizeMatrixTranches, [2 3 1])
that would set things up so you index by the third dimension, making each slice into contiguous memory and removing the need for the squeeze().
When practical, index by the last dimension instead of the first.
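A minimal sketch of what this suggestion looks like in practice. The array sizes and the names A, B, k, sliceOld, and sliceNew are made up for illustration; only permute, squeeze, and the [2 3 1] dimension order come from the comment above.

```matlab
% Hypothetical small array standing in for lowTempSizeMatrixTranches:
% 4 tranches, each a 3-by-5 matrix, with the tranche index in dimension 1.
A = reshape(1:60, [4 3 5]);

% Move the tranche index to the last dimension, once, outside the loop.
B = permute(A, [2 3 1]);       % size(B) is [3 5 4]

k = 2;                         % any tranche index
sliceOld = squeeze(A(k,:,:));  % strided access, needs squeeze
sliceNew = B(:,:,k);           % contiguous block, already 3-by-5

% Both ways of slicing return the same matrix.
assert(isequal(sliceOld, sliceNew));
```

Inside the parfor loop, each call would then receive B(:,:,i) directly instead of squeeze(A(i,:,:)).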
Joe on 12 Nov 2013
Hi Walter, I was not aware of this. Yes, rearranging should not be a problem, let me try it and see what happens.
Thanks a lot!


Answers (1)

Joe on 12 Nov 2013
Hi - it was easy: I needed to set up the matlabpool. I just did, and the parfor now works. However, I am surprised that I still cannot use the full power of the processor: when I launch the calculation without parfor, the CPU load is approximately 14%; with parfor, it reaches 60%. I have 6 physical cores (12 with hyperthreading), and I have tried launching the calculation with 6 workers and with 8 workers; in both cases the load reaches 60%.
Any suggestions on how to fully load the processor?
Thanks
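For readers who hit the same symptom, here is a minimal sketch of opening a pool of workers before a parfor loop. It assumes the Parallel Computing Toolbox is installed and that the default profile allows the requested number of workers; the worker count of 6 and the toy loop body are illustrative and not taken from the original code. In releases from around the time of this thread the command was matlabpool (for example, matlabpool open 6); later releases replaced it with parpool, which is what the sketch uses.

```matlab
% Open a pool only if none is running (parpool and gcp exist in R2013b and later).
nWorkers = 6;                   % illustrative: one worker per physical core
if isempty(gcp('nocreate'))
    parpool(nWorkers);
end

% Toy parfor loop with a sliced output variable, mirroring the wrapper's pattern.
n = 10;
result = zeros(n, 1);
parfor i = 1:n
    result(i) = sum((1:i).^2);  % stand-in for the call to temperature_v9
end
```

The pool stays open across calls, so the startup cost is paid once rather than on every run of the wrapper.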
  2 Comments
Marc on 12 Nov 2013
Windows machine??
Joe on 12 Nov 2013
Linux - Ubuntu 13.04
