Parfor solving optimization problems (Cplex) slower than for

Katarzyna Furmanska, 17 March 2020
Edited: Egor Buiko, 14 April 2021
Hello,
I am trying to solve a bunch of optimization problems in parallel using Matlab Parallel Toolbox 2018b on my client (Win10) + Matlab Distributed Server 2018b on my 3-node cluster (Win7) with 52 workers. These are rather small problems, but there are hundreds of them, so theoretically parfor should be helpful in this case.
I am reading these problems from .lp files into a cell array and then solving them within a parfor loop, as below:
% subp_array is a 1xn cell array of Cplex problems
nThreads = 1; % I don't see any time benefit of giving it more than 1 thread
parpool('MJSProfile1',nWorkers);
totalTime = tic;
parfor subp_index = 1:length(subp_array)
    iterTime = tic;
    prob = subp_array{subp_index}; % assigning subp_array{subp_index} to prob and working on it apparently speeds up calculations
    prob.Param.parallel.Cur = -1; % set parallel option to opportunistic
    prob.Param.threads.Cur = nThreads; % set number of threads per problem
    prob.Param.mip.tolerances.mipgap.Cur = 0.01;
    prob.solve();
    % get time of particular iteration
    elapsedTime{subp_index} = toc(iterTime);
end
% get time of entire loop
elapsedTotalTime = toc(totalTime);
The problem is that this parfor loop with 10 problems on 16 workers runs for 32 sec compared to 1.5 sec (sic!) for a regular for loop. When examining the timing results, it turns out that the elapsed times of the individual iterations are very short, but the overall loop time is still large...
These are the values of the elapsedTime array:
{[0.0275]} {[0.0317]} {[0.0274]} {[0.0314]} {[0.0695]} {[0.4816]} {[0.0808]} {[0.0343]} {[0.0399]} {[0.0845]}
which is in total less than 1 second!
Is there anything in the syntax that may cause these delays? I am using sliced variables and assigning prob first so as not to index the variable multiple times; I have no idea what else can be done... Apparently, if I run parfor with M = 0 (sequential), it gets the result immediately (the difference is especially visible for a few hundred problems). What may be making my parallel computation so slow?
Thanks in advance
Kasia

Answers (2)

Edric Ellis, 17 March 2020
Does the performance improve much / not much / not at all if you run the parfor loop a second time without closing the pool?
If the performance does improve a lot, then it's likely that the slow-down was caused by the parfor infrastructure having to work out that the code wasn't available, and attaching it to the pool. A message is printed when this occurs, or you can check the result of calling listAutoAttachedFiles:
listAutoAttachedFiles(gcp())
You can either live with that first-time slow-down, or attach the files up-front using addAttachedFiles.
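A rough sketch of the up-front attachment, for illustration only. The two file names are hypothetical placeholders (the thread does not say which files were auto-attached), so substitute whatever listAutoAttachedFiles reports. The sketch also assumes a pool can be opened with the default profile rather than 'MJSProfile1'.

```matlab
% Reuse the current pool without auto-starting one; open a default pool if none exists.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool();
end

% Show what parfor has attached automatically so far (the list may be empty).
listAutoAttachedFiles(pool);

% HYPOTHETICAL file names, for illustration only.
candidates = {'readSubproblems.m', 'solveSubproblem.m'};

% Resolve each name on the MATLAB path and keep only the files that exist.
resolved = cellfun(@which, candidates, 'UniformOutput', false);
resolved = resolved(~cellfun(@isempty, resolved));

% Attach the existing files once, before the first parfor runs.
if ~isempty(resolved)
    addAttachedFiles(pool, resolved);
end
disp(pool.AttachedFiles);
```

If the first parfor run is then no faster than before, file attachment was probably not the cause of the delay.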
If the performance remains the same, perhaps the problem is the amount of data being transferred. Use ticBytes and tocBytes to investigate this. You could also experiment with stubbing-out most of the loop body. I.e. if you run a loop like this:
parfor subp_index = 1:length(subp_array)
    prob = subp_array{subp_index}; % assigning subp_array{subp_index} to prob and working on it apparently speeds up calculations
end
how does that perform? That loop incurs the same amount of data transfer.
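One way to combine the stubbed-out loop with ticBytes and tocBytes is sketched below. The random matrices are placeholder data standing in for the Cplex problems (an assumption, so the snippet runs without Cplex), and the pool is opened with the default profile if none is running.

```matlab
% Reuse the current pool without auto-starting one; open a default pool if none exists.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool();
end

% PLACEHOLDER data: ten 200x200 random matrices stand in for the Cplex problems.
subp_array = arrayfun(@(k) rand(200), 1:10, 'UniformOutput', false);

ticBytes(pool);                  % start counting bytes transferred to/from workers
loopTimer = tic;
parfor subp_index = 1:length(subp_array)
    prob = subp_array{subp_index}; %#ok<NASGU> transfer only, no solve
end
stubTime = toc(loopTimer);
bytes = tocBytes(pool);          % one row per worker: [sent to worker, received from worker]

fprintf('Stubbed loop: %.2f s, %.0f bytes sent to workers in total\n', ...
    stubTime, sum(bytes(:, 1)));
```

If stubTime is close to the time of the full loop, the solve itself is not where the time goes.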
11 Comments
Edric Ellis, 25 March 2020
There's definitely a chunk of time in distcompdeserialize - that's an internal PCT function that is used when transferring data from the client process to a MATLAB worker process.
However, looking at the absolute times - there's still a big chunk of time going somewhere. The total time taken by remoteParallelFunction (which is the worker-side wrapper for the body of a parfor loop) is only ~0.6 seconds, but (if I've understood correctly) the overall loop takes much longer. I don't really have any good way to explain that.
I would go back to trying to run a version of the parfor loop with the data transfer in place, but the actual computations stubbed out. My suspicion is that that will still take basically the same amount of time. This points to data transfer being the bottleneck - despite the actual number of bytes being transferred being relatively not that large...
Katarzyna Furmanska, 25 March 2020
Yup, the parfor loop with only the data transfer takes a similar amount of time, also spent mostly in distcompdeserialize.
The entire parfor loop takes around 100 s, and I have no idea where this time comes from either... When executed, each Cplex problem prints a log, and you can see how slowly new logs appear. It looks as if the workers were waiting to pick up the next problem. And when running a large number of problems on all available cores, I noticed that Task Manager does not show any heavy computation: CPU usage is just a few percent, and the small per-core graphs show no activity. Could it be something on the machine side that does not allow the problems to be solved in parallel, but rather puts them in a sequential queue?
Thanks a million again for giving me so many useful tips!
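One way to check whether iterations really overlap across workers, rather than running one after another, is to record which worker ran each iteration and when. This is only a sketch: pause stands in for prob.solve(), the iteration count is arbitrary, and comparing timestamps across cluster nodes assumes their clocks are reasonably synchronized.

```matlab
% Reuse the current pool without auto-starting one; open a default pool if none exists.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool();
end

n = 40;                          % PLACEHOLDER number of iterations
workerId   = zeros(1, n);
startedAt  = zeros(1, n);
finishedAt = zeros(1, n);

parfor k = 1:n
    task = getCurrentTask();     % empty when the body runs on the client
    if isempty(task)
        id = 0;
    else
        id = task.ID;
    end
    t0 = posixtime(datetime('now'));
    pause(0.05);                 % PLACEHOLDER for prob.solve()
    t1 = posixtime(datetime('now'));
    workerId(k)   = id;
    startedAt(k)  = t0;
    finishedAt(k) = t1;
end

wallSpan = max(finishedAt) - min(startedAt);   % first start to last finish
busyTime = sum(finishedAt - startedAt);        % summed duration of all iterations
fprintf('%d workers used, wall span %.2f s, summed busy time %.2f s\n', ...
    numel(unique(workerId)), wallSpan, busyTime);
```

A wall span close to the summed busy time suggests the iterations were effectively serialized; a wall span much smaller than the busy time suggests they did overlap and the delay lies elsewhere.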

Egor Buiko, 14 April 2021
Edited: Egor Buiko, 14 April 2021
No need to go as far as solving optimization problems. Even this:
X = zeros(1, 10^6);
parfor i = 1:10^6, X(i) = i; end
which, by the documentation, is made for a parallel pool, runs two times slower than a regular for.
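A small timing sketch of that comparison, for anyone who wants to reproduce it. The body of each iteration is a single assignment, so the cost of dispatching work to the pool and collecting results is likely to dominate; the measured ratio will depend on the machine and pool. If no pool is open, the parfor timing may also include pool start-up.

```matlab
n = 10^6;

% Regular for loop.
X = zeros(1, n);
t = tic;
for i = 1:n
    X(i) = i;
end
tFor = toc(t);

% Same body in a parfor loop; Y is a sliced output variable.
Y = zeros(1, n);
t = tic;
parfor i = 1:n
    Y(i) = i;
end
tParfor = toc(t);

fprintf('for: %.3f s, parfor: %.3f s\n', tFor, tParfor);
```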

Release

R2018b
