Improve and speed up a parfor loop

Hello,
I have code that runs 10,000 iterations. It involves a Monte Carlo simulation using normal distributions, and the number of simulations is 4,000,000. I tried to use parfor to speed up the code. However, its run time is almost the same as with an ordinary for-loop.
Is there a way to speed up the code so that it benefits from parfor?
Thanks,
Here is my code:
clc;
clear;
close all;
...
pool = parpool('local', str2num(getenv('SLURM_TASKS_PER_NODE')));
...
A=readmatrix("x.csv");
runs = 4000000;
results=zeros(10000,1);
meanG=constant;
sdG=constant;
parfor j=1:x
mean=A(j,1); %
sd=A(j,2);
guss=A(j,3); %
for n=1:0.5:40
B=normrnd(mean,sd,[1,runs]);
F=equation
G=normrnd(F*meanG,F*sdG,[1,runs]);
%Other calculation to calculate C
if C>10
d=equation;
break
end
end
record(j)=d;
end

1 comment

darova on 17 Apr 2020
Maybe if you show a bit more and explain what this code does, someone can help you.


Answers (2)

Matt J on 17 Apr 2020
Edited: Matt J on 17 Apr 2020

0 votes

We can't see all the operations in your loop, but the ones we can see are pretty basic. Operations as common and basic as those are probably already coded to use a multicore CPU very efficiently, so there probably isn't much room for improvement with parfor. To get a clearer idea of how much improvement is possible, though, we would need to see screenshots of your CPU usage and the usage of all its cores (e.g., from the Task Manager, if you are on Windows).
Some of the randomization steps, though, look like they could be hoisted out of the loop, e.g.,
B=normrnd(mean,sd,[numel(1:0.5:40),runs]); % one row per value of n (79 rows)
for n=1:0.5:40
F=equation
...
end
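A minimal sketch of the hoisting idea above, under some assumptions that are not in the original post: mu, sd, and the small runs value are placeholders, and randn stands in for normrnd so that only base MATLAB is needed. The grid 1:0.5:40 has 79 values, so the hoisted matrix has 79 rows.

```matlab
% Sketch: draw all samples once, then index one row per inner iteration.
% mu, sd, and runs are placeholder values (assumptions, not from the post).
rng(0);                          % reproducible draws
mu   = 5;                        % placeholder for A(j,1)
sd   = 2;                        % placeholder for A(j,2)
runs = 1000;                     % the post uses 4,000,000
nVals  = 1:0.5:40;               % the inner-loop grid
nSteps = numel(nVals);           % number of inner iterations

% One call produces every draw the inner loop would need.
B_all = mu + sd .* randn(nSteps, runs);

for k = 1:nSteps
    n = nVals(k);
    B = B_all(k, :);             % row k replaces the per-iteration draw
    % ... compute F, G, and C from B and n here ...
end

% Memory estimate for the full-size problem (doubles are 8 bytes).
bytesFull = nSteps * 4e6 * 8;
fprintf('%d steps, about %.1f GB at runs = 4e6\n', nSteps, bytesFull / 1e9);
```

Two things to weigh before using this: at full size the matrix is a few gigabytes per worker, and if the inner loop usually exits early through break, most of the hoisted draws are never used.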

9 comments

Salam Al-Rubaye on 18 Apr 2020
Thanks, that helps a lot. I am using a high-performance computing cluster. I am requesting 20 CPUs and can assign any amount of memory to them. I thought parfor would then speed things up by a factor of 20, but it did not.
Matt J on 18 Apr 2020
Edited: Matt J on 18 Apr 2020
We need to see what percentage of CPU usage occurs when the ordinary for-loop is running, and what percentage is used on each of the 20 cluster CPUs when parfor is being used.
Salam Al-Rubaye on 18 Apr 2020
According to the cluster, it was 99% for both the parfor loop and the for-loop. I am not sure what the problem is.
Matt J on 18 Apr 2020
Do you share the cluster? Does the 99% usage represent your jobs, or other people's as well?
Salam Al-Rubaye on 18 Apr 2020
Yes, it represents only the 20 CPUs that I have requested.
Matt J on 18 Apr 2020
Edited: Matt J on 18 Apr 2020
But if other users are using the same CPUs, then you might be using only 10% of the 99%.
Salam Al-Rubaye on 18 Apr 2020
I do not think so. I am submitting the job as a batch job and requesting the resources that I need. The tasks I request should not be used by anyone else.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --time=24:00:00
#SBATCH --mem-per-cpu=10GB
#SBATCH --job-name=invertRandArray
#SBATCH --error=parallel.%J.err
#SBATCH --output=parallel.%J.out
Matt J on 21 Apr 2020
Edited: Matt J on 21 Apr 2020
I don't know bash very well, but the nodes=1 suggests to me that you are not running on multiple CPUs. Or, if you are, your for-loop has access to them as well, just as if you were running on a single 20-core CPU. If this is the case, then once again your for-loop and your parfor loop have access to the exact same computing hardware, and there is no guarantee that you will get a significant speed-up.
It might tell us more if you show us the output of,
>> gcp
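Along the same lines, a short sketch of what could be reported from inside the job. It assumes the Parallel Computing Toolbox is installed (gcp belongs to it); the variable names are only illustrative.

```matlab
% Sketch: report what the current session can actually use.
% gcp('nocreate') returns the existing pool without starting a new one.
p = gcp('nocreate');
if isempty(p)
    nWorkers = 0;
    fprintf('No parallel pool is open.\n');
else
    nWorkers = p.NumWorkers;
    fprintf('Pool of class %s with %d workers\n', class(p), nWorkers);
end

% Thread limit for built-in multithreaded functions in this session.
nThreads = maxNumCompThreads;
fprintf('Client may use up to %d computational threads.\n', nThreads);
```

If nThreads is already close to the number of requested cores, the ordinary for-loop may be using the same hardware that the pool workers would use.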
Matt J on 21 Apr 2020
"It might tell us more if you show us the output of, >> gcp"
Never mind this part. Raymond has pointed out that your workers are obviously non-remote.


Raymond Norris on 21 Apr 2020

0 votes

It's possible that your code is already making use of multiple cores (e.g., for linear algebra); therefore, running local workers may just offset this. Try running MATLAB in single-thread mode (-singleCompThread) and then benchmark your code again.
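As a rough in-session alternative to the startup flag, the sketch below times the same vectorized work with the default thread limit and with one thread, using maxNumCompThreads. The workload is a stand-in chosen for illustration, not the poster's calculation.

```matlab
% Sketch: compare default multithreading with a single computational thread.
% The workload below is an assumed stand-in, not the code from the question.
runs = 2e6;
work = @() sum(exp(randn(1, runs)) .^ 2);

nDefault = maxNumCompThreads;        % current thread limit
tic; work(); tMulti = toc;

prev = maxNumCompThreads(1);         % limit to one thread; returns the old limit
tic; work(); tSingle = toc;
maxNumCompThreads(prev);             % restore the old limit

fprintf('threads=%d: %.3f s, threads=1: %.3f s\n', nDefault, tMulti, tSingle);
```

If the two timings differ a lot, the for-loop was already getting implicit multicore speed-up, which would explain why adding workers on the same cores changes little.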
You might consider posting a bit more of your code so that we can provide more guidance for your parfor.
  1. As it's written, A is not a sliced input; it's a broadcast variable, which could impact performance.
  2. Is record(j) supposed to be results(j)?
  3. For a particular iteration of j, what happens if C is never greater than 10 (and d does not get defined)?
  4. Again, without all of the code, it's hard to make the following recommendation, but I would consider refactoring your code as follows:
parfor j = 1:x
    results(j) = unit_of_work(A,runs,meanG,sdG,j);
end
function d = unit_of_work(A,runs,meanG,sdG,j)
    % meanG and sdG are passed in because a function has its own workspace
    mean=A(j,1); %
    sd=A(j,2);
    guss=A(j,3); %
    for n=1:0.5:40
        B=normrnd(mean,sd,[1,runs]);
        F=equation
        G=normrnd(F*meanG,F*sdG,[1,runs]);
        %Other calculation to calculate C
        if C>10
            d=equation;
            break
        end
    end
end
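To make points 1 and 3 concrete, here is a small sketch under stated assumptions: A is a made-up two-row matrix, the calculation of C and the value stored in d are placeholders, and randn replaces normrnd to avoid a toolbox dependency. Copying each column into its own vector lets parfor slice the inputs, and d gets a default so every iteration assigns a result.

```matlab
% Sketch: sliced inputs plus a default for d.
% A, the formula for C, and the value stored in d are placeholders (assumptions).
A  = [3 0.5 0; 0.1 0.5 0];          % stand-in for readmatrix("x.csv")
x  = size(A, 1);
mu = A(:, 1);                        % vectors indexed only by j are sliced
sd = A(:, 2);
runs = 1000;
results = nan(x, 1);

parfor j = 1:x
    d = NaN;                         % kept if the threshold is never reached
    for n = 1:0.5:40
        B = mu(j) + sd(j) .* randn(1, runs);
        C = n * mean(B);             % placeholder for the real calculation
        if C > 10
            d = n;                   % placeholder for the real equation
            break
        end
    end
    results(j) = d;
end
disp(results.');
```

With these placeholder numbers the first row crosses the threshold early and the second never does, so the second result stays NaN instead of raising an error about an undefined d.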

4 comments

Matt J on 21 Apr 2020
Edited: Matt J on 21 Apr 2020
"It's possible that your code is already making use of multiple cores (e.g., for linear algebra); therefore, running local workers may just offset this."
No, the OP has said that he is running on a cluster.
Raymond Norris on 21 Apr 2020
I thought he was running MATLAB on the cluster. You can still run local workers on a remote cluster. Local workers are "local" to where you're running the MATLAB client, not necessarily your desktop.
Matt J on 21 Apr 2020
I see, but I think the OP's intention is to have non-local workers.
Raymond Norris on 21 Apr 2020
It doesn't appear that way. Notice the reference to local here:
pool = parpool('local', str2num(getenv('SLURM_TASKS_PER_NODE')));
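As a side note on that line, a hedged sketch of reading the worker count without str2num. SLURM can report SLURM_TASKS_PER_NODE in forms such as 20(x2) for multi-node jobs, so the sketch keeps only the leading integer. The fallback to one worker is an assumption made here, not something from the thread.

```matlab
% Sketch: parse the worker count from SLURM with a fallback.
% Falling back to 1 worker is an assumption for illustration.
raw = getenv('SLURM_TASKS_PER_NODE');   % e.g. '20', or '20(x2)' on multi-node jobs
nWorkers = sscanf(raw, '%d', 1);        % leading integer only; empty if none
if isempty(nWorkers) || nWorkers < 1
    nWorkers = 1;
end
fprintf('Requesting %d workers\n', nWorkers);
% pool = parpool('local', nWorkers);    % as in the question; needs Parallel Computing Toolbox
```

Either way, the 'local' profile starts the workers on the same node as the MATLAB client, which matches the single-node batch script shown earlier.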


Asked: 17 Apr 2020

Commented: 21 Apr 2020
