Nested parfor and for-Loops and Other parfor Requirements

Nested `parfor`-Loops

You cannot use a parfor-loop inside another parfor-loop. As an example, the following nesting of parfor-loops is not allowed:

parfor i = 1:10
    parfor j = 1:5
        ...
    end
end

Tip

You cannot nest parfor directly within another parfor-loop. A parfor-loop can call a function that contains a parfor-loop, but you do not get any additional parallelism.

Code Analyzer in the MATLAB® Editor flags the use of parfor inside another parfor-loop.

You cannot nest parfor-loops because parallelization can be performed at only one level. Therefore, choose which loop to run in parallel, and convert the other loop to a for-loop.

Consider the following performance issues when dealing with nested loops:

Parallel processing incurs overhead. Generally, you should run the outer loop in parallel, because the overhead occurs only once. If you run the inner loop in parallel, then each of the multiple parfor executions incurs overhead. See Convert Nested for-Loops to parfor-Loops for an example of how to measure parallel overhead.
Make sure that the number of iterations exceeds the number of workers. Otherwise, you do not use all available workers.
Try to balance the parfor-loop iteration times. parfor tries to compensate for some load imbalance.
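As a rough illustration of the second point, the following sketch compares the iteration count of a candidate loop with the number of workers in the current pool. The variable names and the iteration count are hypothetical; gcp('nocreate') returns an empty value when no pool is running, so the sketch does not start one.

```matlab
% Hypothetical iteration count of the loop you plan to convert to parfor
numOuter = 100;

% Query the current pool without starting a new one
pool = gcp('nocreate');
if isempty(pool)
    numWorkers = 0;                 % no pool is running
else
    numWorkers = pool.NumWorkers;   % workers available to parfor
end

if numWorkers == 0
    disp('No parallel pool is running.');
elseif numOuter > numWorkers
    fprintf('%d iterations for %d workers: all workers can be used.\n', ...
        numOuter, numWorkers);
else
    fprintf('%d iterations for %d workers: some workers may stay idle.\n', ...
        numOuter, numWorkers);
end
```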

Tip

Always run the outermost loop in parallel, because you reduce parallel overhead.

You can also use a function that uses parfor and embed it in a parfor-loop. Parallelization occurs only at the outer level. In the following example, call a function MyFun.m inside the outer parfor-loop. The inner parfor-loop embedded in MyFun.m runs sequentially, not in parallel.

parfor i = 1:10
    MyFun(i)
end

function MyFun(i)
    parfor j = 1:5
        ...
    end
end

Tip

Nested parfor-loops generally give you no computational benefit.

Convert Nested `for`-Loops to `parfor`-Loops

A typical use of nested loops is to step through an array, using one loop variable to index one dimension and a nested loop variable to index another dimension. The basic form is:

X = zeros(n,m);
for a = 1:n
    for b = 1:m
        X(a,b) = fun(a,b);
    end
end

The following code shows a simple example. Use tic and toc to measure the computing time needed.

A = 100;
tic
for i = 1:100
    for j = 1:100
        a(i,j) = max(abs(eig(rand(A))));
    end
end
toc

Elapsed time is 49.376732 seconds.

You can parallelize either of the nested loops, but you cannot run both in parallel. The reason is that the workers in a parallel pool cannot start or access further parallel pools.

If the loop counted by i is converted to a parfor-loop, then each worker in the pool executes the nested loops using the j loop counter. The j loops themselves cannot run as a parfor on each worker.

Because parallel processing incurs overhead, you must choose carefully whether you want to convert either the inner or the outer for-loop to a parfor-loop. The following example shows how to measure the parallel overhead.

First convert only the outer for-loop to a parfor-loop. Use tic and toc to measure the computing time needed. Use ticBytes and tocBytes to measure how much data is transferred to and from the workers in the parallel pool.

Run the new code, and run it again. The first run is slower than subsequent runs, because the parallel pool takes some time to start and make the code available to the workers.

A = 100;
tic
ticBytes(gcp);
parfor i = 1:100
    for j = 1:100
        a(i,j) = max(abs(eig(rand(A))));
    end
end
tocBytes(gcp)
toc

             BytesSentToWorkers    BytesReceivedFromWorkers
             __________________    ________________________

    1             32984                 24512              
    2             33784                 25312              
    3             33784                 25312              
    4             34584                 26112              
    Total    1.3514e+05            1.0125e+05              

Elapsed time is 14.130674 seconds.

Next convert only the inner loop to a parfor-loop. Measure the time needed and data transferred as in the previous case.

A = 100;
tic
ticBytes(gcp);
for i = 1:100
    parfor j = 1:100
        a(i,j) = max(abs(eig(rand(A))));
    end
end
tocBytes(gcp)
toc

             BytesSentToWorkers    BytesReceivedFromWorkers
             __________________    ________________________

    1        1.3496e+06             5.487e+05              
    2        1.3496e+06            5.4858e+05              
    3        1.3677e+06            5.6034e+05              
    4        1.3476e+06            5.4717e+05              
    Total    5.4144e+06            2.2048e+06              

Elapsed time is 48.631737 seconds.

If you convert the inner loop to a parfor-loop, both the elapsed time and the amount of data transferred are much greater than with the parallel outer loop. In this case, the elapsed time is almost the same as in the nested for-loop example. The speedup is smaller than when running the outer loop in parallel, because more data is transferred and the parallel overhead is therefore higher. Consequently, if you execute the inner loop in parallel, you get no computational benefit compared to running the serial for-loop.
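To put numbers on this comparison, the following sketch computes the speedups and the data-transfer ratio from the measurements reported above. The variable names are illustrative only, and the figures apply to this particular run on four workers.

```matlab
% Elapsed times in seconds, copied from the three runs above
tSerial = 49.376732;   % nested for-loops
tOuter  = 14.130674;   % outer loop converted to parfor
tInner  = 48.631737;   % inner loop converted to parfor

% Total bytes sent to the workers, copied from the tocBytes output
bytesOuter = 1.3514e5;
bytesInner = 5.4144e6;

% Speedup relative to the serial run
speedupOuter = tSerial / tOuter;
speedupInner = tSerial / tInner;

% How much more data the inner-loop version sends
sentRatio = bytesInner / bytesOuter;

fprintf('Outer-loop parfor speedup: %.2fx\n', speedupOuter);
fprintf('Inner-loop parfor speedup: %.2fx\n', speedupInner);
fprintf('Inner/outer bytes sent:    %.1fx\n', sentRatio);
```

For these measurements, the outer-loop version is roughly 3.5 times faster than the serial code, the inner-loop version is barely faster at all, and the inner-loop version sends about 40 times as much data to the workers.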

If you want to reduce parallel overhead and speed up your computation, run the outer loop in parallel.

If you convert the inner loop instead, then each iteration of the outer loop initiates a separate parfor-loop. That is, the inner loop conversion creates 100 parfor-loops. Each of the multiple parfor executions incurs overhead. If you want to reduce parallel overhead, you should run the outer loop in parallel instead, because overhead only occurs once.

Tip

If you want to speed up your code, always run the outer loop in parallel, because you reduce parallel overhead.

Nested `for`-Loops: Requirements and Limitations

If you want to convert a nested for-loop to a parfor-loop, you must ensure that your loop variables are properly classified. For details, see Troubleshoot Variables in parfor-Loops. If your code does not adhere to the guidelines and restrictions labeled as Required, you get an error. MATLAB catches some of these errors at the time it reads the code. These errors are labeled as Required (static).

Required (static): You must define the range of a for-loop nested in a parfor-loop by constant numbers or broadcast variables.

In the following example, the invalid code does not work because you define the upper limit of the for-loop by a function call. The valid code provides a workaround by first defining a broadcast or constant variable outside the parfor-loop:

Invalid:

A = zeros(100, 200);
parfor i = 1:size(A, 1)
    for j = 1:size(A, 2)
        A(i, j) = i + j;
    end
end

Valid:

A = zeros(100, 200);
n = size(A, 2);
parfor i = 1:size(A,1)
    for j = 1:n
        A(i, j) = i + j;
    end
end

Required (static): The index variable for the nested for-loop must never be explicitly assigned other than by its for statement.

Following this restriction is required. If the nested for-loop variable is changed anywhere in a parfor-loop other than by its for statement, the region indexed by the for-loop variable is not guaranteed to be available at each worker.

The invalid code tries to modify the value of the nested for-loop variable j in the body of the loop. The valid code provides a workaround by assigning the nested for-loop variable to a temporary variable t, and then updating t.

Invalid:

A = zeros(10);
parfor i = 1:10
    for j = 1:10
        A(i, j) = 1;
        j = j+1;
    end
end

Valid:

A = zeros(10);
parfor i = 1:10
    for j = 1:10
        A(i, j) = 1;
        t = j;
        t = t + 1;
    end
end

Required (static): You cannot index or subscript a nested for-loop variable.

Following this restriction is required. If a nested for-loop variable is indexed, iterations are not guaranteed to be independent.

The invalid example attempts to index the nested for-loop variable j. The valid example removes this indexing.

Invalid:

A = zeros(10);
parfor i = 1:10
    for j = 1:10
        j(1);
    end
end

Valid:

A = zeros(10);
parfor i = 1:10
    for j = 1:10
        j;
    end
end

Required (static): When using the nested for-loop variable for indexing a sliced array, you must use the variable in plain form, not as part of an expression.

For example, the following invalid code does not work, but the valid code does:

Invalid:

A = zeros(4, 11);
parfor i = 1:4
    for j = 1:10
        A(i, j + 1) = i + j;
    end
end

Valid:

A = zeros(4, 11);
parfor i = 1:4
    for j = 2:11
        A(i, j) = i + j - 1;
    end
end

Required (static): If you use a nested for-loop to index into a sliced array, you cannot use that array elsewhere in the parfor-loop.

In the following example, the invalid code does not work because A is sliced and indexed inside the nested for-loop. The valid code works because v is assigned to A outside of the nested loop:

Invalid:

A = zeros(4, 10);
parfor i = 1:4
    for j = 1:10
        A(i, j) = i + j;
    end
    disp(A(i, j))
end

Valid:

A = zeros(4, 10);
parfor i = 1:4
    v = zeros(1, 10);
    for j = 1:10
        v(j) = i + j;
    end
    disp(v(j))
    A(i, :) = v;
end

`parfor`-Loop Limitations

Nested Functions

The body of a parfor-loop cannot reference a nested function. However, it can call a nested function by a function handle. Try the following example. Note that A(idx) = nfcn(idx) in the parfor-loop does not work. You must use feval to invoke the fcn handle in the parfor-loop body.

function A = pfeg
    function out = nfcn(in)
        out = 1 + in;
    end
    
    fcn = @nfcn;
    
    parfor idx = 1:10
        A(idx) = feval(fcn, idx);
    end
end

>> pfeg
Starting parallel pool (parpool) using the 'Processes' profile ... connected to 4 workers.

ans =

     2     3     4     5     6     7     8     9    10    11

Tip

If you use function handles that refer to nested functions inside a parfor-loop, then the values of externally scoped variables are not synchronized among the workers.

Nested `parfor`-Loops

The body of a parfor-loop cannot contain a parfor-loop. For more information, see Nested parfor-Loops.

Nested `spmd` Statements

The body of a parfor-loop cannot contain an spmd statement, and an spmd statement cannot contain a parfor-loop. The reason is that workers cannot start or access further parallel pools.

`break` and `return` Statements

The body of a parfor-loop cannot contain break or return statements. Consider parfeval or parfevalOnAll instead, because you can use cancel on them.
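The following is a minimal sketch of that alternative, not a drop-in replacement for every loop. It assumes a parallel pool is available and uses a hypothetical task (squaring a number) and a hypothetical stopping threshold. Each task is submitted with parfeval, results are collected with fetchNext as they finish, and the remaining tasks are cancelled once a result meets the stopping condition.

```matlab
pool = gcp;                  % start or reuse a parallel pool
numTasks = 10;               % hypothetical number of independent tasks
threshold = 50;              % hypothetical stopping criterion

% Submit each task asynchronously; each future yields one output
futures(1:numTasks) = parallel.FevalFuture;
for k = 1:numTasks
    futures(k) = parfeval(pool, @(x) x^2, 1, k);
end

% Collect results in completion order and stop early when possible
found = false;
for k = 1:numTasks
    [idx, value] = fetchNext(futures);
    if value > threshold
        found = true;
        cancel(futures);     % stop the tasks that have not finished
        break                % allowed here: this is an ordinary for-loop
    end
end

if found
    fprintf('Task %d returned %d; remaining tasks cancelled.\n', idx, value);
end
```

The break statement is legal in this sketch because the collecting loop is a regular for-loop that runs on the client, not a parfor-loop.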

Global and Persistent Variables

The body of a parfor-loop cannot contain global or persistent variable declarations. The reason is that these variables are not synchronized between workers. You can use global or persistent variables within functions, but their value is visible only to the worker that creates them. Instead of global variables, it is a better practice to use function arguments to share values.
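As a small sketch of the recommended practice, the following code passes a shared value to a function as an argument instead of declaring it global. The function name scaleValue and the variable scale are hypothetical. Because scale is defined before the loop and only read inside it, every worker receives its own copy of the value.

```matlab
scale = 3;               % ordinary variable, sent to every worker
y = zeros(1, 5);

parfor i = 1:5
    % Pass the shared value explicitly rather than through a global
    y(i) = scaleValue(i, scale);
end

disp(y)

function out = scaleValue(in, scale)
    % Hypothetical helper: receives everything it needs as arguments
    out = in * scale;
end
```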

To learn more about variable requirements, see Troubleshoot Variables in parfor-Loops.

Scripts

If a script introduces a variable, you cannot call this script from within a parfor-loop or spmd statement. The reason is that this script would cause a transparency violation. For more details, see Ensure Transparency in parfor-Loops or spmd Statements.
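One possible way around this limitation, sketched here under the assumption that the script only sets a few values, is to wrap those values in a function and call the function from the loop. The function name getGain and the value it returns are hypothetical.

```matlab
result = zeros(1, 4);

parfor i = 1:4
    % The value arrives as an explicit function output, so no variable
    % appears in the loop workspace without a visible assignment
    gain = getGain();
    result(i) = gain * i;
end

disp(result)

function gain = getGain()
    % Hypothetical replacement for a script that would have set "gain"
    gain = 2;
end
```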

Anonymous Functions

You can define an anonymous function inside the body of a parfor-loop. However, sliced output variables inside anonymous functions are not supported. You can work around this by using a temporary variable for the sliced variable, as shown in the following example.

x = 1:10;
parfor i=1:10
    temp = x(i);
    anonymousFunction = @() 2*temp;
    x(i) = anonymousFunction() + i;
end
disp(x);

For more information on sliced variables, see Sliced Variables.

`inputname` Functions

Using inputname to return the workspace variable name corresponding to an argument number is not supported inside parfor-loops. The reason is that parfor workers do not have access to the workspace of the MATLAB desktop. To work around this, call inputname before parfor, as shown in the following example.

a = 'a';
myFunction(a)

function X = myFunction(a)
    name = inputname(1);
    
    parfor i=1:2
        X(i).(name) = i;
    end
end

`load` Functions

The syntaxes of load that do not assign to an output structure are not supported inside parfor-loops. Inside parfor, always assign the output of load to a structure.
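The following sketch illustrates the supported form. It first writes a small MAT-file so that the example is self-contained. The file name and variable names are hypothetical, and the sketch assumes that the workers can read the same temporary folder as the client, which is typically the case for a local pool.

```matlab
% Create a small MAT-file for the example (hypothetical file name)
dataFile = fullfile(tempdir, 'parforLoadExample.mat');
values = 1:4;
save(dataFile, 'values');

total = zeros(1, 4);
parfor i = 1:4
    s = load(dataFile, 'values');   % assign the output of load to a structure
    total(i) = s.values(i) * 2;     % access the data through the structure
end

disp(total)
```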

`nargin` or `nargout` Functions

The following uses are not supported inside parfor-loops:

Using nargin or nargout without a function argument
Using narginchk or nargoutchk to validate the number of input or output arguments in a call to the function that is currently executing

The reason is that workers do not have access to the workspace of the MATLAB desktop. To work around this, call these functions before parfor, as shown in the following example.

myFunction('a','b')

function X = myFunction(a,b)
    nin = nargin;
    parfor i=1:2
        X(i) = i*nin;
    end
end

P-Code Scripts

You can call P-code script files from within a parfor-loop, but P-code scripts cannot contain a parfor-loop. To work around this, use a P-code function instead of a P-code script.