GPU Coder cannot parallelize loop
I have a for-loop, shown below, that I am trying to parallelize with GPU Coder:
% n_out is of type uint64
% input_array is of type single array
function out = my_func(n_out, input_array) %#codegen
coder.gpu.kernelfun;
out = zeros(1, n_out, 'single');
for i = 1:n_out % loop I want to parallelize
temp = 0.0;
%%
% code that changes temp depending on input_array(i). There are no reads from or writes to
% variable 'out' here
%%
out(i) = temp; % GPU Coder says this is a loop carried dependency?
end
end
When I run GPU Coder, it does not create a kernel and the build report states:
"Unable to parallelize loop because of loop carried dependencies. Check the use of variable 'out' in function 'my_func'".
1) Why is the assignment out(i) = temp; a "loop carried dependency"?
2) How do I remove such a "loop carried dependency"?
EDIT: Removed a syntax error in the for-loop index declaration.
2 Comments
Walter Roberson
22 Feb 2025
I would be curious what would happen if you wrote into a temporary array and then copied the temporary array to the output variable.
I also wonder whether there are cases where out(i) is not assigned, which would create a dependency on the initialization by zeros().
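A minimal sketch of the temporary-array idea, assuming the elided loop body only reads input_array(i). The name my_func_tmp and the placeholder computation 2 * input_array(i) are hypothetical stand-ins for the code omitted in the question, and whether GPU Coder then maps this loop to a kernel is not confirmed here. The loop writes every element of the buffer exactly once, so no element of the output depends on the zeros() initialization.

```matlab
function out = my_func_tmp(n_out, input_array) %#codegen
% Hypothetical variant of my_func: each result goes into a temporary
% array first, and the whole array is copied to the output at the end.
coder.gpu.kernelfun;
tmp = zeros(1, n_out, 'single');   % temporary buffer, one slot per iteration
for i = 1:n_out
    temp = single(0);
    % Placeholder for the elided computation that depends on input_array(i)
    temp = temp + 2 * input_array(i);
    tmp(i) = temp;                 % every element of tmp is written exactly once
end
out = tmp;                         % single copy of the buffer to the output
end
```

Running the function in MATLAB with n_out = uint64(3) and input_array = single([1 2 3]) should return single([2 4 6]); the interesting part is what the GPU Coder build report says about the loop.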
Chao Luo
24 Feb 2025
Hi Jeffrey,
Thanks for posting the question. There is a syntax error on line 4:
for i:n_out
I think you meant:
for i = 1:n_out
After fixing it, I can see the loop get parallelized when n_out is a double scalar.
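Chao's test used a double scalar for n_out rather than uint64. A minimal sketch of applying that inside the function, assuming that converting the uint64 argument to a double loop bound behaves the same way as passing a double (the source does not confirm this). The name my_func_dbl and the placeholder computation are hypothetical.

```matlab
function out = my_func_dbl(n_out, input_array) %#codegen
% Hypothetical variant of my_func: callers may still pass n_out as uint64,
% but the loop bound is converted to a double scalar before the loop.
coder.gpu.kernelfun;
n = double(n_out);                 % double loop bound (assumed to match a double input)
out = zeros(1, n, 'single');
for i = 1:n                        % loop index i is now a double
    % Placeholder for the elided computation that depends on input_array(i)
    temp = single(0) + 2 * input_array(i);
    out(i) = temp;
end
end
```

Alternatively, keep the original function and declare n_out as a double when generating code, for example (the build setup here is an assumption; adjust it to your project): cfg = coder.gpuConfig('mex'); codegen('-config', cfg, 'my_func', '-args', {0, coder.typeof(single(0), [1 Inf])})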