Can this GPU code snippet be redone without nested loops?
Hello, I have two matrices: matrix1 is a logical array of 1s and 0s (1000 x 800), and matrix2 is a different logical array (2000 x 800).
For each row of matrix1 and each row of matrix2, I am essentially calculating (number of common elements) / (number of elements present in either row). Both of these arrays are gpuArrays. What I am finding:
for j = gpuArray.colon(1, x)
    for k = gpuArray.colon(1, y)
        output(j,k) = sum(matrix1(j,:) & matrix2(k,:)) / sum(matrix1(j,:) | matrix2(k,:));
    end
end
This runs very fast for small values of x and y, but once x and y are large it takes far longer to run on the GPU — the loop body executes x*y times, so the runtime grows quadratically, and each iteration launches its own small GPU operations.
I am investigating the use of repmat here, but I am not sure how to implement it. Any ideas? Or is there another option to get rid of the nested for loops?
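One loop-free approach (a sketch, not from the thread): for logical rows, sum(a & b) equals the dot product a*b', so all pairwise intersection counts come from a single matrix multiply, and the union counts follow from the per-row sums via |A∪B| = |A| + |B| - |A∩B|. Variable names are taken from the question; bsxfun is used for the row/column expansion.

```matlab
% Sketch: loop-free version, assuming matrix1 (x-by-n) and
% matrix2 (y-by-n) are logical gpuArrays with equal column counts.
m1 = single(matrix1);            % cast so mtimes runs on the GPU
m2 = single(matrix2);

% Intersection counts: inter(j,k) == sum(matrix1(j,:) & matrix2(k,:))
inter = m1 * m2';                % x-by-y, one GPU matrix multiply

% Union counts via |A| + |B| - |A and B|
rows1 = sum(m1, 2);              % x-by-1 row sums of matrix1
rows2 = sum(m2, 2);              % y-by-1 row sums of matrix2
uni   = bsxfun(@plus, rows1, rows2') - inter;   % x-by-y

output = inter ./ uni;           % same ratio as the nested loops
```

This replaces x*y tiny kernel launches with a handful of large GPU operations, which is where a GPU does well; memory for the x-by-y intermediates is the main cost.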
Thanks
Accepted Answer
More Answers (1)
Sean de Wolski
11 November 2013
Edited: Sean de Wolski, 11 November 2013
Is output preallocated?
Before the loops:
output = gpuArray.zeros(x,y);
This should speed it up dramatically.
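For context, a minimal sketch of the questioner's loop with the preallocation added (names from the question). Without it, output grows on every iteration, forcing repeated reallocation and copying.

```matlab
% Sketch: original nested loop, preallocating output on the GPU first.
x = size(matrix1, 1);
y = size(matrix2, 1);
output = gpuArray.zeros(x, y);   % allocate once, before the loops
for j = 1:x
    for k = 1:y
        output(j,k) = sum(matrix1(j,:) & matrix2(k,:)) / ...
                      sum(matrix1(j,:) | matrix2(k,:));
    end
end
```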
3 Comments
Amr Ragab
11 November 2013
Sean de Wolski
11 November 2013
Edited: Sean de Wolski, 11 November 2013
Do matrix1 and matrix2 already live on the gpu, i.e. are they gpuArrays?
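A quick way to check this (a sketch; isa is standard MATLAB):

```matlab
% true if the array lives on the GPU, false if it is an ordinary array
isa(matrix1, 'gpuArray')
isa(matrix2, 'gpuArray')
```

If either returns false, each loop iteration mixes host and device data, which adds transfer overhead on top of the loop itself.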
Amr Ragab
11 November 2013