Optimization of loops with deep learning functions

1 回表示 (過去 30 日間)
Radians
Radians 2021 年 6 月 22 日
コメント済み: Radians 2021 年 6 月 28 日
Hello,
I am trying to optimize the following code, which performs the deep learning convolutions on input arrays:
parfor k=1:500 %number of images
for j =1:8 %number of channels of the relevant filters
dldf_O_dlconv3_temp_1(:,:,j,k)=dlconv(dlarray(reshape(O_maxpool2(:,:,j,k),[8 8 1]), 'SSC'),dlarray(reshape(DLDO_O_dlconv3(:,:,1,k),[8 8 1]),'SSC'),0,'Padding',padding);%1st filter
dldf_O_dlconv3_temp_2(:,:,j,k)=dlconv(dlarray(reshape(O_maxpool2(:,:,j,k),[8 8 1]), 'SSC'),dlarray(reshape(DLDO_O_dlconv3(:,:,2,k),[8 8 1]),'SSC'),0,'Padding',padding);%2nd filter
dldf_O_dlconv3_temp_3(:,:,j,k)=dlconv(dlarray(reshape(O_maxpool2(:,:,j,k),[8 8 1]), 'SSC'),dlarray(reshape(DLDO_O_dlconv3(:,:,3,k),[8 8 1]),'SSC'),0,'Padding',padding);%3rd filter
dldf_O_dlconv3_temp_4(:,:,j,k)=dlconv(dlarray(reshape(O_maxpool2(:,:,j,k),[8 8 1]), 'SSC'),dlarray(reshape(DLDO_O_dlconv3(:,:,4,k),[8 8 1]),'SSC'),0,'Padding',padding);%4th filter
dldf_O_dlconv3_temp_5(:,:,j,k)=dlconv(dlarray(reshape(O_maxpool2(:,:,j,k),[8 8 1]), 'SSC'),dlarray(reshape(DLDO_O_dlconv3(:,:,5,k),[8 8 1]),'SSC'),0,'Padding',padding);%5th filter
dldf_O_dlconv3_temp_6(:,:,j,k)=dlconv(dlarray(reshape(O_maxpool2(:,:,j,k),[8 8 1]), 'SSC'),dlarray(reshape(DLDO_O_dlconv3(:,:,6,k),[8 8 1]),'SSC'),0,'Padding',padding);%6th filter
dldf_O_dlconv3_temp_7(:,:,j,k)=dlconv(dlarray(reshape(O_maxpool2(:,:,j,k),[8 8 1]), 'SSC'),dlarray(reshape(DLDO_O_dlconv3(:,:,7,k),[8 8 1]),'SSC'),0,'Padding',padding);%7th filter
dldf_O_dlconv3_temp_8(:,:,j,k)=dlconv(dlarray(reshape(O_maxpool2(:,:,j,k),[8 8 1]), 'SSC'),dlarray(reshape(DLDO_O_dlconv3(:,:,8,k),[8 8 1]),'SSC'),0,'Padding',padding);%8th filter
end
end
As you can see, I am already using the parfor loop. I tried using GPU arrays with O_maxpool2 and DLDO_O_dlconv3, but instead of speeding anything up I think it became a bit slower if anything. My GPU device details are as follows:
CUDADevice with properties:
Name: 'GeForce RTX 2080 Ti'
Index: 1
ComputeCapability: '7.5'
SupportsDouble: 1
DriverVersion: 11.2000
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 1.1811e+10
AvailableMemory: 9.4411e+09
MultiprocessorCount: 68
ClockRateKHz: 1545000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
Please let me know if there is anything else I could do to speed this code, and also why using gpuArrays have not sped it up.
Much thanks.

採用された回答

Joss Knight
Joss Knight 2021 年 6 月 24 日
編集済み: Joss Knight 2021 年 6 月 24 日
dlconv is designed to work in batch with multiple input channels, multiple filters, and multiple input observations in a single call. Read the documentation for dlconv.
  1 件のコメント
Radians
Radians 2021 年 6 月 28 日
Thanks alot...Even though it was not straight forward what I was trying to do(that's why I had to use forloops), but your comment made me think about how I could utilize the built-in functionality of dlconv to remove the loops, and after reshaping my data a bit I was able to do it. Thanks again.

サインインしてコメントする。

その他の回答 (0 件)

カテゴリ

Help Center および File ExchangeGPU Computing についてさらに検索

製品


リリース

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by