How can I make my custom convolutional layer (using dlconv) more memory efficient in order to improve the speed of the backward pass?

Question

Julius Å, 30 March 2021


Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/787969-how-can-i-make-my-custom-convolutional-layer-using-dlconv-more-memory-efficient-in-order-to-improv

Commented: Julius Å, 5 April 2021

Hi.

I have created a custom layer that takes a batch of 3*10 feature maps as input, giving the input size 256x256x64x30 ([Spatial, Spatial, Channel, Batch]). The layer then reshapes the input dlarray to the size 256x256x64x3x10 ([Spatial, Spatial, Channel, Time, Batch]) using the line:

Z = reshape(X{:}, [sz(1), sz(2), sz(3), numTimedims, sz(4)/numTimedims]);

Next, Z is split along its 4th (time) dimension into three slices, and the features produced by two 2D channel-wise separable convolutions are added together using the following lines (doing this in a single line gave memory errors). The result is the sum Z2, of size 256x256x64x10:

Z2 = dlconv(double(squeeze(Z(:, :, :, 2, :)-Z(:, :, :, 1, :))), KiMinus, layer.bias(1), ...
'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');
Z2 = Z2 + squeeze(Z(:, :, :, 2, :));
Z2 = Z2 + dlconv(double(squeeze(Z(:, :, :, 2, :)-Z(:, :, :, 3, :))), KiPlus, layer.bias(2), ...
'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');

where KiMinus and KiPlus are 3x3x1x1x64 filters (following the structure [filterHeight, filterWidth, numChannelsPerGroup, numFiltersPerGroup, numGroups], which makes the convolutions channel-wise separable) and layer.bias is a 2x1 array.
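The filter layout described above can be checked on a small example. The following is only a minimal sketch, assuming the Deep Learning Toolbox is available; the toy sizes and the names H, W, C, B, K, Xn and changed are illustrative and not taken from the original layer. It builds a 3x3x1x1xC weight array, confirms that the output keeps the input size, and checks that perturbing one input channel only changes the matching output channel.

```matlab
% Sketch only: toy sizes, not the 256x256x64x10 tensors from the question.
H = 8; W = 8; C = 4; B = 2;
Xn = rand(H, W, C, B);
X  = dlarray(Xn);

% Weights: [filterHeight, filterWidth, numChannelsPerGroup, numFiltersPerGroup, numGroups]
% One group per channel, one filter per group -> channel-wise separable.
K = rand(3, 3, 1, 1, C);
b = 0;  % scalar bias, applied to every output channel

Y = dlconv(X, K, b, 'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');
outSize = size(Y);   % expected to match size(X)

% Perturb input channel 1 only and see which output channels react.
Xn2 = Xn;
Xn2(:, :, 1, :) = Xn2(:, :, 1, :) + 1;
Y2 = dlconv(dlarray(Xn2), K, b, 'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');

d = abs(extractdata(Y2) - extractdata(Y));
changed = squeeze(max(d, [], [1 2 4])) > 1e-12;  % C-by-1 logical, one entry per channel
```

If the grouping behaves as described, only the first entry of changed should be true.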

The forward pass of this layer works fine and is not noticeably slow. The backward pass, however, is very slow. The profiler shows that 68% of the runtime of the dlfeval(@modelGradients, dlnet, dlim, dlmask) call in my custom training loop is spent in dlarray.dlgradient>RecordingArray.backwardPass>ParenReferenceOp>ParenReferenceOp.backward>internal_parenReferenceBackward, where the statement dX = accumarray(linSubscripts,dZ(:),shapeX); (line 32) takes the most time.
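To make the profiler line easier to read, here is a plain-MATLAB illustration of what a gather-then-scatter gradient looks like. It is a sketch of the general idea only, not the toolbox's internal code: the variable names mirror the profiler line, the sizes are made up, and the vector size argument plus the explicit reshape are choices made for this example.

```matlab
% Sketch only: illustrates the idea behind the gradient of an indexing operation.
shapeX = [2 3];
X = reshape(1:6, shapeX);

linSubscripts = [2; 4];        % linear indices picked out in the forward pass
Zsel = X(linSubscripts);       % forward: gather two elements of X

dZ = [10; 20];                 % upstream gradient, one value per gathered element

% Backward: scatter dZ into a zero array with as many elements as X.
% Every indexing operation therefore produces a full-size gradient array.
dX = accumarray(linSubscripts, dZ(:), [prod(shapeX) 1]);
dX = reshape(dX, shapeX);
% dX is [0 0 0; 10 20 0]
```

Under this reading, each Z(:, :, :, k, :) in the forward pass would cost one full-size scatter in the backward pass, which is consistent with the time being reported inside the indexing operation.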

Is there any obvious way for me to improve my implementation of this convolutional layer in order to get a more memory efficient backward pass? Is there a more memory efficient way to perform the reshaping?


Answer 1

Gautam Pendse, 2 April 2021


Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/787969-how-can-i-make-my-custom-convolutional-layer-using-dlconv-more-memory-efficient-in-order-to-improv#answer_665274


Hi Julius,

One approach that you can try is to rewrite the code like this:

ZChannel2 = Z(:, :, :, 2, :);
Z2 = dlconv(double(squeeze(ZChannel2-Z(:, :, :, 1, :))), KiMinus, layer.bias(1), ...
'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');
Z2 = Z2 + squeeze(ZChannel2);
Z2 = Z2 + dlconv(double(squeeze(ZChannel2-Z(:, :, :, 3, :))), KiPlus, layer.bias(2), ...
'Stride', [1 1], 'Padding', 'same', 'DataFormat', 'SSCB');

This introduces an intermediate variable ZChannel2 to avoid repeatedly indexing into Z.
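One way to check whether this change helps on a given machine is to time the gradient computation for both variants. The following is a rough sketch, assuming the Deep Learning Toolbox; the function names gradRepeated and gradCached, the toy sizes, and the simplified loss (the convolutions are left out so that only the indexing differs) are all hypothetical and not part of the original layer.

```matlab
% Sketch only: compares repeated indexing with a cached slice under dlgradient.
T = 3;                                  % number of time steps per sample
X = dlarray(rand(32, 32, 8, T*4));      % toy stand-in for the 256x256x64x30 input

tRepeated = timeit(@() dlfeval(@gradRepeated, X, T));
tCached   = timeit(@() dlfeval(@gradCached, X, T));
fprintf('repeated indexing: %.4f s, cached slice: %.4f s\n', tRepeated, tCached);

% Both variants should produce the same gradient.
gA = extractdata(dlfeval(@gradRepeated, X, T));
gB = extractdata(dlfeval(@gradCached, X, T));
maxDiff = max(abs(gA(:) - gB(:)));

function g = gradRepeated(X, T)
    % Indexes the middle time step twice, as in the original code.
    sz = size(X);
    Z = reshape(X, [sz(1), sz(2), sz(3), T, sz(4)/T]);
    D = squeeze(Z(:, :, :, 2, :) - Z(:, :, :, 1, :)) ...
      + squeeze(Z(:, :, :, 2, :) - Z(:, :, :, 3, :));
    g = dlgradient(sum(D, 'all'), X);
end

function g = gradCached(X, T)
    % Indexes the middle time step once and reuses the result.
    sz = size(X);
    Z = reshape(X, [sz(1), sz(2), sz(3), T, sz(4)/T]);
    mid = Z(:, :, :, 2, :);
    D = squeeze(mid - Z(:, :, :, 1, :)) + squeeze(mid - Z(:, :, :, 3, :));
    g = dlgradient(sum(D, 'all'), X);
end
```

The timings depend on hardware and array sizes, so no particular speed-up is claimed here.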

Does that help?

Gautam


Julius Å, 5 April 2021

Hello Gautam. Thank you for your answer.

This seems to slightly improve the speed! Thanks.

However, the reshape()-function also seems fairly computationally heavy, at least in the context of the backward pass in this custom layer. As a beginner with coding things efficiently for the GPU, I don't really know how to handle this. Do you have any suggestions on how this function could be avoided or re-implemented in this context?
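One possibility worth trying is to skip the reshape and squeeze altogether. Because MATLAB arrays are stored column-major, time step t of the reshaped array corresponds to every numTimedims-th slice of the original 4th dimension, starting at t. The sketch below verifies that equivalence with plain numeric arrays; the toy sizes and variable names are illustrative. Whether this is actually faster in the backward pass has not been measured here, and the strided version still relies on indexing.

```matlab
% Sketch only: strided indexing on the 4-D input versus reshape + squeeze.
H = 4; W = 5; C = 3; T = 3; B = 2;               % toy sizes; T plays the role of numTimedims
X = reshape(1:(H*W*C*T*B), [H W C T*B]);         % stand-in for the 256x256x64x30 input

Z = reshape(X, [H W C T B]);                     % approach from the question

allEqual = true;
for t = 1:T
    viaReshape = squeeze(Z(:, :, :, t, :));      % size [H W C B]
    viaStride  = X(:, :, :, t:T:end);            % same slice, taken directly from X
    allEqual = allEqual && isequal(viaReshape, viaStride);
end
% allEqual is true
```

With this equivalence, the three slices could be taken as X(:, :, :, 1:numTimedims:end), X(:, :, :, 2:numTimedims:end) and X(:, :, :, 3:numTimedims:end), and profiled against the reshape-based version.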

