Why does layerNormalizationLayer in Deep Learning Toolbox include the T dimension in the per-sample statistics?
Hello,
While implementing a ViT transformer in Matlab, I found that layerNormalizationLayer includes the T dimension in the statistics calculated for each sample in the batch. This is problematic when implementing a transformer, since tokens correspond to the T dimension and reference implementations compute the statistics separately for each token.
Thanks
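For reference, here is a minimal sketch of the difference being described, using a plain array of size [C B T] to stand in for a "CBT" sequence (the sizes and variable names are illustrative):

```matlab
% Illustrative sizes: C channels, B observations, T tokens (time steps).
C = 8; B = 4; T = 16;
X = randn(C,B,T);

% layerNormalizationLayer pools the statistics over C and T for each
% observation in the batch:
muPooled = mean(X,[1 3]);   % one mean per observation, size [1 B 1]

% A transformer reference implementation normalizes each token separately,
% pooling over C only:
muPerToken = mean(X,1);     % one mean per token, size [1 B T]
```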
Accepted Answer
Other Answers (1)
Matt J
13 Mar 2023
Perhaps you can fold your T dimension into the C dimension and use a groupNormalizationLayer instead, with the groups defined so that different T belong to different groups.
7 Comments
John Smith
13 Mar 2023
I don't see why that has to make it painful. Why couldn't you adopt a modular structure in your code like below? You could also make a reusable custom layer of your own, as we've discussed in earlier threads.
numTimes=2000;
GN=groupNormalizerTimeIndep(numTimes);
layers=[layer1,layer2,GN,layer3,layer4,GN,layer5,... ]
net = trainNetwork(sequences,layers);
function normalizerLayers=groupNormalizerTimeIndep(numTimes)
% Wrap groupNormalizationLayer between two reshapes: fold the T dimension
% into C on the way in, normalize with one group per time step, then
% restore the original layout on the way out.
pre=functionLayer(@reshapeForw);
nlayer=groupNormalizationLayer(numTimes);   % one group per time step
post=functionLayer(@(z)reshapeBack(z,numTimes));
normalizerLayers=[pre,nlayer,post];
end
function Xr=reshapeForw(X)
% Fold time into channels: [H W C T B] -> [H W C*T B].
[H,W,C,T,B]=size(X);
Xr=reshape(X,H,W,C*T,B);
end
function X=reshapeBack(Xr,T)
% Undo the fold: [H W C*T B] -> [H W C T B].
[H,W,~,B]=size(Xr);
X=reshape(Xr,H,W,[],T,B);
end
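A quick sanity check (sizes are illustrative) that the reshape puts all channels of one time step into one contiguous block of folded channels, so that groupNormalizationLayer with numTimes groups normalizes each time step independently:

```matlab
H = 1; W = 1; C = 3; T = 4; B = 2;
X  = reshape(1:H*W*C*T*B, H, W, C, T, B);
Xr = reshape(X, H, W, C*T, B);

% Channels (t-1)*C+1 .. t*C of the folded array are exactly the C channels
% of time step t in the original array:
t = 2;
isequal(Xr(1,1,(t-1)*C+(1:C),1), reshape(X(1,1,:,t,1),1,1,C))  % returns true
```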
John Smith
14 Mar 2023
As I wrote, it's doable, but a PITA.
Well, I don't think lamenting that will get you anywhere. If you think there is an alternative solution, you can wait for other posts, but if we both haven't found one, I doubt it's coming.
In addition, the number of layers grows by 2 for every normalization layer. For a 12-level transformer this adds a whopping 24 layers. The performance hit is not insignificant.
I don't see why it would be. The functionLayers don't have any learnable parameters.
Matt J
14 Mar 2023
Well, I don't think lamenting that will get you anywhere.
That said, I do agree it would be useful to have a more configurable normalization layer type, where you could explicitly specify which dimensions are to be included in the normalization.
John Smith
15 Mar 2023
Matt J
15 Mar 2023
That happens sometimes, but usually you have to submit a formal enhancement request.