Does the selfAttentionLayer also perform softmax and scaling?
In https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.selfattentionlayer.html, it states that:
A self-attention layer computes single-head or multihead self-attention of its input.
The layer:
- Computes the queries, keys, and values from the input
- Computes the scaled dot-product attention across heads using the queries, keys, and values
- Merges the results from the heads
- Performs a linear transformation on the merged result
I wonder whether the layer also applies softmax to the scaled scores (i.e., (Q*K) divided by sqrt(dim)). My understanding is that both the scaling and the softmax should happen within step 2.
Please clarify this for me and for other users.
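For reference, the textbook definition of scaled dot-product attention that the question has in mind can be sketched as below. This is a minimal, single-head illustration in plain Python, not the MathWorks implementation: the function names and the toy input `X` are hypothetical, and the learned query/key/value projections that a real layer applies first are omitted. Whether `selfAttentionLayer` follows this definition exactly is what the question asks.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q is n_q x d_k, K is n_k x d_k, V is n_k x d_v (lists of lists).
    Returns (output, weights).
    """
    d_k = len(K[0])
    scale = 1.0 / math.sqrt(d_k)  # the "scaling" step
    weights = []
    for q in Q:
        # Scaled dot product of one query with every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        # The "softmax" step: each row of weights sums to 1.
        weights.append(softmax(scores))
    d_v = len(V[0])
    # Each output row is the attention-weighted sum of the value rows.
    output = [
        [sum(w[j] * V[j][c] for j in range(len(V))) for c in range(d_v)]
        for w in weights
    ]
    return output, weights


# Self-attention: queries, keys, and values all come from the same input.
# Hypothetical toy data; no learned projections are applied here.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(X, X, X)
```

In this sketch, both the division by sqrt(d_k) and the softmax sit inside what the documentation lists as the second step.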
Thanks.
Accepted Answer
More Answers (1)
xingxingcui
January 11, 2024
Edited: xingxingcui
April 27, 2024
Hi, @Chih