Computer Vision Toolbox Model for Vision Transformer Network

Implementation of several variants of the vision transformer (ViT) model.
Downloads: 606
Updated: 2024/3/20
The Vision Transformer (ViT) model is a pretrained transformer model for image classification. It is also used as a backbone for other computer vision tasks, such as object detection. The support package contains three variants of the ViT model:
  • Base-16 model
  • Small-16 model
  • Tiny-16 model
Here, “base”, “small”, and “tiny” denote the model architecture and size, and 16 denotes the patch size hyperparameter (each image is split into 16-by-16-pixel patches). Each variant has been pretrained on the ImageNet data set with an input resolution of 384-by-384 pixels and is stored as a MAT-file.
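To make the patch-size and resolution numbers concrete, the sketch below works through the arithmetic for a generic ViT in plain Python. It is a minimal illustration, not the toolbox's implementation, and it assumes the conventions of the original ViT design: non-overlapping square patches, row-major patch ordering, RGB input, and one prepended class token. The helper names `vit_token_count`, `patch_vector_length`, and `patchify` are hypothetical.

```python
def vit_token_count(image_size: int, patch_size: int, class_token: bool = True) -> int:
    """Number of tokens a ViT encoder sees for a square image_size x image_size input.

    The image is cut into non-overlapping patch_size x patch_size patches,
    so each side yields image_size // patch_size patches.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side * patches_per_side
    # Assumption: one learnable class token is prepended, as in the original ViT.
    return num_patches + (1 if class_token else 0)


def patch_vector_length(patch_size: int, channels: int = 3) -> int:
    # Each flattened patch holds patch_size * patch_size * channels values.
    return patch_size * patch_size * channels


def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened patches.

    Patches are visited left to right, top to bottom (row-major), and the
    pixels inside each patch are flattened in the same order.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    patch.extend(image[r][c])  # append all channels of this pixel
            patches.append(patch)
    return patches


if __name__ == "__main__":
    # 384 / 16 = 24 patches per side, 24 * 24 = 576 patches, plus 1 class token.
    print(vit_token_count(384, 16))
    # Each 16-by-16 RGB patch flattens to 16 * 16 * 3 = 768 values.
    print(patch_vector_length(16))
```

With the settings described above (384-by-384 input, patch size 16), this gives 576 patches, or 577 tokens once the assumed class token is included. Because the patch size is fixed at 16 for all three variants, the sequence length is the same for Base-16, Small-16, and Tiny-16; the variants differ in model width and capacity rather than in how the image is tokenized.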
MATLAB Release Compatibility
Created with R2023b
Compatible with R2023b to R2024a
Platform Compatibility
Windows, macOS (Apple silicon), macOS (Intel), Linux