extractEmbeddings

Extract feature embeddings from Segment Anything Model (SAM) encoder

Since R2024a

Description

embeddings = extractEmbeddings(sam,I) extracts the feature embeddings of the input image I from the encoder of a Segment Anything Model (SAM), sam.

Note

To use any of the SAM 2 models, this functionality requires the Image Processing Toolbox™ Model for Segment Anything Model 2 add-on. To use the base SAM model, this functionality requires the Image Processing Toolbox Model for Segment Anything Model add-on.

Examples

Create a Segment Anything Model (SAM) for image segmentation.

sam = segmentAnythingModel;

Read and display an image.

I = imread("pears.png");
imshow(I)

Calculate the image size.

imageSize = size(I);

Extract the feature embeddings from the image.

embeddings = extractEmbeddings(sam,I);

Specify visual prompts corresponding to the object that you want to segment, such as a pear along the bottom edge of the image. This example selects two foreground points within the pear. Refine the segmentation by including one background point outside the object to segment.

fore = [512 400; 480 420];
back = [340 300];

Overlay the foreground points in green and the background point in red.

hold on
plot(fore(:,1),fore(:,2),"g*",back(:,1),back(:,2),"r*",Parent=gca)
hold off

Segment an object in the image using SAM segmentation.

masks = segmentObjectsFromEmbeddings(sam,embeddings,imageSize, ...
    ForegroundPoints=fore,BackgroundPoints=back);

Overlay the detected object mask on the test image.

imMask = insertObjectMask(I,masks);
imshow(imMask)
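
Because the embeddings depend only on the image, they can be computed once and reused for several prompts. The following is a minimal sketch of that pattern, not part of the original example. The point coordinates in prompts and the variable combinedMask are hypothetical and chosen only for illustration, and the sketch assumes that each call returns a single H-by-W logical mask.

```matlab
% Sketch: reuse one set of embeddings for several point prompts.
% Assumes the Segment Anything Model add-on is installed.
sam = segmentAnythingModel;
I = imread("pears.png");
imageSize = size(I);

% Run the encoder once.
embeddings = extractEmbeddings(sam,I);

% Hypothetical foreground points, one [x y] row per prompt.
prompts = [512 400; 200 300];

% Decode one mask per prompt from the same embeddings and merge them.
combinedMask = false(imageSize(1:2));
for k = 1:size(prompts,1)
    mask = segmentObjectsFromEmbeddings(sam,embeddings,imageSize, ...
        ForegroundPoints=prompts(k,:));
    combinedMask = combinedMask | mask;
end

imshow(insertObjectMask(I,combinedMask))
```

Only the call to segmentObjectsFromEmbeddings repeats inside the loop, so the encoder runs once regardless of the number of prompts.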

Input Arguments

sam: Segment Anything Model for image segmentation, specified as a segmentAnythingModel object.

I: Image or batch of images from which to extract feature embeddings, specified as a 2-D, 3-D, or 4-D numeric array, depending on the number and type of images.

| Number and Type of Images | Data Format |
|---|---|
| Single RGB image | 3-D numeric array of size H-by-W-by-3 |
| Batch of B RGB images | 4-D numeric array of size H-by-W-by-3-by-B |
| Single grayscale image | 2-D numeric array of size H-by-W |
| Batch of B grayscale images | 4-D numeric array of size H-by-W-by-1-by-B |
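
As an illustration of the batch formats, the following sketch stacks two images along the fourth dimension. The choice of images, the common size of 384-by-512, and all variable names are assumptions made for this sketch. Images in a batch must share the same height and width, so the sketch resizes them first.

```matlab
% Hypothetical inputs: two sample images resized to a common size
% so that they can be stacked into one array.
rgb1 = imresize(imread("peppers.png"),[384 512]);
rgb2 = imresize(imread("pears.png"),[384 512]);

% Batch of B RGB images: H-by-W-by-3-by-B.
rgbBatch = cat(4,rgb1,rgb2);

% Batch of B grayscale images: H-by-W-by-1-by-B.
% Concatenating 2-D arrays along dimension 4 leaves dimension 3 as a singleton.
gray1 = im2gray(rgb1);
gray2 = im2gray(rgb2);
grayBatch = cat(4,gray1,gray2);

disp(size(rgbBatch))
disp(size(grayBatch))
```

Either array can then be passed as the input I, for example extractEmbeddings(sam,rgbBatch).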

Output Arguments

embeddings: Feature embeddings extracted from the Segment Anything Model encoder, returned as a numeric array or cell array, depending on the model variant and the number of input images.

If the segmentAnythingModel object sam uses the base SAM model, embeddings is a numeric array with size dependent on the number of input images.

| Number of Input Images | Embeddings Format |
|---|---|
| Single grayscale or RGB image | 64-by-64-by-256 array |
| Batch of B grayscale or RGB images | 64-by-64-by-256-by-B array |

If the segmentAnythingModel object sam uses any of the SAM 2 models, embeddings is a 1-by-3 cell array containing the image embeddings along with two high-resolution feature maps, which help localize objects and improve segmentation accuracy. The size of each element depends on the number of input images.

Single grayscale or RGB image: 1-by-3 cell array containing these elements:

  • Feature embeddings, returned as a 64-by-64-by-256 array.

  • High-resolution feature map, returned as a 256-by-256-by-32 array.

  • High-resolution feature map, returned as a 128-by-128-by-64 array.

Batch of B grayscale or RGB images: 1-by-3 cell array containing these elements:

  • Feature embeddings, returned as a 64-by-64-by-256-by-B array.

  • High-resolution feature map, returned as a 256-by-256-by-32-by-B array.

  • High-resolution feature map, returned as a 128-by-128-by-64-by-B array.
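
Code that must work with both model variants can branch on the output type. The following is a minimal sketch under the assumption that sam is an existing segmentAnythingModel object and I is a valid input image. The variable names imageEmbeddings, highResMap1, and highResMap2 are hypothetical.

```matlab
% Assumes sam (segmentAnythingModel object) and I (input image) already exist.
embeddings = extractEmbeddings(sam,I);

if iscell(embeddings)
    % SAM 2 variant: 1-by-3 cell array.
    imageEmbeddings = embeddings{1};   % 64-by-64-by-256 (-by-B for a batch)
    highResMap1 = embeddings{2};       % 256-by-256-by-32 (-by-B for a batch)
    highResMap2 = embeddings{3};       % 128-by-128-by-64 (-by-B for a batch)
else
    % Base SAM variant: plain numeric array.
    imageEmbeddings = embeddings;      % 64-by-64-by-256 (-by-B for a batch)
end

disp(size(imageEmbeddings))
```

When passing the result on to segmentObjectsFromEmbeddings, use the original embeddings output unchanged rather than the unpacked elements.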

Tips

  • For best model performance, use an image with a data range of [0, 255], such as one with a uint8 data type. If your input image has a larger data range, rescale the range of pixel values using the rescale function.
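
As a sketch of this tip, the following code maps a synthetic 12-bit image to the [0, 255] range. The image I16, its 12-bit range, and the variable I8 are assumptions made for illustration. Specifying InputMin and InputMax maps the nominal data range rather than the minimum and maximum values that happen to occur in the image, which avoids stretching the contrast.

```matlab
% Hypothetical 12-bit grayscale image stored as uint16 (values in [0, 4095]).
I16 = uint16(randi([0 4095],[256 256]));

% Map the nominal range [0, 4095] to [0, 255], then convert to uint8.
% Without InputMin and InputMax, rescale uses the min and max of the data.
I8 = uint8(rescale(I16,0,255,InputMin=0,InputMax=4095));

disp(class(I8))
disp([min(I8,[],"all") max(I8,[],"all")])
```

The resulting image I8 can then be passed to extractEmbeddings in place of the original image.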

References

[1] Kirillov, Alexander, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, et al. "Segment Anything." arXiv, April 5, 2023. https://doi.org/10.48550/arXiv.2304.02643.

[2] Ravi, Nikhila, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, et al. "SAM 2: Segment Anything in Images and Videos." arXiv, October 28, 2024. https://doi.org/10.48550/arXiv.2408.00714.

Version History

Introduced in R2024a