Create Datastores for Medical Image Semantic Segmentation
Semantic segmentation deep learning networks segment a medical image by assigning a class label, such as tumor or lung, to every pixel in the image. To train a semantic segmentation network, you must have a collection of images, or data sources, and a collection of label images that contain labels for the pixels in the data source images. Manage training data for semantic segmentation by using datastores:
- Load data source images by using an imageDatastore object.
- Load pixel label images by using a pixelLabelDatastore (Computer Vision Toolbox) object.
- Pair data source images and pixel label images by using a CombinedDatastore or a randomPatchExtractionDatastore object.
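As a minimal sketch of this workflow, the following MATLAB code pairs a folder of data source images with a folder of pixel label images. The folder names, class names, and pixel label IDs here are assumptions for illustration; replace them with the values that match your data.

```matlab
% Assumed locations of the data source images and pixel label images.
imageDir = fullfile(pwd,"trainingImages");
labelDir = fullfile(pwd,"trainingLabels");

% Load the data source images.
imds = imageDatastore(imageDir);

% Load the pixel label images. The class names and the pixel IDs that
% encode them are assumptions for this sketch.
classNames = ["background" "tumor"];
pixelLabelIDs = [0 1];
pxds = pixelLabelDatastore(labelDir,classNames,pixelLabelIDs);

% Pair each data source image with its pixel label image. The combine
% function returns a CombinedDatastore suitable for network training.
dsTrain = combine(imds,pxds);
```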
Medical Image Ground Truth Data
You can use the Medical Image Labeler app to label 2-D or 3-D medical images to generate training data for semantic segmentation networks. The app stores labeling results in a groundTruthMedical object, which specifies the filenames of data source and pixel label images in its DataSource and LabelData properties, respectively. The table shows how a groundTruthMedical object formats the data source and label image information for 2-D versus 3-D data.
Type of Data | Data Source Format | Label Data Format
---|---|---
2-D medical images or multiframe 2-D image series | The DataSource property contains an ImageSource object that specifies the filenames of the data source images. | The LabelData property specifies the filenames of the pixel label images.
3-D medical image volumes | The DataSource property contains a VolumeSource object that specifies the filenames of the data source volumes. | The LabelData property specifies the filenames of the pixel label volumes.
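As a sketch of how you might read these properties, the following code assumes gTruthMed is a groundTruthMedical object exported from the Medical Image Labeler; the variable names are assumptions for illustration.

```matlab
% Assumed: gTruthMed is a groundTruthMedical object created by the
% Medical Image Labeler app.
dataSource = gTruthMed.DataSource;   % ImageSource or VolumeSource object
labelFiles = gTruthMed.LabelData;    % filenames of the pixel label data

% The label definitions record the class names used during labeling.
gTruthMed.LabelDefinitions
```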
Datastores for Semantic Segmentation
You can perform medical image semantic segmentation using 2-D or 3-D deep learning networks. A 2-D network accepts 2-D input images and predicts segmentation labels using 2-D convolution kernels. The input images can be one of these sources:
Images from 2-D modalities, such as X-ray.
Individual frames extracted from a multiframe 2-D image series, such as an ultrasound video.
Individual slices extracted from a 3-D image volume, such as a CT or MRI scan.
A 3-D network accepts 3-D input images and predicts segmentation labels using 3-D convolution kernels. The input images are 3-D medical volumes, such as entire CT or MRI volumes.
The benefits of 2-D networks include faster prediction speeds and lower memory requirements. Additionally, you can generate many 2-D training images from one image volume or series. Therefore, fewer scans are required to train a 2-D network that segments a volume slice-by-slice versus training a fully 3-D network. The major benefit of 3-D networks is that they use information from adjacent slices or frames to predict segmentation labels, which can produce more accurate results.
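One way to keep memory requirements manageable when training a 3-D network is to train on random patches rather than whole volumes. The following sketch assumes imds and pxds are datastores of 3-D image volumes and pixel label volumes, respectively; the patch size and patches-per-image values are assumptions, not recommendations.

```matlab
% Assumed: imds loads 3-D image volumes and pxds loads the matching
% 3-D pixel label volumes.
patchSize = [64 64 64];

% Extract random, spatially aligned patches from each image-label pair.
patchds = randomPatchExtractionDatastore(imds,pxds,patchSize, ...
    "PatchesPerImage",16);
```

Because the patches from the image and label datastores are extracted at the same locations, each training sample remains a correctly paired image-label patch.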
For an example that shows how to create datastores that contain 2-D ultrasound frames, see Convert Ultrasound Image Series into Training Data for 2-D Semantic Segmentation Network.
For an example that shows how to create, preprocess, and augment 3-D datastores for segmentation, see Create Training Data for 3-D Medical Image Semantic Segmentation.
See Also
groundTruthMedical | ImageSource | VolumeSource | imageDatastore | pixelLabelDatastore (Computer Vision Toolbox) | CombinedDatastore | randomPatchExtractionDatastore