boxLabelDatastore

Datastore for bounding box label data

Description

Use a boxLabelDatastore object to read labeled bounding box data for object detection.

To read bounding box label data from a boxLabelDatastore object, use the read function. This function returns a cell array with bounding boxes in the first column and labels in the second column. You can combine the boxLabelDatastore with an ImageDatastore using the combine function, and use the combined datastore to train object detectors with training functions such as trainYOLOv2ObjectDetector and trainFasterRCNNObjectDetector.
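For example, a minimal sketch of this workflow, assuming a ground truth table tbl whose first column contains image file names and whose remaining columns contain box labels:

imds = imageDatastore(tbl{:,1});         % images referenced by the table
blds = boxLabelDatastore(tbl(:,2:end));  % corresponding box labels
cds = combine(imds,blds);
data = read(cds);                        % {image, boxes, labels}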

Creation

Create a boxLabelDatastore object using the boxLabelDatastore function. After you create the object, you can use functions to access and manage the data. Use dot notation to modify the ReadSize property.

Description


blds = boxLabelDatastore(tbl) creates a boxLabelDatastore object from a table tbl containing labeled bounding box data.

blds = boxLabelDatastore(tbl1,tbl2,...) creates a boxLabelDatastore object using multiple tables.
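For example, a minimal sketch of the multiple-table syntax, using two hypothetical single-class tables in the first format described under Input Arguments:

stopSigns = table({[5 5 30 30]},'VariableNames',{'stopSign'});  % one image with one stop sign box
vehicles = table({[20 30 50 60]},'VariableNames',{'vehicle'});  % one image with one vehicle box
blds = boxLabelDatastore(stopSigns,vehicles);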

Input Arguments


Labeled bounding box data, specified as a table in one of two formats:

  • A table with one or more columns, where all columns contain bounding boxes. Each column must be a cell vector that contains M-by-4 matrices representing a single object class, such as stopSign, carRear, or carFront. Each row of a matrix is one bounding box, stored as a 4-element double vector in the format [x,y,width,height]. The format specifies the upper-left corner location and the size of the bounding box in the corresponding image.

  • A table with two columns:

    • The first column must be a cell vector that contains bounding boxes. Each element in the cell vector is an M-by-4 matrix in the format [x,y,width,height].

    • The second column must be a cell vector that contains the label names corresponding to each bounding box. Each element in the cell vector must be an M-by-1 categorical or string vector.

To create a ground truth table, use the Image Labeler app or the Video Labeler app. To create a table of training data from the generated ground truth, use the objectDetectorTrainingData function.

Data Types: table
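For illustration, this sketch builds a small two-column table by hand. The box coordinates and the vehicle labels are hypothetical values:

boxes = {[10 20 50 60]; [30 40 100 120; 15 25 40 50]};                 % one M-by-4 matrix per image
labels = {categorical("vehicle"); categorical(["vehicle";"vehicle"])}; % one M-by-1 label vector per image
tbl = table(boxes,labels);
blds = boxLabelDatastore(tbl);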

Properties


LabelData - Labeled bounding box data

This property is read-only.

Labeled bounding box data, specified as an N-by-2 cell matrix.

  • The first column must be a cell vector that contains bounding boxes. Each element in the cell vector contains M-by-4 matrices in the format [x,y,width,height].

  • The second column must be a cell vector that contains the label names corresponding to each bounding box. Each element in the cell vector is an M-by-1 categorical vector.

ReadSize - Maximum number of rows to read per call

Maximum number of rows of label data to read in each call to the read function, specified as a positive integer.
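For instance, a brief sketch of reading with a larger ReadSize, continuing from any boxLabelDatastore blds:

blds.ReadSize = 4;  % read up to four rows of label data per call
data = read(blds);  % at most a 4-by-2 cell array of boxes and labels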

Object Functions

combine          Combine data from multiple datastores
countEachLabel   Count occurrence of pixel or box labels
hasdata          Determine if data is available to read from datastore
numpartitions    Number of partitions for a datastore
partition        Partition a label datastore
progress         Percentage of data read from a datastore
read             Read data from a datastore
readall          Read all data in datastore
reset            Reset datastore to initial state
transform        Transform datastore
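For instance, a quick sketch of countEachLabel, continuing from any boxLabelDatastore blds:

labelCounts = countEachLabel(blds)  % table listing each label and its box count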

Examples


Estimate Anchor Boxes and Create YOLO v2 Network

This example shows how to estimate anchor boxes using a table that contains the training data. The first column of the table contains the training images, and the remaining columns contain the labeled bounding boxes.

Load the training data into the workspace.

data = load('vehicleTrainingData.mat');
trainingData = data.vehicleTrainingData;

Create a boxLabelDatastore using the labeled bounding boxes from the training data.

blds = boxLabelDatastore(trainingData(:,2:end));

Estimate anchor boxes using the boxLabelDatastore.

numAnchors = 5;
anchorBoxes = estimateAnchorBoxes(blds,numAnchors);

Specify the image size.

inputImageSize = [128,228,3];

Specify the number of classes to detect.

numClasses = 1;

Use a pretrained ResNet-50 network as a base network for the YOLO v2 network.

network = resnet50();

Specify the network layer to use for feature extraction. You can use analyzeNetwork to see all the layer names in a network.

featureLayer = 'activation_49_relu';

Create the YOLO v2 object detection network.

lgraph = yolov2Layers(inputImageSize,numClasses,anchorBoxes,network,featureLayer)
lgraph = 
  LayerGraph with properties:

         Layers: [182×1 nnet.cnn.layer.Layer]
    Connections: [197×2 table]

Visualize the network using the network analyzer.

analyzeNetwork(lgraph)

Estimate Anchor Boxes From Training Data

Anchor boxes are important parameters of deep learning object detectors such as Faster R-CNN and YOLO v2. The shape, scale, and number of anchor boxes impact the efficiency and accuracy of the detectors.

For more information, see Anchor Boxes for Object Detection.

Load Training Data

Load the vehicle dataset, which contains 295 images and associated box labels.

data = load('vehicleTrainingData.mat');
vehicleDataset = data.vehicleTrainingData;

Add the full path to the local vehicle data folder.

dataDir = fullfile(toolboxdir('vision'),'visiondata');
vehicleDataset.imageFilename = fullfile(dataDir,vehicleDataset.imageFilename);

Display the data set summary.

summary(vehicleDataset)
Variables:

    imageFilename: 295×1 cell array of character vectors

    vehicle: 295×1 cell

Visualize Ground Truth Box Distribution

Visualize the labeled boxes to better understand the range of object sizes present in the data set.

Combine all the ground truth boxes into one array.

allBoxes = vertcat(vehicleDataset.vehicle{:});

Plot the box area versus the box aspect ratio.

aspectRatio = allBoxes(:,3) ./ allBoxes(:,4);
area = prod(allBoxes(:,3:4),2);

figure
scatter(area,aspectRatio)
xlabel("Box Area")
ylabel("Aspect Ratio (width/height)");
title("Box Area vs. Aspect Ratio")

The plot shows a few groups of objects that are of similar size and shape. However, because the groups are spread out, manually choosing anchor boxes is difficult. A better way to estimate anchor boxes is to use a clustering algorithm that can group similar boxes together using a meaningful metric.

Estimate Anchor Boxes

Estimate anchor boxes from training data using the estimateAnchorBoxes function, which uses the intersection-over-union (IoU) distance metric.

A distance metric based on IoU is invariant to the size of boxes, unlike the Euclidean distance metric, which produces larger errors as the box sizes increase [1]. In addition, using an IoU distance metric leads to boxes of similar aspect ratios and sizes being clustered together, which results in anchor box estimates that fit the data.
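To make the metric concrete, here is a brief sketch that computes the IoU distance between two hypothetical boxes:

boxA = [10 10 40 80];
boxB = [12 12 40 76];
iouDistance = 1 - bboxOverlapRatio(boxA,boxB)  % 0 for identical boxes, 1 for no overlap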

Create a boxLabelDatastore using the ground truth boxes in the vehicle data set. If the preprocessing step for training an object detector involves resizing the images, use transform and bboxresize to resize the bounding boxes in the boxLabelDatastore before estimating the anchor boxes, as sketched after the next command.

trainingData = boxLabelDatastore(vehicleDataset(:,2:end));
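If you do resize images during preprocessing, a sketch of the corresponding box resize might look like this. The scale factor is a hypothetical value for illustration:

scale = 0.5;  % hypothetical resize factor applied to the training images
resizedTrainingData = transform(trainingData, ...
    @(data) [cellfun(@(b) bboxresize(b,scale),data(:,1),'UniformOutput',false), data(:,2)]);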

Select the number of anchors and estimate the anchor boxes using the estimateAnchorBoxes function.

numAnchors = 5;
[anchorBoxes,meanIoU] = estimateAnchorBoxes(trainingData,numAnchors);
anchorBoxes
anchorBoxes = 5×2

    21    27
    87   116
    67    92
    43    61
    86   105

The number of anchors is another training hyperparameter that requires careful selection using empirical analysis. One quality measure for judging the estimated anchor boxes is the mean IoU of the boxes in each cluster. The estimateAnchorBoxes function uses a k-means clustering algorithm with the IoU distance metric, which it computes as 1 - bboxOverlapRatio(allBoxes,boxInCluster).

meanIoU
meanIoU = 0.8411

A mean IoU value greater than 0.5 ensures that the anchor boxes overlap well with the boxes in the training data. Increasing the number of anchors can improve the mean IoU measure. However, using more anchor boxes in an object detector can also increase the computation cost and lead to overfitting, which results in poor detector performance.

Sweep over a range of values and plot the mean IoU versus number of anchor boxes to measure the trade-off between number of anchors and mean IoU.

maxNumAnchors = 15;
meanIoU = zeros([maxNumAnchors,1]);
anchorBoxes = cell(maxNumAnchors, 1);
for k = 1:maxNumAnchors
    % Estimate anchors and mean IoU.
    [anchorBoxes{k},meanIoU(k)] = estimateAnchorBoxes(trainingData,k);    
end

figure
plot(1:maxNumAnchors,meanIoU,'-o')
ylabel("Mean IoU")
xlabel("Number of Anchors")
title("Number of Anchors vs. Mean IoU")

Using two anchor boxes results in a mean IoU value greater than 0.65, and using more than 7 anchor boxes yields only marginal improvement in the mean IoU value. Given these results, the next step is to train and evaluate multiple object detectors using values between 2 and 6. This empirical analysis helps determine the number of anchor boxes required to satisfy application performance requirements, such as detection speed or accuracy.

Introduced in R2019b