Multiclass Object Detection Using YOLO v2 Deep Learning
This example shows how to perform multiclass object detection on a custom data set.
Overview
Deep learning is a powerful machine learning technique that you can use to train robust multiclass object detectors such as YOLO v2, YOLO v4, YOLOX, SSD, and Faster R-CNN. This example trains a YOLO v2 multiclass object detector using the trainYOLOv2ObjectDetector
function. The trained object detector is able to detect and identify multiple indoor objects. For more information about training other multiclass object detectors, such as YOLOX, YOLO v4, SSD, and Faster R-CNN, see Get Started with Object Detection Using Deep Learning and Choose an Object Detector.
This example first shows you how to detect multiple objects in an image using a pretrained YOLO v2 object detector. Then, you can optionally download a custom data set and train the YOLO v2 detector on it using transfer learning.
Load Pretrained Object Detector
Download a pretrained YOLO v2 object detector, and load it into the workspace.
pretrainedURL = "https://www.mathworks.com/supportfiles/vision/data/yolov2IndoorObjectDetector23b.zip";
pretrainedFolder = fullfile(tempdir,"pretrainedNetwork");
pretrainedNetworkZip = fullfile(pretrainedFolder,"yolov2IndoorObjectDetector23b.zip");
if ~exist(pretrainedNetworkZip,"file")
    mkdir(pretrainedFolder)
    disp("Downloading pretrained network (6 MB)...")
    websave(pretrainedNetworkZip,pretrainedURL)
end
unzip(pretrainedNetworkZip,pretrainedFolder)
trainedNetwork = fullfile(pretrainedFolder,"yolov2IndoorObjectDetector.mat");
trainedNetwork = load(trainedNetwork);
trainedDetector = trainedNetwork.detector;
Detect Multiple Indoor Objects
Read a test image that contains objects of the target classes, run the object detector on the image, and display an image annotated with the detection results.
I = imread("indoorTest.jpg");
[bbox,score,label] = detect(trainedDetector,I);
LabelScoreStr = compose("%s-%.2f",label,score);
annotatedImage = insertObjectAnnotation(I,"rectangle",bbox,LabelScoreStr,LineWidth=4,FontSize=24);
figure
imshow(annotatedImage)
Load Data for Training
This example uses the Indoor Object Detection Dataset created by Bishwo Adhikari [1]. The data set consists of 2213 labeled images collected from indoor scenes and contains 7 classes: exit, fire extinguisher, chair, clock, trash bin, screen, and printer. Each image contains one or more labeled instances of these classes. Check whether the data set has already been downloaded and, if it is not, use websave
to download it.
dsURL = "https://zenodo.org/record/2654485/files/Indoor%20Object%20Detection%20Dataset.zip?download=1";
outputFolder = fullfile(tempdir,"indoorObjectDetection");
imagesZip = fullfile(outputFolder,"indoor.zip");
if ~exist(imagesZip,"file")
    mkdir(outputFolder)
    disp("Downloading 401 MB Indoor Objects Dataset images...")
    websave(imagesZip,dsURL)
    unzip(imagesZip,fullfile(outputFolder))
end
Create an imageDatastore
object to store the images from the data set.
datapath = fullfile(outputFolder,"Indoor Object Detection Dataset");
imds = imageDatastore(datapath,IncludeSubfolders=true,FileExtensions=".jpg");
The annotationsIndoor.mat
file contains annotations for each of the images in the data, as well as vectors that specify the indices of the data set images to use for the training, validation, and test sets. Load the file into the workspace, and extract annotations and the indices corresponding to the training, validation, and test sets from the data
variable. The indices specify 2207 images in total, instead of 2213 images, as 6 images have no labels associated with them. Use the indices of the images that contain labels to remove these 6 images from the image and annotations datastores.
data = load("annotationsIndoor.mat");
blds = data.BBstore;
trainingIdx = data.trainingIdx;
validationIdx = data.validationIdx;
testIdx = data.testIdx;
cleanIdx = data.idxs;

% Remove the 6 images with no labels.
imds = subset(imds,cleanIdx);
blds = subset(blds,cleanIdx);
Analyze Training Data
Analyze the distribution of object class labels and sizes to understand the data better. This analysis is critical because it helps you determine how to prepare the training data and how to configure an object detector for this specific data set.
Analyze Class Distribution
Measure the distribution of bounding box class labels in the data set by using the countEachLabel
function.
tbl = countEachLabel(blds)
tbl=7×3 table
Label Count ImageCount
________________ _____ __________
exit 545 504
fireextinguisher 1684 818
chair 1662 850
clock 280 277
trashbin 228 170
screen 115 94
printer 81 81
Visualize the counts by class.
bar(tbl.Label,tbl.Count)
ylabel("Frequency")
The classes in this data set are unbalanced. This imbalance can be detrimental to the learning process because it biases learning in favor of the dominant classes. To address the imbalance, use one or more of these complementary techniques: add more data, oversample the underrepresented classes, modify the loss function, or apply data augmentation. Regardless of which approach you use, you must perform empirical analysis to determine the optimal solution for your data set. In this example, you use data augmentation to reduce bias in the learning process.
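The following sketch shows one way to oversample the underrepresented classes by repeating the indices of images that contain them. This example uses data augmentation instead, so the sketch is illustrative only: the rare class names, the repetition factor, and the assumption that subset accepts repeated indices are not part of the original workflow.
% Illustrative oversampling sketch (assumptions noted above).
rareClasses = ["trashbin" "screen" "printer"];
allLabels = readall(blds);                 % column 1: boxes, column 2: labels
hasRare = cellfun(@(lbl) any(ismember(string(lbl),rareClasses)),allLabels(:,2));
oversampleIdx = [(1:size(allLabels,1))'; find(hasRare); find(hasRare)];   % repeat rare images twice more
imdsOversampled = subset(imds,oversampleIdx);
bldsOversampled = subset(blds,oversampleIdx);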
Analyze Object Sizes and Choose Object Detector
Read all the bounding boxes and labels within the data set, and calculate the diagonal length of the bounding box.
data = readall(blds);
bboxes = vertcat(data{:,1});
labels = vertcat(data{:,2});
diagonalLength = hypot(bboxes(:,3),bboxes(:,4));
Group the object lengths by class.
G = findgroups(labels);
groupedDiagonalLength = splitapply(@(x){x},diagonalLength,G);
Visualize the distribution of object lengths for each class.
figure
classes = tbl.Label;
numClasses = numel(classes);
for i = 1:numClasses
    len = groupedDiagonalLength{i};
    x = repelem(i,numel(len),1);
    plot(x,len,"o")
    hold on
end
hold off
ylabel("Object extent (pixels)")
xticks(1:numClasses)
xticklabels(classes)
This visualization highlights the important data set attributes that help you determine which type of object detector to configure:
The object size variance within each class
The object size variance across classes
This data set has a good amount of overlap between the size ranges across classes. In addition, the size variation within each class is not very large. This means that you can train one multiclass detector to handle a range of object sizes. If the size ranges do not overlap, or if the range of object sizes differs by an order of magnitude, then it is more practical to train multiple detectors for different size ranges.
You can determine which object detector to train based on the size variance. When the size variance within each class is small, use a single-scale object detector such as YOLO v2. If the size variance within each class is large, choose a multi-scale object detector such as YOLO v4 or SSD. Because the object sizes in this data set are within the same order of magnitude, use YOLO v2 to start. Although advanced multi-scale detectors might perform better, training them can take more time and resources than YOLO v2. Use more advanced detectors when simpler solutions do not meet your performance requirements.
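To quantify the size spread that this reasoning relies on, you can compute simple per-class statistics from the diagonal lengths calculated earlier. This is a small illustrative addition, and it assumes that the group order returned by findgroups matches the class order in tbl.Label, as the plot above also assumes.
sizeStats = table(classes, ...
    splitapply(@min,diagonalLength,G), ...
    splitapply(@max,diagonalLength,G), ...
    splitapply(@std,diagonalLength,G), ...
    VariableNames=["Class" "MinExtent" "MaxExtent" "StdExtent"])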
Use the size distribution information to select the training image size. A fixed size enables batch processing during training. The training image size also dictates how large the batch size can be, based on the resource constraints of your training environment, such as GPU memory. Processing larger batches of data improves throughput and reduces training time, especially when using a GPU. However, the training image size can reduce the spatial resolution of objects if you drastically resize the original data to a smaller size.
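As a rough illustration of how image size and batch size interact, you can estimate the memory occupied by a single batch of input images. The candidate values below are assumptions for illustration only, and actual training memory is considerably higher because of intermediate activations, gradients, and optimizer state.
candidateSize = [720 720 3];     % candidate training image size
candidateBatch = 8;              % candidate mini-batch size
inputBatchGB = prod(candidateSize)*4*candidateBatch/1e9   % single-precision input batch, in GB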
In the following section, configure a YOLO v2 object detector using the size analysis information for this data set.
Define YOLO v2 Object Detector Architecture
Configure a YOLO v2 object detector using these steps:
Choose a pretrained detector for transfer learning.
Choose a training image size.
Select which network features to use for predicting object locations and classes.
Estimate anchor boxes from the preprocessed data used to train the object detector.
Select a pretrained Tiny YOLO v2 detector for transfer learning. Tiny YOLO v2 is a lightweight network trained on COCO [2], a large object detection data set. Transfer learning using a pretrained object detector reduces training time compared to training a network from scratch. Alternatively, you can use the larger Darknet-19 YOLO v2 pretrained detector, but consider starting with a simpler network to establish a performance baseline before experimenting with a larger network. Using the Tiny or Darknet-19 YOLO v2 pretrained detector requires the Computer Vision Toolbox™ Model for YOLO v2 Object Detection.
pretrainedDetector = yolov2ObjectDetector("tiny-yolov2-coco");
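If the Tiny YOLO v2 baseline does not meet your requirements, you can instead start from the larger Darknet-19 pretrained detector. The model name below assumes the "darknet19-coco" model provided by the same support package.
% Optional alternative (requires more training time and memory):
% pretrainedDetector = yolov2ObjectDetector("darknet19-coco");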
Next, choose the size of the training images for YOLO v2. When choosing the training image size, consider these size parameters:
The distribution of object sizes in the images, and the impact of resizing the images on the object sizes.
The computational resources required to batch process data at the selected size.
The minimum input size required by the network.
Determine the input size of the pretrained Tiny YOLO v2 network.
pretrainedDetector.Network.Layers(1).InputSize
ans = 1×3
416 416 3
The size of the images in the Indoor Object Detection Dataset is [720 1024 3]. Based on the object size analysis, the smallest objects are approximately 20-by-20 pixels.
To maintain a balance between accuracy and the computational cost of running the example, specify a size of [720 720 3]. This size ensures that resizing each image does not drastically affect the spatial resolution of objects in this data set. If you adapt this example for your own data set, you must change the training image size based on your data. Determining the optimal input size requires empirical analysis.
inputSize = [720 720 3];
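As a quick sanity check of this choice, you can estimate how the smallest objects shrink when the 720-by-1024 images are resized to 720-by-720. The 20-pixel object extent used below is the approximate value noted above.
originalImageSize = [720 1024];
smallestObjectExtent = 20;                                % approximate extent, in pixels
scaleFactors = inputSize(1:2)./originalImageSize;         % [1 0.70]
resizedObjectExtent = smallestObjectExtent.*scaleFactors  % roughly 20-by-14 pixels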
Combine the image and bounding box datastores.
ds = combine(imds,blds);
Use transform
to apply a preprocessing function that resizes the images and their corresponding bounding boxes. The function also sanitizes the bounding boxes so that they remain valid and fully within the image boundary after resizing.
preprocessedData = transform(ds,@(data)resizeImageAndLabel(data,inputSize));
Display one of the preprocessed images and its bounding box labels to verify that the objects in the resized images still have visible features.
data = preview(preprocessedData);
I = data{1};
bbox = data{2};
label = data{3};
imshow(I)
showShape("rectangle",bbox,Label=label)
YOLO v2 is a single-scale detector because it uses features extracted from one network layer to predict the location and class of objects in the image. The feature extraction layer is an important hyperparameter for deep learning based object detectors. When selecting the feature extraction layer, choose a layer that outputs features at a spatial resolution that is suitable for the range of object sizes in the data set.
Most networks used in object detection spatially downsample features by powers of two as the data flows through the network. For example, starting from the specified input size, networks can have layers that produce feature maps downsampled spatially by 4x, 8x, 16x, and 32x. If object sizes in the data set are small (for example, less than 10-by-10 pixels), feature maps downsampled by 16x and 32x might not have sufficient spatial resolution to locate the objects precisely. Conversely, if the objects are large, feature maps downsampled by 4x or 8x might not encode enough global context for those objects.
For this data set, specify the "leaky_relu_5"
layer of the Tiny YOLO v2 network, which outputs feature maps downsampled by 16x. This amount of downsampling is a good trade-off between spatial resolution and the strength of the extracted features, as features extracted further down the network encode stronger image features at the cost of spatial resolution.
featureLayer = "leaky_relu_5";
You can use the analyzeNetwork
(Deep Learning Toolbox) function to visualize the Tiny YOLO v2 network and determine the name of the layer that outputs features downsampled by 16x.
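For reference, a short calculation (not part of the original example) shows the size of the prediction grid that results from 16x downsampling of the chosen training image size.
downsampleFactor = 16;
gridSize = inputSize(1:2)/downsampleFactor   % 45-by-45 prediction grid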
Next, use estimateAnchorBoxes
to estimate anchor boxes from the training data. Estimating anchor boxes from the preprocessed data enables you to get an estimate based on the selected training image size. You can use the procedure defined in the Estimate Anchor Boxes From Training Data example to determine the number of anchor boxes suitable for the data set. Based on this procedure, five anchor boxes is a good trade-off between computational cost and accuracy. As with any other hyperparameter, you must optimize the number of anchor boxes for your data using empirical analysis.
numAnchors = 5;
aboxes = estimateAnchorBoxes(preprocessedData,numAnchors);
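The sketch below outlines the sweep used in the Estimate Anchor Boxes From Training Data example: estimate anchor boxes for a range of candidate counts and inspect the mean IoU to find a good trade-off. Running this sweep over the full datastore can take several minutes.
maxNumAnchors = 10;
meanIoU = zeros(maxNumAnchors,1);
for k = 1:maxNumAnchors
    % Second output is the mean IoU between the boxes and the estimated anchors.
    [~,meanIoU(k)] = estimateAnchorBoxes(preprocessedData,k);
end
figure
plot(1:maxNumAnchors,meanIoU,"-o")
xlabel("Number of anchor boxes")
ylabel("Mean IoU")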
Finally, configure the YOLO v2 network for transfer learning on the seven classes, using the selected training image size and the estimated anchor boxes.
pretrainedNet = pretrainedDetector.Network;
classes = {'exit','fireextinguisher','chair','clock','trashbin','screen','printer'};
detector = yolov2ObjectDetector(pretrainedNet,classes,aboxes, ...
    DetectionNetworkSource=featureLayer,InputSize=inputSize);
You can visualize the network using the analyzeNetwork
(Deep Learning Toolbox) function or Deep Network Designer (Deep Learning Toolbox) app.
Prepare Training Data
Initialize the random number generator with a seed of 0 using rng
, and shuffle the data set for reproducibility using the shuffle
function.
rng(0);
preprocessedData = shuffle(preprocessedData);
Split the data set into training, test, and validation subsets using the subset
function.
dsTrain = subset(preprocessedData,trainingIdx);
dsVal = subset(preprocessedData,validationIdx);
dsTest = subset(preprocessedData,testIdx);
Data Augmentation
Use data augmentation to improve network accuracy by randomly transforming the original data during training. Data augmentation enables you to add more variety to the training data without increasing the number of labeled training samples. Use transform
to augment the training data using these steps:
Randomly flip the image and associated bounding box labels horizontally.
Randomly scale the image and associated bounding box labels.
Jitter the image color.
augmentedTrainingData = transform(dsTrain,@augmentData);
Display one of the training images and box labels.
data = read(augmentedTrainingData);
I = data{1};
bbox = data{2};
label = data{3};
imshow(I)
showShape("rectangle",bbox,Label=label)
Train YOLO v2 Object Detector
Specify the network training options using the trainingOptions
(Deep Learning Toolbox) function.
opts = trainingOptions("rmsprop", ...
    InitialLearnRate=0.001, ...
    MiniBatchSize=8, ...
    MaxEpochs=10, ...
    LearnRateSchedule="piecewise", ...
    LearnRateDropPeriod=5, ...
    VerboseFrequency=30, ...
    L2Regularization=0.001, ...
    ValidationData=dsVal, ...
    ValidationFrequency=50, ...
    OutputNetwork="best-validation-loss");
These training options have been selected using Experiment Manager. For more information on using Experiment Manager for hyperparameter tuning, see Train Object Detectors in Experiment Manager.
To use the trainYOLOv2ObjectDetector function to train a YOLO v2 object detector, set doTraining to true.
doTraining = false;
if doTraining
    [detector,info] = trainYOLOv2ObjectDetector(augmentedTrainingData,detector,opts);
else
    detector = trainedDetector;
end
This example was verified on an NVIDIA™ GeForce RTX 3090 Ti GPU with 24 GB of memory; training took approximately 45 minutes. Training time varies depending on the hardware you use. If your GPU has less memory, you might run out of memory. To use less memory, specify a lower MiniBatchSize
value when using the trainingOptions
(Deep Learning Toolbox) function.
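If you are unsure whether your GPU has enough memory, you can check the available device memory before training. This optional check assumes that a supported GPU and Parallel Computing Toolbox™ are available.
% Query the currently selected GPU and report its free memory.
gpu = gpuDevice;
fprintf("Available GPU memory: %.1f GB\n",gpu.AvailableMemory/1e9)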
Evaluate Object Detector
Evaluate the trained object detector on test images to measure the detector performance. The Computer Vision Toolbox™ provides an object detector evaluation function (evaluateObjectDetection
) to measure common metrics, such as average precision and precision-recall, with an option to specify the overlap, or intersection-over-union (IoU), thresholds at which to compute the metrics.
Run the detector on the test data set using the detect
object function. To evaluate the detector precision across the full range of recall values, set the detection threshold to a low value to detect as many objects as possible.
detectionThreshold = 0.01;
results = detect(detector,dsTest,MiniBatchSize=8,Threshold=detectionThreshold);
Compute Metrics at Specified Overlap Thresholds
Compute the object detection metrics at specified overlap, or IoU, thresholds using the evaluateObjectDetection
function. The overlap threshold defines the amount of overlap required between a predicted bounding box and a ground truth bounding box for the predicted bounding box to count as a true positive. For example, an overlap threshold of 0.5 considers an overlap of 50% between boxes as a correct match, while an overlap threshold of 0.9 is stricter and requires the predicted bounding box to almost exactly coincide with the ground truth bounding box. Specify three overlap thresholds at which to compute metrics using the iouThresholds
variable.
iouThresholds = [0.5 0.75 0.9];
metrics = evaluateObjectDetection(results,dsTest,iouThresholds);
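To build intuition for the overlap threshold, you can compute the IoU between a pair of hypothetical boxes in [x y width height] format using the bboxOverlapRatio function. The box coordinates below are illustrative values only.
groundTruthBox = [100 100 50 50];
predictedBox   = [110 110 50 50];
iouValue = bboxOverlapRatio(groundTruthBox,predictedBox)   % approximately 0.47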
Evaluate Object Detection Metrics Summary
Evaluate the summarized detector performance at the overall dataset level and at the individual class level using the summarize
object function.
[datasetSummary,classSummary] = summarize(metrics)
datasetSummary=1×5 table
NumObjects mAPOverlapAvg mAP0.5 mAP0.75 mAP0.9
__________ _____________ _______ _______ ________
397 0.42971 0.77532 0.4759 0.037926
classSummary=7×5 table
NumObjects APOverlapAvg AP0.5 AP0.75 AP0.9
__________ ____________ _______ _______ ________
exit 42 0.54586 0.97619 0.631 0.030382
fireextinguisher 123 0.62289 0.98041 0.80168 0.086584
chair 167 0.58042 0.93661 0.72323 0.081403
clock 26 0.54996 0.96154 0.62122 0.067113
trashbin 20 0.3004 0.79942 0.10177 0
screen 12 0.14779 0.27671 0.16667 0
printer 7 0.26068 0.49634 0.28571 0
By default, the summarize
object function returns summarized data set and class metrics at all overlap thresholds. Metrics computed at higher IoU thresholds are generally lower because the stricter matching criterion is harder to satisfy. The average precision (AP) metric provides a single number that incorporates the ability of the detector to make correct classifications (precision) and the ability of the detector to find all relevant objects (recall). The mean average precision (mAP) is the average of the AP values calculated over all the classes.
To evaluate the detector performance at the data set level, consider the mAP averaged over all overlap thresholds, returned in the mAPOverlapAvg
column of the datasetSummary
output. At the overlap threshold of 0.5, the mAP is 0.77, which indicates that the detector is able to find most objects without making too many spurious predictions. At higher overlap thresholds, such as 0.75 and 0.9, the mAP is lower since the matching condition between predicted and ground truth boxes is now stricter.
Similarly, to evaluate the detector performance at the class level, consider the mAP averaged over all IoU thresholds, returned in the mAPOverlapAvg
column of the classSummary
output.
Compute Average Precision
Compute the AP at each of the specified overlap thresholds for all classes using the averagePrecision
object function. To visualize how the AP at the specified thresholds varies across all the classes in the data set, create a bar plot.
figure
classAP = averagePrecision(metrics);
bar(classAP)
xticklabels(metrics.ClassNames)
ylabel("AP")
legend(string(iouThresholds))
The plot demonstrates that the detector performed poorly on three classes (printer, screen, and trash bin), which had fewer samples compared to the other classes in the training data. The detector performance also degraded at higher overlap thresholds. To improve performance, consider the class imbalance problem identified in the Analyze Class Distribution section. To address class imbalance, add more images that contain the underrepresented classes or replicate images with these classes and use data augmentation.
Compute Precision and Recall Metrics
Compute the precision and recall metrics using the precisionRecall
object function. Plot the precision-recall (PR) curve and the detection confidence scores side-by-side. The PR curve highlights how precise a detector is at varying levels of recall for each class. By plotting the detector scores next to the PR curve, you can choose a detection threshold that achieves the precision and recall you require for your application.
Precision and Recall for a Single Class
Select a class, extract the precision and recall metrics for the class at the specified overlap thresholds, and plot the PR curves.
classes = metrics.ClassNames;
class = classes(3);

% Extract precision and recall values.
[precision,recall,scores] = precisionRecall(metrics,ClassName=class);

% Plot precision-recall curves.
figure
tiledlayout(1,3)
nexttile
plot(cat(1,recall{:})',cat(1,precision{:})')
ylim([0 1])
xlim([0 1])
xlabel("Recall")
ylabel("Precision")
grid on
axis square
title(class + " Precision/Recall")
legend(string(iouThresholds) + " IoU",Location="southoutside")
Plot the confidence scores for precision and recall at the specified overlap thresholds, to the right of the PR curve.
nexttile
plot(scores{:},cat(1,recall{:})')
ylim([0 1])
xlim([0 1])
ylabel("Recall")
xlabel("Score")
grid on
axis square
title(class + " Recall/Scores")
legend(string(iouThresholds) + " IoU",Location="southoutside")

nexttile
plot(scores{:},cat(1,precision{:})')
ylim([0 1])
xlim([0 1])
ylabel("Precision")
xlabel("Score")
grid on
axis square
title(class + " Precision/Scores")
legend(string(iouThresholds) + " IoU",Location="southoutside")
Fine-tuning the detection threshold enables you to trade precision for recall. As the detection score threshold increases, precision increases at the cost of reduced recall, as shown in the Precision/Scores and Recall/Scores plots.
Select a detection threshold that produces the precision-recall characteristics best suited for your application. For example, at an overlap threshold of 0.5, specify the detection threshold as 0.4 to tune the precision to 0.9 at a recall level of 0.9 for the chair
class. Before choosing a final detection threshold for your object detector, analyze the PR curves for all the classes, because the precision and recall characteristics might vary for each class.
Precision and Recall at a Single Overlap Threshold
Select an overlap threshold, extract the precision and recall metrics for a set of selected classes at the specified overlap threshold, and then plot the PR curves. To determine a final detection score threshold, use these plots to analyze the performance of a selected detection score threshold across multiple classes.
overlapThresholds = metrics.OverlapThreshold;
iou = overlapThresholds(1);
selectedClasses = ["chair","clock","trashbin"];

% Extract precision and recall values.
[precision,recall,scores] = precisionRecall(metrics,OverlapThreshold=iou,ClassName=selectedClasses);

% Plot precision-recall curves for the selected classes at the chosen IoU.
figure
tiledlayout(1,3)
nexttile
for c = 1:length(selectedClasses)
    plot(recall{c},precision{c})
    hold on
end
ylim([0 1])
xlim([0 1])
xlabel("Recall")
ylabel("Precision")
grid on
axis square
title("Precision/Recall")
legend(selectedClasses,Location="southoutside")
Plot the confidence scores for precision and recall for the specified classes, to the right of the PR curve.
nexttile
for c = 1:length(selectedClasses)
    plot(scores{c},recall{c})
    hold on
end
ylim([0 1])
xlim([0 1])
ylabel("Recall")
xlabel("Score")
grid on
axis square
title("Recall/Scores")
legend(selectedClasses,Location="southoutside")

nexttile
for c = 1:length(selectedClasses)
    plot(scores{c},precision{c})
    hold on
end
ylim([0 1])
xlim([0 1])
ylabel("Precision")
xlabel("Score")
grid on
axis square
title("Precision/Scores")
legend(selectedClasses,Location="southoutside")
The figure shows the performance of the selected classes at each precision and recall value, and can help determine the detector threshold that results in sufficiently high precision while maintaining a minimum recall of 0.8. In this case, a detection score threshold of 0.5 is suitable for the chair and clock classes, but causes the recall for the trash bin class to fall below the desired limit, which you can see in the Recall/Scores plot. A lower threshold of 0.4 satisfies these conditions for the selected classes.
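As an optional programmatic follow-up to reading the plots, the sketch below finds, for each selected class, the largest detection score threshold that still keeps recall at or above a target value at the chosen overlap threshold. The 0.8 recall target mirrors the assumption used in the discussion above.
minRecall = 0.8;
for c = 1:numel(selectedClasses)
    % Scores at which the class still reaches the target recall.
    feasibleScores = scores{c}(recall{c} >= minRecall);
    if isempty(feasibleScores)
        fprintf("%s: a recall of %.1f is not reachable\n",selectedClasses(c),minRecall)
    else
        fprintf("%s: score thresholds up to %.2f keep recall >= %.1f\n", ...
            selectedClasses(c),max(feasibleScores),minRecall)
    end
end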
Evaluate Detector Errors Using Confusion Matrix
The confusion matrix enables you to quantify how well the object detector performs across different classes by providing a detailed breakdown of the detection errors. Investigate the types of classification errors made by the detector at a selected detection score threshold by using the confusionMatrix
object function. Use the detection score threshold that you determined from the precision-recall analysis to discard predictions below that threshold value.
iou = overlapThresholds(1);
detectionThresh = 0.4;

% Compute the confusion matrix at the specified score and IoU thresholds.
[confMat,confusionClassNames] = confusionMatrix(metrics,scoreThreshold=detectionThresh,overlapThreshold=iou);

% Display the confusion matrix as a confusion chart.
figure
confusionchart(confMat{1},confusionClassNames)
The confusion chart shows misclassification errors on the off-diagonal positions, and the "unmatched" row and column values correspond to the predictions and ground truth bounding boxes that were not matched correctly at the specified overlap threshold.
The confusion chart shows that the detector does not confuse any two foreground classes. Rather, detection errors occur when the detector fails to detect an object (false negative) or detects an object where none exists (false positive). For example, ten objects of the screen class are missed, creating ten false negatives. For the chair class, 56 detections are classified as chairs but are actually background regions, creating 56 false positives.
The confusion matrix results agree with the earlier analysis of per-class average precision, demonstrating that the detector performs poorly on three classes (printer, screen, and trash bin), which have fewer training samples compared to the other classes.
Evaluate Object Size Impact on Detector Performance
Investigate the impact of object size on detector performance by using the metricsByArea
object function, which computes detector metrics for specific object size ranges. You can define the object size range based on a predefined set of size ranges for your application, or use the estimated anchor boxes as in this example. The anchor box estimation method automatically clusters the object sizes and provides a set of size ranges based on the input data.
Extract the anchor boxes from the detector, calculate their areas, and sort the areas.
areas = prod(detector.AnchorBoxes,2);
areas = sort(areas);
Define area range limits using the calculated areas. The upper limit for the last range is set to three times the size of the largest area, which is sufficient for the objects in this data set.
lowerLimit = [0; areas];
upperLimit = [areas; 3*areas(end)];
areaRanges = [lowerLimit upperLimit]
areaRanges = 6×2
0 2774
2774 9177
9177 15916
15916 47799
47799 124716
124716 374148
Evaluate the object detection metrics across the defined size ranges for the exit
class by using the metricsByArea
function. You can specify other class names to evaluate the object detection metrics for those classes interactively.
classes = string(detector.ClassNames);
areaMetrics = metricsByArea(metrics,areaRanges,ClassName=classes(1))
areaMetrics=6×6 table
AreaRange NumObjects APOverlapAvg AP Precision Recall
________________________ __________ ____________ ____________ _______________ _______________
0 2774 15 0.48468 {3×1 double} {3×2321 double} {3×2321 double}
2774 9177 18 0.5569 {3×1 double} {3×652 double} {3×652 double}
9177 15916 5 0.64667 {3×1 double} {3×123 double} {3×123 double}
15916 47799 4 0.66667 {3×1 double} {3×159 double} {3×159 double}
47799 1.2472e+05 0 0 {3×1 double} {3×30 double} {3×30 double}
1.2472e+05 3.7415e+05 0 NaN {3×1 double} {3×1 double} {3×1 double}
The NumObjects
column shows how many objects in the test data set fall within each area range. Although the detector performed well on the exit class overall, there is a size range where the detector has lower average precision than in the other size ranges. The size range where the detector does not perform well contains relatively few samples. To improve the performance in this size range, add more samples of this size to the training data, or use data augmentation to create more samples across the set of size ranges.
You can examine the other classes for more insight into how to improve detector performance.
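One possible extension, sketched below, is to summarize the size-dependent average precision for every class in a single loop.
% Summarize size-dependent AP for each class (illustrative extension).
for k = 1:numel(classes)
    tblArea = metricsByArea(metrics,areaRanges,ClassName=classes(k));
    fprintf("Class: %s\n",classes(k))
    disp(tblArea(:,["AreaRange" "NumObjects" "APOverlapAvg"]))
end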
Deployment
Once the detector has been trained and evaluated, you can optionally generate code and deploy the yolov2ObjectDetector
using GPU Coder™. For more information, see the Code Generation for Object Detection by Using YOLO v2 (GPU Coder) example.
Summary
This example shows how to train and evaluate a multiclass object detector. When adapting this example to your own data, carefully assess the object class and size distribution in your data set. Your data might require using different hyperparameters or a different object detector, such as YOLO v4 or YOLOX, for optimal results.
Supporting Functions
function B = augmentData(A)
% Apply random horizontal flipping, random X/Y scaling, and color jitter.
% The function clips boxes scaled outside the bounds if the overlap is above 0.25.
B = cell(size(A));
I = A{1};
sz = size(I);
if numel(sz) == 3 && sz(3) == 3
    I = jitterColorHSV(I, ...
        Contrast=0.2, ...
        Hue=0, ...
        Saturation=0.1, ...
        Brightness=0.2);
end

% Randomly flip and scale the image.
tform = randomAffine2d(XReflection=true,Scale=[1 1.1]);
rout = affineOutputView(sz,tform,BoundsStyle="CenterOutput");
B{1} = imwarp(I,tform,OutputView=rout);

% Sanitize the boxes, if needed. This helper function is attached to the example
% as a supporting file. Open the example in MATLAB to use this function.
A{2} = helperSanitizeBoxes(A{2});

% Apply the same transform to the boxes.
[B{2},indices] = bboxwarp(A{2},tform,rout,OverlapThreshold=0.25);
B{3} = A{3}(indices);

% Return the original data only when all boxes are removed by warping.
if isempty(indices)
    B = A;
end
end
function data = resizeImageAndLabel(data,targetSize)
% Resize the image and scale the corresponding bounding boxes.
scale = targetSize(1:2)./size(data{1},[1 2]);
data{1} = imresize(data{1},targetSize(1:2));
data{2} = bboxresize(data{2},scale);
data{2} = floor(data{2});

imageSize = targetSize(1:2);
boxes = data{2};

% Set boxes with nonpositive values to 1.
boxes(boxes <= 0) = 1;

% Ensure that each bounding box is within the image boundary.
boxes(:,3) = min(boxes(:,3),imageSize(2) - boxes(:,1) - 1);
boxes(:,4) = min(boxes(:,4),imageSize(1) - boxes(:,2) - 1);
data{2} = boxes;
end
References
[1] Adhikari, Bishwo, Jukka Peltomaki, and Heikki Huttunen. "Indoor Object Detection Dataset" [Data set]. 7th European Workshop on Visual Information Processing (EUVIP 2018), Tampere, Finland, 2019.
[2] Lin, Tsung-Yi, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. “Microsoft COCO: Common Objects in Context,” May 1, 2014. https://arxiv.org/abs/1405.0312v3.