# trainFasterRCNNObjectDetector

Train a Faster R-CNN deep learning object detector

## Syntax

``trainedDetector = trainFasterRCNNObjectDetector(trainingData,network,options)``
``[trainedDetector,info] = trainFasterRCNNObjectDetector(___)``
``trainedDetector = trainFasterRCNNObjectDetector(trainingData,checkpoint,options)``
``trainedDetector = trainFasterRCNNObjectDetector(trainingData,detector,options)``
``trainedDetector = trainFasterRCNNObjectDetector(___,Name,Value)``

## Description

### Train a Detector


`trainedDetector = trainFasterRCNNObjectDetector(trainingData,network,options)` trains a Faster R-CNN (regions with convolutional neural networks) object detector using deep learning. You can train a Faster R-CNN detector to detect multiple object classes.

This function requires that you have Deep Learning Toolbox™. It is recommended that you also have Parallel Computing Toolbox™ to use with a CUDA®-enabled NVIDIA® GPU. For information about the supported compute capabilities, see GPU Support by Release (Parallel Computing Toolbox).

`[trainedDetector,info] = trainFasterRCNNObjectDetector(___)` also returns information on the training progress, such as training loss and accuracy, for each iteration.

### Resume Training a Detector

`trainedDetector = trainFasterRCNNObjectDetector(trainingData,checkpoint,options)` resumes training from a detector checkpoint.

### Fine-Tune a Detector

`trainedDetector = trainFasterRCNNObjectDetector(trainingData,detector,options)` continues training a Faster R-CNN object detector with additional fine-tuning options. Use this syntax with additional training data or to perform more training iterations to improve detector accuracy.

`trainedDetector = trainFasterRCNNObjectDetector(___,Name,Value)` uses additional options specified by one or more `Name,Value` pair arguments and any of the previous inputs.

## Examples


```
data = load('fasterRCNNVehicleTrainingData.mat');
trainingData = data.vehicleTrainingData;
trainingData.imageFilename = fullfile(toolboxdir('vision'),'visiondata', ...
    trainingData.imageFilename);
```

Randomly shuffle data for training.

```
rng(0);
shuffledIdx = randperm(height(trainingData));
trainingData = trainingData(shuffledIdx,:);
```

Create an image datastore using the files from the table.

`imds = imageDatastore(trainingData.imageFilename);`

Create a box label datastore using the label columns from the table.

`blds = boxLabelDatastore(trainingData(:,2:end));`

Combine the datastores.

`ds = combine(imds, blds);`

Set up the network layers.

`lgraph = layerGraph(data.detector.Network);`

Configure training options.

```
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 1, ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 7, ...
    'VerboseFrequency', 200, ...
    'CheckpointPath', tempdir);
```

Train detector. Training will take a few minutes. Adjust the NegativeOverlapRange and PositiveOverlapRange to ensure training samples tightly overlap with ground truth.

```
detector = trainFasterRCNNObjectDetector(ds, lgraph, options, ...
    'NegativeOverlapRange',[0 0.3], ...
    'PositiveOverlapRange',[0.6 1]);
```

```
*************************************************************************
Training a Faster R-CNN Object Detector for the following object classes:

* vehicle

Training on single GPU.
Initializing input data normalization.
|=============================================================================================================================================|
|  Epoch   |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Mini-batch  |  RPN Mini-batch  |  RPN Mini-batch  |  Base Learning  |
|          |             |   (hh:mm:ss)   |     Loss     |   Accuracy   |     RMSE     |     Accuracy     |       RMSE       |      Rate       |
|=============================================================================================================================================|
|    1     |      1      |    00:00:00    |    0.8771    |    97.30%    |     0.83     |      91.41%      |       0.71       |     0.0010      |
|    1     |     200     |    00:01:15    |    0.5324    |   100.00%    |     0.15     |      88.28%      |       0.70       |     0.0010      |
|    2     |     400     |    00:02:40    |    0.4732    |   100.00%    |     0.15     |      92.19%      |       0.63       |     0.0010      |
|    3     |     600     |    00:04:03    |    0.4776    |    97.14%    |     0.09     |      96.88%      |       0.59       |     0.0010      |
|    3     |     800     |    00:05:23    |    0.5269    |    97.44%    |     0.18     |      89.06%      |       0.68       |     0.0010      |
|    4     |    1000     |    00:06:44    |    0.9749    |   100.00%    |              |      85.16%      |       1.00       |     0.0010      |
|    5     |    1200     |    00:08:07    |    1.1952    |    97.62%    |     0.13     |      77.34%      |       1.27       |     0.0010      |
|    5     |    1400     |    00:09:24    |    0.6577    |   100.00%    |              |      76.38%      |       0.72       |     0.0010      |
|    6     |    1600     |    00:10:46    |    0.6951    |   100.00%    |              |      90.62%      |       0.94       |     0.0010      |
|    7     |    1800     |    00:12:08    |    0.5341    |    96.08%    |     0.09     |      86.72%      |       0.53       |     0.0010      |
|    7     |    2000     |    00:13:26    |    0.3333    |   100.00%    |     0.12     |      94.53%      |       0.61       |     0.0010      |
|    7     |    2065     |    00:13:52    |    1.0564    |   100.00%    |              |      71.09%      |       1.23       |     0.0010      |
|=============================================================================================================================================|
Detector training complete.
*******************************************************************
```

Test the Faster R-CNN detector on a test image.

`img = imread('highway.png');`

Run the detector.

`[bbox, score, label] = detect(detector,img);`

Display detection results.

```
detectedImg = insertShape(img,'Rectangle',bbox);
figure
imshow(detectedImg)
```

## Input Arguments


`trainingData`: Labeled ground truth, specified as a datastore or a table.

Each bounding box must be in the format [x y width height].

• If you use a datastore, your data must be set up so that calling the datastore with the `read` and `readall` functions returns a cell array or table with two or three columns. When the output contains two columns, the first column must contain bounding boxes, and the second column must contain labels, {boxes,labels}. When the output contains three columns, the second column must contain the bounding boxes, and the third column must contain the labels. In this case, the first column can contain any type of data. For example, the first column can contain images or point cloud data.

| data | boxes | labels |
|---|---|---|
| The first column must be images. | M-by-4 matrices of bounding boxes of the form [x, y, width, height], where [x,y] represent the top-left coordinates of the bounding box. | The third column must be a cell array that contains M-by-1 categorical vectors containing object class names. All categorical data returned by the datastore must contain the same categories. |

• If you use a table, the table must have two or more columns. The first column of the table must contain image file names with paths. The images must be grayscale or truecolor (RGB) and they can be in any format supported by `imread`. Each of the remaining columns must be a cell vector that contains M-by-4 matrices that represent a single object class, such as vehicle, flower, or stop sign. The columns contain 4-element double arrays of M bounding boxes in the format [x,y,width,height]. The format specifies the upper-left corner location and size of the bounding box in the corresponding image. To create a ground truth table, you can use the Image Labeler app or Video Labeler app. To create a table of training data from the generated ground truth, use the `objectDetectorTrainingData` function.
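As a rough illustration of the table layout described above, the following sketch builds a small ground truth table by hand. The image paths, the `vehicle` class name, and the box coordinates are made-up placeholders, not data shipped with the toolbox.

```matlab
% Hypothetical image paths; replace with real files readable by imread.
imageFilename = {'images/frame001.png'; 'images/frame002.png'};

% One column per object class. Each cell holds an M-by-4 matrix of
% [x y width height] boxes for that class in the corresponding image:
% one vehicle in the first image, two vehicles in the second.
vehicle = {[10 20 50 40]; [5 5 30 30; 60 80 20 25]};

% The first column holds file names, the remaining columns hold boxes.
trainingData = table(imageFilename, vehicle);
```

In practice, a table of this shape is usually produced by `objectDetectorTrainingData` from labeled ground truth rather than typed by hand.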

`network`: Network, specified as a `SeriesNetwork` (Deep Learning Toolbox), an array of `Layer` (Deep Learning Toolbox) objects, a `layerGraph` (Deep Learning Toolbox) object, or by the network name. The network is trained to classify the object classes defined in the `trainingData` table. To use `SeriesNetwork` (Deep Learning Toolbox), `Layer` (Deep Learning Toolbox), and `layerGraph` (Deep Learning Toolbox) objects, you must have Deep Learning Toolbox.

• When you specify the network as a `SeriesNetwork`, an array of `Layer` objects, or by the network name, the function transforms the network into a Faster R-CNN network by adding a region proposal network (RPN), an ROI max pooling layer, and new classification and regression layers to support object detection. Additionally, the `GridSize` property of the ROI max pooling layer is set to the output size of the last max pooling layer in the network.

• The array of `Layer` (Deep Learning Toolbox) objects must contain a classification layer that supports the number of object classes, plus a background class. Use this input type to customize the learning rates of each layer. An example of an array of `Layer` (Deep Learning Toolbox) objects follows:

```
layers = [imageInputLayer([28 28 3])
        convolution2dLayer([5 5],10)
        reluLayer()
        fullyConnectedLayer(10)
        softmaxLayer()
        classificationLayer()];
```

• When you specify the network as a `SeriesNetwork` object, a `Layer` array, or by the network name, the weights for additional convolution and fully connected layers are initialized to `'narrow-normal'`. The function adds these weights to create the network.

• The network name must be one of the following valid network names. You must also install the corresponding add-on.

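| Network Name | Feature Extraction Layer Name | ROI Pooling Layer OutputSize | Description |
|---|---|---|---|
| `alexnet` (Deep Learning Toolbox) | `'relu5'` | [6 6] | Last max pooling layer is replaced by ROI max pooling layer |
| `vgg16` (Deep Learning Toolbox) | `'relu5_3'` | [7 7] | Last max pooling layer is replaced by ROI max pooling layer |
| `vgg19` (Deep Learning Toolbox) | `'relu5_4'` | [7 7] | Last max pooling layer is replaced by ROI max pooling layer |
| `squeezenet` (Deep Learning Toolbox) | `'fire5-concat'` | [14 14] | Last max pooling layer is replaced by ROI max pooling layer |
| `resnet18` (Deep Learning Toolbox) | `'res4b_relu'` | [14 14] | ROI pooling layer is inserted after the feature extraction layer. |
| `resnet50` (Deep Learning Toolbox) | `'activation_40_relu'` | [14 14] | ROI pooling layer is inserted after the feature extraction layer. |
| `resnet101` (Deep Learning Toolbox) | `'res4b22_relu'` | [14 14] | ROI pooling layer is inserted after the feature extraction layer. |
| `googlenet` (Deep Learning Toolbox) | `'inception_4d-output'` | [14 14] | ROI pooling layer is inserted after the feature extraction layer. |
| `mobilenetv2` (Deep Learning Toolbox) | `'block_13_expand_relu'` | [14 14] | ROI pooling layer is inserted after the feature extraction layer. |
| `inceptionv3` (Deep Learning Toolbox) | `'mixed7'` | [17 17] | ROI pooling layer is inserted after the feature extraction layer. |
| `inceptionresnetv2` (Deep Learning Toolbox) | `'block17_20_ac'` | [17 17] | ROI pooling layer is inserted after the feature extraction layer. |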

• The `LayerGraph` object must be a valid Faster R-CNN object detection network. You can use the `fasterRCNNLayers` function to create a `LayerGraph` object to train a custom Faster R-CNN network.

Tip

If your network is a `DAGNetwork`, use the `layerGraph` (Deep Learning Toolbox) function to convert the network to a `LayerGraph` object. Then, create a custom Faster R-CNN network as described by the Create Faster R-CNN Object Detection Network example.

For more information on creating a Faster R-CNN network, see Getting Started with R-CNN, Fast R-CNN, and Faster R-CNN.

`options`: Training options, returned by the `trainingOptions` (Deep Learning Toolbox) function (requires Deep Learning Toolbox). To specify solver and other options for network training, use `trainingOptions`.

Note

`trainFasterRCNNObjectDetector` does not support these training options:

• Datastore inputs are not supported when you set the `DispatchInBackground` training option to `true`.

Additionally, the function does not support the following training options if you use a combined datastore input:

• `'once'` and `'every-epoch'` values for the `'Shuffle'` argument

• `'parallel'` and `'multi-gpu'` values for the `'ExecutionEnvironment'` argument

`checkpoint`: Saved detector checkpoint, specified as a `fasterRCNNObjectDetector` object. To periodically save a detector checkpoint during training, specify `CheckpointPath`. To control how frequently checkpoints are saved, see the `CheckpointFrequency` and `CheckpointFrequencyUnit` training options.

To load a checkpoint for a previously trained detector, load the MAT-file from the checkpoint path. For example, if the `'CheckpointPath'` property of `options` is `'/tmp'`, load a checkpoint MAT-file using:

`data = load('/tmp/faster_rcnn_checkpoint__105__2016_11_18__14_25_08.mat');`

The name of the MAT-file includes the iteration number and a timestamp indicating when the detector checkpoint was saved. The detector is saved in the `detector` variable of the file. Pass this detector back into the `trainFasterRCNNObjectDetector` function:

```
frcnn = trainFasterRCNNObjectDetector(stopSigns,...
                                   data.detector,options);
```

`detector`: Previously trained Faster R-CNN object detector, specified as a `fasterRCNNObjectDetector` object. Use this syntax to continue training a detector with additional training data or to perform more training iterations to improve detector accuracy.

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `'PositiveOverlapRange',[0.75 1]`

Training method, specified as the comma-separated pair consisting of `'TrainingMethod'` and either `'end-to-end'` or `'four-step'`.

• `'end-to-end'` — Simultaneously train the region proposal and region classification subnetworks.

• `'four-step'` — Separately train the region proposal and region classification subnetworks in four steps.

Bounding box overlap ratios for positive training samples, specified as the comma-separated pair consisting of `'PositiveOverlapRange'` and one of the following:

• A 2-element vector that specifies an identical overlap ratio for all four training stages.

• A 2-by-2 matrix, used only for the end-to-end training method. The first row of the matrix defines the overlap ratios for the region proposal subnetwork. The second row defines the overlap ratios for the region classification subnetwork.

• A 4-by-2 matrix, used only for the four-step training method. Each row of the matrix specifies the overlap ratio for each of the four training stages.

Values are in the range [0,1]. Region proposals that overlap with ground truth bounding boxes within the specified range are used as positive training samples.

The overlap ratio used for both the `PositiveOverlapRange` and `NegativeOverlapRange` is defined as:

$$\frac{\operatorname{area}(A\cap B)}{\operatorname{area}(A\cup B)}$$

A and B are bounding boxes.
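The overlap ratio can be worked through by hand. The sketch below computes it for two illustrative boxes in [x y width height] format and checks the result against example ranges of `[0.6 1]` and `[0 0.3]`. The box values are made up, and treating the range endpoints as inclusive is an assumption of this sketch rather than something the text above states. The `bboxOverlapRatio` function in Computer Vision Toolbox computes this kind of ratio for whole sets of boxes.

```matlab
% Two boxes in [x y width height] format (illustrative values).
boxA = [10 10 100 100];
boxB = [60 60 100 100];

% Width and height of the intersection rectangle (zero if disjoint).
interW = max(0, min(boxA(1)+boxA(3), boxB(1)+boxB(3)) - max(boxA(1), boxB(1)));
interH = max(0, min(boxA(2)+boxA(4), boxB(2)+boxB(4)) - max(boxA(2), boxB(2)));
interArea = interW * interH;

% Union area = sum of the two areas minus the shared area.
unionArea = boxA(3)*boxA(4) + boxB(3)*boxB(4) - interArea;
iou = interArea / unionArea;

% Compare against example overlap ranges (endpoints assumed inclusive).
positiveRange = [0.6 1];
negativeRange = [0 0.3];
isPositive = iou >= positiveRange(1) && iou <= positiveRange(2);
isNegative = iou >= negativeRange(1) && iou <= negativeRange(2);
```

Here the intersection is 50-by-50 and the union area is 17500, so the ratio is 1/7 and the pair falls in the negative range.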

Bounding box overlap ratios for negative training samples, specified as the comma-separated pair consisting of `'NegativeOverlapRange'` and one of the following:

• A 2-element vector that specifies the overlap ratio.

• A 2-by-2 matrix, used only for the end-to-end training method. The first row of the matrix defines the overlap ratios for the region proposal subnetwork. The second row defines the overlap ratios for the region classification subnetwork.

• A 4-by-2 matrix, used only for the four-step training method. Each row of the matrix specifies the overlap ratio for each of the four training stages.

Values are in the range [0,1]. Region proposals that overlap with the ground truth bounding boxes within the specified range are used as negative training samples.

The overlap ratio used for both the `PositiveOverlapRange` and `NegativeOverlapRange` is defined as:

$$\frac{\operatorname{area}(A\cap B)}{\operatorname{area}(A\cup B)}$$

A and B are bounding boxes.

Maximum number of strongest region proposals to use for generating training samples, specified as the comma-separated pair consisting of `'NumStrongestRegions'` and a positive integer. Reduce this value to speed up processing time at the cost of training accuracy. To use all region proposals, set this value to `Inf`.

Number of region proposals to randomly sample from each training image, specified as the comma-separated pair consisting of `'NumRegionsToSample'` and an integer, a 1-by-2 vector, or a 1-by-4 vector. Use the 1-by-2 vector for end-to-end training. Use the 1-by-4 vector for the four-step training. Reduce the number of regions to sample to reduce memory usage and speed up training. Reducing the value can also decrease training accuracy.

When you set `'TrainingMethod'` to `'end-to-end'`, the number of region proposals can be set to a 1-by-2 vector. The first element of the vector must be the number of regions sampled for the region proposal subnetwork. The second element must be the number of regions sampled for the region classification subnetwork.

When you set `'TrainingMethod'` to `'four-step'`, the number of region proposals can be set to a 1-by-4 vector. The ith element specifies the number of regions to sample for the ith training step.

Length of the smallest image dimension, either width or height, specified as the comma-separated pair consisting of `'SmallestImageDimension'` and a positive integer. Training images are resized such that the length of the shortest dimension is equal to the specified integer. By default, training images are not resized. Resizing training images helps reduce computational costs and memory used when training images are large. Typical values range from 400 to 600 pixels.

#### Dependencies

• The `SmallestImageDimension` property supports only table input training data. To resize the input data of a datastore input, use the `transform` function.
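For datastore input, one possible way to mimic this resizing is sketched below. It assumes each sample has the form {image, boxes, labels}, uses `imresize` (Image Processing Toolbox) and `bboxresize` (Computer Vision Toolbox), and picks 600 as an arbitrary target. The helper names `scaleOf` and `resizeSample` are illustrative, and the synthetic sample only stands in for what a real datastore would return.

```matlab
smallestDim = 600;   % assumed target length of the shortest image side

% Scale factor that maps the shortest side of image I to smallestDim.
scaleOf = @(I) smallestDim / min(size(I,[1 2]));

% Resize one {image, boxes, labels} sample, scaling the boxes to match.
resizeSample = @(d) {imresize(d{1},scaleOf(d{1})), ...
                     bboxresize(d{2},scaleOf(d{1})), d{3}};

% Try it on a synthetic sample: a blank 300-by-400 RGB image with one box.
sample = {zeros(300,400,3,'uint8'), [10 20 50 60], categorical("vehicle")};
resized = resizeSample(sample);

% With a real datastore ds that returns {image, boxes, labels}, the same
% function would be applied at read time:
%   dsResized = transform(ds,resizeSample);
```

Scaling the image and the boxes by the same factor keeps the aspect ratio intact, which is the behavior described for `SmallestImageDimension`.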

Minimum anchor box sizes for building the anchor box pyramid of the region proposal network (RPN), specified as the comma-separated pair consisting of `'MinBoxSizes'` and an m-by-2 matrix. Each row defines the [height width] of an anchor box.

The default `'auto'` setting uses the minimum size and the median aspect ratio from the bounding boxes for each class in the ground truth data. To remove redundant box sizes, the function keeps boxes that have an intersection-over-union value that is less than or equal to 0.5. This behavior ensures that the minimum number of anchor boxes is used to cover all the object sizes and aspect ratios.

When anchor boxes are computed based on `MinBoxSizes`, the ith anchor box size is:

`round(MinBoxSizes(i,:) .* BoxPyramidScale .^ (0:NumBoxPyramidLevels-1)')`
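To make the expression concrete, here is a small worked example with made-up values: a single minimum box of [32 16], a scale of 2, and three pyramid levels. Each row of the result is the [height width] of the anchor box at one pyramid level.

```matlab
MinBoxSizes = [32 16];        % [height width], illustrative values
BoxPyramidScale = 2;          % each level doubles the previous size
NumBoxPyramidLevels = 3;

i = 1;                        % index of the minimum box size to expand
anchorSizes = round(MinBoxSizes(i,:) .* BoxPyramidScale .^ (0:NumBoxPyramidLevels-1)');
% anchorSizes is [32 16; 64 32; 128 64]
```

The column vector of scale factors [1; 2; 4] is multiplied against the 1-by-2 minimum size by implicit expansion, which yields one row per level.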

#### Dependencies

• You cannot use this property if you specify the network as a `LayerGraph` object or if you resume training from a detector checkpoint.

• The `MinBoxSizes` property supports only input training in table format. To estimate anchor boxes for a datastore input, use the `estimateAnchorBoxes` function.
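For datastore input, anchor boxes can be estimated from the labels. The sketch below uses the vehicle data set from the example on this page; the choice of three anchor boxes is an arbitrary assumption and would normally be tuned.

```matlab
% Load the ground truth table used in the example above.
data = load('fasterRCNNVehicleTrainingData.mat');
trainingData = data.vehicleTrainingData;

% The label columns are enough to estimate anchor boxes.
blds = boxLabelDatastore(trainingData(:,2:end));

numAnchors = 3;   % assumed number of anchor boxes
[anchorBoxes,meanIoU] = estimateAnchorBoxes(blds,numAnchors);
```

`anchorBoxes` has one [height width] row per anchor, and `meanIoU` indicates how well the estimated anchors cover the labeled boxes. The estimated boxes could then be supplied when building a custom network, for example with `fasterRCNNLayers`.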

Anchor box pyramid scale factor used to successively upscale anchor box sizes, specified as the comma-separated pair consisting of `'BoxPyramidScale'` and a scalar. Recommended values are from 1 through 2. Increase this value for faster results. Decrease the number for greater accuracy.

#### Dependencies

• The `BoxPyramidScale` property supports only input training data in table format. To estimate anchor boxes for a datastore input, use the `estimateAnchorBoxes` function.

Number of levels in an anchor box pyramid, specified as the comma-separated pair consisting of `'NumBoxPyramidLevels'` and a scalar. Select a value that ensures that the multiscale anchor boxes are comparable in size to the size of objects in the ground truth data.

The default setting `'auto'` selects the number of levels based on the size of objects within the ground truth data. The number of levels is selected such that it covers the range of object sizes.

#### Dependencies

• The `NumBoxPyramidLevels` property supports only input training data in table format. To estimate anchor boxes for a datastore input, use the `estimateAnchorBoxes` function.

Frozen batch normalization during training, specified as the comma-separated pair consisting of `'FreezeBatchNormalization'` and `true` or `false`. The value indicates whether to freeze the batch normalization layers of the network during training. Set this value to `true` if you are training with a small mini-batch size. Small batch sizes result in poor estimates of the batch mean and variance, which are required for effective batch normalization.

If you do not specify a value for `'FreezeBatchNormalization'`, the function sets the property to:

• `true` if the `'MiniBatchSize'` name-value argument for the `trainingOptions` (Deep Learning Toolbox) function is less than `8`.

• `false` if the `'MiniBatchSize'` name-value argument for the `trainingOptions` (Deep Learning Toolbox) function is greater than or equal to `8`.

You must specify a value for `'FreezeBatchNormalization'` to override this default behavior.
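The default rule above amounts to a single comparison. The variable names in this sketch are illustrative only and are not part of the function's interface.

```matlab
miniBatchSize = 1;                     % value given to trainingOptions
freezeBatchNorm = miniBatchSize < 8;   % default described above
% With a mini-batch size of 1, as in the example on this page, the
% default is to freeze batch normalization.
```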

Detector training experiment monitoring, specified as an `experiments.Monitor` (Deep Learning Toolbox) object for use with the Experiment Manager (Deep Learning Toolbox) app. You can use this object to track the progress of training, update information fields in the training results table, record values of the metrics used by the training, and to produce training plots. For an example using this app, see Train Object Detectors in Experiment Manager.

Information monitored during training:

• Training loss at each iteration.

• Training accuracy at each iteration.

• Training root mean square error (RMSE) for the box regression layer.

• Learning rate at each iteration.

Validation information when the training `options` input contains validation data:

• Validation loss at each iteration.

• Validation accuracy at each iteration.

• Validation RMSE at each iteration.

## Output Arguments


`trainedDetector`: Trained Faster R-CNN object detector, returned as a `fasterRCNNObjectDetector` object.

`info`: Training progress information, returned as a structure array with the fields listed below. Each field corresponds to a stage of training.

• `TrainingLoss` — Training loss at each iteration is the mean squared error (MSE) calculated as the sum of localization error, confidence loss, and classification loss. For more information about the training loss function, see Training Loss.

• `TrainingAccuracy` — Training set accuracy at each iteration.

• `TrainingRMSE` — Training root mean squared error (RMSE) is the RMSE calculated from the training loss at each iteration.

• `BaseLearnRate` — Learning rate at each iteration.

• `ValidationLoss` — Validation loss at each iteration.

• `ValidationAccuracy` — Validation accuracy at each iteration.

• `ValidationRMSE` — Validation RMSE at each iteration.

• `FinalValidationLoss` — Final validation loss at end of the training.

• `FinalValidationRMSE` — Final validation RMSE at end of the training.

Each field is a numeric vector with one element per training iteration. Values that have not been calculated at a specific iteration are assigned as `NaN`. The struct contains `ValidationLoss`, `ValidationAccuracy`, `ValidationRMSE`, `FinalValidationLoss`, and `FinalValidationRMSE` fields only when `options` specifies validation data.
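Because uncomputed values are stored as `NaN`, it helps to mask them before plotting. The sketch below uses a hand-written stand-in for `info` with made-up numbers for a six-iteration run; a real `info` structure comes from the second output of the training function.

```matlab
% Hypothetical stand-in for the info output of a 6-iteration run in which
% validation was evaluated only at iterations 3 and 6.
info.TrainingLoss   = [0.88 0.53 0.47 0.48 0.41 0.33];
info.ValidationLoss = [NaN NaN 0.61 NaN NaN 0.45];

iterations = 1:numel(info.TrainingLoss);
hasValidation = ~isnan(info.ValidationLoss);   % iterations with a value

% Plot training loss everywhere and validation loss where it exists.
figure
plot(iterations,info.TrainingLoss,'-')
hold on
plot(iterations(hasValidation),info.ValidationLoss(hasValidation),'o-')
hold off
xlabel('Iteration')
ylabel('Loss')
legend('Training','Validation')
```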

## Tips

• To accelerate data preprocessing for training, `trainFasterRCNNObjectDetector` automatically creates and uses a parallel pool based on your parallel preference settings. For more details about setting these preferences, see parallel preference settings. Using parallel computing preferences requires Parallel Computing Toolbox.

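• VGG-16, VGG-19, ResNet-101, and Inception-ResNet-v2 are large models. Training with large images can produce "out-of-memory" errors. To mitigate these errors, reduce the size of the training images by using the `SmallestImageDimension` argument, or decrease the value of the `NumRegionsToSample` argument.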

• This function supports transfer learning. When you input a `network` by name, such as `'resnet50'`, then the function automatically transforms the network into a valid Faster R-CNN network model based on the pretrained `resnet50` (Deep Learning Toolbox) model. Alternatively, manually specify a custom Faster R-CNN network by using the `LayerGraph` (Deep Learning Toolbox) extracted from a pretrained DAG network. For more details, see Create Faster R-CNN Object Detection Network.

• This table describes how to transform each named network into a Faster R-CNN network. The feature extraction layer name specifies the layer for processing by the ROI pooling layer. The ROI output size specifies the size of the feature maps output by the ROI pooling layer.

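| Network Name | Feature Extraction Layer Name | ROI Pooling Layer OutputSize | Description |
|---|---|---|---|
| `alexnet` (Deep Learning Toolbox) | `'relu5'` | [6 6] | Last max pooling layer is replaced by ROI max pooling layer |
| `vgg16` (Deep Learning Toolbox) | `'relu5_3'` | [7 7] | Last max pooling layer is replaced by ROI max pooling layer |
| `vgg19` (Deep Learning Toolbox) | `'relu5_4'` | [7 7] | Last max pooling layer is replaced by ROI max pooling layer |
| `squeezenet` (Deep Learning Toolbox) | `'fire5-concat'` | [14 14] | Last max pooling layer is replaced by ROI max pooling layer |
| `resnet18` (Deep Learning Toolbox) | `'res4b_relu'` | [14 14] | ROI pooling layer is inserted after the feature extraction layer. |
| `resnet50` (Deep Learning Toolbox) | `'activation_40_relu'` | [14 14] | ROI pooling layer is inserted after the feature extraction layer. |
| `resnet101` (Deep Learning Toolbox) | `'res4b22_relu'` | [14 14] | ROI pooling layer is inserted after the feature extraction layer. |
| `googlenet` (Deep Learning Toolbox) | `'inception_4d-output'` | [14 14] | ROI pooling layer is inserted after the feature extraction layer. |
| `mobilenetv2` (Deep Learning Toolbox) | `'block_13_expand_relu'` | [14 14] | ROI pooling layer is inserted after the feature extraction layer. |
| `inceptionv3` (Deep Learning Toolbox) | `'mixed7'` | [17 17] | ROI pooling layer is inserted after the feature extraction layer. |
| `inceptionresnetv2` (Deep Learning Toolbox) | `'block17_20_ac'` | [17 17] | ROI pooling layer is inserted after the feature extraction layer. |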

For information on modifying how a network is transformed into a Faster R-CNN network, see Design an R-CNN, Fast R-CNN, and a Faster R-CNN Model.

• During training, multiple image regions are processed from the training images. The number of image regions per image is controlled by the `NumRegionsToSample` property. The `PositiveOverlapRange` and `NegativeOverlapRange` properties control which image regions are used for training. Positive training samples are those that overlap with the ground truth boxes by 0.6 to 1.0, as measured by the bounding box intersection-over-union metric (IoU). Negative training samples are those that overlap by 0 to 0.3. Choose values for these properties by testing the trained detector on a validation set.

| Overlap Values | Description |
|---|---|
| `PositiveOverlapRange` set to `[0.6 1]` | Positive training samples are set equal to the samples that overlap with the ground truth boxes by 0.6 to 1.0, measured by the bounding box IoU metric. |
| `NegativeOverlapRange` set to `[0 0.3]` | Negative training samples are set equal to the samples that overlap with the ground truth boxes by 0 to 0.3. |

• Use the `trainingOptions` (Deep Learning Toolbox) function to enable or disable verbose printing.

## Version History

Introduced in R2017a


Behavior changed in R2019b