
yolov2Layers

Create YOLO v2 object detection network

Description


lgraph = yolov2Layers(imageSize,numClasses,anchorBoxes,network,featureLayer) creates a YOLO v2 object detection network and returns it as a LayerGraph object.


lgraph = yolov2Layers(___,'ReorgLayerSource',reorgLayer) specifies the source of the reorganization layer by using a name-value pair. You can specify this name-value pair to add a reorganization layer to the YOLO v2 network architecture. Specify this argument in addition to the input arguments in the previous syntax.

Examples


Create YOLO v2 Object Detection Network

Specify the size of the input image for training the network.

imageSize = [224 224 3];

Specify the number of object classes the network has to detect.

numClasses = 1;

Define the anchor boxes.

anchorBoxes = [1 1;4 6;5 3;9 6];

Specify the pretrained ResNet-50 network as the base network for YOLO v2. To use this pretrained network, you must install the Deep Learning Toolbox Model for ResNet-50 Network support package.

network = resnet50();

Analyze the network architecture to view all the network layers.

analyzeNetwork(network)

Specify the network layer to be used for feature extraction. You can choose any layer except the fully connected layer as the feature layer.

featureLayer = 'activation_49_relu';

Create the YOLO v2 object detection network. The network is returned as a LayerGraph object.

lgraph = yolov2Layers(imageSize,numClasses,anchorBoxes,network,featureLayer);

Analyze the YOLO v2 network architecture. The layers succeeding the feature layer are removed. A series of convolution, ReLU, and batch normalization layers, along with the YOLO v2 transform and YOLO v2 output layers, are added to the feature layer of the base network.

analyzeNetwork(lgraph)

Create YOLO v2 Object Detection Network with Reorganization Layer

Specify the size of the input image for training the network.

imageSize = [224 224 3];

Specify the number of object classes the network has to detect.

numClasses = 1;

Define the anchor boxes.

anchorBoxes = [1 1;4 6;5 3;9 6];

Specify the pretrained ResNet-50 network as the base network for YOLO v2. To use this pretrained network, you must install the Deep Learning Toolbox Model for ResNet-50 Network support package.

network = resnet50();

Analyze the network architecture to view all the network layers.

analyzeNetwork(network)

Specify the network layer to be used for feature extraction. You can choose any layer except the fully connected layer as the feature layer.

featureLayer = 'activation_49_relu';

Specify the network layer to be used as the source for the reorganization layer.

reorgLayer = 'activation_47_relu';

Create the YOLO v2 object detection network. The network is returned as a LayerGraph object.

lgraph = yolov2Layers(imageSize,numClasses,anchorBoxes,network,featureLayer,'ReorgLayerSource',reorgLayer);

Analyze the YOLO v2 network architecture. The layers succeeding the feature layer are removed. The detection subnetwork, along with the YOLO v2 transform and YOLO v2 output layers, is added to the feature layer of the base network. The reorganization layer and the depth concatenation layer are also added to the network. The YOLO v2 reorganization layer reorganizes the dimensions of the output features from the activation_47_relu layer. The depth concatenation layer then concatenates the output of the reorganization layer with the output of a higher layer.

analyzeNetwork(lgraph)

Input Arguments


imageSize - Size of input image

Size of input image, specified as one of these values:

  • Two-element vector of form [H W] - For a grayscale image of size H-by-W

  • Three-element vector of form [H W 3] - For an RGB color image of size H-by-W

numClasses - Number of object classes

Number of object classes, specified as a positive integer.

anchorBoxes - Anchor boxes

Anchor boxes, specified as an M-by-2 matrix defining the size and the number of anchor boxes. Each row in the M-by-2 matrix denotes the size of the anchor box in the form of [height width]. M denotes the number of anchor boxes. This input sets the AnchorBoxes property of the output layer.

The size of each anchor box is determined based on the scale and aspect ratio of different object classes present in input training data. Also, the size of each anchor box must be smaller than or equal to the size of the input image. You can use the clustering approach for estimating anchor boxes from the training data. For more information, see Estimate Anchor Boxes From Training Data.
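
The following sketch is not part of this reference page; it shows one way to estimate anchor boxes programmatically with the estimateAnchorBoxes function. The table name vehicleTrainingData is a hypothetical placeholder for your own ground truth data.

% Hypothetical ground truth table: first column holds image file names,
% remaining columns hold bounding boxes for each object class.
blds = boxLabelDatastore(vehicleTrainingData(:,2:end)); % box labels only
numAnchors = 4;
[anchorBoxes,meanIoU] = estimateAnchorBoxes(blds,numAnchors);
anchorBoxes % each row is [height width], ready to pass to yolov2Layers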

network - Pretrained convolutional neural network

Pretrained convolutional neural network, specified as a LayerGraph (Deep Learning Toolbox), DAGNetwork (Deep Learning Toolbox), or SeriesNetwork (Deep Learning Toolbox) object. This pretrained convolutional neural network is used as the base for the YOLO v2 object detection network. For details on pretrained networks in MATLAB®, see Pretrained Deep Neural Networks (Deep Learning Toolbox).

featureLayer - Name of feature layer

Name of feature layer, specified as a character vector or a string scalar. Specify the name of one of the deeper layers in the network to use for feature extraction. The features extracted from this layer are given as input to the YOLO v2 object detection subnetwork. You can use the analyzeNetwork (Deep Learning Toolbox) function to view the names of the layers in the input network.

Note

You can specify any network layer except the fully connected layer as the feature layer.
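
As an alternative to analyzeNetwork, this short sketch (an illustration, not from this page) lists the layer names of the input network so you can pick a feature layer:

% List all layer names in the base network; any layer except the fully
% connected layer can serve as the feature layer.
layerNames = {network.Layers.Name}'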

reorgLayer - Name of reorganization layer

Name of reorganization layer, specified as a character vector or a string scalar. Specify the name of one of the deeper layers in the network to use as input to the reorganization layer. You can use the analyzeNetwork (Deep Learning Toolbox) function to view the names of the layers in the input network. The reorganization layer is a pass-through layer that reorganizes the dimensions of low-level features to facilitate their concatenation with high-level features.

Note

The input to the reorganization layer must be from any one of the network layers that lie above the feature layer.

Output Arguments


lgraph - YOLO v2 object detection network

YOLO v2 object detection network, returned as a LayerGraph object.

Note

The default value for the Normalization property of the image input layer in the returned lgraph object is set to the Normalization property of the base network specified in network.
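
As a quick check (an illustrative sketch, not from this page), you can locate the image input layer in the returned LayerGraph and inspect its Normalization property:

% Find the image input layer in the returned LayerGraph and display its
% Normalization property, which is inherited from the base network.
idx = arrayfun(@(l) isa(l,'nnet.cnn.layer.ImageInputLayer'),lgraph.Layers);
inputLayer = lgraph.Layers(idx);
inputLayer.Normalization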

Algorithms

The yolov2Layers function creates a YOLO v2 network, which represents the network architecture for the YOLO v2 object detector. Use the trainYOLOv2ObjectDetector function to train the YOLO v2 network for object detection. The returned network architecture follows the YOLO v2 object detection design presented in [1] and [2].
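
A minimal training sketch follows; the training data variable and the option values are assumptions for illustration, not part of this page.

% trainingData is a hypothetical datastore or table of images with
% box labels for the classes the network must detect.
options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.001, ...
    'MiniBatchSize',16, ...
    'MaxEpochs',20);
detector = trainYOLOv2ObjectDetector(trainingData,lgraph,options);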

yolov2Layers uses a pretrained neural network as the base network, to which it adds a detection subnetwork required for creating a YOLO v2 object detection network. Given a base network, yolov2Layers removes all the layers succeeding the feature layer in the base network and adds the detection subnetwork. The detection subnetwork comprises groups of serially connected convolution, ReLU, and batch normalization layers, followed by the YOLO v2 transform layer and the YOLO v2 output layer. If you specify the 'ReorgLayerSource' name-value pair, the YOLO v2 network also concatenates the output of the reorganization layer with the output of the feature layer.
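
To see which layers the function adds, you can filter the layer names of the returned network, as in this sketch; the 'yolov2' prefix used in the filter is an assumption about the generated layer names.

% List layers in the returned network whose names contain "yolov2";
% these typically belong to the added detection subnetwork.
allNames = {lgraph.Layers.Name};
addedNames = allNames(contains(allNames,'yolov2'))'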

For information on creating a custom YOLO v2 network layer-by-layer, see Create YOLO v2 Object Detection Network.

References

[1] Redmon, J., S. K. Divvala, R. B. Girshick, and A. Farhadi. "You Only Look Once: Unified, Real-Time Object Detection." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788. Las Vegas, NV: CVPR, 2016.

[2] Redmon, J., and A. Farhadi. "YOLO9000: Better, Faster, Stronger." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517–6525. Honolulu, HI: CVPR, 2017.


Version History

Introduced in R2019a