
Integrate YOLO v2 Vehicle Detector System on SoC

This example shows how to simulate a you-only-look-once (YOLO) v2 vehicle detector and verify the functionality of the end-to-end application using MATLAB.

The end-to-end application includes preprocessing of the images, a YOLO v2 vehicle detection network, and postprocessing of the images to overlay results.

Load Camera Data and Network File

This example uses a PandasetCameraData.mp4 file that contains a subset of the video from the PandaSet data set. Download the video file and the network .mat files.

supportFileDir = matlab.internal.examples.utils.getSupportFileDir();
pathToDataset = fullfile(supportFileDir, 'visionhdl', 'PandasetCameraData');
if(~isfile(fullfile(pathToDataset, 'PandasetCameraData.mp4')) ...
                    || ~isfile(fullfile(pathToDataset, 'yolov2VehicleDetector32Layer.mat')) ...
                    || ~isfile(fullfile(pathToDataset, 'yolov2VehicleDetector60Layer.mat')))
    PandasetZipFile = matlab.internal.examples.downloadSupportFile('visionhdl','PandasetCameraData.zip');
    [outputFolder,~,~] = fileparts(PandasetZipFile);
    unzip(PandasetZipFile,outputFolder);
end

addpath(pathToDataset);

A YOLO v2 vehicle detection application has three main modules. The preprocessing module accepts the input frame and performs image resizing and normalization. The YOLO v2 vehicle detection network, which is a feature extraction network followed by a detection network, then consumes the preprocessed data. The network output is postprocessed to identify the strongest bounding boxes, and the resulting bounding boxes are overlaid on the input image.

The preprocessing subsystem and the deep learning IP core (DLIP) are deployed on the FPGA (programmable logic, PL), and the postprocessing is deployed on the ARM processor (processing system, PS). For deploying the vehicle detector, see YOLO v2 Vehicle Detector with Live Camera Input on Zynq-Based Hardware. This example shows how to model the preprocessing module (resize and normalization) and the postprocessing module, along with the DL handshaking logic and network execution.

Explore Vehicle Detector

open_system('YOLOv2VehicleDetectorOnSoC');

The vehicle detector contains these modules:

  • Source - Selects the inputImage from PandaSet.

  • Conversion - Converts the input frame into an RGB pixel stream.

  • Pixel-stream-based preprocessing (to FPGA) - Preprocesses the input frame and writes it into DDR.

  • Deep Learning IP Core Simulation Logic - Models the DLIP to calculate activations on the input frame and write the output to DDR.

  • Conversion - Converts the RGB pixel stream back to a frame for overlaying bounding boxes.

  • Postprocessing and Overlay (to ARM) - Applies postprocessing to the network output and overlays the bounding boxes on the input frame.

  • Display - Displays the input frame with detections.

The inputImages block stores numFrames images from PandaSet. Each frame is first resized and normalized in YOLOv2PreprocessDUT, and the preprocessed output is written into DDR at the address read from the DL input handshaking registers (InputValid, InputAddr, InputSize). The DLIP calculates activations on the preprocessed image, writes the activations to DDR, and updates the DL output handshaking registers (OutputValid, OutputAddr, OutputSize). This handshaking triggers the YOLOv2PostprocessDUT, which reads the DL output from the address obtained from the DL registers, performs postprocessing, and calculates bounding boxes that the overlayBoundingboxes function overlays on the frame displayed in the Video Viewer block.
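The register handshaking described above can be sketched in MATLAB-style pseudocode. This is a conceptual sketch only; readReg, writeDDR, and readDDR are hypothetical placeholders for the AXI register and memory accesses that the model implements in hardware.

```matlab
% Hypothetical sketch of the DL handshaking sequence.

% Preprocessing side: wait for the DLIP input registers, then write the frame.
while ~readReg('InputValid'), end            % DLIP is ready for input
inputAddr = readReg('InputAddr');            % where to write the frame
writeDDR(inputAddr, preprocessedFrame);      % preprocessed image into DDR

% Postprocessing side: wait for the DLIP output registers, then read the result.
while ~readReg('OutputValid'), end           % activations are ready in DDR
outputAddr = readReg('OutputAddr');
outputSize = readReg('OutputSize');
dlOutput = readDDR(outputAddr, outputSize);  % network output from DDR
```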

YOLOv2PreprocessDUT

open_system('YOLOv2VehicleDetectorOnSoC/YOLOv2PreprocessDUT');

The selectImage subsystem selects the input frame from the inputImages block. A Frame To Pixels block converts the image from selectImage to a pixel stream and a pixelcontrol bus. The Unpack subsystem splits the pixel stream into R, G, and B components. The RGB data (RIn, GIn, BIn), along with the ctrl bus, is fed to preprocessing. The input image is also streamed out as (ROut, GOut, BOut) to be written into the PS DDR for overlaying the bounding boxes.

The YOLOv2PreprocessDUT contains subsystems for frame dropping, selecting Region of Interest (ROI) from the input frame, preprocessing (resize and normalization), and handshaking logic.

The Frame Drop subsystem synchronizes data between YOLOv2PreprocessDUT and the DLIP by dropping input frames when the DLIP is not available for processing. It contains finite state machine (FSM) logic for reading the DLIP registers and a pixel bus creator to concatenate the output control signals of the frame drop logic into a pixel control bus. The readInputRegisters subsystem reads the inputAddrReg register, forwards the first frame to preprocessing, and resets the control signals for the rest of the frames until the DLIP updates inputAddr. This frame drop logic lets the DLIP process one frame per inputAddr.
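Functionally, the frame drop decision can be sketched as stateful gating logic: a frame is forwarded only when the DLIP has published a new input address since the last accepted frame. This is an illustrative sketch, not the FSM implementation in the model.

```matlab
function forward = frameDropGate(currentInputAddr)
% Forward a frame only when the DLIP has published a new input address.
persistent lastAddr
if isempty(lastAddr)
    lastAddr = -1;                 % accept the first frame unconditionally
end
forward = (currentInputAddr ~= lastAddr);
if forward
    lastAddr = currentInputAddr;   % remember the accepted address
end
end
```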

The output of the Frame Drop subsystem is sent to the ROI Selector block, which selects the ROI from the input image and forwards it for preprocessing. The ROI is selected from the 1920x1080 PandaSet input image and is scaled down by a factor of 4 for faster simulation. The ROI is configured in the helperSLYOLOv2SimulationSetup function.

hPos = 350;
vPos = 400;
hSize = 1000;
vSize = 600;
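With these values, the ROI selection is equivalent to a simple crop of the input frame. This sketch shows the frame-based equivalent of the streaming ROI Selector block, assuming inputFrame holds one PandaSet image.

```matlab
% Frame-based equivalent of the ROI Selector block:
% crop the configured region from the input frame.
roiFrame = inputFrame(vPos:vPos+vSize-1, hPos:hPos+hSize-1, :);
% roiFrame is 600-by-1000-by-3
```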

The YOLO v2 Preprocess Algorithm contains subsystems that perform the resizing and normalization operations. The pixel stream from the Frame Drop subsystem is passed to the Resize subsystem, which resizes the input image to the input size expected by the deep learning network, (128, 128, 3). The resized output is passed to the Normalization subsystem, which rescales the pixel values to the [0, 1] range. This preprocessed frame is then passed to the DL Handshake Logic Ext Mem subsystem to be written into the PL DDR.
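In frame-based MATLAB terms, this preprocessing corresponds to a resize followed by a rescale to [0, 1]. This is a functional sketch of what the streaming subsystems compute, not the pixel-stream implementation itself; roiFrame is assumed to be a uint8 RGB image.

```matlab
% Functional equivalent of the streaming preprocessing:
% resize the ROI to the network input size, then normalize to [0, 1].
resizedFrame = imresize(roiFrame, [128 128]);  % bicubic resize by default
normalizedFrame = im2single(resizedFrame);     % uint8 [0,255] -> single [0,1]
```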

The DL Handshake Logic Ext Mem subsystem contains finite state machine (FSM) logic for handshaking with the DLIP and a subsystem to write the frame to DDR. The Read DL Registers subsystem has the FSM logic to read the handshaking signals (InputValid, InputAddr, and InputSize) from the DLIP for multiple frames. The Write to DDR subsystem uses these handshaking signals to write the preprocessed frame to memory using the AXI4-Master protocol. For more information on the YOLOv2PreprocessDUT, see the example Deploy and Verify YOLO v2 Vehicle Detector on FPGA.

DLIP

open_system('YOLOv2VehicleDetectorOnSoC/DLIP','force');

The DLIP contains subsystems for prediction logic, DL input and output register handshaking logic, and an AXI Write controller to write the DL Output to DDR.

The FetchPreprocessedImage subsystem reads and rearranges the output from YOLOv2PreprocessDUT to the networkInputSize required by the deep learning network. The network and the activation layer of the DLIP are set up using the helperSLYOLOv2SimulationSetup and helperYOLOv2Network functions.

This example uses a pretrained YOLO v2 network that was trained on Pandaset. The network output is rearranged to the external memory data format of the DL Processor by concatenating the elements along the third dimension. For more information, see External Memory Data Format (Deep Learning HDL Toolbox).

The DL output is written to memory using the AXIM Write Controller subsystem. The write operations from the YOLOv2PreprocessDUT and the DLIP are multiplexed using the DDR Write Arbitrator.

YOLOv2PostprocessDUT

open_system('YOLOv2VehicleDetectorOnSoC/YOLOv2PostprocessDUT','force');

The YOLOv2PostprocessDUT subsystem contains subsystems for DL handshaking, reading the DL output, and transforming and postprocessing the DL output. The DL handshaking subsystems have variant behavior depending on whether the model is configured for simulation or deployment, based on simulationFlag. Because this example demonstrates the simulation workflow, the simulationFlag is set to true in the helperSLYOLOv2Setup script.

The Set Control Registers subsystem sets the control registers for YOLOv2PreprocessDUT: postProcStart, DUTProcStart, and inputAddrReg. The DL Handshaking subsystem reads the DL output handshaking registers (OutputValid, OutputAddr, OutputSize), which indicate the address, size, and validity of the output. The model abstracts these registers as datastore blocks for simulation. The readDLOutput subsystem uses these handshaking signals to read the DL output from the PL DDR.

The readDLOutput subsystem contains subsystems for polling OutputValid, generating read requests, and reading the DL output from the PL DDR. The pollOutputValid function polls the OutputValid signal from the DLIP and triggers postprocessing when OutputValid is asserted. The read DL Output from PL DDR subsystem contains a signal, rdDone, which indicates that the DL output read operation has completed successfully. The TriggerDLOutputNext subsystem pulses the OutputNext signal when rdDone is asserted to indicate to the DLIP that the output of the current frame has been read.

The DL output data is then sent to the yolov2TransformlayerandPostprocess function for postprocessing. The function rearranges and normalizes the DL output read from DDR and thresholds the bounding boxes at a confidence score of 0.4. It returns the bounding boxes and pulses the postProcDone signal to indicate that the postprocessing has completed successfully.
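The confidence thresholding step can be sketched as follows, assuming bboxes and scores hold the decoded detections (M-by-4 boxes and M-by-1 confidence scores); these variable names are illustrative, not taken from the model.

```matlab
% Keep only detections whose confidence meets the threshold.
confidenceThreshold = 0.4;
keep = scores >= confidenceThreshold;
bboxes = bboxes(keep, :);   % boxes above threshold, in [x y w h] form
scores = scores(keep);
```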

The YOLOv2PostprocessDUT is configured with these DL network parameters in the helperSLYOLOv2SimulationSetup.m script: networkInputSize, networkOutputSize, anchorBoxes, inputImageROI, inputROISize, and confidenceThreshold.

vehicleDetector = load(networkmatfile);
detector = vehicleDetector.detector;
net = detector.Network;
anchorBoxes = detector.AnchorBoxes;
networkInputSize = net.Layers(1, 1).InputSize;
networkOutputSize = [16,16,12];
paddedOutputSize = (networkOutputSize(1)*networkOutputSize(2)*networkOutputSize(3)*4)/3;
inputImageROI = [hPos, vPos, hSize, vSize];
inputROISize = [vSize, hSize, numComponents];
confidenceThreshold = 0.4;
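As a check on the arithmetic above, the 4/3 factor in paddedOutputSize pads the 12 output channels to 16 per spatial location, so the stored output size is 16*16*16 = 4096 elements.

```matlab
% The 4/3 factor pads the 12 output channels to 16 per spatial location:
networkOutputSize = [16, 16, 12];
paddedOutputSize = prod(networkOutputSize) * 4/3;   % 3072 * 4/3 = 4096 = 16*16*16
```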

Simulate Vehicle Detector

Configure the network for the vehicle detector using the helperSLYOLOv2SimulationSetup function.

helperSLYOLOv2SimulationSetup();

The script supports two networks: a 32-layer network (default) and a 60-layer network. To run the 60-layer network, set networkConfig to '60layer'.

helperSLYOLOv2SimulationSetup('60layer');

When you compile the model for the first time, updating the diagram takes a few minutes. Update the model before running the simulation.

set_param("YOLOv2VehicleDetectorOnSoC", SimulationCommand="update");
out = sim("YOLOv2VehicleDetectorOnSoC");
### Starting serial model reference simulation build.
### Model reference simulation target for DLHandshakeLogicExtMem is up to date.
### Model reference simulation target for YOLOv2PreprocessAlgorithm is up to date.

Build Summary

0 of 2 models built (2 models already up to date)
Build duration: 0h 0m 32.939s

Verify YOLOv2PreprocessDUT and YOLOv2PostprocessDUT using MATLAB

The example includes subsystems for verification of outputs of YOLOv2PreprocessDUT and YOLOv2PostprocessDUT. The Verify Preprocess Output and Verify Postprocess Output subsystems log the signals required for the verification of the preprocessed image and bounding boxes, respectively.

helperVerifyVehicleDetector;

Close the figures.

close(hFigurePreprocess);
close(hFigurePostprocess);

The helperVerifyVehicleDetector script verifies all the logged outputs obtained in simulation. It compares the preprocessed image obtained in simulation with the reference image obtained by applying resize and normalize operations and overlays the bounding boxes obtained from simulation and from detect (Computer Vision Toolbox) function on the input images from the dataset.
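The preprocessing comparison in the verification step amounts to computing a frame-based reference and measuring the mismatch against the logged simulation output. This sketch assumes roiFrame is the cropped input and simPreprocessed was logged from simulation; both names are illustrative.

```matlab
% Reference preprocessing computed directly in MATLAB.
refPreprocessed = im2single(imresize(roiFrame, [128 128]));

% Compare against the preprocessed output logged from simulation.
maxError = max(abs(refPreprocessed(:) - simPreprocessed(:)));
fprintf('Max preprocessing mismatch: %g\n', maxError);
```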

Conclusion

This example demonstrated a YOLO v2 vehicle detector application comprising preprocessing (image resize and normalization) and handshaking logic on the FPGA and vehicle detection using the DLIP, followed by postprocessing on the ARM processor, and verified the results using MATLAB.

Copyright 2022-2023 The MathWorks, Inc.
