Integrate YOLO v2 Vehicle Detector System on SoC
This example shows how to simulate a you-only-look-once (YOLO) v2 vehicle detector and verify the functionality of the end-to-end application using MATLAB.
The end-to-end application includes preprocessing of the images, a YOLO v2 vehicle detection network, and postprocessing of the images to overlay results.
Load Camera Data and Network File
This example uses a PandasetCameraData.mp4 file that contains a subset of the video from the PandaSet data set. Download the video file and the network .mat files.
supportFileDir = matlab.internal.examples.utils.getSupportFileDir();
pathToDataset = fullfile(supportFileDir, 'visionhdl', 'PandasetCameraData');
if(~isfile(fullfile(pathToDataset, 'PandasetCameraData.mp4')) ...
        || ~isfile(fullfile(pathToDataset, 'yolov2VehicleDetector32Layer.mat')) ...
        || ~isfile(fullfile(pathToDataset, 'yolov2VehicleDetector60Layer.mat')))
    PandasetZipFile = matlab.internal.examples.downloadSupportFile('visionhdl','PandasetCameraData.zip');
    [outputFolder,~,~] = fileparts(PandasetZipFile);
    unzip(PandasetZipFile,outputFolder);
end
addpath(pathToDataset);
A YOLO v2 vehicle detection application has three main modules. The preprocessing module accepts the input frame and performs image resizing and normalization. The preprocessed data is then consumed by the YOLO v2 vehicle detection network, which is a feature extraction network followed by a detection network. The network output is postprocessed to identify the strongest bounding boxes, and the resulting bounding boxes are overlaid on the input image.
The preprocessing subsystem and the deep learning IP core (DLIP) are deployed on the FPGA (programmable logic, PL) and the postprocessing is deployed on the ARM processor (processing system, PS). For deploying the vehicle detector, see YOLO v2 Vehicle Detector with Live Camera Input on Zynq-Based Hardware. This example shows how to model the preprocessing module (resize and normalization) and the postprocessing module, along with the DL handshaking logic and network execution.
Explore Vehicle Detector
open_system('YOLOv2VehicleDetectorOnSoC');
The vehicle detector contains these modules:
- Source: Selects the inputImage from Pandaset.
- Conversion: Converts the input frame into an RGB pixel stream.
- Pixel-stream based preprocessing (to FPGA): Preprocesses the input frame and writes it into DDR.
- Deep learning IP core simulation logic: Models the DL processor, which calculates activations on the input frame and writes the output to DDR.
- Conversion: Converts the input RGB pixel stream back to a frame for overlaying bounding boxes.
- Postprocessing and overlay (to ARM): Applies postprocessing to the network output and overlays the bounding boxes on the input frame.
- Display: Displays the input frame with detections.
The inputImages block stores numFrames images from Pandaset. Each frame is first resized and normalized in the YOLOv2PreprocessDUT, and the preprocessed output is written into DDR at the address location read from the DL input handshaking registers (InputValid, InputAddr, InputSize). The DLIP calculates activations on the preprocessed image, writes the activations to DDR, and updates the DL output handshaking registers (OutputValid, OutputAddr, OutputSize). This handshaking triggers the YOLOv2PostprocessDUT, which reads the DL output from the address obtained from the DL registers, performs postprocessing, and calculates bounding boxes that are displayed in the Video Viewer block via the overlayBoundingboxes function.
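The register handshake described above can be sketched as a minimal, runnable MATLAB snippet. Here a containers.Map stands in for the memory-mapped registers; the register names come from this example, but the addresses and sizes are made-up stand-ins and the DL IP behavior is simulated, not real hardware.

```matlab
% Minimal sketch of the DL register handshake. The Map simulates the
% memory-mapped registers; addresses and sizes are illustrative only.
regs = containers.Map( ...
    {'InputValid','InputAddr','InputSize','OutputValid','OutputAddr','OutputSize'}, ...
    [1, hex2dec('80000000'), 128*128*3, 1, hex2dec('81000000'), 16*16*12]);

if regs('InputValid')
    % Preprocessing writes the frame to the address the DLIP advertises.
    fprintf('Write preprocessed frame to 0x%X (%d values)\n', ...
        regs('InputAddr'), regs('InputSize'));
end
if regs('OutputValid')
    % Postprocessing reads the activations the DLIP has produced.
    fprintf('Read DL output from 0x%X (%d values)\n', ...
        regs('OutputAddr'), regs('OutputSize'));
end
```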
YOLOv2PreprocessDUT
open_system('YOLOv2VehicleDetectorOnSoC/YOLOv2PreprocessDUT');
The selectImage subsystem selects the input frame from the inputImages block. A Frame To Pixels block converts the image from selectImage into a pixel stream and a pixelcontrol bus. The Unpack subsystem divides the pixel stream into R, G, and B components. The RGB data (RIn, GIn, BIn), along with the ctrl bus, is fed to preprocessing. The input image is also streamed out as (ROut, GOut, BOut) to be written into the PS DDR for overlaying the bounding boxes.
The YOLOv2PreprocessDUT contains subsystems for frame dropping, selecting a region of interest (ROI) from the input frame, preprocessing (resize and normalization), and handshaking logic.
The Frame Drop subsystem synchronizes data between the YOLOv2PreprocessDUT and the DLIP by dropping input frames when the DLIP is not available for processing. It contains finite state machine (FSM) logic for reading the DLIP registers and a pixel bus creator to concatenate the output control signals of the frame drop logic into a pixelcontrol bus. The readInputRegisters subsystem reads the inputAddrReg register, forwards the first frame to preprocessing, and resets the control signals for the rest of the frames until inputAddr is updated by the DLIP. This frame drop logic lets the DLIP process one frame per inputAddr.
The output of the Frame Drop subsystem is sent to the ROI Selector block, which selects the ROI from the input image and forwards it for preprocessing. The ROI is selected from the 1920x1080 Pandaset input image and is scaled down by a factor of 4 for faster simulation. The ROI is configured in the helperSLYOLOv2SimulationSetup function.
hPos = 350; vPos = 400; hSize = 1000; vSize = 600;
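The ROI selection above can be sketched with plain array indexing. This is only an illustration of what the ROI Selector block computes, using a zero-filled array as a stand-in for a 1920x1080 Pandaset frame.

```matlab
% Sketch of the ROI selection performed by the ROI Selector block.
hPos = 350; vPos = 400; hSize = 1000; vSize = 600;
frame = zeros(1080, 1920, 3, 'uint8');   % stand-in for a Pandaset frame

% Crop the [vSize hSize 3] region starting at (hPos, vPos).
roiFrame = frame(vPos:vPos+vSize-1, hPos:hPos+hSize-1, :);
size(roiFrame)                            % [600 1000 3], matching inputROISize
```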
The YOLO v2 Preprocess Algorithm subsystem contains subsystems that perform the resizing and normalization operations. The pixel stream from the Frame Drop subsystem is passed to the Resize subsystem, which resizes the input image to the input size expected by the deep learning network, (128, 128, 3). The resized output is passed to the Normalization subsystem, which rescales the pixel values to the range [0, 1]. This preprocessed frame is then passed to the DL Handshake Logic Ext Mem subsystem to be written into the PL DDR.
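The resize and normalization steps above amount to a short MATLAB computation. This is a minimal frame-based sketch of what the pixel-streaming hardware implements, using a MATLAB demo image as a stand-in input.

```matlab
% Frame-based sketch of the preprocessing stage: resize to the network
% input size, then rescale uint8 pixel values to the [0, 1] range.
frame = imread('peppers.png');        % stand-in for a cropped Pandaset ROI
networkInputSize = [128 128 3];       % input size expected by the network

resized = imresize(frame, networkInputSize(1:2));
normalized = single(resized) / 255;   % values now lie in [0, 1]
```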
The DL Handshake Logic Ext Mem subsystem contains finite state machine (FSM) logic for handshaking with the DLIP and a subsystem to write the frame to DDR. The Read DL Registers subsystem contains the FSM logic that reads the handshaking signals (InputValid, InputAddr, and InputSize) from the DLIP for multiple frames. The Write to DDR subsystem uses these handshaking signals to write the preprocessed frame to memory using the AXI4-Master protocol. For more information on the YOLOv2PreprocessDUT, see the example Deploy and Verify YOLO v2 Vehicle Detector on FPGA.
DLIP
open_system('YOLOv2VehicleDetectorOnSoC/DLIP','force');
The DLIP contains subsystems for prediction logic, DL input and output register handshaking logic, and an AXI write controller that writes the DL output to DDR.
The FetchPreprocessedImage subsystem reads and rearranges the output from the YOLOv2PreprocessDUT to the networkInputSize required by the deep learning network. The network and the activation layer of the DLIP are set up using the helperSLYOLOv2SimulationSetup and helperYOLOv2Network functions.
This example uses a pretrained YOLO v2 network that was trained on Pandaset. The network output is rearranged to the external memory data format of the DL Processor by concatenating the elements along the third dimension. For more information, see External Memory Data Format (Deep Learning HDL Toolbox).
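The rearrangement described above can be illustrated with a short sketch. This shows one plausible packing of a [16 16 12] activation volume into a 1-D stream with the third dimension varying fastest; the exact layout used by the DL processor is defined in the External Memory Data Format documentation (Deep Learning HDL Toolbox), so treat this as an illustration, not the definitive format.

```matlab
% Illustrative sketch: flatten a [16 16 12] activation volume into a
% 1-D external-memory stream, concatenating along the third dimension
% (channels varying fastest). The real DL processor layout may differ.
networkOutputSize = [16 16 12];
act = rand(networkOutputSize, 'single');   % stand-in activations

channelsFirst = permute(act, [3 1 2]);     % move third dimension innermost
memStream = channelsFirst(:);              % 3072x1 stream of elements
```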
The DL output is written to memory using the AXIM Write Controller subsystem. The write operations from the YOLOv2PreprocessDUT and the DLIP are multiplexed using the DDR Write Arbitrator subsystem.
YOLOv2PostprocessDUT
open_system('YOLOv2VehicleDetectorOnSoC/YOLOv2PostprocessDUT','force');
The YOLOv2PostprocessDUT subsystem contains subsystems for DL handshaking, reading the DL output, and transforming and postprocessing the DL output. The DL handshaking subsystems have variant behavior depending on whether the model is configured for simulation or deployment, based on the simulationFlag. Because this example demonstrates the simulation workflow, the simulationFlag is set to true in the helperSLYOLOv2Setup script.
The Set Control Registers subsystem sets the control registers for the YOLOv2PreprocessDUT: postProcStart, DUTProcStart, and inputAddrReg. The DL Handshaking subsystem reads the DL output handshaking registers (OutputValid, OutputAddr, OutputSize), which indicate the address, size, and validity of the output. The model abstracts these registers as data store blocks for simulation. The readDLOutput subsystem uses these handshaking signals to read the DL output from the PL DDR.
The readDLOutput subsystem contains subsystems for polling OutputValid, generating read requests, and reading the DL output from the PL DDR. The pollOutputValid function polls the OutputValid signal from the DLIP and triggers postprocessing when OutputValid is asserted. The read DL Output from PL DDR subsystem contains a signal, rdDone, which indicates that the DL output read operation has completed successfully. The TriggerDLOutputNext subsystem pulses the OutputNext signal when rdDone is asserted to indicate to the DLIP that the output of the current frame has been read.
The DL output data is then sent to the yolov2TransformlayerandPostprocess function for postprocessing. This function transforms the DL output from DDR by rearranging and normalizing the data, and thresholds the bounding boxes at a confidence score of 0.4. It returns the bounding boxes and pulses the postProcDone signal to indicate that the postprocessing has completed successfully.
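The confidence-thresholding step can be sketched in a few lines. The boxes and scores below are made-up stand-ins; only the threshold value of 0.4 comes from this example.

```matlab
% Sketch of confidence thresholding: keep only detections whose score
% exceeds the 0.4 threshold. Boxes and scores are illustrative values.
bboxes = [ 10  10 50 50;     % [x y width height] per detection
          100  80 40 40;
           30 200 60 30];
scores = [0.90; 0.35; 0.55];
confidenceThreshold = 0.4;

keep = scores > confidenceThreshold;   % logical mask of surviving boxes
strongBoxes  = bboxes(keep, :);        % rows 1 and 3 survive
strongScores = scores(keep);
```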
The YOLOv2PostprocessDUT is configured with these DL network parameters: networkInputSize, networkOutputSize, anchorBoxes, inputImageROI, inputROISize, and confidenceThreshold, which are set in the helperSLYOLOv2SimulationSetup.m script.
vehicleDetector = load(networkmatfile);
detector = vehicleDetector.detector;
net = detector.Network;
anchorBoxes = detector.AnchorBoxes;
networkInputSize = net.Layers(1, 1).InputSize;
networkOutputSize = [16,16,12];
paddedOutputSize = (networkOutputSize(1)*networkOutputSize(2)*networkOutputSize(3)*4)/3;
inputImageROI = [hPos, vPos, hSize, vSize];
inputROISize = [vSize, hSize, numComponents];
confidenceThreshold = 0.4;
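As a quick check of the padded output size computed above: the raw network output holds 16*16*12 = 3072 elements, and the 4/3 padding factor gives 4096.

```matlab
% Worked check of the padded output size used in the setup script.
networkOutputSize = [16,16,12];
rawElements = prod(networkOutputSize);       % 16*16*12 = 3072
paddedOutputSize = rawElements*4/3;          % 3072*4/3 = 4096
assert(paddedOutputSize == 4096);
```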
Simulate Vehicle Detector
Configure the network for the vehicle detector using the helperSLYOLOv2SimulationSetup
function.
helperSLYOLOv2SimulationSetup();
The script supports two networks: a 32-layer network (default) and a 60-layer network. To run the 60-layer network, set networkConfig to '60layer'.
helperSLYOLOv2SimulationSetup('60layer');
This model can take a few minutes to update the diagram when you compile it for the first time. Update the model before running the simulation.
set_param("YOLOv2VehicleDetectorOnSoC", SimulationCommand="update");
out = sim("YOLOv2VehicleDetectorOnSoC");
### Starting serial model reference simulation build.
### Model reference simulation target for DLHandshakeLogicExtMem is up to date.
### Model reference simulation target for YOLOv2PreprocessAlgorithm is up to date.

Build Summary

0 of 2 models built (2 models already up to date)
Build duration: 0h 0m 32.939s
Verify YOLOv2PreprocessDUT and YOLOv2PostprocessDUT using MATLAB
The example includes subsystems for verifying the outputs of the YOLOv2PreprocessDUT and the YOLOv2PostprocessDUT. The Verify Preprocess Output and Verify Postprocess Output subsystems log the signals required for verifying the preprocessed image and the bounding boxes, respectively.
helperVerifyVehicleDetector;
Close the figures
close(hFigurePreprocess); close(hFigurePostprocess);
The helperVerifyVehicleDetector script verifies all of the outputs logged during simulation. It compares the preprocessed image from simulation with a reference image obtained by applying the resize and normalize operations in MATLAB, and it overlays both the bounding boxes from simulation and those from the detect (Computer Vision Toolbox) function on the input images from the data set.
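The preprocessed-image comparison described above follows a common pattern, sketched below. The simPreprocessed variable is a hypothetical stand-in for the logged simulation output; the real helper script's internals may differ.

```matlab
% Sketch of the preprocessing verification: rebuild the reference by
% resizing and normalizing in MATLAB, then compare against the logged
% simulation output. simPreprocessed is a stand-in for the logged data.
frame = imread('peppers.png');                          % stand-in input
refPreprocessed = single(imresize(frame, [128 128])) / 255;
simPreprocessed = refPreprocessed;                      % stand-in logged output

maxErr = max(abs(simPreprocessed(:) - refPreprocessed(:)));
assert(maxErr < 1e-3, 'Preprocessed image does not match the reference');
```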
Conclusion
This example demonstrated a YOLO v2 vehicle detector application comprising preprocessing (image resize and normalization) and handshaking logic on the FPGA and vehicle detection using the DLIP, followed by postprocessing, and verified the results using MATLAB.
Copyright 2022-2023 The MathWorks, Inc.
Related Examples
More About
- Deep Learning Processing of Live Video (Vision HDL Toolbox Support Package for Xilinx Zynq-Based Hardware)