Main Content

Get Started with Deep Learning FPGA Deployment on Xilinx ZC706 SoC

This example shows how to create, compile, and deploy a handwritten character detection series network to an FPGA and use MATLAB® to retrieve the prediction results.

Prerequisites

  • Xilinx® Zynq® ZC706 Evaluation Kit

  • Deep Learning HDL Toolbox™ Support Package for Xilinx® FPGA and SoC

  • Deep Learning Toolbox™

  • Deep Learning HDL Toolbox™

Load Pretrained Network

Load the pretrained network trained on the Modified National Institute of Standards and Technology (MNIST) database[1].

net = getDigitsNetwork;

View the layers of the pretrained network by using the Deep network Designer app.

deepNetworkDesigner(net)

Define FPGA Board Interface

Define the target FPGA board programming interface by using the dlhdl.Target object. Create a programming interface with custom name for your target device and a JTAG interface to connect the target device to the host computer. To use JTAG, install Xilinx™ Vivado™ Design Suite 2023.1. Set the toolpath by using the hdlsetuptoolpath function.

hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2023.1\bin\vivado.bat');
hTarget = dlhdl.Target('Xilinx');

Prepare Network for Deployment

Prepare the network for deployment by creating a dlhdl.Workflow object. Specify the network and bitstream name. Ensure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx® Zynq® ZC706 Evaluation Kit and the bitstream uses the single data type.

hW = dlhdl.Workflow(Network=net,Bitstream='zc706_single',Target=hTarget);

Compile Network

Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment.

dn = compile(hW)
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zc706_single.
### An output layer called 'Output1_softmax' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware.
### The network includes the following layers:
     1   'imageinput'        Image Input         28×28×1 images with 'zerocenter' normalization                (SW Layer)
     2   'conv_1'            2-D Convolution     8 3×3×1 convolutions with stride [1  1] and padding 'same'    (HW Layer)
     3   'relu_1'            ReLU                ReLU                                                          (HW Layer)
     4   'maxpool_1'         2-D Max Pooling     2×2 max pooling with stride [2  2] and padding [0  0  0  0]   (HW Layer)
     5   'conv_2'            2-D Convolution     16 3×3×8 convolutions with stride [1  1] and padding 'same'   (HW Layer)
     6   'relu_2'            ReLU                ReLU                                                          (HW Layer)
     7   'maxpool_2'         2-D Max Pooling     2×2 max pooling with stride [2  2] and padding [0  0  0  0]   (HW Layer)
     8   'conv_3'            2-D Convolution     32 3×3×16 convolutions with stride [1  1] and padding 'same'  (HW Layer)
     9   'relu_3'            ReLU                ReLU                                                          (HW Layer)
    10   'fc'                Fully Connected     10 fully connected layer                                      (HW Layer)
    11   'softmax'           Softmax             softmax                                                       (SW Layer)
    12   'Output1_softmax'   Regression Output   mean-squared-error                                            (SW Layer)
                                                                                                             
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'Output1_softmax' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
### Compiling layer group: conv_1>>maxpool_2 ...
### Compiling layer group: conv_1>>maxpool_2 ... complete.
### Compiling layer group: conv_3>>relu_3 ...
### Compiling layer group: conv_3>>relu_3 ... complete.
### Compiling layer group: fc ...
### Compiling layer group: fc ... complete.

### Allocating external memory buffers:

          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________
    "InputDataOffset"           "0x00000000"     "184.0 kB"       
    "OutputResultOffset"        "0x0002e000"     "4.0 kB"         
    "SchedulerDataOffset"       "0x0002f000"     "204.0 kB"       
    "SystemBufferOffset"        "0x00062000"     "76.0 kB"        
    "InstructionDataOffset"     "0x00075000"     "52.0 kB"        
    "ConvWeightDataOffset"      "0x00082000"     "28.0 kB"        
    "FCWeightDataOffset"        "0x00089000"     "76.0 kB"        
    "EndOffset"                 "0x0009c000"     "Total: 624.0 kB"
### Network compilation complete.
dn = struct with fields:
             weights: [1×1 struct]
        instructions: [1×1 struct]
           registers: [1×1 struct]
    syncInstructions: [1×1 struct]
        constantData: {{}  [1×1568 single]}
             ddrInfo: [1×1 struct]
       resourceTable: [6×2 table]

Program Bitstream onto FPGA and Download Network Weights

To deploy the network on the Xilinx® Zynq® ZC706 hardware, run the deploy method of the dlhdl.Workflow object. This method programs the FPGA board using the output of the compile method and the programming file, downloads the network weights and biases, displays progress messages, and the time it takes to deploy the network.

deploy(hW)

Test Network

Load the example image.

inputImg = imread('five_28x28.pgm');
inputImg = dlarray(single(inputImg),'SSCB');

Classify the image on the FPGA by using the predict method of the dlhdl.Workflow object and display the results.

[prediction,speed] = hW.predict(single(inputImg),'Profile','on');
### Finished writing input activations.
### Running single input activation.


              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                      62340                  0.00062                       1              63218           1581.8
    imageinput_norm           3527                  0.00004 
    conv_1                   10376                  0.00010 
    maxpool_1                 6774                  0.00007 
    conv_2                   11786                  0.00012 
    maxpool_2                 5450                  0.00005 
    conv_3                   17181                  0.00017 
    fc                        7214                  0.00007 
 * The clock frequency of the DL processor is: 100MHz
[val,idx] = max(prediction);
fprintf('The prediction result is %d\n', idx-1);
The prediction result is 5

Bibliography

  1. LeCun, Y., C. Cortes, and C. J. C. Burges. "The MNIST Database of Handwritten Digits." https://yann.lecun.com/exdb/mnist/.