Image Category Classification by Using Deep Learning

This example uses:

Deep Learning HDL Toolbox Deep Learning HDL Toolbox
Deep Learning Toolbox Deep Learning Toolbox
Deep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC Devices Deep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC Devices
Deep Learning Toolbox Model for ResNet-18 Network Deep Learning Toolbox Model for ResNet-18 Network

This example shows you how to create, compile, and deploy a dlhdl.Workflow object with ResNet-18 as the network object by using the Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC. Use MATLAB® to retrieve the prediction results from the target device. ResNet-18 is a pretrained convolutional neural network that has been trained on over a million images and can classify images into 1000 object categories (such as keyboard, coffee, mug, pencil, and many animals). You can also use VGG-19 and DarkNet-19 as the network objects.

Prerequisites

Xilinx ZCU102 SoC Development Kit
Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC
Deep Learning Toolbox™ Model for ResNet-18 Network
Deep Learning Toolbox™
Deep Learning HDL Toolbox™

Load the Pretrained Network

To load the pretrained Directed Acyclic Graph (DAG) network resnet18, enter:

[net,classNames] = imagePretrainedNetwork('resnet18');

To load the pretrained series network vgg19, enter:

% [net,classNames] = imagePretrainedNetwork('vgg19');

To load the pretrained series network darknet19, enter:

% [net,classNames] = imagePretrainedNetwork('darknet19');

The pretrained ResNet-18 network contains 71 layers including the input, convolution, batch normalization, ReLU, max pooling, addition, global average pooling, fully connected, and the softmax layers. To view the layers of the pretrained ResNet-18 network, enter:

analyzeNetwork(net)

Create Target Object

Use the dlhdl.Target class to create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG,Install Xilinx™ Vivado™ Design Suite 2022.1. To set the Xilinx Vivado toolpath, enter:

hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2022.1\bin\vivado.bat');

hTarget = dlhdl.Target('Xilinx', 'Interface', 'Ethernet');

Create WorkFlow Object

Use the dlhdl.Workflow class to create an object. When you create the object, specify the network and the bitstream name. Specify the saved pretrained ResNet-18 neural network as the network. Make sure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx ZCU102 SoC board. The bitstream uses a single data type.

hW = dlhdl.Workflow('Network', net, 'Bitstream', 'zcu102_single', 'Target', hTarget);

Compile the ResNet-18 DAG network

To compile the ResNet-18 DAG network, run the compile method of the dlhdl.Workflow object. You can optionally specify the maximum number of input frames. You can also optionally specify the input image normalization to happen in software.

compile(hW, 'InputFrameNumberLimit', 15, 'HardwareNormalization', 'off')

### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_single.
### An output layer called 'Output1_prob' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### The network includes the following layers:
     1   'data'                  Image Input                  224×224×3 images with 'zscore' normalization                          (SW Layer)
     2   'conv1'                 2-D Convolution              64 7×7×3 convolutions with stride [2  2] and padding [3  3  3  3]     (HW Layer)
     3   'conv1_relu'            ReLU                         ReLU                                                                  (HW Layer)
     4   'pool1'                 2-D Max Pooling              3×3 max pooling with stride [2  2] and padding [1  1  1  1]           (HW Layer)
     5   'res2a_branch2a'        2-D Convolution              64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
     6   'res2a_branch2a_relu'   ReLU                         ReLU                                                                  (HW Layer)
     7   'res2a_branch2b'        2-D Convolution              64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
     8   'res2a'                 Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
     9   'res2a_relu'            ReLU                         ReLU                                                                  (HW Layer)
    10   'res2b_branch2a'        2-D Convolution              64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
    11   'res2b_branch2a_relu'   ReLU                         ReLU                                                                  (HW Layer)
    12   'res2b_branch2b'        2-D Convolution              64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
    13   'res2b'                 Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    14   'res2b_relu'            ReLU                         ReLU                                                                  (HW Layer)
    15   'res3a_branch2a'        2-D Convolution              128 3×3×64 convolutions with stride [2  2] and padding [1  1  1  1]   (HW Layer)
    16   'res3a_branch2a_relu'   ReLU                         ReLU                                                                  (HW Layer)
    17   'res3a_branch2b'        2-D Convolution              128 3×3×128 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    18   'res3a_branch1'         2-D Convolution              128 1×1×64 convolutions with stride [2  2] and padding [0  0  0  0]   (HW Layer)
    19   'res3a'                 Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    20   'res3a_relu'            ReLU                         ReLU                                                                  (HW Layer)
    21   'res3b_branch2a'        2-D Convolution              128 3×3×128 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    22   'res3b_branch2a_relu'   ReLU                         ReLU                                                                  (HW Layer)
    23   'res3b_branch2b'        2-D Convolution              128 3×3×128 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    24   'res3b'                 Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    25   'res3b_relu'            ReLU                         ReLU                                                                  (HW Layer)
    26   'res4a_branch2a'        2-D Convolution              256 3×3×128 convolutions with stride [2  2] and padding [1  1  1  1]  (HW Layer)
    27   'res4a_branch2a_relu'   ReLU                         ReLU                                                                  (HW Layer)
    28   'res4a_branch2b'        2-D Convolution              256 3×3×256 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    29   'res4a_branch1'         2-D Convolution              256 1×1×128 convolutions with stride [2  2] and padding [0  0  0  0]  (HW Layer)
    30   'res4a'                 Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    31   'res4a_relu'            ReLU                         ReLU                                                                  (HW Layer)
    32   'res4b_branch2a'        2-D Convolution              256 3×3×256 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    33   'res4b_branch2a_relu'   ReLU                         ReLU                                                                  (HW Layer)
    34   'res4b_branch2b'        2-D Convolution              256 3×3×256 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    35   'res4b'                 Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    36   'res4b_relu'            ReLU                         ReLU                                                                  (HW Layer)
    37   'res5a_branch2a'        2-D Convolution              512 3×3×256 convolutions with stride [2  2] and padding [1  1  1  1]  (HW Layer)
    38   'res5a_branch2a_relu'   ReLU                         ReLU                                                                  (HW Layer)
    39   'res5a_branch2b'        2-D Convolution              512 3×3×512 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    40   'res5a_branch1'         2-D Convolution              512 1×1×256 convolutions with stride [2  2] and padding [0  0  0  0]  (HW Layer)
    41   'res5a'                 Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    42   'res5a_relu'            ReLU                         ReLU                                                                  (HW Layer)
    43   'res5b_branch2a'        2-D Convolution              512 3×3×512 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    44   'res5b_branch2a_relu'   ReLU                         ReLU                                                                  (HW Layer)
    45   'res5b_branch2b'        2-D Convolution              512 3×3×512 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    46   'res5b'                 Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    47   'res5b_relu'            ReLU                         ReLU                                                                  (HW Layer)
    48   'pool5'                 2-D Global Average Pooling   2-D global average pooling                                            (HW Layer)
    49   'fc1000'                Fully Connected              1000 fully connected layer                                            (HW Layer)
    50   'prob'                  Softmax                      softmax                                                               (SW Layer)
    51   'Output1_prob'          Regression Output            mean-squared-error                                                    (SW Layer)
                                                                                                                                  
### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'prob' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'Output1_prob' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
### Compiling layer group: conv1>>pool1 ...
### Compiling layer group: conv1>>pool1 ... complete.
### Compiling layer group: res2a_branch2a>>res2a_branch2b ...
### Compiling layer group: res2a_branch2a>>res2a_branch2b ... complete.
### Compiling layer group: res2b_branch2a>>res2b_branch2b ...
### Compiling layer group: res2b_branch2a>>res2b_branch2b ... complete.
### Compiling layer group: res3a_branch1 ...
### Compiling layer group: res3a_branch1 ... complete.
### Compiling layer group: res3a_branch2a>>res3a_branch2b ...
### Compiling layer group: res3a_branch2a>>res3a_branch2b ... complete.
### Compiling layer group: res3b_branch2a>>res3b_branch2b ...
### Compiling layer group: res3b_branch2a>>res3b_branch2b ... complete.
### Compiling layer group: res4a_branch1 ...
### Compiling layer group: res4a_branch1 ... complete.
### Compiling layer group: res4a_branch2a>>res4a_branch2b ...
### Compiling layer group: res4a_branch2a>>res4a_branch2b ... complete.
### Compiling layer group: res4b_branch2a>>res4b_branch2b ...
### Compiling layer group: res4b_branch2a>>res4b_branch2b ... complete.
### Compiling layer group: res5a_branch1 ...
### Compiling layer group: res5a_branch1 ... complete.
### Compiling layer group: res5a_branch2a>>res5a_branch2b ...
### Compiling layer group: res5a_branch2a>>res5a_branch2b ... complete.
### Compiling layer group: res5b_branch2a>>res5b_branch2b ...
### Compiling layer group: res5b_branch2a>>res5b_branch2b ... complete.
### Compiling layer group: pool5 ...
### Compiling layer group: pool5 ... complete.
### Compiling layer group: fc1000 ...
### Compiling layer group: fc1000 ... complete.

### Allocating external memory buffers:

          offset_name          offset_address    allocated_space 
    _______________________    ______________    ________________
    "InputDataOffset"           "0x00000000"     "11.5 MB"       
    "OutputResultOffset"        "0x00b7c000"     "60.0 kB"       
    "SchedulerDataOffset"       "0x00b8b000"     "2.4 MB"        
    "SystemBufferOffset"        "0x00df1000"     "6.1 MB"        
    "InstructionDataOffset"     "0x01416000"     "2.4 MB"        
    "ConvWeightDataOffset"      "0x0167a000"     "49.5 MB"       
    "FCWeightDataOffset"        "0x047f2000"     "2.0 MB"        
    "EndOffset"                 "0x049e7000"     "Total: 73.9 MB"
### Network compilation complete.

dn = struct with fields:
             weights: [1×1 struct]
        instructions: [1×1 struct]
           registers: [1×1 struct]
    syncInstructions: [1×1 struct]
        constantData: {}
             ddrInfo: [1×1 struct]
       resourceTable: [6×2 table]

Program Bitstream onto FPGA and Download Network Weights

To deploy the network on the Xilinx ZCU102 hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The deploy function starts programming the FPGA device, displays progress messages, and the time it takes to deploy the network.

deploy(hW)

### Programming FPGA Bitstream using Ethernet...
### Attempting to connect to the hardware board at 172.21.59.90...
### Connection successful
### Programming FPGA device on Xilinx SoC hardware board at 172.21.59.90...
### Attempting to connect to the hardware board at 172.21.59.90...
### Connection successful
### Copying FPGA programming files to SD card...
### Setting FPGA bitstream and devicetree for boot...
# Copying Bitstream zcu102_single.bit to /mnt/hdlcoder_rd
# Set Bitstream to hdlcoder_rd/zcu102_single.bit
# Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd
# Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb
# Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'
### Programming done. The system will now reboot for persistent changes to take effect.
### Rebooting Xilinx SoC at 172.21.59.90...
### Reboot may take several seconds...
### Attempting to connect to the hardware board at 172.21.59.90...
### Connection successful
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 15-Jul-2024 08:32:39
### Loading weights to FC Processor.
### 50% finished, current time is 15-Jul-2024 08:32:39.
### FC Weights loaded. Current time is 15-Jul-2024 08:32:40

Load Image for Prediction

Load the example image.

imgFile = 'book.jpg';
inputImg = imresize(imread(imgFile), [224,224]);
imshow(inputImg)

Run Prediction for One Image

Execute the predict method on the dlhdl.Workflow object and then show the label in the MATLAB command window.

inputDLArray = dlarray(single(inputImg),'SSCB');
[prediction, speed] = predict(hW,inputDLArray,'Profile','on');

### Finished writing input activations.
### Running single input activation.


              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   24702729                  0.11229                       1           24705200              8.9
    conv1                  2226736                  0.01012 
    pool1                   505414                  0.00230 
    res2a_branch2a          974493                  0.00443 
    res2a_branch2b          974112                  0.00443 
    res2a                   373972                  0.00170 
    res2b_branch2a          974239                  0.00443 
    res2b_branch2b          974266                  0.00443 
    res2b                   373862                  0.00170 
    res3a_branch1           539290                  0.00245 
    res3a_branch2a          542091                  0.00246 
    res3a_branch2b          909615                  0.00413 
    res3a                   187055                  0.00085 
    res3b_branch2a          909767                  0.00414 
    res3b_branch2b          910024                  0.00414 
    res3b                   186925                  0.00085 
    res4a_branch1           491263                  0.00223 
    res4a_branch2a          494696                  0.00225 
    res4a_branch2b          894219                  0.00406 
    res4a                    93654                  0.00043 
    res4b_branch2a          894119                  0.00406 
    res4b_branch2b          894076                  0.00406 
    res4b                    93554                  0.00043 
    res5a_branch1          1131916                  0.00515 
    res5a_branch2a         1134581                  0.00516 
    res5a_branch2b         2211568                  0.01005 
    res5a                    46662                  0.00021 
    res5b_branch2a         2211536                  0.01005 
    res5b_branch2b         2212127                  0.01006 
    res5b                    46902                  0.00021 
    pool5                    73829                  0.00034 
    fc1000                  215979                  0.00098 
 * The clock frequency of the DL processor is: 220MHz

[val, idx] = max(prediction);
classNames(idx)

ans = 
"binder"

Run Prediction for Multiple Images

Load multiple images and retrieve their prediction reults by using the mulltiple frame support feature. For more information, see Multiple Frame Support.

The demoOnImage function loads multiple images and retrieves their prediction results. The annotateresults function displays the image prediction result on top of the images which are assembled into a 3-by-5 array.

demoOnImage;