Image Category Classification by Using Deep Learning
This example shows you how to create, compile, and deploy a dlhdl.Workflow
object with ResNet-18 as the network object by using the Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC. Use MATLAB® to retrieve the prediction results from the target device. ResNet-18 is a pretrained convolutional neural network that has been trained on over a million images and can classify images into 1000 object categories (such as keyboard, coffee, mug, pencil, and many animals). You can also use VGG-19 and DarkNet-19 as the network objects.
Prerequisites
Xilinx ZCU102 SoC Development Kit
Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC
Deep Learning Toolbox™ Model for ResNet-18 Network
Deep Learning Toolbox™
Deep Learning HDL Toolbox™
Load the Pretrained Network
To load the pretrained Directed Acyclic Graph (DAG) network resnet18, enter:
[net,classNames] = imagePretrainedNetwork('resnet18');
To load the pretrained series network vgg19, enter:
% [net,classNames] = imagePretrainedNetwork('vgg19');
To load the pretrained series network darknet19, enter:
% [net,classNames] = imagePretrainedNetwork('darknet19');
The pretrained ResNet-18 network contains 71 layers including the input, convolution, batch normalization, ReLU, max pooling, addition, global average pooling, fully connected, and the softmax layers. To view the layers of the pretrained ResNet-18 network, enter:
analyzeNetwork(net)
Create Target Object
Use the dlhdl.Target
class to create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG,Install Xilinx™ Vivado™ Design Suite 2022.1. To set the Xilinx Vivado toolpath, enter:
hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2022.1\bin\vivado.bat');
hTarget = dlhdl.Target('Xilinx', 'Interface', 'Ethernet');
Create WorkFlow Object
Use the dlhdl.Workflow
class to create an object. When you create the object, specify the network and the bitstream name. Specify the saved pretrained ResNet-18 neural network as the network. Make sure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx ZCU102 SoC board. The bitstream uses a single data type.
hW = dlhdl.Workflow('Network', net, 'Bitstream', 'zcu102_single', 'Target', hTarget);
Compile the ResNet-18 DAG network
To compile the ResNet-18 DAG network, run the compile method of the dlhdl.Workflow
object. You can optionally specify the maximum number of input frames. You can also optionally specify the input image normalization to happen in software.
compile(hW, 'InputFrameNumberLimit', 15, 'HardwareNormalization', 'off')
### Compiling network for Deep Learning FPGA prototyping ... ### Targeting FPGA bitstream zcu102_single. ### An output layer called 'Output1_prob' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network. ### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer' ### The network includes the following layers: 1 'data' Image Input 224×224×3 images with 'zscore' normalization (SW Layer) 2 'conv1' 2-D Convolution 64 7×7×3 convolutions with stride [2 2] and padding [3 3 3 3] (HW Layer) 3 'conv1_relu' ReLU ReLU (HW Layer) 4 'pool1' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [1 1 1 1] (HW Layer) 5 'res2a_branch2a' 2-D Convolution 64 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 6 'res2a_branch2a_relu' ReLU ReLU (HW Layer) 7 'res2a_branch2b' 2-D Convolution 64 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 8 'res2a' Addition Element-wise addition of 2 inputs (HW Layer) 9 'res2a_relu' ReLU ReLU (HW Layer) 10 'res2b_branch2a' 2-D Convolution 64 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 11 'res2b_branch2a_relu' ReLU ReLU (HW Layer) 12 'res2b_branch2b' 2-D Convolution 64 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 13 'res2b' Addition Element-wise addition of 2 inputs (HW Layer) 14 'res2b_relu' ReLU ReLU (HW Layer) 15 'res3a_branch2a' 2-D Convolution 128 3×3×64 convolutions with stride [2 2] and padding [1 1 1 1] (HW Layer) 16 'res3a_branch2a_relu' ReLU ReLU (HW Layer) 17 'res3a_branch2b' 2-D Convolution 128 3×3×128 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 18 'res3a_branch1' 2-D Convolution 128 1×1×64 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer) 19 'res3a' Addition Element-wise addition of 2 inputs (HW Layer) 20 'res3a_relu' ReLU ReLU (HW Layer) 21 'res3b_branch2a' 2-D Convolution 128 3×3×128 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 22 'res3b_branch2a_relu' ReLU ReLU (HW Layer) 23 'res3b_branch2b' 2-D Convolution 128 3×3×128 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 24 'res3b' Addition Element-wise addition of 2 inputs (HW Layer) 25 'res3b_relu' ReLU ReLU (HW Layer) 26 'res4a_branch2a' 2-D Convolution 256 3×3×128 convolutions with stride [2 2] and padding [1 1 1 1] (HW Layer) 27 'res4a_branch2a_relu' ReLU ReLU (HW Layer) 28 'res4a_branch2b' 2-D Convolution 256 3×3×256 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 29 'res4a_branch1' 2-D Convolution 256 1×1×128 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer) 30 'res4a' Addition Element-wise addition of 2 inputs (HW Layer) 31 'res4a_relu' ReLU ReLU (HW Layer) 32 'res4b_branch2a' 2-D Convolution 256 3×3×256 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 33 'res4b_branch2a_relu' ReLU ReLU (HW Layer) 34 'res4b_branch2b' 2-D Convolution 256 3×3×256 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 35 'res4b' Addition Element-wise addition of 2 inputs (HW Layer) 36 'res4b_relu' ReLU ReLU (HW Layer) 37 'res5a_branch2a' 2-D Convolution 512 3×3×256 convolutions with stride [2 2] and padding [1 1 1 1] (HW Layer) 38 'res5a_branch2a_relu' ReLU ReLU (HW Layer) 39 'res5a_branch2b' 2-D Convolution 512 3×3×512 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 40 'res5a_branch1' 2-D Convolution 512 1×1×256 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer) 41 'res5a' Addition Element-wise addition of 2 inputs (HW Layer) 42 'res5a_relu' ReLU ReLU (HW Layer) 43 'res5b_branch2a' 2-D Convolution 512 3×3×512 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 44 'res5b_branch2a_relu' ReLU ReLU (HW Layer) 45 'res5b_branch2b' 2-D Convolution 512 3×3×512 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 46 'res5b' Addition Element-wise addition of 2 inputs (HW Layer) 47 'res5b_relu' ReLU ReLU (HW Layer) 48 'pool5' 2-D Global Average Pooling 2-D global average pooling (HW Layer) 49 'fc1000' Fully Connected 1000 fully connected layer (HW Layer) 50 'prob' Softmax softmax (SW Layer) 51 'Output1_prob' Regression Output mean-squared-error (SW Layer) ### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software. ### Notice: The layer 'prob' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'Output1_prob' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software. ### Compiling layer group: conv1>>pool1 ... ### Compiling layer group: conv1>>pool1 ... complete. ### Compiling layer group: res2a_branch2a>>res2a_branch2b ... ### Compiling layer group: res2a_branch2a>>res2a_branch2b ... complete. ### Compiling layer group: res2b_branch2a>>res2b_branch2b ... ### Compiling layer group: res2b_branch2a>>res2b_branch2b ... complete. ### Compiling layer group: res3a_branch1 ... ### Compiling layer group: res3a_branch1 ... complete. ### Compiling layer group: res3a_branch2a>>res3a_branch2b ... ### Compiling layer group: res3a_branch2a>>res3a_branch2b ... complete. ### Compiling layer group: res3b_branch2a>>res3b_branch2b ... ### Compiling layer group: res3b_branch2a>>res3b_branch2b ... complete. ### Compiling layer group: res4a_branch1 ... ### Compiling layer group: res4a_branch1 ... complete. ### Compiling layer group: res4a_branch2a>>res4a_branch2b ... ### Compiling layer group: res4a_branch2a>>res4a_branch2b ... complete. ### Compiling layer group: res4b_branch2a>>res4b_branch2b ... ### Compiling layer group: res4b_branch2a>>res4b_branch2b ... complete. ### Compiling layer group: res5a_branch1 ... ### Compiling layer group: res5a_branch1 ... complete. ### Compiling layer group: res5a_branch2a>>res5a_branch2b ... ### Compiling layer group: res5a_branch2a>>res5a_branch2b ... complete. ### Compiling layer group: res5b_branch2a>>res5b_branch2b ... ### Compiling layer group: res5b_branch2a>>res5b_branch2b ... complete. ### Compiling layer group: pool5 ... ### Compiling layer group: pool5 ... complete. ### Compiling layer group: fc1000 ... ### Compiling layer group: fc1000 ... complete. ### Allocating external memory buffers: offset_name offset_address allocated_space _______________________ ______________ ________________ "InputDataOffset" "0x00000000" "11.5 MB" "OutputResultOffset" "0x00b7c000" "60.0 kB" "SchedulerDataOffset" "0x00b8b000" "2.4 MB" "SystemBufferOffset" "0x00df1000" "6.1 MB" "InstructionDataOffset" "0x01416000" "2.4 MB" "ConvWeightDataOffset" "0x0167a000" "49.5 MB" "FCWeightDataOffset" "0x047f2000" "2.0 MB" "EndOffset" "0x049e7000" "Total: 73.9 MB" ### Network compilation complete.
dn = struct with fields:
weights: [1×1 struct]
instructions: [1×1 struct]
registers: [1×1 struct]
syncInstructions: [1×1 struct]
constantData: {}
ddrInfo: [1×1 struct]
resourceTable: [6×2 table]
Program Bitstream onto FPGA and Download Network Weights
To deploy the network on the Xilinx ZCU102 hardware, run the deploy function of the dlhdl.Workflow
object. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The deploy function starts programming the FPGA device, displays progress messages, and the time it takes to deploy the network.
deploy(hW)
### Programming FPGA Bitstream using Ethernet... ### Attempting to connect to the hardware board at 172.21.59.90... ### Connection successful ### Programming FPGA device on Xilinx SoC hardware board at 172.21.59.90... ### Attempting to connect to the hardware board at 172.21.59.90... ### Connection successful ### Copying FPGA programming files to SD card... ### Setting FPGA bitstream and devicetree for boot... # Copying Bitstream zcu102_single.bit to /mnt/hdlcoder_rd # Set Bitstream to hdlcoder_rd/zcu102_single.bit # Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd # Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb # Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM' ### Programming done. The system will now reboot for persistent changes to take effect. ### Rebooting Xilinx SoC at 172.21.59.90... ### Reboot may take several seconds... ### Attempting to connect to the hardware board at 172.21.59.90... ### Connection successful ### Programming the FPGA bitstream has been completed successfully. ### Loading weights to Conv Processor. ### Conv Weights loaded. Current time is 15-Jul-2024 08:32:39 ### Loading weights to FC Processor. ### 50% finished, current time is 15-Jul-2024 08:32:39. ### FC Weights loaded. Current time is 15-Jul-2024 08:32:40
Load Image for Prediction
Load the example image.
imgFile = 'book.jpg';
inputImg = imresize(imread(imgFile), [224,224]);
imshow(inputImg)
Run Prediction for One Image
Execute the predict method on the dlhdl.Workflow
object and then show the label in the MATLAB command window.
inputDLArray = dlarray(single(inputImg),'SSCB'); [prediction, speed] = predict(hW,inputDLArray,'Profile','on');
### Finished writing input activations. ### Running single input activation. Deep Learning Processor Profiler Performance Results LastFrameLatency(cycles) LastFrameLatency(seconds) FramesNum Total Latency Frames/s ------------- ------------- --------- --------- --------- Network 24702729 0.11229 1 24705200 8.9 conv1 2226736 0.01012 pool1 505414 0.00230 res2a_branch2a 974493 0.00443 res2a_branch2b 974112 0.00443 res2a 373972 0.00170 res2b_branch2a 974239 0.00443 res2b_branch2b 974266 0.00443 res2b 373862 0.00170 res3a_branch1 539290 0.00245 res3a_branch2a 542091 0.00246 res3a_branch2b 909615 0.00413 res3a 187055 0.00085 res3b_branch2a 909767 0.00414 res3b_branch2b 910024 0.00414 res3b 186925 0.00085 res4a_branch1 491263 0.00223 res4a_branch2a 494696 0.00225 res4a_branch2b 894219 0.00406 res4a 93654 0.00043 res4b_branch2a 894119 0.00406 res4b_branch2b 894076 0.00406 res4b 93554 0.00043 res5a_branch1 1131916 0.00515 res5a_branch2a 1134581 0.00516 res5a_branch2b 2211568 0.01005 res5a 46662 0.00021 res5b_branch2a 2211536 0.01005 res5b_branch2b 2212127 0.01006 res5b 46902 0.00021 pool5 73829 0.00034 fc1000 215979 0.00098 * The clock frequency of the DL processor is: 220MHz
[val, idx] = max(prediction); classNames(idx)
ans = "binder"
Run Prediction for Multiple Images
Load multiple images and retrieve their prediction reults by using the mulltiple frame support feature. For more information, see Multiple Frame Support.
The demoOnImage
function loads multiple images and retrieves their prediction results. The annotateresults
function displays the image prediction result on top of the images which are assembled into a 3-by-5 array.
demoOnImage;
See Also
dlhdl.Workflow
| dlhdl.Target
| compile
| deploy
| predict
| Deep Network Designer