This example shows how to use the AXI4-Stream interface to enable high-speed data transfer between the processor and FPGA on Zynq hardware.
To run this example, you must have the following software and hardware installed and set up:
HDL Coder Support Package for Xilinx Zynq Platform
Embedded Coder Support Package for Xilinx Zynq Platform
Xilinx Vivado Design Suite, with a supported version as listed in the HDL Coder documentation
To set up the Zedboard, refer to the "Set up Zynq hardware and tools" section in the Getting Started with HW/SW Codesign Workflow for Xilinx Zynq Platform example.
This example shows how to:
Model a streaming algorithm using a simplified streaming protocol.
Generate an HDL IP core with AXI4-Stream interface.
Integrate the generated IP core into a Zedboard reference design with DMA controller.
Use the AXI4-Stream driver block to generate C code that runs on an ARM processor.
The picture above is a high-level architecture diagram that shows streaming data transfer between the processor and the FPGA fabric on the Zynq platform. Typically, the AXI4-Stream interface is used together with a DMA controller to transfer a large block of data from the processor to the FPGA. On the software side, the data is usually represented as a vector. The DMA controller reads the vector data from memory and "streams" it to the FPGA IP through the AXI4-Stream interface. Streaming sends one data element per sample, so the data path of the streaming algorithm in the FPGA IP uses a scalar data type.
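The following minimal Python sketch makes the vector-to-scalar relationship concrete. It is a behavioral model, not generated code and not a Xilinx or MathWorks API. The functions `dma_mm2s`, `scalar_ip`, and `dma_s2mm` are hypothetical names chosen for illustration. The `last` flag stands in for the TLAST boundary signal, and doubling each sample is a placeholder for the real scalar algorithm.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Beat = Tuple[int, bool]  # (data element, last flag marking the frame boundary)


def dma_mm2s(frame: Sequence[int]) -> Iterator[Beat]:
    """Hypothetical DMA read side: turn a software vector into one scalar beat per sample."""
    for index, element in enumerate(frame):
        yield element, index == len(frame) - 1


def scalar_ip(beats: Iterable[Beat]) -> Iterator[Beat]:
    """Placeholder FPGA IP: its data path only ever sees one scalar element at a time."""
    for data, last in beats:
        yield 2 * data, last


def dma_s2mm(beats: Iterable[Beat]) -> List[int]:
    """Hypothetical DMA write side: gather scalar beats back into a vector in memory."""
    frame: List[int] = []
    for data, last in beats:
        frame.append(data)
        if last:
            break
    return frame


if __name__ == "__main__":
    result = dma_s2mm(scalar_ip(dma_mm2s([1, 2, 3, 4])))
    print(result)  # [2, 4, 6, 8]
```

The software side only ever handles whole vectors, and the IP only ever handles single elements. The DMA controller performs the conversion in both directions.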
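The FPGA IP can also include an AXI4-Lite interface for control signals or parameter tuning. AXI4-Lite is a memory-mapped interface in which every transaction reads or writes a single register value. AXI4-Stream, by contrast, carries a continuous sequence of data elements with no address phase. As a result, the AXI4-Stream interface transfers data much faster, making it more suitable for the data path of an algorithm.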
Besides connecting to the processor, an FPGA IP with an AXI4-Stream interface can also connect to other IPs with AXI4-Stream interfaces to transfer data inside the FPGA.
Suppose we want to deploy a simple symmetric FIR filter on Zynq. We implement the filter on the FPGA, and the ARM processor generates the source data and streams it to the FPGA through the AXI4-Stream interface.
Let's start with the sfir_fixed model. Note that the data path of this model (from x_in to y_out) is processing scalar input data, which is suitable for a streaming interface.
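In a symmetric filter, mirrored taps share a coefficient. The hardware can therefore add each pair of delayed samples first and then multiply once. The Python sketch below shows this folded structure processing one scalar sample at a time. It assumes an even-length filter of 8 taps with 4 unique coefficients. The coefficient values are made up for illustration and are not read from sfir_fixed. The sketch checks the folded form against a direct-form FIR.

```python
from typing import List, Sequence


def direct_fir(x: Sequence[int], taps: Sequence[int]) -> List[int]:
    """Reference direct-form FIR: y[n] = sum_k taps[k] * x[n - k], zero initial state."""
    y = []
    for n in range(len(x)):
        y.append(sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0))
    return y


def symmetric_fir(x: Sequence[int], half: Sequence[int]) -> List[int]:
    """Folded FIR for an even-length symmetric filter of 2 * len(half) taps.

    Each unique coefficient multiplies the sum of its two mirrored delay-line
    entries, so only len(half) multiplications are needed per output sample.
    """
    n_taps = 2 * len(half)
    delay = [0] * n_taps  # delay[k] holds x[n - k]
    y = []
    for sample in x:  # one scalar input per sample, as in a streaming data path
        delay = [sample] + delay[:-1]
        y.append(sum(h * (delay[k] + delay[n_taps - 1 - k]) for k, h in enumerate(half)))
    return y


if __name__ == "__main__":
    half = [1, 2, 3, 4]          # hypothetical values for coefficient ports such as h_in1
    taps = half + half[::-1]     # full 8-tap symmetric impulse response
    x = [3, -1, 4, 1, -5, 9, 2, -6]
    print(symmetric_fir(x, half) == direct_fir(x, taps))  # True
```

Folding halves the number of multipliers, which is why symmetric filters are a common choice for FPGA implementation.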
To enable data transfer from the software to the filter algorithm, we need to map the data path ports to the AXI4-Stream interface. The AXI4-Stream interface contains a data signal (Data) and control signals such as data valid (Valid), back pressure (Ready), and data boundary (TLAST).
The AXI4-Stream IP core generation feature requires at least the Data and Valid signals to be modeled in the DUT. The Data signal is the primary payload to send across the interface. The Valid signal indicates when the Data signal is valid. Other control signals are optional.
Note: For IP core generation, Data and Valid follow a simplified streaming protocol. You don't need to model the full AXI4-Stream protocol, which is more complicated. HDL Coder automatically generates a streaming interface module in the HDL IP core to translate the simplified streaming protocol into the full AXI4-Stream protocol. As shown in the picture below, the protocol is simple: whenever the Data signal is valid, the Valid signal must also be asserted.
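The following Python sketch illustrates the Data/Valid rule by modeling each clock cycle as a (Data, Valid) pair. The transmitter inserts idle cycles in which Valid is low and Data carries an arbitrary value. The receiver keeps Data only on cycles where Valid is high. The function names and the idle-cycle pattern are assumptions for illustration, not part of HDL Coder.

```python
from typing import List, Sequence, Tuple

Cycle = Tuple[int, bool]  # (Data, Valid) sampled on one clock edge

DONT_CARE = -1  # arbitrary value driven on Data while Valid is low


def transmit(samples: Sequence[int], idle_every: int) -> List[Cycle]:
    """Drive each sample with Valid high; add one idle cycle after every idle_every samples."""
    cycles: List[Cycle] = []
    for count, sample in enumerate(samples, start=1):
        cycles.append((sample, True))
        if count % idle_every == 0:
            cycles.append((DONT_CARE, False))
    return cycles


def receive(cycles: Sequence[Cycle]) -> List[int]:
    """Accept Data only on cycles where Valid is asserted."""
    return [data for data, valid in cycles if valid]


if __name__ == "__main__":
    cycles = transmit([10, 20, 30, 40, 50], idle_every=2)
    print(cycles)
    print(receive(cycles))  # [10, 20, 30, 40, 50]
```

Idle cycles cost time but never corrupt the data stream, because the receiver ignores Data whenever Valid is low.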
To map the sfir_fixed algorithm to the simplified streaming protocol, you need to add a Valid signal. To add the Valid signal to your model, we recommend the following modeling pattern:
Convert the algorithm subsystem into an enabled subsystem.
Add an input control port, Valid_In, and an output control port for the output Valid signal.
Use Valid_In to drive both the algorithm subsystem's enable port and the output control port.
In this pattern, both the input streaming channel and output streaming channel follow the simplified streaming protocol.
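The modeling pattern above can be approximated in Python as follows. This is a simplified behavioral analogy, not a full model of Simulink enabled-subsystem semantics. The hypothetical `EnabledPairSum` algorithm outputs the sum of the current and previous valid samples. It updates its state only when `valid_in` is high, and its output holds its last value while disabled. `valid_in` is also passed straight through as the output Valid signal. Because this sketch has zero latency, the pass-through keeps Data and Valid aligned.

```python
from typing import List, Sequence, Tuple


class EnabledPairSum:
    """Hypothetical stand-in for an enabled algorithm subsystem.

    Output = current valid sample + previous valid sample. State only
    advances when the enable input (valid_in) is high; while disabled,
    the output holds its last value.
    """

    def __init__(self) -> None:
        self.previous = 0
        self.output = 0

    def step(self, data_in: int, valid_in: bool) -> Tuple[int, bool]:
        if valid_in:  # enable port driven by valid_in
            self.output = data_in + self.previous
            self.previous = data_in
        valid_out = valid_in  # valid_in also drives the output Valid signal
        return self.output, valid_out


def run(stream: Sequence[Tuple[int, bool]]) -> List[int]:
    """Feed (data, valid) cycles through the subsystem; return only the valid outputs."""
    dut = EnabledPairSum()
    outputs = [dut.step(data, valid) for data, valid in stream]
    return [data for data, valid in outputs if valid]


if __name__ == "__main__":
    with_gaps = [(1, True), (99, False), (2, True), (99, False), (3, True)]
    no_gaps = [(1, True), (2, True), (3, True)]
    print(run(with_gaps), run(no_gaps))  # [1, 3, 5] [1, 3, 5]
```

Invalid cycles never touch the algorithm state, so the valid outputs are identical with and without gaps in the input stream.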
Now, let's look at the example model.
The subsystem DUT is the hardware subsystem targeting the FPGA fabric. Inside this subsystem, the symmetric_fir subsystem represents the filter algorithm. The input ports, x_in_data and x_in_valid, and output ports, y_out_data and y_out_valid, are the data path ports of the filter. The other input ports, such as h_in1, are control ports that tune the filter parameters.
The model follows the modeling pattern for simplified streaming protocol. The symmetric_fir subsystem is an enabled subsystem. The input control signal, x_in_valid, controls the symmetric_fir subsystem's enable port and also drives the output control signal, y_out_valid.
With AXI4-Stream IP core generation, you can optionally model other streaming control signals. For example, you can model the back pressure signal, Ready. The AXI4-Stream interface communicates in master/slave mode, where the master device sends data to the slave device. The Ready signal is a back pressure signal from the slave device to the master device that indicates whether the slave device can accept new data. As shown in the following diagram, the slave device asserts the Ready signal when it can accept new data and deasserts it when it can no longer accept new data. When the master device sees that the Ready signal is deasserted, it stops the data transfer at most one sample later. This one-sample allowance is built into the protocol, so the slave device must still be able to accept one more sample after it deasserts Ready.
Note: This diagram illustrates the relationship between the Data, Valid, and Ready signals according to the simplified streaming protocol. When you run the IP Core Generation workflow, the code generator adds a streaming interface module in the HDL IP core that translates the simplified protocol to the full streaming protocol.
For example, you can use the Ready signal when you use a FIFO block to collect a frame of incoming streaming data before your algorithm processes it. While the frame is being processed, you deassert the Ready signal to stop further incoming data.
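The one-sample allowance in the Ready protocol has a practical consequence for a FIFO-based slave: it must deassert Ready while it still has room for one more sample. The Python sketch below is a cycle-level behavioral model built on stated assumptions, not HDL Coder output. It assumes the master reacts to Ready exactly one sample late, which is the worst case the protocol allows. The slave buffers data in a FIFO of depth `fifo_depth`, and a hypothetical downstream consumer drains one entry every `drain_period` cycles. The function name and parameters are made up for illustration.

```python
from collections import deque
from typing import Deque, List, Tuple


def simulate_backpressure(num_samples: int, fifo_depth: int, reserve: int,
                          drain_period: int, max_cycles: int = 10_000) -> Tuple[List[int], bool]:
    """Return (samples received downstream, whether the slave FIFO overflowed).

    A run that hits max_cycles before finishing returns the partial list.
    """
    fifo: Deque[int] = deque()
    received: List[int] = []
    sent = 0
    prev_ready = True  # the master acts on the Ready value from the previous sample
    for cycle in range(max_cycles):
        if len(received) == num_samples:
            break
        # Slave: keep Ready high only while more than `reserve` slots are free.
        ready = len(fifo) < fifo_depth - reserve
        # Master: sends if it still sees Ready high from one sample ago.
        if sent < num_samples and prev_ready:
            if len(fifo) == fifo_depth:
                return received, True  # the late sample has nowhere to go
            fifo.append(sent)
            sent += 1
        # Downstream consumer drains one entry every drain_period cycles.
        if cycle % drain_period == 0 and fifo:
            received.append(fifo.popleft())
        prev_ready = ready
    return received, False


if __name__ == "__main__":
    print(simulate_backpressure(20, fifo_depth=4, reserve=1, drain_period=3))
    print(simulate_backpressure(20, fifo_depth=4, reserve=0, drain_period=3)[1])  # True
```

With `reserve=1`, the slave deasserts Ready while one slot is still free, so the late sample always fits. With `reserve=0`, it waits until the FIFO is full, and the sample the master sends before noticing the deasserted Ready overflows the FIFO.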
Next, we start the HDL Workflow Advisor and use the Zynq hardware-software co-design workflow to deploy this design on the Zynq hardware. For a more detailed step-by-step guide, you can refer to the Getting Started with HW/SW Codesign Workflow for Xilinx Zynq Platform example.
1. Set up the Xilinx Vivado synthesis tool path using the following command in the MATLAB command window. Use your own Vivado installation path when you run the command.
hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2017.4\bin\vivado.bat')
2. Start the HDL Workflow Advisor from the DUT subsystem, hdlcoder_sfir_fixed_stream/DUT. The target interface settings are already saved in this example model, so the settings in Tasks 1.1 and 1.2 are loaded automatically. To learn more about saving target interface settings in the model, refer to the Save Target Hardware Settings in Model example.
In Task 1.1, IP Core Generation is selected for Target workflow, and Zedboard is selected for Target platform. In Task 1.2, Default system with AXI4-Stream interface is selected for Reference Design, and the Target platform interface table is loaded as shown in the following picture. The data path ports, x_in_data, x_in_valid, y_out_data, and y_out_valid, are mapped to the AXI4-Stream interfaces, and the control parameter ports, such as h_in1, are mapped to the AXI4-Lite interface.
The AXI4-Stream interface communicates in master/slave mode, where the master device sends data to the slave device. Therefore, assign an input data port to an AXI4-Stream Slave interface, and assign an output data port to an AXI4-Stream Master interface.
3. Right-click Task 3.2, Generate RTL Code and IP Core, and select Run to Selected Task to generate the IP core. You can find the register address mapping and other documentation for the IP core in the generated IP Core Report.
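The port-direction rule in step 2 can be expressed as a small check. The Python sketch below encodes the port names from this example in a hypothetical dictionary. This is not the format HDL Coder uses to store the interface table. The sketch derives the interface for each port: streaming inputs go to AXI4-Stream Slave, streaming outputs go to AXI4-Stream Master, and control ports such as h_in1 go to AXI4-Lite.

```python
from typing import Dict, Tuple

PortInfo = Tuple[str, bool]  # (direction, belongs to the streaming data path)

# Port names are taken from this example; the dictionary itself is an illustrative assumption.
DUT_PORTS: Dict[str, PortInfo] = {
    "x_in_data": ("in", True),
    "x_in_valid": ("in", True),
    "y_out_data": ("out", True),
    "y_out_valid": ("out", True),
    "h_in1": ("in", False),  # filter coefficient, tuned from software
}


def target_interface(direction: str, streaming: bool) -> str:
    """The DMA drives (masters) the stream into the IP; the IP masters the stream back out."""
    if not streaming:
        return "AXI4-Lite"
    return "AXI4-Stream Slave" if direction == "in" else "AXI4-Stream Master"


def interface_table(ports: Dict[str, PortInfo]) -> Dict[str, str]:
    """Map each DUT port name to its target platform interface."""
    return {name: target_interface(direction, streaming)
            for name, (direction, streaming) in ports.items()}


if __name__ == "__main__":
    for port, interface in interface_table(DUT_PORTS).items():
        print(f"{port:12} -> {interface}")
```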
Next, in the HDL Workflow Advisor, we run the Embedded System Integration tasks to deploy the generated HDL IP core on Zynq hardware.
1. Run Task 4.1, Create Project. This task inserts the generated IP core into the Default system with AXI4-Stream interface reference design. This reference design contains the Xilinx AXI DMA IP, which handles data streaming between the processor and the FPGA fabric. As shown in the first diagram, or in the IP core report, the data is sent from the ARM processing system, through the DMA controller and the AXI4-Stream interface, to the generated HDL FIR filter IP core. The output of the filter IP core is then sent back to the processing system.
2. Optionally click the link in the Result pane to open the generated Vivado project. In the Vivado tool, click Open Block Design to view the Zynq design diagram, which includes the generated HDL IP core, AXI DMA controller and the processor.
3. In the HDL Workflow Advisor, run the rest of the tasks to generate the software interface model, and build and download the FPGA bitstream.
A software interface model is generated in Task 4.2, Generate Software Interface Model, as shown in the following picture.
Although the AXI4-Lite driver is automatically generated in the software interface model, the AXI4-Stream driver block is not. The AXI4-Stream driver block expects to be connected to a vector port on the software side, but the x_in_data DUT port is a scalar port.
1. Before you generate code from the software interface model:
Add the AXI4-Stream IIO Read and AXI4-Stream IIO Write driver blocks from the Simulink Library Browser -> Embedded Coder Support Package for Xilinx Zynq Platform library.
Use a vector data source to drive the x_in_data port.
Connect the x_in_data port to the driver block.
Double-click the AXI4-Stream IIO Write block and set Timeout to 0 instead of inf, as shown below.
Set the priority of the AXI4-Stream IIO Write block to 1 to make sure that the write happens before the read. To set the priority, right-click the block, open its properties, and set the priority to 1, as shown below.
Double-click the AXI4-Stream IIO Read block and set the frame size to 100, Sample time to Ts, and Timeout to 10, as shown below.
You do not need to set the priority of the AXI4-Stream IIO Read block. Setting the priority of the write block to 1 is enough to ensure that the write happens before the read.
For this example, the updated software interface model is provided: hdlcoder_sfir_fixed_stream_sw.slx. This model uses a vector data source with 100 data elements, connected to the AXI4-Stream DMA driver block. In each processor sample time, the DMA controller therefore streams 100 32-bit data samples to the HDL IP core through the AXI4-Stream interface and receives 100 32-bit streaming data samples back.
2. Configure and build the software interface model for external mode:
In the generated model, open the Configuration Parameters dialog box.
Select Solver and set "Stop Time" to "inf".
From the model menu, select Simulation > Mode > External.
Click the Run button on the model toolstrip. Embedded Coder builds the model, downloads the ARM executable to the Zedboard hardware, executes it, and connects the model to the executable running on the Zedboard hardware.
3. Now, both the hardware and software parts of the design are running on Zynq hardware. The ARM processor sends the source data to the FPGA IP, through the DMA controller and the AXI4-Stream interface. The ARM processor receives the filter result data from the FPGA IP, and sends the result data to Simulink via external mode. Observe the output of the FIR filter IP core from the Zynq hardware on the Time Scope y_out.
4. Tune the FIR filter parameters in the software interface model and observe how the output of the FIR filter changes as you tune the parameters. The parameter values are sent to the Zynq hardware via external mode and the AXI4-Lite interface.
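In the updated software interface model, each sample time moves a 100-element vector of 32-bit samples in each direction. The payload per sample time follows directly from these two numbers. The Python sketch below does the arithmetic. The frame size and word width come from this example. The sample time value passed in is a hypothetical placeholder, because the actual value of Ts depends on how the model is configured.

```python
FRAME_SIZE = 100       # data elements per DMA transfer, from this example
BITS_PER_ELEMENT = 32  # width of each streamed data sample


def bytes_per_frame(frame_size: int = FRAME_SIZE, bits: int = BITS_PER_ELEMENT) -> int:
    """Payload bytes moved in one direction (processor to FPGA, or back) per sample time."""
    return frame_size * bits // 8


def stream_rate_bytes_per_s(ts_seconds: float) -> float:
    """Average one-direction payload rate for a processor sample time of ts_seconds."""
    return bytes_per_frame() / ts_seconds


if __name__ == "__main__":
    print(bytes_per_frame())             # 400
    print(stream_rate_bytes_per_s(0.5))  # 800.0, for a hypothetical Ts of 0.5 s
```

Each sample time moves 400 bytes of payload to the IP core and 400 bytes back. The average rate scales inversely with Ts, and this calculation ignores any DMA or driver overhead.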