Description

Using Lookup Tables to Accelerate Deep Learning Inference

This video highlights the lookup table optimization capability by generating an efficient lookup table for a sigmoid function, a key activation function used in deep learning networks. We then compare the relative speedup on an Arduino Due® and an STMicroelectronics® discovery board, using the generated code in a hardware-in-the-loop simulation.

Published: 19 Nov 2019

Full Transcript

A lookup table (LUT) is a key construct for embedded designs, and is often used to speed up the run-time execution of certain functions in your algorithm. For instance, complex trig functions are often replaced with a more efficient LUT implementation.

Let’s try a simple experiment: applying the same principle to the sigmoid function to investigate how we can accelerate deep learning inference performance, particularly on the edge.

The sigmoid function is a key building block for neural networks and is one of the commonly used nonlinear activation functions in deep learning networks.
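As a rough illustration of the idea (this is not the code shown in the video), the sigmoid 1 / (1 + e^-x) can be approximated by an evenly spaced table with linear interpolation between entries. The input range [-8, 8], the table size of 33 points, and all names below are assumptions chosen for this sketch.

```python
import math

def sigmoid(x):
    """Reference sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Assumed for illustration: 33 evenly spaced breakpoints on [-8, 8].
X_MIN, X_MAX, N_POINTS = -8.0, 8.0, 33
STEP = (X_MAX - X_MIN) / (N_POINTS - 1)
TABLE = [sigmoid(X_MIN + i * STEP) for i in range(N_POINTS)]

def sigmoid_lut(x):
    """Approximate the sigmoid by interpolating between table entries."""
    # Clip to the table range; the sigmoid is nearly flat outside it.
    x = min(max(x, X_MIN), X_MAX)
    pos = (x - X_MIN) / STEP
    i = min(int(pos), N_POINTS - 2)   # index of the left breakpoint
    frac = pos - i                    # fractional distance to the right one
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])
```

At run time the table version needs only a clip, one multiply-add, and two table reads, instead of an exponential and a division.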

Here we have a simple Simulink subsystem that models the sigmoid function. I am going to use the Lookup Table Optimizer app to generate an optimal LUT, specifying the input and output data types. Since this is a bounded function, I can specify the bounds on the output and, finally, a tolerance of 1% on the output.

Once the optimization problem is solved, we can look at the comparison plot to verify that the error of the LUT approximation is within our specified tolerance.
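The video does not describe how the Lookup Table Optimizer solves this problem internally, so the sketch below is only a simplified stand-in: it grows an evenly spaced table one breakpoint at a time until the worst-case error on a dense grid of test inputs is within tolerance. Treating the 1% tolerance as an absolute error of 0.01 on the [0, 1] output is an assumption, as are the input ranges and function names.

```python
import math

def sigmoid(x):
    """Reference sigmoid used as ground truth."""
    return 1.0 / (1.0 + math.exp(-x))

def build_lut(n_points, x_min=-8.0, x_max=8.0):
    """Return an interpolating LUT function with n_points even breakpoints."""
    step = (x_max - x_min) / (n_points - 1)
    table = [sigmoid(x_min + i * step) for i in range(n_points)]

    def lut(x):
        x = min(max(x, x_min), x_max)          # clip to the table range
        pos = (x - x_min) / step
        i = min(int(pos), n_points - 2)
        return table[i] + (pos - i) * (table[i + 1] - table[i])

    return lut

def max_abs_error(lut, x_min=-10.0, x_max=10.0, samples=4001):
    """Worst-case |LUT - sigmoid| over a dense grid of test inputs."""
    worst = 0.0
    for k in range(samples):
        x = x_min + k * (x_max - x_min) / (samples - 1)
        worst = max(worst, abs(lut(x) - sigmoid(x)))
    return worst

def smallest_table(tolerance=0.01):
    """Brute-force search for the smallest table that meets the tolerance."""
    n = 2
    while max_abs_error(build_lut(n)) > tolerance:
        n += 1
    return n
```

Calling `max_abs_error(build_lut(smallest_table()))` gives the same kind of check as the comparison plot: a single number to compare against the specified tolerance.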

Now, as a next step, let's generate C code from the sigmoid function and the generated LUT and deploy it to a Cortex-M platform like the Arduino board.

We use hardware-in-the-loop simulation to run the generated code with inputs from Simulink. There is some overhead in running the code in this mode, but this still gives us a good comparison of the relative execution speed.

As you can see from the execution profile, the LUT is 2.5x faster on the Arduino. I repeated the same test on a Cortex-M7-based STMicro discovery board. Here is a plot showing the relative speedup of the lookup table with different data types.

In fact, this can scale up if you can share the lookup table approximation between all neurons, further decreasing the execution time by orders of magnitude. You can do the same experiment with other activation functions, such as the hyperbolic tangent.
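One way to picture sharing the approximation between neurons is a fully connected layer in which every neuron calls the same table-based activation, so the table is stored once no matter how many neurons use it. This is a minimal sketch under assumed values (table range [-8, 8], 33 points, hypothetical names), not the network used in the video.

```python
import math

# One table, built once and shared by every neuron (assumed range and size).
X_MIN, X_MAX, N_POINTS = -8.0, 8.0, 33
STEP = (X_MAX - X_MIN) / (N_POINTS - 1)
SHARED_TABLE = [1.0 / (1.0 + math.exp(-(X_MIN + i * STEP)))
                for i in range(N_POINTS)]

def sigmoid_lut(x):
    """Interpolated sigmoid lookup using the shared table."""
    x = min(max(x, X_MIN), X_MAX)
    pos = (x - X_MIN) / STEP
    i = min(int(pos), N_POINTS - 2)
    return SHARED_TABLE[i] + (pos - i) * (SHARED_TABLE[i + 1] - SHARED_TABLE[i])

def dense_layer(inputs, weights, biases):
    """Fully connected layer; each neuron reuses the same lookup table."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * v for w, v in zip(row, inputs)) + bias  # pre-activation
        outputs.append(sigmoid_lut(z))
    return outputs

# Usage: two inputs feeding two neurons.
activations = dense_layer([1.0, -2.0], [[0.5, 0.25], [1.0, 1.0]], [0.0, 3.0])
```

Swapping in another activation, such as the hyperbolic tangent, would only change how `SHARED_TABLE` is filled.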

To learn more about optimizing LUTs in your design, please refer to the additional links below the video.

Related Resources

Learn More

What Is Quantization?

Calculate Complex dB Using a Direct Lookup Table

Reducing Memory Footprint of Lookup Tables in Your Design

Convert Digit Recognition Neural Network to Fixed-Point and Generate C Code

Featured Product

Fixed-Point Designer
