Generate HDL for a Deep Learning Processor

Jack Erickson, MathWorks

Implementing deep learning inference efficiently in edge applications requires collaboration between the design of the deep learning network and the deep learning processor.

Deep Learning HDL Toolbox™ enables FPGA prototyping of deep learning networks from within MATLAB®. To increase performance or target custom hardware, you can explore trade-offs in MATLAB to converge on a custom FPGA implementation of the deep learning processor. Then a single MATLAB function drives HDL Coder™ to generate an IP core with target-independent synthesizable RTL and AXI interfaces. It can also optionally run the FPGA implementation flow to create a bitstream that programs the deep learning processor onto the device.

Deep Learning HDL Toolbox delivers FPGA prototyping of deep learning inference from within MATLAB, so you can quickly iterate and converge on a network that delivers the performance your system requires while meeting your FPGA constraints.

But what if you want to customize the FPGA implementation, to improve performance or to target a custom board? For this, you can use MATLAB to configure the processor and to drive HDL Coder to generate an IP core with RTL and AXI interfaces.

This is all based on a deep learning processor architecture with generic convolution and fully connected modules, so you can program it with your custom network along with the logic that controls which layer is being run and its activation inputs and outputs. Since each layer’s parameters need to be stored in external DDR memory, the processor also includes high-bandwidth memory access.

You can customize this deep learning processor for your system requirements, which coupled with the ability to customize the deep learning network, delivers a lot of options to optimize FPGA implementation for your application.

To illustrate, let’s look at an application that uses a series network trained to classify logos. Let’s say we need to process 15 frames per second.

So we just load the trained network.

And we will set up a custom processor configuration with all default settings, running at 220 MHz. Note the data types and the number of parallel threads for the convolution module and fully connected module. By default, this is set up to target a ZCU102 board, which is what we are using.
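A minimal sketch of these first two steps in MATLAB, assuming the dlhdl.ProcessorConfig object and its TargetFrequency property (in MHz) from Deep Learning HDL Toolbox; the getLogoNetwork helper is a hypothetical stand-in for however you load your own trained network:

    % Load the trained logo-classification series network
    snet = getLogoNetwork();          % hypothetical helper; substitute your network

    % Custom processor configuration: all default settings at 220 MHz
    hPC = dlhdl.ProcessorConfig;
    hPC.TargetFrequency = 220;        % MHz
    hPC                               % display module data types and thread counts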

Then we apply the processor configuration to a workflow object for the trained network.

Now we can estimate the performance of this custom processor before we deploy it. The result is the total latency here; at 220 MHz, that means a frame rate of just under 6 frames per second, which is not going to meet our system requirement.
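Sketched in MATLAB, assuming the dlhdl.Workflow constructor and its estimate method, those two steps might read:

    % Apply the processor configuration to a workflow object for the network
    hW = dlhdl.Workflow('Network', snet, 'ProcessorConfig', hPC);

    % Estimate total latency and frame rate before deploying anything
    estimate(hW, 'performance');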

This is where it’s important to collaborate, because we have options. Let’s say we’re committed to this board, and our deep learning expert doesn’t think we can remove any layers and keep the same accuracy, but we might be able to quantize to int8. Going from 32-bit to 8-bit word lengths frees up the resources to perform more multiply-accumulate operations in parallel.

So we’ll set up a new custom processor configuration object, with both the convolution and fully-connected layers set to int8, and increase the parallel thread count by 4x for each.
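A sketch of that configuration, assuming ProcessorDataType applies to both modules and that ConvThreadNumber and FCThreadNumber default to 16 and 4 respectively:

    % New configuration: int8 data types, 4x the parallel threads
    hPC2 = dlhdl.ProcessorConfig;
    hPC2.ProcessorDataType = 'int8';
    hPC2.TargetFrequency = 220;                               % MHz, as before
    setModuleProperty(hPC2, 'conv', 'ConvThreadNumber', 64);  % 4x the assumed default of 16
    setModuleProperty(hPC2, 'fc', 'FCThreadNumber', 16);      % 4x the assumed default of 4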

Now we need to quantize the network itself in order to estimate its performance on the deep learning processor. You can learn more about this process in the documentation. It takes a minute to run, and for each layer it returns the numeric ranges over the given calibration data store. Normally we would run more calibration images and then validate with another set, but…
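In code, this step might look like the following; calibrationData is a placeholder for your datastore of calibration images, and dlquantizer requires the Deep Learning Toolbox Model Quantization Library support package:

    % Quantize the network for int8 deployment to the FPGA
    dlQuantObj = dlquantizer(snet, 'ExecutionEnvironment', 'FPGA');

    % Exercise the network to collect per-layer numeric ranges
    calResults = calibrate(dlQuantObj, calibrationData);      % placeholder datastore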

Let’s see the estimation results for this new processor configuration: now we’re up to 16 frames per second, which is good enough for our fictional requirements.

From here, the buildProcessor function does the rest. It calls HDL Coder to generate target-independent synthesizable RTL for the processor you’ve configured. If you have set up a reference design, it will generate an IP core with the AXI register mapping so it plugs right into implementation. And if you’ve defined an implementation workflow, it runs all the way through to generating a bitstream to program the device.
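The call itself is a one-liner; a sketch, using the int8 configuration from above:

    % Generate synthesizable RTL for the configured processor and, if a
    % reference design and implementation workflow are set up, an IP core
    % and a bitstream
    dlhdl.buildProcessor(hPC2);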

We can take a look at the implementation results, here in Vivado. We’re meeting timing at the 220 MHz target, with the resource usage shown here.

This shows how powerful it can be to collaborate between the design of the deep learning network and the implementation of the deep learning processor, and how easy it is to do right in MATLAB.

Related Products

  • HDL Coder
