Performance

Troubleshoot code generation issues, improve code execution time, and reduce the memory usage of generated code.

Some of the most common reasons why GPU Coder™ generated code is not performing as expected are:

CUDA® kernels are not created.
Host-to-device and device-to-host memory transfers (cudaMemcpy) are throttling performance.
The code does not expose enough parallelism, or there are device issues.

These topics elaborate on the common causes of these symptoms and describe how to use the built-in screener to detect them. They also explain how to work around these issues and generate more efficient CUDA code.
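
As a rough illustration of how the first two symptoms might be investigated, the sketch below generates a CUDA MEX function and then profiles it. The entry-point function `scaleAdd`, its body, and the input size are hypothetical and chosen only for this example. The calls assume that GPU Coder and a supported NVIDIA GPU are available.

```matlab
% Assumption: a hypothetical entry-point file scaleAdd.m is on the MATLAB path:
%
%   function out = scaleAdd(a, b) %#codegen
%   coder.gpu.kernelfun;   % ask GPU Coder to map this function to GPU kernels
%   out = 2 .* a + b;
%   end

% Create a GPU code configuration object for a MEX target.
cfg = coder.gpuConfig('mex');

% Example inputs; the size and type are arbitrary choices for illustration.
in = ones(1, 4096, 'single');

% Generate CUDA MEX code together with a code generation report.
codegen -config cfg -args {in, in} -report scaleAdd

% Profile the generated code (gpuPerformanceAnalyzer is available since R2023a).
gpuPerformanceAnalyzer('scaleAdd', {in, in}, 'Config', cfg, 'NumIterations', 2);
```

The code generation report is a reasonable first place to check whether kernels were created, and the profiling results can help show how much time goes to memory transfers compared with kernel execution.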

Apps

GPU Coder

GPU Coder	Generate GPU code from MATLAB code
GPU Environment Check	Verify and set up GPU code generation environment

Functions

Code Generation

`codegen`	Generate C/C++ code from MATLAB code
`gpucoder`	Open GPU Coder app
`gpuPerformanceAnalyzer`	Analyze and optimize performance of the generated code (Since R2023a)
`gpuprofile`	Profile execution time for generated CUDA code (Since R2024a)

Programming for Code Generation

`coder.gpu.kernel`	Pragma that maps `for`-loops to GPU kernels
`coder.gpu.kernelfun`	Pragma that maps function to GPU kernels
`coder.gpu.nokernel`	Pragma to disable kernel creation for loops

Objects

Code configuration

`coder.gpuConfig`	Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder
`coder.CodeConfig`	Configuration parameters for C/C++ code generation from MATLAB code
`coder.EmbeddedCodeConfig`	Configuration parameters for C/C++ code generation from MATLAB code with Embedded Coder
`coder.gpuEnvConfig`	Configuration object containing the parameters to check the GPU code generation environment

Topics

Workflow
GPU Coder troubleshooting workflow.
Code Generation Reports
Create and view reports generated during code generation.
Trace Between Generated CUDA Code and MATLAB Source Code
Highlight sections of MATLAB® code that run on the GPU.
Generating a GPU Code Metrics Report for Code Generated from MATLAB Code
Create and explore a GPU static code metrics report.
GPU Performance Analyzer
Visualize code metrics and identify optimization and tuning opportunities in your code.
Debug CUDA MEX Functions
Suggestions for debugging CUDA MEX functions.
Kernel Analysis
Recommendations for generating efficient CUDA kernels.
Memory Bottleneck Analysis
Reduce memory bottleneck issues when using GPU Coder.
Analysis with NVIDIA Profiler
Improve performance by using the information obtained from NVIDIA® Profiler (nvvp).
Register Count nvlink Error
Troubleshoot compilation failures due to a register count nvlink error.

Featured Examples

Pass GPU Inputs to Entry-Point Functions

Configure GPU Coder™ to pass GPU inputs to entry-point functions and produce GPU outputs. When you create inputs on the GPU in the caller of the entry-point function and access them on the GPU in the entry-point function, you can avoid creating unnecessary copies of memory and outputs between the CPU and the GPU. This approach can improve the performance of generated code when you integrate it with code that produces and consumes data on a GPU. Additionally, this example demonstrates how to generate code for functions that accept GPU inputs of unknown size by using the emxArray data type.

Since R2024a
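
A minimal sketch of the idea for a fixed-size MEX target follows. It is not the featured example itself. The entry-point function `scaleAdd` is hypothetical, the input size is arbitrary, and creating `gpuArray` inputs assumes Parallel Computing Toolbox is installed. The variable-size (emxArray) case is not covered here.

```matlab
% Assumption: scaleAdd.m is a hypothetical entry-point function that takes
% two 1-by-4096 single inputs and is written for GPU code generation.

cfg = coder.gpuConfig('mex');

% Declare that the inputs reside in GPU memory when the entry point is called.
inType = coder.typeof(ones(1, 4096, 'single'), 'Gpu', true);

% Generate a CUDA MEX function that accepts GPU inputs.
codegen -config cfg -args {inType, inType} scaleAdd

% Create the inputs on the GPU in the caller, then call the generated MEX.
a = gpuArray(ones(1, 4096, 'single'));
b = gpuArray(ones(1, 4096, 'single'));
out = scaleAdd_mex(a, b);
```

Because the data is already on the GPU when the entry point runs, the generated code should not need to copy these inputs from the CPU first.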

Profile Generated CUDA MEX Functions Using Performance Analyzer

Visualize code metrics and identify optimization and tuning opportunities in generated CUDA MEX.

Since R2024a

Analyze Performance of Generated CUDA Code

Analyze and optimize the performance of generated CUDA® code by using the gpuPerformanceAnalyzer function.

GPU Profiling on NVIDIA Jetson Platforms

Analyze and optimize the performance of the generated CUDA code on the Jetson™ platform.

Analyze Performance of Code Generated for Deep Learning Networks

Analyze and optimize the performance of the generated CUDA code for deep learning networks.