Maximum blocks per kernel
Maximum number of thread blocks per GPU kernel
Since R2021a
Model Configuration Pane: Code Generation / GPU Code
Description
The Maximum blocks per kernel parameter specifies the maximum number of CUDA® blocks created during a kernel launch.
Because GPU devices have limited streaming multiprocessor (SM) resources, limiting the number of blocks for each kernel can avoid performance losses from scheduling, loading, and unloading of blocks.
If the number of iterations in a loop is greater than the maximum number of blocks per kernel, the code generator creates CUDA kernels with striding.
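The striding behavior can be illustrated with a small host-side C++ emulation. This is a sketch only, not the code that GPU Coder generates: the names `LaunchConfig`, `chooseLaunch`, and `scaleByTwo` are hypothetical, and the loops over `b` and `t` stand in for what `blockIdx.x` and `threadIdx.x` would provide in a real CUDA kernel. It assumes a 1-D launch in which the block count is capped and each thread advances by the total number of launched threads until it runs past the loop bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D launch configuration (illustrative names, not GPU Coder output).
struct LaunchConfig {
    std::size_t blocks;           // plays the role of gridDim.x
    std::size_t threadsPerBlock;  // plays the role of blockDim.x
};

// Choose enough blocks to cover n iterations, but never more than maxBlocks.
LaunchConfig chooseLaunch(std::size_t n, std::size_t threadsPerBlock,
                          std::size_t maxBlocks) {
    assert(threadsPerBlock > 0 && maxBlocks > 0);
    // Ceiling division: blocks needed for one iteration per thread.
    std::size_t needed = (n + threadsPerBlock - 1) / threadsPerBlock;
    needed = std::max<std::size_t>(needed, 1);  // launch at least one block
    return {std::min(needed, maxBlocks), threadsPerBlock};
}

// Host emulation of a strided kernel that computes out[i] = 2 * in[i].
std::vector<int> scaleByTwo(const std::vector<int>& in, LaunchConfig cfg) {
    std::vector<int> out(in.size(), 0);
    // Total launched threads; each thread steps by this amount.
    const std::size_t stride = cfg.blocks * cfg.threadsPerBlock;
    for (std::size_t b = 0; b < cfg.blocks; ++b) {               // emulated blockIdx.x
        for (std::size_t t = 0; t < cfg.threadsPerBlock; ++t) {  // emulated threadIdx.x
            // When the block cap leaves fewer threads than iterations,
            // this loop runs more than once per thread.
            for (std::size_t i = b * cfg.threadsPerBlock + t; i < in.size();
                 i += stride) {
                out[i] = 2 * in[i];
            }
        }
    }
    return out;
}
```

For example, with 1000 iterations, 32 threads per block, and a cap of 4 blocks, `chooseLaunch` returns 4 blocks, so the stride is 128 and the emulated thread that starts at index 0 also handles indices 128, 256, and so on. With a cap of 64 blocks, only 32 blocks are needed and each thread handles at most one iteration.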
When you specify the maximum number of blocks for each kernel, the code generator creates 1-D kernels. To force the code generator to create 2-D or 3-D kernels, use the coder.gpu.kernel (GPU Coder) pragma. The coder.gpu.kernel pragma takes precedence over the maximum number of blocks for each kernel.
Dependencies
This parameter requires a GPU Coder™ license.
To enable this parameter, select Generate GPU code on the Code Generation pane.
Settings
Specify the maximum number of CUDA blocks created during a kernel launch.
Recommended Settings
Application | Setting |
---|---|
Debugging | No impact |
Traceability | No impact |
Efficiency | No impact |
Safety precaution | No impact |
Programmatic Use
Parameter: GPUMaximumBlocksPerKernel
Type: integer
Value: any valid value
Default: 0
Version History
Introduced in R2021a