Kernels from Library Calls

GPU Coder™ supports libraries optimized for CUDA® GPUs, such as the cuBLAS, cuSOLVER, cuFFT, Thrust, cuDNN, and TensorRT libraries.

The cuBLAS library is an implementation of Basic Linear Algebra Subprograms (BLAS) on top of the NVIDIA® CUDA runtime. It allows you to access the computational resources of the NVIDIA GPU.
The cuSOLVER library is a high-level package based on the cuBLAS and cuSPARSE libraries. It provides useful LAPACK-like features, such as common matrix factorization and triangular solve routines for dense matrices, a sparse least-squares solver, and an eigenvalue solver.
The cuFFT library provides a high-performance implementation of the Fast Fourier Transform (FFT) algorithm on NVIDIA GPUs. The cuBLAS, cuSOLVER, and cuFFT libraries are part of the NVIDIA CUDA Toolkit.
Thrust is a C++ template library for CUDA. The Thrust library ships with the CUDA Toolkit and allows you to take advantage of GPU-accelerated primitives, such as sort, to implement complex high-performance parallel applications.
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers. NVIDIA TensorRT is a high-performance deep learning inference optimizer and runtime library. For more information, see Code Generation for Deep Learning Networks by Using cuDNN and Code Generation for Deep Learning Networks by Using TensorRT.

GPU Coder does not require a special pragma to generate kernel calls to libraries. During code generation, when you select the Enable cuBLAS option in the GPU Coder app or set config_object.GpuConfig.EnableCUBLAS = true at the command line, GPU Coder replaces some functionality with calls to the cuBLAS library. Likewise, when you select the Enable cuSOLVER option in the app or set config_object.GpuConfig.EnableCUSOLVER = true at the command line, GPU Coder replaces some functionality with calls to the cuSOLVER library. For GPU Coder to replace high-level math functions with library calls, the following conditions must be met:

A GPU-specific library replacement must exist for these functions.
MATLAB® Coder™ data size thresholds must be satisfied.
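
As a minimal sketch of the command-line workflow described above, the following script creates a GPU code configuration object, enables the cuBLAS and cuSOLVER replacements, and generates a MEX function. The entry-point name solveSystem and the 512-by-512 input size are hypothetical and chosen only for illustration. Whether a library call actually appears in the generated code still depends on the two conditions listed above.

```matlab
% Hypothetical entry-point function, saved separately as solveSystem.m:
%
%   function x = solveSystem(A, b) %#codegen
%       x = A \ b;   % mldivide: a candidate for library replacement
%   end

% Create a GPU Coder configuration object for MEX generation.
cfg = coder.gpuConfig('mex');

% Enable the library replacements (property names as given in the text).
cfg.GpuConfig.EnableCUBLAS   = true;
cfg.GpuConfig.EnableCUSOLVER = true;

% Example input types. The sizes are illustrative assumptions.
A = coder.typeof(single(0), [512 512]);
b = coder.typeof(single(0), [512 1]);

% Generate code and a report for the hypothetical entry point.
codegen -config cfg -args {A,b} solveSystem -report
```

Inspecting the code generation report is one way to check whether the call to mldivide was replaced with a library call or mapped to portable generated kernels.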

GPU Coder supports cuFFT, cuSOLVER, and cuBLAS library replacements for the functions listed in the table. For functions that do not have replacements in CUDA, GPU Coder uses portable MATLAB functions that are mapped to the GPU.

MATLAB Function	Description	MATLAB Coder LAPACK Support	cuBLAS, cuSOLVER, cuFFT, Thrust Support
`mtimes`	Matrix multiply	Yes	Yes
`mldivide` (`\`)	Solve system of linear equations `Ax=B` for `x`	Yes	Yes
`lu`	LU matrix factorization	Yes	Yes
`qr`	Orthogonal-triangular decomposition	Yes	Partial
`det`	Matrix determinant	Yes	Yes
`chol`	Cholesky factorization	Yes	Yes
`rcond`	Reciprocal condition number	Yes	Yes
`linsolve`	Solve system of linear equations `Ax=B`	Yes	Yes
`eig`	Eigenvalues and eigenvectors	Yes	No
`schur`	Schur decomposition	Yes	No
`svd`	Singular value decomposition	Yes	Partial
`fft`, `fft2`, `fftn`	Fast Fourier Transform	Yes	Yes
`ifft`, `ifft2`, `ifftn`	Inverse Fast Fourier Transform	Yes	Yes
`sort`	Sort array elements		Yes, using `gpucoder.sort`
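
The last row of the table refers to gpucoder.sort. A minimal sketch of an entry point that uses it is shown here. The function name sortColumns and the input size in the comment are hypothetical, and the sketch assumes the two-argument form gpucoder.sort(A,dim).

```matlab
function B = sortColumns(A) %#codegen
% sortColumns  Hypothetical entry point that sorts each column of A.
% gpucoder.sort is the GPU-oriented counterpart of sort named in the table.
    B = gpucoder.sort(A, 1);   % sort along dimension 1 (down each column)
end

% To generate code for this function (run from a separate script or the
% command line; the 1024-by-1024 input size is an illustrative assumption):
%
%   cfg = coder.gpuConfig('mex');
%   codegen -config cfg -args {ones(1024,1024)} sortColumns
```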

When you select the Enable cuFFT option in the GPU Coder app or set config_object.GpuConfig.EnableCUFFT = true at the command line, GPU Coder maps fft, ifft, fft2, ifft2, fftn, and ifftn function calls in your MATLAB code to the corresponding cuFFT library calls. For 2-D and higher-dimensional transforms, GPU Coder creates multiple 1-D batched transforms, which perform better than single transforms. GPU Coder supports only out-of-place transforms. If Enable cuFFT is not selected, GPU Coder uses C FFTW libraries where available or generates kernels from portable MATLAB FFT code. Both single-precision and double-precision data types are supported. Input and output can be real or complex-valued, but real-valued transforms are faster. The cuFFT library supports input sizes that are typically specified as a power of 2 or as a value that can be factored into a product of small prime numbers. In general, the smaller the prime factors, the better the performance.
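
To illustrate the guidance on transform sizes, the following sketch enables the cuFFT replacement and pads a transform length to the next power of 2 so that its only prime factor is 2. The signal length of 1000 and the entry-point name spectrum are hypothetical. Padding to a power of 2 is only one possible choice, since other lengths with small prime factors are also described as suitable.

```matlab
% Enable the cuFFT replacement (property name as given in the text).
cfg = coder.gpuConfig('mex');
cfg.GpuConfig.EnableCUFFT = true;

% Pick a transform length whose prime factors are small.
n    = 1000;               % illustrative signal length
nfft = 2^nextpow2(n);      % next power of 2 at or above n
disp(nfft)                 % 1024
disp(unique(factor(nfft))) % 2, the only prime factor

% Hypothetical entry point, saved separately as spectrum.m:
%
%   function Y = spectrum(x) %#codegen
%       Y = fft(x, 1024);  % zero-pads x to 1024 points
%   end
%
% x = coder.typeof(single(0), [1000 1]);
% codegen -config cfg -args {x} spectrum
```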

Note

Using a CUDA library name such as cufft, cublas, or cudnn as the name of a MATLAB function results in code generation errors.
