Integrate External/Custom Code

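This example shows how to integrate external or custom code to improve the performance of generated code. MATLAB® Coder™ generates code that is well optimized for most applications, but in some cases custom code tuned to your specific requirements is the better choice. For example: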

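You have custom libraries optimized for your target environment.
You have custom libraries for functions that MATLAB Coder does not support.
You have custom libraries that meet standards set by your company.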

In these cases, you can integrate your custom code with the code generated by MATLAB Coder.

This example shows how to integrate the function cublasSgemm from the NVIDIA® CUDA® Basic Linear Algebra Subroutines (CUBLAS) library into generated code. This function performs matrix multiplication on a graphics processing unit (GPU).

Define a class ExternalLib_API that derives from the class coder.ExternalDependency. ExternalLib_API defines an interface to the CUBLAS library through the following methods:

getDescriptiveName: Returns a descriptive name for ExternalLib_API to be used in error messages.
isSupportedContext: Determines whether the build context supports the CUBLAS library.
updateBuildInfo: Adds header file paths and link files to the build information.
GPU_MatrixMultiply: Defines the interface to the CUBLAS library function cublasSgemm.

ExternalLib_API.m

classdef ExternalLib_API < coder.ExternalDependency
    %#codegen
    
    methods (Static)
        
        function bName = getDescriptiveName(~)
            bName = 'ExternalLib_API';
        end
        
        function tf = isSupportedContext(ctx)
            if  ctx.isMatlabHostTarget()
                tf = true;
            else
                error('CUBLAS library not available for this target');
            end
        end
        
        function updateBuildInfo(buildInfo, ctx)
            [~, linkLibExt, ~, ~] = ctx.getStdLibInfo();
            
            % Include header file path
            % Include header files later using coder.cinclude
            hdrFilePath = 'C:\My_Includes';
            buildInfo.addIncludePaths(hdrFilePath);
            
            % Include link files 
            linkFiles = strcat('libcublas', linkLibExt);
            linkPath = 'C:\My_Libs';
            linkPriority = '';
            linkPrecompiled = true;
            linkLinkOnly = true;
            group = '';
            buildInfo.addLinkObjects(linkFiles, linkPath, ...
                linkPriority, linkPrecompiled, linkLinkOnly, group);
            
            linkFiles = strcat('libcudart', linkLibExt);
            buildInfo.addLinkObjects(linkFiles, linkPath, ...
                linkPriority, linkPrecompiled, linkLinkOnly, group);
            
        end
        
        % API for library function 'cublasSgemm'
        function C = GPU_MatrixMultiply(A, B)
            assert(isa(A,'single'), 'A must be single.');
            assert(isa(B,'single'), 'B must be single.');
            
            if(coder.target('MATLAB'))
                C=A*B;
            else
                
                % Include header files 
                %     for external functions and typedefs
                % Header path included earlier using updateBuildInfo
                coder.cinclude('"cuda_runtime.h"');
                coder.cinclude('"cublas_v2.h"');
                
                % Compute dimensions of input matrices
                m = int32(size(A, 1));
                k = int32(size(A, 2));
                n = int32(size(B, 2));
                
                % Declare pointers to matrices on destination GPU
                d_A = coder.opaque('float*');
                d_B = coder.opaque('float*');
                d_C = coder.opaque('float*');
                
                % Compute memory to be allocated for matrices
                % Single = 4 bytes
                size_A = m*k*4;
                size_B = k*n*4;
                size_C = m*n*4;
                
                % Define error variables 
                error = coder.opaque('cudaError_t');
                cudaSuccessV = coder.opaque('cudaError_t', ...
                    'cudaSuccess');
                
                % Assign memory on destination GPU 
                error = coder.ceval('cudaMalloc', ...
                    coder.wref(d_A), size_A);
                assert(error == cudaSuccessV, ...
                    'cudaMalloc(A) failed');
                error = coder.ceval('cudaMalloc', ...
                    coder.wref(d_B), size_B);
                assert(error == cudaSuccessV, ...
                    'cudaMalloc(B) failed');
                error = coder.ceval('cudaMalloc', ...
                    coder.wref(d_C), size_C);
                assert(error == cudaSuccessV, ...
                    'cudaMalloc(C) failed');
                
                % Define direction of copying 
                hostToDevice = coder.opaque('cudaMemcpyKind', ...
                    'cudaMemcpyHostToDevice');
                
                % Copy matrices to destination GPU 
                error = coder.ceval('cudaMemcpy',  ...
                    d_A, coder.rref(A), size_A, hostToDevice);
                assert(error == cudaSuccessV, 'cudaMemcpy(A) failed');
                
                error = coder.ceval('cudaMemcpy',  ...
                    d_B, coder.rref(B), size_B, hostToDevice);
                assert(error == cudaSuccessV, 'cudaMemcpy(B) failed');
                
                % Define type and size for result
                C = zeros(m, n, 'single');
                
                error = coder.ceval('cudaMemcpy', ...
                    d_C, coder.rref(C), size_C, hostToDevice);
                assert(error == cudaSuccessV, 'cudaMemcpy(C) failed');
                
                % Define handle variables for external library
                handle = coder.opaque('cublasHandle_t');
                blasSuccess = coder.opaque('cublasStatus_t', ...
                    'CUBLAS_STATUS_SUCCESS');
                
                % Initialize external library 
                ret = coder.opaque('cublasStatus_t');
                ret = coder.ceval('cublasCreate', coder.wref(handle));
                assert(ret == blasSuccess, 'cublasCreate failed');
                
               
                TRANSA = coder.opaque('cublasOperation_t', ...
                    'CUBLAS_OP_N');
                alpha = single(1);
                beta = single(0);
                
                % Multiply matrices on GPU 
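                % Leading dimensions of the column-major buffers:
                %     lda = m (A is m-by-k), ldb = k (B is k-by-n),
                %     ldc = m (C is m-by-n)
                ret = coder.ceval('cublasSgemm', handle, ...
                    TRANSA,TRANSA,m,n,k, ...
                    coder.rref(alpha),d_A,m, ...
                    d_B,k, ...
                    coder.rref(beta),d_C,m);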
                
                assert(ret == blasSuccess, 'cublasSgemm failed');
                
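                % Copy result back to local host 
                deviceToHost = coder.opaque('cudaMemcpyKind', ...
                    'cudaMemcpyDeviceToHost');
                error = coder.ceval('cudaMemcpy', coder.wref(C), ...
                    d_C, size_C, deviceToHost);
                assert(error == cudaSuccessV, 'cudaMemcpy(C) failed');
                
                % Release the CUBLAS handle and the device memory
                coder.ceval('cublasDestroy', handle);
                coder.ceval('cudaFree', d_A);
                coder.ceval('cudaFree', d_B);
                coder.ceval('cudaFree', d_C);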
                
            end
        end
    end
end
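
cublasSgemm computes C = alpha*A*B + beta*C on column-major buffers, which is the same storage order MATLAB uses, so no transposition is needed before copying the matrices to the GPU. Each matrix is described by a leading dimension, the stride between consecutive columns; for a contiguous m-by-k matrix A, k-by-n matrix B, and m-by-n matrix C, these are m, k, and m respectively. The following host-only sketch illustrates that indexing. The function name sgemmReference is hypothetical and is not part of CUBLAS or MATLAB Coder; it covers only the no-transpose case and assumes a release that allows local functions in scripts (R2016b or later).

```matlab
% Script: host-side illustration of the column-major indexing used by SGEMM.
m = 2; k = 3; n = 4;
A = single(reshape(1:m*k, m, k));    % m-by-k
B = single(reshape(1:k*n, k, n));    % k-by-n
C0 = zeros(m*n, 1, 'single');        % linear buffer for the m-by-n result

% Pass the matrices as linear buffers with leading dimensions m, k, and m.
Cbuf = sgemmReference(m, n, k, single(1), A(:), m, B(:), k, single(0), C0, m);

% The result agrees with the built-in product (values are small integers,
% so the comparison is exact in single precision).
assert(isequal(reshape(Cbuf, m, n), A*B));

function C = sgemmReference(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc)
% Hypothetical reference for C = alpha*A*B + beta*C (no transposes).
% A, B, and C are linear buffers holding column-major matrix data.
for j = 1:n
    for i = 1:m
        acc = single(0);
        for p = 1:k
            % Element (i,p) of A and element (p,j) of B in linear storage
            acc = acc + A(i + (p-1)*lda) * B(p + (j-1)*ldb);
        end
        idx = i + (j-1)*ldc;          % element (i,j) of C
        C(idx) = alpha*acc + beta*C(idx);
    end
end
end
```

The argument order mirrors that of cublasSgemm without the handle and the two operation flags, which makes it easier to compare against the coder.ceval call in GPU_MatrixMultiply.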

To perform the matrix multiplication using the interface defined in the method GPU_MatrixMultiply and the build information in ExternalLib_API, include the following line in your MATLAB code:
```
C= ExternalLib_API.GPU_MatrixMultiply(A,B);
```
For example, you can define a MATLAB function Matrix_Multiply that performs only this matrix multiplication.
```
function C = Matrix_Multiply(A, B) %#codegen
 C= ExternalLib_API.GPU_MatrixMultiply(A,B);
```
Define a MEX configuration object using coder.config. To use the CUBLAS library, set the target language for code generation to C++.
```
cfg=coder.config('mex');
cfg.TargetLang='C++';
```
Generate code for Matrix_Multiply using cfg as the configuration object and two 2-by-2 matrices of type single as arguments. Because cublasSgemm multiplies matrices of data type float only, the corresponding MATLAB matrices must be of type single.
```
codegen -config cfg Matrix_Multiply ...
            -args {ones(2,'single'),ones(2,'single')}
```
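If the calling code works with double data, one option is to convert at the boundary. The following is a minimal sketch under that assumption; the function name Matrix_Multiply_Double is hypothetical and does not appear in the original example. Note that the product is still computed in single precision, so the result carries single-precision rounding even though it is returned as double.

```matlab
function C = Matrix_Multiply_Double(A, B) %#codegen
% Hypothetical wrapper: accepts double inputs, converts them to single
% for the CUBLAS interface, and converts the result back to double.
C = double(ExternalLib_API.GPU_MatrixMultiply(single(A), single(B)));
end
```

When run directly in MATLAB, this wrapper takes the coder.target('MATLAB') branch of GPU_MatrixMultiply and evaluates A*B on the host, which gives a quick way to check the logic before generating code.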
Test the generated MEX function Matrix_Multiply_mex using two 2-by-2 identity matrices of type single.
```
Matrix_Multiply_mex(eye(2,'single'),eye(2,'single'))
```
The output is also a 2-by-2 identity matrix.
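
Identity matrices cannot reveal a transposed operand or swapped inputs, because the product is the identity either way. As a somewhat stronger check, the sketch below compares the MEX output against the built-in product for non-symmetric inputs. It assumes that Matrix_Multiply_mex was generated as shown above for 2-by-2 single inputs and that a CUDA-capable GPU is available; the tolerance is an arbitrary choice for illustration.

```matlab
% Non-symmetric 2-by-2 inputs of type single
A = single([1 2; 3 4]);
B = single([5 6; 7 8]);

C_gpu = Matrix_Multiply_mex(A, B);   % result computed through CUBLAS
C_ref = A * B;                       % host reference: [19 22; 43 50]

% Compare with a relative tolerance suited to single precision
tol = 10 * eps('single');
assert(all(abs(C_gpu(:) - C_ref(:)) <= tol * max(abs(C_ref(:)))), ...
    'MEX result differs from the host reference');
```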

See Also

Topics

Develop Interface for External C/C++ Code
