Unable to compile cuda code containing dynamic parallelism: Error: “ptxas fatal : Unresolved extern function 'cublasCreate_v2'”

Question

Petter Stefansson, 7 June 2017

https://jp.mathworks.com/matlabcentral/answers/343696-unable-to-compile-cuda-code-containing-dynamic-parallelism-error-ptxas-fatal-unresolved-extern

Commented: Petter Stefansson, 16 June 2017

I’m trying to create a simple MEX function that calls cuBLAS functions such as cublasDgemm from inside a kernel, so that I can use nested (dynamic) parallelism in my calculations. This is supposed to be supported on newer GPUs such as the GTX 1080 I’m using.

However, when I try to compile my CUDA code from MATLAB like this:

mexcuda CUBLAS_dgemm.cu -lcublas

I get the error:

Building with 'NVIDIA CUDA Compiler'.
Error using mex
ptxas fatal   : Unresolved extern function 'cublasCreate_v2'
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are
deprecated, and may be removed in a future release (Use
-Wno-deprecated-gpu-targets to suppress warning).
CUBLAS_dgemm.cu

As soon as I comment out everything inside my kernel that is related to cuBLAS, it compiles fine again. Could someone please advise me on what I need to do to get this to compile and work? I would really appreciate it.

The sample CUDA code I’ve written to test this looks like this:

#include "mex.h" 
#include "cublas_v2.h"
#include <cuda_runtime.h>
/* Kernel code with dgemm */
__global__ void dgemmkernel(const double* deviceX, double* XX, const int n, const int m) {  
    /* Cublas handle */
    cublasHandle_t handle;
    cublasCreate(&handle);
    /* Scalar constants */
    double alpha = 1.0, beta = 0.0;
    /*  Calculate XX = X'*X  using cublasDgemm.                              */
    cublasDgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N, n, n, m, &alpha, deviceX, m, deviceX, m, &beta, XX, n); 
    /* Release the handle created above so it is not leaked. */
    cublasDestroy(handle);
}
/* The Matlab gateway function */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
  /* Host-side variables */
  const double *X; // Host-side input X.
  double *Output1; // Matlab output.
  size_t m, n;     // size variables.
  /* Device-side variables. */
  double *deviceX; // Device-side version X.
  double *XX;   // GPU version XX.
  /* Get pointers to input host-side array X from Matlab */
  X = mxGetPr(prhs[0]);
  
  /* Get the dimensions of the input variables */
  m = mxGetM(prhs[0]);   // Number of rows in X.
  n = mxGetN(prhs[0]);   // Number of columns in X.
  /* Allocate memory on the device for the variables involved in the calculations. */
  cudaMalloc(&deviceX, m * n * sizeof(double)); // [m-by-n]
  cudaMalloc(&XX, n * n * sizeof(double));      // [n-by-n]
  
  /* Use cudaMemcpy to copy X from host to device */
  cudaMemcpy(deviceX, X, (m*n) * sizeof(double), cudaMemcpyHostToDevice);
  /* Call dgemm kernel */
  dgemmkernel<<<1, 1>>>(deviceX, XX, n, m);
  /* Deliver results back to matlab as host-side variables */            
  plhs[0] = mxCreateDoubleMatrix(n, n, mxREAL);
  Output1 = mxGetPr(plhs[0]);
  cudaMemcpy(Output1, XX, (n*n) * sizeof(double), cudaMemcpyDeviceToHost);
  /* Free the cudaMalloc'ed arrays from the device before exit */
  cudaFree(deviceX);
  cudaFree(XX);
}
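
Not part of the original post: as a rough way to check what the MEX function returns once it compiles, the same product XX = X'*X can be sketched on the host. This assumes the column-major layout that MATLAB uses for X. The function name gramReference is hypothetical, and the sketch uses plain C++ loops rather than cuBLAS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for XX = X' * X (not from the original post).
// X is m-by-n in column-major order, so element (i, j) lives at X[i + j * m].
// The result is n-by-n, also column-major, matching the layout of XX above.
std::vector<double> gramReference(const std::vector<double>& X,
                                  std::size_t m, std::size_t n) {
    assert(X.size() == m * n);  // the buffer must hold exactly m*n entries
    std::vector<double> XX(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {        // column of the result
        for (std::size_t i = 0; i < n; ++i) {    // row of the result
            double sum = 0.0;
            for (std::size_t k = 0; k < m; ++k) {
                // Dot product of column i and column j of X.
                sum += X[k + i * m] * X[k + j * m];
            }
            XX[i + j * n] = sum;
        }
    }
    return XX;
}
```

Comparing this against the array returned by the MEX function for a small input is one way to confirm that the transpose flags and leading dimensions passed to cublasDgemm are the intended ones.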

2 Comments

Joss Knight, 7 June 2017

Did you try using the -dynamic flag to mexcuda?

Petter Stefansson, 7 June 2017

No, I hadn’t tried that, but now I have. It still doesn’t work, but the error message has changed to:

Error using mex
nvlink error   : Undefined reference to 'cublasCreate_v2' in
'C:/Users/Petteri7/AppData/Local/Temp/mex_150043189848302_4708/CUBLAS_dgemm.obj'
(target: sm_35)
nvlink error   : Undefined reference to 'cublasDgemm_v2' in
'C:/Users/Petteri7/AppData/Local/Temp/mex_150043189848302_4708/CUBLAS_dgemm.obj'
(target: sm_35)
CUBLAS_dgemm.obj
c_mexapi_version.obj

Answer 1 (accepted)

Joss Knight, 12 June 2017

https://jp.mathworks.com/matlabcentral/answers/343696-unable-to-compile-cuda-code-containing-dynamic-parallelism-error-ptxas-fatal-unresolved-extern#answer_270427

You need to link against the cuBLAS device library in the device linking stage, and unfortunately there isn't a proper formal API for doing this. You can use the variable NVCC_FLAGS to add it there, and then the standard -L and -l options to add it to the host linking stage. In my example command below, the cuBLAS device library is located in /usr/local/cuda/lib64; replace this with the lib64 directory of wherever you've installed the CUDA Toolkit.

mexcuda -v CUBLAS_dgemm.cu -dynamic NVCC_FLAGS=-lcublas_device -L/usr/local/cuda/lib64 -lcublas_device

In the long term I'll take this as a request to have a more convenient way of linking in other device libraries.

1 Comment

Petter Stefansson, 16 June 2017

Yes, that finally worked! Amazing. Thank you for the help!

Answer 2

Joss Knight, 7 June 2017

https://jp.mathworks.com/matlabcentral/answers/343696-unable-to-compile-cuda-code-containing-dynamic-parallelism-error-ptxas-fatal-unresolved-extern#answer_269922

I don't think you can call cublasDgemm inside a kernel - that's not the way dynamic parallelism works. You can't just call any old host code that happens to contain kernel launches. You can, as always, call device code, and what dynamic parallelism lets you do is launch other kernels.

2 Comments

Petter Stefansson, 7 June 2017

Are you sure that’s not how dynamic parallelism works? NVIDIA’s “Dynamic parallelism in CUDA” technical notes specifically state, for example:

'A kernel can also call GPU libraries such as CUBLAS directly without needing to return to the CPU.'

As I understood it, starting with toolkit 5.0, any device with compute capability 3.5 or higher should be able to do this. This feature is the main reason I upgraded from a GPU with a lower compute capability, so it would be extremely disappointing if I have misunderstood the situation.

Joss Knight, 12 June 2017

I did not realise that! Thank you! The MEXCUDA dynamic MEX options file is not properly set up to link against the cublas device library so you'll need to co-opt some existing MEX variables to get the right command line options. See my other Answer for how to do this.
