
mexcuda compiler error: "__global__" does not apply here

Nathan on 17 Jun 2024
Commented: Nathan on 24 Jun 2024 at 15:22
I'm writing a MEX CUDA function that evaluates the distance formula for each pixel of an image, where the pixel locations are in x and y respectively, and x0 and z0 contain the locations for an array of transducers. The function details aren't important for now, because I cannot compile when this kernel is in the code:
__global__ void distance_formula_index_units(uint32_t * delay, const double * x, const double * z, const double * x0, const double * z0, params * p) {
    int x_px = threadIdx.x;
    int y_px = threadIdx.y;
    int sen = blockIdx.x;
    int m_id = blockDim.x * blockDim.y + sen;
    double x_dist = x[x_px] - x0[sen];
    double y_dist = z[y_px] - z0[sen];
    double distance = sqrt(x_dist*x_dist + y_dist*y_dist);
    delay[m_id] = (uint32_t)(distance / p->c / p->dt);
}
Other explanations for context: delay is the output matrix of time delays, p is a typedef'd struct from earlier in the code that contains parameters about the image and system such as the wavespeed c and sample time resolution dt.
typedef struct params {
    size_t n_xp;
    size_t n_yp;
    size_t n_sens;
    double dt;
    double c;
} params;
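For reference, the per-element arithmetic described above can be checked on the host without any GPU or MEX machinery. The following is only a minimal sketch in plain C++ (which also compiles inside a .cu file); `DelayParams` and `delay_in_samples` are hypothetical names introduced here, not part of the original code, and only the `dt` and `c` fields of `params` are mirrored.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the two fields of `params` that the formula uses.
struct DelayParams {
    double dt;  // sample time resolution
    double c;   // wave speed
};

// Host-side reference for one output element: the distance from the pixel at
// (x, z) to the sensor at (x0, z0), divided by wave speed and sample period,
// then truncated toward zero by the cast (as in the kernel).
std::uint32_t delay_in_samples(double x, double z, double x0, double z0,
                               const DelayParams& p) {
    const double x_dist = x - x0;
    const double z_dist = z - z0;
    const double distance = std::sqrt(x_dist * x_dist + z_dist * z_dist);
    return static_cast<std::uint32_t>(distance / p.c / p.dt);
}
```

A reference like this may be handy for comparing against the GPU output once the kernel compiles. Note that the cast truncates rather than rounds, so a delay of 2.5 samples becomes 2.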
I run "mexcuda calculate_delays.cu", and instead of compiling, I get:
Error using mex
.\06_MEX_Functions\calculate_delays.cu(14): error: attribute "__global__" does not apply here
__declspec(__global__) void distance_formula_distance_units(uint32_t * delay, const double * xx, ...
^
I have only included this one function from the code, because it successfully compiles when I comment out this specific kernel, and the logic gate that activates it. There's another kernel in the code as well (distance_formula_time_units). The code that calls this function ALSO throws a different error:
if (*use_index_units) {
    mxGPUArray * delaymatrix = mxGPUCreateGPUArray(3, output_dimensions, mxUINT32_CLASS, mxREAL, MX_GPU_DO_NOT_INITIALIZE);
    uint32_t * delay_dvc_int = (uint32_t*) mxGPUGetData(delaymatrix);
    distance_formula_index_units
        <<<input_parameters->n_xp*input_parameters->n_yp, input_parameters->n_sens>>>
        (delay_dvc_int, x_arr_dvc, y_arr_dvc, x0_dvc, z0_dvc, input_parameters);
    cudaDeviceSynchronize();
    // copy delay matrix from device back to host, set output
    outputs[0] = mxGPUCreateMxArrayOnCPU(delaymatrix);
    mxGPUDestroyGPUArray(delaymatrix);
} else { ...(call the other kernel)
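Since the output is created as a 3-D array, each (pixel x, pixel y, sensor) triple has to map to one linear offset into the device buffer. As a reference for that index arithmetic only, here is a small sketch in plain C++. It assumes `output_dimensions` is `{n_xp, n_yp, n_sens}` (the question does not show it) and relies on MATLAB storing arrays in column-major order; `delay_index` is a hypothetical helper, not a function from the original code.

```cpp
#include <cstddef>

// Column-major linear index into an n_xp x n_yp x n_sens array.
// Assumption: the first dimension (x pixel) varies fastest, then the
// y pixel, then the sensor, matching MATLAB's storage order.
std::size_t delay_index(std::size_t x_px, std::size_t y_px, std::size_t sen,
                        std::size_t n_xp, std::size_t n_yp) {
    return x_px + n_xp * (y_px + n_yp * sen);
}
```

With n_xp = 4 and n_yp = 3, for example, stepping the sensor index by one moves the offset by 12 elements, that is, one full image plane.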
Commenting out the distance_formula_index_units kernel gives a compiler error on this block:
.\06_MEX_Functions\calculate_delays.cu(101): error: identifier "uint32_t" is undefined
uint32_t * delay_dvc_int = (uint32_t*) mxGPUGetData(delaymatrix);
^
This is wild, because uint32_t is completely defined in line 24 of stdint.h, and this is in the include tree of mex.h.
As far as I can tell, my code is completely valid, and intellisense in VSCode thinks so, too. ChatGPT doesn't find any code errors, either. What is going on here?

Accepted Answer

Joss Knight on 24 Jun 2024 at 9:48
Edited: Joss Knight on 24 Jun 2024 at 9:50
You cannot call a function declared __global__ from another function declared __global__. Declare the second function as __device__ and that should work.
Your device functions and kernels need to be declared using supported CUDA device types, whereas you are using host-side types defined by the MEX headers. Try uint32_T instead (i.e. capital T).
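As an editorial aside on the type names: `uint32_T` is a typedef supplied by the MathWorks headers, while `uint32_t` comes from the standard `<cstdint>` / `<stdint.h>` header. The sketch below is an assumption-laden illustration, not the MathWorks definition: `uint32_T_sketch` is a hypothetical stand-in that assumes the MathWorks type is a plain 32-bit `unsigned int`. It also shows the other common precaution, which is to include `<cstdint>` explicitly so that `uint32_t` never depends on what another header happens to pull in; whether that alone would have resolved the error in this thread is not confirmed here.

```cpp
#include <climits>
#include <cstdint>  // explicit include: uint32_t no longer relies on transitive includes

// Hypothetical stand-in for the MathWorks typedef (assumed to be unsigned int).
typedef unsigned int uint32_T_sketch;

// Compile-time checks that the stand-in really is a 32-bit unsigned type
// with the same size as the standard one on this platform.
static_assert(sizeof(uint32_T_sketch) * CHAR_BIT == 32,
              "stand-in type is expected to be 32 bits wide");
static_assert(sizeof(uint32_T_sketch) == sizeof(std::uint32_t),
              "stand-in and std::uint32_t are expected to have the same size");

// Values convert between the two spellings without loss.
std::uint32_t to_standard(uint32_T_sketch value) {
    return static_cast<std::uint32_t>(value);
}
```

If both static_asserts pass, the two spellings describe the same representation on that platform, so switching between them changes which header supplies the name rather than how the data is laid out.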
1 Comment
Nathan on 24 Jun 2024 at 15:22
Excellent, great catch on the capital vs. lowercase T in uint32_T, that fixed the compiler issues completely! (Did not need to declare the second function as __device__)
