mex cuda code for matrix multiplication

Question

0 投票

Hello,

I made a simple mex cuda code to calculate multiplication of two matrices of size NxN but never get the same results as in matlab command

C = A*B except for B is a diagonal matrix. Just wondering if I was wrong at any point? The code is attached. Thanks.

#include "mex.h"

#include "cuda_runtime.h"

// CUDA kernel for matrix multiplication

__global__ void matrixMultiplication(const double* A, const double* B, double* C, int N) {

int row = blockIdx.y * blockDim.y + threadIdx.y;

int col = blockIdx.x * blockDim.x + threadIdx.x;

if (row < N && col < N) {

double sum = 0;

for (int i = 0; i < N; i++) {

sum += A[row * N + i] * B[i * N + col];

}

C[row * N + col] = sum;

}

void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[]) {

/* Declare all variables */

const double* A;

const double* B;

double* C;

double* d_A = nullptr;

double* d_B = nullptr;

double* d_C = nullptr;

int N;

A = mxGetPr(prhs[0]);

B = mxGetPr(prhs[1]); // Use prhs[1] for matrix B

N = mxGetN(prhs[0]); // Get the number of columns of matrix A

/* Initialize output array */

plhs[0] = mxCreateDoubleMatrix((mwSize)N, (mwSize)N, mxREAL);

C = mxGetPr(plhs[0]);

// Allocate device memory for matrices A, B, and C

cudaMalloc((void**)&d_A, N * N * sizeof(double));

cudaMalloc((void**)&d_B, N * N * sizeof(double));

cudaMalloc((void**)&d_C, N * N * sizeof(double));

// Copy matrices A and B from host to device

cudaMemcpy(d_A, A, N * N * sizeof(double), cudaMemcpyHostToDevice);

cudaMemcpy(d_B, B, N * N * sizeof(double), cudaMemcpyHostToDevice);

// Define grid and block dimensions

dim3 dimGrid(128, 128);

dim3 dimBlock(1, 1);

// dim3 dimGrid((N + 15) / 16, (N + 15) / 16); // Adjust block size as needed

// dim3 dimBlock(16, 16);

// Launch the CUDA kernel

matrixMultiplication << <dimGrid, dimBlock >> > (d_A, d_B, d_C, N);

cudaDeviceSynchronize();

// Copy the result matrix C from device to host

cudaMemcpy(C, d_C, N * N * sizeof(double), cudaMemcpyDeviceToHost);

// Free device memory

cudaFree(d_A);

cudaFree(d_B);

cudaFree(d_C);

}

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

サインインしてコメントする。

サインインしてこの質問に回答する。

Follow Question

Answer 1

Varun 2023 年 10 月 30 日

MATLAB Online で開く

0 投票

Hi Chieh-Hsun,

Looks like you are trying to calculate matrix multiplication of matrices A and B using "mex" CUDA code but the results are not matching when calculated using MATLAB command window as "C=A*B".

There are two orderings in which a 2D matrix can be flatten into 1D matrix:

Row-major wise: In row-major order, you traverse the 2D matrix row by row, collecting the elements one row at a time. You start with the first row, then move to the second row, and so on until you've visited all the rows.
Column-major wise: In column-major order, you traverse the 2D matrix column by column, collecting the elements one column at a time. You start with the first column, then move to the second column, and so on until you've visited all the columns.

For Example, consider the following matrix:

[[1, 2, 3],

[4, 5, 6],

[7, 8, 9]]

Row-major order: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Column-major order: [1, 4, 7, 2, 5, 8, 3, 6, 9]

So, the CUDA code which you have shared uses row-major wise sorting to flatten 2d matrix "A" and "B" into 1d array. It then calculates matrix multiplication on these 1d arrays and stores the result into new 1d array "C", again it is in row-major order.

But the "mexFunction", by default converts these 2d matrices into 1d arrays using column-major sorting.

Here 1d arrays "A" and "B" is in column-major sorting which was expected to be in row-major sorting.

A = mxGetPr(prhs[0]);
B = mxGetPr(prhs[1]);

And, finally the 1d array "C" is returned to the MATLAB as 2d array and conversion is based on the assumption that "C" is column-major 1d array.

C = mxGetPr(plhs[0]);

To resolve the above issues related to array orderings in the "mex" CUDA code, you can just pass transpose of the matrices "A" and "B" i.e. " A' " and " B' " to the CUDA code and again do transpose of the returned matrix in MATLAB after calling the CUDA code. Please refer to the following code snippets:

C = CUDA_MATRIX_MUL(A', B')
C = C'
%% C is the final answer

Hope it helps.

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

サインインしてコメントする。

mex cuda code for matrix multiplication

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

採用された回答

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

その他の回答 (0 件)

カテゴリ

タグ

Community Treasure Hunt

mex cuda code for matrix multiplication

0 件のコメント -2 件の古いコメントを表示 -2 件の古いコメントを非表示

採用された回答

0 件のコメント -2 件の古いコメントを表示 -2 件の古いコメントを非表示

その他の回答 (0 件)

カテゴリ

タグ

参考

Community Treasure Hunt

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示