CUDA: number of tasks exceeds number of threads times blocks

Robert, 23 January 2013
I have a problem when the number of tasks exceeds the total number of available threads. Let's imagine I want to add two vectors of length 100,000.
MATLAB code:
N=100*1000
a=double(-[1:N]);
b=double(2*[1:N]);
a_gpu=gpuArray(a);%Create array on GPU
b_gpu=gpuArray(b);%Create array on GPU
c_gpu=gpuArray(zeros(1,N));%Create array on GPU
k = parallel.gpu.CUDAKernel('add.ptx', 'add.cu');
k.ThreadBlockSize = 100;
k.GridSize=[100,1];
o = feval(k, a_gpu,b_gpu,c_gpu);
I know that I could increase the ThreadBlockSize and GridSize, but that is not what I want right now. Imagine my vector were much longer.
My CUDA code looks like this:
__global__ void add( double *a, double *b, double *c) {
int tid = threadIdx.x + blockIdx.x * blockDim.x;
a[tid] = a[tid] + b[tid];
tid += blockDim.x * gridDim.x;
}
In the last line I try to force the program to really go all the way to the end of my vector, by using the same threads a second, third, ... time. That's what I read in the book "CUDA by Example".
But for some reason it does not work when called from MATLAB. If I use plain C and CUDA, it works.
What is wrong with my code? What is the usual way to handle the case where the number of tasks is larger than the maximum thread block size times the grid size? I could use the other grid dimension too, but how do I avoid the problem in general?
Thanks a lot
Robert
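
For reference, the technique described above (reusing the same threads with a stride of blockDim.x * gridDim.x) is usually written as a loop with a bounds check, often called a grid-stride loop. In the kernel as posted, the `tid += ...` line runs once after the single addition and is never used again, so only the first 100 x 100 = 10,000 elements are touched. The following is a minimal sketch, not a definitive fix. It assumes the vector length is passed as an extra argument `n` (not in the original kernel), it writes the result to `c` rather than back into `a`, and `run_add` is a hypothetical host helper added only to check the result.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Grid-stride loop: each thread handles elements tid, tid + stride,
// tid + 2*stride, ... until it runs past the end of the vectors.
// The length argument n is an assumption added for this sketch.
__global__ void add(const double *a, const double *b, double *c, int n) {
    int stride = blockDim.x * gridDim.x;  // total number of launched threads
    for (int tid = threadIdx.x + blockIdx.x * blockDim.x; tid < n; tid += stride) {
        c[tid] = a[tid] + b[tid];
    }
}

// Hypothetical host helper: runs the kernel with the given launch
// configuration and returns true if every c[i] equals a[i] + b[i].
bool run_add(int n, int blocks, int threads) {
    std::vector<double> a(n), b(n), c(n, 0.0);
    for (int i = 0; i < n; ++i) {
        a[i] = -(i + 1);       // same values as a = -[1:N]
        b[i] = 2.0 * (i + 1);  // same values as b = 2*[1:N]
    }

    double *da = nullptr, *db = nullptr, *dc = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(double);
    bool ok = cudaMalloc(&da, bytes) == cudaSuccess
           && cudaMalloc(&db, bytes) == cudaSuccess
           && cudaMalloc(&dc, bytes) == cudaSuccess
           && cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        add<<<blocks, threads>>>(da, db, dc, n);
        // The device-to-host copy waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);

    // The values are small integers, so exact comparison is safe here.
    for (int i = 0; ok && i < n; ++i) {
        ok = (c[i] == a[i] + b[i]);
    }
    return ok;
}

int main() {
    // 100 blocks x 100 threads = 10,000 threads for 100,000 elements,
    // so each thread processes 10 elements.
    bool ok = run_add(100 * 1000, 100, 100);
    std::printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

To use this from MATLAB, the `add` kernel alone would be compiled to PTX and the length passed as a fourth argument, for example `o = feval(k, a_gpu, b_gpu, c_gpu, N);`. Because `a` and `b` are declared `const` in this sketch, the single output should correspond to `c`. The bounds check `tid < n` also keeps the kernel safe when more threads are launched than there are elements.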

Answers (0)
