PTX kernel time to run

Gaszton, 16 May 2011
Hello, I am using R2010b and CUDA toolkit 3.1 with a GeForce GT 425M. While optimizing my CUDA code, I observed that calling the kernel with feval in MATLAB takes a roughly constant ~2 ms, measured with
tic; feval(k,...); toc
The kernel code:
#define C_WIDTH 1024
#define C_HEIGHT 768
__global__ void timetest1(float* holo) {
    int mindex = blockIdx.x*blockDim.x + threadIdx.x;
    int size = C_WIDTH*C_HEIGHT;
    if (mindex >= size)
        return;
    // Square in float: mindex*mindex would overflow int for mindex > 46340
    holo[mindex] = (float)mindex*(float)mindex;
}
Even if I comment out the write to global memory (the holo[mindex] assignment), the call still takes ~2 ms.
Does anybody know the origin of this lag? It would be great to eliminate it somehow.
Thanks,
Gaszton
PS: my MATLAB code for calling the kernel:
clear
import parallel.gpu.GPUArray
xsize=1024; ysize=768;
vectorsize=xsize*ysize;
threadpblock=1024;
k=parallel.gpu.CUDAKernel('TimeTest.ptx', 'TimeTest.cu');
k.ThreadBlockSize=[threadpblock,1,1];
k.GridSize=[ceil(vectorsize/threadpblock),1];
dholo=parallel.gpu.GPUArray.zeros(vectorsize,1,'single');
tic; dholo=feval(k,dholo); time=toc;
disp(['ms time= ' num2str(time*1000)])
clear

Accepted Answer

Edric Ellis, 16 May 2011
Firstly, can I suggest that, if possible, you upgrade to R2011a, as we made quite a few performance improvements in that release. Secondly, I think the main bottleneck in your code as written is that, outside a function, an important optimisation called "in-place optimisation" cannot take place. If you place your code inside a function, then "dholo" will not be copied on each call. For reference, I made a function like this:
function tmp
import parallel.gpu.GPUArray
xsize=1024; ysize=768;
vectorsize=xsize*ysize;
threadpblock=512; % I have a C1060
k=parallel.gpu.CUDAKernel('TimeTest.ptx', 'TimeTest.cu');
k.ThreadBlockSize=[threadpblock,1,1];
k.GridSize=[ceil(vectorsize/threadpblock),1];
dholo=parallel.gpu.GPUArray.zeros(vectorsize,1,'single');
tic
for ii = 1:1000
    dholo=feval(k,dholo); % same variable as input and output
end
time=toc;
% Total seconds for 1000 calls equals milliseconds per call
disp(['ms time= ' num2str(time)])
With this, the per-call overhead on my C1060 was down to 0.05 ms.
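A note on the measurement itself: a single tic/toc around one feval may also pick up any one-time cost of the first launch, so averaging over many calls after a warm-up call tends to give a steadier per-call figure. Below is a minimal sketch of that idea, assuming the same TimeTest.ptx and TimeTest.cu files as above. The function name timeKernelCall, the nreps argument, and the final gather are illustrative and not from the original posts; the gather is there on the assumption that copying the result back to the host waits for any kernel launches that are still queued.

```matlab
function t_ms = timeKernelCall(nreps)
% Hypothetical helper (not from the thread): average time per feval call
% of the timetest1 kernel, in milliseconds.
xsize = 1024; ysize = 768;
vectorsize = xsize*ysize;
threadpblock = 512;  % assumption: pick a block size your GPU supports
k = parallel.gpu.CUDAKernel('TimeTest.ptx', 'TimeTest.cu');
k.ThreadBlockSize = [threadpblock,1,1];
k.GridSize = [ceil(vectorsize/threadpblock),1];
dholo = parallel.gpu.GPUArray.zeros(vectorsize,1,'single');

dholo = feval(k,dholo);      % warm-up call, excluded from the timing
tic
for ii = 1:nreps
    dholo = feval(k,dholo);  % same variable in and out, inside a function
end
h = gather(dholo); %#ok<NASGU> % assumption: forces queued kernels to finish
t_ms = 1000*toc/nreps;       % seconds in total -> milliseconds per call
```

Calling timeKernelCall(1000) returns the average time per call in milliseconds. The one gather at the end is included in the total, so its cost is spread over all nreps calls.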
1 Comment
Gaszton, 16 May 2011
Thank you for your help!
I am a PhD student at the Biological Research Centre of the Hungarian Academy of Sciences, in Hungary.
We have a network licence (with a limited number of MATLAB instances that can run in parallel).
We usually buy a MATLAB update every 1-2 years, but I don't really have any influence on that.
Thank you again,
Gaszton
