I am implementing an algorithm on the GPU, and the implementation already works. When timing the algorithm, and especially its subroutines, I got very strange results. To understand them better, I repeated the timing for a very simple function. The results surprised me again, so I am asking here for an explanation. Below I post the code and the resulting output, and I hope someone can explain why I am measuring these times.
The code is given by:
clear;
rng('default');
s = rng(5);
disp('##################################################################')
disp('##################################################################')
disp(' apply b(_gpu) once and measure with tic toc and timeit; ')
disp('##################################################################')
disp('##################################################################')
disp('-----------------------------------------------------')
disp('measuring the times for one application of b(_gpu) with tic and toc')
disp('-----------------------------------------------------')
size=200000000;
input=rand(size,1);
b_st=2;
amplitude=4;
periode=5;
b_st_gpu=gpuArray(b_st);
amplitude_gpu=gpuArray(amplitude);
periode_gpu=gpuArray(periode);
b = @(u) b_st+amplitude*sin(u*2*pi/periode);
b_gpu = @(u) b_st_gpu+amplitude_gpu*sin(u*gpuArray(2)*gpuArray(pi)/periode_gpu);
tic
test=b(input);
time_111=toc;
disp(['Time for operation b on CPU is ',num2str(time_111)])
input_gpu_array = gpuArray(input);
tic
test=b(input_gpu_array);
time_112 = toc;
disp(['Time for operation b on GPU is ',num2str(time_112)])
input_gpu_array = gpuArray(input);
tic
test=b_gpu(input_gpu_array);
time_113 = toc;
disp(['Time for operation b_gpu on GPU is ',num2str(time_113)])
disp('-----------------------------------------------------')
disp('measuring the times for one application of b(_gpu) with timeit')
disp('-----------------------------------------------------')
clear;
rng('default');
s = rng(5);
size=200000000;
input=rand(size,1);
b_st=2;
amplitude=4;
periode=5;
b_st_gpu=gpuArray(b_st);
amplitude_gpu=gpuArray(amplitude);
periode_gpu=gpuArray(periode);
b = @(u) b_st+amplitude*sin(u*2*pi/periode);
b_gpu = @(u) b_st_gpu+amplitude_gpu*sin(u*gpuArray(2)*gpuArray(pi)/periode_gpu);
test=b(input);
time_121=timeit(@()b(input));
disp(['Time for operation b on CPU is ',num2str(time_121)])
input_gpu_array = gpuArray(input);
test=b(input_gpu_array);
time_122 = gputimeit(@()b(input_gpu_array));
disp(['Time for operation b on GPU is ',num2str(time_122)])
input_gpu_array = gpuArray(input);
test=b_gpu(input_gpu_array);
time_123 = gputimeit(@()b_gpu(input_gpu_array));
disp(['Time for operation b_gpu on GPU is ',num2str(time_123)])
disp(' ')
disp(' ')
disp('Q1.1: Why is the time measured with gputimeit smaller than with tic toc')
disp('Q1.2: Why is the time on the other hand longer for the CPU when measured')
disp(' with timeit')
disp('Q1.3: Why is b_gpu faster than b on the gpu when measured with tic toc')
disp('Q1.4: Why are b_gpu and b equally fast when measured with gputimeit')
disp(' ')
disp(' ')
disp(' ')
disp(' ')
disp(' ')
disp(' ')
disp('##################################################################')
disp('##################################################################')
disp(' apply b(_gpu) iteratively and measure with tic toc; ')
disp(' important note: the input is overwritten as input = b(input) ')
disp('##################################################################')
disp('##################################################################')
disp('-----------------------------------------------------')
disp('measuring the times for one application of b(_gpu) with tic and toc')
disp('-----------------------------------------------------')
clear;
rng('default');
s = rng(5);
size=200000000;
input=rand(size,1);
b_st=2;
amplitude=4;
periode=5;
b_st_gpu=gpuArray(b_st);
amplitude_gpu=gpuArray(amplitude);
periode_gpu=gpuArray(periode);
b = @(u) b_st+amplitude*sin(u*2*pi/periode);
b_gpu = @(u) b_st_gpu+amplitude_gpu*sin(u*gpuArray(2)*gpuArray(pi)/periode_gpu);
tic
for counter = 1:10
input=b(input);
end
time_211=toc;
disp(['Time for operation 10 times b on CPU is ',num2str(time_211)])
disp(['Time on average for operation b on CPU is ',num2str(time_211/10)])
disp(' ')
input_gpu_array = gpuArray(input);
for counter = 1:10
input_gpu_array=b(input_gpu_array);
end
time_212=toc;
disp(['Time for operation 10 times b on GPU is ',num2str(time_212)])
disp(['Time on average for operation b on GPU is ',num2str(time_212/10)])
disp(' ')
input_gpu_array = gpuArray(input);
for counter = 1:10
input_gpu_array=b_gpu(input_gpu_array);
end
time_213=toc;
disp(['Time for operation 10 times b_gpu on GPU is ',num2str(time_213)])
disp(['Time on average for operation b_gpu on GPU is ',num2str(time_213/10)])
disp(' ')
disp(' ')
disp(' ')
disp('Q2.1: Why is now the CPU faster than the GPU with tic toc?')
disp('Q2.2: Why are now the b and b_gpu equally fast when measured with tic toc?')
disp(' ')
disp(' ')
disp(' ')
disp(' ')
disp(' ')
disp(' ')
disp('##################################################################')
disp('##################################################################')
disp(' apply b(_gpu) iteratively and measure with tic toc; ')
disp('important note: the input is not! overwritten as result = b(input)')
disp('##################################################################')
disp('##################################################################')
disp('-----------------------------------------------------')
disp('measuring the times for one application of b(_gpu) with tic and toc')
disp('-----------------------------------------------------')
clear;
rng('default');
s = rng(5);
size=200000000;
input=rand(size,1);
b_st=2;
amplitude=4;
periode=5;
b_st_gpu=gpuArray(b_st);
amplitude_gpu=gpuArray(amplitude);
periode_gpu=gpuArray(periode);
b = @(u) b_st+amplitude*sin(u*2*pi/periode);
b_gpu = @(u) b_st_gpu+amplitude_gpu*sin(u*gpuArray(2)*gpuArray(pi)/periode_gpu);
tic
for counter = 1:10
result=b(input);
end
time_311=toc;
disp(['Time for operation 10 times b on CPU is ',num2str(time_311)])
disp(['Time on average for operation b on CPU is ',num2str(time_311/10)])
disp(' ')
input_gpu_array = gpuArray(input);
for counter = 1:10
result=b(input_gpu_array);
end
time_312=toc;
disp(['Time for operation 10 times b on GPU is ',num2str(time_312)])
disp(['Time on average for operation b on GPU is ',num2str(time_312/10)])
disp(' ')
input_gpu_array = gpuArray(input);
for counter = 1:10
result=b_gpu(input_gpu_array);
end
time_313=toc;
disp(['Time for operation 10 times b_gpu on GPU is ',num2str(time_313)])
disp(['Time on average for operation b_gpu on GPU is ',num2str(time_313/10)])
disp(' ')
disp(' ')
disp(' ')
disp('Q3.1: Why are all three times now shorter than in 2?')
The corresponding output is:
##################################################################
##################################################################
apply b(_gpu) once and measure with tic toc and timeit;
##################################################################
##################################################################
-----------------------------------------------------
measuring the times for one application of b(_gpu) with tic and toc
-----------------------------------------------------
Time for operation b on CPU is 0.34433
Time for operation b on GPU is 0.14935
Time for operation b_gpu on GPU is 0.093943
-----------------------------------------------------
measuring the times for one application of b(_gpu) with timeit
-----------------------------------------------------
Time for operation b on CPU is 0.42076
Time for operation b on GPU is 0.036677
Time for operation b_gpu on GPU is 0.036649
Q1.1: Why is the time measured with gputimeit smaller than with tic toc
Q1.2: Why is the time on the other hand longer for the CPU when measured
with timeit
Q1.3: Why is b_gpu faster than b on the gpu when measured with tic toc
Q1.4: Why are b_gpu and b equally fast when measured with gputimeit
##################################################################
##################################################################
apply b(_gpu) iteratively and measure with tic toc;
important note: the input is overwritten as input = b(input)
##################################################################
##################################################################
-----------------------------------------------------
measuring the times for one application of b(_gpu) with tic and toc
-----------------------------------------------------
Time for operation 10 times b on CPU is 5.5627
Time on average for operation b on CPU is 0.55627
Time for operation 10 times b on GPU is 6.3557
Time on average for operation b on GPU is 0.63557
Time for operation 10 times b_gpu on GPU is 7.091
Time on average for operation b_gpu on GPU is 0.7091
Q2.1: Why is now the CPU faster than the GPU with tic toc?
Q2.2: Why are now the b and b_gpu equally fast when measured with tic toc?
##################################################################
##################################################################
apply b(_gpu) iteratively and measure with tic toc;
important note: the input is not! overwritten as result = b(input)
##################################################################
##################################################################
-----------------------------------------------------
measuring the times for one application of b(_gpu) with tic and toc
-----------------------------------------------------
Time for operation 10 times b on CPU is 4.2037
Time on average for operation b on CPU is 0.42037
Time for operation 10 times b on GPU is 4.6343
Time on average for operation b on GPU is 0.46343
Time for operation 10 times b_gpu on GPU is 5.008
Time on average for operation b_gpu on GPU is 0.5008
Q3.1: Why are all three times now shorter than in 2?
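
For comparison, here is a minimal sketch of how the loop timings could be set up so that each measurement is self-contained. In the code above, the GPU loops in the second and third parts are not preceded by their own tic, so each toc there reports the time elapsed since the tic before the CPU loop. In addition, gpuArray operations can run asynchronously, so tic toc may stop before the GPU has finished unless wait(gpuDevice) is called first. The array length n, the inlined constants, and the variable names below are assumptions chosen for illustration; this is not a definitive benchmark.

```matlab
% Sketch: give every measurement its own timer and synchronize the GPU.
n = 1e6;                          % assumption: much smaller than the 2e8 used above
x = rand(n, 1);
b = @(u) 2 + 4*sin(u*2*pi/5);     % same formula as b, with the constants inlined

% CPU measurement with its own timer handle
tCpu = tic;
for k = 1:10
    y = b(x);
end
timeCpu = toc(tCpu);

% GPU measurement: separate timer, and wait for the device on both sides
gpu  = gpuDevice;                 % handle to the currently selected GPU
xGpu = gpuArray(x);
wait(gpu);                        % make sure the transfer is done before timing
tGpu = tic;
for k = 1:10
    yGpu = b(xGpu);
end
wait(gpu);                        % block until all queued GPU work has finished
timeGpu = toc(tGpu);

fprintf('CPU: %.4f s per call, GPU: %.4f s per call\n', timeCpu/10, timeGpu/10);
```

Using the handle returned by tic makes it explicit which start time each toc refers to, so a missing tic cannot silently fold an earlier measurement into a later one.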