Help with efficient collection of sub-matrices on GPU
The CPU is much faster than the GPU at extracting sub-matrices from a larger matrix. The rest of this project is computation-heavy and runs far more efficiently on the GPU, but that gain is lost when transferring data to the GPU.
For each object in a frame I need to collect multiple image patches (sub-matrices) and process them.
The function below simulates the patch collection and times it on the CPU and on the GPU. If you don't need to see the whole function, just scroll to the results and summary.
FUNCTION
% Timing the CPU and GPU for the retrieval of image patches at random
% locations
function cpuVSgpu(p,obs)
    % the number of patches I want to retrieve
    numPatches = p;
    numObjects = obs;
    patchSize = 71;
    halfPatch = (patchSize-1)/2;
    % resolution
    res = [720 1280];
    % load image on CPU and GPU and add padding so that no patch will go out of
    % bounds
    I = padarray(rgb2gray(imread('mBallTracking\Frame_1.png')),[35 35],255);
    gpuI = gpuArray(padarray(rgb2gray(imread('mBallTracking\Frame_1.png')),[35 35],255));
    % generate random coordinates on CPU and GPU and pad them
    coords = zeros(numPatches, 2, numObjects);
    for i = 1:numObjects
        Y = randi([1,res(1,1)], numPatches, 1)+halfPatch;
        X = randi([1,res(1,2)], numPatches, 1)+halfPatch;
        coords(:,:,i) = [Y X];
    end
    % allocate for number of patches
    patches = zeros(patchSize, patchSize, numPatches, 'uint8');
    gpuPatches = zeros(patchSize, patchSize, numPatches, 'uint8','gpuArray');
    % timing
    t = nan(2,1);
    t(1) = timeit(@() CPU);
    t(2) = gputimeit(@() GPU);
    row = {'CPU Time:', 'GPU Time:'};
    t = table(t, 'RowNames',row);

    function CPU
        % get coordinates for each object
        for o = 1:numObjects
            Y = coords(:,1,o);
            X = coords(:,2,o);
            % get patches with coords as center point
            for n = 1:numPatches;
                patches(:,:,n) = I(Y(n)-halfPatch:Y(n)+halfPatch,X(n)-halfPatch:X(n)+halfPatch);
            end
        end
    end

    function GPU
        % get coordinates for each object
        for o = 1:numObjects
            Y = coords(:,1,o);
            X = coords(:,2,o);
            % get patches with coords as center point
            for n = 1:numPatches;
                gpuPatches(:,:,n) = gpuI(Y(n)-halfPatch:Y(n)+halfPatch,X(n)-halfPatch:X(n)+halfPatch);
            end
        end
    end

    disp(t);
end
RESULTS
Here are the results (in seconds, as returned by timeit and gputimeit) for collecting 100 patches per object with 5, 50, and 500 objects:
>> cpuVSgpu(100,5)
                   t
                ________
    timeCPU:    0.004164
    timeGPU:    0.094262

>> cpuVSgpu(100,50)
                   t
                ________
    timeCPU:    0.041856
    timeGPU:    0.93428

>> cpuVSgpu(100,500)
                   t
                ________
    timeCPU:    0.41799
    timeGPU:    9.763
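Dividing each timing by the number of patches gathered may make the pattern easier to see. The short script below only re-uses the numbers reported above; the variable names (patchCount, perPatchCPU, perPatchGPU, slowdown) are introduced here for illustration and are not part of the original function.

```matlab
% Timings in seconds, copied from the results above.
numPatches = 100;
numObjects = [5 50 500];
tCPU = [0.004164 0.041856 0.41799];
tGPU = [0.094262 0.93428 9.763];

% Total number of patch-indexing operations in each run.
patchCount = numPatches .* numObjects;

% Average cost of a single patch extraction, and the GPU/CPU ratio.
perPatchCPU = tCPU ./ patchCount;
perPatchGPU = tGPU ./ patchCount;
slowdown = tGPU ./ tCPU;

% One output line per run (fprintf cycles through the matrix column by column).
fprintf('%4d objects: CPU %.2f us/patch, GPU %.2f us/patch, GPU/CPU = %.1f\n', ...
    [numObjects; perPatchCPU*1e6; perPatchGPU*1e6; slowdown]);
```

The per-patch cost stays roughly constant across all three runs (about 8 microseconds on the CPU and about 0.19 milliseconds on the GPU), with the GPU between 22 and 24 times slower. One plausible reading is that the time is dominated by a fixed overhead per indexing call rather than by the amount of data copied, so reducing the number of calls should matter more than shrinking the patches.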
SUMMARY
So essentially this loop is slow when I and Patches are gpuArrays:
for o = 1:numObjects
    Y = coords(:,1,o);
    X = coords(:,2,o);
    % get patches with coords as center point
    for n = 1:numPatches
        Patches(:,:,n) = I(Y(n)-halfPatch:Y(n)+halfPatch,X(n)-halfPatch:X(n)+halfPatch);
    end
end
Is there a way to do this faster?
If I collect the patches on the CPU, I then have to send them to the GPU for computation. Sending data to the GPU every frame kills performance, and so does doing the computations on the CPU. I could use some help, as I'm stuck between a rock and a hard place here.
Thanks in advance!
Answers (1)
Joss Knight
14 July 2016
You need to vectorize your indexing; then it will be efficient on the GPU.
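% Row/column offsets of every pixel in a patch, relative to its center
[offsetX, offsetY] = meshgrid(-halfPatch:halfPatch);
Y = bsxfun(@plus, offsetY, reshape(coords(:,1,:), 1,1,numPatches,numObjects));
X = bsxfun(@plus, offsetX, reshape(coords(:,2,:), 1,1,numPatches,numObjects));
% Use linear indices: I(Y(:),X(:)) would select every row/column combination
idx = Y + (X - 1) * size(I, 1);
Patches = I(idx);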
This gives a patchSize-by-patchSize-by-numPatches-by-numObjects array of patches, gathered in a single indexing operation.
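As a sanity check, the sketch below compares a vectorized gather against the original double loop on a small CPU array. It is a minimal illustration, not the poster's code: the image contents, the 5-by-5 patch size, and the center coordinates (rowCenters, colCenters) are made up, and linear indices are computed by hand as row + (column - 1) * height. The centers are chosen far enough from the border that no padding is needed.

```matlab
% Small stand-in image (20-by-30, uint8); the values are arbitrary.
I = reshape(uint8(mod(0:20*30-1, 256)), 20, 30);
halfPatch = 2;
patchSize = 2*halfPatch + 1;
numPatches = 4;
numObjects = 3;

% Hypothetical patch centers, laid out like coords in the question:
% numPatches-by-2-by-numObjects, with [row, column] in the second dimension.
rowCenters = [3 5 4; 7 9 8; 11 13 12; 18 17 16];
colCenters = [3 10 20; 28 15 6; 9 22 13; 17 4 26];
coords = zeros(numPatches, 2, numObjects);
coords(:,1,:) = reshape(rowCenters, numPatches, 1, numObjects);
coords(:,2,:) = reshape(colCenters, numPatches, 1, numObjects);

% Reference result: one indexing call per patch, as in the question.
ref = zeros(patchSize, patchSize, numPatches, numObjects, 'uint8');
for o = 1:numObjects
    for n = 1:numPatches
        y = coords(n,1,o);
        x = coords(n,2,o);
        ref(:,:,n,o) = I(y-halfPatch:y+halfPatch, x-halfPatch:x+halfPatch);
    end
end

% Vectorized gather: build every row/column subscript, then index once.
[offsetX, offsetY] = meshgrid(-halfPatch:halfPatch);
Y = bsxfun(@plus, offsetY, reshape(coords(:,1,:), 1, 1, numPatches, numObjects));
X = bsxfun(@plus, offsetX, reshape(coords(:,2,:), 1, 1, numPatches, numObjects));
idx = Y + (X - 1) * size(I, 1);   % column-major linear index
patches = I(idx);                 % result takes the shape of idx

disp(isequal(patches, ref));
```

Assuming Parallel Computing Toolbox is available, the same gather should apply when I is a gpuArray (for example I = gpuArray(I) before the last indexing step); whether it closes the gap with the CPU on a given machine would need to be measured with gputimeit.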