Practical k-nearest neighbors implementation with big data set
My data looks like this:
K = 200; % Could go up to 1000 or more
X = cell(1,K);
Y = cell(1,K);
num_of_neighbors = 50; % This is a constant
for j = 1:K
    % We can assume that the number of columns is never bigger than 100
    X{j} = rand(3231961,44); % Yup, it's big: about 3.2 million rows
    Y{j} = rand(323196,44);  % about 323,000 rows
end
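For scale, here is a quick estimate of how much memory the data above occupies in double precision (8 bytes per element, the default class returned by rand). It uses only the sizes from the snippet, with GB meaning 10^9 bytes.

```matlab
% Rough memory footprint of the example data in double precision.
nX = 3231961; nY = 323196; nCols = 44; K = 200;
bytesPerDouble = 8;
xGB = nX * nCols * bytesPerDouble / 1e9;   % ~1.14 GB per X{j}
yGB = nY * nCols * bytesPerDouble / 1e9;   % ~0.11 GB per Y{j}
totalGB = K * (xGB + yGB);                 % ~250 GB for all K cells
fprintf('X{j}: %.2f GB, Y{j}: %.2f GB, all K cells: %.0f GB\n', ...
        xGB, yGB, totalGB);
```

On a 16 GB machine the full X and Y cell arrays for K = 200 cannot be held in memory at once, so in practice each X{j}, Y{j} pair has to be created or loaded one at a time.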
For each j = 1:K, I want to find the 50 nearest neighbors in X{j} of each point in Y{j} (each point is a row). A simple implementation would be:
IDX = cell(1,K);
D = cell(1,K);
for j = 1:K
    [IDX{j},D{j}] = knnsearch(X{j},Y{j},'K',num_of_neighbors);
end
but it is very slow. I'm on Windows 10 with 16 GB of RAM, and here is my GPU information:
>> gpuDevice
ans =
CUDADevice with properties:
Name: 'Quadro M1000M'
Index: 1
ComputeCapability: '5.0'
SupportsDouble: 1
DriverVersion: 8
ToolkitVersion: 7.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
AvailableMemory: 1.6909e+09
MultiprocessorCount: 4
ClockRateKHz: 1071500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
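A rough cost estimate for one iteration, using the sizes from the question and the AvailableMemory value above. It relies on knnsearch's documented default of exhaustive search when X has more than 10 columns; the throughput value is an assumed placeholder, not a measurement of this GPU or CPU.

```matlab
% Back-of-the-envelope cost of ONE knnsearch call from the question.
nX = 3231961; nY = 323196; nCols = 44; K = 200;
availGB   = 1.6909e9 / 1e9;          % AvailableMemory reported by gpuDevice
xDoubleGB = nX * nCols * 8 / 1e9;    % X{j} in double: ~1.14 GB
xSingleGB = nX * nCols * 4 / 1e9;    % X{j} in single: ~0.57 GB

% With more than 10 columns, knnsearch defaults to exhaustive search:
% every row of Y{j} is compared with every row of X{j}.
pairs = nX * nY;                     % ~1.04e12 distances per j
flops = 2 * nCols * pairs;           % ~9.2e13 flops (about 2 per coordinate per pair)

% Hypothetical sustained rate, NOT a measured figure.
assumedFlopsPerSec = 1e11;
minutesPerJ = flops / assumedFlopsPerSec / 60;   % ~15 minutes per j
hoursAllK   = minutesPerJ * K / 60;              % ~51 hours for K = 200
fprintf('X{j}: %.2f GB double, %.2f GB single, %.2f GB free on GPU\n', ...
        xDoubleGB, xSingleGB, availGB);
fprintf('%.2e distance flops per j, ~%.0f min per j at the assumed rate\n', ...
        flops, minutesPerJ);
```

Whatever the real rate is, the total work grows with nX × nY × nCols × K, so more workers or a GPU divide the time but do not change its order of magnitude.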
So I tried using parfor:
parfor j = 1:K
[IDX,D] = knnsearch(X{j},Y{j},'K',num_of_neighbors);
end
But each worker was supposed to receive only X{j} and Y{j}; in fact, because X and Y were treated as broadcast variables, every worker received all of X and Y!!! That's a lot of data :( . I tested this with smaller data, and I also tried packing X into a 3-D array X_new so that X{j} = X_new(:,:,j), so that each worker would know which slice to take. Curiously, this showed no improvement at all, and concatenating every X{j} into one array is not very practical when each X{j} is already large. So I really don't know how to parallelize this code :( .
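One way to keep large data off the client, sketched under assumptions: create or load each X{j} and Y{j} inside the parfor body so that each worker only receives the loop index j and a few scalars. The sizes are scaled down, and the per-j file name in the comment is hypothetical. Without Parallel Computing Toolbox, parfor simply runs as an ordinary serial loop.

```matlab
% Sketch (scaled-down sizes): build or load each data set INSIDE the parfor
% body, so only the loop index j and a few scalars are sent to the workers.
K = 8;                       % number of data sets (200 or more in the question)
num_of_neighbors = 5;        % 50 in the question
nX = 10000; nY = 1000; nCols = 44;
IDX = cell(1, K);            % sliced outputs: iteration j writes only IDX{j}, D{j}
D = cell(1, K);
parfor j = 1:K
    % Stand-in for real data. In practice this might be something like
    % S = load(sprintf('data_%03d.mat', j));  (hypothetical per-j file name)
    Xj = rand(nX, nCols);
    Yj = rand(nY, nCols);
    [idx, dist] = knnsearch(Xj, Yj, 'K', num_of_neighbors);
    IDX{j} = idx;
    D{j} = dist;
end
```

For the full-size problem, collecting every IDX{j} and D{j} on the client would itself be large (each is 323196-by-50, about 129 MB in double), so writing results to disk per j may be preferable. Note that save cannot be called directly inside a parfor body and has to be wrapped in a separate function.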
I also tried converting my data to single precision, but I'm on Windows 10 with only one GPU, and when I ran knnsearch with gpuArray inputs I got an error (CUDA_ERROR_UNKNOWN or similar). Searching online, I found that the cause is this property of the GPU:
KernelExecutionTimeout: 1
In other words, Windows stops any GPU computation that runs too long, so that the GPU stays available for the display. I only need the GPU for data processing, so I wanted to turn off its display role. After some googling, I found that I would have to switch the GPU to Tesla Compute Cluster (TCC) mode. But Windows 10 requires this GPU to drive the display, so to use a GPU for computation only, I would have to install a second GPU and dedicate one of them to computing! Please help me, thank you very much :(
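A sketch of one possible mitigation for the timeout, under assumptions: convert to single, copy X to the GPU once, and query it in small batches of rows of Y so that each knnsearch call is short. batchSize and useGpu are hypothetical knobs, and with useGpu = false the same code runs on the CPU. Whether short batches actually stay under the timeout depends on how knnsearch launches its GPU kernels internally, which this sketch does not verify.

```matlab
% Sketch: query in small batches of rows of Y, in single precision.
% Sizes are scaled down; set useGpu = true only if Parallel Computing
% Toolbox and a supported GPU are available.
nX = 20000; nY = 3000; nCols = 44; k = 50;
X = rand(nX, nCols, 'single');     % single halves memory compared with double
Y = rand(nY, nCols, 'single');
batchSize = 500;                   % hypothetical tuning knob
useGpu = false;

if useGpu
    X = gpuArray(X);               % copy X to the GPU once
end
IDX = zeros(nY, k);
D = zeros(nY, k, 'single');
for s = 1:batchSize:nY
    rows = s:min(s + batchSize - 1, nY);
    Yb = Y(rows, :);
    if useGpu
        Yb = gpuArray(Yb);
    end
    % Each call handles only batchSize query rows, so it is much shorter
    % than one call over all of Y.
    [idx, dist] = knnsearch(X, Yb, 'K', k);
    if useGpu
        idx = gather(idx);         % bring results back to host memory
        dist = gather(dist);
    end
    IDX(rows, :) = idx;
    D(rows, :) = dist;
end
```

Batching does not reduce the total work; it only splits it into shorter calls. The results match a single call over all of Y because each query row is searched independently.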
1 Comment
Joss Knight
30 January 2017
You can turn off TDR (Timeout Detection and Recovery) using the TDR registry keys. Give that a go and see if it helps. But really, the problem is that this is a laptop: even with a superb graphics chip dedicated to compute, you are limited by your laptop's power and cooling capabilities.
Answers (0)