Accuracy problems using mldivide on GPU
古いコメントを表示
Hello all,
I want to solve a system of complex linear equations A*x=B on the GPU. However, when I compare the solution with the results of the same computation on the CPU, there seems to be a relatively large error.
Here is a minimal working example:
rng('default');
%% CPU
A = randn(60,2,60001) + 1i*randn(60,2,60001);
B = randn(60,2,60001) + 1i*randn(60,2,60001);
for k = length(A):-1:1
x_CPU(:,:,k) = A(:,:,k) \ B(:,:,k);
end
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
g_x = pagefun(@mldivide, g_A, g_B);
x_GPU = gather(g_x);
%% comparison
max_error = max(abs((x_GPU - x_CPU) ./ x_CPU), [], 'all'); % relative error
disp(max_error);
On my machine, I get
max_error = 1.0197e+03
which means that the results are unusable.
In a last-ditch attempt I tried using complex() both in gpuArray() and in pagefun(), as described here, but it changes nothing.
I am running Matlab R2021a with the latest updates. GPU driver is also the latest version (527.27).
ans =
CUDADevice with properties:
Name: 'Quadro P2200'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 11.6000
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 5.3685e+09
AvailableMemory: 4.3222e+09
MultiprocessorCount: 10
ClockRateKHz: 1493000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
I would be very grateful for ideas and suggestions on how to solve the problem. Thanks in advance!
1 件のコメント
Bruno Luong
2023 年 1 月 9 日
Again the issue only occurs with A complex and not square.
採用された回答
その他の回答 (2 件)
Joss Knight
2023 年 1 月 7 日
編集済み: Joss Knight
2023 年 1 月 7 日
The answer is not less accurate, it is just as inaccurate. What you have are 60001 massively overdetermined systems generated by completely random data. The least squares solution to this is going to be all over the place, in other words, there will be no good answer. If you look at the relative residuals you'll find there's little difference in the (lack of) success of either CPU or GPU.
%%
rng('default');
%% CPU
A = randn(60,2,60001) + 1i*randn(60,2,60001);
B = randn(60,2,60001) + 1i*randn(60,2,60001);
x_CPU = pagemldivide(A,B);
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
g_x = pagemldivide(g_A, g_B);
x_GPU = gather(g_x);
%% comparison
max_error = max(abs((x_GPU - x_CPU) ./ x_CPU), [], 'all'); % relative error
disp(max_error);
res = abs(sqrt(sum((pagemtimes(A, x_CPU)-B).^2,1)));
g_res = abs(sqrt(sum((pagemtimes(g_A, g_x)-g_B).^2,1)));
rhsNorm = abs(sqrt(sum(B.^2,1)));
relres = reshape(res./rhsNorm, 2, []);
g_relres = reshape(g_res./rhsNorm, 2, []);
average_cpu_relative_residual = mean(relres, 'all')
average_gpu_relative_residual = mean(g_relres, 'all')
std_cpu = std(relres(:))
std_gpu = std(g_relres(:))
To which I got the answers
1.0197e+03
average_cpu_relative_residual =
0.9979
average_gpu_relative_residual =
1.0543
std_cpu =
0.1935
std_gpu =
0.2664
It seems the GPU solution is marginally worse, but given that the input was nonsense anyway I wouldn't read too much into it.
It's possible that you intended for the system matrix A to be square and only B to be 60x2 (i.e. 2 right-hand-sides), in which case with random data you'd get something much better behaved. I got:
1.3056e-11
average_cpu_relative_residual =
1.8911e-14
average_gpu_relative_residual =
1.6029e-14
std_cpu =
3.6059e-14
std_gpu =
3.1088e-14
3 件のコメント
Damian
2023 年 1 月 9 日
Joss Knight
2023 年 1 月 9 日
Well, I'll admit that numerical analysis is not my speciality (at doctorate level, anyway) but I'm fairly sure that 17% residual indicates that the data is still very poorly conditioned. Since it's 2-D data why not plot some of the data along with the best fit lines to get an idea of how poorly the GPU is doing. It may be that you can fix the problem with some basic outlier rejection, which would be sensible anyway if you have some bad outliers.
cublas's batch QR routines may have some issues, of course, with correct handling of poorly conditioned complex systems, so we'll take a look at that. In the meantime you could try using a pseudo-inverse as Matt J shows.
Damian
2023 年 1 月 9 日
Walter Roberson
2023 年 1 月 9 日
編集済み: Matt J
2023 年 1 月 9 日
1 投票
It is a known bug that should be fixed soon that affects page left and right divisions for complex numbers
2 件のコメント
Bruno Luong
2023 年 1 月 9 日
I agree with Walter it looks pretty much similar cause (wrong detection of matrix type and decomposition) for GPU and CPU.
カテゴリ
ヘルプ センター および File Exchange で Mathematics についてさらに検索
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!