The results differ for two related reasons:
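- GPUs and CPUs have quite different architectures (for example, in how and when fused multiply-add instructions are used),
- the CPU and GPU algorithms are slightly different (for example, parallel reductions add terms in a different order than a sequential loop).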
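Both reasons come down to floating-point addition not being associative: changing the order of operations changes the rounding. The sketch below illustrates this in plain Python. It compares a left-to-right sum, which stands in for a simple CPU loop, with a pairwise tree reduction, which stands in for a GPU parallel reduction. The pairwise version is only a simplified model. Real GPU kernels (and optimized CPU libraries) may order the work differently. The helper names `sequential_sum` and `pairwise_sum` are hypothetical, and `math.fsum` serves as the correctly rounded reference.

```python
import math
import random

def sequential_sum(xs):
    # Left-to-right accumulation, as a simple single-threaded CPU loop would do.
    total = 0.0
    for x in xs:
        total += x
    return total

def pairwise_sum(xs):
    # Tree (pairwise) reduction: a simplified model of how a parallel GPU
    # reduction combines partial sums. Real GPU kernels may differ.
    values = list(xs)
    if not values:
        return 0.0
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2 == 1:
            paired.append(values[-1])  # carry the odd element to the next level
        values = paired
    return values[0]

# Smallest example: the same three terms, grouped differently.
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6

# Larger example: both orderings drift slightly from the exact sum.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]
exact = math.fsum(xs)  # correctly rounded reference sum
print(abs(sequential_sum(xs) - exact))
print(abs(pairwise_sum(xs) - exact))
```

Neither ordering is "wrong". Each one rounds at different points, so two correct implementations can disagree in the last few bits.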
On my GPU, I'm getting a max abs error of 1.6584e-15, which is within the expected tolerance and consistent with our error analysis.
As a rule of thumb, for double-precision computations you should (almost always) expect discrepancies between GPU and CPU results on the order of 1e-15 to 1e-14.
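To put this rule of thumb into practice, you can compare a GPU result against a CPU reference using a max-abs-error check. The sketch below assumes values of order one, where the 1e-14 absolute tolerance applies. For much larger or smaller values, a relative tolerance is more appropriate. The names `max_abs_error`, `results_agree`, `cpu_result`, and `gpu_result` are hypothetical, and the sample data is made up. The sketch also shows why these numbers are small in absolute terms. Double-precision machine epsilon is about 2.2e-16, so 1e-15 is only a few units in the last place for values near 1.

```python
import sys

def max_abs_error(reference, candidate):
    # Largest element-wise absolute difference between two equal-length sequences.
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)

def results_agree(reference, candidate, atol=1e-14):
    # atol follows the rule of thumb for double-precision values of order one
    # (an assumption); scale it if your values are much larger or smaller.
    return max_abs_error(reference, candidate) <= atol

# Double-precision machine epsilon: the gap between 1.0 and the next double.
eps = sys.float_info.epsilon
print(eps)  # 2.220446049250313e-16
# Roughly how many epsilons a 1e-15 discrepancy amounts to near 1.0.
print(1e-15 / eps)

# Hypothetical CPU and GPU outputs that differ only by rounding.
cpu_result = [0.6, 1.0, 2.5]
gpu_result = [(0.1 + 0.2) + 0.3, 1.0, 2.5]
print(max_abs_error(cpu_result, gpu_result))
print(results_agree(cpu_result, gpu_result))  # True
```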