It depends on what your computation looks like. The GPU can accelerate many common vectorized operations: elementwise arithmetic such as A.*B, A+B, and A./B can be made significantly faster, provided A and B stay resident on the GPU across all iterations. However, if you have to transfer data back from the GPU to the host at every iteration, the communication overhead can easily outweigh the computational gains and make the whole exercise not worthwhile.
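
As a minimal sketch (assuming MATLAB's Parallel Computing Toolbox and a supported GPU; the array size and iteration count are just placeholders), here is the fast pattern contrasted with the slow one:

    % Fast pattern: put the data on the GPU once, keep it there.
    A = rand(4096, 'gpuArray');      % create A directly on the GPU
    B = rand(4096, 'gpuArray');      % create B directly on the GPU
    for k = 1:100
        A = A .* B + A ./ (B + 1);   % elementwise ops run on-device
    end
    result = gather(A);              % one host transfer, at the very end

    % Slow anti-pattern: gather() inside the loop forces a GPU-to-host
    % transfer on every iteration, which can swamp any speedup:
    % for k = 1:100
    %     A = A .* B;
    %     hostA = gather(A);         % per-iteration transfer: avoid this
    % end

The rule of thumb is to transfer inputs to the GPU once, do as much work as possible on-device, and gather only the final result.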