Upgrading PC: tips for finding bottlenecks?

Hello,
We're currently running a simulation (of a grid of resistors). The main portion of it is solving a large (~1 million × 1 million or bigger) sparse symmetric positive definite (SPD) system with a preconditioned conjugate gradient (PCG) method, many times (once per time step).
Is there a way to check whether memory bandwidth is an issue? We're currently looking at an i7 6700K, but that would restrict us to dual-channel RAM. I think this is probably fine, but it would be nice to confirm before we buy any hardware.
Also, MATLAB's sparse library requires double precision. However, GPUs seem to be optimized for single precision. When MATLAB does GPU calculations, does it use pseudo double precision (which would incur a slowdown of roughly 1/32)?
In addition, is there a way to check whether the PCG method would work in single precision? It would introduce error, but we'd like to know whether the amount is acceptable. If the error is too large, we'd have to stick to CPU calculations instead of the GPU. From what I've found online, most sparse libraries use double precision, and forcing single precision is at your own risk.
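For reference, here is a sketch of the kind of experiment we have in mind: keep the sparse solve in double (since MATLAB requires it) but round every matrix-vector product through single via a function handle, then compare against the full double-precision solve. The matrix here is a small SPD stand-in, not our real resistor-grid matrix:

```matlab
% Emulate a single-precision matvec inside pcg by rounding through single
% (MATLAB sparse matrices must stay double), then compare with the
% reference double-precision solve.  Small SPD stand-in matrix only.
n = 2000;
A = gallery('tridiag', n, -1, 2, -1);   % sparse SPD test matrix
b = ones(n, 1);
tol = 1e-6;  maxit = 500;
M = diag(diag(A));                      % Jacobi preconditioner

[xd, fl1] = pcg(A, b, tol, maxit, M);   % reference double-precision run

afun = @(x) double(single(A * double(single(x))));  % matvec rounded to single
[xs, fl2] = pcg(afun, b, tol, maxit, M);

fprintf('relative difference: %g\n', norm(xd - xs) / norm(xd));
```

This only rounds the matvec rather than running every operation in single, so it gives a rough lower bound on the error, not a guarantee.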

1 Answer

Walter Roberson on 28 Nov 2016

1 vote

"When MATLAB is doing GPU calculations, does it use pseudo double precision?"
No. The Parallel Computing Toolbox requires GPUs with enough compute capability to handle double themselves. The precision it uses for a GPU operation is whatever precision is associated with the data to be processed. If your GPU only does double precision slowly then the result will be slow. MATLAB makes no attempt to emulate double precision with single precision.
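For example (illustrative only; requires Parallel Computing Toolbox and a supported CUDA device):

```matlab
% gpuArray arithmetic runs in whatever precision the data carries:
Ad = gpuArray(rand(1000));            % double-precision data on the GPU
As = gpuArray(single(rand(1000)));    % single-precision data on the GPU
classUnderlying(Ad * Ad)              % 'double' -- computed in double
classUnderlying(As * As)              % 'single' -- computed in single
```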
"However, when looking at GPUs, it seems that they're optimized for single precision."
As you are looking at getting a new system with a GPU, you should consider one of the new Pascal-architecture cards, which offer much higher double-precision performance than previous architectures did. The new series is available to some supercomputer and academic sites in the USA, and is expected to be released to the public in January.
Caution: there are some operations that the Pascal architecture library does not handle properly yet due to bugs on the NVIDIA side. One of the major affected algorithms is Convolutional Neural Networks. I have no information about whether PCG works on Pascal.
Looking at https://www.mathworks.com/matlabcentral/answers/285134-preconditioning-algorithm-on-gpu-for-solution-of-sparse-matrices and following the links to http://docs.nvidia.com/cuda/cusparse/#cusparse-lt-t-gt-csrsmsolve , I see that NVIDIA implements cuSPARSE for single precision as well as double precision.

5 Comments

Walter Roberson on 28 Nov 2016
Joshua, which OS are you using?
For Linux I see a memory bandwidth tool at https://panthema.net/2013/pmbw/ . I tried to compile it on OS X but ran into a series of problems.
I also see http://zsmith.co/bandwidth.html which lists multiple operating systems. However, the project page for it seems to be missing.
Joshua Lauzier on 29 Nov 2016
Hi Walter, thanks for the help.
We're only building the PC for in-group use, and we don't have access to a proper supercomputing facility. Budget-wise, we're stuck in the ~$1-1.5k range. The only options in that range (if we need double precision) seem to be the older NVIDIA Titan Black/Titan Z, which seem to be impossible to find new (I'm a bit hesitant to buy one secondhand off Amazon).
We may just end up getting one of the new NVIDIA Titan X cards with the Pascal architecture, but we didn't want to take the double-precision hit, since NVIDIA has dropped the option the Black/Z had to mitigate the performance loss (it seems they're trying to push professionals to the Tesla line, but even the lowliest Tesla is well outside our budget).
Joshua Lauzier on 29 Nov 2016 (edited)
We're currently using Windows, but I can probably switch to Linux without too much trouble. There's no big preference; I assumed (possibly wrongly) that Windows support/optimization for MATLAB would be better simply due to the larger userbase.
Walter Roberson on 29 Nov 2016
Working notes; specs put together from various sources:
  • Titan X - Maxwell. 200 Gflop FP64. GM200 GPU
  • Titan Z - Kepler. 2700 Gflop FP64
  • Titan Black - Kepler. 1707 Gflop FP64
  • Titan X (Pascal) - Pascal. GP102 GPU. FP32 = 11 teraflop, FP64 = 1/32 FP32, so should be about 330 - 343 Gflop
  • Tesla P100 - Pascal. FP64 = 4.7 teraflop
"While the Titan X [Pascal] has 40 percent more cores, 50 percent more memory and significantly more memory bandwidth than a GTX 1080, its clock speed is lower and it is more limited by power and thermals, all of which eats into its advantage. "
Walter Roberson
Walter Roberson 2016 年 11 月 29 日
So, uh, yes, the Titan Z (Kepler) or Titan Black (Kepler) would beat the Titan X Pascal handily for FP64.
Looking at these specs and the prices, it almost looks as if the most cost-effective route would be dual slower cards, like two of the older Titan X Maxwell GM200 cards. The disadvantage is the need to partition the work between the two GPUs. Hmmm, possibly your particular application is not suited for that. If you could distribute the work, then a tower of $200 graphics cards might be the most FP64 for the dollar. (Only one tower; after that you would need MDCS licenses.)
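A rough, untested sketch of that partitioning, assuming the right-hand sides (or time steps) can be solved independently; `Agpu`, `rhs`, `numSystems`, `tol`, and `maxit` are hypothetical placeholders for your data (requires Parallel Computing Toolbox):

```matlab
% One pool worker per GPU; each worker solves its round-robin share of
% independent systems on its own device.
parpool(gpuDeviceCount);
spmd
    gpuDevice(labindex);                  % worker k computes on GPU k
    for k = labindex:numlabs:numSystems   % round-robin split of the solves
        sol{k} = pcg(Agpu, rhs{k}, tol, maxit);
    end
end
```

After the `spmd` block, `sol` is a Composite and the pieces would need to be gathered on the client. If successive time steps depend on each other, this split does not apply directly.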


