Reproducibility of convolutional neural network training with GPU
Hello,
I am training a CNN on my local GPU (to speed up training) for a classification problem, and I would like to try different parameterizations. To avoid variability due to different data and/or weight initialization, I reset the random seeds each time before training:
% Initialize the random seeds so that the same dataset on the same
% architecture leads to a predictable result
rng(0);
% parallel.gpu.rng(0, 'CombRecursive');  % equivalent one-line alternative
randStream = parallel.gpu.RandStream('CombRecursive', 'Seed', 0);
parallel.gpu.RandStream.setGlobalStream(randStream);
% Train the CNN
net = trainNetwork(TR.data, TR.reference, layers, options);
The problem is that when using the GPU I get different results on each execution, even though I initialize the GPU random seed to the same value. The strange thing is that if I use the CPU instead, I do get reproducible results. Am I doing something wrong with the GPU random seed initialization? Is there a known problem in this situation, or something I am missing?
Thanks beforehand.
PS: I am using MATLAB R2017b
Accepted Answer
Joss Knight
20 September 2018
Use of the GPU has non-deterministic behaviour. You cannot guarantee identical results when training your network, because the results depend on the whims of floating-point precision and on parallel computations of the form (a + b) + c ~= a + (b + c).
Most of our GPU algorithms are in fact deterministic but a few are not, for instance, backward convolution.
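The non-associativity mentioned above is easy to demonstrate directly. Below is a minimal sketch (in Python rather than MATLAB, purely for illustration; the same effect occurs in any IEEE 754 floating-point arithmetic): when partial sums are combined in a different order, as happens in a parallel reduction, the rounded result can differ.

```python
# Floating-point addition is not associative: a small term can be
# absorbed by a large one depending on the order of operations.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # exact cancellation first, then add 1.0
right = a + (b + c)  # 1.0 is lost against -1e16 before the cancellation

print(left)   # 1.0
print(right)  # 0.0
```

A parallel GPU reduction may combine terms in either order from run to run, which is exactly why bitwise-identical results cannot be guaranteed.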
14 comments
Very interesting and good to know! Thank you.
I am encountering the same issue, and I am very surprised and, I should say, very disappointed by MathWorks: as a MATLAB user since version 3.5, I cannot imagine that people developing software can accept their code not being reproducible. It's a joke! MathWorks has to correct this bug or propose a solution to customers: what about moving single-precision GPU code to double precision, now that this is available? (And you claim it comes from the whims of floating-point precision.)
Can you let us know what non-deterministic behaviour you're experiencing, specifically? As far as I'm aware, deep learning training is the only place this happens, and that particular behaviour is true across all the deep learning frameworks, because they use the same underlying NVIDIA library that has this behaviour. Maybe there is some randomness in your particular application that we're missing?
Hello,
@Joss Knight (or any other MathWorks staff member), my colleague referred to this link and said that it is now possible to achieve deterministic results in TensorFlow for deep learning algorithms on the GPU.
Is this something that MATLAB will be / is able to implement in the near future?
Thanks,
Barry
Joss Knight
3 September 2020
Edited: Joss Knight, 3 September 2020
I believe we have a plan to add support for deterministic training in a future release. As I said, as far as I know backward convolution and backward max-pooling are the only sources of non-determinism (other than certain kinds of parallel training), which means the problem is limited to training a deep network. If you know of other sources, let me know.
@Joss Knight Repeatability and reproducibility are extremely important. How can someone even consider using MATLAB deep learning software for serious science if repeating the experiment yields slightly different results every time? I hope the plan to add deterministic behaviour in a future release happens sooner rather than later. It's unfortunate that this was not made a priority in the 2021 release.
People use TensorFlow and PyTorch all the time for serious science, and they have the exact same issue, so I guess people don't consider it that bad a problem. You should only see this indeterminism during training, which is typically initialized with random numbers anyway.
Aled Catherall
4 February 2022
Edited: Aled Catherall, 4 February 2022
@Joss Knight - Has progress been made on fixing this issue? The lack of deterministic, repeatable training is proving to be quite a problem for some applications. For example, when I make a small change to the input data or the network, I want to know that differences in my results are due to the changes I have made and not to the vagaries of non-deterministic floating-point arithmetic. An update on this issue would be welcome, thanks.
Also, please note that you shouldn't be using the term "random numbers" but rather "pseudorandom numbers", since they are generated by MATLAB from a deterministic algorithm and not from a stochastic process (like nuclear decay).
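The distinction above is exactly why seeded CPU runs reproduce: a pseudorandom generator replays an identical sequence from the same seed. A small sketch (in Python's `random` module for illustration; MATLAB's `rng` behaves analogously):

```python
import random

# A seeded pseudorandom generator is a deterministic algorithm:
# reseeding with the same value replays the identical sequence.
random.seed(0)
first = [random.random() for _ in range(3)]

random.seed(0)
second = [random.random() for _ in range(3)]

print(first == second)  # True: same seed, same sequence
```

So the seed is not the problem in the original question; the divergence comes from the GPU's non-deterministic reduction order, not from the generator.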
We are working on a solution and will let you know when it lands!
Joss Knight: I'm looking forward to seeing it soon. Please hurry
@Joss Knight, can you perhaps link some references that say that backward convolution and backward max pooling are non-deterministic?
Hamza
20 November 2023
@Joss Knight have you found a solution?
I am also facing the same problem.
More Answers (0)