trainnet gives training loss is NaN

Question

Al, July 11, 2024

https://jp.mathworks.com/matlabcentral/answers/2136373-trainnet-gives-training-loss-is-nan

Answered: BIPIN SAMUEL, September 6, 2024

Hello, I'm currently working on semantic segmentation with a U-Net architecture in MATLAB. Using R2024a, I tried to train my model with the trainnet function, but after I ran my script, it gave me this result in the Command Window.

I tried changing MiniBatchSize and MaxEpochs, but neither helped. It seems as if the training never happened, because my GPU doesn't show any activity. Does anyone know how to resolve this? Is MATLAB R2024a still buggy, and is that why this happens?

Answer 1

Maneet Kaur Bagga, July 11, 2024

https://jp.mathworks.com/matlabcentral/answers/2136373-trainnet-gives-training-loss-is-nan#answer_1484113

Hi Aliya,

I understand that you are encountering an issue where the training loss is NaN, which causes training to stop. To debug the issue, please try the following steps:

Enable verbose output in your training options to get more detailed information about each training step.
Verify that your MATLAB installation is correctly configured to use the GPU. You can use the following command to check the status of your GPU.

gpuDevice
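As a small extension of the check above, here is a minimal sketch that picks an execution environment based on what MATLAB can actually see. The use of canUseGPU and the variable name executionEnvironment are assumptions added for illustration; they are not part of the original answer.

```matlab
% Decide which execution environment to request, based on what MATLAB can see.
% (executionEnvironment is a hypothetical variable name used for illustration.)
if canUseGPU
    gpu = gpuDevice;  % selects and queries the default GPU
    fprintf("GPU found: %s (%.1f GB available)\n", gpu.Name, gpu.AvailableMemory / 1e9);
    executionEnvironment = "gpu";
else
    disp("No supported GPU is available to MATLAB; training would run on the CPU.");
    executionEnvironment = "cpu";
end
```

The resulting value could then be passed as the 'ExecutionEnvironment' training option, so a missing or unsupported GPU shows up as an explicit message rather than as silent inactivity.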

Please refer to the following code snippet to set your training options:

options = trainingOptions('adam', ...
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 50, ...
    'MiniBatchSize', 16, ...
    'Plots', 'training-progress', ...
    'Verbose', true, ...
    'ExecutionEnvironment', 'gpu');  % Ensure you specify 'gpu' if you have a compatible GPU

I hope this helps!

Answer 2

Shreeya, July 11, 2024

https://jp.mathworks.com/matlabcentral/answers/2136373-trainnet-gives-training-loss-is-nan#answer_1484118

Hey Aliya

I see that you are not able to train your U-Net model because of a NaN loss. There are a few troubleshooting methods you can try:

Try varying the learning rate on a smaller dataset to determine whether or not it is the root cause.
If the image size is very large, you may need a bigger network for the model to converge.
I also came across an interesting take on the use of PNG images for model training. The claim is that PNG images can have layers, that only the last layer may be used during training, and that the model therefore learns nothing about the features of the rest of the image.
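One way to look into the PNG point is to inspect what imread actually returns for a file. The sketch below is only an illustration under stated assumptions: it writes a small synthetic PNG with an alpha channel to a temporary file (the file and its contents are made up for the example) and then reports the number of color channels and whether transparency data is present.

```matlab
% Write a small synthetic RGB PNG with an alpha channel to a temporary file.
pngFile = [tempname '.png'];
rgb = rand(4, 4, 3);
imwrite(rgb, pngFile, 'Alpha', 0.5 * ones(4, 4));

% Read it back and inspect what a training pipeline would receive.
[img, ~, alpha] = imread(pngFile);
numChannels = size(img, 3);
hasAlpha = ~isempty(alpha);
fprintf("Image size: %s, class: %s\n", mat2str(size(img)), class(img));
fprintf("Color channels: %d, alpha channel present: %d\n", numChannels, hasAlpha);

delete(pngFile);  % clean up the temporary file
```

Running the same imread call on one of your own training images and one of your label images shows whether their size, class, and channel count match what the network's input layer expects.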

I'm also linking a few threads which you can refer to, apart from these suggestions:

https://www.mathworks.com/matlabcentral/answers/337587-how-to-avoid-nan-in-the-mini-batch-loss-from-traning-convolutional-neural-network

https://www.mathworks.com/matlabcentral/answers/1917165-training-loss-is-nan-deep-learning

Answer 3

Joss Knight, July 11, 2024

https://jp.mathworks.com/matlabcentral/answers/2136373-trainnet-gives-training-loss-is-nan#answer_1484433

Edited: Joss Knight, July 11, 2024

Do your network weights contain NaNs? Try this:

nansInMyNetwork = ~(all(cellfun(@allfinite, net.Learnables.Value)) && all(cellfun(@allfinite, net.State.Value)))

You might also want to check the Variance on any batch normalization layers to make sure none of the values are negative.
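The variance check could be sketched roughly as follows. This assumes the state table has Layer, Parameter, and Value columns and that batch normalization layers store their variance under the parameter name "TrainedVariance". To keep the example self-contained it uses a mock table with made-up values; in practice you would use net.State from your own network instead.

```matlab
% Mock table with the same columns as net.State (values are made up).
% In practice: state = net.State;
state = table(["bn1"; "bn1"], ["TrainedMean"; "TrainedVariance"], ...
    {[0.1 0.2 0.3]; [1.0 -0.5 2.0]}, ...
    'VariableNames', {'Layer', 'Parameter', 'Value'});

isVariance = state.Parameter == "TrainedVariance";
badLayers = strings(0, 1);
for k = find(isVariance)'
    v = state.Value{k};
    if isa(v, "dlarray")
        v = extractdata(v);   % unwrap dlarray values from a real network
    end
    if any(gather(v) < 0, "all")
        badLayers(end+1, 1) = state.Layer(k); %#ok<SAGROW>
    end
end
disp("Layers with negative variance entries:");
disp(badLayers);
```

An empty result means no negative variance values were found in the state table.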

Answer 4

Matt J, August 6, 2024

https://jp.mathworks.com/matlabcentral/answers/2136373-trainnet-gives-training-loss-is-nan#answer_1494991

You need to provide more information about what you did, e.g., the training code. However, I can hazard a guess: I imagine the problem occurs when you use the 'sgd' training algorithm, but it might not occur if you use 'adam' instead.
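To test this guess, one option is to build two sets of training options that differ only in the solver and compare the resulting runs. The sketch below assumes that 'sgd' refers to the 'sgdm' solver name accepted by trainingOptions (stochastic gradient descent with momentum); the learning rate and epoch count are placeholder values, not settings taken from the question.

```matlab
% Two option sets that differ only in the solver (placeholder hyperparameters).
optionsSgdm = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 5, ...
    'Verbose', true);

optionsAdam = trainingOptions('adam', ...
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 5, ...
    'Verbose', true);

% Each set would then be passed to trainnet in turn, for example:
%   netSgdm = trainnet(trainingData, net, lossFcn, optionsSgdm);
%   netAdam = trainnet(trainingData, net, lossFcn, optionsAdam);
% where trainingData, net, and lossFcn are your own (hypothetical names here).
```

If the loss is NaN with one solver but finite with the other, that narrows the cause to the optimizer settings rather than the data or the network definition.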

Answer 5

BIPIN SAMUEL, September 6, 2024

https://jp.mathworks.com/matlabcentral/answers/2136373-trainnet-gives-training-loss-is-nan#answer_1511779

Hi @Al,

Have you used any custom layers in your network? Maybe the learnable parameters of the layer are not initialized properly, so that the loss (a scalar value) calculated during the first iteration is very large. Also, can you mention which loss function you used?
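A quick way to spot a badly initialized parameter is to summarize the magnitude of every learnable before training. The following is a minimal sketch, assuming the learnables table has Layer, Parameter, and Value columns. It uses a mock table so that it runs on its own: the layer name "custom", the parameter name "Scale", and all values are hypothetical. In practice you would use net.Learnables.

```matlab
% Mock table with the same columns as net.Learnables (names and values are made up).
% In practice: learnables = net.Learnables;
learnables = table(["conv1"; "conv1"; "custom"], ["Weights"; "Bias"; "Scale"], ...
    {0.05 * randn(3, 3, 1, 4); zeros(1, 1, 4); 1e8 * ones(4, 1)}, ...
    'VariableNames', {'Layer', 'Parameter', 'Value'});

n = height(learnables);
maxAbs = zeros(n, 1);
allFiniteFlag = false(n, 1);
for k = 1:n
    v = learnables.Value{k};
    if isa(v, "dlarray")
        v = extractdata(v);   % unwrap dlarray values from a real network
    end
    v = double(gather(v));
    maxAbs(k) = max(abs(v), [], "all");
    allFiniteFlag(k) = all(isfinite(v), "all");
end

summary = table(learnables.Layer, learnables.Parameter, maxAbs, allFiniteFlag, ...
    'VariableNames', {'Layer', 'Parameter', 'MaxAbs', 'AllFinite'});
disp(sortrows(summary, 'MaxAbs', 'descend'));
```

Parameters whose largest absolute value is many orders of magnitude above the rest, or that contain non-finite entries, are the first candidates to inspect in a custom layer's initialization code.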
