Error using parallel-cpu with trainnet function

Ramiro on 21 Nov 2024 at 21:34
Commented: Ramiro on 22 Nov 2024 at 13:00
Hi, I got this error while training a neural network with the trainnet function. The first training run worked fine, but when I added validation data I got an error. I then tried to retrain the network without validation data and got this error:
Error detected on worker 3.
net = train(trainer, net, mbq);
Error in trainnet (line 42)
[net,info] = deep.internal.train.trainnet(mbq, net, loss, options, ...
Caused by:
Out of Memory during deserialization
I have deleted the cluster's jobs, but that did not help.
Can anybody help me solve this issue?
  2 Comments
Saurabh on 22 Nov 2024 at 5:04
Hi @Ramiro,
Could you please share details about the neural network and the memory allocated to each worker? The minimum required memory is 4 GB, with 8 GB recommended. To optimize performance, consider increasing the memory available to each worker, which can typically be achieved by running fewer workers per compute node.
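A minimal sketch of the "fewer workers" suggestion, assuming a local process-based pool under the default "Processes" profile. The worker count and the training option values below are placeholders for illustration, not values taken from the attached code:

```matlab
% Shut down any existing pool so the new worker count takes effect.
delete(gcp("nocreate"));

% Assumed value: fewer workers leaves more memory for each one.
numWorkers = 2;
pool = parpool("Processes", numWorkers);

% trainnet uses the pool that is already open when the execution
% environment is a parallel one.
options = trainingOptions("adam", ...
    ExecutionEnvironment="parallel-cpu", ...
    MiniBatchSize=32, ...   % placeholder value
    MaxEpochs=5);           % placeholder value

% net, mbq, and loss are the variables from the original script:
% net = trainnet(mbq, net, loss, options);
```

If the error goes away with a smaller pool, the worker count can be raised step by step until memory becomes the limit again.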
Ramiro on 22 Nov 2024 at 13:00
Hi @Saurabh, I have attached the source code files.
Regards
Ramiro


Answers (0)

Release

R2024a
