Why is data discarded in the shuffle operation when training a deep network?
When training a deep learning network, if the batch size does not evenly divide the number of training samples, the training data that does not fit into the final batch of each epoch is discarded. Why does this limitation exist? Why is part of the training data discarded?
Setting the shuffle training option to "every-epoch" does not prevent discarding data; it just avoids discarding the same data every epoch.
Answers (1)
Aravind
30 Jan 2025
When training deep learning networks in MATLAB, if the batch size does not evenly divide the number of training samples, the leftover data that cannot fill a complete batch at the end of each epoch will be discarded. This is explained in the documentation here: https://www.mathworks.com/help/releases/R2022a/deeplearning/ref/trainingoptions.html#d123e146068. Under the “Shuffle” option, it is recommended to set the value to “every-epoch” to avoid discarding the same data each time.
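To make the effect concrete, here is a small illustrative calculation (the sample counts are made up for the example) showing how many observations are dropped per epoch when the batch size does not divide the dataset evenly:

```matlab
% Illustrative example: 1000 training observations, mini-batch size 128.
numObs        = 1000;
miniBatchSize = 128;

numFullBatches = floor(numObs / miniBatchSize);   % 7 full batches per epoch
numDiscarded   = mod(numObs, miniBatchSize);      % 104 observations discarded per epoch
```

With "every-epoch" shuffling, a different subset of 104 observations is dropped each epoch, but the count stays the same.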
Here are some reasons for this behavior:
- Batch processing consistency: MATLAB's deep learning framework, like most others, is optimized for batches of a consistent size, which improves computational efficiency and makes full use of parallel processing capabilities, particularly on GPUs.
- Gradient estimation stability: an occasional smaller batch yields a gradient estimate with higher variance, which can destabilize convergence during training and lead to less reliable learning outcomes.
Discarding the remainder therefore trades a small fraction of the data each epoch for computational efficiency and stable training.
To ensure no data is discarded, you can use a custom training loop. Define a "minibatchqueue" object over your input data to create mini-batches, and set its "PartialMiniBatch" option to "return". Then, even if the number of observations is not divisible by the mini-batch size, no data is lost: the final mini-batch simply contains fewer observations. You can find more information in the "minibatchqueue" documentation, and this example shows how to train a network using a custom training loop: https://www.mathworks.com/help/releases/R2022a/deeplearning/ug/train-network-using-custom-training-loop.html.
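As a minimal sketch of that setup (the variable names XTrain and YTrain and the batch size are illustrative, not from the original post):

```matlab
% Sketch: build a minibatchqueue that returns, rather than discards,
% the final partial mini-batch. Assumes XTrain is a 4-D image array
% (H x W x C x N) and YTrain is a categorical label vector.
dsX = arrayDatastore(XTrain, IterationDimension=4);
dsY = arrayDatastore(YTrain);
ds  = combine(dsX, dsY);

mbq = minibatchqueue(ds, ...
    MiniBatchSize=128, ...
    PartialMiniBatch="return", ...      % keep the last, smaller batch
    MiniBatchFormat=["SSCB" ""]);       % spatial-spatial-channel-batch

while hasdata(mbq)
    [X, Y] = next(mbq);
    % ... custom training step: forward pass, loss, gradients, update ...
end
```

With PartialMiniBatch="return", the last call to next in each epoch may yield a batch smaller than 128, so every observation contributes to training.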
I hope this answers your question.