Advantage normalization for PPO Agent
When working with PPO agents, it is possible to set "NormalizedAdvantageMethod" to normalize the advantage function values for each mini-batch of experiences. The default value is "none".
While I can intuitively see that such a normalization may help reduce variance, I could not find any reference online that describes in sufficient detail when and why this procedure is useful. My questions are:
1) Under which circumstances does the normalization of advantage function values turn out to be practically beneficial?
2) If I decide to normalize the advantage function values, are there situations where the "moving" option (which uses a restricted number of samples) can be more beneficial than the "current" option (which uses all of the currently available samples)? Intuitively, I would say that the "current" option should always perform better.
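To make the comparison in question 2 concrete, here is a minimal Python sketch of the two normalization styles. It is not the toolbox's implementation. It assumes that "current" standardizes a mini-batch with that batch's own mean and standard deviation, and that "moving" standardizes it with statistics taken over a window of recent advantage values. The names `normalize_current`, `MovingNormalizer`, `window`, and `EPS` are hypothetical and chosen only for illustration.

```python
from collections import deque
from statistics import fmean, pstdev

EPS = 1e-8  # small constant to avoid division by zero (assumed, not a toolbox value)


def normalize_current(advantages):
    """Standardize a mini-batch using only its own mean and standard deviation."""
    mu = fmean(advantages)
    sigma = pstdev(advantages)
    return [(a - mu) / (sigma + EPS) for a in advantages]


class MovingNormalizer:
    """Standardize a mini-batch using statistics over a window of recent values."""

    def __init__(self, window):
        # Keeps at most `window` of the most recent advantage values.
        self.buffer = deque(maxlen=window)

    def normalize(self, advantages):
        self.buffer.extend(advantages)
        mu = fmean(self.buffer)
        sigma = pstdev(self.buffer)
        return [(a - mu) / (sigma + EPS) for a in advantages]


# Two mini-batches with the same spread but very different levels.
batch_a = [1.0, 2.0, 3.0, 4.0]
batch_b = [101.0, 102.0, 103.0, 104.0]

cur_a = normalize_current(batch_a)
cur_b = normalize_current(batch_b)

moving = MovingNormalizer(window=8)
mov_a = moving.normalize(batch_a)
mov_b = moving.normalize(batch_b)
```

Under these assumptions, per-batch normalization maps both batches to the same values, so the fact that the second batch is much better than the first is lost. The windowed version keeps every value of the second batch positive, because they are compared against the recent history rather than only against each other. Whether that extra context helps or hurts in practice is exactly what the question asks.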