Episode Q0 increases exponentially
Can anyone explain why the Episode Q0 value in reinforcement learning training increases exponentially after the reward has converged to a suboptimal policy?

Answers (1)
Emmanouil Tzorakoleftherakis
16 Feb 2021
Hello,
Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
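As a rough illustration of the normalization idea, here is a minimal, toolbox-independent sketch in plain Python. Everything in it (`RunningNormalizer`, `scale_action`, `scale_reward`, and the fixed reward scale) is a hypothetical helper made up for this example, not part of Reinforcement Learning Toolbox. The assumption is that keeping observations near zero mean and unit variance, rewards small in magnitude, and actions in [-1, 1] gives the critic targets that are easier to fit, which may make a runaway Q0 estimate less likely.

```python
import math


class RunningNormalizer:
    """Running per-dimension mean and variance (Welford's algorithm)."""

    def __init__(self, size, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations
        self.eps = eps          # guards against division by zero

    def update(self, x):
        """Fold one raw observation into the running statistics."""
        self.count += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (xi - self.mean[i])

    def normalize(self, x):
        """Return x shifted to zero mean and scaled to unit variance."""
        if self.count < 2:
            # Not enough samples for a variance estimate: only center.
            return [xi - m for xi, m in zip(x, self.mean)]
        out = []
        for xi, m, m2 in zip(x, self.mean, self.m2):
            std = math.sqrt(m2 / (self.count - 1))
            out.append((xi - m) / (std + self.eps))
        return out


def scale_action(a, low, high):
    """Map a normalized action in [-1, 1] to the physical range [low, high]."""
    return low + (a + 1.0) * 0.5 * (high - low)


def scale_reward(r, scale):
    """Divide the raw reward by a fixed, hand-picked scale (an assumption)."""
    return r / scale
```

In use, you would call `update` and then `normalize` on each observation before passing it to the agent, let the actor output values in [-1, 1] and convert them with `scale_action` inside the environment, and pick `scale` so that typical per-step rewards are roughly of order one.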
Hope this helps
1 Comment
DAMODARAN B.K
17 Feb 2021
Edited: DAMODARAN B.K
17 Feb 2021