Episode Q0 increases exponentially

16 Feb 2021


Can anyone explain why Episode Q0 in RL increases exponentially after the reward has converged to a suboptimal policy?


Emmanouil Tzorakoleftherakis on 16 Feb 2021


Hello,

Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
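
As a rough illustration of the normalization suggestion, here is a minimal sketch in plain Python (not MATLAB, and not Reinforcement Learning Toolbox code). The names `RunningNormalizer` and `scale_action` are hypothetical and introduced only for this example. It assumes scalar signals, a running mean and standard deviation for observations or rewards, and known action bounds.

```python
import math


class RunningNormalizer:
    """Running mean/variance of a scalar signal (Welford's algorithm).

    Hypothetical helper for illustration; apply one instance per
    observation channel or to the reward signal.
    """

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean
        self.eps = eps  # avoids division by zero for constant signals

    def update(self, x):
        """Fold one new sample into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        """Return x shifted and scaled by the statistics seen so far."""
        if self.count < 2:
            return x - self.mean  # not enough data to estimate a spread
        std = math.sqrt(self.m2 / self.count)  # population standard deviation
        return (x - self.mean) / (std + self.eps)


def scale_action(a, low, high):
    """Map an action from the assumed bounds [low, high] to [-1, 1]."""
    return 2.0 * (a - low) / (high - low) - 1.0


if __name__ == "__main__":
    reward_norm = RunningNormalizer()
    for r in [1.0, 2.0, 3.0, 4.0, 5.0]:
        reward_norm.update(r)
    print(reward_norm.normalize(5.0))
    print(scale_action(5.0, 0.0, 10.0))
```

Keeping observations, rewards, and actions on a comparable scale of order one tends to keep the critic's targets bounded, which is one common way to reduce the risk of the value estimate growing without limit. Whether it resolves a particular case depends on the agent and environment.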

Hope this helps.

DAMODARAN B.K on 17 Feb 2021

Edited: DAMODARAN B.K on 17 Feb 2021

Is Episode Q0 the critic network output or the target value?
