DAMODARAN B.K

Last seen: more than 4 years ago | Active since 2021

Followers: 0 Following: 0

Statistics

Feeds

All (2)
MATLAB Answers (2)

Question

Why does the RL agent perform the same actions repeatedly when this still does not yield an optimal policy or a better Episode Q0? Can anyone explain?

About 5 years ago | 0 answers | 0


Question

Episode Q0 increases exponentially
Can anyone explain why episode Q0 in RL increases exponentially after convergence of reward to a suboptimal policy?

About 5 years ago | 1 answer | 0
