Reinforcement Learning Toolbox: Discount factor issue
Hi,
I am trying to apply some RL algorithms from the RL toolbox, such as the actor-critic algorithm, to a problem where the reward for each step in an episode is discounted. However, in the training manager window the episode reward is shown as the cumulative reward rather than the discounted sum of rewards. I wonder if this is a bug, as it seems confusing.
Thanks,
Answers (3)
Ajay Pattassery
26 Aug 2019
Edited: Ajay Pattassery
26 Aug 2019
In the Episode Manager, the value labelled Episode Reward for each episode should be the discounted sum of rewards over the time steps, provided you have set a discount factor in rlACAgentOptions as below.
opt = rlACAgentOptions('DiscountFactor',0.95)
If the reward you observe for each episode is not the discounted sum of rewards, please reply with the env, critic, actor, and trainOpts (or the code you used) so that the issue can be reproduced.
EBRAHIM ALEBRAHIM
26 Aug 2019
1 Comment
Ajay Pattassery
29 Aug 2019
Hello,
I have tried an Actor-Critic example by following the model given in the link. I can see the effect of the discount factor in the following example.
EBRAHIM ALEBRAHIM
29 Aug 2019
2 Comments
Ajay Pattassery
5 Sep 2019
The Episode Manager shows the undiscounted cumulative reward from the environment. The discount factor, however, affects training and hence the learned policy. You can observe this by comparing the average reward over a reasonable number of episodes, once with a discount factor close to zero and once with a discount factor close to one.
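To make the difference between the two quantities concrete, here is a minimal sketch in plain MATLAB (no toolbox calls). The reward vector and the two discount factors are made-up values for illustration, and discountedReturn is a hypothetical helper, not a toolbox function.

```matlab
% Hypothetical per-step rewards for one 5-step episode (illustrative values only).
rewards = [1 1 1 1 10];

% Undiscounted cumulative reward: the quantity the Episode Manager is
% described as showing.
undiscounted = sum(rewards);

% Discounted return: sum over t of gamma^(t-1) * r_t.
discountedReturn = @(r, gamma) sum(gamma.^(0:numel(r)-1) .* r);

gHigh = discountedReturn(rewards, 0.95);  % far-sighted: the late reward still counts
gLow  = discountedReturn(rewards, 0.10);  % near-sighted: the late reward nearly vanishes

fprintf('Undiscounted: %g\n', undiscounted);
fprintf('gamma = 0.95: %.4f\n', gHigh);
fprintf('gamma = 0.10: %.4f\n', gLow);
```

The same episode scores very differently under the two discount factors, which is why the policy learned can change even though the plotted episode reward is always the plain sum.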
Srivatsank
28 May 2024
Hey @Ajay Pattassery. Is it possible to change this display to show the discounted reward? It would be helpful for debugging the reward functions that we are working with.
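As a possible workaround for inspecting discounted returns, the per-step rewards returned by sim can be discounted by hand after the fact. The following is only a sketch under assumptions: it needs Reinforcement Learning Toolbox (and Deep Learning Toolbox for the default agent), the CartPole environment and the untrained default agent are stand-ins chosen for illustration, and gamma is assumed to match the agent's DiscountFactor.

```matlab
% Sketch only: the environment and agent below are illustrative stand-ins.
env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

gamma = 0.95;                                   % assumed discount factor
agentOpts = rlACAgentOptions('DiscountFactor',gamma);
agent = rlACAgent(obsInfo,actInfo,agentOpts);   % untrained default actor-critic agent

% Run a few episodes and collect the logged experiences.
simOpts = rlSimulationOptions('MaxSteps',200,'NumSimulations',3);
experience = sim(env,agent,simOpts);

% Discount the logged per-step rewards of each episode by hand.
discountedReturn = zeros(numel(experience),1);
for k = 1:numel(experience)
    r = experience(k).Reward.Data(:);           % per-step rewards as a column
    discountedReturn(k) = sum(gamma.^(0:numel(r)-1)' .* r);
end
disp(discountedReturn)
```

This does not change what the Episode Manager plots; it only computes the discounted values separately so they can be compared against the reward function being debugged.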