Reinforcement Learning Toolbox: Episode Q0 stopped predicting after a few thousand simulations. DQN Agent.
The Q0 values looked reasonable until around episode 2360; after that they are not stuck, just increasing very, very slowly.
I'm using the default generated DQN agent (continuous observations, discrete actions) with only a few modifications. I'm not sure whether this is a problem or the correct behaviour, i.e. whether it means my agent has converged to a somewhat stable result.
I understood from the documentation that Episode Q0 should give a prediction of the "true discounted long-term reward". I assumed this meant the discounted reward for each individual episode, regardless of convergence or lack thereof, but maybe I misunderstood something.
Please help clarify. I made several runs and they all display the same behaviour after a few thousand episodes (not always the same number).
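For context, and as my own reading of the documentation rather than anything authoritative: Episode Q0 is the critic's value estimate at the episode's first observation, i.e. its prediction of the discounted return before the episode is run:

```latex
Q_0 = \max_a \hat{Q}_\theta(s_0, a) \approx \mathbb{E}\left[\sum_{k=0}^{T-1} \gamma^k \, r_{k+1}\right]
```

Note that with DiscountFactor set to 0.1 as in the code below, a reward k steps ahead is weighted by 0.1^k, so Q0 would effectively predict little more than the first step's reward.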
____
These were the only changes I made:
% configure the critic representation; train it on the GPU
critic.Options = rlRepresentationOptions(...
    'LearnRate',1e-3,...
    'GradientThreshold',1,...
    'UseDevice','gpu');
% extract agent options
agentOpts = agent.AgentOptions;
% modify agent options
agentOpts.EpsilonGreedyExploration.EpsilonDecay = 0.005;
agentOpts.DiscountFactor = 0.1;
% resave agent with new options
agent = rlDQNAgent(critic,agentOpts);
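To compare the prediction against the realised return: assuming training is run with train as usual, Episode Q0 can be read back from the returned statistics. The field names below are my assumption about the training-output structure, so treat this as a sketch:

```matlab
% Train the agent and capture the per-episode statistics
% (env and trainOpts are assumed to be defined elsewhere in the script)
trainStats = train(agent, env, trainOpts);

% Compare the critic's initial-state estimate with the reward actually obtained
plot(trainStats.EpisodeIndex, trainStats.EpisodeQ0), hold on
plot(trainStats.EpisodeIndex, trainStats.EpisodeReward)
legend('Episode Q0', 'Episode Reward'), hold off
```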
2 Comments
Emmanouil Tzorakoleftherakis
9 Jun 2021
Hello,
This behavior is strange. I would suggest creating a technical support case so that we can take a closer look if possible.
Answers (0)