The RL-Agent's cumulative reward keeps overflowing

Question

Ronny Landsverk, 17 February 2023

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/1914490-the-rl-agent-s-cumulative-reward-keeps-overflowing

Answered: Ashu, 22 February 2023

Adapting the 'rlwatertank' example, my cumulative reward keeps overflowing.

The original example has a 'StopTrainingValue' of 800, reached before episode 200, but in my adapted example, I cannot get past a value of 128.

I'm pretty sure the cause is an overflow in the 'accumulate_reward' subsystem of the 'RL-Agent' Simulink block, which does not occur in the original example.

How do I fix this issue?

Answer 1

Ashu, 22 February 2023

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/1914490-the-rl-agent-s-cumulative-reward-keeps-overflowing#answer_1177135

It is my understanding that you are adapting the 'Water Tank Simulink Model' example to train your agent, and that your cumulative reward is overflowing.

I assume that you are using the same 'rlTrainingOptions' as the original example, which are as follows:

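% Ts (agent sample time) and Tf (simulation duration) must be defined
% beforehand; the values here are assumed, adjust them to your model.
Ts = 1.0;
Tf = 200;

trainOpts = rlTrainingOptions(...
    MaxEpisodes=5000, ...
    MaxStepsPerEpisode=ceil(Tf/Ts), ...
    ScoreAveragingWindowLength=20, ...
    Verbose=false, ...
    Plots="training-progress", ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=800);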

'StopTrainingCriteria' is set to "AverageReward", so training stops when the reward averaged over the last 'ScoreAveragingWindowLength' episodes (20 here) equals or exceeds 'StopTrainingValue' (800 here).
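As a rough illustration of that stopping rule, the sketch below computes a trailing average over made-up episode rewards and finds the first episode at which it reaches the threshold. The reward values and the short window of 3 are hypothetical, chosen only to keep the arithmetic easy to follow; the toolbox performs its own bookkeeping internally, so this is not its actual implementation.

```matlab
% Hypothetical per-episode cumulative rewards, for illustration only
episodeReward = [100 300 500 700 900 1100 1300];
windowLength  = 3;     % stands in for ScoreAveragingWindowLength
stopValue     = 800;   % stands in for StopTrainingValue

% Trailing average over the last windowLength episodes
% (the window shrinks at the start, where fewer episodes exist)
avgReward = movmean(episodeReward, [windowLength-1 0]);

% First episode at which the averaged reward reaches the threshold
stopEpisode = find(avgReward >= stopValue, 1);
```

With these numbers the trailing average first reaches 800 at episode 6, even though single episodes exceed 800 from episode 5 onward.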

In your case, it appears that the average reward over 20 consecutive episodes reaches 800 by around episode 128, which stops the training.

To overcome this, you can try the following:

Try increasing the maximum value that the 'accumulate_reward' subsystem in the 'RL-Agent' Simulink block can hold, so that it allows larger reward values.
Experiment with the value of 'MaxStepsPerEpisode'. With fewer steps per episode, fewer reward terms are summed, so the cumulative reward per episode stays smaller.
Additionally, you can try adjusting the hyperparameters of your reinforcement learning algorithm to better fit your problem. For example, reducing the learning rate or increasing the discount factor may help stabilize the learning process and prevent reward overflow.
It may also help to monitor the reward signal during training to identify any other issues that could be causing the overflow. You can do this by setting 'Verbose=true' in 'rlTrainingOptions', which prints the reward and other metrics to the command line during training.
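One way the accumulated reward can hit a hard ceiling is a narrow integer data type on the reward signal. The question does not state the data type, so this is only an assumption, but the sketch below shows how a running sum in int8 stops at 127 in MATLAB, while the same sum in double does not. The step rewards are hypothetical.

```matlab
% Hypothetical per-step rewards stored in a narrow integer type
stepReward = int8([50 50 50 50]);

% MATLAB integer arithmetic saturates at the type limit (127 for int8)
total = int8(0);
for r = stepReward
    total = total + r;
end

% Casting to double first gives enough range for the true sum
totalDouble = sum(double(stepReward));
```

If something similar is happening in your model, converting the reward signal to double before it reaches the agent block (for example with a Data Type Conversion block) may be worth trying. Note that Simulink blocks can wrap or saturate on integer overflow depending on their settings, so the exact ceiling in a model may differ from this MATLAB example.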

Finally, it's worth noting that the choice of the 'StopTrainingValue' is problem-dependent and may need to be adjusted depending on the specific requirements of your application.
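The adjustments suggested above could be sketched as follows. This assumes a DDPG agent as in the original water tank example and a recent Reinforcement Learning Toolbox release that provides the optimizer option properties. All numeric values, including Ts and Tf, are illustrative assumptions rather than recommendations.

```matlab
% Sketch only: assumes the DDPG agent setup of the original water tank
% example; every numeric value below is illustrative.
Ts = 1.0;    % agent sample time in seconds (assumed)
Tf = 200;    % simulation duration in seconds (assumed)

% Agent hyperparameters: discount factor and learning rates
agentOpts = rlDDPGAgentOptions( ...
    SampleTime=Ts, ...
    DiscountFactor=0.99);
agentOpts.ActorOptimizerOptions.LearnRate  = 1e-4;
agentOpts.CriticOptimizerOptions.LearnRate = 1e-3;

% Training options: print progress and set a problem-specific stop value
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=5000, ...
    MaxStepsPerEpisode=ceil(Tf/Ts), ...
    ScoreAveragingWindowLength=20, ...
    Verbose=true, ...
    Plots="training-progress", ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=800);   % replace with a value suited to your reward scale
```

The agent options would then be passed to the agent constructor and the training options to 'train', as in the original example.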

You can refer to the following documentation to learn more about the Water Tank Reinforcement Learning model:

https://www.mathworks.com/help/reinforcement-learning/ug/water-tank-reinforcement-learning-environment-model.html

To learn more about creating a Simulink environment and training an agent, refer to this document:

https://www.mathworks.com/help/reinforcement-learning/ug/create-simulink-environment-and-train-agent.html
