
Transient value problem of a variable in the reward function of reinforcement learning

Hello, I encountered a problem when designing the reward function. In the Simulink environment, I want to incorporate some variables into the reward function. During training of the RL agent, these variables only converge after about 0.06 s, while the agent is trained from 0 s. Putting the RL Agent block in a subsystem with an Enable block doesn't help.
From my understanding, this will affect the value of the reward function, which may result in a poorly trained agent. Does anyone have any suggestions regarding this question?
Thank you very much.

Accepted Answer

Emmanouil Tzorakoleftherakis, 22 March 2021
You can put the agent block under a triggered subsystem and set it to begin training after 0.06 seconds.
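As a sanity check, the start-up delay can be related to the RL Agent block's step counter. A minimal sketch in Python for illustration; the 0.0015 s sample time below is an assumption, chosen only because it makes the 0.06 s delay correspond to the 40-step figure discussed in the comments:

```python
import math

def steps_before_start(delay_s, sample_time_s):
    """Number of agent sample hits that fall within the start-up delay."""
    return math.ceil(delay_s / sample_time_s)

# Assumed values: the 0.06 s convergence delay comes from the question;
# the 0.0015 s sample time is hypothetical, back-computed from the
# 40-step counter value.
print(steps_before_start(0.06, 0.0015))
```

In other words, with the triggered-subsystem approach the agent simply skips the sample hits that fall inside the transient window, so the reward it sees is only evaluated after the variables have converged.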
  5 comments
Emmanouil Tzorakoleftherakis, 23 March 2021
I believe it should be 40, yes; there is a counter implemented internally that keeps track of how many times the RL Agent block runs.
Yihao Wan, 23 March 2021
Thank you very much for your help.


More Answers (0)



Release

R2021a

