
Transient value problem of a variable in the reward function of reinforcement learning

Hello, I encountered a problem when designing the reward function. In the Simulink environment, I want to incorporate some variables into the reward function. During training of the RL agent, these variables only converge after about 0.06 s, while the agent is trained from 0 s. Putting the RL Agent block in a subsystem with an Enable block doesn't help.
From my understanding, this will affect the value of the reward function, which may result in a poorly trained agent. Does anyone have any suggestions regarding this question?
Thank you very much.

Accepted Answer

Emmanouil Tzorakoleftherakis, 22 March 2021
You can put the agent block under a triggered subsystem and set it to begin training after 0.06 seconds.
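As a sanity check, the start-up delay can be related to the RL Agent block's step counter. A minimal sketch in Python for illustration; the 0.0015 s sample time below is an assumption, chosen only because it makes the 0.06 s delay correspond to the 40-step figure discussed in the comments:

```python
import math

def steps_before_start(delay_s, sample_time_s):
    """Number of agent sample hits that fall within the start-up delay."""
    return math.ceil(delay_s / sample_time_s)

# Assumed values: the 0.06 s convergence delay comes from the question;
# the 0.0015 s sample time is hypothetical, back-computed from the
# 40-step counter value.
print(steps_before_start(0.06, 0.0015))
```

In other words, with the triggered-subsystem approach the agent simply skips the sample hits that fall inside the transient window, so the reward it sees is only evaluated after the variables have converged.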
  5 comments
Emmanouil Tzorakoleftherakis, 23 March 2021
I believe it should be 40, yes; there is a counter implemented internally that keeps track of how many times the RL Agent block runs.
Yihao Wan, 23 March 2021
Thank you very much for your help.


More Answers (0)



Release

R2021a

