Enforce action space constraints within the environment

27 views (last 30 days)
John Doe
John Doe on 24 Feb 2021
Commented: John Doe on 25 Mar 2021
Hi,
My agent is training!!! But it's pretty much 0 reward every episode right now. I think it might be due to this:
contActor does not enforce constraints set by the action specification, therefore, when using this actor, you must enforce action space constraints within the environment.
How can I do this?
Also, is there a way to view the logged signals as the agent is training?
Thanks!
  1 Comment
John Doe
John Doe on 24 Feb 2021
There's something odd going on. The reward isn't 0, but it isn't growing either. I do have that first-action method I mentioned in the other question implemented (so for 4 of the continuous actions it only chooses the first action, and 1 action is used every time step). I guess I need to check the logged signals to really determine what's going on. I'm too excited to make it work on the first or second try lol


Accepted Answer

Emmanouil Tzorakoleftherakis
Emmanouil Tzorakoleftherakis on 24 Feb 2021
If the environment is in Simulink, you can set up scopes and observe what's happening during training. If the environment is in MATLAB, you need to do some extra work and plot things yourself.
For your constraints question, which agent are you using? Some agents are stochastic, and some, like DDPG, add noise for exploration on top of the action output. To be certain, you can use a Saturation block in Simulink or an if statement to clip the action as needed in MATLAB.
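For reference, a minimal sketch of what that clipping could look like in a MATLAB environment step function (rlFunctionEnv-style signature); the limits and the toy dynamics below are illustrative, not from this thread:

% Clip the action to the spec bounds before using it in the dynamics
function [nextObs, reward, isDone, loggedSignals] = myStep(action, loggedSignals)
    actLow  = -1;                                 % illustrative lower limit
    actHigh =  1;                                 % illustrative upper limit
    action  = min(max(action, actLow), actHigh);  % enforce the action spec

    % Toy dynamics and reward, just so the sketch is self-contained
    nextObs = loggedSignals.State + action;
    reward  = -abs(nextObs);
    isDone  = abs(nextObs) > 10;
    loggedSignals.State = nextObs;
end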
  28 Comments
John Doe
John Doe on 2 Mar 2021
Edited: John Doe on 2 Mar 2021
How can I do the scaling of the inputs to the network? That seems like the best way forward.
The environment is already constraining the actions, but the training is extremely sample-inefficient and basically bounces between the upper and lower limits of the actions for hundreds of episodes.
Emmanouil Tzorakoleftherakis
Emmanouil Tzorakoleftherakis on 3 Mar 2021
Multiply the observations inside the 'step' function by a number that makes sense.
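A minimal sketch of that observation scaling inside step; the raw values and scale factors below are illustrative, not from this thread:

rawObs   = [250; 3.2; 0.8];       % raw state values with very different magnitudes
obsScale = [1/100; 1/10; 1/pi];   % per-channel scales so each lands roughly in [-1, 1]
nextObs  = rawObs .* obsScale;    % scaled observation returned to the agent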


More Answers (1)

John Doe
John Doe on 17 Mar 2021
Edited: John Doe on 17 Mar 2021
Hi,
I feel like I'm really close to getting this, but I haven't gotten a successful run yet. For thousands of episodes, the agent continues to use actions way outside the limits. I've tried adding the min/max clipping to force them within range in the environment. Do you have any tips on how I can make it converge to stay within the limits? I even tried changing the rewards to account for how close the actions are to the limits.
I'm wondering whether this is perhaps a known issue, and whether having the continuous agent pick actions within the spec limits is on the roadmap?
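One approach that is often suggested for this (not something confirmed in this thread) is to end the actor network with a tanhLayer followed by a scalingLayer, so the network output itself is mapped from [-1, 1] onto [LowerLimit, UpperLimit]. A minimal sketch, with illustrative limits and layer sizes:

actInfo = rlNumericSpec([1 1], 'LowerLimit', -2, 'UpperLimit', 2);  % illustrative action spec
scale   = (actInfo.UpperLimit - actInfo.LowerLimit)/2;
bias    = (actInfo.UpperLimit + actInfo.LowerLimit)/2;

actorNet = [
    featureInputLayer(4, 'Name', 'obs')        % 4 observations, illustrative
    fullyConnectedLayer(64, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(1, 'Name', 'fcAct')
    tanhLayer('Name', 'tanh')                  % squash output to [-1, 1]
    scalingLayer('Name', 'actScale', 'Scale', scale, 'Bias', bias)];

Note that agents which add exploration noise on top of the network output (like the DDPG case mentioned above) can still step outside the range, so clipping in the environment remains a useful safety net.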
  5 Comments
John Doe
John Doe on 18 Mar 2021
Here's an example training run. I gave it a negative reward for going outside the bounds of the action. This demonstrates how far outside the range the actor is picking. The same thing occurs over more episodes (5000), although I don't have a screenshot of that. Surely there must be something I'm doing wrong? How can I make this converge?
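A minimal sketch of the penalty idea described above (the values and variable names are hypothetical, not from the thread):

action     = 1.7;                  % example action picked by the actor
baseReward = 0.5;                  % example task reward before the penalty
actLow     = -1;  actHigh = 1;  penaltyWeight = 10;
overshoot  = max(0, action - actHigh) + max(0, actLow - action);  % distance outside the bounds
reward     = baseReward - penaltyWeight * overshoot;              % penalized reward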
John Doe
John Doe on 25 Mar 2021
I had a bug where I was using normalized values instead of the real values! After fixing that and changing the action to discrete, I was able to solve the environment! Thanks for all your help and this wonderful toolbox!


Release

R2020b
