Reinforcement Learning - PPO agent with hybrid action space

Question

Federico Toso, 24 October 2023


https://jp.mathworks.com/matlabcentral/answers/2037816-reinforcement-learning-ppo-agent-with-hybrid-action-space

Commented: Emmanouil Tzorakoleftherakis, 30 October 2023

I have a task which involves both discrete and continuous actions.

I would like to use PPO since it seems suitable in my case. I know that this algorithm supports both discrete and continuous action spaces, but it seems that the current MathWorks implementation does not support both at the same time.

I was thinking about the following workaround:

1) Use two PPO agents (one for the discrete actions, the other for the continuous actions)
2) Let them share the same critic network (this should be feasible, since they share the same observation space)
3) Train them in parallel with the Reinforcement Learning Designer app, with synchronous parameter updates

In this way I may be able to achieve a result that resembles what I would get with a single PPO "hybrid" agent.
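The following rough sketch shows what the shared critic in step 2 would contribute. The critic's value estimates V(s) produce a single set of advantages, and both actors reuse those advantages, each with its own probability ratio. Generalized Advantage Estimation (GAE) is assumed here as a common choice; it is not taken from the question. This is plain Python, not the Reinforcement Learning Toolbox API. The function names are hypothetical, and the GAE helper assumes a single trajectory segment with no episode terminations.

```python
from typing import List, Sequence


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   last_value: float, gamma: float = 0.99,
                   lam: float = 0.95) -> List[float]:
    """GAE over one segment with no terminations; values come from the shared critic."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value  # bootstrap V(s_T) from the shared critic
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def clipped_surrogate(ratios: Sequence[float], advantages: Sequence[float],
                      clip_eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective (to be maximized) for one actor."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0]
    values = [0.5, 0.5, 0.5]  # shared critic's V(s_t) for the same observations
    adv = gae_advantages(rewards, values, last_value=0.0, gamma=1.0, lam=1.0)
    # Regression targets for the single shared critic.
    critic_targets = [a + v for a, v in zip(adv, values)]
    # Hypothetical new/old probability ratios: each actor has its own ratios,
    # but both use the SAME advantages from the shared critic.
    discrete_ratios = [1.1, 0.9, 1.3]
    continuous_ratios = [0.95, 1.05, 1.0]
    print(adv, critic_targets)
    print(clipped_surrogate(discrete_ratios, adv),
          clipped_surrogate(continuous_ratios, adv))
```

With this setup the two actors clip their ratios separately. A single hybrid agent with independent heads would instead clip the product of the two ratios, which equals its joint probability ratio. Under these assumptions, that is the main way the workaround differs from a true hybrid PPO agent.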

My questions:

1) Are the three steps above possible with the current MathWorks implementation? (I'm mostly concerned about possible limitations of the Reinforcement Learning Designer app in this respect.)

2) Is there any other workaround you would recommend for my case (PPO with a hybrid action space)?

Of course, any reference to an existing example would be very welcome.


Answer 1

Emmanouil Tzorakoleftherakis, 27 October 2023


https://jp.mathworks.com/matlabcentral/answers/2037816-reinforcement-learning-ppo-agent-with-hybrid-action-space#answer_1342376

Hello,

The workaround you suggested makes sense to me. Unfortunately, though, bullet #3 is not currently supported. You cannot do multi-agent training in the app (you would have to set the problem up programmatically), and you also cannot do parallel multi-agent training (you would have to either train the agents one at a time, each using parallel training, or do multi-agent training without parallel computing).

Can you provide more details on the application? It will help us prioritize support for hybrid PPO in a future release.


Federico Toso, 29 October 2023

Thank you for your reply. My application currently involves many continuous actions and only one discrete action, which lets the agent choose one of three alternatives at each time step. This discrete action is crucial for the application and I cannot bypass it. At the same time, adding and training a separate PPO agent (sequentially, after the "main" continuous agent) just for this single discrete action would be a huge overhead. I love the MathWorks approach to RL in general, but in my view the inability to include a hybrid action space in a single PPO agent is a drawback that I currently don't know how to overcome efficiently. It's a pity, also because, in my very personal opinion, this upgrade would not require huge modifications in theory, given the flexible nature of the PPO algorithm. The main adjustments would concern the output layer of the actor network and the calculation of the new total entropy loss. That said, I understand that there may be other implementation difficulties on your side that I'm not aware of.
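The two adjustments mentioned in this comment (the actor's output layer and the total entropy term) can be illustrated with a minimal Python sketch. It is not the Reinforcement Learning Toolbox API. It assumes a categorical head for the three-way discrete choice and a diagonal Gaussian head for the continuous actions, and it assumes the two heads are independent given the observation. Under that assumption, the joint log-probability is the sum of the two log-probabilities and the total entropy is the sum of the two entropies. All function names and the default `clip_eps` and `entropy_coef` values are illustrative choices.

```python
import math
from typing import List, Sequence

LOG_2PI = math.log(2.0 * math.pi)


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax for the discrete (categorical) head."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def gaussian_log_prob(action: Sequence[float], mean: Sequence[float],
                      log_std: Sequence[float]) -> float:
    """Log-density of a diagonal Gaussian, summed over continuous action dimensions."""
    total = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        z = (a - mu) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * LOG_2PI
    return total


def hybrid_log_prob(discrete_action: int, continuous_action: Sequence[float],
                    logits: Sequence[float], mean: Sequence[float],
                    log_std: Sequence[float]) -> float:
    """Joint log pi(a_d, a_c | s), assuming the two heads are independent given s."""
    return (log_softmax(logits)[discrete_action]
            + gaussian_log_prob(continuous_action, mean, log_std))


def hybrid_entropy(logits: Sequence[float], log_std: Sequence[float]) -> float:
    """Categorical entropy plus diagonal Gaussian entropy (independence assumed)."""
    log_p = log_softmax(logits)
    h_discrete = -sum(math.exp(lp) * lp for lp in log_p)
    h_continuous = sum(ls + 0.5 * (1.0 + LOG_2PI) for ls in log_std)
    return h_discrete + h_continuous


def hybrid_actor_loss(new_log_prob: float, old_log_prob: float, advantage: float,
                      entropy: float, clip_eps: float = 0.2,
                      entropy_coef: float = 0.01) -> float:
    """Per-sample PPO actor loss (to be minimized) built on the joint log-probability."""
    ratio = math.exp(new_log_prob - old_log_prob)  # joint ratio = product of head ratios
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    return -surrogate - entropy_coef * entropy


if __name__ == "__main__":
    logits = [0.0, 0.0, 0.0]                # uniform over the three discrete alternatives
    mean, log_std = [0.0, 0.0], [0.0, 0.0]  # two continuous actions with unit std
    lp = hybrid_log_prob(1, [0.0, 0.0], logits, mean, log_std)
    print(round(lp, 4), round(hybrid_entropy(logits, log_std), 4))  # prints -2.9365 3.9365
```

Because the ratio is computed from the joint log-probability, a single clip bounds the change of the whole hybrid policy. Two separate PPO agents would each clip only their own ratio.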

Emmanouil Tzorakoleftherakis, 30 October 2023

Thank you for the reply. I will take the feedback to the development team.
