
How is the "RL external action" supposed to work?

25 views (last 30 days)
Leonardo Molino on 25 Jul 2024 at 9:03
Commented: Shubham on 25 Jul 2024 at 17:23
Hi all,
As some of you may already know, I have been working for a while with a 3DOF model of a business jet. This model is successfully controlled by a TECS algorithm that generates the actions needed to reach given speed and altitude setpoints. The original idea was to train a DDPG agent to emulate these actions, rewarding it appropriately based on the specifications of the TECS algorithm. After many weeks of failures, I would like to abandon this path and make one last test using the external action port of the RL Agent block. The idea would be to run the same system with the TECS in parallel with the agent, with the agent receiving the commands from the TECS directly. So I was wondering how learning with external actions works. Does the neural network update its weights and biases by observing the actions of the external controller? Also, can the action be injected continuously, or is it better to proceed with an "on-off" approach? For example, I could start with external actions and then, after a certain number of seconds, turn them off and leave the agent on its own. Are there any documents I can consult on this? Thanks

Accepted Answer

Shubham on 25 Jul 2024 at 10:17
Hi Leonardo,
Using the external action port of the RL Agent block can be a powerful way to facilitate the training of a Reinforcement Learning (RL) agent by leveraging an existing control system, such as your TECS (Total Energy Control System) algorithm. This approach can guide the RL agent by providing it with actions that are known to be effective, potentially speeding up the learning process.
How Learning with External Actions Works:
When using external actions, the neural networks do indeed update their weights and biases based on the actions of the external controller. While the "use external action" flag is on, the externally supplied action is the one applied to the environment, and for an off-policy agent such as DDPG the resulting experience (observation, external action, reward, next observation) is stored in the experience buffer, so the actor and critic are updated from the TECS behaviour rather than from the agent's own exploration. In effect, the external actions serve as a demonstration signal: the RL agent first learns to mimic the external controller and then gradually takes over control as it becomes more proficient.
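To make that concrete, below is a minimal MATLAB sketch of the setup, assuming a DDPG agent and a Simulink model whose RL Agent block has its external action ports enabled in the block dialog. The model name jetModel, the block path, the signal dimensions, and the sample time are placeholders for your own system.

obsInfo = rlNumericSpec([4 1]);                                      % e.g. speed/altitude errors and their rates (placeholder)
actInfo = rlNumericSpec([2 1], "LowerLimit", -1, "UpperLimit", 1);   % e.g. normalized throttle and elevator (placeholder)

mdl = "jetModel";                                     % placeholder model name
blk = mdl + "/RL Agent";                              % RL Agent block with external action ports enabled
env = rlSimulinkEnv(mdl, blk, obsInfo, actInfo);

agent = rlDDPGAgent(obsInfo, actInfo);                % default actor and critic networks
agent.AgentOptions.SampleTime = 0.1;                  % match the model sample time (placeholder)

trainOpts = rlTrainingOptions( ...
    "MaxEpisodes", 500, ...
    "MaxStepsPerEpisode", 1000, ...
    "ScoreAveragingWindowLength", 20);
% trainingStats = train(agent, env, trainOpts);

While the "use external action" port is held at 1, the action that reaches the plant and the experience buffer is the TECS command; that is the mechanism behind the warm-up phase described below.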
Steps to Implement Learning with External Actions
  1. Parallel Execution: Run the TECS algorithm in parallel with the RL agent. The TECS algorithm provides the actions that are used as a reference for the RL agent.
  2. External Actions Input: Use the external action port of the RL block to feed the actions from the TECS algorithm into the RL agent. This allows the RL agent to observe both the state of the system and the actions taken by the TECS algorithm.
  3. Warm-up Phase: Start with the RL agent observing and learning from the TECS actions. During this phase, the agent tries to mimic the TECS actions as closely as possible.
  4. Gradual Transition: Gradually reduce the dependency on the TECS actions and allow the RL agent to take more control. This can be done by shortening the external-action phase over successive episodes, or by using an "on-off" approach where the external actions are turned off after a certain period (a minimal scheduling sketch follows this list).
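As a purely illustrative way to schedule steps 3 and 4, the reset function of the Simulink environment can decide, episode by episode, whether the external action is used. The sketch below assumes the model reads a variable useTECS (for example through a Constant block wired to the "use external action" port of the RL Agent block); the variable name and the 100-episode warm-up length are assumptions.

env.ResetFcn = @(in) warmupReset(in);

function in = warmupReset(in)
    % Enable the external (TECS) action for the first warm-up episodes only.
    persistent episodeCount
    if isempty(episodeCount)
        episodeCount = 0;
    end
    episodeCount = episodeCount + 1;

    warmupEpisodes = 100;                              % assumed length of the imitation phase
    useTECS = double(episodeCount <= warmupEpisodes);

    in = setVariable(in, "useTECS", useTECS);          % make the flag visible to the model for this episode
end

Put warmupReset in its own file (or at the end of your training script) so the persistent episode counter survives between resets.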
On-Off Approach vs Continuous Injection
  • Continuous Injection: Continuously feeding the TECS actions to the RL agent can provide a consistent learning signal. However, it might make it difficult for the agent to learn to act independently.
  • On-Off Approach: Starting with external actions and then turning them off after a certain period can be effective. This allows the RL agent to learn from the TECS initially and then gradually take over control, moving from imitation to pure reinforcement learning (see the gating sketch below).
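If you prefer to switch within a single episode rather than between episodes, the flag feeding the "use external action" port can be generated directly from the simulation clock, for example with a MATLAB Function block (t comes from a Clock block; tEndMimic is an assumed tunable parameter). A Compare To Constant block on the clock signal would do the same job without any code.

function useExternal = externalActionFlag(t, tEndMimic)
% MATLAB Function block sketch: drive the "use external action" port.
% 1 = apply the TECS command (imitation phase), 0 = let the agent act alone.
useExternal = double(t < tEndMimic);
end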
  2 Comments
Leonardo Molino on 25 Jul 2024 at 13:04
Hi @Shubham, thank you for your reply! So in your opinion an "on-off" approach is better. In this case, one could start by showing the agent the external commands and then, after a certain period of time, turn them off and let it act on its own. So, let's say that after t_end_mimic seconds we turn off the external actions; the agent comes into play, takes random actions and, for example, moves the altitude of the aircraft away from the desired setpoint. After a certain deviation (altitude error), is it better to stop the training completely, or do we let the TECS algorithm take control again? The TECS, indeed, is able to bring the aircraft back on the right path. In this latter case, how does this behaviour affect learning? Is it necessary to change the way the agent is rewarded? Thanks
Shubham on 25 Jul 2024 at 17:23
Yes, the "on-off" approach is effective. If the agent deviates significantly after t_end_mimic, it's better to let the TECS take control and correct the course. This stabilizes training and prevents reinforcing bad behavior.
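A minimal sketch of that hand-back rule, again as a MATLAB Function block generating the "use external action" flag (altErr is the current altitude error; tEndMimic and altErrMax are assumed tunable thresholds):

function useExternal = handbackFlag(t, altErr, tEndMimic, altErrMax)
% Use the TECS command during the imitation phase, or whenever the agent
% has let the altitude error grow beyond altErrMax; otherwise the agent
% acts on its own.
useExternal = double(t < tEndMimic || abs(altErr) > altErrMax);
end

In practice you may want some hysteresis (hand control back to the agent only once the error has shrunk well below altErrMax) so that the flag does not chatter around the threshold.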
Impact on Learning:
  • Penalize the agent for deviations and TECS interventions. Reward for maintaining control independently.
  • Example Reward: reward = baseReward - deviationPenalty - interventionPenalty.
By managing deviations and structuring rewards, you can help the RL agent learn effectively while ensuring stability.
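As a sketch of that reward structure (to be computed inside the model with standard Simulink blocks or a MATLAB Function block; all weights below are assumptions to tune for your aircraft):

function reward = computeReward(altErr, spdErr, useExternal)
% Sketch of: reward = baseReward - deviationPenalty - interventionPenalty
baseReward          = 1;                               % small per-step bonus for staying on task
deviationPenalty    = 0.01*altErr^2 + 0.1*spdErr^2;    % penalize tracking errors (placeholder weights)
interventionPenalty = 5*useExternal;                   % penalize steps where the TECS had to intervene
reward = baseReward - deviationPenalty - interventionPenalty;
end

You may want to apply the intervention penalty only after t_end_mimic, so that the warm-up phase itself is not penalized.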


More Answers (0)
