Is it a good idea to stop training if there is a violation of a hard constraint in reinforcement learning?

Question

Aysegul Kahraman, March 17, 2022

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/1673944-is-it-a-good-idea-to-stop-training-if-there-is-a-violation-of-a-hard-constraint-in-reinforcement-lea

Answered: Avadhoot, November 8, 2023

Hi,

I have a physical model in Simulink, and I am trying to schedule my components using Reinforcement Learning Toolbox.

There are some constraints like the ones in the 'Water Distribution System Scheduling' example. In that example, the water level in the tank cannot drop below 0 or rise above the height of the tank, and the simulation is stopped when a limit is violated. However, if I train my model this way, the agent might still exceed the limits during simulation, which stops the simulation before the scheduled simulation time ends. That means the day-ahead schedule is not completed.

What is the best method for maintaining hard constraints or physical boundaries with RL?

Any comments would be appreciated.

Thanks!

Answer 1

Avadhoot, November 8, 2023

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/1673944-is-it-a-good-idea-to-stop-training-if-there-is-a-violation-of-a-hard-constraint-in-reinforcement-lea#answer_1348447

Hi Aysegul,

I understand that you are unsure how to incorporate hard constraints into your reinforcement learning setup. Whether to stop after a hard-constraint violation usually depends on the type of problem. Because you are working with a physical system, states that violate a hard constraint are physically meaningless, and continuing the simulation from them can produce incorrect or nonsensical results. So it is better to stop the simulation, ending the current training episode, when the system violates a hard constraint.
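To make the early-stopping behavior concrete, here is a minimal Python sketch of an episode loop that ends as soon as a tank level leaves its limits. It is an illustration only: the tank model, the limits, and the names `step` and `run_episode` are assumptions made for this sketch and are not taken from the Simulink example or from Reinforcement Learning Toolbox.

```python
# Hypothetical tank limits (metres); not taken from the original example.
LEVEL_MIN = 0.0
LEVEL_MAX = 10.0


def step(level, pump_flow, demand, dt=1.0):
    """Advance a toy tank model by one step.

    Returns (next_level, done), where done is True when a hard limit is violated.
    """
    next_level = level + (pump_flow - demand) * dt
    done = next_level < LEVEL_MIN or next_level > LEVEL_MAX
    return next_level, done


def run_episode(policy, demands, initial_level=5.0):
    """Run one episode and stop as soon as a hard constraint is violated."""
    level = initial_level
    steps_completed = 0
    for demand in demands:
        level, done = step(level, policy(level), demand)
        if done:
            break  # episode ends early; the rest of the schedule is never produced
        steps_completed += 1
    return steps_completed, level
```

With a policy that never pumps, such as `run_episode(lambda level: 0.0, [2.0] * 24)`, the loop stops well before the 24 steps are done, which is the incomplete-schedule situation described in the question.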

There are, however, better ways to incorporate hard constraints into the reinforcement learning environment so that the agent is incentivized to stay within them. Some of these methods are listed below:

1) Reward Shaping:

You can shape the reward function to provide explicit penalties whenever the agent violates a constraint or crosses a boundary. By assigning negative rewards for constraint violations, the agent learns to avoid such actions.
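As a rough illustration of such a penalty, the Python sketch below subtracts a violation term from an operating reward. The function name `shaped_reward`, the limits, the `energy_cost` term, and the penalty weight are all hypothetical choices for this sketch; in a Simulink model the same arithmetic would be built into the reward signal of the environment.

```python
def shaped_reward(level, energy_cost, level_min=0.0, level_max=10.0,
                  violation_penalty=100.0):
    """Return the operating reward minus a penalty for leaving [level_min, level_max].

    The penalty grows with the size of the violation, so a large excursion
    costs more than a marginal one.
    """
    overshoot = max(0.0, level - level_max)
    undershoot = max(0.0, level_min - level)
    violation = overshoot + undershoot
    reward = -energy_cost  # assumed operating objective: minimise energy cost
    if violation > 0.0:
        reward -= violation_penalty * (1.0 + violation)
    return reward
```

The fixed part of the penalty makes any violation clearly worse than normal operation, and the proportional part tells the agent which violations are worse than others. The weight usually needs tuning against the scale of the other reward terms.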

2) Barrier Methods:

Barrier methods add a barrier function to the reward as a penalty term. The barrier function grows as the state approaches the constraint boundary, so the agent is discouraged from getting close to the boundary before a violation actually occurs.
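One common choice is a logarithmic barrier. The Python sketch below is a minimal version under assumed tank limits; the function name `log_barrier_penalty` and the `weight` parameter are hypothetical and not part of any toolbox.

```python
import math


def log_barrier_penalty(level, level_min=0.0, level_max=10.0, weight=1.0):
    """Penalty that grows without bound as level approaches either limit.

    Returns math.inf at or outside the limits, where the log barrier is undefined.
    """
    if level <= level_min or level >= level_max:
        return math.inf
    span = level_max - level_min
    # Normalised distances to each limit, each in (0, 1).
    lower_gap = (level - level_min) / span
    upper_gap = (level_max - level) / span
    return -weight * (math.log(lower_gap) + math.log(upper_gap))
```

The penalty is smallest in the middle of the allowed range and rises steeply near either limit. When subtracting it from the reward during training, it is probably wise to cap it at a large finite value so that a single step does not produce an infinite reward.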

3) Constraint Enforcement:

You can also use the Constraint Enforcement block from Simulink Control Design. Instead of relying on the agent to learn the constraints, the block adjusts the agent's actions before they reach the plant so that the constraints are satisfied.

You can find a detailed example of constraint enforcement at the link below:

https://www.mathworks.com/help/slcontrol/ug/train-reinforcement-learning-agent-with-constraint-enforcement.html
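The idea of correcting an action before it is applied can be sketched in a few lines of Python. This is not the algorithm used by the Simulink block; it is a simplified one-dimensional analogue for an assumed tank model, where the feasible action closest to the requested one can be found by clipping. The function name `safe_pump_flow` and all parameters are hypothetical.

```python
def safe_pump_flow(level, requested_flow, demand, dt=1.0,
                   level_min=0.0, level_max=10.0, flow_max=5.0):
    """Return the feasible flow closest to requested_flow.

    Feasible means 0 <= flow <= flow_max and the predicted next level,
    level + (flow - demand) * dt, stays within [level_min, level_max].
    """
    # Flows that keep the predicted level inside the tank limits.
    lowest = (level_min - level) / dt + demand
    highest = (level_max - level) / dt + demand
    # Intersect with the actuator range.
    lowest = max(lowest, 0.0)
    highest = min(highest, flow_max)
    if lowest > highest:
        raise ValueError("no feasible flow for this state and demand")
    return min(max(requested_flow, lowest), highest)
```

Because the filter sits between the agent and the plant, the episode can run to the end of the schedule even while the agent is still exploring, provided the model used for the prediction is accurate enough.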

I hope this helps.