How to set boundaries for actions in reinforcement learning?
There are 3 actions in my environment, and their boundaries range from [1; 1; 0] to [5; 5; 1]. The code is as follows:
function this = myEnvClass()
% Initialize Observation settings
ObservationInfo = rlNumericSpec([9 1]);
ObservationInfo.Name = 'ASV States';
%ObservationInfo.Description = 'x, dx, theta, dtheta';
ObservationInfo.Description = 'dx, dy, dz,dl,vx,vy,vz,phi,theta';
% Initialize Action settings
ActionInfo = rlNumericSpec([3 1 1], 'LowerLimit',[1;1;0], 'UpperLimit',[5;5;1]);
ActionInfo.Name = 'ASV Action';
ActionInfo.Description = 'rho,sigma,theta';
% The following line implements built-in functions of RL env
this = this@rl.env.MATLABEnvironment(ObservationInfo,ActionInfo);
% Initialize property values and pre-compute necessary values
updateActionInfo(this);
% this.State = [400 400 -50 0 0 0 0 0 0]';
end
and the code of the updateActionInfo function is as follows:
function updateActionInfo(this)
% this.ActionInfo.Elements = this.MaxAngle*[-1 1];
this.ActionInfo = rlNumericSpec([3 1 1], 'LowerLimit',[1;1;0], 'UpperLimit',[5;5;1]);
this.ActionInfo.Name = 'ASV Action';
this.ActionInfo.Description = 'rho,sigma,theta';
end
But when I trained the agent (PPO), the actions received by the step function were always far greater or far less than the boundary values. For example, action = [144, 152, -63], action = [1608, -1463, -598].
I have attached my myEnvClass.m. Could someone please help me?
Answers (1)
Umeshraja
9 Jun 2025
I understand you're encountering an issue where the PPO agent produces actions that exceed the specified bounds, even though you've defined the action limits using rlNumericSpec in MATLAB's Reinforcement Learning Toolbox.
It's important to note that for PPO agents, the LowerLimit and UpperLimit properties in rlNumericSpec are treated as metadata: they are not enforced automatically by the agent. This behavior is noted in the Reinforcement Learning Toolbox documentation.
In contrast, agents like DDPG, TD3, and SAC do perform automatic clipping to ensure actions stay within the specified limits.
To resolve this for PPO, you can either:
- Normalize the action space to [-1, 1], then manually scale and/or clip the actions before applying them in the environment, or
- Always clip the transformed action before passing it to the environment.
Here's an example of the first approach:
% Assume agent outputs actions in [-1, 1]
scaledAction = zeros(3,1);
scaledAction(1) = (Action(1) + 1) * 2 + 1; % Maps [-1,1] to [1,5]
scaledAction(2) = (Action(2) + 1) * 2 + 1; % Maps [-1,1] to [1,5]
scaledAction(3) = (Action(3) + 1) * 0.5; % Maps [-1,1] to [0,1]
% Clip to ensure within bounds
scaledAction(1) = min(max(scaledAction(1), 1), 5);
scaledAction(2) = min(max(scaledAction(2), 1), 5);
scaledAction(3) = min(max(scaledAction(3), 0), 1);
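If you go with the first option, this mapping would typically live inside the environment's step method. Below is a minimal sketch, assuming the standard step signature generated for rl.env.MATLABEnvironment subclasses; updateASVState is a hypothetical helper standing in for your own dynamics, and the reward and termination logic are placeholders:
function [Observation, Reward, IsDone, LoggedSignals] = step(this, Action)
    LoggedSignals = [];
    % Map the normalized agent output in [-1, 1] to the physical ranges
    % [1, 5], [1, 5] and [0, 1], then clip as a safeguard
    scaledAction = zeros(3,1);
    scaledAction(1) = min(max((Action(1) + 1) * 2 + 1, 1), 5);  % rho
    scaledAction(2) = min(max((Action(2) + 1) * 2 + 1, 1), 5);  % sigma
    scaledAction(3) = min(max((Action(3) + 1) * 0.5, 0), 1);    % theta
    % Placeholder dynamics update: replace with your own ASV model
    this.State = updateASVState(this, scaledAction);  % hypothetical helper
    Observation = this.State;
    Reward = 0;       % replace with your reward calculation
    IsDone = false;   % replace with your termination condition
end
With this approach, the ActionInfo in the constructor would also be defined with LowerLimit -1 and UpperLimit 1 for each element, so that the agent explores in the normalized range while the environment works with the physical values.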
Hope this helps!