How to set boundaries for actions in reinforcement learning?
There are 3 actions in my environment, and their boundaries range from [1; 1; 0] to [5; 5; 1]. The code is as follows:
function this = myEnvClass()
% Initialize Observation settings
ObservationInfo = rlNumericSpec([9 1]);
ObservationInfo.Name = 'ASV States';
%ObservationInfo.Description = 'x, dx, theta, dtheta';
ObservationInfo.Description = 'dx, dy, dz,dl,vx,vy,vz,phi,theta';
% Initialize Action settings
ActionInfo = rlNumericSpec([3 1 1], 'LowerLimit',[1;1;0], 'UpperLimit',[5;5;1]);
ActionInfo.Name = 'ASV Action';
ActionInfo.Description = 'rho,sigma,theta';
% The following line implements built-in functions of RL env
this = this@rl.env.MATLABEnvironment(ObservationInfo,ActionInfo);
% Initialize property values and pre-compute necessary values
updateActionInfo(this);
% this.State = [400 400 -50 0 0 0 0 0 0]';
end
and the code of the updateActionInfo function is as follows:
function updateActionInfo(this)
% this.ActionInfo.Elements = this.MaxAngle*[-1 1];
this.ActionInfo = rlNumericSpec([3 1 1], 'LowerLimit',[1;1;0], 'UpperLimit',[5;5;1]);
this.ActionInfo.Name = 'ASV Action';
this.ActionInfo.Description = 'rho,sigma,theta';
end
But when I trained the agent (PPO), the actions received by the step function were always far greater or far less than the boundary values. For example, action = [144, 152, -63], action = [1608, -1463, -598].
I have attached my myEnvClass.m. Could someone please help me?
Answers (1)
Umeshraja
9 Jun 2025
I understand you're encountering an issue where the PPO agent produces actions that exceed the specified bounds, even though you've defined the action limits using rlNumericSpec in MATLAB's Reinforcement Learning Toolbox.
It's important to note that for PPO agents, the LowerLimit and UpperLimit properties in rlNumericSpec are treated as metadata: they are not enforced automatically by the agent. This behavior is noted in the Reinforcement Learning Toolbox documentation.
In contrast, agents like DDPG, TD3, and SAC do perform automatic clipping to ensure actions stay within the specified limits.
To resolve this for PPO, you can either:
- Normalize the action space to [-1, 1], then manually scale and/or clip the actions before applying them in the environment, or
- Always clip the transformed action before passing it to the environment.
Here's an example of the first approach:
% Assume agent outputs actions in [-1, 1]
scaledAction = zeros(3,1);
scaledAction(1) = (Action(1) + 1) * 2 + 1; % Maps [-1,1] to [1,5]
scaledAction(2) = (Action(2) + 1) * 2 + 1; % Maps [-1,1] to [1,5]
scaledAction(3) = (Action(3) + 1) * 0.5; % Maps [-1,1] to [0,1]
% Clip to ensure within bounds
scaledAction(1) = min(max(scaledAction(1), 1), 5);
scaledAction(2) = min(max(scaledAction(2), 1), 5);
scaledAction(3) = min(max(scaledAction(3), 0), 1);
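If you go with the first option, this mapping would typically live inside the environment's step method. Below is a minimal sketch, assuming the standard step signature generated for rl.env.MATLABEnvironment subclasses; updateASVState is a hypothetical helper standing in for your own dynamics, and the reward and termination logic are placeholders:
function [Observation, Reward, IsDone, LoggedSignals] = step(this, Action)
    LoggedSignals = [];
    % Map the normalized agent output in [-1, 1] to the physical ranges
    % [1, 5], [1, 5] and [0, 1], then clip as a safeguard
    scaledAction = zeros(3,1);
    scaledAction(1) = min(max((Action(1) + 1) * 2 + 1, 1), 5);  % rho
    scaledAction(2) = min(max((Action(2) + 1) * 2 + 1, 1), 5);  % sigma
    scaledAction(3) = min(max((Action(3) + 1) * 0.5, 0), 1);    % theta
    % Placeholder dynamics update: replace with your own ASV model
    this.State = updateASVState(this, scaledAction);  % hypothetical helper
    Observation = this.State;
    Reward = 0;       % replace with your reward calculation
    IsDone = false;   % replace with your termination condition
end
With this approach, the ActionInfo in the constructor would also be defined with LowerLimit -1 and UpperLimit 1 for each element, so that the agent explores in the normalized range while the environment works with the physical values.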
Hope this helps!