Problems with the rlNumericSpec action space and how to modify it in a custom reinforcement learning environment

7 views (last 30 days)
I have defined the action space for the custom RL environment class using the code below:
function this = MyEnvironment()
    numObs = 79;
    ObservationInfo = rlNumericSpec([numObs 1]);
    numAct = 3;
    ActionInfo = rlNumericSpec([numAct 1], Lowerlimit=-1, Upperlimit=1)
    this = this@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo)
end
This clearly states that the action space is a 3x1 vector in a continuous space with the range (-1, 1). To step the state, I am using this code:
[delVx, delVy, delVz] = getdelV(Action);
where getdelV() is defined in the class as a helper method:
function [delVx, delVy, delVz] = getdelv(this, action)
    Vmag = action(1)*this.Max_Vmag;
    theta = action(2)*pi/2 + pi;
    phi = action(3)*pi + pi;
    delVx = Vmag*sin(theta)*cos(phi);
    delVy = Vmag*sin(theta)*sin(phi);
    delVz = Vmag*cos(theta);
end
However, every time I validate the environment, this error message appears:
>> validateEnvironment(test)
Error using rl.env.MATLABEnvironment/validateEnvironment (line 72)
Unable to evaluate step function.
Caused by:
Undefined function 'getdelV' for input arguments of type 'double'.
It seems that the input argument is a double instead. I am confused, since the action should be a multi-dimensional vector. So, what is the nature of 'Action' in a reinforcement learning environment? Is there an example of a custom environment with a continuous action space that I can use as a reference? (The MATLAB example only uses the simple pole with a discrete action space.)
Any help is appreciated.
3 Comments
Tan on 2 Jan 2025
Below is the more detailed code:
methods
    % Constructor method creates an instance of the environment
    % Change class name and constructor name accordingly
    function this = MyEnvironment()
        % Initialize Observation settings
        % numObs = this.statenum;
        numObs = 79;
        ObservationInfo = rlNumericSpec([numObs 1]);
        ObservationInfo.Name = 'State of Spacecraft and Debris';
        ObservationInfo.Description = 'Xa Ya Za Vxa Vya Vza fa Xd1 Yd1 Zd1 Vxd1 Vyd1 Vzd1 d t p.....';

        % Initialize Action settings
        % Action state: mag_delV theta phi in km/s radian radian
        numAct = 3;
        ActionInfo = rlNumericSpec([numAct 1], Lowerlimit=-1, Upperlimit=1, DataType="double");
        ActionInfo.Name = 'del V';

        % The following line implements built-in functions of the RL env
        this = this@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo);

        % Initialize property values and pre-compute necessary values
        % updateActionInfo(this);
    end

    % Apply system dynamics and simulate the environment with the
    % given action for one step.
    function [Observation, Reward, IsDone, Info] = step(this, Action)
        Info = [];
        % Get action, convert action to delV in x y z coordinates
        [delVx, delVy, delVz] = getdelV(Action);

        %% update increment of spacecraft velocity
        UpdatedV_State = this.State;
        UpdatedV_state(4) = this.state(4) + delVx;
        UpdatedV_state(5) = this.state(5) + delVy;
        UpdatedV_state(6) = this.state(6) + delVz;

        % obtain next state vector; the residual energy is reset to 0
        Observation = getNextState(UpdatedV_state, this.Ts);
        % Update residual energy
        Observation = this.state(7) - Action(1);

        % Update system states
        this.State = Observation;
        this.Step = this.Step + 1;

        % Check terminal condition
        % ~
        % ~
        Isdone = true;
        this.IsDone = IsDone;

        % Get reward
        Reward = getReward(this);

        % (optional) use notifyEnvUpdated to signal that the
        % environment has been updated (e.g. to update visualization)
        notifyEnvUpdated(this);
    end

    % Reset environment to initial state and output initial observation
    function InitialObservation = reset(this)
        % this.numofdebris = getnumofdebris(this.DebrisState);
        % this.statenum = 7 + 9*this.numofdebris;
        % this.State = zeros(this.statenum);

        % Initial spacecraft value:
        % Initialize spacecraft state [X0; Y0; Z0; Vx0; Vy0; Vz0; fcmax]
        % Position in km and velocity in km/s
        InitialSpaceCraftState = [this.X0; this.Y0; this.Z0; this.Vx0; this.Vy0; this.Vz0; this.fcmax];

        % obtain debris state from excel file, then merge with spacecraft
        % state; for advanced processing, debris state should be
        % separately processed*****
        InitialObservation = [InitialSpaceCraftState; this.DebrisState];
        this.State = InitialObservation;
        this.Step = 0;
        notifyEnvUpdated(this)
    end
end

%% helper methods
methods
    % Helper methods to create the environment

    %% compute delV in x y z directions
    function [delVx, delVy, delVz] = getdelV(this, action)
        % action = [Vmag, theta, phi] in [km/s radian radian]
        % compute delV in x y z directions
        Vmag = action(1)*this.Max_Vmag;
        theta = action(2)*pi/2 + pi;
        phi = action(3)*pi + pi;
        delVx = Vmag*sin(theta)*cos(phi);
        delVy = Vmag*sin(theta)*sin(phi);
        delVz = Vmag*cos(theta);
    end

    % ...
    % ...
    % ...
    % ...
    % ...
end
Tan on 3 Jan 2025
Thank you very much for your help. I checked the methods one by one and found that I had not called the helper method with 'this'.


Accepted Answer

Hitesh on 2 Jan 2025 (edited 2 Jan 2025)
Hi @Tan
The method "getdelv" is defined with a lowercase "v", but the step function attempts to call "getdelV" with an uppercase "V". This mismatch in method names leads to MATLAB being unable to find the function, resulting in this error.
I have revised your code below. It correctly inherits from "rl.env.MATLABEnvironment", making it compatible with MATLAB's Reinforcement Learning Toolbox, and it includes well-defined methods for initialization, stepping through the environment, and resetting it, which are essential for the environment to function.
classdef MyEnvironment < rl.env.MATLABEnvironment
    properties
        Max_Vmag = 10;
        % Add any other necessary properties here
    end

    methods
        function this = MyEnvironment()
            numObs = 79;
            ObservationInfo = rlNumericSpec([numObs 1]);
            numAct = 3;
            ActionInfo = rlNumericSpec([numAct 1], 'LowerLimit', -1, 'UpperLimit', 1);
            this = this@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo);
        end

        % Implement the step function
        function [nextObs, reward, isDone, loggedSignals] = step(this, action)
            % Correctly call the helper method
            [delVx, delVy, delVz] = getdelv(this, action);
            % Dummy implementation for step
            nextObs = zeros(79, 1);   % Placeholder for the next observation
            reward = 0;               % Placeholder for the reward
            isDone = false;           % Placeholder for the done flag
            loggedSignals = [];       % Placeholder for any logged signals
        end

        % Implement the reset function
        function initialObservation = reset(this)
            % Return an initial observation
            initialObservation = zeros(79, 1);   % Placeholder for initial observation
        end

        % Helper method to calculate delV
        function [delVx, delVy, delVz] = getdelv(this, action)
            Vmag = action(1) * this.Max_Vmag;
            theta = action(2) * pi/2 + pi;
            phi = action(3) * pi + pi;
            delVx = Vmag * sin(theta) * cos(phi);
            delVy = Vmag * sin(theta) * sin(phi);
            delVz = Vmag * cos(theta);
        end
    end
end
After implementing the required methods, validate the environment again by running the following commands in the Command Window:
env = MyEnvironment();
validateEnvironment(env);
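As a quick sanity check, you can also call reset and step yourself. This makes the nature of "Action" concrete: it is simply a numeric column vector whose size, limits, and data type match the rlNumericSpec you defined. A short sketch, assuming the class above:
env  = MyEnvironment();
obs0 = reset(env);          % 79x1 initial observation
act  = 2*rand(3,1) - 1;     % random double column vector within the [-1, 1] action limits
[obs, reward, isDone, info] = step(env, act);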
For more information regarding reinforcement learning environments, kindly refer to the MATLAB documentation on "Reinforcement Learning Environments".
3 Comments
Hitesh on 2 Jan 2025
Hi @Tan,
You need to ensure the following points in your code:
  • Property Initialization: Properties such as "Max_Vmag", "X0", "Y0", "Z0", "Vx0", "Vy0", "Vz0", "fcmax", and "DebrisState" need to be initialized within the class. This ensures that these properties have default values, which are used in the environment's initialization and operation.
  • Constructor Changes: The constructor needs to initialize "ObservationInfo" with the correct syntax for specifying data types and limits.
  • Terminal Condition: The "IsDone" variable needs to be set in the "step" method; the code below sets it to "false" as a placeholder, and you need to replace this with your actual terminal-condition logic (see the sketch after this list).
  • State Transition Logic: The "step" method now correctly updates the velocity using the "getdelV" function and updates the state using a "getNextState" method. The "getNextState" method is a placeholder for the state transition logic; it updates the position based on velocity and the time step.
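For the terminal condition, a possible sketch (assuming, as in your question, that the seventh state element stores the residual energy; "MaxSteps" would be a hypothetical additional property):
% Inside step, after the state update:
IsDone = (this.State(7) <= 0) || (this.Step >= this.MaxSteps);
this.IsDone = IsDone;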
Kindly refer to the following revised code:
classdef MyEnvironment < rl.env.MATLABEnvironment
    properties
        Max_Vmag = 10;
        X0 = 0;
        Y0 = 0;
        Z0 = 0;
        Vx0 = 0;
        Vy0 = 0;
        Vz0 = 0;
        fcmax = 1;
        DebrisState = zeros(72, 1);
        State
        Step
        IsDone
        Ts = 0.1;
    end

    methods
        function this = MyEnvironment()
            numObs = 79;
            ObservationInfo = rlNumericSpec([numObs 1], 'DataType', 'double');
            ObservationInfo.Name = 'State of Spacecraft and Debris';
            ObservationInfo.Description = 'Xa Ya Za Vxa Vya Vza fa Xd1 Yd1 Zd1 Vxd1 Vyd1 Vzd1 d t p.....';
            numAct = 3;
            ActionInfo = rlNumericSpec([numAct 1], 'LowerLimit', -1, 'UpperLimit', 1, 'DataType', 'double');
            ActionInfo.Name = 'del V';
            this = this@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo);
        end

        function [Observation, Reward, IsDone, Info] = step(this, Action)
            Info = [];
            [delVx, delVy, delVz] = getdelV(this, Action);
            UpdatedV_state = this.State;
            UpdatedV_state(4) = this.State(4) + delVx;
            UpdatedV_state(5) = this.State(5) + delVy;
            UpdatedV_state(6) = this.State(6) + delVz;
            Observation = this.getNextState(UpdatedV_state);
            Observation(7) = this.State(7) - Action(1);
            this.State = Observation;
            this.Step = this.Step + 1;
            IsDone = false;   % Modify this condition as necessary
            this.IsDone = IsDone;
            Reward = this.getReward();
            notifyEnvUpdated(this);
        end

        function InitialObservation = reset(this)
            InitialSpaceCraftState = [this.X0; this.Y0; this.Z0; this.Vx0; this.Vy0; this.Vz0; this.fcmax];
            InitialObservation = [InitialSpaceCraftState; this.DebrisState];
            this.State = InitialObservation;
            this.Step = 0;
            notifyEnvUpdated(this);
        end

        function [delVx, delVy, delVz] = getdelV(this, action)
            Vmag = action(1) * this.Max_Vmag;   % magnitude in [-Max_Vmag, Max_Vmag]
            theta = action(2) * pi / 2 + pi;    % polar angle in [pi/2, 3*pi/2]
            phi = action(3) * pi + pi;          % azimuth in [0, 2*pi]
            delVx = Vmag * sin(theta) * cos(phi);
            delVy = Vmag * sin(theta) * sin(phi);
            delVz = Vmag * cos(theta);
        end

        function nextState = getNextState(this, currentState)
            % Placeholder implementation of state transition logic
            % Update position based on velocity and time step
            nextState = currentState;
            nextState(1:3) = currentState(1:3) + currentState(4:6) * this.Ts;
            % Add other state transition logic as needed
        end

        function reward = getReward(this)
            % Placeholder implementation of reward function
            reward = 0;   % Modify this logic to calculate actual reward
        end
    end
end
After implementing the required methods, validate the environment again by running the following commands in the Command Window:
env = MyEnvironment();
validateEnvironment(env);
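Once validation passes, a continuous action space like this one pairs with an agent that supports continuous actions, such as DDPG, TD3, or SAC. A minimal sketch using a default-network DDPG agent (the training option values below are placeholders, not recommendations):
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
agent = rlDDPGAgent(obsInfo, actInfo);   % default actor and critic built from the specs
trainOpts = rlTrainingOptions('MaxEpisodes', 500, 'MaxStepsPerEpisode', 200);
% trainStats = train(agent, env, trainOpts);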
Tan on 2 Jan 2025
Alright, thank you very much. I will review my whole code carefully.


More Answers (0)
