Error appears when setting the multi-dimensional actions in Matlab Environment (Reinforcement Learning Toolbox)

Question

wujianfa93, 27 May 2020


Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/535048-error-appears-when-setting-the-multi-dimensional-actions-in-matlab-environment-reinforcement-learni

Commented: Ryan Comeau, 1 June 2020

As shown in the following code, I define three actions whose ranges are [0.1 10], [0.1 10], and [0 pi], respectively:

%% main.m
%% Observation
ObservationInfo = rlNumericSpec([7 1]);
ObservationInfo.Name = 'Obstacle Avoidance States';
ObservationInfo.Description = 'delta_x, delta_y, delta_z, delta_L, delta_V, pusi, theta';
%% Action
ActionInfo = rlNumericSpec([3 1],'LowerLimit',[0.1 0.1 0]','UpperLimit',[10 10 pi]');
ActionInfo.Name = 'Action'; 
%% Environment
env = rlFunctionEnv(ObservationInfo,ActionInfo,'myStepFunction','myResetFunction');
rng(0);
InitialObs = reset(env);

Then, the three action components are assigned to variables inside the function myStepFunction(Action,LoggedSignals) as follows:

function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals)
para_rho1 = Action(1);
para_rho2 = Action(2);
para_theta = Action(3);
......
end

Running main.m works without errors.

However, running the following command produces an error:

step(env,10)

Index exceeds the number of array elements (1)

Error: myStepFunction (line 23)

para_rho2 = Action(2);

Why has the dimension of the action changed to 1? How can I fix this error?

If I set the variables para_rho2 and para_theta to constants and change the dimension of the action to [1 1] in rlNumericSpec, then step(env,10) executes normally.
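
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the second argument of step(env,Action) is the action itself, so step(env,10) hands the scalar 10 to myStepFunction, and Action(2) is then out of range. The snippet below reproduces that indexing behaviour without the toolbox. The names readAction, scalarAction, and vectorAction are hypothetical stand-ins and are not part of the original code.

```matlab
% Hypothetical stand-in for the indexing done inside myStepFunction.
readAction = @(Action) [Action(1); Action(2); Action(3)];

scalarAction = 10;            % what step(env,10) supplies as Action
vectorAction = [1; 2; pi/2];  % a 3-by-1 action inside the declared limits

% Indexing the scalar fails at Action(2), as in the reported error.
try
    readAction(scalarAction);
    scalarFailed = false;
catch err
    scalarFailed = true;
    disp(err.message)
end

% Indexing the 3-by-1 vector succeeds.
parsed = readAction(vectorAction);

% With the environment from the question, the analogous call would be:
% [NextObs,Reward,IsDone,LoggedSignals] = step(env,vectorAction);
```

Under this reading, the action dimension in the specification never changed; only the manually supplied test action was too short.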


Answer 1

Ryan Comeau, 29 May 2020


Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/535048-error-appears-when-setting-the-multi-dimensional-actions-in-matlab-environment-reinforcement-learni#answer_442283


Hello, I've taken a look at the rocket lander environment that MATLAB provides as an example. What they do is scale every action to between 0 and 1. The limits for the actions are then stored in the environment properties as a vector of values defining the min and max values. Inside the step function, the actions are rescaled to their physical ranges. I know this seems strange, and I'm not sure it's the best approach, but I'm not a veteran of RL. So, what you should do is the following:

%% main.m
%% Observation
ObservationInfo = rlNumericSpec([7 1]);
ObservationInfo.Name = 'Obstacle Avoidance States';
ObservationInfo.Description = 'delta_x, delta_y, delta_z, delta_L, delta_V, pusi, theta';
%% Action
ActionInfo = rlNumericSpec([3 1],'LowerLimit',0,'UpperLimit',1); % normalized actions in [0,1]
ActionInfo.Name = 'Action';
%stuff...
function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals)
% env is not in scope inside this function, so the scale factors are defined
% locally here (values assumed from the UpperLimit in the question).
borders = [10; 10; pi];
para_rho1 = Action(1).*borders(1); % see the rocket lander example from MATLAB
para_rho2 = Action(2).*borders(2);
para_theta = Action(3).*borders(3);
......
end
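
The rescaling described above can be sketched as follows, assuming the agent outputs normalized actions in [0,1] and that the limits from the question are the intended physical ranges. The names lowerLimit, upperLimit, and scaleAction are hypothetical and not part of the toolbox. Unlike a plain multiplication by the maximum, this affine map also honours the non-zero lower limit of 0.1.

```matlab
% Limits taken from the ActionInfo in the question (assumed physical ranges).
lowerLimit = [0.1; 0.1; 0];
upperLimit = [10; 10; pi];

% Hypothetical helper: clamp the normalized action to [0,1], then map it
% affinely onto [lowerLimit, upperLimit].
scaleAction = @(a) lowerLimit + min(max(a,0),1) .* (upperLimit - lowerLimit);

normalizedAction = [0; 0.5; 1];                  % example normalized action
physicalAction = scaleAction(normalizedAction);  % values in physical units

para_rho1  = physicalAction(1);
para_rho2  = physicalAction(2);
para_theta = physicalAction(3);
```

Inside a step function, the same three assignments would replace the direct reads of Action(1), Action(2), and Action(3).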

Hope this helps

RC

2 Comments

wujianfa93, 1 June 2020

Edited: wujianfa93, 1 June 2020

Many thanks for your reply! I think I have solved this problem, though. When I went on to design the corresponding DDPG training process and started training, I found that the whole program runs correctly. So I guess the verification using the function step() may not be necessary.

Ryan Comeau, 1 June 2020

Awesome, nice work.
