Reinforcement Learning Toolbox - Multiple Discrete Actions for actor-critic agent (imageInputLayer issues)

25 views (last 30 days)
I am working on setting up an rlACAgent using the Reinforcement Learning Toolbox. I have successfully created this agent with a system that has only one set of finite actions, but I am looking to expand to an arbitrary number of finite action sets (in this case 4), and I think I am messing something up with the layer creation.
My code is below. Everything runs successfully until I try to create the agent, at which point I get the following error: "The dimensions of observations are not compatible with those of Observation Info."
I feel like I'm missing something fundamental about the layer construction here, but I've been scratching my head for a while. Any help would be appreciated!
obsInfo = rlNumericSpec([2 1]);
obsInfo.Name = 'Car Position';
obsInfo.Description = {'x, y'};
% Actions
actInfo = rlFiniteSetSpec({[-1 -.8 -.6 -.4 -.2 0 .2 .4 .6 .8 1],...
    [-1 -.8 -.6 -.4 -.2 0 .2 .4 .6 .8 1],...
    [-1 -.8 -.6 -.4 -.2 0 .2 .4 .6 .8 1],...
    [-1 -.8 -.6 -.4 -.2 0 .2 .4 .6 .8 1]});
actInfo.Name = 'Wheel Speeds';
actInfo.Description = {'Front Right Speed','Front Left Speed','Rear Right Speed',...
    'Rear Left Speed'};
%% Build Custom Environment
env = rlFunctionEnv(obsInfo,actInfo,'DriveStepFunction','DriveResetFunction')
%% Extract Data from Environment
obsInfo = getObservationInfo(env)
numObservation = obsInfo.Dimension(1);
actInfo = getActionInfo(env)
numActions = actInfo.Dimension(2);
%% Develop Critic
criticNetwork = [
    imageInputLayer([numObservation numActions 1],'Normalization','none','Name','state')
    fullyConnectedLayer(numObservation,'Name','CriticFC')];
criticOpts = rlRepresentationOptions('LearnRate',.01,'GradientThreshold',1);
critic = rlRepresentation(criticNetwork,obsInfo,'Observation',{'state'},criticOpts);
%% Develop Actor
actorNetwork = [
    imageInputLayer([numObservation numActions 1],'Normalization','none','Name','state')
    fullyConnectedLayer(numActions,'Name','action')];
actorOpts = rlRepresentationOptions('LearnRate',.01,'GradientThreshold',1);
actor = rlRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'state'},'Action',{'action'},actorOpts);
%% Develop Agent
agentOpts = rlACAgentOptions(...
    'NumStepsToLookAhead',5,...
    'DiscountFactor',1,...
    'EntropyLossWeight',.4);
agent = rlACAgent(actor,critic,agentOpts);
3 Comments
Anthony on 9 Jul 2020
I never managed to get this to work with an actor-critic agent, but I switched to a DQN agent and got that to work.
Huzaifah Shamim on 10 Jul 2020 (edited 10 Jul 2020)
Ah, nice. For the DQN network, did you set it up with both observation and action inputs, or just observation?
Could I take a look at your environment and how you set up certain things?
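For reference, here is a minimal sketch of what an observation-only DQN setup could look like with the enumerated joint action space from the accepted answer below. This is illustrative only, not Anthony's actual code; the hidden-layer size is an assumption:
numObs = obsInfo.Dimension(1);        % 2 observations (x, y)
numAct = numel(actInfo.Elements);     % number of enumerated joint actions
% A multi-output DQN critic takes only the observation as input and
% emits one Q-value per joint action.
dqnNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(numAct,'Name','output')];
criticOpts = rlRepresentationOptions('LearnRate',1e-2,'GradientThreshold',1);
critic = rlRepresentation(dqnNetwork,obsInfo,actInfo,...
    'Observation',{'state'},criticOpts);
agent = rlDQNAgent(critic,rlDQNAgentOptions('DiscountFactor',0.99));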


Accepted Answer

Emmanouil Tzorakoleftherakis on 4 Oct 2019
Hi Anthony,
I believe this link should help. It looks like the action space is not set up correctly. For multiple discrete actions, you need to enumerate all possible combinations of the discrete actions and pass those combinations to rlFiniteSetSpec.
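For example, one way to build such a joint action space is with combvec from the Deep Learning Toolbox. This is a minimal sketch of the idea; the speed grid below is an assumption matching the original question:
% Enumerate every combination of the four wheel-speed sets.
% combvec returns one joint action per column.
speeds = -1:0.2:1;                             % 11 candidate speeds per wheel
combs = combvec(speeds,speeds,speeds,speeds);  % 4 x 11^4 matrix of joint actions
actInfo = rlFiniteSetSpec(num2cell(combs,1));  % each element is a 4x1 action vector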
2 Comments
Anthony on 4 Oct 2019
Hi Emmanouil,
Thanks for the help! I fixed the way I am representing my action space with the following code:
vectors = {[-1 -.8 -.6 -.4 0 .4 .6 .8 1]', ...
    [-1 -.8 -.6 -.4 0 .4 .6 .8 1]', ...
    [-1 -.8 -.6 -.4 0 .4 .6 .8 1]', ...
    [-1 -.8 -.6 -.4 0 .4 .6 .8 1]'};  % input data: cell array of vectors
n = numel(vectors);   % number of vectors
combs = cell(1,n);    % pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); % the reverse order in these two
% comma-separated lists is needed to produce the rows of the result matrix
combs = cat(n+1, combs{:});  % concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n); % reshape to obtain desired matrix
combs = combs';
% Actions
actInfo = rlFiniteSetSpec(num2cell(combs,1));
However, I am still having the same issue and getting the error:
"The dimensions of observations are not compatible with those of Observation Info."
Emmanouil Tzorakoleftherakis on 5 Oct 2019 (edited 5 Oct 2019)
I don't know the specifics of your environment, but the input and output dimensions of the actor and the critic are not set up properly. For instance, the critic needs a single output (since the value estimate is a scalar number), and its input size is determined by the number of observations:
criticNetwork = [
    imageInputLayer([numObservation 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(1,'Name','CriticFC')];
Along the same lines, the total number of actions is 6561 (9^4, the number of possible combinations of your four discrete inputs), so your actor's input would be the same as the critic network's and its output size would be 6561:
actorNetwork = [
    imageInputLayer([numObservation 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(6561,'Name','action')];
This example should be helpful for getting an idea of how to set up these dimensions.
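Putting the pieces together, a hedged end-to-end sketch of the corrected actor-critic setup (reusing criticOpts, actorOpts, and agentOpts from the original question; the sizes assume the nine-value action grid above):
numObservation = obsInfo.Dimension(1);   % 2 observations (x, y)
numActions = numel(actInfo.Elements);    % 6561 enumerated joint actions
% Critic: scalar state-value estimate from the observation alone.
criticNetwork = [
    imageInputLayer([numObservation 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(1,'Name','CriticFC')];
critic = rlRepresentation(criticNetwork,obsInfo,...
    'Observation',{'state'},criticOpts);
% Actor: one output per enumerated joint action.
actorNetwork = [
    imageInputLayer([numObservation 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(numActions,'Name','action')];
actor = rlRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'state'},'Action',{'action'},actorOpts);
agent = rlACAgent(actor,critic,agentOpts);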


More Answers (0)
