Invalid input argument type or size such as observation, reward, isdone or loggedSignals. (Reinforcement Learning Toolbox)

% Create observation specifications.
numObservations = 6;
obsInfo = rlNumericSpec([numObservations 1]);
obsInfo.Name = 'observations';
obsInfo.Description = 'Information on reference voltage, measured capacitor voltage and load current';
% Create action specifications.
load('Actions.mat')
actInfo = rlFiniteSetSpec(num2cell(actions,2));
actInfo.Name = 'states';
mdl = 'Reinforcement_learning_controller_discrete'; % model name inferred from the agent block path below
agentblk = 'Reinforcement_learning_controller_discrete/RL_controller/RL Agent';
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
rng(0)
dnn = [
    featureInputLayer(numObservations,'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','actorFC1') % why 24, 48?
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(24,'Name','CriticStateFC2')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(length(actInfo.Elements),'Name','output')];
agentOptions = rlDQNAgentOptions(...
    'SampleTime',20e-6,...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',3000,...
    'UseDoubleDQN',false,...
    'DiscountFactor',0.9,...
    'MiniBatchSize',64);
% Create the critic representation from the network (this step is missing
% from the snippet above; rlQValueRepresentation is assumed here).
critic = rlQValueRepresentation(dnn,obsInfo,actInfo,'Observation',{'state'});
agent = rlDQNAgent(critic,agentOptions);
trainingOptions = rlTrainingOptions(...
    'MaxEpisodes',1000,...
    'MaxStepsPerEpisode',500,...
    'ScoreAveragingWindowLength',5,...
    'Verbose',false,...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',200,...
    'SaveAgentCriteria','EpisodeReward',...
    'SaveAgentValue',200);
doTraining = true;
if doTraining
    % Train the agent.
    trainingStats = train(agent,env,trainingOptions);
else
    % Load the pretrained agent for the example.
    load('SimulinkVSCDQN.mat','agent');
end
simOptions = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOptions);
Invalid input argument type or size such as observation, reward, isdone or loggedSignals.
Unable to compute gradient from representation.
Unable to evaluate the loss function. Check the loss function and ensure it runs successfully.
Number of elements must not change. Use [] as one of the size inputs to automatically calculate the appropriate size for that dimension.
The action Elements are a 128x1 cell array: there are 7 action channels, each with 2 possible values, which gives 2^7 = 128 combinations. When I set only two possible elements in actInfo manually, the model works well, but the error above occurs when I use the 128x1 cell as the Elements.
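For reference, a minimal sketch of how such an action set can be built so that every cell element has the same size and orientation (the 0/1 channel values and variable names here are assumptions, not the actual contents of Actions.mat):
% Enumerate all 2^7 = 128 combinations of 7 binary action channels (0/1 values assumed).
numChannels = 7;
combos = dec2bin(0:2^numChannels-1) - '0';   % 128x7 matrix, one combination per row
actionCells = num2cell(combos,2);            % 128x1 cell, each element a 1x7 row vector
% Every element must have the same size and orientation.
elementSizes = cell2mat(cellfun(@size,actionCells,'UniformOutput',false));
assert(all(elementSizes(:,1) == 1) && all(elementSizes(:,2) == numChannels))
actInfo = rlFiniteSetSpec(actionCells);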

Accepted Answer

Emmanouil Tzorakoleftherakis on 13 Nov 2020
Edited: Emmanouil Tzorakoleftherakis on 13 Nov 2020
Hello,
This is hard to diagnose without a model that reproduces the issue (including the environment definition).
I would recommend comparing your code with this example, which is similar in nature (it has multiple discrete actions), and in particular with lines 237-248 in RocketLander.m. Make sure each element in your cell array has consistent dimensions, whether that's 1x2 or 2x1.
If that does not work, also check the dimensions of the IsDone and reward signals and make sure they are scalars.
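One way to surface this kind of mismatch before training (not mentioned in the answer, but available in the toolbox) is to validate the environment; a sketch, assuming env was created as in the question:
% Runs a brief simulation of the environment and reports a more specific error
% if the observations or actions produced by the model do not match obsInfo/actInfo.
validateEnvironment(env)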
2 Comments
Kacjer Frank on 22 Nov 2020
Thank you for your reply. In the example you linked, I was wondering why the number of actions numAct equals the number of elements of actionInfo even though there are only two action outputs.
Emmanouil Tzorakoleftherakis on 22 Nov 2020
If you have multiple discrete action outputs, the way to set up the network in Reinforcement Learning Toolbox is currently to find all possible combinations of those actions and set those combinations as the outputs. So each combination is one possible output.
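A minimal sketch of that pattern (the channel values and names below are illustrative, not the actual RocketLander code): two discrete channels with three levels each give 3*3 = 9 combinations, and the critic needs one Q-value output per combination.
levels = [0 0.5 1];                          % assumed per-channel values
[a1,a2] = ndgrid(levels,levels);
combos = [a1(:) a2(:)];                      % 9x2 matrix, one combination per row
actInfoExample = rlFiniteSetSpec(num2cell(combos,2));
numAct = numel(actInfoExample.Elements);     % 9, not 2
% The final fullyConnectedLayer of the Q-network must have numAct outputs,
% one Q-value per action combination.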
