Reward in training manager higher than it should be

Mohammed Eleffendi, 10 March 2021
Commented: zhq, 29 August 2024
Hi,
I am trying to train a reinforcement learning agent and I have the environment set up in Simulink. I'm facing two issues:
1- The reward in the training manager appears to be much higher than it should be. As shown in the picture below, the scope connected to the reward signal shows a reward value of 1, which is correct. However, in the training manager it is 70, which is not correct.
2- After a number of episodes, the training stops and I get an error message:
Error using rl.env.AbstractEnv/simWithPolicy (line 82)
An error occurred while simulating "ADSTestBed" with the agent "falsifier_agent".
Error in rl.task.SeriesTrainTask/runImpl (line 33)
[varargout{1},varargout{2}] = simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
[varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 166)
[varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 170)
[this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 194)
runDirect(this);
Error in rl.task.TaskSpec/run (line 69)
runScalarTask(task);
Error in rl.train.SeriesTrainer/run (line 24)
run(seriestaskspec);
Error in rl.train.TrainingManager/train (line 421)
run(trainer);
Error in rl.train.TrainingManager/run (line 211)
train(this);
Error in rl.agent.AbstractAgent/train (line 78)
TrainingStatistics = run(trainMgr);
Error in ADSTestBedScript (line 121)
trainingStats = train(falsifier_agent,env,trainOpts);
Caused by:
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Invalid input argument type or size such as observation, reward, isdone or loggedSignals.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Unable to compute gradient from representation.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Error using 'backwardLoss' in Layer rl.layer.FcnLossLayer. The function threw an
error and could not be executed.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Number of elements must not change. Use [] as one of the size inputs to
automatically calculate the appropriate size for that dimension.
I should mention that I have another agent in the Simulink model, but that agent is not being trained.
Version: R2020b
Any help is appreciated. Thanks

Accepted Answer

Mohammed Eleffendi, 18 March 2021
For the first issue, the reward in the training manager is the cumulative episode reward, whereas the scope plots the reward at every time step. So the reward in the training manager is correct; there is no issue here.
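To make this concrete, here is a minimal sketch (assuming the reward signal is logged to logsout under the placeholder name 'reward' and the model returns a single simulation output): summing the per-step reward over one episode should reproduce the episode reward shown in the Episode Manager.
out = sim('ADSTestBed');                               % run one episode of the model
rewardTs = out.logsout.getElement('reward').Values;    % logged per-step reward (placeholder signal name)
episodeReward = sum(rewardTs.Data)                     % cumulative reward, as displayed in the Episode Manager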
For the second issue, it turns out that if you have 'UseDevice' set to 'gpu' you will encounter this error. Change it to 'cpu' and the error disappears. Support is investigating what is causing this issue.
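A workaround sketch using the R2020b representation API (criticNet, obsInfo, and actInfo are placeholder names from a typical setup, not taken from the original model):
criticOpts = rlRepresentationOptions('LearnRate',1e-3,'UseDevice','cpu');   % run the representation on the CPU instead of the GPU
critic = rlQValueRepresentation(criticNet,obsInfo,actInfo, ...
    'Observation',{'observation'},'Action',{'action'},criticOpts);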

Other Answers (1)

Emmanouil Tzorakoleftherakis, 11 March 2021
I cannot be sure about the error, but it seems that somewhere in your setup you are changing the number of parameters/inputs (check the inputs to the RL Agent block).
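As a sanity check, here is a hedged sketch of defining specs that match the signals wired into the RL Agent block (the dimensions and the block path 'ADSTestBed/RL Agent' are assumptions, not taken from the actual model):
obsInfo = rlNumericSpec([4 1]);                                   % must match the width of the observation signal
actInfo = rlNumericSpec([1 1],'LowerLimit',-1,'UpperLimit',1);    % must match the width of the action signal
env = rlSimulinkEnv('ADSTestBed','ADSTestBed/RL Agent',obsInfo,actInfo);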
For your first question, the individual reward at each time step is different from the episode reward shown in the Episode Manager. The latter sums the individual rewards over all time steps of an episode.
4 Comments
zhq, 29 August 2024
I'd like to ask: if I sum the individual rewards over the time steps, the result should be roughly the same as the episode reward shown in the Episode Manager, right? I ran into a scenario I don't quite understand: https://ww2.mathworks.cn/matlabcentral/answers/2148684-reinforcement-learning-training-monitor-episode-reward
