After training my DDPG RL agent and saving it, unexpected simulation output

Question

Abdul Basith Ashraf, April 3, 2021


https://jp.mathworks.com/matlabcentral/answers/791734-after-training-my-ddpg-rl-agent-and-saving-it-unexpected-simulation-output

Commented: Rik, April 5, 2021

After training my DDPG RL agent and saving it, it does not produce the expected result.

After training, I first ran the Simulink model and got the wrong kind of output. Then I loaded the saved MAT-file and ran:

sim(env,saved_agent,simOpts)

The output was a flat profile, which was simply different from what I saw during training.
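The load-and-simulate step described above can be sketched as follows. This is a minimal sketch under assumptions: it assumes the default `SaveAgentDirectory` ("savedAgents"), that each saved file is named like `Agent<k>.mat` and holds a variable named `saved_agent`, and that `env` is the same environment object used for training. The episode number in the file name (850) is hypothetical. With `SaveAgentCriteria` set to "EpisodeReward" and `SaveAgentValue` set to -1e2, a file is written for every episode whose reward reaches -100, so the most recently saved file is not necessarily the best-performing agent.

```matlab
% Sketch: list the agents saved during training, load one, and simulate it.
% Assumptions: default SaveAgentDirectory ("savedAgents"); env is the
% training environment; Agent850.mat is a hypothetical file name.
files = dir(fullfile('savedAgents', 'Agent*.mat'));
fprintf('%d agents were saved during training\n', numel(files));

S = load(fullfile('savedAgents', 'Agent850.mat'), 'saved_agent');
saved_agent = S.saved_agent;

% Simulate for the same number of steps used per training episode
simOpts = rlSimulationOptions('MaxSteps', 1000);
experience = sim(env, saved_agent, simOpts);

% Total reward of the simulated episode (Reward is a timeseries)
totalReward = sum(experience.Reward.Data);
fprintf('Simulated episode reward: %g\n', totalReward);
```

Comparing `totalReward` across several saved files is one way to pick which saved agent to keep.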

These are the agent options:

agentOptions = rlDDPGAgentOptions(...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',1e3,...
    'SampleTime',0.1,...
    'DiscountFactor',0.99,...
    'MiniBatchSize',64,...
    "NumStepsToLookAhead",10,...
    "SaveExperienceBufferWithAgent",true, ...
    "ResetExperienceBufferBeforeTraining",false);
agentOptions.NoiseOptions.Variance = 0.6;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
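To see how long the exploration noise configured above stays large, a quick calculation helps. This sketch assumes that after each training step the noise variance is updated as `Variance = Variance*(1 - VarianceDecayRate)`, that no lower bound (`VarianceMin`) is reached, and that every episode runs the full `maxsteps = 1000` steps, so 1000 episodes give at most 1e6 steps. Under those assumptions the variance halves roughly every 69,000 steps, i.e. about every 69 full-length episodes.

```matlab
% Sketch: how the exploration-noise variance decays during training.
% Assumption: Variance <- Variance*(1 - VarianceDecayRate) after every step,
% with no lower bound (VarianceMin) reached.
variance0 = 0.6;          % agentOptions.NoiseOptions.Variance
decayRate = 1e-5;         % agentOptions.NoiseOptions.VarianceDecayRate
maxSteps  = 1000 * 1000;  % maxepisodes * maxsteps (upper bound on steps)

% Number of steps for the variance to fall to half its initial value
halfLifeSteps = log(0.5) / log1p(-decayRate);

% Variance remaining at the end of a full 1000-episode run
finalVariance = variance0 * exp(maxSteps * log1p(-decayRate));

fprintf('Variance halves every %.0f steps (~%.0f episodes of 1000 steps)\n', ...
    halfLifeSteps, halfLifeSteps / 1000);
fprintf('Variance after %d steps: %.2e\n', maxSteps, finalVariance);
```

If training stops early (for example when `StopTrainingValue` is reached), fewer steps elapse and more noise remains, so outputs seen during training may be dominated by exploration rather than by the actor.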

And these are my training options:

maxepisodes = 1000;
maxsteps = 1000;
trainingOpts = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'Verbose',false,...
    'Plots','training-progress',...
    "ScoreAveragingWindowLength",50,...
    'StopTrainingValue',1000,...
    'SaveAgentCriteria',"EpisodeReward", ...
    "SaveAgentValue",-1e2);

I want the output to come from the learned agent, and it should not be flat at all.

EDIT

When I inspect my agent, it only has two properties:

>>agent
agent = 
  rlDDPGAgent with properties:
        AgentOptions: [1×1 rl.option.rlDDPGAgentOptions]
    ExperienceBuffer: [1×1 rl.util.ExperienceBuffer]

2 Comments

Emmanouil Tzorakoleftherakis, April 5, 2021

How many episodes did you train for? Simulation results will never be exactly the same as what you saw during training, for several reasons, including that exploration noise is added during training. I would make sure that the output you saw during training is not purely due to exploration noise, i.e., make sure that your actor network is set up so that its deterministic output is not flat.
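One way to check the actor's deterministic output directly is to query it at a few observations, bypassing the exploration noise. This is a minimal sketch under assumptions: a single observation channel, `saved_agent` and `env` already in the workspace, and `rand` used only as a crude stand-in for realistic observations (sampling states actually visited in simulation would be more meaningful). The number of samples is arbitrary.

```matlab
% Sketch: query the actor (no exploration noise) at random observations
% to check whether its deterministic output is flat.
% Assumptions: a single observation channel; saved_agent and env are in the
% workspace; rand() is only a crude stand-in for realistic observations.
obsInfo = getObservationInfo(env);
actor = getActor(saved_agent);

numSamples = 20;
actions = zeros(numSamples, 1);
for k = 1:numSamples
    obs = rand(obsInfo(1).Dimension);
    act = getAction(actor, {obs});
    if iscell(act)          % some releases return a cell array
        act = act{1};
    end
    actions(k) = act(1);    % first action element only
end

% Near-zero spread suggests the actor output is saturated or constant
fprintf('Action range over samples: [%g, %g]\n', min(actions), max(actions));
```

If every sample lands on the same value, often one of the action limits, the actor is likely saturated (for example a tanh output layer driven to its bound), which would explain a flat profile in simulation.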

Rik, April 5, 2021

If this question is unclear, why did you mark an answer as accepted answer? You can simply post a comment with clarifications, or even edit your question to clarify it.
