The reward decreases after a certain episode in DQN
I don't know why the reward decreases after episode 240.
I have attached the episode training progress and my code.
My observation space has 2 continuous states and my action space has 5 discrete actions.
If anyone knows what the problem is, please advise me why the algorithm is not working properly.
nI = observationInfo.Dimension(1); % number of inputs
nL1 = 12;
nL2 = 24;
nL3 = 24; % number of neurons
nO = numel(actionInfo.Elements); % number of outputs (discrete actions)
dnn = [
featureInputLayer(nI,'Normalization','none','Name','state')
fullyConnectedLayer(nL1,'Name','fc1')
reluLayer('Name','relu1')
fullyConnectedLayer(nL2,'Name','fc2')
reluLayer('Name','relu2')
fullyConnectedLayer(nL3,'Name','fc3')
reluLayer('Name','relu3')
fullyConnectedLayer(nO,'Name','out')];
%
% figure
% plot(layerGraph(dnn))
criticOptions = rlRepresentationOptions('LearnRate',0.0001,'GradientThreshold',1, 'UseDevice', "gpu");
critic = rlQValueRepresentation(dnn,observationInfo,actionInfo,'Observation',{'state'}, criticOptions);
agentOptions = rlDQNAgentOptions(...
'SampleTime', Ts,...
'UseDoubleDQN', true,...
'TargetSmoothFactor',1e-2,...
'TargetUpdateFrequency', 20,...
'DiscountFactor',0.99,...
'ExperienceBufferLength',1e8);
agentOptions.EpsilonGreedyExploration.EpsilonMin = 0.01;
agentOptions.EpsilonGreedyExploration.EpsilonDecay = 0.0001;
agentOptions.EpsilonGreedyExploration.Epsilon = 1;
agent = rlDQNAgent(critic,agentOptions);
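For context on how fast exploration shrinks with these settings: assuming the toolbox applies a multiplicative per-step schedule, epsilon = epsilon*(1 - EpsilonDecay) down to EpsilonMin (my reading of the EpsilonGreedyExploration options; worth checking against the documentation), the values above reach the epsilon floor after roughly 46,000 agent steps:

```matlab
% Sketch, assuming a multiplicative per-step epsilon schedule:
% epsilon <- epsilon*(1 - EpsilonDecay), clipped below at EpsilonMin.
epsilon0     = 1;       % Epsilon
epsilonMin   = 0.01;    % EpsilonMin
epsilonDecay = 1e-4;    % EpsilonDecay

% Steps until epsilon first falls to its floor
nSteps = ceil(log(epsilonMin/epsilon0) / log(1 - epsilonDecay));
fprintf('epsilon reaches %.2g after about %d steps\n', epsilonMin, nSteps);
```

If the agent takes on the order of a few hundred steps per episode, exploration would be nearly exhausted well before episode 240, which is one thing to check against the observed reward drop.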
%% Train agent
maxsteps = ceil(trun/Ts);
trainingOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'ScoreAveragingWindowLength',5,...
'Verbose',true,... % display training progress in the command line
'UseParallel',false,...
'StopTrainingCriteria','EpisodeCount',... % AverageReward, EpisodeCount
'StopTrainingValue',300,...
'SaveAgentCriteria','EpisodeCount',...
'SaveAgentValue', 300,...
'SaveAgentDirectory', "Agent_5ac2ob_"); % SaveAgentDirectory expects a folder
% 'Plots','training-progress',...
trainingStats = train(agent,env,trainingOpts);
Answer (1)
Emmanouil Tzorakoleftherakis
10 November 2020
Hello,
There is no guarantee that the reward will always keep increasing when using RL (after all, there is a certain amount of stochasticity involved in exploration). I would recommend stopping training at the peak and checking whether the agent is good enough. If it is not, you would need to increase your exploration settings (for example, use a smaller epsilon decay value and a larger minimum epsilon value).
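To make that suggestion concrete, the exploration options could be relaxed along these lines (the exact values are illustrative starting points, not tuned for this problem):

```matlab
% Sketch: keep exploring longer by decaying epsilon more slowly and
% raising its floor. Values below are only a starting point.
agentOptions.EpsilonGreedyExploration.Epsilon      = 1;
agentOptions.EpsilonGreedyExploration.EpsilonMin   = 0.05;  % larger minimum
agentOptions.EpsilonGreedyExploration.EpsilonDecay = 1e-5;  % slower decay

% To capture the agent at its peak automatically rather than stopping
% training by hand, saving can also be tied to reward instead of
% episode count (threshold below is hypothetical):
% trainingOpts.SaveAgentCriteria = "EpisodeReward";
% trainingOpts.SaveAgentValue    = 250;
```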