MATLAB Answers

How does Q-learning update the Q-table in Reinforcement Learning Toolbox?

Tracy Shang on 1 May 2021
Edited: Tracy Shang on 4 May 2021
'MaxEpisodes' and 'MaxStepsPerEpisode' are both set to 1.
I ran the following code. After the first episode, Q(4,1) is set to -1.
However, when I ran the "train section" again, both Q(4,1) and Q(4,2) were updated, as shown in the following figure.
In the second episode, action 2 is executed in state 4. Therefore, in my opinion, only Q(4,2) should be updated, to -1.
Why is Q(4,2) set to 0.7441?
Why is Q(4,1) also updated, to -1.67?
clear
% 4x4 grid world with terminal state [4,4]
GW = createGridWorld(4,4);
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[4,4]';
nS = numel(GW.States);
nA = numel(GW.Actions);
% Reward is -1 for every transition, except 10 for entering the terminal state
GW.R = -1*ones(nS,nS,nA);
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
env = rlMDPEnv(GW);
% Tabular critic with the learning rate set to 1
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
critic = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
critic.Options.LearnRate = 1;
% Q-learning agent with epsilon = 0.05 and no discounting
agentOpt = rlQAgentOptions;
agentOpt.EpsilonGreedyExploration.Epsilon = 0.05;
agentOpt.DiscountFactor = 1;
agent = rlQAgent(critic, agentOpt);
plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;
%% train section
% Each run of this section trains for one episode of one step
rng(0)
opt = rlTrainingOptions(...
'MaxEpisodes',1,...
'MaxStepsPerEpisode',1,...
'StopTrainingCriteria',"AverageReward",...
'Plots',"none",...
'StopTrainingValue',480);
trainStats = train(agent,env,opt);
%%
% Read the learned Q-table back from the agent's critic
aa = getLearnableParameters(getCritic(agent));
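For reference, the update the question expects can be written out by hand. The following is a minimal sketch of the textbook tabular Q-learning rule in plain MATLAB, with no toolbox calls. The next state `sNext` and the variable names are hypothetical, chosen only for illustration, and the table is assumed to start at zero.

```matlab
% Textbook tabular Q-learning update, written by hand (no toolbox calls).
nS = 16; nA = 4;      % 4x4 grid world with 4 actions
Q = zeros(nS,nA);     % assumed: Q-table starts at zero
alpha = 1;            % learning rate, as in critic.Options.LearnRate
gamma = 1;            % discount factor, as in agentOpt.DiscountFactor
r = -1;               % reward for a non-terminal transition
s = 4;                % state whose table row is discussed in the question
sNext = 3;            % hypothetical next state, not taken from the thread

% Episode 1: action 1 is taken in state 4.
a = 1;
Q(s,a) = Q(s,a) + alpha*(r + gamma*max(Q(sNext,:)) - Q(s,a));
rowAfterEpisode1 = Q(s,:);

% Episode 2: action 2 is taken in state 4.
a = 2;
Q(s,a) = Q(s,a) + alpha*(r + gamma*max(Q(sNext,:)) - Q(s,a));
rowAfterEpisode2 = Q(s,:);

disp(rowAfterEpisode1)   % -1  0  0  0
disp(rowAfterEpisode2)   % -1 -1  0  0
```

Under this rule only the visited entry changes in each episode, which is the behavior the question compares the toolbox output against.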

Answers (1)

Emmanouil Tzorakoleftherakis on 3 May 2021
Can you try
critic.Options.L2RegularizationFactor=0;
This parameter is nonzero by default and is likely the reason for the discrepancy you are observing.
1 Comment
Tracy Shang on 4 May 2021
Thanks for your answer!
I tried the code you suggested. The result showed no difference.
But you inspired me!
I tried another parameter, as follows. The qTable was updated as shown in the following figure.
critic.Options.OptimizerParameters.GradientDecayFactor = 0;
I then set both parameters by adding the following code, and the qTable was updated as shown in the following figure. At least the question about Q(4,1) is solved.
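With the parameters I set (LearnRate = 1 and DiscountFactor = 1), the equation for calculating the Q-value simplifies from $Q(S,A) \leftarrow Q(S,A) + \alpha\,[R + \gamma \max_{a} Q(S',a) - Q(S,A)]$ to
$$Q(S,A) \leftarrow R + \max_{a} Q(S',a).$$
That is, the updated value should equal the immediate reward plus the largest Q-value of the next state.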
Why is Q(4,2) set to -1.4139?
critic.Options.OptimizerParameters.GradientDecayFactor = 0;
critic.Options.L2RegularizationFactor = 0;
Looking forward to your further answer. Thank you very much!
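One possible reading of these numbers, offered as a sketch rather than a confirmed answer: if the table entries are trained with the Adam optimizer, the step size is set by Adam's running moment estimates rather than by the TD error alone. The plain-MATLAB script below applies a hand-written Adam update to Q(4,1) and Q(4,2) over two one-step episodes. Everything in it is an assumption not stated in the thread: the use of Adam, SquaredGradientDecayFactor = 0.999, Epsilon = 1e-8, a squared-error loss, a target of -1 in both episodes, and L2 regularization ignored. It makes no toolbox calls, and the variable names are illustrative.

```matlab
% Hand-written Adam update on two table entries, Q(4,1) and Q(4,2).
% Assumptions (not confirmed by the thread): Adam optimizer, loss
% 0.5*(target - Q)^2, L2 regularization ignored, next-state Q-values zero.
alpha   = 1;        % learning rate, as in critic.Options.LearnRate
beta2   = 0.999;    % assumed SquaredGradientDecayFactor
epsAdam = 1e-8;     % assumed Adam epsilon
target  = -1;       % reward -1 plus max next-state Q (0), discount factor 1
actions = [1 2];    % action taken in state 4 in episode 1 and in episode 2
beta1Values = [0.9 0];            % assumed default, then the value set above
results = zeros(numel(beta1Values),2);

for k = 1:numel(beta1Values)
    beta1 = beta1Values(k);       % GradientDecayFactor
    Q = [0 0]; m = [0 0]; v = [0 0];
    for t = 1:2
        g = [0 0];
        a = actions(t);
        g(a) = Q(a) - target;     % gradient is nonzero only for the visited entry
        m = beta1*m + (1-beta1)*g;         % first-moment estimate
        v = beta2*v + (1-beta2)*g.^2;      % second-moment estimate
        mHat = m/(1-beta1^t);              % bias correction uses the global step t
        vHat = v/(1-beta2^t);
        Q = Q - alpha*mHat./(sqrt(vHat)+epsAdam);
    end
    results(k,:) = Q;
    fprintf('beta1 = %.1f: Q(4,1) = %.4f, Q(4,2) = %.4f\n', beta1, Q(1), Q(2));
end
```

Under these assumptions the script gives magnitudes of about 1.67 and 0.7441 with GradientDecayFactor = 0.9, and 1 and 1.4139 with GradientDecayFactor = 0, which agree with the magnitudes reported in this thread. In this sketch Q(4,1) keeps moving in the second episode because of the first-moment (momentum) term, and Q(4,2) overshoots -1 because the bias correction is based on the global step count rather than on how often that entry was visited.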
