
Q-table issues in the example "Q-learning in the basic grid world"

Fangyuan Chang on 9 Nov 2020
Commented: Adi Firdaus on 11 Dec 2021
I trained a Q-learning agent in the MATLAB predefined environment "BasicGridWorld" and have a question about how the Q-table is updated. With the number of episodes set to 1 and the number of steps per episode set to 1, I expect the single updated entry of the Q-table to equal alpha * R, where alpha is the learning rate and R is the instant reward: since the table is initialized to zeros, the Q-learning update Q(s,a) = Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a)) reduces to alpha * R. However, the code produces a Q-value different from what I expect. The code is attached below:
rng(0)
env = rlPredefinedEnv("BasicGridWorld");
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
critic = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
critic.Options.LearnRate = 0.1;
critic.Options.L2RegularizationFactor = 0;
critic.Options.Optimizer = "sgdm";
critic.Options.OptimizerParameters.Momentum = 0;
opts = rlQAgentOptions;
opts.EpsilonGreedyExploration.Epsilon = 0.8;
opts.EpsilonGreedyExploration.EpsilonMin = 0.01;
opts.EpsilonGreedyExploration.EpsilonDecay = 0.01;
opts.DiscountFactor = 0.5;
agent = rlQAgent(critic,opts);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1,...
    'MaxStepsPerEpisode',1,...
    'StopTrainingCriteria',"AverageReward",...
    'StopTrainingValue',30,...
    'Verbose',true,...
    'Plots','none');
trainOpts.ScoreAveragingWindowLength = 50;
trainingStats = train(agent,env,trainOpts);
trained_critic = getCritic(agent);
trained_table = getLearnableParameters(trained_critic);
trained_qtable = trained_table{1};
% check the updated Q-value (locate the nonzero entries of the table)
[r,c] = find(trained_qtable ~= 0);
Q_value = trained_qtable(r,c)
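For comparison, this is the hand calculation I have in mind for that single step. The reward value below is only an illustrative placeholder, not the actual reward returned by the environment on the sampled step:
% Expected tabular Q-learning update for one step on a zero-initialized table
alpha = 0.1;      % critic.Options.LearnRate
gamma = 0.5;      % opts.DiscountFactor
R = 10;           % placeholder instant reward for the sampled step (assumed value)
Q_sa = 0;         % current Q(s,a): zero because the table starts at zero
maxQ_next = 0;    % max over a' of Q(s',a'): also zero before any update
Q_expected = Q_sa + alpha*(R + gamma*maxQ_next - Q_sa)
% With a zero-initialized table this reduces to alpha*R, i.e. 0.1*10 = 1 here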
Can anyone point out where I went wrong?
Thank you very much.

Answers (0)

Release

R2020b
