Hi,
I am using the Matlab Reinforcement Learning toolbox to train an rlQAgent.
The issue that I am facing is that the corresponding QTable, i.e., the output of the command getLearnableParameters(getCritic(qAgent)), is reset each time the train command is used.
Is it possible to avoid this reset so as to further train a previously trained agent?
Thank you
Corrado

Accepted Answer

Emmanouil Tzorakoleftherakis on 19 May 2020
Edited: Emmanouil Tzorakoleftherakis on 20 May 2020

If you stop training, you should be able to continue from where you left off. I called 'train' on the basic grid world example a couple of times in a row and the output of 'getLearnableParameters(getCritic(qAgent))' was different. You can always save the trained agent and reload it as well to make sure you don't accidentally delete it.
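As a minimal sketch of the save-and-reload route mentioned above, assuming the same basic grid world setup used elsewhere in this thread, the agent can be written to a MAT-file and loaded back before calling train again. The file name qAgentCheckpoint.mat and the variable name loaded are placeholders chosen for illustration, and the training options are deliberately tiny.

```matlab
% Build a small Q-learning agent on the predefined grid world.
env = rlPredefinedEnv("BasicGridWorld");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
qTable = rlTable(obsInfo, actInfo);
qRepresentation = rlQValueRepresentation(qTable, obsInfo, actInfo);
qAgent = rlQAgent(qRepresentation, rlQAgentOptions);

% Short training run (values are illustrative only).
trainOpts = rlTrainingOptions;
trainOpts.Plots = 'none';
trainOpts.MaxEpisodes = 5;
trainOpts.MaxStepsPerEpisode = 20;
train(qAgent, env, trainOpts);

% Save the trained agent; the file name is a hypothetical placeholder.
save("qAgentCheckpoint.mat", "qAgent");

% Later (for example in a new session): reload and keep training.
loaded = load("qAgentCheckpoint.mat", "qAgent");
qAgent = loaded.qAgent;
train(qAgent, env, trainOpts);
```

Keeping a saved copy also gives a fixed reference to compare against if the table changes in a way that was not expected.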
Update:
There is a regularization term added to the loss which causes the other entries to change slightly. To avoid this, you can type:
qRepresentation.Options.L2RegularizationFactor=0;
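For context, here is a hedged sketch of where that setting could go in the script from this thread. It assumes the R2020a-era API used here, and it passes the option through rlRepresentationOptions when the representation is built, so that it is in place before the agent is created. The names repOpts and numChanged are introduced only for illustration.

```matlab
env = rlPredefinedEnv("BasicGridWorld");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Randomly initialized Q-table, as in the question.
qTable = rlTable(obsInfo, actInfo);
qTable.Table = randn(size(qTable.Table));

% Disable L2 regularization so that untouched entries are not shrunk.
repOpts = rlRepresentationOptions;
repOpts.L2RegularizationFactor = 0;
qRepresentation = rlQValueRepresentation(qTable, obsInfo, actInfo, repOpts);

agentOpts = rlQAgentOptions;
agentOpts.DiscountFactor = 1;
qAgent = rlQAgent(qRepresentation, agentOpts);

% One episode of one step, as in the question.
trainOpts = rlTrainingOptions;
trainOpts.Plots = 'none';
trainOpts.MaxEpisodes = 1;
trainOpts.MaxStepsPerEpisode = 1;

% Count how many table entries a single training step changes.
QTable0 = getLearnableParameters(getCritic(qAgent));
train(qAgent, env, trainOpts);
QTable1 = getLearnableParameters(getCritic(qAgent));
numChanged = nnz(QTable0{1} ~= QTable1{1});
disp(numChanged)
```

An L2 penalty pulls every parameter toward zero in proportion to its current value, which would explain why a zero-initialized table looks unaffected while a randomly initialized one changes everywhere after a single step.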

5 Comments

Corrado Possieri on 20 May 2020
Edited: Corrado Possieri on 20 May 2020
I am actually trying to set the initial QTable for the agent.
If I run the code
env = rlPredefinedEnv("BasicGridWorld");
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
qTable.Table = randn(size(qTable.Table));
qRepresentation = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
agentOpts = rlQAgentOptions;
agentOpts.DiscountFactor = 1;
qAgent = rlQAgent(qRepresentation,agentOpts);
trainOpts = rlTrainingOptions;
trainOpts.Plots = 'none';
trainOpts.MaxEpisodes = 1;
trainOpts.MaxStepsPerEpisode = 1;
trainOpts.Verbose = 1;
QTable0 = getLearnableParameters(getCritic(qAgent));
train(qAgent,env,trainOpts);
QTable1 = getLearnableParameters(getCritic(qAgent));
train(qAgent,env,trainOpts);
QTable2 = getLearnableParameters(getCritic(qAgent));
disp(find(QTable0{1} ~= QTable1{1}))
disp(find(QTable1{1} ~= QTable2{1}))
I get what I expect: only one and two entries of the QTable are changed, respectively.
However, if I try to force the initial value of the QTable:
env = rlPredefinedEnv("BasicGridWorld");
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
qTable.Table = randn(size(qTable.Table));
qRepresentation = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
agentOpts = rlQAgentOptions;
agentOpts.DiscountFactor = 1;
qAgent = rlQAgent(qRepresentation,agentOpts);
trainOpts = rlTrainingOptions;
trainOpts.Plots = 'none';
trainOpts.MaxEpisodes = 1;
trainOpts.MaxStepsPerEpisode = 1;
trainOpts.Verbose = 1;
QTable0 = getLearnableParameters(getCritic(qAgent));
train(qAgent,env,trainOpts);
QTable1 = getLearnableParameters(getCritic(qAgent));
train(qAgent,env,trainOpts);
QTable2 = getLearnableParameters(getCritic(qAgent));
disp(find(QTable0{1} ~= QTable1{1}))
disp(find(QTable1{1} ~= QTable2{1}))
all its entries are perturbed, as if the QTable were somehow reinitialized.
Emmanouil Tzorakoleftherakis on 20 May 2020
Maybe I am missing something, but it looks like the two scripts posted are exactly the same.
Corrado Possieri on 20 May 2020
The difference is that in the second script the QTable is initialized randomly with the following additional line:
qTable.Table = randn(size(qTable.Table));
If you run the two scripts, you will see that in the first just one entry of the QTable is modified by the training algorithm, whereas in the second the whole QTable is changed by a single step of the training algorithm.
Emmanouil Tzorakoleftherakis on 20 May 2020
I updated my answer above with a solution. Hope that helps.
Corrado Possieri on 20 May 2020
Thank you Emmanouil, this solved the issue.
