MATLAB Answers

load trained reinforcement learning multi-Agents to sim

Chao Wang on 16 Apr 2021 at 9:33
Edited: Chao Wang on 19 Apr 2021 at 15:11
I trained four agents with Q-learning. After training, I loaded the trained agents and ran a simulation, but every agent always chooses the same action and never changes it, so the simulation does not reproduce the behavior I saw during training.
Here is my code:
mdl = 'FOUR_DG_0331';
agentBlk = ["FOUR_DG_0331/RL Agent1", "FOUR_DG_0331/RL Agent2", "FOUR_DG_0331/RL Agent3", "FOUR_DG_0331/RL Agent4"];
oInfo = rlFiniteSetSpec([123,456,789]);
aInfo = rlFiniteSetSpec([150,160,170]);
aInfo1 = rlFiniteSetSpec([150,170]);
obsInfos = {oInfo,oInfo,oInfo,oInfo};
actInfos = {aInfo1,aInfo,aInfo,aInfo};
env = rlSimulinkEnv(mdl,agentBlk,obsInfos,actInfos);
Ts = 0.01;
Tf = 4;
qTable1 = rlTable(oInfo,aInfo1);
qTable2 = rlTable(oInfo,aInfo);
qTable3 = rlTable(oInfo,aInfo);
qTable4 = rlTable(oInfo,aInfo);
criticOpts = rlRepresentationOptions('LearnRate',0.1);
Critic1 = rlQValueRepresentation(qTable1,oInfo,aInfo1,criticOpts);
Critic2 = rlQValueRepresentation(qTable2,oInfo,aInfo,criticOpts);
Critic3 = rlQValueRepresentation(qTable3,oInfo,aInfo,criticOpts);
Critic4 = rlQValueRepresentation(qTable4,oInfo,aInfo,criticOpts);
%/*Code here for agent option**/
%... ....
agent1 = rlQAgent(Critic1,QAgent_opt);
agent2 = rlQAgent(Critic2,QAgent_opt);
agent3 = rlQAgent(Critic3,QAgent_opt);
agent4 = rlQAgent(Critic4,QAgent_opt);
trainOpts = rlTrainingOptions;
trainOpts.MaxEpisodes = 1000;
trainOpts.MaxStepsPerEpisode = ceil(Tf/Ts);
trainOpts.StopTrainingCriteria = "EpisodeCount";
trainOpts.StopTrainingValue = 1000;
trainOpts.SaveAgentCriteria = "EpisodeCount";
trainOpts.SaveAgentValue = 15;
trainOpts.SaveAgentDirectory = "savedAgents";
trainOpts.Verbose = false;
trainOpts.Plots = "training-progress";
doTraining = false;
if doTraining
    stats = train([agent1, agent2, agent3, agent4],env,trainOpts);
else
    load(trainOpts.SaveAgentDirectory + "/Agents16.mat",'agent');
end
simOpts = rlSimulationOptions('MaxSteps',ceil(Tf/Ts));
experience = sim(env,[agent1 agent2 agent3 agent4],simOpts)
The result of the sim call is that all four agents choose action 150. They do not choose other actions as they did during training.
I don't understand why. Can somebody help me out with this?

Answers (1)

Ari Biswas on 16 Apr 2021 at 23:10
It could mean that the agents have converged to suboptimal policies. You can train the agents for longer to see if there is an improvement. Note that the behavior you see during training includes exploration. If the EpsilonGreedyExploration.Epsilon parameter has not decayed much, the agents are still exploring, so some of the actions chosen during training are random rather than the result of the learned policy. This could be one reason why the behavior in sim differs from what you saw during training.
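To illustrate the point about exploration, here is a minimal sketch in plain MATLAB with no toolbox calls. It is not the toolbox's internal implementation: the Q-values, `epsilon`, and the variable names are assumptions made up for illustration. It compares epsilon-greedy selection with purely greedy selection on the same table, and shows what a greedy choice returns on an all-zero table.

```matlab
% Illustrative only: the Q-values and epsilon are assumptions, not from the post.
actions = [150 160 170];          % action set from the question
Q       = [0.2 0.9 0.4];          % hypothetical Q-values for one observation
epsilon = 0.3;                    % hypothetical exploration probability

nSteps = 10;
exploreActs = zeros(1,nSteps);
greedyActs  = zeros(1,nSteps);
for k = 1:nSteps
    % Epsilon-greedy: random action with probability epsilon, otherwise argmax
    if rand < epsilon
        idx = randi(numel(actions));
    else
        [~,idx] = max(Q);
    end
    exploreActs(k) = actions(idx);

    % Greedy: always argmax of the table
    [~,idx] = max(Q);
    greedyActs(k) = actions(idx);
end
disp(exploreActs)   % may contain a mix of actions
disp(greedyActs)    % always 160 for this table

% On an all-zero table, max returns the first index on ties,
% so a greedy choice picks the first action in the set.
[~,idx0] = max(zeros(1,3));
disp(actions(idx0)) % 150
```

If the simulated agents always return the first action in their action set, it may be worth checking whether their Q-tables contain learned values at all.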
2 Comments
Chao Wang on 19 Apr 2021 at 15:10
I've tried training for longer, but the agents still don't work. Is this loading method wrong?
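One way to check the loading step is to look at what a MAT-file actually contains before using it. In the posted code, `load(...,'agent')` requests a variable named `agent`, while `sim` is given `agent1` through `agent4` from the workspace, so the objects being simulated are not the ones requested from the file. The sketch below is a self-contained demonstration using a temporary file; `trainedThing` is a hypothetical stand-in for a saved agent, and no assumption is made about the variable names inside Agents16.mat.

```matlab
% Self-contained demo with a temporary file; the variable name is hypothetical.
tmpFile = [tempname '.mat'];
trainedThing = 42;                 % stand-in for a saved agent
save(tmpFile,'trainedThing');

% List the variables stored in the file without loading them.
info  = whos('-file',tmpFile);
names = {info.name};
disp(names)

% Load into a struct so existing workspace variables are not overwritten.
S = load(tmpFile);
hasAgent = isfield(S,'agent');     % false: nothing named 'agent' was saved
disp(hasAgent)

delete(tmpFile);                   % clean up the temporary file
```

Running the same `whos('-file', ...)` call on savedAgents/Agents16.mat would show which variable names the training run saved. Those loaded objects, rather than the freshly constructed `agent1` through `agent4`, are presumably what should be passed to sim.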

