I am working on path planning and obstacle avoidance using deep reinforcement learning, but training is not converging.

Following is the code for creating the RL agent:
criticOpts = rlRepresentationOptions("LearnRate",1e-3,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},criticOpts);
actorOptions = rlRepresentationOptions("LearnRate",1e-4,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},actorOptions);
agentOpts = rlDDPGAgentOptions( ...
    "SampleTime",sampleTime, ...
    "TargetSmoothFactor",1e-3, ...
    "DiscountFactor",0.995, ...
    "MiniBatchSize",128, ...
    "ExperienceBufferLength",1e6);
agentOpts.NoiseOptions.Variance = 0.1;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
obstacleAvoidanceAgent = rlDDPGAgent(actor,critic,agentOpts);
Training options are:
maxEpisodes = 5000;
maxSteps = ceil(Tfinal/sampleTime);
trainOpts = rlTrainingOptions( ...
    "MaxEpisodes",maxEpisodes, ...
    "MaxStepsPerEpisode",maxSteps, ...
    "ScoreAveragingWindowLength",50, ...
    "StopTrainingCriteria","AverageReward", ...
    "StopTrainingValue",10000, ...
    "Verbose",true, ...
    "Plots","training-progress");
trainingStats = train(obstacleAvoidanceAgent,env,trainOpts);
As shown in the attached figure, training is not converging.

1 Answer

Matteo D'Ambrosio, 28 May 2023
Edited: Matteo D'Ambrosio, 28 May 2023
I'm not too familiar with DDPG since I use other agents, but looking at your episode-reward figure, a few things come to mind:
  1. Try reducing the sparsity of your episode reward. Some of your episodes receive a reward of 0 and others around 10,000, and that spread can cause problems with the gradients. Consider scaling the rewards so that your high-reward episodes reach a total of roughly 10, but experiment with the factor.
  2. Decrease the learning rate, which always helps when starting a new RL project, at least until you find a value that works. Try something like 1e-4, 1e-5, or 1e-6; I wouldn't go lower than that.
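As a rough sketch of both suggestions, assuming the same variables as in your code (`criticOpts`, `actorOptions`, `trainOpts`): the reward scaling goes inside your environment's step/reward function (the variable name `reward` below is a placeholder for wherever your environment computes it), and the lower learning rates go into the representation options. Note that if you scale the reward, the stopping criterion must be scaled by the same factor.

```matlab
% 1) Scale down the reward so high-reward episodes total ~10 instead of ~10k.
%    Apply this wherever your environment computes its reward:
rewardScale = 1e-3;            % tuning knob, not a fixed recommendation
reward = rewardScale*reward;

% 2) Lower the learning rates (critic typically one order above the actor):
criticOpts = rlRepresentationOptions("LearnRate",1e-4, ...
    "L2RegularizationFactor",1e-4,"GradientThreshold",1);
actorOptions = rlRepresentationOptions("LearnRate",1e-5, ...
    "L2RegularizationFactor",1e-4,"GradientThreshold",1);

% Keep the stopping threshold consistent with the scaled reward:
trainOpts.StopTrainingValue = 10000*rewardScale;   % i.e. 10
```

These values are starting points to sweep, not known-good settings for your environment.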
Hope this helps.
