- Try decreasing the sparsity in your episode reward. You have some episodes with 0 reward and some with 10k reward which can generate some problems with gradients. Maybe add a multiplier to the rewards you are giving so that your high-reward episodes reach a reward of ~10, but play around with it.
- Decrease learning rate, which always helps when you start a new RL project. At least until you find a number that works. Maybe try something like 1e-4, 1e-5, 1e-6, i wouldn't go lower.
I am working on path planning and obstacle avoidance using deep reinforcement learning but training is not converging.
    18 ビュー (過去 30 日間)
  
       古いコメントを表示
    
Following is the code for creating rl Agent:
criticOpts = rlRepresentationOptions("LearnRate",1e-3,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},criticOpts);
actorOptions = rlRepresentationOptions("LearnRate",1e-4,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},actorOptions);
agentOpts = rlDDPGAgentOptions(...
    "SampleTime",sampleTime,...
    "TargetSmoothFactor",1e-3,...
    "DiscountFactor",0.995, ...
    "MiniBatchSize",128, ...
    "ExperienceBufferLength",1e6); 
agentOpts.NoiseOptions.Variance = 0.1;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
obstacleAvoidanceAgent = rlDDPGAgent(actor,critic,agentOpts);
Training options are:
maxEpisodes = 5000;
maxSteps = ceil(Tfinal/sampleTime);
trainOpts = rlTrainingOptions(...
    "MaxEpisodes",maxEpisodes, ...
    "MaxStepsPerEpisode",maxSteps, ...
    "ScoreAveragingWindowLength",50, ...    "StopTrainingCriteria","AverageReward", ...
    "StopTrainingValue",10000, ...
    "Verbose", true, ...
    "Plots","training-progress");
trainingStats = train(obstacleAvoidanceAgent,env,trainOpts);
and for training, it is not converging as shown in the attached fig:
0 件のコメント
回答 (1 件)
  Matteo D'Ambrosio
      
 2023 年 5 月 28 日
        
      編集済み: Matteo D'Ambrosio
      
 2023 年 5 月 28 日
  
      I'm not too familiar with DDPG as i use other agents, but by looking at your episode reward figure a few things come to mind:
Hope this helps.
0 件のコメント
参考
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!

