I am working on path planning and obstacle avoidance using deep reinforcement learning but training is not converging.

Question

Faraz Ahmad 2022 年 3 月 24 日

0
リンク

この質問への直接リンク

https://jp.mathworks.com/matlabcentral/answers/1679199-i-am-working-on-path-planning-and-obstacle-avoidance-using-deep-reinforcement-learning-but-training

編集済み: Matteo D'Ambrosio 2023 年 5 月 28 日

check.png

Following is the code for creating rl Agent:

criticOpts = rlRepresentationOptions("LearnRate",1e-3,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},criticOpts);
actorOptions = rlRepresentationOptions("LearnRate",1e-4,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},actorOptions);
agentOpts = rlDDPGAgentOptions(...
    "SampleTime",sampleTime,...
    "TargetSmoothFactor",1e-3,...
    "DiscountFactor",0.995, ...
    "MiniBatchSize",128, ...
    "ExperienceBufferLength",1e6); 
agentOpts.NoiseOptions.Variance = 0.1;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
obstacleAvoidanceAgent = rlDDPGAgent(actor,critic,agentOpts);

Training options are:

maxEpisodes = 5000;
maxSteps = ceil(Tfinal/sampleTime);
trainOpts = rlTrainingOptions(...
    "MaxEpisodes",maxEpisodes, ...
    "MaxStepsPerEpisode",maxSteps, ...
    "ScoreAveragingWindowLength",50, ...    "StopTrainingCriteria","AverageReward", ...
    "StopTrainingValue",10000, ...
    "Verbose", true, ...
    "Plots","training-progress");
trainingStats = train(obstacleAvoidanceAgent,env,trainOpts);

and for training, it is not converging as shown in the attached fig:

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

サインインしてコメントする。

サインインしてこの質問に回答する。

Answer 1

Matteo D'Ambrosio 2023 年 5 月 28 日

0
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/1679199-i-am-working-on-path-planning-and-obstacle-avoidance-using-deep-reinforcement-learning-but-training#answer_1246184

編集済み: Matteo D'Ambrosio 2023 年 5 月 28 日

I'm not too familiar with DDPG as i use other agents, but by looking at your episode reward figure a few things come to mind:

Try decreasing the sparsity in your episode reward. You have some episodes with 0 reward and some with 10k reward which can generate some problems with gradients. Maybe add a multiplier to the rewards you are giving so that your high-reward episodes reach a reward of ~10, but play around with it.
Decrease learning rate, which always helps when you start a new RL project. At least until you find a number that works. Maybe try something like 1e-4, 1e-5, 1e-6, i wouldn't go lower.

Hope this helps.

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

サインインしてコメントする。

I am working on path planning and obstacle avoidance using deep reinforcement learning but training is not converging.

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

回答 (1 件)

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

I am working on path planning and obstacle avoidance using deep reinforcement learning but training is not converging.

0 件のコメント -2 件の古いコメントを表示-2 件の古いコメントを非表示

回答 (1 件)

0 件のコメント -2 件の古いコメントを表示-2 件の古いコメントを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示