PPO algorithm training problem in Reinforcement Learning Toolbox

The PPO training algorithm description states: “For each experience sequence that does not contain a terminal state, N is equal to the ExperienceHorizon option value. Otherwise, N is less than ExperienceHorizon and S_N is the terminal state.”
My first question: suppose N is smaller than ExperienceHorizon and also smaller than the mini-batch size (MiniBatchSize), and this happens for several consecutive episodes. When does the algorithm update the parameters in that case?
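To make the rule quoted above concrete, here is a minimal sketch in plain MATLAB (no toolbox calls) that splits a single episode into experience sequences and compares each sequence length N with MiniBatchSize. It assumes each episode ends in a terminal state after the given number of steps; the episode lengths 30 and 25000 are hypothetical values chosen to show both cases of the rule. It only illustrates how N is determined and does not model when or how the toolbox performs the parameter update.

```matlab
% Hypothetical sketch of the quoted rule for the experience sequence length N.
% Assumption: each episode ends in a terminal state after L steps.
experienceHorizon = 10000;          % ExperienceHorizon
miniBatchSize     = 64;             % MiniBatchSize
episodeLengths    = [30, 25000];    % hypothetical episode lengths, in steps

sequenceLengths = cell(size(episodeLengths));
for k = 1:numel(episodeLengths)
    L = episodeLengths(k);
    % Sequences without a terminal state have N = ExperienceHorizon
    N = repmat(experienceHorizon, 1, floor(L / experienceHorizon));
    % Leftover steps form one shorter sequence that ends in the terminal state
    r = mod(L, experienceHorizon);
    if r > 0
        N(end+1) = r; %#ok<AGROW>
    end
    sequenceLengths{k} = N;
    fprintf('Episode of %d steps -> N = %s, N < MiniBatchSize: %s\n', ...
        L, mat2str(N), mat2str(N < miniBatchSize));
end
```

With a 30-step episode the only sequence has N = 30, below both ExperienceHorizon and MiniBatchSize, which is exactly the situation the question asks about.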
My second question: when are the PPO parameters updated with the following settings?
% PPO agent options
agentOpts = rlPPOAgentOptions( ...
    'ExperienceHorizon',10000, ...
    'MiniBatchSize',64, ...
    'NumEpoch',3);
% Training options
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',10000, ...
    'MaxStepsPerEpisode',30);
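Applying the same rule to these settings, the sketch below (plain MATLAB, no toolbox calls) works out the sequence length N. It assumes that every episode runs the full MaxStepsPerEpisode steps, that the end of an episode (terminal state or step limit) closes the current sequence, and that there is one learning phase per closed sequence; these are assumptions about the toolbox, not confirmed behavior. Under them, ExperienceHorizon is never reached and every sequence is shorter than MiniBatchSize.

```matlab
% Settings from the question
experienceHorizon  = 10000;
miniBatchSize      = 64;
numEpoch           = 3;
maxEpisodes        = 10000;
maxStepsPerEpisode = 30;

% Assumption: each episode runs maxStepsPerEpisode steps (the longest case here)
% and its end closes the current experience sequence.
sequencesPerEpisode = ceil(maxStepsPerEpisode / experienceHorizon);
N = min(maxStepsPerEpisode, experienceHorizon);
% Assumption: one learning phase per closed sequence
totalSequences = maxEpisodes * sequencesPerEpisode;

fprintf('Sequences per episode: %d, N per sequence: %d\n', sequencesPerEpisode, N);
fprintf('Learning phases over %d episodes: %d (NumEpoch = %d each)\n', ...
    maxEpisodes, totalSequences, numEpoch);
fprintf('ExperienceHorizon ever reached: %d\n', maxStepsPerEpisode >= experienceHorizon);
fprintf('N smaller than MiniBatchSize: %d\n', N < miniBatchSize);
```

If training stops early (for example through a stopping criterion), fewer than MaxEpisodes episodes run and the count of learning phases drops accordingly.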