My agent isn't learning; it settles on a low reward

22 views (last 30 days)
Kareem on 21 Jan 2025 at 11:24
Edited: Shantanu Dixit on 23 Jan 2025 at 5:58
Hello, I'm currently researching the use of reinforcement learning as a controller to handle non-linearities in hydraulic systems. I'm facing a problem during training: my RL agent isn't learning, or it settles on a very low reward. I really don't understand its behaviour. I increased exploration and faced the same problem, and I initially used a DDQN agent with the same result. I'm so lost.
criticOptions = rlOptimizerOptions( ...
    Optimizer="adam", ...
    LearnRate=1e-5, ...
    GradientThreshold=1, ...
    L2RegularizationFactor=2e-4);
actorOptions = rlOptimizerOptions( ...
    Optimizer="adam", ...
    LearnRate=1e-5, ...
    GradientThreshold=1, ...
    L2RegularizationFactor=1e-5);
agentOptions = rlTD3AgentOptions;
agentOptions.ExplorationModel.StandardDeviation = 0.5;
agentOptions.ExplorationModel.StandardDeviationDecayRate = 1e-4;
agentOptions.DiscountFactor = 0.99;
agentOptions.TargetSmoothFactor = 5e-3;
agentOptions.TargetPolicySmoothModel.Variance = 0.2;
agentOptions.TargetUpdateFrequency = 10;
agentOptions.CriticOptimizerOptions = criticOptions;
agentOptions.ActorOptimizerOptions = actorOptions;
agent = rlTD3Agent(actor, [critic1 critic2], agentOptions);
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 400, ...
    'MaxStepsPerEpisode', ceil(Tf / Ts), ...
    'StopTrainingCriteria', 'EpisodeReward', ...
    'StopTrainingValue', 2000, ...
    'Verbose', true, ...
    'Plots', 'training-progress', ...
    'SaveAgentCriteria', 'Custom', ...
    'SaveAgentValue', @mySaveFcn, ...
    'SaveAgentDirectory', "SavedAgents");
trainingStats = train(agent, env, trainOpts);
Here's the code for the agent and training.
function y = fcn(u)
% Piecewise reward: larger reward for smaller absolute error u
u = abs(u);
if u <= 0.005
    y = 10;      % within tight tolerance
elseif u <= 0.05
    y = 5;
elseif u <= 0.5
    y = 1;
else
    y = -1;      % penalize large error
end
end
And this is the reward function. I've also increased the number of episodes; it didn't change a thing.

Answers (1)

Shantanu Dixit on 23 Jan 2025 at 5:48
Edited: Shantanu Dixit on 23 Jan 2025 at 5:58
Hi Kareem,
It seems like you're having trouble training your reinforcement learning agent. Based on the details shared, here are a few things you can try:
  • Current reward function behaviour: the discrete, step-like reward in the code provided may make it hard for the agent to learn, because small improvements in tracking error produce no change in reward within each band. Experiment with a smoother, continuous reward function, and with the reward range (currently -1 to 10), so the agent sees a clear gradient of improvement.
  • Exploration parameters: you can also experiment with 'agentOptions.ExplorationModel.StandardDeviation' and 'agentOptions.ExplorationModel.StandardDeviationDecayRate' (see https://in.mathworks.com/help/reinforcement-learning/ref/rl.option.rltd3agentoptions.html). Increasing 'StandardDeviation' and decreasing 'StandardDeviationDecayRate' allows the agent to explore more, and for a longer duration.
  • Learning rate: try increasing the learning rates for both the actor and critic optimizers (e.g., 1e-4 or 5e-4) and monitor training stability.
  • Training dynamics: verify the environment's dynamics and scaling. For example, ensure the simulation time step and total simulation time give the agent enough granularity to learn fine actions. Additionally, normalizing the state and action spaces to zero mean and unit variance can improve training efficiency and stability.
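As one illustration of the first bullet, a smoother reward could look like the sketch below. The 0.05 length scale and the factor of 10 are assumptions chosen only to roughly match the band edges of your current step reward, not tuned values:

```matlab
function y = fcn(u)
% Continuous reward sketch: decays exponentially with absolute error u,
% so nearby errors get nearby rewards and every small improvement in
% tracking error is visible to the agent.
u = abs(u);
y = 10*exp(-u/0.05) - 0.1*u;   % ~10 near u = 0, slightly negative for large u
end
```

With this shape, u = 0.005 earns about 9, u = 0.05 about 3.7, and large errors go slightly negative, broadly preserving the ordering of your original bands while removing the flat plateaus.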
You can also refer to other useful MathWorks resources on reinforcement learning in MATLAB.
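Putting the exploration and learning-rate suggestions together, one possible revision of the agent options might look like this. All values here are starting points for experimentation, not a known fix:

```matlab
% Raise learning rates and slow exploration decay (experimental values)
criticOptions = rlOptimizerOptions( ...
    Optimizer="adam", ...
    LearnRate=1e-4, ...              % raised from 1e-5
    GradientThreshold=1, ...
    L2RegularizationFactor=2e-4);
actorOptions = rlOptimizerOptions( ...
    Optimizer="adam", ...
    LearnRate=1e-4, ...              % raised from 1e-5
    GradientThreshold=1, ...
    L2RegularizationFactor=1e-5);
agentOptions = rlTD3AgentOptions;
agentOptions.ExplorationModel.StandardDeviation = 0.8;           % explore more
agentOptions.ExplorationModel.StandardDeviationDecayRate = 1e-5; % decay slower
agentOptions.CriticOptimizerOptions = criticOptions;
agentOptions.ActorOptimizerOptions = actorOptions;
```

Change one group of settings at a time and compare the episode-reward curves, so you can tell which adjustment actually helps.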
