Hello, I'm currently researching the use of reinforcement learning as a controller to handle the non-linearities in hydraulic systems. I'm facing a problem during training: my RL agent isn't learning, or it settles on a very low reward, and I really don't understand its behaviour. I increased exploration and got the same result. I was initially using a DDQN agent and hit the same problem there too. I'm so lost.
criticOptions = rlOptimizerOptions( ...
Optimizer="adam", ...
LearnRate=1e-5,...
GradientThreshold=1, ...
L2RegularizationFactor=2e-4);
actorOptions = rlOptimizerOptions( ...
Optimizer="adam", ...
LearnRate=1e-5,...
GradientThreshold=1, ...
L2RegularizationFactor=1e-5);
agentOptions = rlTD3AgentOptions;
agentOptions.ExplorationModel.StandardDeviation = 0.5;
agentOptions.ExplorationModel.StandardDeviationDecayRate = 1e-4;
agentOptions.DiscountFactor = 0.99;
agentOptions.TargetSmoothFactor = 5e-3;
agentOptions.TargetPolicySmoothModel.Variance = 0.2;
agentOptions.TargetUpdateFrequency = 10;
agentOptions.CriticOptimizerOptions = criticOptions;
agentOptions.ActorOptimizerOptions = actorOptions;
agent = rlTD3Agent(actor,[critic1 critic2],agentOptions);
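One thing I would sanity-check here (my own back-of-the-envelope, assuming the documented multiplicative per-step decay of the Gaussian exploration noise; the numbers below are not from the original post):

```matlab
% Sketch (assumption): the exploration standard deviation decays once per
% agent step as std <- std*(1 - StandardDeviationDecayRate), floored at
% StandardDeviationMin. With std0 = 0.5 and decay = 1e-4, after k steps
% the std is roughly 0.5*(1 - 1e-4)^k.
k = 10000;
sigma_k = 0.5 * (1 - 1e-4)^k;   % about 0.18 after 10k steps
fprintf("exploration std after %d steps: %.3f\n", k, sigma_k);
```

Depending on Ts and hence the total number of steps in 400 episodes, this decay rate can leave the noise near 0.5 for most of training (pure dithering) or shrink it long before the policy is useful, so it is worth computing for your actual step count.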
trainOpts = rlTrainingOptions(...
'MaxEpisodes', 400, ...
'MaxStepsPerEpisode', ceil(Tf / Ts), ...
'StopTrainingCriteria', 'EpisodeReward', ...
'StopTrainingValue', 2000, ...
'Verbose', true, ...
'Plots', 'training-progress', ...
'SaveAgentCriteria', 'Custom', ...
'SaveAgentValue', @mySaveFcn, ...
'SaveAgentDirectory', "SavedAgents");
[trainingStats] = train(agent, env, trainOpts);
That's the code for the agent and the training setup.
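For diagnosing "not learning vs. learned a bad policy", one thing I'd look at after training is the statistics returned by `train` (field names here are an assumption based on the usual training-stats output, so double-check them in your release):

```matlab
% Hypothetical diagnostic: compare the critic's initial-state value
% estimate (EpisodeQ0) against the reward actually collected. If Q0
% stays far above/below EpisodeReward, the critics are not tracking
% the returns, which points at learning-rate or reward-scale issues.
plot(trainingStats.EpisodeIndex, trainingStats.EpisodeReward); hold on
plot(trainingStats.EpisodeIndex, trainingStats.EpisodeQ0)
legend("EpisodeReward", "EpisodeQ0"); xlabel("Episode")
```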
function y = fcn(u)
% Staircase reward on the absolute tracking error u
u = abs(u);
if (u <= 0.005)        % very close to the reference
    y = 10;
elseif (u <= 0.05)
    y = 5;
elseif (u <= 0.5)
    y = 1;
else                   % large error
    y = -1;
end
end
And this is the reward function.
I've increased the number of episodes, and it didn't change a thing.
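A side note on the reward function above: it is flat almost everywhere, so between thresholds the agent gets no signal that it is improving. A denser alternative worth trying (this is my own sketch, not the original design; it keeps u as the absolute tracking error and the constants are illustrative) is:

```matlab
function y = fcn(u)
% Dense shaped reward (sketch): decays smoothly with |u|, so every
% reduction in tracking error changes the reward. Peaks near 10 at
% zero error; small constant penalty keeps large errors negative.
u = abs(u);
y = 10*exp(-u/0.05) - 0.5;
end
```

With a shaped reward like this, the stop-training value of 2000 would also need rescaling, since the achievable episode reward changes.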