Reinforcement learning action getting saturated at one range of values

9 views (last 30 days)
Hi all,
I have a reinforcement learning environment with 4 observations and 6 actions. Each action has a lower limit of 0.05 and an upper limit of 1. I see that the actions during training get saturated within one narrow band of values.
Example: the specified action limits are 0.05 to 1, but the action output during training only varies in the range 0 to 0.16 and never leaves that band.
I have attached a capture of the action output during training.
The code is attached below:
clc;
clear;
close;
% Load the parameters for the Simulink model
SPWM_RL_Data;
% Open the Simulink model
mdl = "RL_Debug";
open_system(mdl);
% Open the subsystem that contains the RL Agent block
open_system('RL_Debug/Firing Unit');
% Create observation specifications
numObservations = 4;
observationInfo = rlNumericSpec([numObservations 1]);
observationInfo.Name = 'observations';
observationInfo.Description = 'Error signals';
% Create action specifications
numActions = 6;
actionInfo = rlNumericSpec([numActions 1], ...
    'LowerLimit',[0.05;0.05;0.05;0.05;0.05;0.05], ...
    'UpperLimit',[1;1;1;1;1;1]);
actionInfo.Name = 'switchingPulses';
% Create the Simulink environment from the observation and action specifications
agentblk = 'RL_Debug/Firing Unit/RL Agent';
env = rlSimulinkEnv(mdl,agentblk,observationInfo,actionInfo);
% Obtain the observation and action specifications from the environment
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
rng(0) % fix the random seed
statePath = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','fc1')];
actionPath = [featureInputLayer(numActions,'Normalization','none','Name','Action')
    fullyConnectedLayer(64,'Name','fc2')];
commonPath = [additionLayer(2,'Name','add')
    reluLayer('Name','relu2')
    fullyConnectedLayer(32,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(16,'Name','fc4')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'fc1','add/in1');
criticNetwork = connectLayers(criticNetwork,'fc2','add/in2');
% Create the two critic representations used by the TD3 agent
criticOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo, ...
    'Observation',{'State'},'Action',{'Action'},criticOptions);
critic2 = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo, ...
    'Observation',{'State'},'Action',{'Action'},criticOptions);
actorNetwork = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(32,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','Action')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale','Scale',actionInfo.UpperLimit)];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,'L2RegularizationFactor',0.001);
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo, ...
    'Observation',{'State'},'Action',{'scale'},actorOptions);
%Ts_agent = Ts;
agentOptions = rlTD3AgentOptions("SampleTime",Ts_agent, ...
    "DiscountFactor",0.995, ...
    "ExperienceBufferLength",2e6, ...
    "MiniBatchSize",512, ...
    "NumStepsToLookAhead",5, ...
    "TargetSmoothFactor",0.005, ...
    "TargetUpdateFrequency",2);
agentOptions.ExplorationModel.Variance = 0.05;
agentOptions.ExplorationModel.VarianceDecayRate = 2e-4;
agentOptions.ExplorationModel.VarianceMin = 0.001;
agentOptions.TargetPolicySmoothModel.Variance = 0.1;
agentOptions.TargetPolicySmoothModel.VarianceDecayRate = 1e-4;
agent = rlTD3Agent(actor,[critic1,critic2],agentOptions);
%T = 1.0;
maxepisodes = 10000;
maxsteps = ceil(Tf/Ts_agent);
trainingOpts = rlTrainingOptions( ...
    'MaxEpisodes',maxepisodes, ...
    'MaxStepsPerEpisode',maxsteps, ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',8000, ...
    'ScoreAveragingWindowLength',100);
if doTraining
    trainStats = train(agent,env,trainingOpts);
    save("Agent.mat","agent");
else
    load("Agent.mat");
end
%Simulate the Agent
rng(0);
simOptions = rlSimulationOptions('MaxSteps', maxsteps, 'NumSimulations', 1);
sim(env,agent,simOptions);
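
In case it helps with reproducing the issue, one quick check (a minimal sketch, assuming the agent and observation dimensions defined above) is to query the policy directly for a few random observations and look at the range of the returned actions:

% Query the current policy for a few random observations to inspect the action range
rng(1);
for k = 1:5
    act = getAction(agent,{randn(numObservations,1)});
    disp(act{1}.')   % 1-by-6 row of actions produced by the actor
end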

Accepted Answer

Emmanouil Tzorakoleftherakis on 15 Apr 2021
Edited: Emmanouil Tzorakoleftherakis on 20 Jun 2023
Your scaling layer is not set up correctly. You want to scale by (UpperLimit - LowerLimit)/2 and shift by (UpperLimit + LowerLimit)/2 so that the tanh output range of [-1, 1] maps onto [LowerLimit, UpperLimit]:
scalingLayer('Scale',(actionInfo.UpperLimit-actionInfo.LowerLimit)/2,'Bias',(actionInfo.UpperLimit+actionInfo.LowerLimit)/2)
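
For reference, a minimal sketch of how the corrected layer fits into the actor network from the question (the numeric values in the comments assume the question's limits of 0.05 and 1):

% Corrected actor output path: tanh output in [-1, 1] is linearly mapped to [LowerLimit, UpperLimit]
actorNetwork = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(32,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','Action')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale', ...
        'Scale',(actionInfo.UpperLimit-actionInfo.LowerLimit)/2, ...  % 0.475 for limits [0.05, 1]
        'Bias',(actionInfo.UpperLimit+actionInfo.LowerLimit)/2)];     % 0.525 for limits [0.05, 1]

As a quick check, with Scale = 0.475 and Bias = 0.525 a tanh output of -1 maps to 0.475*(-1) + 0.525 = 0.05 and an output of +1 maps to 0.475 + 0.525 = 1, matching the specified action limits.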

More Answers (0)

Release: R2021a
