Reinforcement learning action getting saturated at one range of values

9 views (last 30 days)
Hi all,
I have a reinforcement learning environment with 4 observations and 6 actions. Each action has a lower limit of 0.05 and an upper limit of 1. I see that the actions during training get saturated within one narrow band of values.
Example: the specified action limits are 0.05 to 1, but the action output during training only varies in the range 0 to 0.16 and never leaves that band.
I have attached a capture of the action output during training.
The code is attached below:
clc;
clear;
close;
% Load the parameters for the Simulink model
SPWM_RL_Data;
% Open the Simulink model
mdl = "RL_Debug";
open_system(mdl);
% Open the subsystem that contains the RL Agent block
open_system('RL_Debug/Firing Unit');
% Create observation specifications
numObservations = 4;
observationInfo = rlNumericSpec([numObservations 1]);
observationInfo.Name = 'observations';
observationInfo.Description = 'Error signals';
% Create action specifications
numActions = 6;
actionInfo = rlNumericSpec([numActions 1], ...
    'LowerLimit',[0.05;0.05;0.05;0.05;0.05;0.05], ...
    'UpperLimit',[1;1;1;1;1;1]);
actionInfo.Name = 'switchingPulses';
% Create the Simulink environment from the observation and action specifications
agentblk = 'RL_Debug/Firing Unit/RL Agent';
env = rlSimulinkEnv(mdl,agentblk,observationInfo,actionInfo);
% Obtain the observation and action specifications from the environment
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
rng(0) % fix the random seed
statePath = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','fc1')];
actionPath = [featureInputLayer(numActions,'Normalization','none','Name','Action')
    fullyConnectedLayer(64,'Name','fc2')];
commonPath = [additionLayer(2,'Name','add')
    reluLayer('Name','relu2')
    fullyConnectedLayer(32,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(16,'Name','fc4')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'fc1','add/in1');
criticNetwork = connectLayers(criticNetwork,'fc2','add/in2');
% Create the two critic representations used by the TD3 agent
criticOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo, ...
    'Observation',{'State'},'Action',{'Action'},criticOptions);
critic2 = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo, ...
    'Observation',{'State'},'Action',{'Action'},criticOptions);
actorNetwork = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(32,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','Action')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale','Scale',actionInfo.UpperLimit)];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,'L2RegularizationFactor',0.001);
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo, ...
    'Observation',{'State'},'Action',{'scale'},actorOptions);
%Ts_agent = Ts;
agentOptions = rlTD3AgentOptions("SampleTime",Ts_agent, ...
    "DiscountFactor",0.995, ...
    "ExperienceBufferLength",2e6, ...
    "MiniBatchSize",512, ...
    "NumStepsToLookAhead",5, ...
    "TargetSmoothFactor",0.005, ...
    "TargetUpdateFrequency",2);
agentOptions.ExplorationModel.Variance = 0.05;
agentOptions.ExplorationModel.VarianceDecayRate = 2e-4;
agentOptions.ExplorationModel.VarianceMin = 0.001;
agentOptions.TargetPolicySmoothModel.Variance = 0.1;
agentOptions.TargetPolicySmoothModel.VarianceDecayRate = 1e-4;
agent = rlTD3Agent(actor,[critic1,critic2],agentOptions);
%T = 1.0;
maxepisodes = 10000;
maxsteps = ceil(Tf/Ts_agent);
trainingOpts = rlTrainingOptions( ...
    'MaxEpisodes',maxepisodes, ...
    'MaxStepsPerEpisode',maxsteps, ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',8000, ...
    'ScoreAveragingWindowLength',100);
if doTraining
    trainStats = train(agent,env,trainingOpts);
    save("Agent.mat","agent");
else
    load("Agent.mat");
end
%Simulate the Agent
rng(0);
simOptions = rlSimulationOptions('MaxSteps', maxsteps, 'NumSimulations', 1);
sim(env,agent,simOptions);
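
In case it helps with reproducing the issue, one quick check (a minimal sketch, assuming the agent and observation dimensions defined above) is to query the policy directly for a few random observations and look at the range of the returned actions:

% Query the current policy for a few random observations to inspect the action range
rng(1);
for k = 1:5
    act = getAction(agent,{randn(numObservations,1)});
    disp(act{1}.')   % 1-by-6 row of actions produced by the actor
end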

Accepted Answer

Emmanouil Tzorakoleftherakis on 15 Apr 2021
Edited: Emmanouil Tzorakoleftherakis on 20 Jun 2023
Your scaling layer is not set up correctly. You want to scale by (UpperLimit - LowerLimit)/2 and shift by (UpperLimit + LowerLimit)/2 so that the tanh output range of [-1, 1] maps onto [LowerLimit, UpperLimit]:
scalingLayer('Scale',(actionInfo.UpperLimit-actionInfo.LowerLimit)/2,'Bias',(actionInfo.UpperLimit+actionInfo.LowerLimit)/2)
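
For reference, a minimal sketch of how the corrected layer fits into the actor network from the question (the numeric values in the comments assume the question's limits of 0.05 and 1):

% Corrected actor output path: tanh output in [-1, 1] is linearly mapped to [LowerLimit, UpperLimit]
actorNetwork = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(32,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','Action')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale', ...
        'Scale',(actionInfo.UpperLimit-actionInfo.LowerLimit)/2, ...  % 0.475 for limits [0.05, 1]
        'Bias',(actionInfo.UpperLimit+actionInfo.LowerLimit)/2)];     % 0.525 for limits [0.05, 1]

As a quick check, with Scale = 0.475 and Bias = 0.525 a tanh output of -1 maps to 0.475*(-1) + 0.525 = 0.05 and an output of +1 maps to 0.475 + 0.525 = 1, matching the specified action limits.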

More Answers (0)

Release: R2021a
