Saved agent always gives me constant output

sungho park on 17 Jan 2022
Answered: Yash Sharma on 22 Jan 2024
Hi, I'm using reinforcement learning in MATLAB and I've run into an issue.
During the training session I can see that the input value is changing. However, when I run the model with the saved agent after training, the input value does not change the way it did during training.
<training session graph>
<runs with saved agent graph>
%% Create observation specification
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
numObs = obsInfo.Dimension(1);
%% Create action specification
actInfo = rlNumericSpec([1 1],'LowerLimit',-15,'UpperLimit',15);
%actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'current';
numActions = actInfo.Dimension(1);
%% Create the environment
blk= [mdl '/RL Agent'];
env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);
env.ResetFcn= @(in)setVariable(in,'current0',5,'Workspace',mdl);
env.UseFastRestart = 'off';
Ts= param.dt;
Tf= param.end_time;
rng(0)
%% Create DDPG Agent
statePath = [
featureInputLayer(numObs,'Normalization','none','Name','observations')
fullyConnectedLayer(200,'Name','CriticStateFC1')
reluLayer('Name', 'CriticRelu1')
fullyConnectedLayer(200,'Name','CriticStateFC2')];
actionPath = [
featureInputLayer(1,'Normalization','none','Name','action')
fullyConnectedLayer(200,'Name','CriticActionFC1','BiasLearnRateFactor',0)];
commonPath = [
additionLayer(2,'Name','add')
reluLayer('Name','CriticCommonRelu')
fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
figure
plot(criticNetwork)
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
%% Create the critic representation using the specified deep neural
% network and options
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation', ...
{'observations'},'Action',{'action'},criticOpts);
%% Create the actor
actorNetwork = [
featureInputLayer(numObs,'Normalization','none','Name','observations')
fullyConnectedLayer(200,'Name','ActorFC1')
reluLayer('Name','ActorRelu1')
fullyConnectedLayer(200,'Name','ActorFC2')
reluLayer('Name','ActorRelu2')
fullyConnectedLayer(1,'Name','ActorFC3')
tanhLayer('Name','ActorTanh')
scalingLayer('Name','ActorScaling','Scale',max(actInfo.UpperLimit))];
actorOpts = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo, ...
'Observation',{'observations'},'Action',{'ActorScaling'},actorOpts);
%% Create the DDPG agent options
agentOpts = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1,...
'ExperienceBufferLength',1e6,...
'DiscountFactor',0.99,...
'MiniBatchSize',64);
agentOpts.NoiseOptions.Variance = 0.1;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOpts);
%% Train Agent
maxepisodes = 2;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'ScoreAveragingWindowLength',50,...
'Verbose',false,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',500,...
'SaveAgentCriteria','EpisodeReward',...
'SaveAgentValue',0);
doTraining = true;
%if doTraining
% Train the agent.
% trainingStats = train(agent,env,trainOpts);
%else
% Load the pretrained agent for the example.
% load('agent_1000episodes.mat','agent')
%end
trainingStats = train(agent,env,trainOpts);
1 Comment
Florian Rosner on 19 Jan 2022
The behaviour during training might not be the same as in the simulation afterwards, because of the way the network is updated. However, the number of episodes seems pretty low to me. Have you tried training for more episodes?
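One way to compare the two behaviours is to simulate the agent outside of train and look at the action signal directly. As far as the documentation describes it, a DDPG agent adds exploration noise to the actor output during training (the NoiseOptions in the script above), while sim by default uses the actor output without that noise, so a barely trained actor can look lively in training and flat afterwards. The following is only a minimal sketch under stated assumptions: it needs Reinforcement Learning Toolbox R2020b or later, and the predefined double-integrator environment and the default, untrained agent are stand-ins for the Simulink model and the trained agent in this thread.

```matlab
% Assumptions: Reinforcement Learning Toolbox R2020b or later. The predefined
% environment and the default DDPG agent are stand-ins, not the thread's model.
env = rlPredefinedEnv('DoubleIntegrator-Continuous');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
rng(0)
agent = rlDDPGAgent(obsInfo,actInfo);

% Simulate the agent outside of train (one episode of at most 50 steps).
simOpts = rlSimulationOptions('MaxSteps',50);
experience = sim(env,agent,simOpts);

% The Action field holds one timeseries per action channel.
actionNames = fieldnames(experience.Action);
actionTs = experience.Action.(actionNames{1});
actions = squeeze(actionTs.Data);

% A range close to zero means the agent applies an (almost) constant action.
actionRange = max(actions) - min(actions);
fprintf('Action range over %d steps: %g\n',numel(actions),actionRange);
```

With the model from the question you would pass the Simulink environment and the saved agent to sim instead. If the action range stays near zero there as well, that would suggest the actor itself has not learned a varying policy yet, which fits the remark about the low number of episodes.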


Answers (1)

Yash Sharma on 22 Jan 2024
Hi Sungho Park,
I understand that you have a pretrained DDPG agent and want to load it in MATLAB. If the agent you load with the load function does not behave like the trained one, you can transfer the learned network weights to a new agent explicitly.
To load the pretrained network weights into a DDPG agent for use with your Simulink model, you can follow these steps:
  • Save the network weights separately: Before saving the agent to a MAT file, extract the network weights from the actor and critic networks using the getLearnableParameters function and save these network weights to separate variables.
  • Load the network weights and agent configuration: When loading the pretrained agent, use the "load" function to load the network weights and agent configuration from the MAT file. Assign the loaded network weights to the actor and critic networks of a new DDPG agent.
Pretrained_agent_flag = true;
if Pretrained_agent_flag
    % Load the pretrained agent (assumes MyAgent.mat stores it in a variable named agent)
    pretrainedAgentData = load('MyAgent.mat');
    % Extract the network weights from the loaded agent's actor and critic
    actorWeights = getLearnableParameters(getActor(pretrainedAgentData.agent));
    criticWeights = getLearnableParameters(getCritic(pretrainedAgentData.agent));
    % Copy the loaded weights into the actor and critic created earlier in the script
    actor = setLearnableParameters(actor,actorWeights);
    critic = setLearnableParameters(critic,criticWeights);
end
% Create the DDPG agent; actor and critic hold the pretrained weights if they were loaded.
% agentOpts and trainOpts are the option objects from the script in the question.
agent = rlDDPGAgent(actor,critic,agentOpts);
trainingResults = train(agent,env,trainOpts);
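To check whether a loaded agent really reproduces the saved one, you can evaluate both actors on the same observations and compare the results. This is a minimal sketch, not a definitive test. It assumes Reinforcement Learning Toolbox R2020b or later, uses a default DDPG agent as a stand-in for the trained one, and the temporary file and the test observations are made up for illustration.

```matlab
% Assumptions: Reinforcement Learning Toolbox R2020b or later. A default DDPG
% agent stands in for the trained one; file name and observations are illustrative.
obsInfo = rlNumericSpec([3 1]);
actInfo = rlNumericSpec([1 1],'LowerLimit',-15,'UpperLimit',15);
rng(0)
agent = rlDDPGAgent(obsInfo,actInfo);

% Save the agent and load it back, as you would after training.
matFile = [tempname '.mat'];
save(matFile,'agent');
loaded = load(matFile);
delete(matFile);

% Evaluate both actors on the same observations (no exploration noise involved).
testObs = {[0;0;0],[1;-1;0.5],[10;5;-3]};
actOriginal = zeros(1,numel(testObs));
actLoaded = zeros(1,numel(testObs));
for k = 1:numel(testObs)
    a = getAction(getActor(agent),testObs(k));        % returns a cell array
    b = getAction(getActor(loaded.agent),testObs(k));
    actOriginal(k) = a{1};
    actLoaded(k) = b{1};
end

% Largest difference between the two actors, and the spread of the loaded actor's output.
maxDiff = max(abs(actOriginal - actLoaded));
spread = max(actLoaded) - min(actLoaded);
fprintf('Max difference: %g, output spread: %g\n',maxDiff,spread);
```

If the difference is zero but the output spread of your own trained actor is also close to zero, that would suggest the weights were restored correctly and the constant output comes from the learned policy itself (for example a saturated tanh layer), which points back to training rather than to loading.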
Following are documentation links which I believe will help you for further reference:
Hope this helps!
