How can I save an output of a customized step function in Reinforcement learning?

32 views (last 30 days)
Camilla Ancona on 20 Jan 2025 at 16:06
Commented: Camilla Ancona on 30 Jan 2025 at 21:26
I have created a code for training a DQN agent with a custom environment using step and reset functions, following the example in the documentation. However, I would like to store the information about the state inside the step function so that I can investigate it after training and after simulating the agent in the environment. I only know how to get the information about the action and observation, but I would also like the state, which is currently a field of the LoggedSignals structure. I attach the main code, the step function, and the reset function.
clear
clc
close all
load('ws_lorenz','tot_T')
%% Create Environment Interface
% rlNumericSpec([n,1]) specifies that the state variables are n and can
% take any value in R.
obsInfo = rlNumericSpec([1 1]);
obsInfo.Name = 'reactivity';
obsInfo.Description = 'r';
u_1 = [0.1 2];
my_cell = reshape(num2cell(u_1),1,length(u_1));
actInfo = rlFiniteSetSpec(my_cell);
actInfo.Name = 'System Action';
% now we are ready to define the environment.
%doc rlSimulinkEnv Create reinforcement learning environment using dynamic model implemented in Simulink
%doc rlFunctionEnv Specify custom reinforcement learning environment dynamics using functions
env = rlFunctionEnv(obsInfo,actInfo,'my_stepfun','my_resetfun');
% Fix the random generator seed for reproducibility.
rng(0)
%% Create DQN agent
%A DQN agent approximates the long-term reward given observations and
%actions using a critic value function representation.
%To create the critic, first create a deep neural network with the state as
% an input and as many outputs as the different values the control action
% can take (this is the size of the cell). The idea here is to obtain a
% different parametric approximator of the Q-factor for each value of u.
net = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(256)
    reluLayer
    fullyConnectedLayer(length(actInfo.Elements))
    ];
net = dlnetwork(net);
summary(net)
% Plot network
plot(net)
% Specify options for the critic. The LearnRate is key, the higher it is, the
% faster the training but potentially the less accurate the results.
criticOptions = rlOptimizerOptions( ...
    LearnRate=1e-3, ...
    GradientThreshold=1);
%specify the action and observation info for the critic, which you obtain
%from the environment interface.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
% A vector Q-value Function is a neural network allowing to obtain a
% different parametric approximator of the Q-factor for each value of u.
critic = rlVectorQValueFunction(net,obsInfo,actInfo);
%To create the DQN agent, first specify the DQN agent options using rlDQNAgentOptions.
agentOpts = rlDQNAgentOptions(...
    'UseDoubleDQN',true, ...
    'TargetUpdateMethod',"periodic", ...
    'TargetUpdateFrequency',10, ...
    'ExperienceBufferLength',100000, ...
    'DiscountFactor',0.95, ...
    'MiniBatchSize',128, ...
    CriticOptimizerOptions=criticOptions);
agentOpts.EpsilonGreedyExploration.Epsilon = 0.8;
agentOpts.EpsilonGreedyExploration.EpsilonDecay = 1e-3;
agentOpts.EpsilonGreedyExploration.EpsilonMin = 0.1;
%Then, create the DQN agent using the specified critic representation
%and agent options.
agent = rlDQNAgent(critic,agentOpts);
%% Train Agent
%To train the agent, first specify the training options.
%Run one training session containing at most 10000 episodes,
%with each episode lasting at most tot_T time steps.
%Display the training progress in the Episode Manager dialog box
%and disable the command line display (set the Verbose option to false).
%Stop training when the moving average cumulative reward
%exceeds the specified StopTrainingValue.
trainOpts = rlTrainingOptions(...
    'MaxEpisodes', 10000, ... % if the number of steps per episode is increased, this could be decreased.
    'MaxStepsPerEpisode', tot_T, ... % this number of steps per episode might be insufficient in general
    'Verbose', false, ...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',1, ...
    UseParallel=false);
%% Train the agent using the train function.
trainingStats = train(agent,env,trainOpts);
%% Simulate DQN Agent
%To validate the performance of the trained agent, simulate it within the
% environment.
experience = sim(env,agent);
totalReward = sum(experience.Reward)
figure(1)
% Action data has size 1-by-1-by-T; squeeze it to a vector before plotting.
x = squeeze(experience.Action.SystemAction.Data);
plot(x)
title('Actions Over Time');
% Observation data has size 1-by-1-by-(T+1); squeeze it before plotting.
react = squeeze(experience.Observation.reactivity.Data);
figure(2)
plot(react)
title('Reactivity Over Time');
figure(3)
plot(trainingStats.EpisodeIndex, trainingStats.AverageReward);
xlabel('Episode');
ylabel('Average Reward');
function [NextObs,Reward,IsDone,LoggedSignals]...
= my_stepfun(Action,LoggedSignals)
% Custom step function.
%[NextObservation,Reward,IsDone,UpdatedInfo] = myStepFunction(Action,Info)
% This function applies the given action to the environment and evaluates
% the system dynamics for one simulation step.
% Define the environment constants.
% Sample time
Ts = 1;
sig = 1.3;
DF= LoggedSignals.DF ;
L = LoggedSignals.L;
H = LoggedSignals.H;
xi = LoggedSignals.xi;
m = LoggedSignals.m;
n = LoggedSignals.n;
tot_T = LoggedSignals.tot_T;
LoggedSignals.Time = LoggedSignals.Time+Ts;
kk = (1/Ts)*LoggedSignals.Time;
u = Action;
% Unpack the state vector from the logged signals.
x_k = LoggedSignals.State;
% Integrate the dynamics over one sample time using ode113.
[t, x] = ode113(@(t,x)my_lorenz_DQN(t,x,L,u, DF, H),[0 Ts],x_k');
LoggedSignals.State = x(end,:)';
% compute average state
St = [mean(x(end,1:n),2), mean(x(end,n+1:2*n),2), mean(x(end,2*n+1:3*n),2)];
% compute reactivity (using sig)
r = max(eig((DF(St) + DF(St)')/2 +sig*xi*H));
% The next observation is the reactivity
NextObs = r;
% Check early termination condition.
[err, ~, ~] = Err_sync(x, t, n, m, 0);
if LoggedSignals.Time >= 0.9*tot_T
LoggedSignals.cum_err = LoggedSignals.cum_err+err;
end
IsDone1 = LoggedSignals.cum_err>(20*eps);
IsDone2 = err>1e-1;
w1 = 1e5;
w2 = 1e2;
if IsDone1==1
Reward = -(tot_T-LoggedSignals.Time)*1e3;
elseif IsDone2==1
Reward = -(tot_T-LoggedSignals.Time)*1e4;
else
Reward = 1 -w1*err - w2*u;
end
IsDone = max(IsDone1,IsDone2) ;
end
function [InitialObservation, LoggedSignal] = my_resetfun()
load('reset_ws.mat','x0')
load('ws_lorenz','DF','L','H','xi','n','m','tot_T')
x = x0(:,randi(size(x0,2)));
LoggedSignal.State = x;
InitialObservation = 1; %% to be changed
LoggedSignal.Time = 0;
LoggedSignal.DF = DF;
LoggedSignal.L = L;
LoggedSignal.H = H;
LoggedSignal.xi = xi;
LoggedSignal.m = m;
LoggedSignal.n = n;
LoggedSignal.cum_err = 0;
LoggedSignal.tot_T = tot_T;
end

Accepted Answer

Shantanu Dixit on 23 Jan 2025 at 10:21
Edited: 23 Jan 2025 at 10:22
Hi Camilla,
Based on your requirements, you can store the states by creating a 'StateHistory' field in the 'LoggedSignals' structure within the reset function ('my_resetfun'). You can then append the new state to this field in the step function ('my_stepfun') to save the state information. Here's how you can modify your existing code:
  1. Add the 'StateHistory' field in 'my_resetfun'
  2. Modify 'my_stepfun' to store the state history.
%% reset function
function [InitialObservation, LoggedSignals] = my_resetfun()
% keep other code as required
LoggedSignals.StateHistory = []; % Initialize state history
% save the StateHistory field as required
end
%% step function
function [NextObs, Reward, IsDone, LoggedSignals] = my_stepfun(Action, LoggedSignals)
% keep other code as required
LoggedSignals.State = x(end, :)'; % Update the current state
LoggedSignals.StateHistory = [LoggedSignals.StateHistory; LoggedSignals.State']; % Append state
end
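Once training or simulation has finished, you can read the accumulated history back from the environment object. The following is a minimal sketch, assuming your release still exposes the LoggedSignals property on the rlFunctionEnv object (please verify this against the rlFunctionEnv reference page for your version) and that 'StateHistory' is maintained as shown above. Note that the reset function starts a fresh history, so this contains only the most recent episode:
% Minimal sketch: retrieve the states logged during the last episode
% (assumes env is the rlFunctionEnv created in the main script and that
% my_resetfun/my_stepfun maintain the StateHistory field shown above).
stateHistory = env.LoggedSignals.StateHistory;   % one row per simulation step
save('state_history.mat','stateHistory');        % keep it for offline analysis

figure
plot(stateHistory)
xlabel('Step')
ylabel('State components')
title('Logged states from the last episode')
If you need the history across all training episodes rather than just the last one, you could instead append each episode's history to a MAT-file or a base-workspace variable from inside the step/reset functions.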
As per the documentation (https://www.mathworks.com/help/reinforcement-learning/ref/rl.env.rlfunctionenv.html#mw_d59c4ab8-0102-448b-863a-4abf2ade15b5), the 'LoggedSignals' argument of the 'step' function is no longer recommended from R2023b onward and is replaced by the 'Info' property for passing information between steps. For a later release you can adapt the code accordingly.
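For reference, on R2023b and later the step and reset functions take and return a generic Info value instead of LoggedSignals, and the bookkeeping is the same. The sketch below uses hypothetical function names (my_stepfun_info, my_resetfun_info) and placeholder dynamics just to illustrate the signatures; it is not your Lorenz model:
% Sketch of the R2023b+ signatures: Info replaces LoggedSignals as the
% value passed between steps. The StateHistory bookkeeping is unchanged.
function [NextObs,Reward,IsDone,Info] = my_stepfun_info(Action,Info)
    Info.State = Info.State + Action;                       % placeholder update, not the Lorenz dynamics
    Info.StateHistory = [Info.StateHistory; Info.State'];   % append one row per step
    NextObs = Info.State;
    Reward = -norm(Info.State);
    IsDone = false;
end

function [InitialObservation,Info] = my_resetfun_info()
    Info.State = zeros(3,1);        % placeholder initial state
    Info.StateHistory = [];         % start a fresh history each episode
    InitialObservation = Info.State;
end
After a simulation, the accumulated structure should then be available from the environment's Info property (again, check the rlFunctionEnv reference page for your release).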
Additionally, you can refer to the MathWorks documentation on creating custom environments using 'step' and 'reset' functions for further details.
Hope this helps!
1 Comment
Camilla Ancona on 30 Jan 2025 at 21:26
Thanks for the answer. However, in my understanding, this only allows me to save the history inside the step function during training. I would like to analyze the LoggedSignals after the training is finished. How can I do that?
