RL SAC agent structure
I’ve created an SAC agent, but I'm encountering the error below.
Error using rl.internal.validate.mapFunctionMeanStdOutput (line 10)
Deep neural network for continuous gaussian function must have 2 output layers, one for mean and one for standard deviation.
Error in rlContinuousGaussianActor (line 93)
model = rl.internal.validate.mapFunctionMeanStdOutput(model,nameValueArgs.ActionMeanOutputNames,nameValueArgs.ActionStandardDeviationOutputNames,"actor");
Error in RL_agent_1 (line 158)
actor1 = rlContinuousGaussianActor(actorNetwork1, obsInfo1, actInfo1, ...
I’ve also attached the code for my RL agent. The relevant part is the actor network definitions (actorNetwork1 to actorNetwork3), which each define two layers: one for the mean and one for the standard deviation.
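A layer array written with square brackets is always connected in series, so in the attached script `mean1` feeds directly into `std1` and the resulting network has only one output (`std1`). The mean and standard deviation have to be two separate branches leaving a shared trunk, which requires a layer graph. The following is a minimal sketch of the difference, assuming R2022a or later; `numObs`, `numAct`, the action limits, and the extra layer name `std_fc1` are hypothetical placeholders, not values from the original environment. The softplus layer on the standard-deviation branch is an added assumption to keep that output nonnegative.

```matlab
% Minimal sketch, assuming R2022a or later (Deep Learning Toolbox and
% Reinforcement Learning Toolbox). numObs and numAct are hypothetical sizes.
numObs = 4;   % hypothetical observation dimension
numAct = 1;   % hypothetical action dimension

% A layer array is wired in series: 'mean1' feeds 'std1', so only the
% last layer ends up as a network output.
serialNet = dlnetwork([
    featureInputLayer(numObs, Name="state1")
    fullyConnectedLayer(64, Name="fc1_1")
    reluLayer(Name="relu1_1")
    fullyConnectedLayer(numAct, Name="mean1")
    fullyConnectedLayer(numAct, Name="std1")]);
disp(serialNet.OutputNames)

% Branched version: a shared trunk feeding two separate output heads.
lg = layerGraph([
    featureInputLayer(numObs, Name="state1")
    fullyConnectedLayer(64, Name="fc1_1")
    reluLayer(Name="relu1_1")]);
lg = addLayers(lg, fullyConnectedLayer(numAct, Name="mean1"));
% Softplus keeps the standard deviation output nonnegative
lg = addLayers(lg, [
    fullyConnectedLayer(numAct, Name="std_fc1")
    softplusLayer(Name="std1")]);
lg = connectLayers(lg, "relu1_1", "mean1");
lg = connectLayers(lg, "relu1_1", "std_fc1");
branchedNet = dlnetwork(lg);
disp(branchedNet.OutputNames)

% Hypothetical specs standing in for obsInfo1/actInfo1
obsInfo1 = rlNumericSpec([numObs 1]);
actInfo1 = rlNumericSpec([numAct 1], LowerLimit=-1, UpperLimit=1);
actor1 = rlContinuousGaussianActor(branchedNet, obsInfo1, actInfo1, ...
    ObservationInputNames="state1", ...
    ActionMeanOutputNames="mean1", ...
    ActionStandardDeviationOutputNames="std1");
```

To apply the pattern to the script, each `actorNetworkN` would be built this way with `numObs = obsInfoN.Dimension(1)` and `numAct = actInfoN.Dimension(1)`, so that both heads match the action size.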
% Create environment
env = createOpfEnv();
% Retrieve observation and action specifications
obsInfo = getObservationInfo(env); % Observation info for all agents
actInfo = getActionInfo(env); % Action info for all agents
% Separate the observation and action information for each agent
numAgents = 3; % Example with 3 agents
% Separate observation and action info
obsInfo1 = obsInfo{1}; % Observation info for agent 1
obsInfo2 = obsInfo{2}; % Observation info for agent 2
obsInfo3 = obsInfo{3}; % Observation info for agent 3
actInfo1 = actInfo{1}; % Action info for agent 1
actInfo2 = actInfo{2}; % Action info for agent 2
actInfo3 = actInfo{3}; % Action info for agent 3
%% Define actor networks for each agent
% Define the actor network for Agent 1
actorNetwork1 = [
featureInputLayer(obsInfo1.Dimension(1), 'Normalization', 'none', 'Name', 'state1')
fullyConnectedLayer(64, 'Name', 'fc1_1')
reluLayer('Name', 'relu1_1')
fullyConnectedLayer(64, 'Name', 'fc2_1')
reluLayer('Name', 'relu2_1')
fullyConnectedLayer(64, 'Name', 'fc3_1')
reluLayer('Name', 'relu3_1')
fullyConnectedLayer(1, 'Name', 'mean1') % Output for the mean
fullyConnectedLayer(1, 'Name', 'std1') % Output for the standard deviation
];
% Define the actor network for Agent 2
actorNetwork2 = [
featureInputLayer(obsInfo2.Dimension(1), 'Normalization', 'none', 'Name', 'state2')
fullyConnectedLayer(64, 'Name', 'fc1_2')
reluLayer('Name', 'relu1_2')
fullyConnectedLayer(64, 'Name', 'fc2_2')
reluLayer('Name', 'relu2_2')
fullyConnectedLayer(64, 'Name', 'fc3_2')
reluLayer('Name', 'relu3_2')
fullyConnectedLayer(1, 'Name', 'mean2') % Output for the mean
fullyConnectedLayer(1, 'Name', 'std2') % Output for the standard deviation
];
% Define the actor network for Agent 3
actorNetwork3 = [
featureInputLayer(obsInfo3.Dimension(1), 'Normalization', 'none', 'Name', 'state3')
fullyConnectedLayer(64, 'Name', 'fc1_3')
reluLayer('Name', 'relu1_3')
fullyConnectedLayer(64, 'Name', 'fc2_3')
reluLayer('Name', 'relu2_3')
fullyConnectedLayer(64, 'Name', 'fc3_3')
reluLayer('Name', 'relu3_3')
fullyConnectedLayer(1, 'Name', 'mean3') % Output for the mean
fullyConnectedLayer(1, 'Name', 'std3') % Output for the standard deviation
];
% For each agent, we'll define a critic network that combines the state and action
statePath1 = [
featureInputLayer(obsInfo1.Dimension(1), 'Normalization', 'none', Name="state1")
fullyConnectedLayer(64, Name="state_fc1_1")
reluLayer(Name="state_relu1_1")
];
actionPath1 = [
featureInputLayer(actInfo1.Dimension(1), 'Normalization', 'none', Name="action1")
fullyConnectedLayer(64, Name="action_fc1_1")
reluLayer(Name="action_relu1_1")
];
commonPath1 = [
concatenationLayer(1, 2, Name="concat1")
fullyConnectedLayer(64, Name="common_fc1_1")
reluLayer(Name="common_relu1_1")
fullyConnectedLayer(64, Name="common_fc2_1")
reluLayer(Name="common_relu2_1")
fullyConnectedLayer(1, Name="value1")
];
statePath2 = [
featureInputLayer(obsInfo2.Dimension(1), 'Normalization', 'none', Name="state2")
fullyConnectedLayer(64, Name="state_fc2_2")
reluLayer(Name="state_relu2_2")
];
actionPath2 = [
featureInputLayer(actInfo2.Dimension(1), 'Normalization', 'none', Name="action2")
fullyConnectedLayer(64, Name="action_fc2_2")
reluLayer(Name="action_relu2_2")
];
commonPath2 = [
concatenationLayer(1, 2, Name="concat2")
fullyConnectedLayer(64, Name="common_fc1_2")
reluLayer(Name="common_relu1_2")
fullyConnectedLayer(64, Name="common_fc2_2")
reluLayer(Name="common_relu2_2")
fullyConnectedLayer(1, Name="value2")
];
statePath3 = [
featureInputLayer(obsInfo3.Dimension(1), 'Normalization', 'none', Name="state3")
fullyConnectedLayer(64, Name="state_fc3_3")
reluLayer(Name="state_relu3_3")
];
actionPath3 = [
featureInputLayer(actInfo3.Dimension(1), 'Normalization', 'none', Name="action3")
fullyConnectedLayer(64, Name="action_fc3_3")
reluLayer(Name="action_relu3_3")
];
commonPath3 = [
concatenationLayer(1, 2, Name="concat3")
fullyConnectedLayer(64, Name="common_fc1_3")
reluLayer(Name="common_relu1_3")
fullyConnectedLayer(64, Name="common_fc2_3")
reluLayer(Name="common_relu2_3")
fullyConnectedLayer(1, Name="value3")
];
%% Assemble critic networks for each agent
% Combine state and action paths
criticNetwork1 = layerGraph(statePath1);
criticNetwork1 = addLayers(criticNetwork1, actionPath1);
criticNetwork1 = addLayers(criticNetwork1, commonPath1);
criticNetwork1 = connectLayers(criticNetwork1, 'state_relu1_1', 'concat1/in1');
criticNetwork1 = connectLayers(criticNetwork1, 'action_relu1_1', 'concat1/in2');
criticNetwork2 = layerGraph(statePath2);
criticNetwork2 = addLayers(criticNetwork2, actionPath2);
criticNetwork2 = addLayers(criticNetwork2, commonPath2);
criticNetwork2 = connectLayers(criticNetwork2, 'state_relu2_2', 'concat2/in1');
criticNetwork2 = connectLayers(criticNetwork2, 'action_relu2_2', 'concat2/in2');
criticNetwork3 = layerGraph(statePath3);
criticNetwork3 = addLayers(criticNetwork3, actionPath3);
criticNetwork3 = addLayers(criticNetwork3, commonPath3);
criticNetwork3 = connectLayers(criticNetwork3, 'state_relu3_3', 'concat3/in1');
criticNetwork3 = connectLayers(criticNetwork3, 'action_relu3_3', 'concat3/in2');
%% Set options for the actor and critic
actorOptions = rlRepresentationOptions('Optimizer', 'adam', 'LearnRate', 1e-4, 'GradientThreshold', 1);
criticOptions = rlRepresentationOptions('Optimizer', 'adam', 'LearnRate', 1e-4, 'GradientThreshold', 1);
%% Create actor and critic representations for each agent
% Use continuous actor for each agent (as required by SAC)
actor1 = rlContinuousGaussianActor(actorNetwork1, obsInfo1, actInfo1, ...
'ActionMeanOutputNames', 'mean1', 'ActionStandardDeviationOutputNames', 'std1');
actor2 = rlContinuousGaussianActor(actorNetwork2, obsInfo2, actInfo2, ...
'ActionMeanOutputNames', 'mean2', 'ActionStandardDeviationOutputNames', 'std2');
actor3 = rlContinuousGaussianActor(actorNetwork3, obsInfo3, actInfo3, ...
'ActionMeanOutputNames', 'mean3', 'ActionStandardDeviationOutputNames', 'std3');
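%% Create Q-value critics for each agent
% Tell rlQValueRepresentation which input layers receive the observation and the action
critic1 = rlQValueRepresentation(criticNetwork1, obsInfo1, actInfo1, ...
'Observation', {'state1'}, 'Action', {'action1'}, criticOptions);
critic2 = rlQValueRepresentation(criticNetwork2, obsInfo2, actInfo2, ...
'Observation', {'state2'}, 'Action', {'action2'}, criticOptions);
critic3 = rlQValueRepresentation(criticNetwork3, obsInfo3, actInfo3, ...
'Observation', {'state3'}, 'Action', {'action3'}, criticOptions);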
%% Define the SAC agent for each agent
agentOptions = rlSACAgentOptions('SampleTime', 1, ...
'TargetSmoothFactor', 1e-3, ...
'TargetUpdateFrequency', 1, ...
'ExperienceBufferLength', 1e6);
agent1 = rlSACAgent(actor1, critic1, agentOptions);
agent2 = rlSACAgent(actor2, critic2, agentOptions);
agent3 = rlSACAgent(actor3, critic3, agentOptions);
%% Training options and training process
% The environment is multi-agent, so use multi-agent training options
trainOpts = rlMultiAgentTrainingOptions(...
'MaxEpisodes', 500, ...
'MaxStepsPerEpisode', 100, ...
'ScoreAveragingWindowLength', 100, ...
'Verbose', true, ...
'Plots', 'training-progress');
%% Train the agents
% All three agents act in the same environment, so train them together in one call
trainingStats = train([agent1, agent2, agent3], env, trainOpts);
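Separately from the actor error, the script mixes two generations of the toolbox API: rlContinuousGaussianActor belongs to the function-approximator objects introduced in R2022a, while rlRepresentationOptions and rlQValueRepresentation are the older representation objects, and actorOptions is never passed to anything. The sketch below is a minimal example of building one agent with the newer objects only, assuming R2022a or later; `numObs`, `numAct`, the action limits, and all layer names are hypothetical stand-ins for one agent's specs and networks. It uses two Q-value critics, which rlSACAgent accepts and which is the usual SAC setup, although a single critic is also allowed.

```matlab
% Minimal sketch, assuming R2022a or later. All sizes, limits and layer
% names are hypothetical stand-ins for one agent's specs and networks.
numObs = 4;
numAct = 1;
obsInfo = rlNumericSpec([numObs 1]);
actInfo = rlNumericSpec([numAct 1], LowerLimit=-1, UpperLimit=1);

% Actor: shared trunk with separate mean and standard-deviation heads
actorLG = layerGraph([
    featureInputLayer(numObs, Name="state")
    fullyConnectedLayer(64, Name="fc1")
    reluLayer(Name="relu1")]);
actorLG = addLayers(actorLG, fullyConnectedLayer(numAct, Name="actMean"));
actorLG = addLayers(actorLG, [
    fullyConnectedLayer(numAct, Name="std_fc")
    softplusLayer(Name="actStd")]);   % keeps the std output nonnegative
actorLG = connectLayers(actorLG, "relu1", "actMean");
actorLG = connectLayers(actorLG, "relu1", "std_fc");
actor = rlContinuousGaussianActor(dlnetwork(actorLG), obsInfo, actInfo, ...
    ObservationInputNames="state", ...
    ActionMeanOutputNames="actMean", ...
    ActionStandardDeviationOutputNames="actStd");

% Critic graph: state and action paths joined by a concatenation layer
criticLG = layerGraph([
    featureInputLayer(numObs, Name="state")
    fullyConnectedLayer(64, Name="state_fc")]);
criticLG = addLayers(criticLG, [
    featureInputLayer(numAct, Name="action")
    fullyConnectedLayer(64, Name="action_fc")]);
criticLG = addLayers(criticLG, [
    concatenationLayer(1, 2, Name="concat")
    reluLayer(Name="common_relu1")
    fullyConnectedLayer(64, Name="common_fc")
    reluLayer(Name="common_relu2")
    fullyConnectedLayer(1, Name="value")]);
criticLG = connectLayers(criticLG, "state_fc", "concat/in1");
criticLG = connectLayers(criticLG, "action_fc", "concat/in2");

% Two critics; each dlnetwork call draws its own random initial weights
critic1 = rlQValueFunction(dlnetwork(criticLG), obsInfo, actInfo, ...
    ObservationInputNames="state", ActionInputNames="action");
critic2 = rlQValueFunction(dlnetwork(criticLG), obsInfo, actInfo, ...
    ObservationInputNames="state", ActionInputNames="action");

% Optimizer settings live in the agent options in this API
agentOpts = rlSACAgentOptions(SampleTime=1, TargetSmoothFactor=1e-3, ...
    TargetUpdateFrequency=1, ExperienceBufferLength=1e6);
agentOpts.ActorOptimizerOptions.LearnRate = 1e-4;
agentOpts.ActorOptimizerOptions.GradientThreshold = 1;
for k = 1:numel(agentOpts.CriticOptimizerOptions)
    agentOpts.CriticOptimizerOptions(k).LearnRate = 1e-4;
    agentOpts.CriticOptimizerOptions(k).GradientThreshold = 1;
end

agent = rlSACAgent(actor, [critic1 critic2], agentOpts);

% Quick check: sample an action for a random observation
act = getAction(agent, {rand(numObs, 1)});
disp(act{1})
```

Because learning rates and gradient thresholds are set through ActorOptimizerOptions and CriticOptimizerOptions, no options object is passed to the actor or critic constructors in this style.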
