agent.learn data type issue, reinforcement learning toolbox

5 views (last 30 days)
Lars Meijer on 12 Mar 2024
Commented: Lars Meijer on 19 Mar 2024
I am working on a reinforcement learning study. Currently, I am trying to finalize the agent and make it learn from its experiences. I cannot show all of the code, but I think this is the most important part:
%% Define action and observation specifications
ActionInfo = rlFiniteSetSpec([1 2 3]); % Actions that the agent is able to take
ObservationInfo = rlNumericSpec([30 10]); % This will eventually be the input for the neural network
% lots of code here ....
% Defining everything in the experience
CurrentState = env.reset();
action = agent.getAction(CurrentState); % Get action from agent
[nextState, reward, isDone, ~] = env.step(action); % Interact with environment
% Collect experience
experience = struct(...
    'Observation', {num2cell(CurrentState)}, ...
    'Action', {num2cell(action)}, ...
    'Reward', reward, ...
    'NextObservation', {num2cell(nextState)}, ...
    'IsDone', isDone);
% Train the agent with the experience
agent = agent.learn(experience); % Update agent with experience
To elaborate, CurrentState and nextState are 30 x 10 matrices of type double, action is a 1x1 cell, reward is a double, and isDone is a logical. However, when passing these to experience, the agent.learn function does not work because of these parts of the batchExperienceArray.m file (when not passing the variables through num2cell):
% batch observation, next observation
for ct = 1:numel(ObservationDimension)
    BatchDim = numel(ObservationDimension{ct})+1;
    % Observation
    Observation = arrayfun(@(x) (x.Observation{ct}), ExpStructArray, 'UniformOutput', false);
    ObservationArray{ct} = cat(BatchDim, Observation{:});
    % NextObservation
    NextObservation = arrayfun(@(x) (x.NextObservation{ct}), ExpStructArray, 'UniformOutput', false);
    NextObservationArray{ct} = cat(BatchDim, NextObservation{:});
end
Action = [ExpStructArray.Action];
for ct = 1:numel(ActionDimension)
    BatchDim = numel(ActionDimension{ct})+1;
    ActionArray{ct} = cat(BatchDim, Action{ct,:});
end
Here the error is that brace indexing is not supported for the data type. When I do pass all the variables in experience as in the code above, the error becomes:
Error using rl.function.AbstractFunction/validateInputData_
Input data dimensions must match the dimensions specified in the corresponding observation and action info
specifications.
The question thus becomes: how can I correctly pass the data to agent.learn through the experience struct, without all these errors? What am I missing here? If any more information is needed, let me know.
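To make the data-type difference concrete, here is a minimal sketch (separate from the actual script) comparing what num2cell produces with simply wrapping the observation matrix in a single cell:
% Minimal sketch: num2cell versus wrapping the matrix in one cell
CurrentState = rand(30, 10);           % same shape as rlNumericSpec([30 10])

obsScalars = num2cell(CurrentState);   % 30x10 cell array of 1x1 doubles
obsChannel = {CurrentState};           % 1x1 cell array holding the full 30x10 matrix

disp(size(obsScalars))                 % prints 30 10
disp(size(obsChannel))                 % prints 1 1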

Answers (1)

Avadhoot on 19 Mar 2024
From the information provided in the question, I infer that you are having problems with the dimensions of the observation and action matrices in the input data, and that you have implemented batching in your code. The error you are facing is due to a dimension mismatch between the input data and the observation and action info specifications. There might also be an issue with how you pass the experience structure to the "learn" function. You mention that passing the variables without the "num2cell" conversion gives the error "brace indexing is not supported for the data type"; this happens because the batching in the "learn" function expects the inputs to be cell arrays.
According to the MATLAB documentation, there should be buffers to store experiences, and the dimensions of each buffer must be as follows:
  1. For the observation buffer: number of observations * number of observation channels * batch size.
  2. For the action buffer: number of actions * number of action channels * batch size.
  3. For the reward buffer: 1 * batch size.
The source of your error might be that you have not formatted the observations and actions according to the batch size. Consider formatting the buffers in the dimensions mentioned above.
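For a single observation channel and a single action channel, the experience could be laid out like this (a sketch inferred from the batchExperienceArray.m code quoted in the question; since "learn" is not a documented public method, the exact field layout is an assumption):
% Sketch only: one cell per observation/action channel, layout inferred from batchExperienceArray.m
action = agent.getAction(CurrentState);       % getAction returns a cell array (one cell per action channel)
experience = struct( ...
    'Observation',     {{CurrentState}}, ...  % field becomes {CurrentState}, a 1x1 cell holding the 30x10 double
    'Action',          {action}, ...          % field becomes the 1x1 cell returned by getAction
    'Reward',          reward, ...            % scalar double
    'NextObservation', {{nextState}}, ...     % same layout as Observation
    'IsDone',          isDone);               % logical end-of-episode flag
agent.learn(experience);                      % update the agent from this single experience
The doubled braces are needed because struct() treats a cell array value specially and would otherwise expand it into a struct array; the extra layer ensures each field ends up holding a cell array with one element per channel.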
For more information on the training procedure, refer to the example below:
I hope this helps in getting an idea about the cause of the error.
3 Comments
Avadhoot on 19 Mar 2024
Lars Meijer on 19 Mar 2024
I did look at that one as well. However, it also does not use the agent creation from the MATLAB toolbox. I have gone back to basics with the following code:
%% Trying to create the custom training loop from scratch again
clear, clc
%% Create parameters that the environment needs, but should be defined outside of the environment to have a better overview
updateAfter = 24; % Determines after how many time instances (hours in this case) you want to plan the job shop again
JobBatchSize = 10; % Determines in what size of batch the updated jobs will be given (directly influences the size of the inputs of the neural network)
MaxMachines = 20; % Determines the max number of machines (directly influences the size of the inputs as well), which is dependent on the generated data
rng(0, 'twister') % Set rng to produce deterministic results for reproducibility
%% Importing training data
scriptPath = mfilename('fullpath'); % Determines the path of this script file
scriptFolder = fileparts(scriptPath);
folderPath = fullfile(scriptFolder, 'TrainingData'); % This creates a path to the training data
epDataFiles = dir(fullfile(folderPath, '*.txt')); % Determines all the episode data files
numEpisodes = length(epDataFiles); % Determines the number of episodes based on the number of data files
%% Define action and observation specifications
ActionInfo = rlFiniteSetSpec([1 2 3]); % Actions that the agent is able to take
ObservationInfo = rlNumericSpec([(JobBatchSize+MaxMachines) 10]); % This will eventually be the input for the neural network
%% Creating the neural network
% Determine the wanted neurons per layer
Neurons = 64;
% Define the neural Network
qNetwork = [imageInputLayer(ObservationInfo.Dimension, 'Normalization', 'none') % Specify 'Normalization' parameter
    fullyConnectedLayer(Neurons) % Fully connected layer with 64 neurons
    reluLayer % Rectified Linear Unit (ReLU) activation function
    fullyConnectedLayer(numel(ActionInfo.Elements))]; % Output layer
% Convert the network to a dlnetwork object
qNetwork = dlnetwork(qNetwork);
%% Creating DQN Agent
% Create a critic, so that the created neural network is used instead of a
% standard neural network
critic = rlVectorQValueFunction(qNetwork, ObservationInfo, ActionInfo);
agent = rlDQNAgent(critic);
%% Initialize environment
env = DJSPEnvironmentFinal(ObservationInfo, ActionInfo, epDataFiles, folderPath, updateAfter, MaxMachines, JobBatchSize);
%% Training loop
for episode = 1:numEpisodes
    isDone = false;
    currentState = env.reset();
    episodeReward = 0; % Initialize episode-specific reward
    while ~isDone
        % Determine the number of steps taken
        env.StepCount = env.StepCount + 1;
        action = agent.getAction(currentState);
        [nextState, reward, isDone, ~] = env.step(action); % Interact with environment
        % Update episode reward and current state
        episodeReward = episodeReward + reward;
        currentState = nextState;
    end
    % Update the episode number until training is over
    env.CurrentEpisode = env.CurrentEpisode + 1;
end
Now I need to add the training of the agent, and I still do not completely understand how to do that. I think agent.train() would not be useful since I created my own training loop, but I also do not fully understand the agent.learn() function yet. I hope this extra context can help you give me some direction. Thanks again for your reply.
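One possible way to wire a per-step update into the loop above would be something like the following sketch (unverified; it assumes the undocumented agent.learn accepts a single experience struct whose Observation/Action fields hold one cell per channel, as in the batchExperienceArray.m code quoted earlier):
% Sketch: inner while-loop extended with a per-step agent update
while ~isDone
    env.StepCount = env.StepCount + 1;
    action = agent.getAction(currentState);            % cell array, one cell per action channel
    [nextState, reward, isDone, ~] = env.step(action); % Interact with environment

    % Assemble one experience for this transition (field layout is an assumption)
    experience = struct( ...
        'Observation',     {{currentState}}, ... % 1x1 cell holding the (JobBatchSize+MaxMachines) x 10 matrix
        'Action',          {action}, ...         % the cell returned by getAction
        'Reward',          reward, ...
        'NextObservation', {{nextState}}, ...
        'IsDone',          isDone);
    agent.learn(experience);                      % single-step update of the DQN agent

    episodeReward = episodeReward + reward;
    currentState = nextState;
end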
