rlTrainingFromDataOptions
Description
Use an rlTrainingFromDataOptions object to specify options to train an off-policy agent from existing data. Training options include the maximum number of epochs to train, criteria for stopping training, and criteria for saving agents. To train the agent using the specified options, pass this object to trainFromData.
For more information on training agents, see Train Reinforcement Learning Agents.
Creation
Description
tfdOpts = rlTrainingFromDataOptions returns a default options set to train an off-policy agent offline, from existing data.
tfdOpts = rlTrainingFromDataOptions(Name=Value) creates the training option set tfdOpts and sets its properties using one or more name-value arguments.
Properties
MaxEpochs — Maximum number of epochs to train the agent
1000 (default) | positive integer
Maximum number of epochs to train the agent, specified as a positive integer. Each epoch has a fixed number of learning steps specified by NumStepsPerEpoch. Regardless of other criteria for termination, training terminates after MaxEpochs epochs.
Example: MaxEpochs=500
NumStepsPerEpoch — Number of steps to run per epoch
500 (default) | positive integer
Number of steps to run per epoch, specified as a positive integer.
Example: NumStepsPerEpoch=1000
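Because each epoch runs a fixed number of learning steps, MaxEpochs and NumStepsPerEpoch together bound the total number of learning steps. The following is a minimal sketch of that arithmetic. It uses plain variables (not an options object), and the values are the example values from this page.

```matlab
% Upper bound on the number of learning steps implied by the two options.
maxEpochs = 500;           % example value for MaxEpochs
numStepsPerEpoch = 1000;   % example value for NumStepsPerEpoch

% Training can stop earlier if a stop criterion is met,
% so this is a maximum, not the exact step count.
maxLearningSteps = maxEpochs * numStepsPerEpoch
```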
ExperienceBufferUpdateFrequency — Buffer update period
1 (default) | positive integer
Buffer update period, specified as a positive integer. For example, if the value of this option is 1 (default), then the buffer updates every epoch; if it is 2, the buffer updates every other epoch, and so on. Note that the experience buffer is not updated if it already contains all the available data.
Example: ExperienceBufferUpdateFrequency=2
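To make the update period concrete, the sketch below lists the epochs at which a buffer update would fall for a period of 2. It is illustrative only: it assumes that updates land on epochs that are integer multiples of the period, which this page does not state explicitly, and it uses plain variables rather than an options object.

```matlab
% Illustrative only: assumes updates fall on epochs that are
% integer multiples of ExperienceBufferUpdateFrequency.
updateFrequency = 2;   % example value of the option
epochs = 1:8;          % first eight epochs

% Epochs at which the buffer would be updated under this assumption
updateEpochs = epochs(mod(epochs, updateFrequency) == 0)
```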
NumExperiencesPerExperienceBufferUpdate — Number of experiences appended per buffer update
[] (default) | positive integer
Number of experiences appended per buffer update, specified as a positive integer or empty matrix. If the value of this option is left empty (default) then, at training time, it is automatically set to half the length of the experience buffer used by the agent.
Example: NumExperiencesPerExperienceBufferUpdate=5e5
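The default behavior described above can be sketched as follows. The buffer length here is a hypothetical value chosen for illustration; at training time the software uses the length of the agent's actual experience buffer.

```matlab
% Sketch of how an empty option value resolves at training time.
bufferLength = 1e6;   % hypothetical experience buffer length
numPerUpdate = [];    % option left empty (default)

if isempty(numPerUpdate)
    % Default described above: half the experience buffer length
    numPerUpdate = bufferLength / 2;
end
numPerUpdate
```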
QValueObservations — Batch of observations used to compute Q values
[] (default) | cell array
Batch of observations used to compute Q values, specified as a 1-by-N cell array, where N is the number of observation channels. Each cell must contain a batch of observations, along the batch dimension, for the corresponding observation channel. For example, if you have two observation channels carrying a 3-by-1 vector and a scalar, a batch of 10 random observations is {rand(3,1,10),rand(1,1,10)}.
If the value of this option is left empty (default) then, at training time, it is automatically set to a cell array in which each element corresponding to an observation channel is an array of zeros having the same dimensions as the observation, without any batch dimension.
Example: QValueObservations={rand(3,1,10),rand(1,1,10)}
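The sketch below builds the two-channel batch from the example above and checks its layout. It also constructs the value that the description above gives for the empty default, under the assumption that the channels are the same 3-by-1 vector and scalar. The variable names are illustrative.

```matlab
% Two observation channels: a 3-by-1 vector and a scalar.
batchSize = 10;
qObs = {rand(3,1,batchSize), rand(1,1,batchSize)};

% The batch runs along the last (third) dimension of each cell.
sizeChannel1 = size(qObs{1})
sizeChannel2 = size(qObs{2})

% Value described above for the empty default: zeros shaped like
% a single observation, without any batch dimension.
qObsDefault = {zeros(3,1), zeros(1,1)};
```

You can then pass qObs as the value of the QValueObservations option.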
ScoreAveragingWindowLength — Window length for averaging Q-values
5 (default) | positive integer scalar
Window length for averaging Q-values, specified as a positive integer scalar. One termination option and one saving option are expressed in terms of average Q-values. For these options, the average is calculated over the last ScoreAveragingWindowLength epochs.
Example: ScoreAveragingWindowLength=10
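As an illustration of the windowed average, the sketch below averages the most recent entries of a vector of per-epoch Q-values. The Q-values are made up for the example, and averaging over fewer epochs before the window fills is an assumption, since this page does not describe that case.

```matlab
% Windowed average over the most recent epochs.
windowLength = 5;                      % ScoreAveragingWindowLength
qPerEpoch = [10 20 30 40 50 60 70];    % hypothetical per-epoch Q-values

% Assumption: if fewer than windowLength epochs have run,
% average over the epochs available so far.
firstIdx = max(1, numel(qPerEpoch) - windowLength + 1);
avgQ = mean(qPerEpoch(firstIdx:end))
```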
StopTrainingCriteria — Training termination condition
"none" (default) | "QValue" | ...
Training termination condition, specified as one of the following strings:
"none" — Stop training after the agent is trained for the number of epochs specified in MaxEpochs.
"QValue" — Stop training when the average Q-value (computed using the current critic and the observations specified in QValueObservations) over the last ScoreAveragingWindowLength epochs equals or exceeds the value specified in the StopTrainingValue option.
Example: StopTrainingCriteria="QValue"
StopTrainingValue — Critical value of training termination condition
"none" (default) | scalar
Critical value of the training termination condition, specified as a scalar. Training ends when the quantity monitored by the StopTrainingCriteria option equals or exceeds this value. For instance, if StopTrainingCriteria is "QValue" and StopTrainingValue is 50, then training terminates when the moving average Q-value (computed using the current critic and the observations specified in QValueObservations) over the number of epochs specified in ScoreAveragingWindowLength equals or exceeds 50.
Example: StopTrainingValue=50
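The stopping options work together, so it can help to set them in one call. The following is a sketch that uses only the options documented on this page. The observation batch assumes two channels (a 3-by-1 vector and a scalar), and the threshold and window length are arbitrary example values.

```matlab
% Stop when the average Q-value over the last 10 epochs reaches 50.
% The observation batch assumes two channels: 3-by-1 vector and scalar.
tfdOpts = rlTrainingFromDataOptions( ...
    StopTrainingCriteria="QValue", ...
    StopTrainingValue=50, ...
    ScoreAveragingWindowLength=10, ...
    QValueObservations={rand(3,1,10),rand(1,1,10)});
```

Training still ends after MaxEpochs epochs if the threshold is never reached.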
SaveAgentCriteria — Condition for saving agent during training
"none" (default) | "EpochFrequency" | "QValue" | ...
Condition for saving the agent during training, specified as one of the following strings:
"none" — Do not save any agents during training.
"EpochFrequency" — Save the agent when the number of epochs is an integer multiple of the value specified in the SaveAgentValue option.
"QValue" — Save the agent when the average Q-value (computed using the current critic and the observations specified in QValueObservations) over the last ScoreAveragingWindowLength epochs equals or exceeds the value specified in SaveAgentValue.
Set this option to store candidate agents that perform well in terms of Q-value, or just to save the agent at a fixed rate. For instance, if SaveAgentCriteria is "EpochFrequency" and SaveAgentValue is 5, then the agent is saved every five epochs.
Example: SaveAgentCriteria="EpochFrequency"
SaveAgentValue — Critical value of condition for saving agent
"none" (default) | scalar
Critical value of the condition for saving the agent, specified as a scalar.
Example: SaveAgentValue=10
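SaveAgentValue only has meaning in combination with SaveAgentCriteria. The sketch below sets both for a fixed-rate save and then lists the epochs at which a save would be triggered over the first 20 epochs, following the "integer multiple" rule stated for "EpochFrequency". The epoch range is an arbitrary illustration.

```matlab
% Save the agent every 5 epochs.
tfdOpts = rlTrainingFromDataOptions( ...
    SaveAgentCriteria="EpochFrequency", ...
    SaveAgentValue=5);

% Epochs that are integer multiples of SaveAgentValue
epochs = 1:20;
saveEpochs = epochs(mod(epochs, tfdOpts.SaveAgentValue) == 0)
```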
SaveAgentDirectory — Folder name for saved agents
"savedAgents" (default) | string | character vector
Folder name for saved agents, specified as a string or character vector. The folder name can contain a full or relative path. When an epoch occurs in which the conditions specified by the SaveAgentCriteria and SaveAgentValue options are satisfied, the software saves the current agent in a MAT-file in this folder. If the folder does not exist, the training function creates it. When SaveAgentCriteria is "none", this option is ignored and no folder is created.
Example: SaveAgentDirectory = pwd + "\run1\Agents"
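The example above uses a backslash as the path separator, which is specific to Windows. A platform-independent alternative is to build the path with fullfile, as in the sketch below. The folder names run1 and Agents are taken from the example above; the save criteria are arbitrary example values added so that the folder option is not ignored.

```matlab
% Build the folder path with the separator of the current platform.
agentDir = fullfile(pwd, "run1", "Agents");

% SaveAgentDirectory is ignored when SaveAgentCriteria is "none",
% so set a save condition as well.
tfdOpts = rlTrainingFromDataOptions( ...
    SaveAgentCriteria="EpochFrequency", ...
    SaveAgentValue=10, ...
    SaveAgentDirectory=agentDir);
```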
Verbose — Option to display training progress at the command line
false (0) (default) | true (1)
Option to display training progress at the command line, specified as the logical values false (0) or true (1). Set to true to write information from each training epoch to the MATLAB® command line during training.
Example: Verbose=true
Plots — Option to display training progress with Reinforcement Learning Training Monitor
"training-progress" (default) | "none"
Option to display training progress with Reinforcement Learning Training Monitor, specified as "training-progress" or "none". By default, calling train opens Reinforcement Learning Training Monitor, which graphically and numerically displays information about the training progress, such as the reward for each episode, average reward, number of episodes, and total number of steps. For more information, see train. To turn off this display, set this option to "none".
Example: Plots="none"
Object Functions
trainFromData | Train off-policy reinforcement learning agent using existing data |
Examples
Configure Options to Train Agent from Data
Create an options set to train a reinforcement learning agent offline, from an existing dataset.
Set the maximum number of epochs to 2000 and the number of steps per epoch to 1000. Do not set any criteria to stop the training before 2000 epochs. Also, display training progress on the command line instead of using Reinforcement Learning Training Monitor.
tfdOpts = rlTrainingFromDataOptions(...
    MaxEpochs=2000,...
    NumStepsPerEpoch=1000,...
    Verbose=true,...
    Plots="none")

tfdOpts = 
  rlTrainingFromDataOptions with properties:

                                  MaxEpochs: 2000
                           NumStepsPerEpoch: 1000
            ExperienceBufferUpdateFrequency: 1
    NumExperiencesPerExperienceBufferUpdate: []
                         QValueObservations: []
                 ScoreAveragingWindowLength: 5
                       StopTrainingCriteria: "none"
                          StopTrainingValue: "none"
                          SaveAgentCriteria: "none"
                             SaveAgentValue: "none"
                         SaveAgentDirectory: "savedAgents"
                                    Verbose: 1
                                      Plots: "none"
Alternatively, create a default options set and use dot notation to change some of the values.
trainOpts = rlTrainingFromDataOptions;
trainOpts.MaxEpochs = 2000;
trainOpts.NumStepsPerEpoch = 1000;
trainOpts.Verbose = true;
trainOpts.Plots = "training-progress";
trainOpts
trainOpts = 
  rlTrainingFromDataOptions with properties:

                                  MaxEpochs: 2000
                           NumStepsPerEpoch: 1000
            ExperienceBufferUpdateFrequency: 1
    NumExperiencesPerExperienceBufferUpdate: []
                         QValueObservations: []
                 ScoreAveragingWindowLength: 5
                       StopTrainingCriteria: "none"
                          StopTrainingValue: "none"
                          SaveAgentCriteria: "none"
                             SaveAgentValue: "none"
                         SaveAgentDirectory: "savedAgents"
                                    Verbose: 1
                                      Plots: "training-progress"
You can now use trainOpts as an input argument to the trainFromData command.
Version History
Introduced in R2023a