allExperiences
Description
experiences = allExperiences(buffer) returns all experiences stored in experience buffer buffer as individual experiences, each with a batch size of 1 and a sequence length of 1.
experience = allExperiences(buffer,ConcatenateMode=mode) returns experiences concatenated along the dimension specified by mode. You can concatenate experiences along the batch dimension or the sequence dimension.
Examples
Extract All Experiences from Replay Memory Buffer
Define observation specifications for the environment. For this example, assume that the environment has two observation channels: one channel with two continuous observations and one channel with a three-valued discrete observation.
obsContinuous = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[1;5]);
obsDiscrete = rlFiniteSetSpec([1 2 3]);
obsInfo = [obsContinuous obsDiscrete];
Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous actions in a specified range.
actInfo = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[5;10]);
Create an experience buffer with a maximum length of 5,000.
buffer = rlReplayMemory(obsInfo,actInfo,5000);
Append a sequence of 10 random experiences to the buffer.
for i = 1:10
    experience(i).Observation = ...
        {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
    experience(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    experience(i).NextObservation = ...
        {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
    experience(i).Reward = 10*rand(1);
    experience(i).IsDone = 0;
end
append(buffer,experience);
After appending experiences to the buffer, you can extract all of them. Extract the experiences as individual experiences, each with a batch size of 1 and a sequence length of 1.
experience = allExperiences(buffer)
experience = 10×1 struct array with fields:
    Observation
    Action
    NextObservation
    Reward
    IsDone
Alternatively, you can extract all of the experiences as a single experience batch.
expBatch = allExperiences(buffer,ConcatenateMode="batch")
expBatch = struct with fields:
Observation: {[2x1x10 double] [1x1x10 double]}
Action: {[2x1x10 double]}
Reward: [9.5751 9.1574 7.4313 8.2346 1.8687 1.6261 5.0596 2.5428 3.5166 5.6782]
NextObservation: {[2x1x10 double] [1x1x10 double]}
IsDone: [0 0 0 0 0 0 0 0 0 0]
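The same function also accepts ConcatenateMode="sequence". The following is a minimal, self-contained sketch of that call, assuming Reinforcement Learning Toolbox is installed. It uses a simplified single-channel observation specification rather than the two-channel one above, and the variable names seqExperience and expSeq are illustrative only. The sizes are printed instead of stated, since the exact array layout is best checked on your own release.

```matlab
% Sketch only: requires Reinforcement Learning Toolbox (R2022b or later).
% A simplified buffer with one observation channel and one action channel.
obsInfoSeq = rlNumericSpec([2 1]);
actInfoSeq = rlNumericSpec([2 1]);
seqBuffer = rlReplayMemory(obsInfoSeq,actInfoSeq,5000);

% Append 10 random experiences (illustrative data).
for i = 1:10
    seqExperience(i).Observation = {rand(2,1)};
    seqExperience(i).Action = {rand(2,1)};
    seqExperience(i).NextObservation = {rand(2,1)};
    seqExperience(i).Reward = 10*rand(1);
    seqExperience(i).IsDone = 0;
end
append(seqBuffer,seqExperience);

% Extract everything as one sequence with a batch size of 1.
expSeq = allExperiences(seqBuffer,ConcatenateMode="sequence");

% Inspect the returned sizes rather than assuming them.
disp(size(expSeq.Observation{1}))
disp(size(expSeq.Reward))
```

Comparing these sizes with the "batch" output above shows which array dimension the sequence index occupies.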
Input Arguments
buffer — Experience buffer
rlReplayMemory object | rlPrioritizedReplayMemory object | rlHindsightReplayMemory object | rlHindsightPrioritizedReplayMemory object
Experience buffer, specified as one of the following replay memory objects: rlReplayMemory, rlPrioritizedReplayMemory, rlHindsightReplayMemory, or rlHindsightPrioritizedReplayMemory.
mode — Concatenation mode
"none" (default) | "batch" | "sequence"
Concatenation mode, specified as one of the following values.
"none" — Return experience as N individual experiences, each with a batch size of 1 and a sequence length of 1.
"batch" — Return experience as a single batch with a sequence length of 1.
"sequence" — Return experience as a single sequence with a batch size of 1.
Output Arguments
experience — All buffered experiences
structure array | structure
All N buffered experiences, returned as a structure array or structure. When mode is:
"none", experience is returned as a structure array of length N, where each element contains one buffered experience (batchSize = 1 and SequenceLength = 1).
"batch", experience is returned as a structure. Each field of experience contains all buffered experiences concatenated along the batch dimension (batchSize = N and SequenceLength = 1).
"sequence", experience is returned as a structure. Each field of experience contains all buffered experiences concatenated along the sequence dimension (batchSize = 1 and SequenceLength = N).
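To make the difference between the "none" and "batch" layouts concrete, the following sketch builds a hypothetical structure array by hand and stacks its fields with plain MATLAB. It does not call allExperiences and is not a description of how the function is implemented. It only reproduces the array shapes shown in the batch example output (2x1x10 observations and 1-by-10 rewards). All variable names are illustrative.

```matlab
% Hypothetical stand-in for the "none" output: N individual experiences,
% each with a single 2-by-1 observation channel.
N = 10;
for i = 1:N
    expArray(i).Observation = {rand(2,1)};
    expArray(i).Reward = 10*rand(1);
    expArray(i).IsDone = 0;
end

% Collect the first observation channel of every experience in a cell array.
obsCells = arrayfun(@(e) e.Observation{1},expArray,UniformOutput=false);

% Stack the 2-by-1 observations along a trailing dimension: 2-by-1-by-N.
obsBatch = cat(3,obsCells{:});

% Scalars concatenate into 1-by-N row vectors.
rewardBatch = [expArray.Reward];
isDoneBatch = [expArray.IsDone];

disp(size(obsBatch))
disp(size(rewardBatch))
```

In practice, requesting ConcatenateMode="batch" directly avoids this manual stacking.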
experience contains the following fields.
Observation — Observation
cell array
Observation, returned as a cell array with length equal to the number of observation specifications specified when creating the buffer. Each element of Observation contains a DO-by-batchSize-by-SequenceLength array, where DO is the dimension of the corresponding observation specification.
Action — Agent action
cell array
Agent action, returned as a cell array with length equal to the number of action specifications specified when creating the buffer. Each element of Action contains a DA-by-batchSize-by-SequenceLength array, where DA is the dimension of the corresponding action specification.
Reward — Reward value
scalar | array
Reward value obtained by taking the specified action from the observation, returned as a 1-by-1-by-SequenceLength array.
NextObservation — Next observation
cell array
Next observation reached by taking the specified action from the observation, returned as a cell array with the same format as Observation.
IsDone — Termination signal
integer | array
Termination signal, returned as a 1-by-1-by-SequenceLength array of integers. Each element of IsDone has one of the following values.
0 — This experience is not the end of an episode.
1 — The episode terminated because the environment generated a termination signal.
2 — The episode terminated by reaching the maximum episode length.
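As a small illustration of how these three values might be used, the following sketch counts episode endings in a hypothetical IsDone vector. The vector is made-up data, not output from a real buffer, and the variable names are illustrative.

```matlab
% Hypothetical IsDone values, such as might appear in a concatenated experience.
isDone = [0 0 1 0 0 0 2 0 0 1];

% Episodes that ended because the environment signaled termination.
numTerminated = nnz(isDone == 1);

% Episodes that ended by reaching the maximum episode length.
numTruncated = nnz(isDone == 2);

% Positions of all experiences that end an episode, for either reason.
episodeEnds = find(isDone ~= 0);
```

Separating the two nonzero values is useful when a termination signal and a length cutoff should be treated differently in later processing.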
Version History
Introduced in R2022b