
generateHindsightExperiences

Generate hindsight experiences from hindsight experience replay buffer

Since R2023a

    Description


    experience = generateHindsightExperiences(buffer,trajectoryLength) generates hindsight experiences from the last trajectory added to the specified hindsight experience replay memory buffer.

    Examples


    When you use a hindsight replay memory buffer within your custom agent training loop, you generate hindsight experiences at the end of each training episode.

    Create an observation specification for an environment with a single observation channel containing six observations. For this example, assume that the observation channel contains the signals [a, xm, ym, xg, yg, c], where:

    • xg and yg are the goal observations.

    • xm and ym are the goal measurements.

    • a and c are additional observations.

    obsInfo = rlNumericSpec([6 1],...
        LowerLimit=0,UpperLimit=[1;5;5;5;5;1]);

    Create a specification for a single action.

    actInfo = rlNumericSpec([1 1],...
        LowerLimit=0,UpperLimit=10);

    To create a hindsight replay memory buffer, first define the goal condition information. Both the goals and the goal measurements are in the single observation channel. The goal measurements are in elements 2 and 3 of the observation channel, and the goals are in elements 4 and 5.

    goalConditionInfo = {{1,[2 3],1,[4 5]}};

    For this example, use hindsightRewardFcn1 as the ground-truth reward function and hindsightIsDoneFcn1 as the termination condition function.
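
    The listings for these supporting functions are not shown here. The following is a minimal sketch of what such functions might look like; it assumes (this is not stated in the example) that both functions receive the observation, action, and next observation as cell arrays with one cell per channel, and it uses an illustrative tolerance of 0.1 when comparing the goal measurements (elements 2 and 3) against the goals (elements 4 and 5).

    function reward = hindsightRewardFcn1(obs,action,nextObs)
        % Sketch only: return a reward of 1 when the goal measurement is
        % within an illustrative tolerance of the goal, and 0 otherwise.
        goalMeasurement = nextObs{1}(2:3);
        goal = nextObs{1}(4:5);
        reward = double(norm(goalMeasurement - goal) < 0.1);
    end

    function isDone = hindsightIsDoneFcn1(obs,action,nextObs)
        % Sketch only: terminate the episode when the goal measurement
        % reaches the goal within the same tolerance.
        goalMeasurement = nextObs{1}(2:3);
        goal = nextObs{1}(4:5);
        isDone = norm(goalMeasurement - goal) < 0.1;
    end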

    Create the hindsight replay memory buffer.

    buffer = rlHindsightReplayMemory(obsInfo,actInfo, ...
        @hindsightRewardFcn1,@hindsightIsDoneFcn1,goalConditionInfo);

    As you train your agent, you add experience trajectories to the experience buffer. For this example, add a random experience trajectory of length 10.

    for i = 1:10
        exp(i).Observation = {obsInfo.UpperLimit.*rand(6,1)};
        exp(i).Action = {actInfo.UpperLimit.*rand(1)};
        exp(i).NextObservation = {obsInfo.UpperLimit.*rand(6,1)};
        exp(i).Reward = 10*rand(1);
        exp(i).IsDone = 0;
    end
    exp(10).IsDone = 1;
    
    append(buffer,exp);

    At the end of the training episode, generate hindsight experiences from the last trajectory added to the buffer by calling generateHindsightExperiences and specifying the trajectory length.

    newExp = generateHindsightExperiences(buffer,10);

    For each experience in the final trajectory, the default "final" sampling strategy generates a new experience in which the goals in Observation and NextObservation are replaced with the goal measurements from the final experience in the trajectory.

    To validate this behavior, first view the final goal measurements from exp.

    exp(10).NextObservation{1}(2:3)
    ans = 2×1

        0.7277
        0.6803

    Next, view the goal values for one of the generated experiences. These values should match the final goal measurements.

    newExp(6).Observation{1}(4:5)
    ans = 2×1

        0.7277
        0.6803

    After generating the new experiences, append them to the buffer.

    append(buffer,newExp);
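
    In a full custom training loop, these steps repeat every episode: append the collected trajectory, generate hindsight experiences from it, and append those as well before updating the agent. The outline below is a sketch only; collectEpisodeTrajectory, env, agent, maxEpisodes, and miniBatchSize are hypothetical placeholders and are not part of this example.

    for episode = 1:maxEpisodes
        % Collect one episode of experiences (hypothetical helper function).
        [trajectory,trajectoryLength] = collectEpisodeTrajectory(env,agent);
        append(buffer,trajectory);

        % Generate hindsight experiences from the trajectory just added
        % and store them alongside the original experiences.
        hindsightExp = generateHindsightExperiences(buffer,trajectoryLength);
        append(buffer,hindsightExp);

        % Update the agent here, for example by sampling minibatches
        % from the buffer with sample(buffer,miniBatchSize).
    end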

    Input Arguments


    Hindsight experience buffer, specified as an rlHindsightReplayMemory or rlHindsightPrioritizedReplayMemory object.

    Length of the last trajectory added to the buffer, specified as a positive integer.

    Output Arguments


    Hindsight experiences generated from the buffer, returned as a structure array with the following fields.

    Observation, returned as a cell array with length equal to the number of observation specifications used to create the buffer. Each element of Observation contains a DO-by-batchSize-by-SequenceLength array, where DO is the dimension of the corresponding observation specification.

    Agent action, returned as a cell array with length equal to the number of action specifications used to create the buffer. Each element of Action contains a DA-by-batchSize-by-SequenceLength array, where DA is the dimension of the corresponding action specification.

    Reward value obtained by taking the specified action from the observation, returned as a 1-by-1-by-SequenceLength array.

    Next observation reached by taking the specified action from the observation, returned as a cell array with the same format as Observation.

    Termination signal, returned as a 1-by-1-by-SequenceLength array of integers. Each element of IsDone has one of the following values.

    • 0 — This experience is not the end of an episode.

    • 1 — The episode terminated because the environment generated a termination signal.

    • 2 — The episode terminated by reaching the maximum episode length.
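
    For example, with the newExp structure array generated earlier, you can inspect individual fields of a generated experience. The field values depend on the random trajectory, so no output is shown.

    newExp(6).Reward                 % reward recomputed by the reward function
    newExp(6).IsDone                 % termination value recomputed by the is-done function
    size(newExp(6).Observation{1})   % dimensions of the observation array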


    Version History

    Introduced in R2023a