
generatePolicyBlock

Generate Simulink block that evaluates policy of an agent or policy object

    Description

    This function generates a Simulink® Policy evaluation block from an agent or policy object. It also creates a data file which stores policy information. The generated policy block loads this data file to properly initialize itself prior to simulation. You can use the block to simulate the policy and generate code for deployment purposes.

    For more information on policies and value functions, see Create Policies and Value Functions.


    generatePolicyBlock(agent) creates a block that evaluates the policy of the specified agent using the default block name, policy name, and data file name.


    generatePolicyBlock(policy) creates a block that evaluates the learned policy of the specified policy object using the default block name, policy name, and data file name.

    generatePolicyBlock(___,MATFileName=dataFileName) specifies the file name of the data file.
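    For example, the following call (a sketch that assumes a trained agent named agent exists in the workspace) generates the block and stores the policy data in a file named myAgentData.mat instead of the default blockAgentData.mat.

    generatePolicyBlock(agent,MATFileName="myAgentData.mat")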

    Examples


    First, create and train a reinforcement learning agent. For this example, load the PG agent trained in Train PG Agent to Balance Cart-Pole System.

    load("MATLABCartpolePG.mat","agent")

    Then, create a policy evaluation block from this agent using default names.

    generatePolicyBlock(agent);

    This command creates an untitled Simulink® model containing the policy block, along with the blockAgentData.mat file, which contains the information needed to create and initialize the policy block (such as the trained deep neural network used by the actor within the agent). The block loads this data file to properly initialize itself prior to simulation.

    You can now drag the block into a Simulink® model and connect it so that it takes the observation from the environment as input and returns the calculated action to the environment. Connecting the block in this way lets you simulate the policy in a closed loop. You can then generate code for deployment purposes. For more information, see Deploy Trained Reinforcement Learning Policies.
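    If you prefer to add the block programmatically rather than by dragging, the following sketch copies it into a new model. The block name "Policy" and the destination model name "myControlModel" are assumptions for illustration; check the generated untitled model for the actual block name.

    % Create and open a destination model, then copy the generated block into it.
    new_system("myControlModel");
    open_system("myControlModel");
    add_block("untitled/Policy","myControlModel/Policy");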

    Close the model.

    bdclose("untitled")

    Create observation and action specification objects. For this example, define the observation and action spaces as continuous four- and two-dimensional spaces, respectively.

    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlNumericSpec([2 1]);

    Alternatively, use getObservationInfo and getActionInfo to extract the specification objects from an environment.
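    For reference, a brief sketch of this alternative, using a predefined double-integrator environment (its observation and action dimensions differ from the 4-by-1 and 2-by-1 spaces defined above).

    % Extract specification objects from an existing environment.
    env = rlPredefinedEnv("DoubleIntegrator-Continuous");
    obsInfoEnv = getObservationInfo(env);
    actInfoEnv = getActionInfo(env);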

    Create a continuous deterministic actor. This actor must accept an observation as input and return an action as output.

    To approximate the policy function within the actor, use a recurrent deep neural network model. Define the network as an array of layer objects, and get the dimension of the observation and action spaces from the environment specification objects. To create a recurrent network, use a sequenceInputLayer as the input layer (with size equal to the number of dimensions of the observation channel) and include at least one lstmLayer.

    layers = [
        sequenceInputLayer(obsInfo.Dimension(1))
        fullyConnectedLayer(10)
        reluLayer
        lstmLayer(8,OutputMode="sequence")
        fullyConnectedLayer(20)
        fullyConnectedLayer(actInfo.Dimension(1))
        tanhLayer
        ];

    Convert the network to a dlnetwork object and display the number of learnable parameters.

    model = dlnetwork(layers);
    summary(model)
       Initialized: true
    
       Number of learnables: 880
    
       Inputs:
          1   'sequenceinput'   Sequence input with 4 dimensions
    

    Create the actor using model, and the observation and action specifications.

    actor = rlContinuousDeterministicActor(model,obsInfo,actInfo)
    actor = 
      rlContinuousDeterministicActor with properties:
    
        ObservationInfo: [1x1 rl.util.rlNumericSpec]
             ActionInfo: [1x1 rl.util.rlNumericSpec]
          Normalization: "none"
              UseDevice: "cpu"
             Learnables: {9x1 cell}
                  State: {2x1 cell}
    
    

    Check the actor with a random observation input.

    act = getAction(actor,{rand(obsInfo.Dimension)});
    act{1}
    ans = 2x1 single column vector
    
       -0.0742
        0.0158
    
    

    Create a policy object from actor.

    policy = rlDeterministicActorPolicy(actor)
    policy = 
      rlDeterministicActorPolicy with properties:
    
                  Actor: [1x1 rl.function.rlContinuousDeterministicActor]
          Normalization: "none"
        ObservationInfo: [1x1 rl.util.rlNumericSpec]
             ActionInfo: [1x1 rl.util.rlNumericSpec]
             SampleTime: -1
    
    

    You can access the policy options using dot notation. Check the policy with a random observation input.

    act = getAction(policy,{rand(obsInfo.Dimension)});
    act{1}
    ans = 2×1
    
       -0.0060
       -0.0161
    
    
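    For example, a minimal sketch of setting the sample time through dot notation (the 0.05 second value is an arbitrary choice for illustration).

    policy.SampleTime = 0.05;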

    You can train the policy with a custom training loop.

    Then, create a policy evaluation block from this policy object using the default name for the generated MAT-file.

    generatePolicyBlock(policy);

    This command creates an untitled Simulink® model containing the policy block, along with the blockAgentData.mat file, which contains the information needed to create and initialize the policy block (such as the trained deep neural network used by the actor within the policy object). The block loads this data file to properly initialize itself prior to simulation.

    You can now drag the block into a Simulink® model and connect it so that it takes the observation from the environment as input and returns the calculated action to the environment. Connecting the block in this way lets you simulate the policy in a closed loop. You can then generate code for deployment purposes. For more information, see Deploy Trained Reinforcement Learning Policies.

    Close the model.

    bdclose("untitled")

    Input Arguments


    agent — Trained reinforcement learning agent

    Trained reinforcement learning agent, specified as one of the following agent objects. To train your agent, use the train function.

    For agents with a stochastic actor (PG, PPO, SAC, TRPO, AC), the action returned by the generated policy block depends on the value of the UseExplorationPolicy property of the agent. By default, UseExplorationPolicy is false and the generated action is deterministic. If UseExplorationPolicy is true, the generated action is stochastic.
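    For example, a minimal sketch of enabling exploration before generating the block (assuming agent is one of the listed agents with a stochastic actor).

    % Make the generated block return stochastic (exploration) actions.
    agent.UseExplorationPolicy = true;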

    policy — Reinforcement learning policy

    Reinforcement learning policy, specified as one of the following objects:

    Note

    rlAdditiveNoisePolicy and rlEpsilonGreedyPolicy policy objects are not supported.

    dataFileName — Name of generated data file

    Name of generated data file, specified as a string or character vector. If a file with the specified name already exists in the current MATLAB® folder, then an appropriate digit is added to the name so that no existing file is overwritten.

    The generated data file contains four structures that store data needed to fully characterize the policy. Prior to simulation, the block (which is generated with the data file name as a mask parameter) loads this data file to properly initialize itself.
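    For example, a sketch of listing the variables stored in the generated data file, assuming the default file name blockAgentData.mat was used.

    % List the contents of the generated data file without loading it.
    whos("-file","blockAgentData.mat")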

    Version History

    Introduced in R2019a