rlContinuousDeterministicActor

Deterministic actor with a continuous action space for reinforcement learning agents

Description

This object implements a function approximator to be used as a deterministic actor within a reinforcement learning agent with a continuous action space. A continuous deterministic actor takes an environment state as input and returns as output the action that maximizes the expected discounted cumulative long-term reward, thereby implementing a deterministic policy. After you create an rlContinuousDeterministicActor object, use it to create a suitable agent, such as rlDDPGAgent. For more information on creating representations, see Create Policies and Value Functions.

Creation

Description

actor = rlContinuousDeterministicActor(net,observationInfo,actionInfo) creates a continuous deterministic actor object using the deep neural network net as underlying approximator. For this actor, actionInfo must specify a continuous action space. The network input layers are automatically associated with the environment observation channels according to the dimension specifications in observationInfo. The network must have a single output layer with the same data type and dimensions as the action specified in actionInfo. This function sets the ObservationInfo and ActionInfo properties of actor to the observationInfo and actionInfo input arguments, respectively.

Note

actor does not enforce constraints set by the action specification; therefore, when using this actor, you must enforce action space constraints within the environment.
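
One common way to satisfy this requirement is to saturate the action inside the environment step function before applying it to the system. The following is a minimal, hypothetical sketch of this pattern; the step-function name, limits, dynamics, and reward are placeholders and are not prescribed by this page.

function [nextObs,reward,isDone,loggedSignals] = myStepFcn(action,loggedSignals)
% Hypothetical step function that enforces the action bounds itself.
% In practice, take the limits from the LowerLimit and UpperLimit
% properties of your rlNumericSpec action specification.
lowerLimit = -1;
upperLimit =  1;

% Saturate the unconstrained actor output to the allowed range.
action = min(max(action,lowerLimit),upperLimit);

% Placeholder dynamics and reward, for illustration only.
nextObs = rand(4,1);
reward = -norm(action);
isDone = false;
end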

actor = rlContinuousDeterministicActor(net,observationInfo,actionInfo,ObservationInputNames=netObsNames) specifies the names of the network input layers to be associated with the environment observation channels. The function assigns, in sequential order, each environment observation channel specified in observationInfo to the layer specified by the corresponding name in the string array netObsNames. Therefore, the network input layers, ordered as the names in netObsNames, must have the same data type and dimensions as the observation specifications, as ordered in observationInfo.

actor = rlContinuousDeterministicActor({basisFcn,W0},observationInfo,actionInfo) creates a continuous deterministic actor object using a custom basis function as underlying approximator. The first input argument is a two-element cell array whose first element is the handle basisFcn to a custom basis function and whose second element is the initial weight vector W0. This function sets the ObservationInfo and ActionInfo properties of actor to the observationInfo and actionInfo input arguments, respectively.

actor = rlContinuousDeterministicActor(___,UseDevice=useDevice) specifies the device used to perform computational operations on the actor object, and sets the UseDevice property of actor to the useDevice input argument. You can use this syntax with any of the previous input-argument combinations.
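
For instance, assuming net, observationInfo, and actionInfo are already defined as described above and that a supported GPU is available, a sketch of this syntax might look as follows.

% Sketch: create the actor and perform its computations on a GPU.
% Requires Parallel Computing Toolbox and a supported NVIDIA GPU.
actor = rlContinuousDeterministicActor(net,observationInfo,actionInfo, ...
    UseDevice="gpu");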

Input Arguments

net

Deep neural network used as the underlying approximator within the actor, specified as a dlnetwork object or another Deep Learning Toolbox™ neural network object.

Note

Among the different network representation options, dlnetwork is preferred, since it has built-in validation checks and supports automatic differentiation. If you pass another network object as an input argument, it is converted internally to a dlnetwork object. However, best practice is to convert other representations to dlnetwork explicitly before using them to create a critic or an actor for a reinforcement learning agent. You can do so using dlnet = dlnetwork(net), where net is any Deep Learning Toolbox™ neural network object. The resulting dlnet is the dlnetwork object that you use for your critic or actor. This practice allows a greater level of insight and control for cases in which the conversion is not straightforward and might require additional specifications.

The network must have the environment observation channels as inputs and a single output layer representing the action.

rlContinuousDeterministicActor objects support recurrent deep neural networks. For an example, see Create Deterministic Actor from Recurrent Neural Network.

The learnable parameters of the actor are the weights of the deep neural network. For a list of deep neural network layers, see List of Deep Learning Layers. For more information on creating deep neural networks for reinforcement learning, see Create Policies and Value Functions.
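
For example, you can read and overwrite these weights with the getLearnableParameters and setLearnableParameters functions listed under Object Functions below. The following sketch assumes an existing actor object; the 10% scaling is purely illustrative.

% Sketch: get the actor parameters, modify them, and set them back.
params = getLearnableParameters(actor);

% Scale every learnable array by 10% (illustrative only).
params = cellfun(@(p) 1.1*p,params,UniformOutput=false);

% Return a new actor that uses the modified parameters.
actor = setLearnableParameters(actor,params);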

netObsNames

Network input layer names corresponding to the environment observation channels, specified as a string array. When you use the ObservationInputNames=netObsNames name-value argument, the function assigns, in sequential order, each environment observation channel specified in observationInfo to the network input layer specified by the corresponding name in netObsNames. Therefore, the network input layers, ordered as the names in netObsNames, must have the same data type and dimensions as the observation specifications, as ordered in observationInfo.

Note

Of the information specified in observationInfo, the function only uses the data type and dimension of each channel, but not its (optional) name or description.

Example: {"NetInput1_airspeed","NetInput2_altitude"}
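
For instance, for an environment with two observation channels, a call using this argument might look like the following sketch. Here the network net, its input layer names, and the two-element specification array observationInfo are assumptions; the names must match the input layers of your own network, in the same order as the channels in observationInfo.

% Sketch: associate two observation channels with two named network inputs.
actor = rlContinuousDeterministicActor(net,observationInfo,actionInfo, ...
    ObservationInputNames=["NetInput1_airspeed","NetInput2_altitude"]);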

basisFcn

Custom basis function, specified as a function handle to a user-defined MATLAB function. The user-defined function can either be an anonymous function or a function on the MATLAB path. The action to take based on the current observation, which is the output of the actor, is the vector a = W'*B, where W is a weight matrix containing the learnable parameters and B is the column vector returned by the custom basis function.

Your basis function must have the following signature.

B = myBasisFunction(obs1,obs2,...,obsN)

Here, obs1 to obsN are inputs in the same order and with the same data type and dimensions as the environment observation channels defined in observationInfo.

Example: @(obs1,obs2,obs3) [obs3(2)*obs1(1)^2; abs(obs2(5)+obs3(1))]

W0

Initial value of the basis function weights W, specified as a matrix having as many rows as the length of the vector returned by the basis function and as many columns as the dimension of the action space.

Properties

ObservationInfo

Observation specifications, specified as an rlFiniteSetSpec or rlNumericSpec object or an array of such objects. These objects define properties such as the dimensions, data types, and names of the observation signals.

rlContinuousDeterministicActor sets the ObservationInfo property of actor to the input observationInfo.

You can extract ObservationInfo from an existing environment or agent using getObservationInfo. You can also construct the specifications manually.

ActionInfo

Action specifications for a continuous action space, specified as an rlNumericSpec object defining properties such as the dimensions, data type, and name of the action signals.

rlContinuousDeterministicActor sets the ActionInfo property of actor to the input actionInfo.

You can extract ActionInfo from an existing environment or agent using getActionInfo. You can also construct the specification manually.

For custom basis function representations, the action signal must be a scalar or a column vector.

UseDevice

Computation device used to perform operations such as gradient computation, parameter updates, and prediction during training and simulation, specified as either "cpu" or "gpu".

The "gpu" option requires both Parallel Computing Toolbox™ software and a CUDA® enabled NVIDIA® GPU. For more information on supported GPUs, see GPU Computing Requirements (Parallel Computing Toolbox).

You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.

Note

Training or simulating an agent on a GPU involves device-specific numerical round-off errors. These errors can produce different results compared to performing the same operations on a CPU.

You do not use this property to speed up training with parallel processing over multiple cores. Instead, when training your agent, use an rlTrainingOptions object in which the UseParallel option is set to true. For more information about training using multicore processors and GPUs, see Train Agents Using Parallel Computing and GPUs.
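
For example, a sketch of enabling parallel training for an agent built from this actor might look as follows; the agent and env variables are assumed to exist already.

% Sketch: enable parallel training through the training options,
% rather than through the UseDevice property.
trainOpts = rlTrainingOptions(UseParallel=true);
trainingStats = train(agent,env,trainOpts);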

Example: "gpu"

Object Functions

rlDDPGAgent - Deep deterministic policy gradient (DDPG) reinforcement learning agent
rlTD3Agent - Twin-delayed deep deterministic policy gradient reinforcement learning agent
getAction - Obtain action from agent, actor, or policy object given environment observations
evaluate - Evaluate function approximator object given observation (or observation-action) input data
gradient - Evaluate gradient of function approximator object given observation and action input data
accelerate - Option to accelerate computation of gradient for approximator object based on neural network
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object
setModel - Set function approximation model for actor or critic
getModel - Get function approximator model from actor or critic

Examples

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing four doubles.

obsInfo = rlNumericSpec([4 1]);

Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous two-dimensional space, so that a single action is a column vector containing two doubles.

actInfo = rlNumericSpec([2 1]);

To approximate the policy within the actor, use a deep neural network. The input of the network must accept a four-element vector (the observation vector just defined by obsInfo), and its output must be a two-element vector (the action), as defined by actInfo.

Create a neural network as an array of layer objects.

net = [featureInputLayer(4)
       fullyConnectedLayer(2)];

Convert the network to a dlnetwork object and display the number of learnable parameters.

net = dlnetwork(net);
summary(net)
   Initialized: true

   Number of learnables: 10

   Inputs:
      1   'input'   4 features

Create the actor object with rlContinuousDeterministicActor, using the network and the observation and action specification objects as input arguments. The network input layer is automatically associated with the environment observation channel according to the dimension specifications in obsInfo.

actor = rlContinuousDeterministicActor( ...
    net, ...
    obsInfo, ...
    actInfo)
actor = 
  rlContinuousDeterministicActor with properties:

    ObservationInfo: [1x1 rl.util.rlNumericSpec]
         ActionInfo: [1x1 rl.util.rlNumericSpec]
          UseDevice: "cpu"

To check your actor, use getAction to return the action from a random observation, using the current network weights.

act = getAction(actor, ...
    {rand(obsInfo.Dimension)}); 
act{1}
ans = 2x1 single column vector

   -0.5054
    1.5390

You can now use the actor to create a suitable agent (such as rlDDPGAgent or rlTD3Agent).
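
For instance, the following sketch pairs this actor with a Q-value critic to build a DDPG agent. The critic network architecture and layer names are illustrative choices, not prescribed by this page; any critic compatible with obsInfo and actInfo works.

% Sketch: build a hypothetical two-input Q-value critic network.
obsPath = [featureInputLayer(obsInfo.Dimension(1),Name="obsIn")
           fullyConnectedLayer(16,Name="fcObs")];
actPath = [featureInputLayer(actInfo.Dimension(1),Name="actIn")
           fullyConnectedLayer(16,Name="fcAct")];
comPath = [additionLayer(2,Name="add")
           reluLayer(Name="relu")
           fullyConnectedLayer(1,Name="qValue")];

cNet = layerGraph(obsPath);
cNet = addLayers(cNet,actPath);
cNet = addLayers(cNet,comPath);
cNet = connectLayers(cNet,"fcObs","add/in1");
cNet = connectLayers(cNet,"fcAct","add/in2");
cNet = dlnetwork(cNet);

% Create the critic and the agent from the actor-critic pair.
critic = rlQValueFunction(cNet,obsInfo,actInfo, ...
    ObservationInputNames="obsIn",ActionInputNames="actIn");
agent = rlDDPGAgent(actor,critic);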

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing four doubles.

obsInfo = rlNumericSpec([4 1]);

Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous two-dimensional space, so that a single action is a column vector containing two doubles.

actInfo = rlNumericSpec([2 1]);

To approximate the policy within the actor, use a deep neural network. The input of the network must accept a four-element vector (the observation vector just defined by obsInfo), and its output must be a two-element vector (the action), as defined by actInfo.

Create the network as an array of layer objects. Name the network input layer netObsIn so you can later explicitly associate it to the observation input channel.

net = [
    featureInputLayer(4,Name="netObsIn")
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(2)];

Convert the network to a dlnetwork object, and display the number of learnable parameters.

net = dlnetwork(net);
summary(net)
   Initialized: true

   Number of learnables: 114

   Inputs:
      1   'netObsIn'   4 features

Create the actor object with rlContinuousDeterministicActor, using the network, the observation and action specification objects, and the name of the network input layer to be associated with the environment observation channel.

actor = rlContinuousDeterministicActor(net, ...
            obsInfo,actInfo, ...
            ObservationInputNames="netObsIn")
actor = 
  rlContinuousDeterministicActor with properties:

    ObservationInfo: [1x1 rl.util.rlNumericSpec]
         ActionInfo: [1x1 rl.util.rlNumericSpec]
          UseDevice: "cpu"

To check your actor, use getAction to return the action from a random observation, using the current network weights.

act = getAction(actor,{rand(obsInfo.Dimension)}); 
act{1}
ans = 2x1 single column vector

    0.4013
    0.0578

You can now use the actor to create a suitable agent (such as rlDDPGAgent or rlTD3Agent).

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as consisting of two channels: the first carries a two-by-two continuous matrix, and the second carries a scalar that can assume only two values, 0 and 1.

obsInfo = [rlNumericSpec([2 2]) 
           rlFiniteSetSpec([0 1])];

Create a continuous action space specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous three-dimensional space, so that a single action is a column vector containing three doubles.

actInfo = rlNumericSpec([3 1]);

Create a custom basis function with two input arguments in which each output element is a function of the observations defined by obsInfo.

myBasisFcn = @(obsA,obsB) [obsA(1,1)+obsB(1)^2;
                           obsA(2,1)-obsB(1)^2;
                           obsA(1,2)^2+obsB(1);
                           obsA(2,2)^2-obsB(1)];

The output of the actor is the vector W'*myBasisFcn(obsA,obsB), which is the action taken as a result of the given observation. The weight matrix W contains the learnable parameters and must have as many rows as the length of the basis function output and as many columns as the dimension of the action space.

Define an initial parameter matrix.

W0 = rand(4,3);

Create the actor. The first argument is a two-element cell containing both the handle to the custom function and the initial weight matrix. The second and third arguments are, respectively, the observation and action specification objects.

actor = rlContinuousDeterministicActor({myBasisFcn,W0},obsInfo,actInfo)
actor = 
  rlContinuousDeterministicActor with properties:

    ObservationInfo: [2x1 rl.util.RLDataSpec]
         ActionInfo: [1x1 rl.util.rlNumericSpec]
          UseDevice: "cpu"

To check your actor, use the getAction function to return the action from a given observation, using the current parameter matrix.

a = getAction(actor,{rand(2,2),0})
a = 1x1 cell array
    {3x1 double}

a{1}
ans = 3×1

    1.3192
    0.8420
    1.5053

Note that the actor does not enforce the set constraint for the discrete set elements.

a = getAction(actor,{rand(2,2),-1});
a{1}
ans = 3×1

    2.7890
    1.8375
    3.0855

You can now use the actor to create a suitable agent (such as rlDDPGAgent or rlTD3Agent).

Create observation and action information. You can also obtain these specifications from an environment. For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing four doubles, and the action space as a continuous two-dimensional space, so that a single action is a column vector containing two doubles.

obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);

To approximate the policy within the actor, use a recurrent deep neural network. You can obtain the dimension of the observation and action spaces from the environment specification objects.

Create a neural network as an array of layer objects. Since this network is recurrent, use a sequenceInputLayer as the input layer and at least one lstmLayer.

net = [sequenceInputLayer(obsInfo.Dimension(1))
       fullyConnectedLayer(10)
       reluLayer
       lstmLayer(8,OutputMode="sequence")
       fullyConnectedLayer(20)
       fullyConnectedLayer(actInfo.Dimension(1))
       tanhLayer];

Convert the network to a dlnetwork object and display the number of learnable parameters.

net = dlnetwork(net);
summary(net)
   Initialized: true

   Number of learnables: 880

   Inputs:
      1   'sequenceinput'   Sequence input with 4 dimensions

Create a deterministic actor representation for the network.

actor = rlContinuousDeterministicActor( ...
    net, ...
    obsInfo, ...
    actInfo);

To check your actor, use getAction to return the action from a random observation, given the current network weights.

a = getAction(actor, ...
    {rand(obsInfo.Dimension)}); 
a{1}
ans = 2x1 single column vector

   -0.0742
    0.0158

You can now use the actor to create a suitable agent (such as rlDDPGAgent or rlTD3Agent).

Version History

Introduced in R2022a