
rlConservativeQLearningOptions

Regularizer options object to train DQN and SAC agents

Since R2023a

    Description

    Use an rlConservativeQLearningOptions object to specify conservative Q-learning regularizer options for training a DQN or SAC agent. The options you can specify are the minimum Q-value weight and the number of random actions used for Q-value compensation. These options are mostly useful for training agents offline, specifically to deal with possible differences between the probability distribution of the data set and the one generated by the environment. To enable the conservative Q-learning regularizer when training an agent, set the BatchDataRegularizerOptions property of the agent options object to an rlConservativeQLearningOptions object that has your preferred minimum weight and number of sampled actions.

    Creation

    Description

    cqOpts = rlConservativeQLearningOptions returns a default conservative Q-learning regularizer options set.


    cqOpts = rlConservativeQLearningOptions(Name=Value) creates the conservative Q-learning regularizer option set cqOpts and sets its properties using one or more name-value arguments.

    Properties


    MinQValueWeight — Weight used for Q-value compensation, specified as a positive scalar. For more information, see Algorithms.

    Example: MinQValueWeight=0.1

    NumSampledActions — Number of sampled actions used for Q-value compensation, specified as a positive integer. This is the number of random actions used to estimate the logarithm of the sum of Q-values for a SAC agent. For more information, see Continuous Actions Regularizer (SAC).

    Example: NumSampledActions=30
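
    For a continuous action space, the sum over actions cannot be evaluated exactly, so it is estimated from randomly sampled actions. The following is a minimal sketch of such a Monte Carlo estimate, not the agent's internal code, assuming the log-sum-exp form of the term described in [1] and a hypothetical one-dimensional Q-function.

    % Illustrative sketch: estimate the log of the summed (exponentiated)
    % Q-values over a continuous action range using NumSampledActions
    % random actions. qFcn is a hypothetical stand-in for a critic
    % evaluated at one fixed observation.
    qFcn = @(a) -(a - 0.3).^2;
    numSampledActions = 30;
    actionRange = [-1 1];
    a = actionRange(1) + diff(actionRange)*rand(numSampledActions,1);
    logSumQ = log(mean(exp(qFcn(a)))*diff(actionRange));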


    Examples


    Create an rlConservativeQLearningOptions object specifying the weight to be used for Q-value compensation.

    opt = rlConservativeQLearningOptions( ...
        MinQValueWeight=5)
    opt = 
      rlConservativeQLearningOptions with properties:
    
          MinQValueWeight: 5
        NumSampledActions: 10
    
    

    You can modify options using dot notation. For example, set NumSampledActions to 20.

    opt.NumSampledActions = 20;

    To specify this conservative Q-learning option set for an agent, first create the agent options object. For this example, create a default rlDQNAgentOptions object for a DQN agent.

    agentOpts = rlDQNAgentOptions;

    Then, assign the rlConservativeQLearningOptions object to the BatchDataRegularizerOptions property.

    agentOpts.BatchDataRegularizerOptions = opt;

    When you create the agent, use agentOpts as the last input argument for the agent constructor function rlDQNAgent.
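
    For example, assuming obsInfo and actInfo are specification objects that describe your environment (the specifications below are hypothetical placeholders), you can create the agent as follows.

    % Hypothetical observation and action specifications for illustration.
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 0 1]);
    agent = rlDQNAgent(obsInfo,actInfo,agentOpts);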

    Algorithms


    In conservative Q-learning, the regularizer added to the critic loss relies on the difference between the expected Q-values of the actions from the current policy and the Q-values of the actions from the data set.
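
    As a sketch of this penalty in the form given in [1], where w is MinQValueWeight, D is the offline data set, Q is the critic, and the log-sum term is the quantity that NumSampledActions helps estimate for SAC agents:

    $$L_{\mathrm{CQL}} = w\,\mathbb{E}_{(S,A)\sim D}\!\left[\log\sum_{a}\exp\bigl(Q(S,a)\bigr) - Q(S,A)\right]$$

    For SAC agents, the sum over the continuous action space is estimated using NumSampledActions randomly sampled actions.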

    References

    [1] Kumar, Aviral, Aurick Zhou, George Tucker, and Sergey Levine. "Conservative Q-Learning for Offline Reinforcement Learning." Advances in Neural Information Processing Systems 33 (2020): 1179–1191.

    Version History

    Introduced in R2023a