Policies and Value Functions

Define policy and value function approximators, such as actors and critics

During training, most agents rely on an actor, a critic, or both. The actor learns the policy that selects the action to take. The critic learns the value (or Q-value) function that estimates the value of the policy.

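Reinforcement Learning Toolbox™ provides function approximator objects for actors and critics, and policy objects for custom loops and deployment. Approximator objects can internally use different approximation models, such as deep neural networks, linear basis functions, or lookup tables.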

For an introduction to policies, value functions, actors, and critics, see Create Policies and Value Functions.
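As a rough illustration of the actor and critic roles described above, the following minimal sketch builds a state-value critic and a categorical actor from small neural networks. The observation size, the action set, the layer widths, and all variable names are assumptions chosen for illustration; they do not come from this page.

```matlab
% Hypothetical environment specs (assumed for this sketch):
% a 4-element observation vector and two discrete actions.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 1]);

% Critic: approximates the state-value function V(s) with a small network.
criticNet = dlnetwork([
    featureInputLayer(4)
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(1)]);
critic = rlValueFunction(criticNet, obsInfo);

% Actor: stochastic categorical policy with one network output per action.
actorNet = dlnetwork([
    featureInputLayer(4)
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(2)]);
actor = rlDiscreteCategoricalActor(actorNet, obsInfo, actInfo);

% Query both approximators with a random observation.
obs = {rand(4,1)};
v = getValue(critic, obs);   % estimated value of the observation
a = getAction(actor, obs);   % cell array containing the sampled action
```

Both networks are untrained here, so the returned value and action are only placeholders; an agent would adjust the learnable parameters of both objects during training.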

Blocks

Policy

Reinforcement learning policy (since R2022b)

Functions

Create Actors and Critics

`rlTable`	Value table or Q table
`rlValueFunction`	Value function approximator object for reinforcement learning agents (since R2022a)
`rlQValueFunction`	Q-value function approximator with a continuous or discrete action space for reinforcement learning agents (since R2022a)
`rlVectorQValueFunction`	Vector Q-value function approximator with hybrid or discrete action space for reinforcement learning agents (since R2022a)
`rlContinuousDeterministicActor`	Deterministic actor with a continuous action space for reinforcement learning agents (since R2022a)
`rlDiscreteCategoricalActor`	Stochastic categorical actor with a discrete action space for reinforcement learning agents (since R2022a)
`rlContinuousGaussianActor`	Stochastic Gaussian actor with a continuous action space for reinforcement learning agents (since R2022a)
`rlHybridStochasticActor`	Hybrid stochastic actor with a hybrid action space for reinforcement learning agents (since R2024b)

Get and Set Actors and Critics from Agents

`getActor`	Extract actor from reinforcement learning agent
`setActor`	Set actor of reinforcement learning agent
`getCritic`	Extract critic from reinforcement learning agent
`setCritic`	Set critic of reinforcement learning agent

Get and Set Approximation Models and Learnable Parameters

`getModel`	Get approximation model from function approximator object
`setModel`	Set approximation model in function approximator object
`getLearnableParameters`	Obtain learnable parameter values from agent, function approximator, or policy object
`setLearnableParameters`	Set learnable parameter values of agent, function approximator, or policy object
`syncParameters`	Modify the learnable parameters of one approximator towards the learnable parameters of another approximator (since R2022a)

Input Normalization

`rlNormalizer`	Configure normalization for input of function approximator object (since R2024a)
`getNormalizer`	Get normalizer from function approximator object (since R2024a)
`setNormalizer`	Set normalizer in function approximator object (since R2024a)
`normalize`	Normalize input data using method defined in normalizer object (since R2024a)

Training Options for Actors and Critics

`rlOptimizerOptions`	Optimization options for actors and critics (since R2022a)

Extract Policy Objects from Agents

`getGreedyPolicy`	Extract greedy (deterministic) policy object from agent (since R2022a)
`getExplorationPolicy`	Extract exploratory (stochastic) policy object from agent (since R2023a)

Create Policy Objects for Custom Training and Deployment

`rlOptimizer`	Create an optimizer object for actors and critics (since R2022a)
`rlMaxQPolicy`	Policy object to generate discrete max-Q actions for custom training loops and application deployment (since R2022a)
`rlEpsilonGreedyPolicy`	Policy object to generate discrete epsilon-greedy actions for custom training loops (since R2022a)
`rlDeterministicActorPolicy`	Policy object to generate continuous deterministic actions for custom training loops and application deployment (since R2022a)
`rlAdditiveNoisePolicy`	Policy object to generate continuous noisy actions for custom training loops (since R2022a)
`rlStochasticActorPolicy`	Policy object to generate stochastic actions for custom training loops and application deployment (since R2022a)
`rlHybridStochasticActorPolicy`	Policy object to generate hybrid stochastic actions for custom training loops and application deployment (since R2024b)

Get and Set Parameters

`syncParameters`	Modify the learnable parameters of one approximator towards the learnable parameters of another approximator (since R2022a)
`getLearnableParameters`	Obtain learnable parameter values from agent, function approximator, or policy object
`setLearnableParameters`	Set learnable parameter values of agent, function approximator, or policy object
`policyParameters`	Obtain structure of policy parameters to update policy during simulation or deployment (since R2025a)
`updatePolicyParameters`	Update policy according to structure of policy parameters given as input argument (since R2025a)

Approximators for Neural Network Environments

`rlContinuousDeterministicTransitionFunction`	Deterministic transition function approximator object for neural network-based environment (since R2022a)
`rlContinuousGaussianTransitionFunction`	Stochastic Gaussian transition function approximator object for neural network-based environment (since R2022a)
`rlContinuousDeterministicRewardFunction`	Deterministic reward function approximator object for neural network-based environment (since R2022a)
`rlContinuousGaussianRewardFunction`	Stochastic Gaussian reward function approximator object for neural network-based environment (since R2022a)
`rlIsDoneFunction`	Is-done function approximator object for neural network-based environment (since R2022a)

Get Actions and Values

`getAction`	Obtain action from agent, actor, or policy object given environment observations
`getValue`	Obtain estimated value from a critic given environment observations and actions
`getMaxQValue`	Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
`evaluate`	Evaluate function approximator object given observation (or observation-action) input data (since R2022a)

Deep Neural Network Layers

`quadraticLayer`	Quadratic layer
`scalingLayer`	Scaling layer
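`softplusLayer`	Softplus layer
`featureInputLayer`	Feature input layer
`reluLayer`	Rectified linear unit (ReLU) layer
`tanhLayer`	Hyperbolic tangent (tanh) layer
`fullyConnectedLayer`	Fully connected layer
`lstmLayer`	Long short-term memory (LSTM) layer for recurrent neural network (RNN)
`softmaxLayer`	Softmax layer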

Topics

Create Policies and Value Functions
Specify policies and value functions using function approximators, such as deep neural networks.
Import Neural Network Models Using ONNX
You can import existing policies from other deep learning frameworks using the ONNX™ model format.