
Policies and Value Functions

Define policy and value function approximators, such as actors and critics

During training, most agents rely on an actor, a critic, or both. The actor learns the policy that selects the actions to take. The critic learns the value (or Q-value) function that estimates the value of the policy.

Reinforcement Learning Toolbox™ provides function approximator objects for actors and critics, as well as policy objects for custom loops and deployment. Approximator objects can internally use different approximation models, such as deep neural networks, linear basis functions, or lookup tables.
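
The following is a minimal sketch of creating a critic and an actor from small deep neural networks. The observation and action specifications, layer sizes, and variable names are illustrative assumptions, not part of this page.

    % Assumed specifications: 4-dimensional continuous observation, 2 discrete actions
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 1]);

    % Critic: network mapping the observation to a scalar state value
    vNet = dlnetwork([
        featureInputLayer(prod(obsInfo.Dimension))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(1)
        ]);
    critic = rlValueFunction(vNet, obsInfo);

    % Actor: network mapping the observation to one output per discrete action
    aNet = dlnetwork([
        featureInputLayer(prod(obsInfo.Dimension))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(numel(actInfo.Elements))
        ]);
    actor = rlDiscreteCategoricalActor(aNet, obsInfo, actInfo);

    % Query the approximators for a random observation
    v = getValue(critic, {rand(obsInfo.Dimension)});
    act = getAction(actor, {rand(obsInfo.Dimension)});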

For an overview of policies, value functions, actors, and critics, see Create Policies and Value Functions.

Blocks

Policy - Reinforcement learning policy (Since R2022b)

Functions


rlTable - Value table or Q table
rlValueFunction - Value function approximator object for reinforcement learning agents (Since R2022a)
rlQValueFunction - Q-value function approximator with a continuous or discrete action space for reinforcement learning agents (Since R2022a)
rlVectorQValueFunction - Vector Q-value function approximator with a hybrid or discrete action space for reinforcement learning agents (Since R2022a)
rlContinuousDeterministicActor - Deterministic actor with a continuous action space for reinforcement learning agents (Since R2022a)
rlDiscreteCategoricalActor - Stochastic categorical actor with a discrete action space for reinforcement learning agents (Since R2022a)
rlContinuousGaussianActor - Stochastic Gaussian actor with a continuous action space for reinforcement learning agents (Since R2022a)
rlHybridStochasticActor - Hybrid stochastic actor with a hybrid action space for reinforcement learning agents (Since R2024b)
getActor - Extract actor from reinforcement learning agent
setActor - Set actor of reinforcement learning agent
getCritic - Extract critic from reinforcement learning agent
setCritic - Set critic of reinforcement learning agent
getModel - Get approximation model from function approximator object
setModel - Set approximation model in function approximator object
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object
syncParameters - Modify the learnable parameters of one approximator towards the learnable parameters of another approximator (Since R2022a)
rlNormalizer - Configure normalization for input of function approximator object (Since R2024a)
getNormalizer - Get normalizer from function approximator object (Since R2024a)
setNormalizer - Set normalizer in function approximator object (Since R2024a)
normalize - Normalize input data using method defined in normalizer object (Since R2024a)
rlOptimizerOptions - Optimization options for actors and critics (Since R2022a)
getGreedyPolicy - Extract greedy (deterministic) policy object from agent (Since R2022a)
getExplorationPolicy - Extract exploratory (stochastic) policy object from agent (Since R2023a)
rlOptimizer - Create an optimizer object for actors and critics (Since R2022a)
rlMaxQPolicy - Policy object to generate discrete max-Q actions for custom training loops and application deployment (Since R2022a)
rlEpsilonGreedyPolicy - Policy object to generate discrete epsilon-greedy actions for custom training loops (Since R2022a)
rlDeterministicActorPolicy - Policy object to generate continuous deterministic actions for custom training loops and application deployment (Since R2022a)
rlAdditiveNoisePolicy - Policy object to generate continuous noisy actions for custom training loops (Since R2022a)
rlStochasticActorPolicy - Policy object to generate stochastic actions for custom training loops and application deployment (Since R2022a)
rlHybridStochasticActorPolicy - Policy object to generate hybrid stochastic actions for custom training loops and application deployment (Since R2024b)
policyParameters - Obtain structure of policy parameters to update policy during simulation or deployment (Since R2025a)
updatePolicyParameters - Update policy according to structure of policy parameters given as input argument (Since R2025a)
rlContinuousDeterministicTransitionFunction - Deterministic transition function approximator object for neural network-based environment (Since R2022a)
rlContinuousGaussianTransitionFunction - Stochastic Gaussian transition function approximator object for neural network-based environment (Since R2022a)
rlContinuousDeterministicRewardFunction - Deterministic reward function approximator object for neural network-based environment (Since R2022a)
rlContinuousGaussianRewardFunction - Stochastic Gaussian reward function approximator object for neural network-based environment (Since R2022a)
rlIsDoneFunction - Is-done function approximator object for neural network-based environment (Since R2022a)
getAction - Obtain action from agent, actor, or policy object given environment observations
getValue - Obtain estimated value from a critic given environment observations and actions
getMaxQValue - Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
evaluate - Evaluate function approximator object given observation (or observation-action) input data (Since R2022a)
quadraticLayer - Quadratic layer
scalingLayer - Scaling layer
softplusLayer - Softplus layer
featureInputLayer - Feature input layer
reluLayer - Rectified linear unit (ReLU) layer
tanhLayer - Hyperbolic tangent (tanh) layer
fullyConnectedLayer - Fully connected layer
lstmLayer - Long short-term memory (LSTM) layer for recurrent neural network (RNN)
softmaxLayer - Softmax layer
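
Policy objects such as rlMaxQPolicy and rlDeterministicActorPolicy wrap a trained critic or actor so that you can compute actions in a custom training loop or in deployed code. The following is a minimal sketch, assuming a small vector Q-value critic with illustrative observation and action specifications.

    % Assumed specifications: 4-dimensional observation, 3 discrete actions
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 0 1]);

    % Vector Q-value critic: one output per discrete action
    qNet = dlnetwork([
        featureInputLayer(prod(obsInfo.Dimension))
        fullyConnectedLayer(32)
        reluLayer
        fullyConnectedLayer(numel(actInfo.Elements))
        ]);
    critic = rlVectorQValueFunction(qNet, obsInfo, actInfo);

    % Greedy (max-Q) policy object for custom loops or deployment
    policy = rlMaxQPolicy(critic);
    act = getAction(policy, {rand(obsInfo.Dimension)});

    % Learnable parameters can be read and written for custom updates
    params = getLearnableParameters(policy);
    policy = setLearnableParameters(policy, params);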

Topics