Agents

Create and configure reinforcement learning agents

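A reinforcement learning agent receives observations and a reward from the environment and returns an action to the environment. During training, the agent continuously updates its parameters to improve its policy for the given environment.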

Reinforcement Learning Toolbox™ software provides built-in reinforcement learning agents that use several common algorithms, such as Q-learning, DQN, PG, AC, DDPG, TD3, SAC, and PPO. You can also implement your own custom agents.

For an introduction to agents, see Reinforcement Learning Agents. For an introduction to policies, value functions, actors, and critics, see Create Policies and Value Functions.
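As a rough illustration of the interaction described above, the following MATLAB sketch builds a default DQN agent from observation and action specifications and asks it for an action. The specification sizes and the hidden-unit count are hypothetical values chosen for illustration, and the sketch assumes R2020b or later with Deep Learning Toolbox installed. It does not train the agent.

```matlab
% Sketch only: create a default DQN agent and query one action.
% Assumes Reinforcement Learning Toolbox and Deep Learning Toolbox (R2020b+).

% Hypothetical specifications: a 4-element continuous observation
% and a discrete action set with two elements.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 1]);

% Initialization options for the default critic network
% (64 hidden units is an arbitrary illustrative choice).
initOpts = rlAgentInitializationOptions('NumHiddenUnit', 64);

% Create the agent with a default critic built from the specifications.
agent = rlDQNAgent(obsInfo, actInfo, initOpts);

% The agent reports the specifications it was created with.
agentObsInfo = getObservationInfo(agent);
agentActInfo = getActionInfo(agent);

% Query an action for a random observation.
% getAction returns a cell array with one entry per action channel.
action = getAction(agent, {rand(obsInfo.Dimension)});
disp(action{1})
```

The returned action is one of the elements of the action set. Because the critic is randomly initialized and DQN agents explore, the specific element can differ between runs.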

Apps

Reinforcement Learning Designer

Design, train, and simulate reinforcement learning agents (since R2021a)

Blocks

RL Agent

Reinforcement learning agent

Functions

Agents

`rlQAgent`	Q-learning reinforcement learning agent
`rlSARSAAgent`	SARSA reinforcement learning agent
`rlDQNAgent`	Deep Q-network (DQN) reinforcement learning agent
`rlPGAgent`	Policy gradient (PG) reinforcement learning agent
`rlACAgent`	Actor-critic (AC) reinforcement learning agent
`rlPPOAgent`	Proximal policy optimization (PPO) reinforcement learning agent (since R2019b)
`rlTRPOAgent`	Trust region policy optimization (TRPO) reinforcement learning agent (since R2021b)
`rlDDPGAgent`	Deep deterministic policy gradient (DDPG) reinforcement learning agent
`rlTD3Agent`	Twin-delayed deep deterministic (TD3) policy gradient reinforcement learning agent (since R2020a)
`rlSACAgent`	Soft actor-critic (SAC) reinforcement learning agent (since R2020b)

Agent Options

`rlQAgentOptions`	Options for Q-learning agent
`rlSARSAAgentOptions`	Options for SARSA agent
`rlDQNAgentOptions`	Options for DQN agent
`rlPGAgentOptions`	Options for PG agent
`rlACAgentOptions`	Options for AC agent
`rlPPOAgentOptions`	Options for PPO agent (since R2019b)
`rlTRPOAgentOptions`	Options for TRPO agent (since R2021b)
`rlDDPGAgentOptions`	Options for DDPG agent
`rlTD3AgentOptions`	Options for TD3 agent (since R2020a)
`rlSACAgentOptions`	Options for SAC agent (since R2020b)
`rlAgentInitializationOptions`	Options for initializing reinforcement learning agents (since R2020b)
`rlConservativeQLearningOptions`	Regularizer options object to train DQN and SAC agents (since R2023a)
`rlBehaviorCloningRegularizerOptions`	Regularizer options object to train DDPG, TD3, and SAC agents (since R2023a)

Model-Based Policy Optimization

`rlMBPOAgent`	Model-based policy optimization (MBPO) reinforcement learning agent (since R2022a)
`rlMBPOAgentOptions`	Options for MBPO agent (since R2022a)

Get and Set Actors and Critics

`getActor`	Extract actor from reinforcement learning agent
`getCritic`	Extract critic from reinforcement learning agent
`setActor`	Set actor of reinforcement learning agent
`setCritic`	Set critic of reinforcement learning agent
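A typical use of these functions is to extract a critic, change it, and write it back. The sketch below shows one such round trip under assumptions: the agent specifications are hypothetical, and `getLearnableParameters` and `setLearnableParameters` are toolbox functions that are not listed on this page. The scaling by 0.5 is arbitrary and serves only to make the change visible.

```matlab
% Sketch only: extract a critic, modify its parameters, and set it back.
% Assumes Reinforcement Learning Toolbox and Deep Learning Toolbox (R2020b+).

% Hypothetical specifications for a default DQN agent.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 1]);
agent = rlDQNAgent(obsInfo, actInfo);

% A DQN agent has a critic but no actor, so only getCritic applies here.
critic = getCritic(agent);

% Read the learnable parameters (a cell array) and scale each one.
% The factor 0.5 is an arbitrary illustrative modification.
params = getLearnableParameters(critic);
scaledParams = cellfun(@(p) 0.5*p, params, 'UniformOutput', false);

% Write the modified parameters into the critic, then into the agent.
critic = setLearnableParameters(critic, scaledParams);
agent = setCritic(agent, critic);

% Confirm that the agent's critic holds the same number of parameter arrays.
newParams = getLearnableParameters(getCritic(agent));
disp(numel(newParams))
```

For agents that have an actor, such as DDPG or PPO agents, the same pattern would apply with `getActor` and `setActor`.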

Get Action

`getAction`	Obtain action from agent, actor, or policy object given environment observations (since R2020a)

Experience Buffers

`rlReplayMemory`	Replay memory experience buffer (since R2022a)
`rlPrioritizedReplayMemory`	Replay memory experience buffer with prioritized sampling (since R2022b)
`rlHindsightReplayMemory`	Hindsight replay memory experience buffer (since R2023a)
`rlHindsightPrioritizedReplayMemory`	Hindsight replay memory experience buffer with prioritized sampling (since R2023a)
`append`	Append experiences to replay memory buffer (since R2022a)
`sample`	Sample experiences from replay memory buffer (since R2022a)
`resize`	Resize replay memory experience buffer (since R2022b)
`allExperiences`	Return all experiences in replay memory buffer (since R2022b)
`validateExperience`	Validate experiences for replay memory (since R2023a)
`generateHindsightExperiences`	Generate hindsight experiences from hindsight experience replay buffer (since R2023a)
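The following sketch shows how the basic buffer functions might fit together: create a replay memory, append a batch of experiences, and draw a random mini-batch. The specifications, buffer length, and random experience values are hypothetical placeholders. The sketch assumes R2022a or later.

```matlab
% Sketch only: fill a replay memory buffer and sample from it.
% Assumes Reinforcement Learning Toolbox R2022a or later.

% Hypothetical scalar observation and action specifications.
obsInfo = rlNumericSpec([1 1]);
actInfo = rlNumericSpec([1 1]);

% Create a buffer that holds at most 20000 experiences (arbitrary size).
buffer = rlReplayMemory(obsInfo, actInfo, 20000);

% Build 20 placeholder experiences as a structure array.
% Observations and actions are stored as cell arrays, one cell per channel.
for i = 1:20
    expBatch(i).Observation = {rand};
    expBatch(i).Action = {rand};
    expBatch(i).Reward = 10*rand;
    expBatch(i).NextObservation = {rand};
    expBatch(i).IsDone = 0;
end

% Append the experiences to the buffer.
append(buffer, expBatch);

% Draw a random mini-batch of 10 experiences.
miniBatch = sample(buffer, 10);

% The buffer now reports how many experiences it holds.
disp(buffer.Length)
```

In practice an off-policy agent manages its own buffer during training. Handling the buffer directly like this is mainly useful for custom training loops or for inspecting stored data.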

Observation and Action Specifications

`getActionInfo`	Obtain action data specifications from reinforcement learning environment, agent, or experience buffer
`getObservationInfo`	Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer

Reset Agent or Experience Buffer

`reset`	Reset environment, agent, experience buffer, or policy object (since R2022a)

Topics

Agent Basics

Reinforcement Learning Agents
You can create an agent using one of several standard reinforcement learning algorithms or define your own custom agent.
Create Agents Using Reinforcement Learning Designer
Interactively create or import agents for training using the Reinforcement Learning Designer app.

Agent Types

Q-Learning Agents
Create Q-learning agents for reinforcement learning.
SARSA Agents
Create SARSA agents for reinforcement learning.
Deep Q-Network (DQN) Agents
Create DQN agents for reinforcement learning.
Policy Gradient (PG) Agents
Create policy gradient agents for reinforcement learning.
Actor-Critic (AC) Agents
Create actor-critic agents for reinforcement learning.
Proximal Policy Optimization (PPO) Agents
Create PPO agents for reinforcement learning.
Trust Region Policy Optimization (TRPO) Agents
Create TRPO agents for reinforcement learning.
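Deep Deterministic Policy Gradient (DDPG) Agents
Create DDPG agents for reinforcement learning.
Twin-Delayed Deep Deterministic (TD3) Policy Gradient Agents
Create TD3 agents for reinforcement learning.
Soft Actor-Critic (SAC) Agents
Create SAC agents for reinforcement learning.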
Model-Based Policy Optimization (MBPO) Agents
A model-based policy optimization (MBPO) agent learns a model of its environment and uses that model to generate additional experiences for training.

Custom Agents

Create Custom Reinforcement Learning Agents
Create custom agents.
Create and Train Custom PG Agent
Create a custom PG agent and train it using the built-in train function.
Create and Train Custom LQR Agent
Create a custom agent that solves an LQR problem and train it using the built-in train function.
