Training and Simulation

Train and simulate reinforcement learning agents

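During training, the agent continuously updates its parameters to learn an optimal policy for a given environment. During simulation, the agent receives observations and rewards from the environment and returns actions to the environment without updating its parameters.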

Reinforcement Learning Toolbox™ provides functions for training agents and for validating the training results through simulation. For an overview of agent training and simulation, see Train Reinforcement Learning Agents.
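
As a rough illustration of the two phases described above, the following MATLAB sketch trains a default DQN agent in the predefined discrete cart-pole environment and then simulates it. The choice of environment, the option values, and the stopping threshold are assumptions made for illustration only, not recommended settings.

```matlab
% Illustrative sketch: environment and option values are assumptions.
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Default DQN agent built from the specifications
% (default networks require Deep Learning Toolbox).
agent = rlDQNAgent(obsInfo, actInfo);

% Training: the agent parameters are updated while episodes run.
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=200, ...
    MaxStepsPerEpisode=500, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=480);
trainResults = train(agent, env, trainOpts);

% Simulation: the agent acts in the environment without parameter updates.
simOpts = rlSimulationOptions(MaxSteps=500);
experience = sim(env, agent, simOpts);
totalReward = sum(experience.Reward);
```

Here `train` modifies the agent, while `sim` only returns the logged experience, which can be used to check the cumulative reward of the trained policy.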

Apps

Reinforcement Learning Designer

Design, train, and simulate reinforcement learning agents

Functions

Train Agents

`train`	Train reinforcement learning agents within a specified environment
`rlTrainingOptions`	Options for training reinforcement learning agents
`rlMultiAgentTrainingOptions`	Options for training multiple reinforcement learning agents (since R2022a)
`trainWithEvolutionStrategy`	Train DDPG, TD3 or SAC agent using an evolutionary strategy within a specified environment (since R2023b)
`rlEvolutionStrategyTrainingOptions`	Options for training off-policy reinforcement learning agents using an evolutionary strategy (since R2023b)
`show`	Visualize a training result object in a new Reinforcement Learning Training Monitor window (since R2024a)

Train Agents Offline

`trainFromData`	Train off-policy reinforcement learning agent using existing data (since R2023a)
`rlTrainingFromDataOptions`	Options to train reinforcement learning agents using existing data (since R2023a)
`show`	Visualize a training result object in a new Reinforcement Learning Training Monitor window (since R2024a)

Evaluate Agents During Training

`rlEvaluator`	Options for evaluating reinforcement learning agents during training (since R2023b)
`rlCustomEvaluator`	Custom object for evaluating reinforcement learning agents during training (since R2023b)

Log Data

`rlDataLogger`	Create either a file logger object or a monitor logger object to log training data (since R2022b)
`rlDataViewer`	Open Reinforcement Learning Data Viewer tool (since R2023a)
`FileLogger`	Log reinforcement learning training data to MAT files (since R2022b)
`MonitorLogger`	Log reinforcement learning training data to monitor window (since R2022b)
`trainingProgressMonitor`	Monitor and plot training progress for deep learning custom training loops (since R2022b)
`setup`	Set up reinforcement learning environment or initialize data logger object (since R2022a)
`store`	Store data in the internal memory of a (file or monitor) logger object (since R2022b)
`write`	Transfer stored data from the internal logger memory to the logging target (since R2022b)
`cleanup`	Clean up reinforcement learning environment or data logger object (since R2022a)

Simulate Agents

`sim`	Simulate trained reinforcement learning agents within specified environment
`rlSimulationOptions`	Options for simulating a reinforcement learning agent within an environment

Experience Buffers

`rlReplayMemory`	Replay memory experience buffer (since R2022a)
`rlPrioritizedReplayMemory`	Replay memory experience buffer with prioritized sampling (since R2022b)
`rlHindsightReplayMemory`	Hindsight replay memory experience buffer (since R2023a)
`rlHindsightPrioritizedReplayMemory`	Hindsight replay memory experience buffer with prioritized sampling (since R2023a)
`append`	Append experiences to replay memory buffer (since R2022a)
`sample`	Sample experiences from replay memory buffer (since R2022a)
`resize`	Resize replay memory experience buffer (since R2022b)
`allExperiences`	Return all experiences in replay memory buffer (since R2022b)
`validateExperience`	Validate experiences for replay memory (since R2023a)
`generateHindsightExperiences`	Generate hindsight experiences from hindsight experience replay buffer (since R2023a)
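
The buffer functions listed above might be combined as in the following minimal sketch, which creates a replay memory, appends a batch of random experiences, samples a mini-batch, and resizes the buffer. The specification sizes, buffer lengths, and random data are assumptions chosen only to make the example self-contained.

```matlab
% Illustrative sketch: dimensions, lengths, and data are assumptions.
obsInfo = rlNumericSpec([3 1]);
actInfo = rlNumericSpec([1 1]);

% Replay memory holding at most 1000 experiences.
buffer = rlReplayMemory(obsInfo, actInfo, 1000);

% Build a batch of 20 random experiences as a structure array.
for i = 1:20
    expBatch(i).Observation = {rand(3,1)};
    expBatch(i).Action = {rand(1,1)};
    expBatch(i).NextObservation = {rand(3,1)};
    expBatch(i).Reward = rand;
    expBatch(i).IsDone = 0;
end
expBatch(20).IsDone = 1;  % mark the last experience as terminal

% Store the batch, then draw a random mini-batch of 8 experiences.
append(buffer, expBatch);
miniBatch = sample(buffer, 8);

% Increase the maximum buffer length to 2000.
resize(buffer, 2000);
```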

Custom Training

`rlOptimizer`	Creates an optimizer object for actors and critics (since R2022a)
`runEpisode`	Simulate reinforcement learning environment against policy or agent (since R2022a)
`syncParameters`	Modify the learnable parameters of one approximator toward the learnable parameters of another approximator (since R2022a)
`update`	Update the state of an optimizer object and a set of learnable parameters using the gradient value (since R2022a)
`evaluate`	Evaluate function approximator object given observation (or observation-action) input data (since R2022a)
`setup`	Set up reinforcement learning environment or initialize data logger object (since R2022a)
`cleanup`	Clean up reinforcement learning environment or data logger object (since R2022a)
`Future`	Object that supports deferred outputs for reinforcement learning environment simulations running on workers (since R2022a)
`fetchNext`	Retrieve next available unread outputs from reinforcement learning environment simulations running on workers (since R2022a)
`fetchOutputs`	Retrieve results from all reinforcement learning environment simulations running on workers (since R2022a)
`cancel`	Cancel unfinished reinforcement learning environment simulations on workers (since R2022a)
`wait`	Wait for reinforcement learning environment simulations running on workers to finish (since R2022a)
`dlfeval`	Evaluate deep learning model for custom training loops
`dlaccelerate`	Accelerate deep learning function
`AcceleratedFunction`	Accelerated deep learning function

Get and Set Parameters

`syncParameters`	Modify the learnable parameters of one approximator toward the learnable parameters of another approximator (since R2022a)
`getLearnableParameters`	Obtain learnable parameter values from agent, function approximator, or policy object
`setLearnableParameters`	Set learnable parameter values of agent, function approximator, or policy object
`policyParameters`	Obtain structure of policy parameters to update policy during simulation or deployment (since R2025a)
`updatePolicyParameters`	Update policy according to structure of policy parameters given as input argument (since R2025a)
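
As a hedged sketch of how `getLearnableParameters` and `setLearnableParameters` can be used together, the following code reads the critic parameters of a default DQN agent, scales them, and writes them back. The specifications and the scaling factor are arbitrary assumptions for illustration; scaling parameters this way has no particular training benefit.

```matlab
% Illustrative sketch: specifications and scaling factor are assumptions.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 1]);

% Default DQN agent (default networks require Deep Learning Toolbox).
agent = rlDQNAgent(obsInfo, actInfo);

% Extract the critic and read its learnable parameters (a cell array).
critic = getCritic(agent);
params = getLearnableParameters(critic);

% Scale every parameter array by 0.5.
scaledParams = cellfun(@(p) 0.5*p, params, UniformOutput=false);

% Write the modified parameters back to the critic, then to the agent.
critic = setLearnableParameters(critic, scaledParams);
setCritic(agent, critic);
```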

Blocks

RL Agent	Reinforcement learning agent
Policy	Reinforcement learning policy (since R2022b)

Topics

Training and Simulation Basics

Train Reinforcement Learning Agents
Find the optimal policy by training your agent within a specified environment.
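Train Reinforcement Learning Agent in Basic Grid World
Train Q-learning and SARSA agents to solve a grid world in MATLAB®.
Train Reinforcement Learning Agent in MDP Environment
Train a reinforcement learning agent in a generic Markov decision process environment.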

Using the Reinforcement Learning Designer App

Specify Training Options in Reinforcement Learning Designer
Interactively specify options for training reinforcement learning agents using the Reinforcement Learning Designer app.
Specify Simulation Options in Reinforcement Learning Designer
Interactively specify options for simulating reinforcement learning agents using the Reinforcement Learning Designer app.
Design and Train Agent Using Reinforcement Learning Designer
Design and train a DQN agent for a cart-pole system using the Reinforcement Learning Designer app.
Tune Hyperparameters Using Reinforcement Learning Designer
Search the hyperparameter space using Reinforcement Learning Designer.

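Create and Train Default Agents

Train Default DQN Agent to Balance Discrete Cart-Pole
Train a default DQN agent to balance a discrete action space cart-pole system modeled in MATLAB.
Train Default DQN Agent to Swing Up and Balance Discrete Pendulum
Train a default DQN agent to swing up and balance a discrete action space pendulum modeled in Simulink®.
Train Default DDPG Agent to Swing Up and Balance Continuous Pendulum
Train a DDPG agent to balance a continuous action space pendulum modeled in Simulink.
Train Default DDPG Agent to Swing Up and Balance Continuous Cart-Pole
Train a default DDPG agent, using custom networks, to swing up and balance a continuous action space cart-pole system modeled in Simscape™ Multibody™.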
Train Default PPO Agent for Discrete Lander Vehicle
Train a default PPO agent to land a discrete action space flying vehicle.

Create and Train Agents with Custom Approximators

Train LSPI Agent to Balance Discrete Cart-Pole
Train an LSPI agent to balance a discrete action space cart-pole system modeled in MATLAB.
Train PG Agent with Custom Actor Network to Balance Discrete Cart-Pole
Train a PG agent with custom actor network to balance a discrete action space cart-pole system modeled in MATLAB.
Train PG Agent with Custom Actor and Baseline Networks to Control Discrete Double Integrator
Train a PG agent with a custom actor and baseline networks to control a discrete action space double integrator system modeled in MATLAB.
Train DDPG Agent to Swing Up and Balance Pendulum with Image Observation
Train a DDPG agent using an image-based observation signal.
Train Soft Actor Critic Agent with Custom Networks for Discrete Lander Vehicle
Train a SAC agent to land a discrete action space flying vehicle.

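Create and Train Agents for Custom Simulink Environments

Control Water Level in a Tank Using a DDPG Agent
Train a controller using reinforcement learning with a plant modeled in Simulink as the training environment.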
Train DDPG Agent to Swing Up and Balance Pendulum with Bus Signal
Train a DDPG agent to balance a continuous action space pendulum Simulink model that contains observations in a bus signal.

Using Multiple Processes and GPUs

Train Agents Using Parallel Computing and GPUs
Accelerate agent training by running simulations in parallel on multiple cores, GPUs, clusters or cloud resources.
Train AC Agent to Balance Discrete Cart-Pole Using Parallel Computing
Train an AC agent to control a discrete action space cart-pole system using asynchronous parallel computing.
Train DQN Agent for Lane Keeping Assist Using Parallel Computing
Train a DQN agent for an automated driving application using parallel computing.

Advanced Training and Simulation

Train PPO Agent with Curriculum Learning for a Lane Keeping Application
Train a PPO agent for a lane keeping assist task by gradually increasing task complexity.
Train DQN Agent Using Hindsight Experience Replay
Train a DQN agent in a navigation environment with sparse rewards.
Train Reinforcement Learning Agent Offline to Control Quanser QUBE Pendulum
Train a TD3 agent offline to control a Quanser QUBE pendulum.
Train Biped Robot to Walk Using Evolution Strategy-Reinforcement Learning Agents
Train a TD3 agent using an evolutionary strategy.
Create DQN Agent Using Deep Network Designer and Train Using Image Observations
Create a reinforcement learning agent using the Deep Network Designer app from Deep Learning Toolbox™.
Transfer Learning: Fine-Tune DQN Agent for Pendulum Swing-Up from Earth to Mars
Use transfer learning to partially retrain a DQN agent to swing up and balance a pendulum under Mars gravity conditions.

Log Training Data and Tune Hyperparameters

Log Training Data to Disk
Log a variety of data to disk while training an agent.
Train Agent or Tune Environment Parameters Using Parameter Sweeping
Tune a DDPG agent using hyperparameter sweeping.
Tune Hyperparameters Using Bayesian Optimization
Tune reinforcement learning hyperparameters using Bayesian optimization.
Configure Exploration for Reinforcement Learning Agents
Use visualization to configure exploration in reinforcement learning agents.

Multi-Agent Training

Train Multiple Agents to Perform Collaborative Task
Train two continuous action space PPO agents to collaboratively move an object.
Train Multiple Agents for Area Coverage
Train three discrete action space PPO agents to explore a grid-world environment in a collaborative-competitive manner.
Train Multiple Agents for Path Following Control
Train a DQN and a DDPG agent to collaboratively perform adaptive cruise control and lane keeping assist to follow a path.

Develop Custom Agents and Training Algorithms

Train Reinforcement Learning Policy Using Custom Training Loop
Train a reinforcement learning policy using your own custom training loop.
Create and Train Custom PG Agent
Create a custom PG agent and train it using the built-in train function.
Create and Train Custom LQR Agent
Create a custom agent that solves an LQR problem and train it using the built-in train function.
Custom PPO Training Loop with Random Network Distillation
Use a custom training loop to train a custom PPO policy with random network distillation on a pendulum environment with sparse rewards.
Custom Training Loop with Simulink Action Noise
Use a custom training loop to train a continuous action space reinforcement learning policy in Simulink when action noise is generated within the model.
Custom DQN Training Loop with LSTM Network
Use a custom training loop to train a DQN agent with an LSTM network.

Train Model-Based Policy Optimization Agents

Train MBPO Agent to Balance Continuous Cart-Pole System
A model-based reinforcement learning agent learns a model of its environment that it can use to generate additional experiences for training.
Model-Based Reinforcement Learning Using Custom Training Loop
Create a model-based reinforcement learning agent using a custom training loop.