Training and Simulation

Train and simulate reinforcement learning agents

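During training, an agent continuously updates its parameters to learn an optimal policy for a given environment. During simulation, the agent receives observations and rewards from the environment and returns actions to the environment without updating its parameters.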

Reinforcement Learning Toolbox™ provides functions for training agents and for validating the training results through simulation. For an overview of agent training and simulation, see Train Reinforcement Learning Agents.
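The difference between training and simulation can be illustrated with a minimal, toolbox-independent sketch. The Python code below is not the Reinforcement Learning Toolbox API: the toy environment `ChainEnv`, the helpers `choose_action`, `train`, and `simulate`, and all hyperparameter values are assumptions made for illustration. It uses tabular Q-learning, so the agent's "parameters" are simply the entries of a Q-table. The `train` loop modifies them after every step, whereas `simulate` only reads them.

```python
import random


class ChainEnv:
    """Hypothetical 5-state corridor: start in state 0, reward 1 on reaching state 4."""
    n_states, n_actions = 5, 2  # actions: 0 = left, 1 = right

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    # Epsilon-greedy exploration; ties between equal Q-values are broken at random.
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    best = max(q[state])
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])


def train(q, env, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.3,
          max_steps=50, seed=0):
    """Training: the Q-table (the agent's parameters) is updated after every step."""
    rng = random.Random(seed)
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            a = choose_action(q, s, epsilon, rng)
            s2, r, done = env.step(a)
            target = r + (0.0 if done else gamma * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])  # parameter update
            s = s2
            if done:
                break
    return q


def simulate(q, env, max_steps=50):
    """Simulation: act greedily on the current Q-table without modifying it."""
    s, total = env.reset(), 0.0
    for _ in range(max_steps):
        a = max(range(len(q[s])), key=lambda i: q[s][i])
        s, r, done = env.step(a)
        total += r
        if done:
            break
    return total


if __name__ == "__main__":
    env = ChainEnv()
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    print("return before training:", simulate(q, env))
    train(q, env)
    print("return after training:", simulate(q, env))
```

In the toolbox itself, the corresponding roles are played by the `train` and `sim` functions listed below, configured through `rlTrainingOptions` and `rlSimulationOptions`.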

Apps

Reinforcement Learning Designer

Design, train, and simulate reinforcement learning agents (Since R2021a)

Functions

Train Agents

`train`	Train reinforcement learning agents within a specified environment
`rlTrainingOptions`	Options for training reinforcement learning agents
`rlMultiAgentTrainingOptions`	Options for training multiple reinforcement learning agents (Since R2022a)
`trainWithEvolutionStrategy`	Train DDPG, TD3 or SAC agent using an evolutionary strategy within a specified environment (Since R2023b)
`rlEvolutionStrategyTrainingOptions`	Options for training off-policy reinforcement learning agents using an evolutionary strategy (Since R2023b)
`inspectTrainingResult`	Plot training information from a previous training session (Since R2021a)

Train Agents Offline

`trainFromData`	Train off-policy reinforcement learning agent using existing data (Since R2023a)
`rlTrainingFromDataOptions`	Options to train reinforcement learning agents using existing data (Since R2023a)
`inspectTrainingResult`	Plot training information from a previous training session (Since R2021a)

Evaluate Agents During Training

`rlEvaluator`	Options for evaluating reinforcement learning agents during training (Since R2023b)
`rlCustomEvaluator`	Custom object for evaluating reinforcement learning agents during training (Since R2023b)

Log Data

`rlDataLogger`	Create either a file logger object or a monitor logger object to log training data (Since R2022b)
`rlDataViewer`	Open Reinforcement Learning Data Viewer tool (Since R2023a)
`FileLogger`	Log reinforcement learning training data to MAT-files (Since R2022b)
`MonitorLogger`	Log reinforcement learning training data to monitor window (Since R2022b)
`trainingProgressMonitor`	Monitor and plot training progress for deep learning custom training loops (Since R2022b)
`setup`	Set up reinforcement learning environment or initialize data logger object (Since R2022a)
`store`	Store data in the internal memory of a (file or monitor) logger object (Since R2022b)
`write`	Transfer stored data from the internal logger memory to the logging target (Since R2022b)
`cleanup`	Clean up reinforcement learning environment or data logger object (Since R2022a)

Simulate Agents

`sim`	Simulate trained reinforcement learning agents within a specified environment
`rlSimulationOptions`	Options for simulating a reinforcement learning agent within an environment

Experience Buffers

`rlReplayMemory`	Replay memory experience buffer (Since R2022a)
`rlPrioritizedReplayMemory`	Replay memory experience buffer with prioritized sampling (Since R2022b)
`rlHindsightReplayMemory`	Hindsight replay memory experience buffer (Since R2023a)
`rlHindsightPrioritizedReplayMemory`	Hindsight replay memory experience buffer with prioritized sampling (Since R2023a)
`append`	Append experiences to replay memory buffer (Since R2022a)
`sample`	Sample experiences from replay memory buffer (Since R2022a)
`resize`	Resize replay memory experience buffer (Since R2022b)
`allExperiences`	Return all experiences in replay memory buffer (Since R2022b)
`validateExperience`	Validate experiences for replay memory (Since R2023a)
`generateHindsightExperiences`	Generate hindsight experiences from hindsight experience replay buffer (Since R2023a)
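
As a conceptual aid, the basic behavior of a replay memory buffer (append, uniform sampling, resizing) can be sketched in a few lines of Python. This is an illustrative sketch only and not the toolbox implementation: the class `ReplayBuffer`, its constructor arguments, and the experience tuple layout are assumptions. The method names merely echo the functions listed above.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity FIFO experience buffer with uniform sampling."""

    def __init__(self, capacity, seed=None):
        self._data = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._data)

    def append(self, experience):
        # Once the buffer is full, deque(maxlen=...) discards the oldest experience.
        self._data.append(experience)

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the number stored.
        k = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), k)

    def resize(self, capacity):
        # Rebuild the deque; when shrinking, only the newest experiences are kept.
        self._data = deque(self._data, maxlen=capacity)


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=3, seed=0)
    for i in range(5):
        # Assumed layout: (observation, action, reward, next observation, done)
        buf.append((i, 0, 0.0, i + 1, False))
    print(len(buf), buf.sample(2))
```

Prioritized and hindsight variants change how experiences are weighted or relabeled, which this sketch does not attempt to model.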

Custom Training

`rlOptimizer`	Create an optimizer object for actors and critics (Since R2022a)
`runEpisode`	Simulate reinforcement learning environment against policy or agent (Since R2022a)
`setup`	Set up reinforcement learning environment or initialize data logger object (Since R2022a)
`cleanup`	Clean up reinforcement learning environment or data logger object (Since R2022a)
`Future`	Object that supports deferred outputs for reinforcement learning environment simulations running on workers (Since R2022a)
`fetchNext`	Retrieve next available unread outputs from reinforcement learning environment simulations running on workers (Since R2022a)
`fetchOutputs`	Retrieve results from all reinforcement learning environment simulations running on workers (Since R2022a)
`cancel`	Cancel unfinished reinforcement learning environment simulations on workers (Since R2022a)
`wait`	Wait for reinforcement learning environment simulations running on workers to finish (Since R2022a)

Blocks

RL Agent	Reinforcement learning agent
Policy	Reinforcement learning policy (Since R2022b)

Topics

Training and Simulation Basics

Train Reinforcement Learning Agents
Find the optimal policy by training your agent within a specified environment.
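Train Reinforcement Learning Agent in Basic Grid World
Train Q-learning and SARSA agents to solve a grid world in MATLAB®.
Train Reinforcement Learning Agent in MDP Environment
Train a reinforcement learning agent in a generic Markov decision process environment.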
Create Simulink Environment and Train Agent
Train a controller using reinforcement learning with a plant modeled in Simulink® as the training environment.
Train Reinforcement Learning Agent for Simple Contextual Bandit Problem
Train Q and DQN agents to solve a contextual bandit problem.

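Using the Reinforcement Learning Designer App

Design and Train Agent Using Reinforcement Learning Designer
Design and train a DQN agent for a cart-pole system using the Reinforcement Learning Designer app.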
Specify Training Options in Reinforcement Learning Designer
Interactively specify options for training reinforcement learning agents using the Reinforcement Learning Designer app.
Specify Simulation Options in Reinforcement Learning Designer
Interactively specify options for simulating reinforcement learning agents using the Reinforcement Learning Designer app.

Advanced Training and Simulation

Create DQN Agent Using Deep Network Designer and Train Using Image Observations
Create a reinforcement learning agent using the Deep Network Designer app from the Deep Learning Toolbox™.
Log Training Data to Disk
Log a variety of data to disk while training an agent.
Train Agent or Tune Environment Parameters Using Parameter Sweeping
Tune a DDPG agent using hyperparameter sweeping.
Train Reinforcement Learning Agent Offline to Control Quanser QUBE Pendulum
Train a TD3 agent offline to control a Quanser QUBE pendulum.
Train Biped Robot to Walk Using Evolution Strategy-Reinforcement Learning Agents
Train a TD3 agent using an evolutionary strategy.

Using Multiple Processes and GPUs

Train Agents Using Parallel Computing and GPUs
Accelerate agent training by running simulations in parallel on multiple cores, GPUs, clusters or cloud resources.
Train AC Agent to Balance Cart-Pole System Using Parallel Computing
Train an AC agent for a discrete action space environment using asynchronous parallel computing.
Train DQN Agent for Lane Keeping Assist Using Parallel Computing
Train a DQN agent for an automated driving application using parallel computing.

Multi-Agent Training

Train Multiple Agents to Perform Collaborative Task
Train two continuous action space PPO agents to collaboratively move an object.
Train Multiple Agents for Area Coverage
Train three discrete action space PPO agents to explore a grid-world environment in a collaborative-competitive manner.
Train Multiple Agents for Path Following Control
Train a DQN and a DDPG agent to collaboratively perform adaptive cruise control and lane keeping assist to follow a path.

Developing Custom Agents and Training Algorithms

Train Reinforcement Learning Policy Using Custom Training Loop
Train a reinforcement learning policy using your own custom training loop.
Create and Train Custom PG Agent
Create a custom PG agent and train it using the built-in train function.
Create and Train Custom LQR Agent
Create a custom agent that solves an LQR problem and train it using the built-in train function.
Custom Training Loop with Simulink Action Noise
Use a custom training loop to train a continuous action space reinforcement learning policy in Simulink when action noise is generated within the model.

Training Model-Based Policy Optimization Agents

Train MBPO Agent to Balance Cart-Pole System
A model-based reinforcement learning agent learns a model of its environment that it can use to generate additional experiences for training.
Model-Based Reinforcement Learning Using Custom Training Loop
Create a model-based reinforcement learning agent using a custom training loop.