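Applications

Examples of how to apply reinforcement learning

Reinforcement learning can be applied to a wide range of problems in areas such as control, robotics, scheduling, optimization, and finance. The examples below illustrate some of these applications.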

Tutorials

Train Agents for Control Tasks

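Control Water Level in a Tank Using a DDPG Agent
Train a controller using reinforcement learning, with a plant modeled in Simulink® as the training environment.
Tune PI Controller Using Reinforcement Learning
Tune the gains of a PI controller using a TD3 agent.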
Train SAC Agent for Ball Balance Control
Train a SAC agent to balance a ball on a flat surface using a robot arm.
Train Reinforcement Learning Agents to Control Quanser QUBE Pendulum
Train SAC and PPO agents to balance the Quanser QUBE rotational inverted pendulum.
Train Reinforcement Learning Agent Offline to Control Quanser QUBE Pendulum
Train a TD3 agent offline to control a Quanser QUBE pendulum.
Train TD3 Agent for PMSM Control
Train a TD3 agent to control the currents in a permanent magnet synchronous motor.
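Field-Oriented Control of PMSM Using Reinforcement Learning (Motor Control Blockset)
This example shows how to use the reinforcement learning control design method to implement field-oriented control (FOC) of a permanent magnet synchronous motor (PMSM).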
Train DQN Agent with LSTM Network to Control House Heating System
Train a DQN agent with a recurrent network to control the temperature of a house.
Train Reinforcement Learning Agent with Constraint Enforcement (Simulink Control Design)
Train a reinforcement learning agent whose actions are constrained using the Constraint Enforcement block.
Create and Train Custom LQR Agent
Create a custom agent that solves an LQR problem and train it using the built-in train function.

Train Agents for Robot Control

Train DDPG Agent to Control Sliding Robot
Train a DDPG agent to control a robot sliding on a frictionless two-dimensional plane.
Train PPO Agent for a Lander Vehicle
Train a discrete PPO agent to land a flying vehicle.
Train Discrete Soft Actor Critic Agent for Lander Vehicle
Train a discrete SAC agent to land a flying vehicle.
Train Biped Robot to Walk Using Reinforcement Learning Agents
Compare DDPG and TD3 agents for controlling a biped robot modeled in Simscape™ Multibody™.
Train Biped Robot to Walk Using Evolution Strategy-Reinforcement Learning Agents
Train a TD3 agent using an evolution strategy.
Quadruped Robot Locomotion Using DDPG Agent
Train a DDPG agent to control a quadruped robot modeled in Simscape Multibody.

Generate Rewards from Control Specifications

Generate Reward Function from a Model Predictive Controller for a Servomotor
Generate a reward function from an MPC controller applied to a servomotor and use it to train a TD3 agent.
Generate Reward Function from a Model Verification Block for a Water Tank System
Generate a reward function from a model verification block applied to a water tank system and use it to train a TD3 agent.

Imitation Learning

Imitate MPC Controller for Lane Keeping Assist
Train a deep neural network to imitate the behavior of a model predictive controller within a lane keeping assist system.
Imitate Nonlinear MPC Controller for Flying Robot
Train a deep neural network to imitate the behavior of a nonlinear model predictive controller for a flying robot.
Train DDPG Agent with Pretrained Actor Network
Train a DDPG agent using an actor network that has been previously trained using supervised learning.

Train Agents for Automotive Applications

Train DQN Agent for Lane Keeping Assist
Train a DQN agent for a lane keeping assist application.
Train PPO Agent with Curriculum Learning for a Lane Keeping Application
Train a PPO agent for a lane keeping assist task by gradually increasing task complexity.
Train DDPG Agent for Adaptive Cruise Control
Train a DDPG agent for an adaptive cruise control application.
Train DDPG Agent for Path-Following Control
Train a DDPG agent for lane following control.
Train Multiple Agents for Path Following Control
Train a DQN and a DDPG agent to collaboratively perform adaptive cruise control and lane keeping assist to follow a path.
Train Hybrid SAC Agent for Path-Following Control
Train a hybrid SAC agent for lane following control.
Train PPO Agent for Automatic Parking Valet
Train a discrete action space PPO agent to park a car in an open parking space.
Automatic Parking Valet with Unreal Engine Simulation
Use a TD3 agent with an MPC controller to perform a parking maneuver.

Other Applications

Train Reinforcement Learning Agent for Simple Contextual Bandit Problem
Train Q and DQN agents to solve a contextual bandit problem.
Train Agent to Play Turn-Based Game
Train a DQN agent to play a turn-based game.
Deep Reinforcement Learning for Optimal Trade Execution
This example shows how to use the Reinforcement Learning Toolbox™ and Deep Learning Toolbox™ to design agents for optimal trade execution.
Multiperiod Goal-Based Wealth Management Using Reinforcement Learning
This example shows a reinforcement learning (RL) approach to maximize the probability of obtaining an investor's wealth goal at the end of the investment horizon. This problem is known in the literature as goal-based wealth management (GBWM). In GBWM, risk is not necessarily measured using the standard deviation, the value-at-risk, or any other common risk metric. Instead, risk is understood as the likelihood of not attaining an investor's goal. This alternative concept of risk implies that, sometimes, in order to increase the probability of attaining an investor's goal, the optimal portfolio's traditional risk (that is, standard deviation) must increase if the portfolio is underfunded. In other words, for the investor's view of risk to decrease, the traditional view of risk must increase if the portfolio's wealth is too low.
Train DQN Agent for Beam Selection (Communications Toolbox)
Train a deep Q-network (DQN) reinforcement learning agent for beam selection in a 5G new radio communications system. (Since R2022b)
Water Distribution System Scheduling Using Reinforcement Learning
Train a DQN agent to optimally activate pumps in a water distribution system.

Train Model-Based Policy Optimization Agents

Train MBPO Agent to Balance Continuous Cart-Pole System
A model-based reinforcement learning agent learns a model of its environment that it can use to generate additional experiences for training.
Model-Based Reinforcement Learning Using Custom Training Loop
Create a model-based reinforcement learning agent using a custom training loop.
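
The idea behind the model-based entries above, learning a model of the environment and using it to generate additional experiences, can be illustrated with a minimal sketch. This is plain Python and does not use the Reinforcement Learning Toolbox API or reproduce either listed example. The 1-D linear system, the helper names (`true_step`, `collect_real`, `fit_linear_model`, `generate_model_rollouts`), and the random action policy are all hypothetical choices made for illustration.

```python
import random

def true_step(x, u):
    """Hypothetical real environment: 1-D linear system x' = 0.9*x + 0.5*u."""
    return 0.9 * x + 0.5 * u

def collect_real(n, seed=0):
    """Collect n real transitions (x, u, x_next) using random actions."""
    rng = random.Random(seed)
    data, x = [], 1.0
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        x_next = true_step(x, u)
        data.append((x, u, x_next))
        x = x_next
    return data

def fit_linear_model(data):
    """Least-squares fit of x' ~ a*x + b*u using the 2x2 normal equations."""
    sxx = sum(x * x for x, u, y in data)
    sxu = sum(x * u for x, u, y in data)
    suu = sum(u * u for x, u, y in data)
    sxy = sum(x * y for x, u, y in data)
    suy = sum(u * y for x, u, y in data)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b

def generate_model_rollouts(model, start_states, horizon, seed=1):
    """Roll the learned model forward from real states to create synthetic transitions."""
    a, b = model
    rng = random.Random(seed)
    synthetic = []
    for x in start_states:
        for _ in range(horizon):
            u = rng.uniform(-1.0, 1.0)
            x_next = a * x + b * u  # predicted by the learned model, not the real system
            synthetic.append((x, u, x_next))
            x = x_next
    return synthetic

real = collect_real(50)
model = fit_linear_model(real)
start_states = [x for x, _, _ in real[:10]]
synthetic = generate_model_rollouts(model, start_states, horizon=3)

# An agent would then be trained on both real and model-generated experiences.
replay_buffer = real + synthetic
```

Short rollouts that branch from real states are used here so that model errors do not accumulate over long horizons. In practice the learned model is typically a neural network and the actions come from the current policy rather than a uniform random draw.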