Reinforcement Learning on Hardware
From the series: Reinforcement Learning
This Tech Talk covers different approaches for using reinforcement learning (RL) to develop and deploy control policies on real hardware. Instead of focusing on algorithm details, it compares strategies for training and running policies, emphasizing tradeoffs in hardware safety, training time, and system performance. The demo uses a Quanser Qube Servo 2 rotary pendulum, controlled by a policy running on a Raspberry Pi®, with training performed in MATLAB® and Simulink® on a PC. Key concepts include offline RL, training on hardware versus simulation, and the importance of validating policies before deployment. The talk also highlights challenges such as communication latency, computational limits of embedded processors, and the sim2real gap when transferring policies from simulation to physical systems.
Published: 10 Dec 2025