Main Content

Automatic Parking Valet with Unreal Engine Simulation

This example shows the design of a hybrid controller for an automatic searching and parking task. You will learn how to combine adaptive model predictive control (MPC) with reinforcement learning (RL) to perform a parking maneuver. The path of the vehicle is visualized using the Unreal Engine® simulation environment.

The control objective is to park the vehicle in an empty spot after starting from an initial pose. The control algorithm executes a series of maneuvers while sensing and avoiding obstacles in tight spaces. It switches between an adaptive MPC controller and a reinforcement learning agent to complete the parking maneuver. The adaptive MPC controller moves the vehicle at a constant speed along a reference path while searching for an empty parking spot. When a spot is found, the reinforcement learning agent takes over and executes a pretrained parking maneuver. Prior knowledge of the environment (the parking lot) including the locations of the empty spots and parked vehicles is available to both the controllers.

Parking Lot

The parking lot environment used in this example is a subsection of the Large Parking Lot (Automated Driving Toolbox) scene. The parking lot is represented by the ParkingLotEnvironment object, which stores information on the ego vehicle, empty parking spots, and static obstacles (parked cars and boundaries). Each parking spot has a unique index number and an indicator light that is either green (free) or red (occupied). Parked vehicles are represented in black while other boundaries are outlined in green.

Specify a sample time Ts (seconds) for the controllers and a simulation time Tf (seconds).

Ts = 0.1;
Tf = 50;

Create a reference path for the ego vehicle using the createReferenceTrajectory helper function included with this example. The reference path starts from the south-east corner of the parking lot and ends in the west as displayed with the dashed pink line.

xRef = createReferenceTrajectory(Ts,Tf);

Create a ParkingLotEnvironment object with a free spot at index 32 and the specified reference path xRef.

freeSpotIndex = 32;
map = ParkingLotEnvironment(freeSpotIndex,"Route",xRef);

Specify an initial pose (X0,Y0,θ0) for the ego vehicle. The units of X0 and Y0 are meters and θ0 is in radians.

egoInitialPose = [40 -55 pi/2];

Compute the target pose for the vehicle using the createTargetPose function. The target pose corresponds to the location in freeSpotIdx.

egoTargetPose = createTargetPose(map,freeSpotIndex);

Sensor Modules

The parking algorithm uses geometric approximations of a camera and a lidar sensor to gather information from the parking lot environment.

Camera

In this example, the field of view of a camera mounted on the ego vehicle is represented by the area shaded in green in the following figure. The camera has a field of view φ bounded by ±π/3 radians and a maximum measurement depth dmax of 10 m. As the ego vehicle moves forward, the camera module senses the parking spots within the field of view and determines whether a spot is free or occupied. For simplicity, this action is implemented using geometrical relationships between the spot locations and the current vehicle pose. A parking spot is within the camera range if didmax and φminφiφmax, where di is the distance to the parking spot and φi is the angle to the parking spot.

Lidar

The lidar sensor in this example is modeled using radial line segments emerging from the geometric center of the vehicle. Distances to obstacles are measured along these line segments. The maximum measurable lidar distance along any line segment is 6 m. The reinforcement learning agent uses these readings to determine the proximity of the ego vehicle to other vehicles in the environment.

Auto Parking Valet Model

Load the auto parking valet parameters.

autoParkingValetParams3D

The parking valet model, including the controllers, ego vehicle, sensors, and parking lot, is implemented in a Simulink model. Open the model.

mdl = "rlAutoParkingValet3D";
open_system(mdl)

In this model:

  • The vehicle dynamics are modeled in the Ego Vehicle Model subsystem. The dynamics is represented by a single-track bicycle kinematics model with two input signals: vehicle speed v (m/s) and steering angle δ (radians).

  • The adaptive MPC and RL agent blocks are found in the MPC Tracking Controller and RL Controller subsystems, respectively.

  • Mode switching between the controllers is handled by the Vehicle Mode subsystem, which outputs Search and Park signals. Initially, the vehicle is in search mode and the adaptive MPC controller tracks the reference path. When a free spot is found, the Park signal activates the RL Agent to perform the parking maneuver.

  • The Visualization Subsystem handles animation of the environment. Double-click this subsystem to specify the visualization options.

To plot the environment in a figure, set the 2D Visualization parameter to On.

If you have Automated Driving Toolbox™ software installed, you can set the Unreal Engine Visualization parameter to On to display the vehicle animation in the Unreal Engine environment. Enabling the Unreal Engine simulation can degrade simulation performance. Therefore, set the parameter to Off when training the agent.

Adaptive Model Predictive Controller Design

Create the adaptive MPC controller object for reference trajectory tracking using the createMPCForParking script. For more information on adaptive MPC, see Adaptive MPC (Model Predictive Control Toolbox).

createMPCForParking3D;

Reinforcement Learning Controller Design

To train the reinforcement learning agent, you must create an environment interface and an agent object.

Create Environment

The environment for training is the region shaded in red in the following figure. Due to symmetry in the parking lot, training within this region is sufficient for the policy to adjust to other regions after coordinate transformations are applied to the observations. Constraining the training to this region also significantly reduces training duration when compared to training over the entire parking lot space.

For this environment:

  • The training region is a 13.625 m x 12.34 m space with the target spot at its horizontal center.

  • The observations are the position errors Xe and Ye of the ego vehicle with respect to the target pose, the sine and cosine of the true heading angle θ, and the lidar sensor readings.

  • The vehicle speed during parking is a constant 2 m/s.

  • The action signals are discrete steering angles that range between +/- π/4 radians in steps of 0.2618 radians or 15 degrees.

  • The vehicle is considered parked if the errors with respect to target pose are within specified tolerances of +/- 0.75 m (position) and +/-10 degrees (orientation).

  • The episode terminates if the ego vehicle goes out of the bounds of the training region, collides with an obstacle, or parks successfully.

  • The reward rt provided at time t, is:

rt=2e-(0.05Xe2+0.04Ye2)+0.5e-40θe2-0.05δ2+100ft-50gt

Here, Xe, Ye, and θe are the position and heading angle errors of the ego vehicle from the target pose, while δ is the steering angle. ft (0 or 1) indicates whether the vehicle has parked and gt (0 or 1) indicates if the vehicle has collided with an obstacle or left the training region at time t.

The coordinate transformations on vehicle pose (X,Y,θ) observations for different parking spot locations are as follows:

  • Parking spots 1-14: X=X,Y=Y+20.41,θ=θ

  • Parking spots 15-28: X=41-X,Y=-64.485-Y,θ=θ-π

  • Parking spots 29-37: notransformation

  • Parking spots38-46: X=41-X,Y=-84.48-Y,θ=θ-π

Create the observation and action specifications for the environment.

nObs = 16;
nAct = 1;
observationInfo = rlNumericSpec([nObs 1]);
observationInfo.Name = "observations";
actionInfo = rlNumericSpec([nAct 1],"LowerLimit",-1,"UpperLimit",1);
actionInfo.Name = "actions";

Create the Simulink environment interface, specifying the path to the RL Agent block.

blk = mdl + "/Controller/RL Controller/RL Agent";
env = rlSimulinkEnv(mdl,blk,observationInfo,actionInfo);

Specify a reset function for training. The autoParkingValetResetFcn function resets the initial pose of the ego vehicle to random values at the start of each episode.

env.ResetFcn = @autoParkingValetResetFcn3D;

For more information on creating Simulink environments, see rlSimulinkEnv.

Create Agent

The agent in this example is a twin-delayed deep deterministic policy gradient (TD3) agent. TD3 agents rely on actor and critic functions to learn the optimal policy. To learn more about TD3 agents, see Twin-Delayed Deep Deterministic Policy Gradient Agents.

Set the random seed generator for reproducibility.

rng(0)

To create the critic function, first create a deep neural network with 16 inputs and one output. The output of the critic network is the state-action value function for a taking a given action from a given observation.

cnet = [
    featureInputLayer(nObs,"Normalization","none","Name","State")
    fullyConnectedLayer(128, "Name", "fc1")
    concatenationLayer(1,2,"Name","concat")
    reluLayer("Name","relu1")
    fullyConnectedLayer(128, "Name","fc3")
    reluLayer("Name","relu2")
    fullyConnectedLayer(1,"Name","CriticOutput")];
actionPath = [
    featureInputLayer(nAct,"Normalization","none","Name","Action")
    fullyConnectedLayer(128,"Name","fc2")];
criticNet = layerGraph(cnet);
criticNet = addLayers(criticNet, actionPath);
criticNet = connectLayers(criticNet,"fc2","concat/in2");

Create the Q-value function for the critics. For more information see rlQValueFunction.

criticdlnet = dlnetwork(criticNet);
critic1 = rlQValueFunction(criticNet,observationInfo,actionInfo,...
    "ObservationInputNames","State","ActionInputNames","Action");
critic2 = rlQValueFunction(criticNet,observationInfo,actionInfo,...
    "ObservationInputNames","State","ActionInputNames","Action");

Create the actor neural network. The output of the actor network is the steering angle.

anet = [featureInputLayer(nObs,"Normalization","none","Name","State")
    fullyConnectedLayer(128, "Name","actorFC1")
    reluLayer("Name","relu1")
    fullyConnectedLayer(128,"Name","actorFC2")
    reluLayer("Name","relu2")
    fullyConnectedLayer(nAct,"Name","Action")
    tanhLayer("Name","tanh1")];
actorNet = layerGraph(anet);

Create the actor function for the TD3 agent. For more information see rlContinuousDeterministicActor.

actordlnet = dlnetwork(actorNet);
actor = rlContinuousDeterministicActor(actordlnet,observationInfo,actionInfo,...
    "ObservationInputNames","State");

Specify the agent options and create the TD3 agent. For more information on TD3 agent options, see rlTD3AgentOptions.

agentOpts = rlTD3AgentOptions("SampleTime",Ts, ...
    "DiscountFactor",0.99, ...
    "ExperienceBufferLength",1e6, ...
    "MiniBatchSize",128);

Set the noise options for exploration.

agentOpts.ExplorationModel.StandardDeviation = 0.1;
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-4;
agentOpts.ExplorationModel.StandardDeviationMin = 0.01;

Set the optimizer parameters. For this example, set the actor and critic learn rates to 1e-3 and 2e-3, respectively. Set a gradient threshold factor of 1 to limit the gradients during training.

% Actor optimizer options
agentOpts.ActorOptimizerOptions.LearnRate = 1e-3;
agentOpts.ActorOptimizerOptions.GradientThreshold = 1;
agentOpts.ActorOptimizerOptions.L2RegularizationFactor = 1e-3;
% Critic optimizer options
agentOpts.CriticOptimizerOptions(1).LearnRate = 2e-3;
agentOpts.CriticOptimizerOptions(2).LearnRate = 2e-3;
agentOpts.CriticOptimizerOptions(1).GradientThreshold = 1;
agentOpts.CriticOptimizerOptions(2).GradientThreshold = 1;

Create the agent.

agent = rlTD3Agent(actor,[critic1 critic2], agentOpts);

Train Agent

To train the agent first specify the training options.

The agent is trained for a maximum of 10000 episodes with each episode lasting a maximum of 200 time steps. The training terminates when the maximum number of episodes is reached or the average reward over 200 episodes reaches the value of 120 or more. Specify the options for training using the rlTrainingOptions function.

trainOpts = rlTrainingOptions(...
    "MaxEpisodes",10000,...
    "MaxStepsPerEpisode",200,...
    "ScoreAveragingWindowLength",200,...
    "Plots","training-progress",...
    "StopTrainingCriteria","AverageReward",...
    "StopTrainingValue",120);

Train the agent using the train function. Fully training this agent is a computationally intensive process that may take several hours to complete. To save time while running this example, load a pretrained agent by setting doTraining to false. To train the agent yourself, set doTraining to true.

doTraining = false;
if doTraining
    trainingResult = train(agent,env,trainOpts);
else
    load("rlAutoParkingValetAgent.mat","agent");
end

Simulate Parking Task

To Validate the trained agent, simulate the model and observe the parking maneuver.

sim(mdl);

{"String":"Figure Parking Lot contains an axes object. The axes object contains 223 objects of type image, rectangle, line, text, polygon, images.roi.polyline.","Tex":[],"LaTex":[]}

The vehicle tracks the reference path using the MPC controller before switching to the RL controller when the target spot is detected. The vehicle then completes the parking maneuver.

To view the trajectory, open the Ego Vehicle Pose scope.

open_system(mdl + "/Ego Vehicle Model/Ego Vehicle Pose")

Unreal Engine Simulation

Turn on the Unreal Engine visualization by opening the Visualization block and setting the Unreal Engine Visualization parameter to On. Initializing the Unreal Engine simulation environment can take a few seconds..

Park the vehicle in a different spot.

freeSpotIndex = 20;
sim(mdl)

{"String":"Figure Parking Lot contains an axes object. The axes object contains 223 objects of type image, rectangle, line, text, polygon, images.roi.polyline.","Tex":[],"LaTex":[]}