I must say I am impressed by the Reinforcement Learning Toolbox that came out with MATLAB R2019a. It greatly simplifies the development of reinforcement learning algorithms for control purposes. However, I have run into difficulties getting the algorithms to work on complex problems.
I have modelled the dynamics of my system in Simulink. At the moment, S-functions do not work for computing the reward and terminal conditions because they create algebraic loops, which is annoying. Nevertheless, I have been able to work around that by using standard Simulink blocks. My biggest problem, though, is that I cannot initialise (or even access) the experience buffer property of the DDPG agent.
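For context, my environment is wired up roughly along these lines (the model and block names are placeholders for my actual setup):

% Rough sketch of my environment setup; model/block names simplified.
% Seven continuous observations, two continuous actions (see below).
obsInfo = rlNumericSpec([7 1]);
actInfo = rlNumericSpec([2 1], 'LowerLimit', -1, 'UpperLimit', 1);

mdl = 'vehicleManoeuvre';          % placeholder name for my Simulink model
agentBlk = [mdl '/RL Agent'];      % path to the RL Agent block
env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);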
The system I am modelling is a vehicle attempting to perform a particular, complex manoeuvre in three degrees of freedom. As the manoeuvre is well understood, I have benchmarks from other control algorithms such as PID and NMPC. At the moment, the DDPG agent is struggling to learn (i.e. it is not converging), despite my experimenting with the network sizes, noise options and reward function. I suspect this is because of the size of the search space (seven continuous states and two continuous actions). I am also using a random reset function to explore different starting points. My intention is to use data collected from multiple PID and NMPC simulations to initialise the experience buffer. Thanks to the exploration noise, the agent would still explore the state and action spaces, but it would encounter the higher rewards earlier in training and thus, hopefully, learn faster.
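To sketch what I have in mind: assuming the buffer accepted externally generated samples, I would convert my logged PID/NMPC runs into the usual five-element experience tuples (observation, action, reward, next observation, done flag). Here pidLog is a placeholder struct array holding one logged simulation:

% Hypothetical pre-loading sketch: turn a logged PID/NMPC trajectory into
% DDPG-style experience tuples {obs, act, reward, nextObs, isDone}.
experiences = cell(numel(pidLog) - 1, 5);
for k = 1:numel(pidLog) - 1
    experiences{k,1} = pidLog(k).state;          % observation (7x1)
    experiences{k,2} = pidLog(k).action;         % action (2x1)
    experiences{k,3} = pidLog(k).reward;         % scalar reward
    experiences{k,4} = pidLog(k+1).state;        % next observation
    experiences{k,5} = (k == numel(pidLog) - 1); % terminal flag at episode end
end
% If the buffer were writable, something like the line below is what I am
% after (this setter does not exist today, as far as I can tell):
% agent.ExperienceBuffer = experiences;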
Any idea whether the experience buffer can be modified or initialised? At the moment, I fear it is a protected property. Could this be considered for the next release, along with apprenticeship learning?
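For what it's worth, this is the kind of access that currently fails for me (the property name is my guess from inspecting the agent object; I may well be missing a supported route):

% Fails in R2019a as far as I can tell; shown only for illustration.
buf = agent.ExperienceBuffer;   % errors: no public access to this property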
Many thanks in advance for the help! Also, if you have additional suggestions, they would be greatly appreciated!