Why is the Reinforcement Learning Agent block's action output not reset?

Melih Doganay Sazak on 30 Sep 2022
Commented: Venu on 22 Nov 2023
I'm working on an RL Agent problem: learning to follow a reference trajectory in a two-dimensional plane. The algorithm I use is DDPG. My agent is a missile, and that missile must learn the reference trajectory. For this, I set my states as follows:
Let the coordinates of the agent at time t be denoted by x_m, y_m, and the coordinates of the reference trajectory by x_r, y_r.
My states are as follows:
x_m, y_m, (x_m - x_r), (y_m - y_r), and the integrals of these differences (inspired by the Water-Tank problem).
My action is as follows:
a = lateral acceleration command (between -100 and 100)
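For concreteness, a minimal sketch of how these observation and action specifications could be defined in MATLAB (the variable names, the 6-element state ordering, and the (-2, 2) action limits mentioned further below are my assumptions, not code from the original model):

```matlab
% Observation: [x_m; y_m; x_m - x_r; y_m - y_r; integral(x_m - x_r); integral(y_m - y_r)]
obsInfo = rlNumericSpec([6 1]);
obsInfo.Name = "observations";

% Action: normalized lateral acceleration, multiplied by 100 downstream in the model
actInfo = rlNumericSpec([1 1], LowerLimit=-2, UpperLimit=2);
actInfo.Name = "lateral acceleration";
```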
Here is my Simulink model:
Here is my reward function:
My problem is:
My network's acceleration output is in the range (-1, 1). By multiplying the output of my network by 100, I create the appropriate input for my environment. But I do the multiplication after adding the OU noise, and the variance of my noise is 0.3.
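A quick sanity check on the noise scale (my own arithmetic, not from the original post): because the noise is added before the x100 gain, the noise is amplified along with the action.

```matlab
% Effect of multiplying by 100 AFTER adding OU noise
noiseVariance = 0.3;                  % as configured in the agent options
noiseStd      = sqrt(noiseVariance);  % ~0.55 in the network's (-1, 1) units
gain          = 100;

% One standard deviation of exploration noise in environment units:
effectiveStd = gain * noiseStd        % ~55, i.e. over half of the +/-100 range
```

With the gain applied after the noise, a single standard deviation of exploration noise spans more than half of the admissible acceleration range.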
Everything was fine so far, but something caught my attention during simulation. While defining the action specification, I deliberately set the limit values to the range (-2, 2). Then I saw that in each new episode, the output of the agent block is the last value from the previous episode.
In the first episode, the action reaches -1.2 on the last step.
In the second episode, the action starts at -1.2. Why? The environment is reset, so why is the action not reset as well?
As a result, the environment continues to be explored around that value, with noise added on top of the last action. But shouldn't the agent block's action output be reset when the environment is reset at the start of every episode?
When I tighten the action limits to the range (-1, 1), the action taken by my agent saturates at one of the limit values and it cannot learn anything.
For 7 days I have tried everything to make the agent learn, but I am slowly starting to give up. I need some advice to understand where I went wrong.
Thanks in advance.
1 Comment
Venu on 22 Nov 2023
Could you share the reset function commands you used while defining the Simulink environment? That would help move the debugging process forward.
Please see this documentation for reference:
https://www.mathworks.com/help/reinforcement-learning/ref/rl.env.simulinkenvwithagent.html
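For reference, a reset function for a Simulink environment typically follows the pattern below (a minimal sketch based on the SimulinkEnvWithAgent/ResetFcn documentation pattern; the model name, block path, and variable names are placeholders, not from the original question):

```matlab
% Create the environment (model and agent block names are placeholders)
env = rlSimulinkEnv("missileModel", "missileModel/RL Agent", obsInfo, actInfo);

% The reset function runs before each episode and modifies the
% Simulink.SimulationInput object used for that simulation
env.ResetFcn = @localResetFcn;

function in = localResetFcn(in)
    % Reset workspace variables used by the model (placeholder names)
    in = setVariable(in, "x0", 0);
    in = setVariable(in, "y0", 0);
end
```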


Answers (0)

Release: R2022a
