Why is the Reinforcement Learning Agent block's action output not reset?

Melih Doganay Sazak on 30 Sep 2022
Commented: Venu on 22 Nov 2023
I'm working on an RL Agent problem: learning to follow a reference trajectory in a two-dimensional plane. The algorithm I use is DDPG. My agent is a missile, and that missile must learn the reference trajectory. For this, I set my states as follows:
Let the coordinates of the agent at time t be denoted by x_m, y_m, and the coordinates of the reference trajectory by x_r, y_r.
My states are as follows:
x_m, y_m, (x_m - x_r), (y_m - y_r), and the integrals of these differences (inspired by the Water-Tank problem).
My action is as follows:
a = lateral acceleration command (between -100 and 100)
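For concreteness, a minimal sketch of how these observation and action specifications could be defined in MATLAB (the variable names, the 6-element state ordering, and the (-2, 2) action limits mentioned further below are my assumptions, not code from the original model):

```matlab
% Observation: [x_m; y_m; x_m - x_r; y_m - y_r; integral(x_m - x_r); integral(y_m - y_r)]
obsInfo = rlNumericSpec([6 1]);
obsInfo.Name = "observations";

% Action: normalized lateral acceleration, multiplied by 100 downstream in the model
actInfo = rlNumericSpec([1 1], LowerLimit=-2, UpperLimit=2);
actInfo.Name = "lateral acceleration";
```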
Here is my Simulink model:
Here is my reward function:
My problem is:
My network's acceleration output is in the range (-1, 1). By multiplying the output of my network by 100, I create the appropriate input for my environment. But I do the multiplication after adding the OU noise, and the variance of my noise is 0.3.
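A quick sanity check on the noise scale (my own arithmetic, not from the original post): because the noise is added before the x100 gain, the noise is amplified along with the action.

```matlab
% Effect of multiplying by 100 AFTER adding OU noise
noiseVariance = 0.3;                  % as configured in the agent options
noiseStd      = sqrt(noiseVariance);  % ~0.55 in the network's (-1, 1) units
gain          = 100;

% One standard deviation of exploration noise in environment units:
effectiveStd = gain * noiseStd        % ~55, i.e. over half of the +/-100 range
```

With the gain applied after the noise, a single standard deviation of exploration noise spans more than half of the admissible acceleration range.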
Everything was fine so far, but something caught my attention during simulation. While defining the action specification, I deliberately set the limit values to the range (-2, 2). Then I saw that in each new episode, the output of the agent block is the last value from the previous episode.
In the first episode, the action reaches -1.2 on the last step.
In the second episode, the action starts at -1.2. Why? The environment is reset, so why is the action not reset as well?
As a result, the environment continues to be explored around that value, with noise added on top of the last action. But shouldn't the agent block's action output be reset when the environment is reset at the start of every episode?
When I tighten the action limits to the range (-1, 1), the action taken by my agent saturates at one of the limit values and it cannot learn anything.
For 7 days I have tried everything to make the agent learn, but I am slowly starting to give up. I need some advice to understand where I went wrong.
Thanks in advance.
1 Comment
Venu on 22 Nov 2023
Could you share the reset function commands you used while defining the Simulink environment? That would help move the debugging process forward.
Please see this documentation for reference:
https://www.mathworks.com/help/reinforcement-learning/ref/rl.env.simulinkenvwithagent.html
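For reference, a reset function for a Simulink environment typically follows the pattern below (a minimal sketch based on the SimulinkEnvWithAgent/ResetFcn documentation pattern; the model name, block path, and variable names are placeholders, not from the original question):

```matlab
% Create the environment (model and agent block names are placeholders)
env = rlSimulinkEnv("missileModel", "missileModel/RL Agent", obsInfo, actInfo);

% The reset function runs before each episode and modifies the
% Simulink.SimulationInput object used for that simulation
env.ResetFcn = @localResetFcn;

function in = localResetFcn(in)
    % Reset workspace variables used by the model (placeholder names)
    in = setVariable(in, "x0", 0);
    in = setVariable(in, "y0", 0);
end
```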


Answers (0)

Release: R2022a
