Dealing with delayed observations/reward in RL

Nicolas CRETIN on 26 Aug 2024
Commented: Nicolas CRETIN on 27 Aug 2024
Hi everyone,
I'm currently facing an issue: my agent can't learn to control the water tank system of the example below if I add a unit delay to the observation signal.
So, I just added a delay as the following picture shows:
Then it seems that the agent can no longer learn what action to take.
But I guess this is normal behavior, since nothing in the network architecture allows it to learn time dependencies in the signals. This is why I tried to add long short-term memory (LSTM) layers, but I didn't succeed.
So, in general terms, is adding LSTM layers a good solution to this kind of problem? How can we give the agent a chance to learn time dependencies in the signals?
I'm using a DDPG agent. To add the LSTM layers, I set the option UseRNN to true and kept the default architecture for the actor and critic networks:
initOpts = rlAgentInitializationOptions(UseRNN=true)
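In context, the agent is then created from the default recurrent networks, roughly like this (just a sketch, assuming env is the water tank environment from the example):
obsInfo = getObservationInfo(env);                  % observation spec from the water tank env
actInfo = getActionInfo(env);                       % action (flow) spec
agent   = rlDDPGAgent(obsInfo, actInfo, initOpts);  % default actor/critic, now with an LSTM layer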
I'm using R2023b, and I suspect that the MATLAB example doesn't work in R2024a.
This would be particularly useful, for example, for penalising the agent for big actions (flow), by adding a penalty proportional to the action taken, or for penalising it for big action variations, by adding a penalty proportional to the change in action; a rough sketch follows below.
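For instance, such a penalty could look roughly like this (only a sketch; u, uPrev, baseReward and the weights are illustrative names and values, not taken from the example):
lambdaU  = 0.05;                                                   % weight on action magnitude
lambdaDu = 0.5;                                                    % weight on action variation
reward   = baseReward - lambdaU*abs(u) - lambdaDu*abs(u - uPrev);  % penalise size and change of the flow command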
I added the result of my training below:
Strangely enough, we can see that the flow is always oscillating a little.
For this test, the reward has been slightly modified as follows:
reward = rewardFromTheMatlabExample + 2 / 20 * abs(error) + 2; % add a small continuous component to improve convergence
trainOpts.StopTrainingCriteria="none"; % remove the stopping criteria
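For completeness, the surrounding training call looks roughly like this (the episode and step counts are illustrative, not the exact values I used):
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=2000, ...
    MaxStepsPerEpisode=200, ...
    StopTrainingCriteria="none");                   % keep training instead of stopping on a score
trainingStats = train(agent, env, trainOpts);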
Any help would be greatly appreciated!
Regards
1 Comment
Nicolas CRETIN on 27 Aug 2024
Hi,
Sorry, I spoke too fast: after a very long time it worked (22 hours on a powerful computer). Have a look below:

Answers (0)
