Reinforcement Learning on Simscape
I am having an issue with RL in Simscape. I added a Unit Delay block to break an algebraic loop, but the Unit Delay's initial condition effectively pins the value I want to change to a constant equal to that initial condition. Do you by any chance know what might be causing this?
I will add a screenshot of the training.
Accepted Answer
Emmanouil Tzorakoleftherakis
28 Jun 2024
One option is to introduce the delay on the observation, not the action. Please take a look at this page for more details.
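One way to sketch that suggestion in a script (the block path is the standard Simulink library path; the model name and wiring below are placeholders I am assuming, not taken from this thread):

```matlab
% Sketch: break the algebraic loop on the observation path rather than
% the action path. 'myModel' and the block wiring are placeholders.
mdl = 'myModel';
load_system(mdl);

% Add a Unit Delay that will feed the RL Agent block's observation port,
% so the agent sees the previous-step measurement o(k-1) while its
% action u(k) reaches the plant without a delay.
add_block('simulink/Discrete/Unit Delay', [mdl '/ObsDelay'], ...
          'SampleTime', '-1');   % -1 inherits the sample time
% Rewire plant output -> ObsDelay -> RL Agent observation input using
% add_line/delete_line, or directly in the Simulink editor.
```

This keeps the plant-to-action loop delay-free, so the agent's action is no longer held at the Unit Delay's initial condition.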
14 Comments
Karim Darwich
1 Jul 2024
@Emmanouil Tzorakoleftherakis I have another question that I am not able to answer: my actions stay constant within each episode.
Emmanouil Tzorakoleftherakis
1 Jul 2024
Edited: Emmanouil Tzorakoleftherakis, 1 Jul 2024
Hi. In general, if you have a question unrelated to the original one, it's a good idea to start a separate thread for visibility. Not sure which agent you are using, but make sure your exploration options make sense. Also, let the agent run for a few episodes first, as the behavior you are describing is common in the initial episodes.
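For a PPO agent, exploration is driven mainly by the entropy regularization term. A minimal sketch using Reinforcement Learning Toolbox option names; the specific values here are illustrative assumptions, not settings from this thread:

```matlab
% Sketch: exploration-related PPO options. Values are assumptions for
% illustration only; tune them for your own environment.
agentOpts = rlPPOAgentOptions( ...
    'SampleTime',        3600, ...  % how often the agent acts (s)
    'EntropyLossWeight', 0.02, ...  % larger -> more exploration
    'ExperienceHorizon', 512);      % steps collected before each update

% Learning rates for actor and critic (optimizer options, R2022a+):
agentOpts.ActorOptimizerOptions.LearnRate  = 1e-3;
agentOpts.CriticOptimizerOptions.LearnRate = 1e-3;
```

Raising `EntropyLossWeight` makes the stochastic policy spread its action distribution more, which is the usual first knob when a PPO agent's actions look frozen early in training.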
Karim Darwich
1 Jul 2024
@Emmanouil Tzorakoleftherakis My apologies. I am using a PPO agent with a 0.001 learning rate for both the actor and the critic. I did a training run over 50 episodes, but the action is still constant (it does change from one episode to another, though). I am very new to RL and am mostly working by trial and error. Thank you for your previous responses.
Emmanouil Tzorakoleftherakis
Np. Which release are you using? How long are your episodes? What is the agent sample time? What is your reward? Also, I would let training continue for a few hundred episodes and check again whether the issue persists.
Karim Darwich
1 Jul 2024
Edited: Karim Darwich, 1 Jul 2024
@Emmanouil Tzorakoleftherakis Thank you for your response.
I am using a PPO agent with the hyperparameters in the screenshot attached to this message, on R2023a. The reward is also attached. Thank you in advance; I hope this clarifies the situation. I am doing RL control on a model I built myself for the control of district heating networks.
PS: I will later create another question and link it to this one, so that it is more visible.
Emmanouil Tzorakoleftherakis
1 Jul 2024
Edited: Emmanouil Tzorakoleftherakis, 1 Jul 2024
Thanks. It seems your agent sample time is the same as the episode duration. Is that expected? How often do you expect your agent to take actions? Regardless, that explains what you are seeing: the agent will only take one action per episode, so 50 actions in total for 50 episodes. This is really not sufficient training time.
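The arithmetic behind that observation: the number of agent steps per episode is the episode duration divided by the agent sample time, so equal values give exactly one action per episode. Using the day-long episode that appears later in the thread (the 3600 s sample time is the hourly alternative discussed below):

```matlab
% Steps per episode = episode duration / agent sample time.
Tf = 86400;               % episode duration in seconds (one day)
Ts = 86400;               % sample time equal to the episode duration
stepsPerEpisode = Tf/Ts   % = 1: the agent acts once per episode

Ts = 3600;                % hourly actions instead
stepsPerEpisode = Tf/Ts   % = 24 actions per episode
```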
Karim Darwich
1 Jul 2024
@Emmanouil Tzorakoleftherakis Yes, I did not notice that I had the same sample time and experience horizon. Thank you very much.
Emmanouil Tzorakoleftherakis
1 Jul 2024
Edited: Emmanouil Tzorakoleftherakis, 1 Jul 2024
I was actually referring to the IsDone signal in the reward function. It is set to true at t = 86400, which is the same as the agent sample time.
Karim Darwich
1 Jul 2024
@Emmanouil Tzorakoleftherakis Oh sorry, I see! And what would be a good option in that case?
Emmanouil Tzorakoleftherakis
1 Jul 2024
Edited: Emmanouil Tzorakoleftherakis, 1 Jul 2024
Depends on your problem and how frequently your agent needs to take actions (that is determined by the agent sample time).
Karim Darwich
2 Jul 2024
@Emmanouil Tzorakoleftherakis So, for example, if I want my agent to change the mass flow every hour over one day, I should set the sample time to 3600 (the number of seconds in an hour) and the IsDone condition at t = 86400 (the number of seconds in a day)?
Emmanouil Tzorakoleftherakis
Correct. Alternatively, you could use the MaxStepsPerEpisode training option and leave the IsDone flag false all the time. The IsDone flag is meant for cases where you want to terminate an episode early (e.g. some constraint is being violated).
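Put together, a hedged sketch of the two options discussed above (function and option names are from Reinforcement Learning Toolbox; `env` and `agent` are placeholders, not objects defined in this thread):

```matlab
% Option 1: hourly actions, with the model's IsDone signal set to true
% at t = 86400 (one simulated day) to end each episode.
agentOpts = rlPPOAgentOptions('SampleTime', 3600);

% Option 2: keep IsDone false and cap the episode length instead.
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',        500, ...  % let training run well past 50
    'MaxStepsPerEpisode', 24);      % 86400 s / 3600 s = 24 steps/episode

% trainingStats = train(agent, env, trainOpts);  % env/agent: placeholders
```

With Option 2, IsDone stays available for genuine early termination (e.g. a constraint violation) rather than doubling as the episode clock.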
Karim Darwich
2 Jul 2024
@Emmanouil Tzorakoleftherakis Perfect. Thank you very much sir !
np
More Answers (0)