Deep reinforcement learning for multi-agents

Question

1 vote

I trained three agents with the multi-agent deep reinforcement learning toolbox. The rewards over training are shown in the picture. Why do the agents' rewards first increase toward the desired performance and then decrease and converge to an unfavorable outcome? I expected the rewards to keep increasing, and the agents to keep approaching the desired goal, as the episodes progressed. According to the picture, from about episode 700 onward the agents converge to an undesired situation and no longer change their states.

Thank you.


Answer 1

Emmanouil Tzorakoleftherakis on 22 Nov 2020

Edited: Emmanouil Tzorakoleftherakis on 22 Nov 2020

1 vote

Hello,

The policies you get from RL training depend on how much time the agents spend exploring. When agents converge to a non-ideal solution like this, it usually helps to change the agent options to increase exploration.
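To make "increase exploration" concrete, here is a minimal sketch in plain Python rather than toolbox code. It assumes an epsilon-greedy agent whose exploration rate shrinks by a fixed fraction on every update until it reaches a floor. The names `eps0`, `decay`, and `eps_min` and their default values are illustrative assumptions, not settings taken from the question. The sketch shows how a smaller decay rate keeps the agent exploring for many more updates before it settles on a policy.

```python
import math


def epsilon_after(steps, eps0=1.0, decay=0.005, eps_min=0.01):
    """Exploration rate after `steps` multiplicative decay updates.

    Assumed schedule: eps <- eps * (1 - decay), clipped below at eps_min.
    """
    return max(eps_min, eps0 * (1.0 - decay) ** steps)


def steps_until_floor(eps0=1.0, decay=0.005, eps_min=0.01):
    """Smallest number of updates after which epsilon sits at eps_min."""
    return math.ceil(math.log(eps_min / eps0) / math.log(1.0 - decay))


if __name__ == "__main__":
    # Compare a fast and a slow (hypothetical) decay rate.
    for decay in (0.005, 0.0005):
        n = steps_until_floor(decay=decay)
        print(f"decay={decay}: exploration floor reached after {n} updates")
```

If the exploration rate hits its floor early in training, the agents mostly exploit whatever policy they have at that point, which is one possible reason for settling into a poor solution. Lowering the decay rate or raising the floor are the usual knobs to try, under the assumption that the agents in question use this kind of schedule.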

Hope that helps.

1 comment

beni hadi on 25 Nov 2020

Thank you for your help.
