Procedure to link state path and action path in a DQL critic reinforcement learning agent?

Question

Margarita Cabrera on 27 Mar 2021

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/785066-procedure-to-link-state-path-and-action-path-in-a-dql-critic-reinforcement-learning-agent

Commented: Margarita Cabrera on 29 Mar 2021

Accepted Answer: Emmanouil Tzorakoleftherakis

Hi all

Some reinforcement learning examples on the MathWorks site use a DQN agent with a single-output critic.

See, for instance, this page:

https://es.mathworks.com/help/reinforcement-learning/ref/rldqnagent.html

Section: "Create a DQN Agent Using a Single-Output Critic Representation"

In this case the observation path and the action path are joined through an additionLayer.

Why is an additionLayer used instead of a concatenationLayer?

It seems more appropriate to keep the inputs from the two branches, StatePath and ActionPath, separate when they enter the common path, given that the common path has further layers after the junction to model the critic output.

When would you recommend using an addition layer rather than a concatenation layer to join the state and action paths?

Thanks in advance


Answer 1

Emmanouil Tzorakoleftherakis on 29 Mar 2021

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/785066-procedure-to-link-state-path-and-action-path-in-a-dql-critic-reinforcement-learning-agent#answer_661514

Hello,

Some comments on the points you raise above:

1. As you probably saw on the doc page, there are two ways to create the critic network for DQN: one uses a single output (the Q-value for the provided observation-action pair), and the other uses multiple outputs (the Q-values of all possible actions for the specified observation). The latter is more efficient, because one forward pass returns the Q-values of every action, and is typically preferred.
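As a rough illustration of the multiple-output option, here is a minimal sketch. The observation size, the two-element action set, the layer width of 24, and the layer names are all assumptions made up for this example, and rlQValueRepresentation is the syntax from around the time of this thread (newer releases recommend other objects for the same purpose).

```matlab
% Hypothetical specs: 4-dimensional observation, 2 discrete actions.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-10 10]);
numActions = numel(actInfo.Elements);

% Multi-output critic: the observation is the only input, and the network
% returns one Q-value per discrete action, so no merge layer is needed.
criticNet = [
    featureInputLayer(obsInfo.Dimension(1),'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(numActions,'Name','qValues')];

% Only the observation input is named; the action is not a network input.
critic = rlQValueRepresentation(criticNet,obsInfo,actInfo, ...
    'Observation',{'state'});
```

With this layout, the question of addition versus concatenation does not arise, since there is no separate action path.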

2. Concatenation and addition both merge the features of the two paths. Even with concatenation, the input paths stay independent only on the surface, because the fully connected layer that follows the concatenation layer sums the weighted features anyway. Your intuition is correct, though, in that concatenation is more efficient:

- Addition: y = (A1*u1 + b1) + (A2*u2 + b2)

- Concatenation: y = A*[u1;u2] + b = A1*u1 + A2*u2 + b, with A = [A1 A2]
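The two equations can be checked numerically with plain matrices. This is only a sketch: the sizes nObs, nAct, and nHid and the random values are assumptions chosen for illustration, and u1 and u2 stand for the outputs of the state and action paths.

```matlab
rng(0);                        % reproducible random values
nObs = 4; nAct = 1; nHid = 8;  % hypothetical sizes

u1 = randn(nObs,1);            % output of the state path
u2 = randn(nAct,1);            % output of the action path

% Addition: one fully connected layer per branch, then an element-wise sum.
A1 = randn(nHid,nObs);  b1 = randn(nHid,1);
A2 = randn(nHid,nAct);  b2 = randn(nHid,1);
yAdd = (A1*u1 + b1) + (A2*u2 + b2);

% Concatenation: stack the inputs, then apply a single fully connected layer.
A = [A1 A2];                   % the two weight matrices side by side
b = b1 + b2;                   % the two biases collapse into one
yCat = A*[u1; u2] + b;

% Largest difference between the two results (floating-point round-off only).
maxDiff = max(abs(yAdd - yCat));

% Learnable parameter counts of the merging layers.
paramsAdd = numel(A1) + numel(b1) + numel(A2) + numel(b2);
paramsCat = numel(A) + numel(b);
```

With these sizes, paramsAdd is 56 and paramsCat is 48: the weights are the same in number, and the addition version carries one extra bias vector of length nHid.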

A couple of differences between the two:

a) The additionLayer requires a fullyConnectedLayer on each input branch, whereas merging by concatenation needs only one fullyConnectedLayer downstream of the concatenation layer. So overall, concatenation needs fewer FC layers.

b) You could also place FC layers in front of the concatenation layer, as you must for addition. In that case, with concatenation the preceding fully connected layers do not need the same number of nodes, unlike with addition.

c) As the equations above show, concatenation uses fewer parameters (a single bias vector b instead of b1 and b2), so it is more efficient in that sense.
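To make the structural difference concrete, the two single-output critics could be laid out roughly as follows. This is a sketch under assumptions: the sizes, layer names, and the width of 24 are invented for the example, and the concatenation dimension of 1 assumes feature (vector) inputs.

```matlab
nObs = 4; nAct = 1; nHid = 24;   % hypothetical sizes

% Variant 1: additionLayer. Each branch needs its own fully connected
% layer, and both must output nHid features so the sum is defined.
statePath = [
    featureInputLayer(nObs,'Normalization','none','Name','state')
    fullyConnectedLayer(nHid,'Name','stateFC')];
actionPath = [
    featureInputLayer(nAct,'Normalization','none','Name','action')
    fullyConnectedLayer(nHid,'Name','actionFC')];
commonAdd = [
    additionLayer(2,'Name','add')
    reluLayer('Name','relu')
    fullyConnectedLayer(1,'Name','qValue')];

addNet = layerGraph(statePath);
addNet = addLayers(addNet,actionPath);
addNet = addLayers(addNet,commonAdd);
addNet = connectLayers(addNet,'stateFC','add/in1');
addNet = connectLayers(addNet,'actionFC','add/in2');

% Variant 2: concatenationLayer. The raw inputs are stacked and a single
% fully connected layer follows the merge.
commonCat = [
    concatenationLayer(1,2,'Name','concat')
    fullyConnectedLayer(nHid,'Name','commonFC')
    reluLayer('Name','relu')
    fullyConnectedLayer(1,'Name','qValue')];

catNet = layerGraph(featureInputLayer(nObs,'Normalization','none','Name','state'));
catNet = addLayers(catNet,featureInputLayer(nAct,'Normalization','none','Name','action'));
catNet = addLayers(catNet,commonCat);
catNet = connectLayers(catNet,'state','concat/in1');
catNet = connectLayers(catNet,'action','concat/in2');
```

Calling plot(addNet) and plot(catNet) shows the two layouts side by side. The addition variant has two FC layers before the merge, while the concatenation variant has one after it.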

I think one of the advantages of addition is that it is more transparent and easily understood, which is probably why it is used in the shipping examples.

3. If you have difficulty creating neural network architectures manually, you can use the default agent feature. This feature creates a default network architecture for you (you can always go back and change the architecture if you want).
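A minimal sketch of the default agent route, assuming the same made-up observation and action specifications as a toy problem. The value 64 for NumHiddenUnit is an arbitrary choice for illustration.

```matlab
% Hypothetical specs: 4-dimensional observation, 2 discrete actions.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-10 10]);

% Optional: set the width of the hidden layers in the default network.
initOpts = rlAgentInitializationOptions('NumHiddenUnit',64);

% Let the toolbox build the critic network from the specs alone.
agent = rlDQNAgent(obsInfo,actInfo,initOpts);

% Extract the generated network to inspect it or use it as a starting point.
critic = getCritic(agent);
criticNet = getModel(critic);
disp(criticNet.Layers)
```

The extracted network can then be edited and passed back when you build a custom critic, which avoids wiring the state and action paths by hand.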

Hope this helps


Margarita Cabrera on 29 Mar 2021

Thanks a lot Emmanouil
