Action Value can't be constrained

3 views (last 30 days)
Zhengyang Chen
Zhengyang Chen on 3 Aug 2020
Commented: Asvin Kumar on 16 Aug 2020
I am a beginner in RL and am currently trying to use a policy gradient agent. Here is something weird I found when trying to output the action value within a certain range.
In the Create Continuous Stochastic Actor from Deep Neural Network example of this link:
The action value limits are set first in rlNumericSpec(), but the constraint seems to have no effect on the actual actor output. If I change the lower limit to 0, it still yields negative values.
My question is: to actually have an action output within range, do I need to achieve this via the neural network construction? Say I want a range of 0 to 5, how should I modify the network?
BTW, why should the output of the neural network have twice as many elements as the actual action output? What's happening inside rlStochasticActorRepresentation()?

Answers (1)

Asvin Kumar
Asvin Kumar on 6 Aug 2020
For your first question:
In short, it might be because of the noise added to the predicted action. If I'm not wrong, you should be able to modify the properties of the noise in such a way that it doesn't affect your range.
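If that doesn't get you all the way there, one common workaround (my suggestion, not something the documentation mandates) is to saturate the sampled action against the limits inside your environment's step function. A minimal sketch, with placeholder dynamics and an assumed [0,5] range:

function [nextObs,reward,isDone,logged] = stepFcn(action,logged)
    % Clip the sampled action to the same [0,5] range declared in rlNumericSpec
    action = min(max(action,0),5);
    % ... advance your plant/dynamics with the clipped action here ...
    nextObs = zeros(4,1);  reward = 0;  isDone = false;  % placeholders
end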
For your second question:
The documentation for rlStochasticActorRepresentation says that the network output layer must have twice as many elements as the number of dimensions of the continuous action space and that they represent all the mean values followed by all the variances (which must be non-negative) of the Gaussian distributions for the dimensions of the action space.
The reason for the mean and variance is the nature of stochastic actors. From the description of rlStochasticActorRepresentation, a stochastic actor takes the observations as inputs and returns a random action, thereby implementing a stochastic policy with a specific probability distribution. This random action is sampled from the Gaussian distribution described by the mean and variance.
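A minimal sketch of such a network, following the pattern of that documentation example (the sizes, layer names, and the 0-to-5 range below are illustrative, not from this thread). The mean path is squashed with tanhLayer and rescaled to the action range with scalingLayer, while the variance path goes through softplusLayer so it stays non-negative:

% 4-D observation, 2-D continuous action bounded in [0,5]
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1],'LowerLimit',0,'UpperLimit',5);

% Shared input path
inPath = [ imageInputLayer([4 1 1],'Normalization','none','Name','obs')
           fullyConnectedLayer(16,'Name','fc_in')
           reluLayer('Name','relu') ];

% Mean path: tanh bounds the output to [-1,1], then scale/shift to [0,5]
meanPath = [ fullyConnectedLayer(2,'Name','fc_mean')
             tanhLayer('Name','tanh')
             scalingLayer('Name','scale','Scale',2.5,'Bias',2.5) ];

% Variance path: softplus keeps the values non-negative
varPath = [ fullyConnectedLayer(2,'Name','fc_var')
            softplusLayer('Name','splus') ];

% Output: [mean(1) mean(2) var(1) var(2)], concatenated on the channel dim
outLayer = concatenationLayer(3,2,'Name','mean_var');

net = layerGraph(inPath);
net = addLayers(net,meanPath);
net = addLayers(net,varPath);
net = addLayers(net,outLayer);
net = connectLayers(net,'relu','fc_mean');
net = connectLayers(net,'relu','fc_var');
net = connectLayers(net,'scale','mean_var/in1');
net = connectLayers(net,'splus','mean_var/in2');

actor = rlStochasticActorRepresentation(net,obsInfo,actInfo, ...
    'Observation',{'obs'});

Note that even with the mean bounded to [0,5], the sampled action can still land outside that range, because the Gaussian distribution itself is unbounded.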
4 Comments
Zhengyang Chen
Zhengyang Chen on 15 Aug 2020
Hi, I think I now understand the first question, i.e. why there are still negative outputs. It is not enough to simply force all outputs of the neural network to be above zero, because the action is ultimately determined by the mean and variance, which are the NN outputs. So if we have a large variance, the action value is more likely to land far out on either tail of the normal distribution, which means a negative value is still possible. Am I correct?
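For example, with illustrative numbers: a mean of 2.5 (the middle of a 0-to-5 range) and a standard deviation of 3 still give roughly 20% negative samples:

mu = 2.5;  sigma = 3;                 % illustrative mean and std. dev.
samples = mu + sigma.*randn(1e5,1);   % draw from N(mu, sigma^2)
fracNegative = mean(samples < 0)      % about 0.20, i.e. ~20% negative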
Asvin Kumar
Asvin Kumar on 16 Aug 2020
Perfect. Your explanation works better than mine.
