Action Value can't be constrained

3 views (last 30 days)
Zhengyang Chen
Zhengyang Chen on 3 Aug 2020
Commented: Asvin Kumar on 16 Aug 2020
I am a beginner in RL and am currently trying to use a policy gradient agent. Here is something weird I found when trying to output the action value within a certain range.
In the Create Continuous Stochastic Actor from Deep Neural Network example of this link:
The action value limits are set first in rlNumericSpec(), but the constraint seems to have no effect on the actual actor output. If I change the lower limit to 0, it still yields negative values.
My question is: to actually have an action output within range, do I need to achieve this via the neural network construction? Say I want a range of 0 to 5, how should I modify the network?
BTW, why should the output of the neural network have twice as many elements as the actual action output? What's happening inside rlStochasticActorRepresentation()?

Answers (1)

Asvin Kumar
Asvin Kumar on 6 Aug 2020
For your first question:
In short, it might be because of the noise added to the predicted action. If I'm not wrong, you should be able to modify the properties of the noise in such a way that it doesn't affect your range.
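If that doesn't get you all the way there, one common workaround (my suggestion, not something the documentation mandates) is to saturate the sampled action against the limits inside your environment's step function. A minimal sketch, with placeholder dynamics and an assumed [0,5] range:

function [nextObs,reward,isDone,logged] = stepFcn(action,logged)
    % Clip the sampled action to the same [0,5] range declared in rlNumericSpec
    action = min(max(action,0),5);
    % ... advance your plant/dynamics with the clipped action here ...
    nextObs = zeros(4,1);  reward = 0;  isDone = false;  % placeholders
end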
For your second question:
The documentation for rlStochasticActorRepresentation says that the network output layer must have twice as many elements as the number of dimensions of the continuous action space and that they represent all the mean values followed by all the variances (which must be non-negative) of the Gaussian distributions for the dimensions of the action space.
The reason for the mean and variance is the nature of stochastic actors. From the description of rlStochasticActorRepresentation, a stochastic actor takes the observations as inputs and returns a random action, thereby implementing a stochastic policy with a specific probability distribution. This random action is sampled from the Gaussian distribution described by the mean and variance.
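A minimal sketch of such a network, following the pattern of that documentation example (the sizes, layer names, and the 0-to-5 range below are illustrative, not from this thread). The mean path is squashed with tanhLayer and rescaled to the action range with scalingLayer, while the variance path goes through softplusLayer so it stays non-negative:

% 4-D observation, 2-D continuous action bounded in [0,5]
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1],'LowerLimit',0,'UpperLimit',5);

% Shared input path
inPath = [ imageInputLayer([4 1 1],'Normalization','none','Name','obs')
           fullyConnectedLayer(16,'Name','fc_in')
           reluLayer('Name','relu') ];

% Mean path: tanh bounds the output to [-1,1], then scale/shift to [0,5]
meanPath = [ fullyConnectedLayer(2,'Name','fc_mean')
             tanhLayer('Name','tanh')
             scalingLayer('Name','scale','Scale',2.5,'Bias',2.5) ];

% Variance path: softplus keeps the values non-negative
varPath = [ fullyConnectedLayer(2,'Name','fc_var')
            softplusLayer('Name','splus') ];

% Output: [mean(1) mean(2) var(1) var(2)], concatenated on the channel dim
outLayer = concatenationLayer(3,2,'Name','mean_var');

net = layerGraph(inPath);
net = addLayers(net,meanPath);
net = addLayers(net,varPath);
net = addLayers(net,outLayer);
net = connectLayers(net,'relu','fc_mean');
net = connectLayers(net,'relu','fc_var');
net = connectLayers(net,'scale','mean_var/in1');
net = connectLayers(net,'splus','mean_var/in2');

actor = rlStochasticActorRepresentation(net,obsInfo,actInfo, ...
    'Observation',{'obs'});

Note that even with the mean bounded to [0,5], the sampled action can still land outside that range, because the Gaussian distribution itself is unbounded.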
4 Comments
Zhengyang Chen
Zhengyang Chen on 15 Aug 2020
Hi, I think I now understand the first question, i.e. why there are still negative outputs. It is not enough to simply force all outputs of the neural network to be above zero, because the action is ultimately determined by the mean and variance, which are the NN outputs. So if we have a large variance, the action value is more likely to land far out on either tail of the normal distribution, which means a negative value is still possible. Am I correct?
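For example, with illustrative numbers: a mean of 2.5 (the middle of a 0-to-5 range) and a standard deviation of 3 still give roughly 20% negative samples:

mu = 2.5;  sigma = 3;                 % illustrative mean and std. dev.
samples = mu + sigma.*randn(1e5,1);   % draw from N(mu, sigma^2)
fracNegative = mean(samples < 0)      % about 0.20, i.e. ~20% negative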
Asvin Kumar
Asvin Kumar on 16 Aug 2020
Perfect. Your explanation works better than mine.
