Action value exceeds the boundary of the final-layer activation function of the actor

3 views (last 30 days)

awcii on 17 Jun 2023
Answered: Harsh on 16 Jul 2025
Hi,
I'm using a DDPG agent for my RL application in MATLAB R2022a.
I want the action to take values between 0 and 1. To do this, I use a sigmoidLayer as the final layer of the actor. However, the action exceeds the 0-1 boundary. I also tried a tanhLayer followed by
scalingLayer(Scale=0.5,Bias=0.5);
but the action exceeds the boundary again. How is that possible?
Meanwhile, I also tried using
actInfo = rlNumericSpec([1 1],LowerLimit=0,UpperLimit=1);
to limit the action. It does limit the action value, but it doesn't scale it; it just acts as a saturation block (like putting a Saturation block in Simulink in front of the action output), so the RL training behaves incorrectly.
How can I get the action to stay between 0 and 1?
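For reference, the two bounded output heads described above can be sketched like this (the observation size and hidden-layer width are illustrative assumptions, not from the original post):

```matlab
% Sigmoid head: output directly in (0,1)
actorNet = [
    featureInputLayer(4)                  % assumed observation size
    fullyConnectedLayer(64)               % assumed hidden width
    reluLayer
    fullyConnectedLayer(1)
    sigmoidLayer
    ];

% Equivalent tanh head: tanh gives (-1,1), then 0.5*x + 0.5 maps it to (0,1)
actorNetTanh = [
    featureInputLayer(4)
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(1)
    tanhLayer
    scalingLayer(Scale=0.5, Bias=0.5)
    ];
```

Both networks bound the *network output*; as the answer below explains, the agent's exploration noise is added after this output, which is why the final action can still leave the range.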
3 Comments
awcii on 18 Jun 2023
Thank you for your reply. I solved it for now by reducing the noise variance.
awcii on 19 Jun 2023
However, decreasing the noise variance causes a lack of exploration during training, so overall I still need a different solution.


Answers (1)

Harsh on 16 Jul 2025
I understand that you're seeing action values exceed the [0, 1] range even when using "sigmoidLayer" or "tanhLayer" with "scalingLayer". The most probable reason is that the DDPG agent adds exploration noise after the actor network output. This noise bypasses the bounding effect of the final activation layer, causing the actual actions to fall outside the desired range. Additionally, using "rlNumericSpec" with "LowerLimit" and "UpperLimit" only clips the final action values; it does not scale or constrain the network's internal outputs, which can interfere with learning by distorting gradients.
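To see this concretely, a minimal sketch of the relevant settings (the specific standard-deviation values are illustrative assumptions):

```matlab
% Clipping spec: saturates action = actor(obs) + noise to [0,1],
% but does not rescale or constrain the actor's output itself.
actInfo = rlNumericSpec([1 1], LowerLimit=0, UpperLimit=1);

% DDPG exploration noise is configured on the agent options.
% Shrinking it reduces boundary overshoot, but also reduces exploration,
% which is exactly the trade-off noted in the comments above.
agentOpts = rlDDPGAgentOptions;
agentOpts.NoiseOptions.StandardDeviation = 0.1;          % illustrative value
agentOpts.NoiseOptions.StandardDeviationDecayRate = 1e-4;
```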
To fix this, you should create a custom noise layer that adds Gaussian noise during training and passes data unchanged during inference. Place this layer just before the final "sigmoidLayer" in your actor network. This ensures that the noise is applied to the pre-activation values, and the "sigmoidLayer" guarantees the final output remains strictly within (0, 1), preserving both proper exploration and stable gradient flow.
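A minimal sketch of such a custom noise layer, assuming a hypothetical class name `preSigmoidNoiseLayer` and noise level (neither is from the original thread):

```matlab
classdef preSigmoidNoiseLayer < nnet.layer.Layer
    % Adds Gaussian noise during training only; identity at inference.
    properties
        Sigma   % noise standard deviation (tuning assumption)
    end
    methods
        function layer = preSigmoidNoiseLayer(sigma, name)
            layer.Name = name;
            layer.Sigma = sigma;
        end
        function Z = predict(layer, X)
            % Inference path: pass data through unchanged
            Z = X;
        end
        function Z = forward(layer, X)
            % Training path: perturb the pre-activation values
            Z = X + layer.Sigma * randn(size(X), "like", X);
        end
    end
end
```

Place it just before the final sigmoid in the actor's layer array, e.g. `... fullyConnectedLayer(1); preSigmoidNoiseLayer(0.2,"noise"); sigmoidLayer`, so the sigmoid squashes the noisy pre-activations back into (0, 1).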
For more details, refer to the MATLAB documentation on defining custom deep learning layers and on DDPG agents.
