In the MATLAB example, the actor uses a custom layer called fullyConnectedPILayer. The description says:
"Gradient descent optimization can drive the weights to negative values. To avoid negative weights, replace normal fullyConnectedLayer with a fullyConnectedPILayer. This layer ensures that the weights are positive by implementing the function Y = abs(WEIGHTS)*X. This layer is defined in fullyConnectedPILayer.m."
So the two weights are always supposed to be positive, but after training my actor network has one negative weight (Ki = -0.0057) and one positive weight (Kp = 0.0455). The same example also says:
"The integral and proportional gains of the PI controller are the absolute weights of the actor representation. To obtain the weights, first extract the learnable parameters from the actor."
It then uses the abs function to get the weights, so using the custom fullyConnectedPILayer layer doesn't seem to make sense, because the actor network can still end up with negative weights.
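For reference, the extraction step looks roughly like this. This is a sketch, not the example's exact code: getLearnableParameters is the Reinforcement Learning Toolbox function for pulling parameters out of a representation, and I'm assuming the gains sit in its first returned parameter.

```matlab
% Sketch: extract the learnable weights from the trained actor and
% take abs() to recover the non-negative PI gains. The stored weights
% themselves may be negative; only abs(weights) is used as the gains.
actorParams = getLearnableParameters(actor);
Ki = abs(actorParams{1}(1));   % integral gain
Kp = abs(actorParams{1}(2));   % proportional gain
```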
The code of the layer is as follows:
classdef fullyConnectedPILayer < nnet.layer.Layer
    properties (Learnable)
        Weights
    end
    methods
        function obj = fullyConnectedPILayer(Weights, Name)
            obj.Name = Name;
            obj.Description = "fullyConnectedNonNegWeightLayer";
            obj.Weights = Weights;
        end
        function Z = predict(obj, X)
            Z = fullyconnect(X, abs(obj.Weights), 0, 'DataFormat','CB');
        end
    end
end
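Note that the abs() sits inside predict, not in the stored parameter: gradient descent is free to drive the learnable Weights negative, while the layer's output always uses abs(Weights). A quick check of that behavior, using the weight values from my trained actor (the input vector here is hypothetical):

```matlab
% The stored weight can be negative while the effective gain stays positive.
layer = fullyConnectedPILayer(single([-0.0057 0.0455]), 'Action');
x = dlarray(single([1; 1]));   % unformatted; predict supplies 'CB' format
z = predict(layer, x);         % computed with abs(Weights), so the
extractdata(z)                 % output is 0.0057 + 0.0455, not negative
```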
The code for my actor network is exactly the same as in the example:
initialGain = single([1e-3 2]);
actorNetwork = [
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
I don't understand why training produces a negative weight when I'm using this custom layer.