Why did my AC agent converge to the minimal reward?

Kun Cheng on 8 October 2023
Answered: Sugandhi on 18 October 2023
Hello everyone!
I trained an AC agent, but it converged to a policy that gives the minimal reward. I am not sure whether the problem is with the neural network or with the environment. The rewards are negative because I want to find the minimum of a volume. I changed two parameters, learn rate = 0.05 and EntropyLossWeight = 0.01; all other parameters are at their defaults. I do not know which parameters deserve particular attention.
When I changed the learn rate to the lower value of 0.0005, training did not converge at all.
Here are the actor and the critic.
I want the actor to output values in the range [0, 1].
%% Critic network (state-value function)
nnc = [
    featureInputLayer(prod(obsInfo.Dimension), 'Name', 'input_c')
    fullyConnectedLayer(Knoten, 'Name', 'fc_c1')   % Knoten = number of hidden units
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(Knoten, 'Name', 'fc_c2')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(1, 'Name', 'output')];
nnc = dlnetwork(nnc);
critic = rlValueFunction(nnc, obsInfo);
% getValue(critic,{rand(obsInfo.Dimension)})
% Actor: shared input path feeding both the mean and the standard-deviation branch
input_actor = [
    featureInputLayer( ...
        prod(obsInfo.Dimension), ...
        Name="input_a")
    fullyConnectedLayer( ...
        prod(actInfo.Dimension), ...
        Name="in_fc")
    ];
% Mean branch: the sigmoid keeps the action mean in [0, 1]
nna1 = [
    tanhLayer(Name="tanhMean")
    fullyConnectedLayer(prod(actInfo.Dimension), Name="fc_mean")
    sigmoidLayer(Name="output_mean")
    ];
% Standard-deviation branch: the softplus keeps the deviation positive
nna2 = [
    tanhLayer(Name="tanhStdv")
    fullyConnectedLayer(prod(actInfo.Dimension), Name="fc_div")
    softplusLayer(Name="output_div")
    ];
nna = layerGraph(input_actor);
nna = addLayers(nna, nna1);
nna = addLayers(nna, nna2);
nna = connectLayers(nna, "in_fc", "tanhMean/in");
nna = connectLayers(nna, "in_fc", "tanhStdv/in");
% plot(nna)
nna = dlnetwork(nna);
% summary(nna)
actor = rlContinuousGaussianActor(nna, obsInfo, actInfo, ...
    ActionMeanOutputNames="output_mean", ...
    ActionStandardDeviationOutputNames="output_div", ...
    ObservationInputNames="input_a");
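For reference, a minimal sketch of how the learn rate and entropy loss weight mentioned above can be passed when assembling the agent (assuming an rlACAgent built from this actor and critic, with every other option left at its default):
% Sketch only: agent options carrying the learn rate and entropy loss weight from above
actorOpts  = rlOptimizerOptions(LearnRate=0.05);
criticOpts = rlOptimizerOptions(LearnRate=0.05);
agentOpts  = rlACAgentOptions( ...
    EntropyLossWeight=0.01, ...
    ActorOptimizerOptions=actorOpts, ...
    CriticOptimizerOptions=criticOpts);
agent = rlACAgent(actor, critic, agentOpts);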
The step function of the environment:
function [nextobs,reward,isdone,loggedSignals] = step(this,action)
    % Unpack the actions
    this.Robot.x = action(1);
    this.Robot.y = action(2);
    this.Robot.NumOfHeight = action(3);
    this.Robot.NumOfAngle = action(4);
    [~, ~, ~, this.Volumennew, ~] = PrunnedTreeGenerator(this.Robot.x, this.Robot.y, ...
        this.Robot.NumOfHeight, this.Robot.NumOfAngle, 3, ...
        0.8, this.H, this.Bin_In_Training, this.RB, 0.5, 1);
    % Record the new volume when a smaller one is found
    if this.Volumennew <= min(this.volume_tree_Collection)
        this.volume_tree = this.Volumennew;
        %disp(this.volume_tree)
        this.volume_tree_Collection = [this.volume_tree_Collection; ...
            this.volume_tree];
    end
    reward = -this.volume_tree/(0.5^2*pi*sum(this.Bin_In_Training(:, 3)));
    % Alternative isdone condition (disabled): stop the step when the new volume
    % is larger than the current minimum; a negative reward is given in that case
    %isdone = this.Volumennew>=min(this.volume_tree_Collection);
    Distance = distanceCalculator(this, this.Robot.x, this.Robot.y);
    Mean = meanXYZ(this);
    Sigma_Square = getSigma(this);
    DivisionSize = getSizeDivision(this);
    isdone = this.l >= 24;
    if ~isdone
        this.l = this.l + 1;
        nextobs = [this.Robot.x, this.Robot.y, this.Robot.NumOfHeight, this.Robot.NumOfAngle, this.volume_tree, size(this.Bin_In_Training, 1), Distance, Mean, Sigma_Square, DivisionSize]';
        %reward = sum(1+this.l)/this.l;
        this.State = [this.Robot.x, this.Robot.y, this.Robot.NumOfHeight, this.Robot.NumOfAngle, this.volume_tree, size(this.Bin_In_Training, 1), Distance, Mean, Sigma_Square, DivisionSize]';
        % if isdone is false, that means a minimum has been found,
        % therefore a positive reward has been given
    else
        this.l = this.l + 1;
        %disp(this.State)
        %disp(this.volume_tree)
        nextobs = [this.Robot.x, this.Robot.y, this.Robot.NumOfHeight, this.Robot.NumOfAngle, this.volume_tree, size(this.Bin_In_Training, 1), Distance, Mean, Sigma_Square, DivisionSize]';
        this.StepState = [this.StepState; this.k this.l nextobs'];
        this.k = this.k + 1;
        %reward = ;
    end
    this.State = nextobs;
    this.IsDone = isdone;
    loggedSignals = nextobs;
end
Hoping for some help. Thanks!
Kun

Accepted Answer

Sugandhi on 18 October 2023
Hi,
I understand that the agent is converging to a policy that gives minimal rewards because of the way the rewards are calculated in the ‘step’ function of the environment. The reward calculation is based on the volume of a tree, and the reward is set to be negative, which means the agent is incentivized to minimize the volume.
To understand why the agent is converging to a policy that gives minimal rewards, you need to examine the reward function and the environment dynamics.
reward = -this.volume_tree/(0.5^2*pi*sum(this.Bin_In_Training(:, 3)));
It seems that the goal of your task is to find the minimal volume. However, negative rewards can make convergence challenging, especially if the agent is trained using gradient-based methods.
A few possible workarounds:
  1. Reward Scaling: Instead of using negative rewards, consider scaling the rewards to a positive range that aligns with the agent's objective (see the sketch after this list).
  2. Exploration: Ensure that your agent has sufficient exploration during training. Exploration lets the agent try different actions and states, which can help it find better policies.
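For the first point, here is a minimal sketch of one way to rescale the reward in the step function, reusing the names from the question (this.volume_tree, this.Bin_In_Training). It is only an illustration and assumes the tree volume stays below the reference cylinder volume:
% Hypothetical reward shaping: same normalization as before, but shifted so that
% smaller volumes give larger (positive) rewards
volumeRef = 0.5^2*pi*sum(this.Bin_In_Training(:, 3));   % reference volume from the original reward
reward = 1 - this.volume_tree/volumeRef;                 % approaches 1 as the volume shrinks
For the second point, one concrete lever in this setup is the EntropyLossWeight option already used in the question; increasing it encourages the agent to keep exploring instead of collapsing onto a single action.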
Reinforcement learning training can be sensitive to various factors, and it often requires experimentation and iterative adjustments to achieve desirable results.

Release: R2023a
