Creating an actorLossFunction for ContinuousDeterministicActor

3 views (last 30 days)
Asked by rtn on 24 May 2022
Answered: Takeshi Takahashi on 2 Jun 2022
Hi, in the example the actor loss function for an rlDiscreteCategoricalActor is the following:
function loss = actorLossFunction(policy, lossData)
policy = policy{1};
% Create the action indication matrix.
batchSize = lossData.batchSize;
Z = repmat(lossData.actInfo.Elements',1,batchSize);
actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
% Resize the discounted return to the size of policy.
G = actionIndicationMatrix .* lossData.discountedReturn;
G = reshape(G,size(policy));
% Round any policy values less than eps to eps.
policy(policy < eps) = eps;
% Compute the loss.
loss = -sum(G .* log(policy),'all');
end
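To make sure I read that right, here is a tiny standalone check of the indication-matrix step, assuming three discrete actions [-1; 0; 1] and a batch of two taken actions [0 1] (toy values, not from the example):
% Toy check of the action-indication logic above (assumed values).
elements = [-1; 0; 1];                % stand-in for actInfo.Elements
actionBatch = [0 1];                  % two actions taken in the batch
batchSize = numel(actionBatch);
Z = repmat(elements, 1, batchSize);   % each column repeats the action set
indicator = actionBatch == Z;         % 1 only in the row of the taken action
disp(indicator)                       % [0 0; 1 0; 0 1]
So G keeps the discounted return only in the row of the action that was actually taken in each batch column.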
Here are my action and observation specifications:
actInfo =
  rlNumericSpec with properties:
     LowerLimit: [2×1 double]
     UpperLimit: [2×1 double]
           Name: "CartPole Action"
    Description: [0×0 string]
      Dimension: [2 1]
       DataType: "double"

obsInfo =
  rlNumericSpec with properties:
     LowerLimit: -Inf
     UpperLimit: Inf
           Name: "CartPole States"
    Description: "pendulum_force, cart position, cart velocity"
      Dimension: [4 1501]
       DataType: "double"
Here is how I set up my actor:
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
actor = accelerate(actor,true);
actorOpts = rlOptimizerOptions('LearnRate',1e-3);
actorOptimizer = rlOptimizer(actorOpts);
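In my custom loop I was planning to use that optimizer roughly the way the documentation example does, along these lines (a sketch only; observationBatch and lossData are assembled earlier in my loop, and the exact gradient/update signatures should be checked against the release):
% Sketch of the gradient/update step, following the custom-training-loop example.
actorGradient = gradient(actor, @actorLossFunction, {observationBatch}, lossData);
[actor, actorOptimizer] = update(actorOptimizer, actor, actorGradient);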
To create my loss function, can I do the following?
function loss = actorLossFunction(policy, lossData)
policy = policy{1};
% Create the action indication matrix.
batchSize = lossData.batchSize;
Z = repmat(lossData.actInfo.Dimension(1)',1,batchSize);
actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
% Resize the discounted return to the size of policy.
G = actionIndicationMatrix .* lossData.discountedReturn;
G = reshape(G,size(policy));
% Round any policy values less than eps to eps.
policy(policy < eps) = eps;
% Compute the loss.
loss = -sum(G .* log(policy),'all');
end

Accepted Answer

Takeshi Takahashi on 2 Jun 2022
Please take a look at this example for rlContinuousDeterministicActor if you want to use it in a custom training loop.
rlDiscreteCategoricalActor is for stochastic discrete actions, while rlContinuousDeterministicActor is for deterministic continuous actions, so you need a different formulation.
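For a deterministic continuous actor, the usual formulation (as in DDPG) is to push the actor's output action through the critic and maximize the predicted Q-value, rather than weighting log-probabilities by a return. Here is a minimal sketch, assuming a critic (an rlQValueFunction) and the observation batch are passed in through lossData; those field names are assumptions on my part, not the shipped example code:
function loss = deterministicActorLossFunction(action, lossData)
% action is the actor output (the deterministic action) as a cell array.
action = action{1};
% Evaluate the critic at the actions the actor proposes for this batch.
% lossData.critic and lossData.observationBatch are assumed fields.
q = evaluate(lossData.critic, {lossData.observationBatch, action});
q = q{1};
% Minimizing -Q pushes the actor toward actions the critic rates higher.
loss = -mean(q, 'all');
end
The gradient with respect to the actor parameters can then be obtained with the gradient function, the same way as in the discrete case; treat this sketch as a starting point and follow the rlContinuousDeterministicActor custom-loop example for the exact pattern.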

More Answers (0)

Release

R2022a
