Creating an actorLossFunction for ContinuousDeterministicActor

3 views (last 30 days)
Asked by rtn on 24 May 2022
Answered: Takeshi Takahashi on 2 Jun 2022
Hi, in the example the actor loss function for an rlDiscreteCategoricalActor is the following:
function loss = actorLossFunction(policy, lossData)
policy = policy{1};
% Create the action indication matrix.
batchSize = lossData.batchSize;
Z = repmat(lossData.actInfo.Elements',1,batchSize);
actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
% Resize the discounted return to the size of policy.
G = actionIndicationMatrix .* lossData.discountedReturn;
G = reshape(G,size(policy));
% Round any policy values less than eps to eps.
policy(policy < eps) = eps;
% Compute the loss.
loss = -sum(G .* log(policy),'all');
end
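To make sure I read that right, here is a tiny standalone check of the indication-matrix step, assuming three discrete actions [-1; 0; 1] and a batch of two taken actions [0 1] (toy values, not from the example):
% Toy check of the action-indication logic above (assumed values).
elements = [-1; 0; 1];                % stand-in for actInfo.Elements
actionBatch = [0 1];                  % two actions taken in the batch
batchSize = numel(actionBatch);
Z = repmat(elements, 1, batchSize);   % each column repeats the action set
indicator = actionBatch == Z;         % 1 only in the row of the taken action
disp(indicator)                       % [0 0; 1 0; 0 1]
So G keeps the discounted return only in the row of the action that was actually taken in each batch column.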
Here are my action and observation specifications:
actInfo =
  rlNumericSpec with properties:
     LowerLimit: [2×1 double]
     UpperLimit: [2×1 double]
           Name: "CartPole Action"
    Description: [0×0 string]
      Dimension: [2 1]
       DataType: "double"

obsInfo =
  rlNumericSpec with properties:
     LowerLimit: -Inf
     UpperLimit: Inf
           Name: "CartPole States"
    Description: "pendulum_force, cart position, cart velocity"
      Dimension: [4 1501]
       DataType: "double"
Here is how I set up my actor:
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
actor = accelerate(actor,true);
actorOpts = rlOptimizerOptions('LearnRate',1e-3);
actorOptimizer = rlOptimizer(actorOpts);
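In my custom loop I was planning to use that optimizer roughly the way the documentation example does, along these lines (a sketch only; observationBatch and lossData are assembled earlier in my loop, and the exact gradient/update signatures should be checked against the release):
% Sketch of the gradient/update step, following the custom-training-loop example.
actorGradient = gradient(actor, @actorLossFunction, {observationBatch}, lossData);
[actor, actorOptimizer] = update(actorOptimizer, actor, actorGradient);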
To create my loss function, can I do the following?
function loss = actorLossFunction(policy, lossData)
policy = policy{1};
% Create the action indication matrix.
batchSize = lossData.batchSize;
Z = repmat(lossData.actInfo.Dimension(1)',1,batchSize);
actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
% Resize the discounted return to the size of policy.
G = actionIndicationMatrix .* lossData.discountedReturn;
G = reshape(G,size(policy));
% Round any policy values less than eps to eps.
policy(policy < eps) = eps;
% Compute the loss.
loss = -sum(G .* log(policy),'all');
end

Accepted Answer

Takeshi Takahashi on 2 Jun 2022
Please take a look at this example for rlContinuousDeterministicActor if you want to use it in a custom training loop.
rlDiscreteCategoricalActor is for stochastic discrete actions, while rlContinuousDeterministicActor is for deterministic continuous actions, so you need a different formulation.
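For a deterministic continuous actor, the usual formulation (as in DDPG) is to push the actor's output action through the critic and maximize the predicted Q-value, rather than weighting log-probabilities by a return. Here is a minimal sketch, assuming a critic (an rlQValueFunction) and the observation batch are passed in through lossData; those field names are assumptions on my part, not the shipped example code:
function loss = deterministicActorLossFunction(action, lossData)
% action is the actor output (the deterministic action) as a cell array.
action = action{1};
% Evaluate the critic at the actions the actor proposes for this batch.
% lossData.critic and lossData.observationBatch are assumed fields.
q = evaluate(lossData.critic, {lossData.observationBatch, action});
q = q{1};
% Minimizing -Q pushes the actor toward actions the critic rates higher.
loss = -mean(q, 'all');
end
The gradient with respect to the actor parameters can then be obtained with the gradient function, the same way as in the discrete case; treat this sketch as a starting point and follow the rlContinuousDeterministicActor custom-loop example for the exact pattern.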

More Answers (0)

Release

R2022a
