In TrainMBPOAgentToBalanceCartPoleSystemExample / cartPoleRewardFunction, what is nextObs?

Lin on 25 October 2024
Commented: Lin on 14 November 2024 at 12:06
function reward = cartPoleRewardFunction(obs,action,nextObs)
% Compute reward value based on the next observation.
if iscell(nextObs)
    nextObs = nextObs{1};
end

% Distance at which to fail the episode
xThreshold = 2.4;

% Reward each time step the cart-pole is balanced
rewardForNotFalling = 1;

% Penalty when the cart-pole fails to balance
penaltyForFalling = -50;

x = nextObs(1,:);
distReward = 1 - abs(x)/xThreshold;

isDone = cartPoleIsDoneFunction(obs,action,nextObs);

reward = zeros(size(isDone));
reward(logical(isDone)) = penaltyForFalling;
reward(~logical(isDone)) = ...
    0.5 * rewardForNotFalling + 0.5 * distReward(~logical(isDone));
end
I really want to know where nextObs is passed into this function from. Why can't I find this variable anywhere in the main script?
If my environment is built in Simulink, how do I get the nextObs variable?

Accepted Answer

Ayush Aniket on 28 October 2024
Hi Lin,
The nextObs variable holds the next state that the environment transitions to from the current state under the action taken by the reinforcement learning (RL) agent. During training with the train function, the step function is called implicitly; it takes the environment model and the agent's action as input and returns nextObs, reward, and isDone. These outputs are then used by the reward function to compute the reward for the action taken.
The Train MBPO Agent to Balance Continuous Cart-Pole System example uses an rlNeuralNetworkEnvironment object to create the environment. For this object you can supply a custom reward function as a function handle; refer to the rlNeuralNetworkEnvironment documentation for that input argument.
Once a custom reward function handle is provided, it is implicitly fed the input arguments (obs, action, nextObs) during training.
However, you can also evaluate the environment yourself by calling the step function (and so obtain the nextObs variable), as shown in the documentation for step.
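For illustration, here is a minimal sketch of that manual evaluation, assuming generativeEnv is the rlNeuralNetworkEnvironment object built in the example. The Observation property and the cell-wrapped inputs follow my reading of the documentation; please verify the exact syntax against your release (R2023b here).
% Set a current observation on the model environment, then take one step by
% hand. The nextObs output is the same value that gets passed on to
% cartPoleRewardFunction(obs,action,nextObs) during training.
generativeEnv.Observation = {[0; 0; 0.05; 0]};       % example 4x1 cart-pole state
[nextObs,reward,isDone] = step(generativeEnv,{0});   % example scalar action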
3 Comments
Ayush Aniket on 12 November 2024 at 8:19
Can you share the custom reward function you are using?
Lin on 14 November 2024 at 12:06
I used a Simulink environment; the state is a 2×1 vector and the action is a 1×1 vector.
Main function call
useGroundTruthReward = true;
if useGroundTruthReward
    rewardFcn = @RewardFunction;
else
    % This neural network uses action and next observation as inputs.
    rewardnet = createRewardNetworkActionNextObs(numObservations,numActions);
    rewardFcn = rlContinuousDeterministicRewardFunction(rewardnet, ...
        obsInfo, ...
        actInfo, ...
        ActionInputNames="action", ...
        NextObservationInputNames="nextState");
end
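For context, this rewardFcn is then wired into the model environment in the same way as in the shipped example; the sketch below assumes transitionFcn and isdoneFcn stand for the transition and is-done functions built for your own Simulink system.
% Model (generative) environment that consumes the selected reward function.
generativeEnv = rlNeuralNetworkEnvironment(obsInfo,actInfo, ...
    transitionFcn,rewardFcn,isdoneFcn);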
RewardFunction
function reward = cartPoleRewardFunction(obs,action,nextObs)
% Compute reward value based on the next observation.
if iscell(nextObs)
    nextObs = nextObs{1};
end

% Distance at which to fail the episode
xThreshold = 2400;

% Reward each time step the cart-pole is balanced
rewardForNotFalling = 0;

% Penalty when the cart-pole fails to balance
penaltyForFalling = -50;

x = nextObs(1,:);
distReward = -log2(10000*abs(x)+1);

isDone = cartPoleIsDoneFunction(obs,action,nextObs);

reward = zeros(size(isDone));
reward(logical(isDone)) = penaltyForFalling;
reward(~logical(isDone)) = ...
    0.5 * rewardForNotFalling + 1 * distReward(~logical(isDone));
end

% reward = 1/(abs(x)+0.000001);
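As a purely illustrative check of the shaping term above (not part of the example), evaluating it at a few displacement values shows how quickly its magnitude grows; the commented outputs are approximate.
% Quick numerical look at the shaping term used in the reward function.
x = [0 0.001 0.01 0.1 1];
distReward = -log2(10000*abs(x)+1)
% distReward is approximately 0, -3.46, -6.66, -9.97, -13.29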


More Answers (0)

Release: R2023b
