
What's the difference between getAction and predict in RL and why does it change with agent and actor?

Hi all,
I am trying to import the neural network of my PPO actor via ONNX. I followed the steps shown in Train DDPG Agent with Pretrained Actor Network (adapted to PPO). I do not import a critic because my network is ready to be deployed. When I check the output of predict(...), it matches what I have in Python. However, getAction(agent,{testData}) and getAction(actor,{testData}) differ from predict(...) and even from each other. Moreover, they change on every run, even though the input is kept constant (for example, feeding an array of ones). Can someone clarify why the output of getAction changes between the agent and the actor, and why it does not match the value of the neural network?
Best regards,
Kevin
Here is the code used and the results obtained:
agentAction = -0.9091
actorAction = -0.8572
predictNN = 0.8436
actorNetwork = importONNXNetwork("C:\...\ppo_model.onnx",'TargetNetwork',"dlnetwork", "InputDataFormats",'BC');
actorNetwork = layerGraph(actorNetwork);
low_limit = transpose([0.0, -pi, -20000.0, -20000.0, -1.5, -20000, -20000, -2, -3, -3.5, -4]);
upper_limit = transpose([20.0, pi, 20000.0, 20000.0, 1.5, 20000, 20000, 2, 3, 3.5, 4]);
obsInfo = rlNumericSpec([11 1], 'LowerLimit',low_limit, 'UpperLimit',upper_limit);
actInfo = rlNumericSpec([1 1],'LowerLimit',-0.18,'UpperLimit',0.18);
% Code generation does not support the imported custom layers, so delete them and reconnect the graph
actorNetwork = removeLayers(actorNetwork, 'onnx__Gemm_0_BatchSizeVerifier');
actorNetwork = removeLayers(actorNetwork, 'x25Output');
actorNetwork = removeLayers(actorNetwork, 'x26Output');
actorNetwork = connectLayers(actorNetwork, 'onnx__Gemm_0', 'Gemm_0');
% Get the names of the layers required to generate the actor
netMeanActName = actorNetwork.Layers(12).Name;
netStdActName = actorNetwork.Layers(13).Name;
netObsNames = actorNetwork.Layers(1).Name;
actor = rlContinuousGaussianActor(actorNetwork,obsInfo,actInfo,'ActionMeanOutputNames', netMeanActName, 'ActionStandardDeviationOutputNames', netStdActName, 'ObservationInputNames', netObsNames);
agent = rlPPOAgent(obsInfo, actInfo);
agent.setActor(actor)
% Check that the network used by the agent and the actor is the same one that was loaded. To do so, evaluate the network, the actor, and the agent using the same input observation.
testData = ones(11,1);
% Evaluate the agent and the actor
agentAction = getAction(agent,{testData})
actorAction = getAction(actor,{testData})
% Evaluate the underlying network of the agent's actor
predictNN = predict(getModel(getActor(agent)),dlarray(testData','BC'))

Accepted Answer

Ari Biswas on 26 January 2023
The PPO agent with continuous action space has a stochastic policy. The network has two outputs: mean and standard deviation.
Calling getAction on the agent or the actor returns an action sampled from the policy using the mean and standard-deviation outputs of the network. That is why the two getAction calls differ from each other and change on every run: each call draws a new random sample.
Calling predict on the network returns the mean and standard-deviation outputs directly; use [mean,std] = predict(...) to capture both values.
Also, make sure you are comparing from the same random number generator state. For example, execute rng(0) before evaluating the networks each time.
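A minimal sketch of this check, assuming the actor, agent, and testData variables defined in the question above, and assuming your release exposes the agent's UseExplorationPolicy property:
rng(0)  % fix the random number generator state before sampling
sampledAction = getAction(actor,{testData})  % action sampled from the Gaussian policy
% Request both network outputs; check getModel(actor).OutputNames to confirm
% which output is the mean and which is the standard deviation.
[meanOut, stdOut] = predict(getModel(actor), dlarray(testData','BC'))
% To make getAction return the maximum-likelihood (mean) action instead of a
% random sample, turn off the exploration policy on the agent.
agent.UseExplorationPolicy = false;
meanAction = getAction(agent,{testData})
With the generator seeded identically before each call, the sampled actions become reproducible across runs, and with UseExplorationPolicy set to false the agent's action should correspond to the mean output of predict (up to saturation at the action limits).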

More Answers (0)

Release: R2022a
