How to compute the gradients of a SAC agent for custom training? In addition, are the target critics updated automatically by MATLAB, given that agent = rlSACAgent()?

6 views (last 30 days)
I'm trying to train multiple SAC agents using parallel computing. I don't know how to compute the gradients of the agents using the dlfeval function, given that I have created a minibatchqueue for data processing. In addition, since the agents were created as agent = rlSACAgent(actor1,[critic1,critic2],agentOpts), should I introduce the target critics myself, or are they handled internally by MATLAB once I specify the smoothing factor tau or the update frequency of the target critics? And how can I update them?

Answers (1)

praguna manvi
praguna manvi on 4 Sep 2024
Edited: praguna manvi on 4 Sep 2024
The actor, critic, and target critic networks are all updated internally by the "train" function for agents defined as:
agent = rlSACAgent(actor,[critic1,critic2],agentOpts);
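In that case you do not create the target critics yourself; the agent maintains them internally, and you control how they are updated through the agent options. As a minimal sketch (TargetSmoothFactor and TargetUpdateFrequency are the relevant rlSACAgentOptions properties; the values below are illustrative):
agentOpts = rlSACAgentOptions( ...
    "TargetSmoothFactor",1e-3, ...   % tau for soft (Polyak) target updates
    "TargetUpdateFrequency",1);      % apply the target update every step
agent = rlSACAgent(actor,[critic1,critic2],agentOpts);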
You can find an example of training an rlSACAgent in this documentation:
For custom training, you can refer to this documentation, which outlines the functions needed:
Typically, you would use the "getValue" or "getAction" functions to extract network outputs, calculate a loss, and compute gradients with "dlgradient" inside a function evaluated by "dlfeval". Here is a link to another example of custom training using sampled minibatch experiences:
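For example, here is a minimal sketch of a single critic update (not a complete SAC loop). It assumes the critic's underlying dlnetwork is extracted with getModel, that your minibatchqueue mbq yields dlarray batches of observations, actions, and precomputed Bellman targets, and that criticLoss, tau, avgGrad, avgSqGrad, and iteration are names introduced here purely for illustration:
% Extract the critic's underlying dlnetwork; the target critic
% starts as a copy of the critic (dlnetwork is a value object).
criticNet = getModel(critic1);
targetNet = criticNet;

tau = 1e-3;                     % target smoothing factor (assumed value)
avgGrad = []; avgSqGrad = [];   % Adam optimizer state
iteration = 1;

% Sample one minibatch of dlarray data from the minibatchqueue.
[obsBatch,actBatch,targetBatch] = next(mbq);

% dlgradient must run inside a function evaluated by dlfeval.
[loss,grads] = dlfeval(@criticLoss,criticNet,obsBatch,actBatch,targetBatch);

% Apply the gradients, for example with adamupdate.
[criticNet,avgGrad,avgSqGrad] = adamupdate(criticNet,grads, ...
    avgGrad,avgSqGrad,iteration);

% Soft (Polyak) update of the target critic with smoothing factor tau.
targetNet.Learnables = dlupdate(@(t,c) (1-tau)*t + tau*c, ...
    targetNet.Learnables,criticNet.Learnables);

function [loss,gradients] = criticLoss(net,obs,act,targets)
    % Forward pass; assumes the critic network has separate
    % observation and action input layers.
    q = forward(net,obs,act);
    loss = mse(q,targets);                       % regression to targets
    gradients = dlgradient(loss,net.Learnables); % traced by dlfeval
end
The same pattern (a loss function evaluated via dlfeval, followed by adamupdate) applies to the actor, with the loss built from "getAction"/"getValue" outputs instead.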
