Training DDPG agent with custom training loop
Currently, I am designing a control system using deep reinforcement learning (DDPG) with the Reinforcement Learning Toolbox in MATLAB/Simulink. Specifically, I need to implement a custom training loop that does not rely on the train function. Could you please show me how to implement a custom training loop for training a DDPG agent? I would like to understand how to implement a standard DDPG-based control system using a custom training loop in MATLAB.
Below is the MATLAB code I currently use to train the DDPG agent with the train function. Could you convert it into a version that uses a custom training loop (without using train)?
obsInfo = rlNumericSpec([6 1]);
obsInfo.Name = "observations";
actInfo = rlNumericSpec([1 1]);
actInfo.Name = "control input";
mdl = "SIM_RL"; % Simulink model containing the plant and the RL Agent block
env = rlSimulinkEnv( ...
    mdl, ...
    mdl + "/Agent/RL Agent", ...
    obsInfo, actInfo);
% Domain randomization: Reset function
env.ResetFcn = @(in)localResetFcn(in);
function in = localResetFcn(in)
    % Allowed range of the plant parameter; Nominal_value (the nominal mass)
    % must be visible inside this function, e.g. defined here
    M_min = Nominal_value*(1 - 0.5); % -50% of nominal mass
    M_max = Nominal_value*(1 + 0.5); % +50% of nominal mass
    % Randomize the mass
    randomValue_M = M_min + (M_max - M_min)*rand;
    in = setBlockParameter(in, ...
        "SIM_RL/Plant/Mass", ...
        Value=num2str(randomValue_M));
end
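As an optional sanity check before training (assuming the SIM_RL model and the workspace variables it needs are available), the environment wiring and the reset function can be exercised once with validateEnvironment:
% Optional: briefly simulate the environment to confirm that observations,
% actions, and the randomized mass parameter are wired up correctly
validateEnvironment(env)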
% The construction of the critic Network structure is omitted here.
% ....
criticNet = initialize(criticNet);
critic = rlQValueFunction(criticNet,obsInfo,actInfo);
% The construction of the actor Network structure is omitted here.
% ....
actorNet = initialize(actorNet);
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
% Set-up agent
criticOpts = rlOptimizerOptions(LearnRate=1e-04,GradientThreshold=1);
actorOpts = rlOptimizerOptions(LearnRate=1e-04,GradientThreshold=1);
agentOpts = rlDDPGAgentOptions(...
    SampleTime=0.01,...
    CriticOptimizerOptions=criticOpts,...
    ActorOptimizerOptions=actorOpts,...
    ExperienceBufferLength=1e5,...
    DiscountFactor=0.99,...
    MiniBatchSize=128,...
    TargetSmoothFactor=1e-3);
agent = rlDDPGAgent(actor,critic,agentOpts);
maxepisodes = 5000;
maxsteps = ceil(Simulation_End_Time/0.01);
trainOpts = rlTrainingOptions(...
    MaxEpisodes=maxepisodes,...
    MaxStepsPerEpisode=maxsteps,...
    ScoreAveragingWindowLength=5,...
    Verbose=true,...
    Plots="training-progress",...
    StopTrainingCriteria="EpisodeCount",...
    SaveAgentCriteria="EpisodeReward",...
    SaveAgentValue=-1.0);
doTraining = true;
if doTraining
    evaluator = rlEvaluator(...
        NumEpisodes=1,...
        EvaluationFrequency=5);
    % Train the agent.
    trainingStats = train(agent,env,trainOpts,Evaluator=evaluator);
else
    % Load the pretrained agent
    load("agent.mat","agent")
end
Answers (1)
Hitesh
3 Jun 2025
Hi 平成,
To convert your DDPG training setup from the train function to a custom training loop in MATLAB, you implement the episode and step logic yourself. The custom loop gives you greater control over training, evaluation, logging, and integration with domain randomization.
The main components of a custom training loop are:
- Environment Reset: Start each episode by resetting the environment.
- Action Selection: Use the actor network to select an action based on the current observation.
- Environment Step: Apply the action to the environment (e.g., via sim for Simulink models) and collect the next observation, reward, and done flag.
- Experience Storage: Store the transition (state, action, reward, next state, done) in a replay buffer.
- Learning: Sample mini-batches from the buffer and perform gradient updates on the actor and critic networks.
- Target Updates: Soft-update the target networks (actor and critic) toward the main networks.
- Logging & Evaluation: Track performance (e.g., cumulative reward) and optionally evaluate the agent periodically.
Kindly refer to the following custom training loop as an example.
% Create the agent (actor, critic, and agentOpts as defined above)
agent = rlDDPGAgent(actor, critic, agentOpts);
% Experience buffer (accessing the agent's internal buffer this way depends
% on your toolbox release; alternatively, create your own rlReplayMemory)
buffer = agent.ExperienceBuffer;
% Episode and step limits (same values as the original train setup)
maxEpisodes = 5000;
maxStepsPerEpisode = ceil(Simulation_End_Time/0.01);
% Logging
episodeRewards = zeros(maxEpisodes,1);
% For a Simulink environment, reset/step require the setup/cleanup workflow
% (supported in recent toolbox releases)
setup(env);
% Custom training loop
for episode = 1:maxEpisodes
    % Reset the environment (this calls env.ResetFcn, so the plant mass is
    % re-randomized) and reset the agent's exploration noise
    obs = reset(env);
    reset(agent);
    % Track the episode reward
    totalReward = 0;
    for stepCount = 1:maxStepsPerEpisode
        % Get a (noisy) action from the agent; getAction returns a cell array
        action = getAction(agent, {obs});
        % Step the environment
        [nextObs, reward, isDone, ~] = step(env, action{1});
        % Store the transition using the experience structure format
        % expected by the replay memory
        experience.Observation     = {obs};
        experience.Action          = action;
        experience.Reward          = reward;
        experience.NextObservation = {nextObs};
        experience.IsDone          = isDone;
        append(buffer, experience);
        % Update the actor/critic (and soft-update the targets) once enough
        % samples are available; check your release for the exact
        % agent-update API, a learn-style call is assumed here
        if buffer.NumExperiences >= agentOpts.MiniBatchSize
            learn(agent, buffer);
        end
        % Advance the state and accumulate the reward
        obs = nextObs;
        totalReward = totalReward + reward;
        if isDone
            break;
        end
    end
    % Log the reward
    episodeRewards(episode) = totalReward;
    fprintf("Episode %d: Total Reward = %.2f\n", episode, totalReward);
    % Save the agent periodically
    if mod(episode, 50) == 0
        save(sprintf('agent_episode_%d.mat', episode), 'agent');
    end
end
cleanup(env);
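The component list above also mentions periodic evaluation, which the loop does not show. A minimal sketch of an evaluation rollout, meant to sit inside the episode loop (getGreedyPolicy extracts the noise-free policy; the every-5-episodes cadence mirrors the rlEvaluator settings in the original script), could look like this:
% Hypothetical periodic evaluation: roll out one episode with the greedy
% (noise-free) policy and record its cumulative reward
if mod(episode, 5) == 0
    greedyPolicy = getGreedyPolicy(agent);
    evalObs = reset(env);
    evalReward = 0;
    for k = 1:maxStepsPerEpisode
        evalAction = getAction(greedyPolicy, {evalObs});
        [evalObs, r, evalDone] = step(env, evalAction{1});
        evalReward = evalReward + r;
        if evalDone
            break;
        end
    end
    fprintf("  Evaluation after episode %d: reward = %.2f\n", episode, evalReward);
end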
For more information on the DDPG training algorithm, kindly refer to the DDPG agent documentation in the Reinforcement Learning Toolbox.