Training DDPG agent with custom training loop
Currently, I am designing a control system using deep reinforcement learning (DDPG) with the Reinforcement Learning Toolbox in MATLAB/Simulink. Specifically, I need to implement a custom training loop that does not rely on the train function. Could you please show me how to implement a custom training loop for training a DDPG agent? I would like to understand how to implement a standard DDPG-based control system using a custom training loop in MATLAB.
Below is the MATLAB code I currently use to train the DDPG agent with the train function. Could you convert it into a version that uses a custom training loop (without using train)?
obsInfo = rlNumericSpec([6 1]);
obsInfo.Name = "observations";
actInfo = rlNumericSpec([1 1]);
actInfo.Name = "control input";
mdl = "SIM_RL"; % Simulink model containing the plant and the RL Agent block
env = rlSimulinkEnv( ...
    mdl, ...
    mdl + "/Agent/RL Agent", ...
    obsInfo, actInfo);
% Domain randomization: Reset function
env.ResetFcn = @(in)localResetFcn(in);
function in = localResetFcn(in)
    % Allowed range of the plant parameter; Nominal_value (the nominal mass)
    % must be visible inside this function, e.g. defined here
    M_min = Nominal_value*(1 - 0.5); % -50% of nominal mass
    M_max = Nominal_value*(1 + 0.5); % +50% of nominal mass
    % Randomize the mass
    randomValue_M = M_min + (M_max - M_min)*rand;
    in = setBlockParameter(in, ...
        "SIM_RL/Plant/Mass", ...
        Value=num2str(randomValue_M));
end
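As an optional sanity check before training (assuming the SIM_RL model and the workspace variables it needs are available), the environment wiring and the reset function can be exercised once with validateEnvironment:
% Optional: briefly simulate the environment to confirm that observations,
% actions, and the randomized mass parameter are wired up correctly
validateEnvironment(env)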
% The construction of the critic Network structure is omitted here.
% ....
criticNet = initialize(criticNet);
critic = rlQValueFunction(criticNet,obsInfo,actInfo);
% The construction of the actor Network structure is omitted here.
% ....
actorNet = initialize(actorNet);
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
% Set-up agent
criticOpts = rlOptimizerOptions(LearnRate=1e-04,GradientThreshold=1);
actorOpts = rlOptimizerOptions(LearnRate=1e-04,GradientThreshold=1);
agentOpts = rlDDPGAgentOptions(...
    SampleTime=0.01,...
    CriticOptimizerOptions=criticOpts,...
    ActorOptimizerOptions=actorOpts,...
    ExperienceBufferLength=1e5,...
    DiscountFactor=0.99,...
    MiniBatchSize=128,...
    TargetSmoothFactor=1e-3);
agent = rlDDPGAgent(actor,critic,agentOpts);
maxepisodes = 5000;
maxsteps = ceil(Simulation_End_Time/0.01);
trainOpts = rlTrainingOptions(...
    MaxEpisodes=maxepisodes,...
    MaxStepsPerEpisode=maxsteps,...
    ScoreAveragingWindowLength=5,...
    Verbose=true,...
    Plots="training-progress",...
    StopTrainingCriteria="EpisodeCount",...
    SaveAgentCriteria="EpisodeReward",...
    SaveAgentValue=-1.0);
doTraining = true;
if doTraining
    evaluator = rlEvaluator(...
        NumEpisodes=1,...
        EvaluationFrequency=5);
    % Train the agent.
    trainingStats = train(agent,env,trainOpts,Evaluator=evaluator);
else
    % Load the pretrained agent
    load("agent.mat","agent")
end
Answers (1)
Hitesh
3 Jun 2025
Hi 平成,
To convert your DDPG training setup from the train function to a custom training loop in MATLAB, you implement the episode and step logic yourself. The custom loop gives you greater control over training, evaluation, logging, and integration with domain randomization.
The main components of a custom training loop are:
- Environment Reset: Start each episode by resetting the environment.
- Action Selection: Use the actor network to select an action based on the current observation.
- Environment Step: Apply the action to the environment (e.g., via sim for Simulink models) and collect the next observation, reward, and done flag.
- Experience Storage: Store the transition (state, action, reward, next state, done) in a replay buffer.
- Learning: Sample mini-batches from the buffer and perform gradient updates on the actor and critic networks.
- Target Updates: Soft-update the target networks (actor and critic) toward the main networks.
- Logging & Evaluation: Track performance (e.g., cumulative reward) and optionally evaluate the agent periodically.
Kindly refer to the following custom training loop as an example.
% Create the agent (actor, critic, and agentOpts as defined above)
agent = rlDDPGAgent(actor, critic, agentOpts);
% Experience buffer (accessing the agent's internal buffer this way depends
% on your toolbox release; alternatively, create your own rlReplayMemory)
buffer = agent.ExperienceBuffer;
% Episode and step limits (same values as the original train setup)
maxEpisodes = 5000;
maxStepsPerEpisode = ceil(Simulation_End_Time/0.01);
% Logging
episodeRewards = zeros(maxEpisodes,1);
% For a Simulink environment, reset/step require the setup/cleanup workflow
% (supported in recent toolbox releases)
setup(env);
% Custom training loop
for episode = 1:maxEpisodes
    % Reset the environment (this calls env.ResetFcn, so the plant mass is
    % re-randomized) and reset the agent's exploration noise
    obs = reset(env);
    reset(agent);
    % Track the episode reward
    totalReward = 0;
    for stepCount = 1:maxStepsPerEpisode
        % Get a (noisy) action from the agent; getAction returns a cell array
        action = getAction(agent, {obs});
        % Step the environment
        [nextObs, reward, isDone, ~] = step(env, action{1});
        % Store the transition using the experience structure format
        % expected by the replay memory
        experience.Observation     = {obs};
        experience.Action          = action;
        experience.Reward          = reward;
        experience.NextObservation = {nextObs};
        experience.IsDone          = isDone;
        append(buffer, experience);
        % Update the actor/critic (and soft-update the targets) once enough
        % samples are available; check your release for the exact
        % agent-update API, a learn-style call is assumed here
        if buffer.NumExperiences >= agentOpts.MiniBatchSize
            learn(agent, buffer);
        end
        % Advance the state and accumulate the reward
        obs = nextObs;
        totalReward = totalReward + reward;
        if isDone
            break;
        end
    end
    % Log the reward
    episodeRewards(episode) = totalReward;
    fprintf("Episode %d: Total Reward = %.2f\n", episode, totalReward);
    % Save the agent periodically
    if mod(episode, 50) == 0
        save(sprintf('agent_episode_%d.mat', episode), 'agent');
    end
end
cleanup(env);
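The component list above also mentions periodic evaluation, which the loop does not show. A minimal sketch of an evaluation rollout, meant to sit inside the episode loop (getGreedyPolicy extracts the noise-free policy; the every-5-episodes cadence mirrors the rlEvaluator settings in the original script), could look like this:
% Hypothetical periodic evaluation: roll out one episode with the greedy
% (noise-free) policy and record its cumulative reward
if mod(episode, 5) == 0
    greedyPolicy = getGreedyPolicy(agent);
    evalObs = reset(env);
    evalReward = 0;
    for k = 1:maxStepsPerEpisode
        evalAction = getAction(greedyPolicy, {evalObs});
        [evalObs, r, evalDone] = step(env, evalAction{1});
        evalReward = evalReward + r;
        if evalDone
            break;
        end
    end
    fprintf("  Evaluation after episode %d: reward = %.2f\n", episode, evalReward);
end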
For more information on the DDPG training algorithm, kindly refer to the DDPG agent documentation in the Reinforcement Learning Toolbox.