set a maximum training time for training a PPO agent

Question

Danial Kazemikia on 30 Jul 2024


https://jp.mathworks.com/matlabcentral/answers/2141271-set-a-maximum-training-time-for-training-a-ppo-agent

Answered: Satwik on 30 Jul 2024

While training a PPO RL agent, how can I make the code check the elapsed time and stop training once it exceeds a desired threshold? For example, suppose I want to stop training after a maximum of 30 minutes (1800 seconds).


Answer 1

Satwik on 30 Jul 2024


https://jp.mathworks.com/matlabcentral/answers/2141271-set-a-maximum-training-time-for-training-a-ppo-agent#answer_1492261


Hi,

I understand that you want to limit the training of your PPO agent based on elapsed time. As far as I know, and according to the documentation for 'rlTrainingOptions', none of the predefined values of the 'StopTrainingCriteria' property stops training on elapsed wall-clock time. A possible workaround is to set 'StopTrainingCriteria' to 'Custom' and specify 'StopTrainingValue' as the name or handle of a function that returns true when training should stop. Note that the 'Custom' criterion may not be available in older releases, so please check the documentation for your version:

https://www.mathworks.com/help/reinforcement-learning/ref/rl.option.rltrainingoptions.html

Below is an example code snippet demonstrating this approach:

% Set the maximum training time (in seconds)
maxTrainingTime = 1800; % 30 minutes
% Start the timer and build the custom stop function handle. The handle
% captures the timer and the limit, so the function does not rely on
% persistent state or on variables that are out of its scope.
trainingTimer = tic;
stopFcn = @(trainingStats) customStopFunction(trainingStats, trainingTimer, maxTrainingTime);
% Set training options
trainingOptions = rlTrainingOptions(...
    'MaxEpisodes', 10000, ...
    'MaxStepsPerEpisode', 500, ...
    'ScoreAveragingWindowLength', 100, ...
    'Verbose', true, ...
    'Plots', 'training-progress', ...
    'StopOnError', 'off', ...
    'SaveAgentCriteria', 'EpisodeReward', ...
    'SaveAgentValue', 500, ...
    'StopTrainingCriteria', 'Custom', ...
    'StopTrainingValue', stopFcn);
% Train the agent ('env' is the defined environment and 'agent' is the
% rlPPOAgent). The stop criterion is checked between episodes, so training
% can overrun the limit by up to one episode.
trainStats = train(agent, env, trainingOptions);
% Custom stop function. In a script, place local functions at the end of
% the file, or save this function as customStopFunction.m.
function stopTraining = customStopFunction(~, trainingTimer, maxTrainingTime)
    elapsedTime = toc(trainingTimer);
    stopTraining = elapsedTime > maxTrainingTime;
    if stopTraining
        disp(['Stopping training after ', num2str(elapsedTime), ' seconds.']);
    end
end
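
Before starting a long training run, it can help to check the time-limit logic on its own. The following is a minimal sketch, assuming a very short limit and an empty struct in place of the training information that the toolbox would pass in; 'stopFcn', 'dummyStats', 'firstCheck' and 'secondCheck' are hypothetical names used only for this illustration.

```matlab
% Sketch only: exercises the elapsed-time check without running train().
maxSeconds = 0.2;                          % short limit so the check runs quickly
timerStart = tic;                          % start the timer
stopFcn = @(~) toc(timerStart) > maxSeconds;   % ignores its input, checks elapsed time

dummyStats = struct();                     % placeholder for the training-info input
firstCheck = stopFcn(dummyStats);          % limit not reached yet
pause(0.5);                                % simulate training time passing
secondCheck = stopFcn(dummyStats);         % limit exceeded

fprintf('Before pause: %d, after pause: %d\n', firstCheck, secondCheck);
```

If the two checks differ as expected, the same kind of handle (with a realistic limit such as 1800 seconds) can be passed as 'StopTrainingValue' together with 'StopTrainingCriteria' set to 'Custom', provided your release supports that criterion.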

The 'customStopFunction' uses the 'tic' and 'toc' functions to measure the elapsed time; please refer to their documentation pages for details.
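
As a small illustration of how 'tic' and 'toc' behave, the sketch below starts two independent timers and reads each one separately. Passing the value returned by 'tic' to 'toc' measures time from that specific start point, which is why the stop function should use a stored timer value rather than a bare 'toc'. The variable names and pause durations are arbitrary choices for this example.

```matlab
% Each call to tic returns its own timer value.
outerTimer = tic;
pause(0.2);
innerTimer = tic;
pause(0.1);

innerElapsed = toc(innerTimer);   % seconds since innerTimer was started
outerElapsed = toc(outerTimer);   % seconds since outerTimer was started

fprintf('inner: %.2f s, outer: %.2f s\n', innerElapsed, outerElapsed);
```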

I hope this gives you a direction for taking the next steps to achieve the desired result.
