Data to train RL agent (PPO)

Question

Sourabh on 8 Jun 2024

Direct link to this question: https://jp.mathworks.com/matlabcentral/answers/2126676-data-to-train-rl-agent-ppo

Answered: Shivansh on 20 Jun 2024

I have two arrays, each of size 8001x2: one is the input and the other is the output.

Can I use these two arrays to train my RL agent (a PPO agent)?

I saw the example on the MathWorks site that trains an RL agent from data, but that data contains states, actions, rewards, and all the other information as well. Is it not possible to train my RL agent with just the input and output arrays?

2 Comments

Ayush Aniket on 12 Jun 2024

Hi Sourabh,

Can you elaborate on the problem you are trying to solve using your dataset?

From your description, it seems that you already have output data that you want a machine learning model to learn from the input. This falls under the category of supervised learning, and MATLAB provides several other functions for that kind of task.

RL works differently from supervised learning: the training data has to take the form of experience, so that the model (the RL agent) learns by interacting with an environment and observing its response.
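To illustrate what "learning the output from the input" looks like as a supervised problem, here is a minimal sketch in Python (chosen only for illustration; the same idea applies in MATLAB). It assumes one input column and one output column, and the data and the helper name `fit_line` are made up for the example, not taken from the question.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y ~ w*x + b to paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


# Synthetic stand-in for the recorded arrays (assumption, not the real data).
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
outputs = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

w, b = fit_line(inputs, outputs)
print(w, b)
```

Note that nothing here involves actions or rewards: each sample is just an (input, label) pair, which is why this kind of data fits supervised learning rather than an RL agent.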

Sourabh on 12 Jun 2024

I was using a tf (transfer function) model before, and now I want to use the output data that I obtained by applying a step input to the same tf model. That is the only difference.

Can't I replace the tf model with data from that same tf model?
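One point worth making concrete: a logged step response is a single trajectory for one particular input signal, whereas PPO is an on-policy method and needs an environment that responds to whatever action the current policy chooses. A possible workaround, sketched below in Python for illustration only, is to identify a model from the logged data and use that as a surrogate environment. The first-order discrete structure `y[k+1] = a*y[k] + b*u[k]`, the synthetic log, and all function names are assumptions made for this example; real data may need a different model structure, and a step input alone may not excite the system enough to identify a higher-order model.

```python
def simulate(a, b, u_seq, y0=0.0):
    """Roll out y[k+1] = a*y[k] + b*u[k] and return [y0, y1, ...]."""
    ys = [y0]
    for u in u_seq:
        ys.append(a * ys[-1] + b * u)
    return ys


def fit_first_order(u_seq, y_seq):
    """Least-squares estimate of (a, b) from logged u[k] and y[k]."""
    # Accumulate the 2x2 normal equations for regressors (y[k], u[k])
    # and target y[k+1].
    syy = syu = suu = syt = sut = 0.0
    for k, u in enumerate(u_seq):
        y = y_seq[k]
        target = y_seq[k + 1]
        syy += y * y
        syu += y * u
        suu += u * u
        syt += y * target
        sut += u * target
    det = syy * suu - syu * syu
    a = (syt * suu - sut * syu) / det
    b = (sut * syy - syt * syu) / det
    return a, b


# Synthetic "step response log" standing in for the recorded arrays.
true_a, true_b = 0.9, 0.1
u_log = [1.0] * 50                      # unit step input
y_log = simulate(true_a, true_b, u_log)

a_hat, b_hat = fit_first_order(u_log, y_log)


def surrogate_step(y, u):
    """Surrogate environment transition: works for any action u."""
    return a_hat * y + b_hat * u


# The identified model can answer "what if" questions the log cannot,
# e.g. an input of 0.5 that never appears in the recorded data.
print(surrogate_step(1.0, 0.5))
```

The design choice here is that the data is used to rebuild a model, and the agent then interacts with that model, rather than the raw arrays being fed to the agent directly.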

Answer 1

Shivansh on 20 Jun 2024

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/2126676-data-to-train-rl-agent-ppo#answer_1474781

Hi Sourabh,

The Proximal Policy Optimization (PPO) agent, like any other reinforcement learning agent and unlike a supervised learning model, requires explicit states, actions, and rewards.

You can try to extract reward and action information from your arrays, but that might require customising the entire reinforcement learning setup, and you might not be able to use any of the existing example models.
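As a rough sketch of what such an extraction could look like, the Python snippet below (illustrative only) pairs logged samples into (state, action, reward, next state) tuples. It assumes the input array holds the actions and the output array holds the observed states, and it invents a reward (negative squared error to a setpoint), since no reward exists in the recorded data. The function name `build_transitions` and the sample values are hypothetical.

```python
def build_transitions(actions, observations, setpoint):
    """Pair logged samples into (state, action, reward, next_state) tuples.

    observations must have one more entry than actions, because each
    action moves the system from observations[k] to observations[k + 1].
    """
    transitions = []
    for k, action in enumerate(actions):
        state = observations[k]
        next_state = observations[k + 1]
        # Hypothetical reward: penalise distance from the setpoint.
        reward = -(setpoint - next_state) ** 2
        transitions.append((state, action, reward, next_state))
    return transitions


# Made-up log: a unit step input and the response it produced.
actions = [1.0, 1.0, 1.0]
observations = [0.0, 0.5, 0.75, 0.875]

transitions = build_transitions(actions, observations, setpoint=1.0)
for t in transitions:
    print(t)
```

Even with tuples like these, the log only describes one fixed behaviour. PPO normally collects fresh experience from its current policy, so a fixed set of transitions on its own may not be enough to train it.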

I recommend using reinforcement learning only after properly modelling the problem for the desired results. If you have inputs and labels, you can try a supervised learning model instead.

I hope it helps!
