Is it possible to set the second element of rlNumericSpec to something other than 1?

9 views (last 30 days)
Aysegul Kahraman on 1 Apr 2022
Answered: Shubham on 19 Jan 2024
Hi all,
The observation and action specifications seem to only take dimensions of the form [number of obs/actions 1]. As an example from https://se.mathworks.com/help/reinforcement-learning/ug/quadruped-robot-locomotion-using-ddpg-gent.html:
obsInfo = rlNumericSpec([numObs 1]);
obsInfo.Name = 'observations';
numAct = 8;
actInfo = rlNumericSpec([numAct 1],'LowerLimit',-1,'UpperLimit', 1);
actInfo.Name = 'torque';
It seems the second element is always 1, representing a single time step (in every example, even the tank scheduling one). My problem is defined with a single action, but I want the agent to return something like a 1x24 vector: it is a scheduling problem, and I do not only want good control during the simulation, I also want a vector that gives me a good control horizon, like MPC does. Based on my problem, it made sense to increase this second element to a larger number (such as 24). However, I have not seen any documentation showing that this is actually doable.
The documentation only seems to cover the scalar case, with the first element defining the dimension of the action/observation.
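For concreteness, here is roughly what I tried, a minimal sketch where the 24-step horizon comes from my own problem (the name 'schedule' is just a placeholder):
% Attempt: one action per step of a 24-step horizon
numAct = 1;
actInfo = rlNumericSpec([numAct 24],'LowerLimit',-1,'UpperLimit',1);
actInfo.Name = 'schedule';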
While creating the critic network, this directly caused an error, as one might expect:
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},criticOpts);
because of an error in rl.representation.rlAbstractRepresentation/validateModelInputDimension:
Model input sizes must match the dimensions specified in the corresponding observation and action info specifications.
It does not seem like this is possible to change, but before stating anything with certainty I would be really happy to get some advice.

Answers (1)

Shubham on 19 Jan 2024
Hi Aysegul,
In reinforcement learning (RL), particularly when using MATLAB's Reinforcement Learning Toolbox, the observation and action spaces are defined by their respective specifications (rlNumericSpec in this case). The dimension vector passed to rlNumericSpec describes the shape of the signal exchanged at each time step: [numObs 1] defines a column vector of numObs values, so the second element is 1 in typical control tasks, where the agent outputs one action vector and receives one observation vector per step.
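For instance, a standard per-step spec and a matrix-valued alternative can be created side by side (a minimal sketch; the 8 actions come from the quadruped example above and the 24-column horizon is a placeholder):
% One 8-element action vector per time step (the standard setup)
actInfoStep = rlNumericSpec([8 1],'LowerLimit',-1,'UpperLimit',1);
% An 8-by-24 matrix of actions per step; rlNumericSpec accepts this,
% but the environment and networks must be built to match it
actInfoHorizon = rlNumericSpec([8 24],'LowerLimit',-1,'UpperLimit',1);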
However, for problems like scheduling where you want to output a sequence of actions at once (as in Model Predictive Control, MPC), you are interested in a policy that outputs a vector of actions representing decisions over a whole horizon. This differs from the standard setup in most RL environments, where the action at each time step is decided based only on the current state.
To implement this in MATLAB, you need to consider the following:
  1. Action Space Modification: You can define your action space to be a vector of actions, but you need to ensure that the RL algorithm and the environment can handle this structure. This is not a common setup for standard RL algorithms, which expect to output a single action at each time step.
  2. Custom Environment: You might need to create a custom environment that can interpret a vector of actions and apply them over multiple time steps. This means that the step function of your environment must be capable of handling a sequence of actions as input and computing the state transitions accordingly (see the sketch after this list).
  3. Custom Algorithm: Since standard RL algorithms are not designed for outputting a sequence of actions, you may need to modify an existing algorithm or develop a new one that can work with action sequences. This could involve significant changes to the way the policy is learned and how the value function is estimated.
  4. Critic Network: The critic network will also need to be designed to accept the modified action space. This may involve changing the network architecture so that it can process the sequence of actions.
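As a rough illustration of point 2, here is a minimal sketch of a step function for rlFunctionEnv that accepts a whole 1-by-24 action vector and rolls it out internally. Note that myPlantModel and myResetFunction are placeholders for your own dynamics and initialization, and the reward is simply the negative accumulated stage cost:
function [nextObs,reward,isDone,loggedSignals] = stepWithHorizon(action,loggedSignals)
% action is 1-by-24: one scheduling decision per step of the horizon
state = loggedSignals.State;
reward = 0;
for k = 1:numel(action)
    % myPlantModel is a placeholder: it returns the next state and the
    % stage cost of applying action(k) in the current state
    [state,stageCost] = myPlantModel(state,action(k));
    reward = reward - stageCost;
end
loggedSignals.State = state;
nextObs = state;
isDone = false;   % add your own termination condition here
end
You would then build the environment with env = rlFunctionEnv(obsInfo,actInfo,@stepWithHorizon,@myResetFunction), where actInfo has the matrix dimensions discussed above.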
