reinforcement learning will suddenly stop during the training process
My reinforcement learning training suddenly stops partway through, and the following error appears. Is there any effective way to solve this problem? I would be very grateful for your answer.
The error message: "The derivative of the state in Simulink is not finite, the simulation will stop, and there may be a singularity in the solution."
Accepted Answer
Sam Chak
1 Nov 2023
Hi @嘻嘻
The error message you've encountered in Simulink most commonly indicates a "division by zero" in the derivative of a state variable. As the denominator approaches 0, the quotient approaches either positive or negative infinity. Because the derivative value is not finite, it can lead to numerical instability, and Simulink stops the simulation to prevent incorrect results.
You need to review your model equations and block configurations in Simulink to identify the source of the issue. Look for any potential causes of division by zero, such as trigonometric terms like 1/cos(θ) or tan(θ), which are singular at θ = ±90°.
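As a rough illustration (the function name, signals, and tolerance below are hypothetical, not taken from the original model), such a singularity could be guarded against in a MATLAB Function block like this:
% Minimal sketch: a state derivative with a trigonometric singularity.
% The guard value epsilon is a hypothetical choice, not a Simulink default.
function xdot = stateDerivative(theta, u)
    epsilon = 1e-6;                        % hypothetical tolerance near cos(theta) = 0
    c = cos(theta);
    if abs(c) < epsilon
        c = sign(c + (c == 0)) * epsilon;  % clamp to avoid division by zero
    end
    xdot = u / c;                          % 1/cos(theta) blows up near theta = +/-90 deg
end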
9 Comments
嘻嘻
2 Nov 2023
Moved: Walter Roberson, 2 Nov 2023
Thank you very much for your answer; I will take your suggestion.
I have checked the model and found no problem, but the error still occurs.
"The derivative of the state in Simulink is not finite" can also happen if a state goes to +infinity or -infinity or to NaN. This is not always due to division by zero: it can happen if a state grows without bound.
If ẋ is the state derivative, can you output the state signal x to check whether it grows without bound or not? For example, consider the second-order system ẍ = -kx.
If the RL agent learns k in the negative direction, then the state x will grow without bound. So, you need to introduce some kind of penalty to discourage that learning behavior. This requires some a priori knowledge about the nature of the system.
If the system is a total black box and a complicated high-order nonlinear system, then the learning process will be challenging.
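To make the unbounded-growth scenario concrete, here is a minimal simulation sketch, assuming the hypothetical system ẍ = -kx above with a negatively learned gain k = -0.5 (both the system and the value of k are illustrative, not from the original thread):
% Sketch: second-order system x'' = -k*x written in first-order form.
% With k learned negative (hypothetical k = -0.5), the state diverges.
k = -0.5;                           % hypothetical negatively-learned gain
f = @(t, x) [x(2); -k*x(1)];        % x(1) = x, x(2) = x'
[t, x] = ode45(f, [0 20], [1; 0]);  % unit initial displacement, zero velocity
plot(t, x(:,1)), grid on
xlabel('t'), ylabel('x(t)')         % x(t) grows exponentially for k < 0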
You are right; my controlled object is indeed a black-box model with three inputs and three outputs, and checking the state derivative is difficult.
My controlled object is a state-space model with a 7×7 state matrix.
@嘻嘻,
It seems like the RL agent may have entered an unstable region. Instead of working with a complete black box, I would suggest identifying the 7th-order linear system, if possible, using the frequency response method. This approach requires the System Identification Toolbox.
Once you have an identified nominal model, you can apply LQR to determine the 'nominal' stabilizing input gains. Using the gain matrix, you can 'guide' the RL agent to explore gains in the vicinity of these values, considering relative deviations as a percentage of the nominal values.
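A rough sketch of the identification step might look as follows. The logged input/output records u and y here are random placeholders (substitute your real experiment data), and the sample time Ts is hypothetical; the magic(7) matrix below then stands in for the identified A:
% Sketch: identify a 7th-order, 3-input/3-output state-space model
% from logged data (requires the System Identification Toolbox).
Ts = 0.01;                   % hypothetical sample time
N  = 1000;
u  = randn(N, 3);            % placeholder input log (N-by-3); use real data
y  = randn(N, 3);            % placeholder output log (N-by-3); use real data
data = iddata(y, u, Ts);     % package the experiment data
sys  = ssest(data, 7);       % estimate a 7th-order state-space model
[A, B, C, D] = ssdata(sys);  % extract matrices for the LQR step below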
A = magic(7) % hypothetical nominal state matrix identified
A = 7×7
30 39 48 1 10 19 28
38 47 7 9 18 27 29
46 6 8 17 26 35 37
5 14 16 25 34 36 45
13 15 24 33 42 44 4
21 23 32 41 43 3 12
22 31 40 49 2 11 20
B = [zeros(4, 3); eye(3)] % 3 inputs
B = 7×3
0 0 0
0 0 0
0 0 0
0 0 0
1 0 0
0 1 0
0 0 1
rk = rank(ctrb(A, B)); % rank of controllability matrix
ToF = logical(rk == length(A)) % check if true (1), the system is controllable
ToF = logical
1
% Then, you can apply LQR to find the stabilizing input gains
Q = eye(7); % <-- to be designed
R = eye(3); % <-- to be designed
K = lqr(A, B, Q, R) % stabilizing input gain matrix
K = 3×7
1.0e+03 *
-0.0738 0.7898 -0.2982 0.1553 0.1512 0.1311 0.0790
0.1511 1.6302 -0.4304 0.0588 0.1311 0.1946 0.1711
0.3558 0.8166 0.0559 0.0348 0.0790 0.1711 0.2302
Thank you very much. I have obtained the gain matrix K using your method, but I don't know how to proceed with the next step. Could you tell me more about it?
Hi @嘻嘻
From an optimal control perspective, you can guide the search direction of the RL agent by building the known nominal values into the performance cost function, along with any constraints.
I'm still learning about RL, but my colleagues who work with RL have recommended using the 'generateRewardFunction()' command to design a custom cost function that fits your application. You can find an example at this link:
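As one possible way to realize the "explore in the vicinity of the nominal gains" idea above, here is a sketch, assuming the agent's action delta is a relative deviation clamped to ±20% of the nominal K; the bound, the function name, and the quadratic reward weighting are all hypothetical design choices, not the toolbox's prescribed workflow:
% Sketch (save as guidedAction.m): wrap the nominal LQR gains K0 so the
% RL action delta is a relative deviation around nominal. K0 (3x7),
% Q (7x7), and R (3x3) come from the LQR design above; the +/-20% bound
% is a hypothetical choice.
function [u, r] = guidedAction(x, delta, K0, Q, R)
    delta = max(min(delta, 0.2), -0.2);  % clamp element-wise deviation to +/-20%
    K = K0 .* (1 + delta);               % perturbed gain matrix
    u = -K * x;                          % state-feedback control input
    r = -(x.'*Q*x + u.'*R*u);            % reward = negative quadratic cost
end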
More Answers (0)