Backslash (\) operator became much slower when updating from 2015b to 2019a

Question

Robert Schiemann 2019 年 7 月 19 日

3
リンク

この質問への直接リンク

https://jp.mathworks.com/matlabcentral/answers/472485-backslash-operator-became-much-slower-when-updating-from-2015b-to-2019a

回答済み: Robert Schiemann 2019 年 7 月 22 日

採用された回答: Christine Tobler

MATLAB Online で開く

Hello,

we recently updated our pretty old Matlab version from 2015b to 2019a, also due to hope for increased code performance.

A big simulation code of mine, which solves a lot of nonlinear (and hence also linear) systems, is now much slower than in the old 2019b version. Using the profiler, I was able to quickly find the root cause of the issue: The backslash operator, at least in my case, became slower by a factor of approximately 2.

My code is simply:

dx = -jac\f;

In my case, the matrix (called jac) is a 1438x1438 sparse matrix, the spy plot looks as follows:

The vector f is a non-sparse 1438x1 vector.

Doing (repeated and averaged) timing tests with the following code:

timeit(@() -jac\f)

yields

0.0054 on Matlab 2015b, and

0.012 on Matlab 2019a.

What is going on here? Has the algorithm selection in mldivide/backslash operator changed in the recent years? How can I get back to the old performance?

Trying to reproduce the problem in a more general and simple setting, I composed the following code:

Adense = diag(ones(4e3,1));
Asparse = sparse(Adense);
b = rand(4e3,1);
Duration_dense = timeit(@() -Adense\b)
Duration_sparse = timeit(@() -Asparse\b)

On my machine, this yields for Matlab 2019b:

Duration_dense =
    0.0823
Duration_sparse =
   3.6871e-05

In Matlab 2015b, I get the following for the same code as above:

Duration_dense =
    0.0813
Duration_sparse =
   3.8327e-05

I think this proves that in both the old and new versions, sparse matrices are correctly recognized as such (anything else would have been shocking). The issue might then be related to the specific sparsity pattern of my matrix.

Does anyone have an idea what is going on here?

Thank you!

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

サインインしてコメントする。

サインインしてこの質問に回答する。

Answer 1

Christine Tobler 2019 年 7 月 19 日

1
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/472485-backslash-operator-became-much-slower-when-updating-from-2015b-to-2019a#answer_383988

MATLAB Online で開く

Thank you for these logs. The difference in behavior comes from the following message, saying that we repeat the factorization with different options:

sp\: However, UMFPACK may have failed to produce a correct factorization
     possibly due to aggressive pivot tolerances or because the problem
     is ill-conditioned.
sp\: These tolerances were:
	Pivot Tolerance:           0.1
	Diagonal Pivot Tolerance:  0.001
We will factor the matrix again but use standard partial pivoting.

This is done because we heuristically determined that it's likely for the result of the first computation not to be accurate enough. This tolerance was tightened recently, which would be why it's now showing up for this matrix.

There are parameters in spparms that can affect the behavior of the sparse LU solve ('piv_tol' and 'sym_tol') which may be changed to avoid this behavior. You could use spparms to check if this works.

Note these options will also affect the accuracy of the result, so it's a good idea to check the residual while you are modifying the parameters, and to, if possible, only use modified spparms settings in the context where you need them. These are global variables that will affect the behavior of every call to backslash in a MATLAB session.

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

サインインしてコメントする。

Answer 2

Christine Tobler 2019 年 7 月 19 日

1
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/472485-backslash-operator-became-much-slower-when-updating-from-2015b-to-2019a#answer_383973

Based on the spy plot, I would be astonished if a different method was selected between the two releases - I think the LU solver is the only viable candidate for this matrix.

However, there are several options to the LU solver that are chosen heuristically, and we have adusted some of them. The best way to see what is going on would be to use the command

>> spparms('spumoni', 2)

and then call backslash. This will show diagnostic messages about what methods and options are selected in each version. Could you run this in both releases and post the diagnostic messages here, please?

1 件のコメント
-1 件の古いコメントを表示-1 件の古いコメントを非表示

Robert Schiemann 2019 年 7 月 19 日

MATLAB Online で開く

Thank you for your answer.

Below are the messages as requested when the commands are run in Matlab 2019a:

sp\: bandwidth = 1061+1+956.
sp\: is A diagonal? no.
sp\: is band density (0.00347406) > bandden (0.5) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive or negative diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKagewith automatic reordering.
UMFPACK V5.4.0 (May 20, 2009), Control:
    Matrix entry defined as: double
    Int (generic integer) defined as: UF_long
    0: print level: 2
    1: dense row parameter:    0.2
        "dense" rows have    > max (16, (0.2)*16*sqrt(n_col) entries)
    2: dense column parameter: 0.2
        "dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
    3: pivot tolerance: 0.1
    4: block size for dense matrix kernels: 32
    5: strategy: 0 (auto)
    6: initial allocation ratio: 0.7
    7: max iterative refinement steps: 2
    12: 2-by-2 pivot tolerance: 0.01
    13: Q fixed during numerical factorization: 0 (auto)
    14: AMD dense row/col parameter:    10
       "dense" rows/columns have > max (16, (10)*sqrt(n)) entries
        Only used if the AMD ordering is used.
    15: diagonal pivot tolerance: 0.001
        Only used if diagonal pivoting is attempted.
    16: scaling: 1 (divide each row by sum of abs. values in each row)
    17: frontal matrix allocation ratio: 0.5
    18: drop tolerance: 0
    19: AMD and COLAMD aggressive absorption: 1 (yes)
    The following options can only be changed at compile-time:
    8: BLAS library used:  Fortran BLAS.  size of BLAS integer: 8
    9: compiled for MATLAB
    10: no CPU timer 
    11: compiled for normal operation (debugging disabled)
    computer/operating system: Microsoft Windows
    size of int: 4 UF_long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)
sp\: UMFPACK's factorization was successful.
sp\: However, UMFPACK may have failed to produce a correct factorization
     possibly due to aggressive pivot tolerances or because the problem
     is ill-conditioned.
sp\: These tolerances were:
	Pivot Tolerance:           0.1
	Diagonal Pivot Tolerance:  0.001
We will factor the matrix again but use standard partial pivoting.
sp\: UMFPACK's solve was successful.
UMFPACK V5.4.0 (May 20, 2009), Info:
    matrix entry defined as:          double
    Int (generic integer) defined as: UF_long
    BLAS library used: Fortran BLAS.  size of BLAS integer: 8
    MATLAB:                           yes.
    CPU timer:                        none.
    number of rows in matrix A:       1450
    number of columns in matrix A:    1450
    entries in matrix A:              6619
    memory usage reported in:         16-byte Units
    size of int:                      4 bytes
    size of UF_long:                  8 bytes
    size of pointer:                  8 bytes
    size of numerical entry:          8 bytes
    strategy used:                    unsymmetric
    ordering used:                    colamd on A
    modify Q during factorization:    yes
    prefer diagonal pivoting:         no
    pivots with zero Markowitz cost:               232
    submatrix S after removing zero-cost pivots:
        number of "dense" rows:                    0
        number of "dense" columns:                 0
        number of empty rows:                      0
        number of empty columns                    0
        submatrix S not square or diagonal not preserved
    symbolic factorization defragmentations:       1
    symbolic memory usage (Units):                 34264
    symbolic memory usage (MBytes):                0.5
    Symbolic size (Units):                         4814
    Symbolic size (MBytes):                        0
    symbolic factorization CPU time (sec):         0.00
    symbolic factorization wallclock time(sec):    0.00
    matrix scaled: yes (divided each row by sum of abs values in each row)
    minimum sum (abs (rows of A)):              8.86738e-02
    maximum sum (abs (rows of A)):              2.61614e+11
    symbolic/numeric factorization:      upper bound               actual      %
    variable-sized part of Numeric object:
        initial size (Units)                   30001                28551    95%
        peak size (Units)                      60885                30631    50%
        final size (Units)                     28674                15454    54%
    Numeric final size (Units)                 36685                22740    62%
    Numeric final size (MBytes)                  0.6                  0.3    62%
    peak memory usage (Units)                  81399                51145    63%
    peak memory usage (MBytes)                   1.2                  0.8    63%
    numeric factorization flops          1.04220e+06          3.57740e+05    34%
    nz in L (incl diagonal)                    15104                11016    73%
    nz in U (incl diagonal)                    28503                13536    47%
    nz in L+U (incl diagonal)                  42157                23102    55%
    largest front (# entries)                   3965                 1488    38%
    largest # rows in front                       61                   40    66%
    largest # columns in front                    65                   52    80%
    initial allocation ratio used:                 0.7
    # of forced updates due to frontal growth:     14
    nz in L (incl diagonal), if none dropped       11016
    nz in U (incl diagonal), if none dropped       13536
    number of small entries dropped                0
    nonzeros on diagonal of U:                     1450
    min abs. value on diagonal of U:               1.82e-07
    max abs. value on diagonal of U:               1.52e+00
    estimate of reciprocal of condition number:    1.19e-07
    indices in compressed pattern:                 5184
    numerical values stored in Numeric object:     23122
    numeric factorization defragmentations:        1
    numeric factorization reallocations:           1
    costly numeric factorization reallocations:    1
    numeric factorization CPU time (sec):          0.00
    numeric factorization wallclock time (sec):    0.00
    solve flops:                                   2.67528e+05
    iterative refinement steps taken:              2
    iterative refinement steps attempted:          2
    sparse backward error omega1:                  1.77e-16
    sparse backward error omega2:                  3.05e-26
    solve CPU time (sec):                          0.00
    solve wall clock time (sec):                   0.00
    total symbolic + numeric + solve flops:        6.25268e+05

In Matlab 2015b, they read:

    sp\: bandwidth = 1061+1+956.
sp\: is A diagonal? no.
sp\: is band density (0.00) > bandden (0.50) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKage with automatic reordering.
UMFPACK V5.4.0 (May 20, 2009), Control:
    Matrix entry defined as: double
    Int (generic integer) defined as: UF_long
    0: print level: 2
    1: dense row parameter:    0.2
        "dense" rows have    > max (16, (0.2)*16*sqrt(n_col) entries)
    2: dense column parameter: 0.2
        "dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
    3: pivot tolerance: 0.1
    4: block size for dense matrix kernels: 32
    5: strategy: 0 (auto)
    6: initial allocation ratio: 0.7
    7: max iterative refinement steps: 2
    12: 2-by-2 pivot tolerance: 0.01
    13: Q fixed during numerical factorization: 0 (auto)
    14: AMD dense row/col parameter:    10
       "dense" rows/columns have > max (16, (10)*sqrt(n)) entries
        Only used if the AMD ordering is used.
    15: diagonal pivot tolerance: 0.001
        Only used if diagonal pivoting is attempted.
    16: scaling: 1 (divide each row by sum of abs. values in each row)
    17: frontal matrix allocation ratio: 0.5
    18: drop tolerance: 0
    19: AMD and COLAMD aggressive absorption: 1 (yes)
    The following options can only be changed at compile-time:
    8: BLAS library used:  Fortran BLAS.  size of BLAS integer: 8
    9: compiled for MATLAB
    10: CPU timer is ANSI C clock (may wrap around).
    11: compiled for normal operation (debugging disabled)
    computer/operating system: Microsoft Windows
    size of int: 4 UF_long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)
sp\: UMFPACK's factorization was successful.
sp\: UMFPACK's solve was successful.
UMFPACK V5.4.0 (May 20, 2009), Info:
    matrix entry defined as:          double
    Int (generic integer) defined as: UF_long
    BLAS library used: Fortran BLAS.  size of BLAS integer: 8
    MATLAB:                           yes.
    CPU timer:                        ANSI clock ( ) routine.
    number of rows in matrix A:       1450
    number of columns in matrix A:    1450
    entries in matrix A:              6619
    memory usage reported in:         16-byte Units
    size of int:                      4 bytes
    size of UF_long:                  8 bytes
    size of pointer:                  8 bytes
    size of numerical entry:          8 bytes
    strategy used:                    unsymmetric
    ordering used:                    colamd on A
    modify Q during factorization:    yes
    prefer diagonal pivoting:         no
    pivots with zero Markowitz cost:               232
    submatrix S after removing zero-cost pivots:
        number of "dense" rows:                    0
        number of "dense" columns:                 0
        number of empty rows:                      0
        number of empty columns                    0
        submatrix S not square or diagonal not preserved
    symbolic factorization defragmentations:       1
    symbolic memory usage (Units):                 34264
    symbolic memory usage (MBytes):                0.5
    Symbolic size (Units):                         4814
    Symbolic size (MBytes):                        0
    symbolic factorization CPU time (sec):         0.00
    symbolic factorization wallclock time(sec):    0.00
    matrix scaled: yes (divided each row by sum of abs values in each row)
    minimum sum (abs (rows of A)):              8.86738e-02
    maximum sum (abs (rows of A)):              2.61614e+11
    symbolic/numeric factorization:      upper bound               actual      %
    variable-sized part of Numeric object:
        initial size (Units)                   30001                28551    95%
        peak size (Units)                      60885                29416    48%
        final size (Units)                     28674                14640    51%
    Numeric final size (Units)                 36685                21926    60%
    Numeric final size (MBytes)                  0.6                  0.3    60%
    peak memory usage (Units)                  81399                49930    61%
    peak memory usage (MBytes)                   1.2                  0.8    61%
    numeric factorization flops          1.04220e+06          2.97058e+05    29%
    nz in L (incl diagonal)                    15104                10414    69%
    nz in U (incl diagonal)                    28503                12523    44%
    nz in L+U (incl diagonal)                  42157                21487    51%
    largest front (# entries)                   3965                 1394    35%
    largest # rows in front                       61                   37    61%
    largest # columns in front                    65                   48    74%
    initial allocation ratio used:                 0.7
    # of forced updates due to frontal growth:     10
    nz in L (incl diagonal), if none dropped       10414
    nz in U (incl diagonal), if none dropped       12523
    number of small entries dropped                0
    nonzeros on diagonal of U:                     1450
    min abs. value on diagonal of U:               1.82e-07
    max abs. value on diagonal of U:               8.19e+00
    estimate of reciprocal of condition number:    2.22e-08
    indices in compressed pattern:                 5403
    numerical values stored in Numeric object:     21316
    numeric factorization defragmentations:        0
    numeric factorization reallocations:           0
    costly numeric factorization reallocations:    0
    numeric factorization CPU time (sec):          0.00
    numeric factorization wallclock time (sec):    0.00
    numeric factorization mflops (CPU time):       74.26
    symbolic + numeric CPU time (sec):             0.00
    symbolic + numeric mflops (CPU time):          59.41
    solve flops:                                   1.70169e+05
    iterative refinement steps taken:              1
    iterative refinement steps attempted:          1
    sparse backward error omega1:                  1.84e-16
    sparse backward error omega2:                  2.84e-26
    solve CPU time (sec):                          0.00
    solve wall clock time (sec):                   0.00
    total symbolic + numeric + solve flops:        4.67227e+05

At first glance, I see that in the 2019a version, there seems to be an issue with UMFPACK factorization, while in the 2015a version, there is not. Maybe there are other indications hidden in the logs.

サインインしてコメントする。

Answer 3

Robert Schiemann 2019 年 7 月 22 日

0
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/472485-backslash-operator-became-much-slower-when-updating-from-2015b-to-2019a#answer_384175

Thank you for your answers. Your workarounds helped, but instead of putting them to productive use, I spent a little effort to actually make my matrix better conditioned. That also lead to the result of the factorization not being redone.

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

サインインしてコメントする。

Backslash (\) operator became much slower when updating from 2015b to 2019a

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

採用された回答

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

その他の回答 (2 件)

1 件のコメント
-1 件の古いコメントを表示-1 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

Backslash (\) operator became much slower when updating from 2015b to 2019a

0 件のコメント -2 件の古いコメントを表示-2 件の古いコメントを非表示

採用された回答

0 件のコメント -2 件の古いコメントを表示-2 件の古いコメントを非表示

その他の回答 (2 件)

1 件のコメント -1 件の古いコメントを表示-1 件の古いコメントを非表示

0 件のコメント -2 件の古いコメントを表示-2 件の古いコメントを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

1 件のコメント
-1 件の古いコメントを表示-1 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示