Backslash (\) operator became much slower when updating from 2015b to 2019a

8 ビュー (過去 30 日間)
Robert Schiemann
Robert Schiemann 2019 年 7 月 19 日
回答済み: Robert Schiemann 2019 年 7 月 22 日
Hello,
we recently updated our pretty old Matlab version from 2015b to 2019a, also due to hope for increased code performance.
A big simulation code of mine, which solves a lot of nonlinear (and hence also linear) systems, is now much slower than in the old 2019b version. Using the profiler, I was able to quickly find the root cause of the issue: The backslash operator, at least in my case, became slower by a factor of approximately 2.
My code is simply:
dx = -jac\f;
In my case, the matrix (called jac) is a 1438x1438 sparse matrix, the spy plot looks as follows:
spy.PNG
The vector f is a non-sparse 1438x1 vector.
Doing (repeated and averaged) timing tests with the following code:
timeit(@() -jac\f)
yields
0.0054 on Matlab 2015b, and
0.012 on Matlab 2019a.
What is going on here? Has the algorithm selection in mldivide/backslash operator changed in the recent years? How can I get back to the old performance?
Trying to reproduce the problem in a more general and simple setting, I composed the following code:
Adense = diag(ones(4e3,1));
Asparse = sparse(Adense);
b = rand(4e3,1);
Duration_dense = timeit(@() -Adense\b)
Duration_sparse = timeit(@() -Asparse\b)
On my machine, this yields for Matlab 2019b:
Duration_dense =
0.0823
Duration_sparse =
3.6871e-05
In Matlab 2015b, I get the following for the same code as above:
Duration_dense =
0.0813
Duration_sparse =
3.8327e-05
I think this proves that in both the old and new versions, sparse matrices are correctly recognized as such (anything else would have been shocking). The issue might then be related to the specific sparsity pattern of my matrix.
Does anyone have an idea what is going on here?
Thank you!

採用された回答

Christine Tobler
Christine Tobler 2019 年 7 月 19 日
Thank you for these logs. The difference in behavior comes from the following message, saying that we repeat the factorization with different options:
sp\: However, UMFPACK may have failed to produce a correct factorization
possibly due to aggressive pivot tolerances or because the problem
is ill-conditioned.
sp\: These tolerances were:
Pivot Tolerance: 0.1
Diagonal Pivot Tolerance: 0.001
We will factor the matrix again but use standard partial pivoting.
This is done because we heuristically determined that it's likely for the result of the first computation not to be accurate enough. This tolerance was tightened recently, which would be why it's now showing up for this matrix.
There are parameters in spparms that can affect the behavior of the sparse LU solve ('piv_tol' and 'sym_tol') which may be changed to avoid this behavior. You could use spparms to check if this works.
Note these options will also affect the accuracy of the result, so it's a good idea to check the residual while you are modifying the parameters, and to, if possible, only use modified spparms settings in the context where you need them. These are global variables that will affect the behavior of every call to backslash in a MATLAB session.

その他の回答 (2 件)

Christine Tobler
Christine Tobler 2019 年 7 月 19 日
Based on the spy plot, I would be astonished if a different method was selected between the two releases - I think the LU solver is the only viable candidate for this matrix.
However, there are several options to the LU solver that are chosen heuristically, and we have adusted some of them. The best way to see what is going on would be to use the command
>> spparms('spumoni', 2)
and then call backslash. This will show diagnostic messages about what methods and options are selected in each version. Could you run this in both releases and post the diagnostic messages here, please?
  1 件のコメント
Robert Schiemann
Robert Schiemann 2019 年 7 月 19 日
Thank you for your answer.
Below are the messages as requested when the commands are run in Matlab 2019a:
sp\: bandwidth = 1061+1+956.
sp\: is A diagonal? no.
sp\: is band density (0.00347406) > bandden (0.5) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive or negative diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKagewith automatic reordering.
UMFPACK V5.4.0 (May 20, 2009), Control:
Matrix entry defined as: double
Int (generic integer) defined as: UF_long
0: print level: 2
1: dense row parameter: 0.2
"dense" rows have > max (16, (0.2)*16*sqrt(n_col) entries)
2: dense column parameter: 0.2
"dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
3: pivot tolerance: 0.1
4: block size for dense matrix kernels: 32
5: strategy: 0 (auto)
6: initial allocation ratio: 0.7
7: max iterative refinement steps: 2
12: 2-by-2 pivot tolerance: 0.01
13: Q fixed during numerical factorization: 0 (auto)
14: AMD dense row/col parameter: 10
"dense" rows/columns have > max (16, (10)*sqrt(n)) entries
Only used if the AMD ordering is used.
15: diagonal pivot tolerance: 0.001
Only used if diagonal pivoting is attempted.
16: scaling: 1 (divide each row by sum of abs. values in each row)
17: frontal matrix allocation ratio: 0.5
18: drop tolerance: 0
19: AMD and COLAMD aggressive absorption: 1 (yes)
The following options can only be changed at compile-time:
8: BLAS library used: Fortran BLAS. size of BLAS integer: 8
9: compiled for MATLAB
10: no CPU timer
11: compiled for normal operation (debugging disabled)
computer/operating system: Microsoft Windows
size of int: 4 UF_long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)
sp\: UMFPACK's factorization was successful.
sp\: However, UMFPACK may have failed to produce a correct factorization
possibly due to aggressive pivot tolerances or because the problem
is ill-conditioned.
sp\: These tolerances were:
Pivot Tolerance: 0.1
Diagonal Pivot Tolerance: 0.001
We will factor the matrix again but use standard partial pivoting.
sp\: UMFPACK's solve was successful.
UMFPACK V5.4.0 (May 20, 2009), Info:
matrix entry defined as: double
Int (generic integer) defined as: UF_long
BLAS library used: Fortran BLAS. size of BLAS integer: 8
MATLAB: yes.
CPU timer: none.
number of rows in matrix A: 1450
number of columns in matrix A: 1450
entries in matrix A: 6619
memory usage reported in: 16-byte Units
size of int: 4 bytes
size of UF_long: 8 bytes
size of pointer: 8 bytes
size of numerical entry: 8 bytes
strategy used: unsymmetric
ordering used: colamd on A
modify Q during factorization: yes
prefer diagonal pivoting: no
pivots with zero Markowitz cost: 232
submatrix S after removing zero-cost pivots:
number of "dense" rows: 0
number of "dense" columns: 0
number of empty rows: 0
number of empty columns 0
submatrix S not square or diagonal not preserved
symbolic factorization defragmentations: 1
symbolic memory usage (Units): 34264
symbolic memory usage (MBytes): 0.5
Symbolic size (Units): 4814
Symbolic size (MBytes): 0
symbolic factorization CPU time (sec): 0.00
symbolic factorization wallclock time(sec): 0.00
matrix scaled: yes (divided each row by sum of abs values in each row)
minimum sum (abs (rows of A)): 8.86738e-02
maximum sum (abs (rows of A)): 2.61614e+11
symbolic/numeric factorization: upper bound actual %
variable-sized part of Numeric object:
initial size (Units) 30001 28551 95%
peak size (Units) 60885 30631 50%
final size (Units) 28674 15454 54%
Numeric final size (Units) 36685 22740 62%
Numeric final size (MBytes) 0.6 0.3 62%
peak memory usage (Units) 81399 51145 63%
peak memory usage (MBytes) 1.2 0.8 63%
numeric factorization flops 1.04220e+06 3.57740e+05 34%
nz in L (incl diagonal) 15104 11016 73%
nz in U (incl diagonal) 28503 13536 47%
nz in L+U (incl diagonal) 42157 23102 55%
largest front (# entries) 3965 1488 38%
largest # rows in front 61 40 66%
largest # columns in front 65 52 80%
initial allocation ratio used: 0.7
# of forced updates due to frontal growth: 14
nz in L (incl diagonal), if none dropped 11016
nz in U (incl diagonal), if none dropped 13536
number of small entries dropped 0
nonzeros on diagonal of U: 1450
min abs. value on diagonal of U: 1.82e-07
max abs. value on diagonal of U: 1.52e+00
estimate of reciprocal of condition number: 1.19e-07
indices in compressed pattern: 5184
numerical values stored in Numeric object: 23122
numeric factorization defragmentations: 1
numeric factorization reallocations: 1
costly numeric factorization reallocations: 1
numeric factorization CPU time (sec): 0.00
numeric factorization wallclock time (sec): 0.00
solve flops: 2.67528e+05
iterative refinement steps taken: 2
iterative refinement steps attempted: 2
sparse backward error omega1: 1.77e-16
sparse backward error omega2: 3.05e-26
solve CPU time (sec): 0.00
solve wall clock time (sec): 0.00
total symbolic + numeric + solve flops: 6.25268e+05
In Matlab 2015b, they read:
sp\: bandwidth = 1061+1+956.
sp\: is A diagonal? no.
sp\: is band density (0.00) > bandden (0.50) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKage with automatic reordering.
UMFPACK V5.4.0 (May 20, 2009), Control:
Matrix entry defined as: double
Int (generic integer) defined as: UF_long
0: print level: 2
1: dense row parameter: 0.2
"dense" rows have > max (16, (0.2)*16*sqrt(n_col) entries)
2: dense column parameter: 0.2
"dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
3: pivot tolerance: 0.1
4: block size for dense matrix kernels: 32
5: strategy: 0 (auto)
6: initial allocation ratio: 0.7
7: max iterative refinement steps: 2
12: 2-by-2 pivot tolerance: 0.01
13: Q fixed during numerical factorization: 0 (auto)
14: AMD dense row/col parameter: 10
"dense" rows/columns have > max (16, (10)*sqrt(n)) entries
Only used if the AMD ordering is used.
15: diagonal pivot tolerance: 0.001
Only used if diagonal pivoting is attempted.
16: scaling: 1 (divide each row by sum of abs. values in each row)
17: frontal matrix allocation ratio: 0.5
18: drop tolerance: 0
19: AMD and COLAMD aggressive absorption: 1 (yes)
The following options can only be changed at compile-time:
8: BLAS library used: Fortran BLAS. size of BLAS integer: 8
9: compiled for MATLAB
10: CPU timer is ANSI C clock (may wrap around).
11: compiled for normal operation (debugging disabled)
computer/operating system: Microsoft Windows
size of int: 4 UF_long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)
sp\: UMFPACK's factorization was successful.
sp\: UMFPACK's solve was successful.
UMFPACK V5.4.0 (May 20, 2009), Info:
matrix entry defined as: double
Int (generic integer) defined as: UF_long
BLAS library used: Fortran BLAS. size of BLAS integer: 8
MATLAB: yes.
CPU timer: ANSI clock ( ) routine.
number of rows in matrix A: 1450
number of columns in matrix A: 1450
entries in matrix A: 6619
memory usage reported in: 16-byte Units
size of int: 4 bytes
size of UF_long: 8 bytes
size of pointer: 8 bytes
size of numerical entry: 8 bytes
strategy used: unsymmetric
ordering used: colamd on A
modify Q during factorization: yes
prefer diagonal pivoting: no
pivots with zero Markowitz cost: 232
submatrix S after removing zero-cost pivots:
number of "dense" rows: 0
number of "dense" columns: 0
number of empty rows: 0
number of empty columns 0
submatrix S not square or diagonal not preserved
symbolic factorization defragmentations: 1
symbolic memory usage (Units): 34264
symbolic memory usage (MBytes): 0.5
Symbolic size (Units): 4814
Symbolic size (MBytes): 0
symbolic factorization CPU time (sec): 0.00
symbolic factorization wallclock time(sec): 0.00
matrix scaled: yes (divided each row by sum of abs values in each row)
minimum sum (abs (rows of A)): 8.86738e-02
maximum sum (abs (rows of A)): 2.61614e+11
symbolic/numeric factorization: upper bound actual %
variable-sized part of Numeric object:
initial size (Units) 30001 28551 95%
peak size (Units) 60885 29416 48%
final size (Units) 28674 14640 51%
Numeric final size (Units) 36685 21926 60%
Numeric final size (MBytes) 0.6 0.3 60%
peak memory usage (Units) 81399 49930 61%
peak memory usage (MBytes) 1.2 0.8 61%
numeric factorization flops 1.04220e+06 2.97058e+05 29%
nz in L (incl diagonal) 15104 10414 69%
nz in U (incl diagonal) 28503 12523 44%
nz in L+U (incl diagonal) 42157 21487 51%
largest front (# entries) 3965 1394 35%
largest # rows in front 61 37 61%
largest # columns in front 65 48 74%
initial allocation ratio used: 0.7
# of forced updates due to frontal growth: 10
nz in L (incl diagonal), if none dropped 10414
nz in U (incl diagonal), if none dropped 12523
number of small entries dropped 0
nonzeros on diagonal of U: 1450
min abs. value on diagonal of U: 1.82e-07
max abs. value on diagonal of U: 8.19e+00
estimate of reciprocal of condition number: 2.22e-08
indices in compressed pattern: 5403
numerical values stored in Numeric object: 21316
numeric factorization defragmentations: 0
numeric factorization reallocations: 0
costly numeric factorization reallocations: 0
numeric factorization CPU time (sec): 0.00
numeric factorization wallclock time (sec): 0.00
numeric factorization mflops (CPU time): 74.26
symbolic + numeric CPU time (sec): 0.00
symbolic + numeric mflops (CPU time): 59.41
solve flops: 1.70169e+05
iterative refinement steps taken: 1
iterative refinement steps attempted: 1
sparse backward error omega1: 1.84e-16
sparse backward error omega2: 2.84e-26
solve CPU time (sec): 0.00
solve wall clock time (sec): 0.00
total symbolic + numeric + solve flops: 4.67227e+05
At first glance, I see that in the 2019a version, there seems to be an issue with UMFPACK factorization, while in the 2015a version, there is not. Maybe there are other indications hidden in the logs.

サインインしてコメントする。


Robert Schiemann
Robert Schiemann 2019 年 7 月 22 日
Thank you for your answers. Your workarounds helped, but instead of putting them to productive use, I spent a little effort to actually make my matrix better conditioned. That also lead to the result of the factorization not being redone.

カテゴリ

Help Center および File ExchangeFunction Creation についてさらに検索

製品


リリース

R2019a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by