Selecting units for (1) scaling of variables and (2) condition number minimization
In gradient-based optimization problems, the selection of units can influence the condition number of the gradient. A smaller condition number is generally good for optimization.
At the same time, the selection of units can also make the variables unbalanced. For example, one variable may be on the scale of 10^10, while another variable may be on the scale of 0.0001.
My experience is that if I make the variables similar in scale, the optimization generally finishes well.
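For instance, here is a rough sketch of the kind of rescaling I mean (fRaw is just a made-up placeholder objective):
% Optimize in scaled variables z, where x = D*z and D holds typical magnitudes
fRaw = @(x) (x(1)/1e10 - 1)^2 + (x(2)/1e-4 - 1)^2;  % minimizer at x = [1e10; 1e-4]
D = diag([1e10, 1e-4]);              % typical magnitudes of the two variables
fScaled = @(z) fRaw(D*z);            % same problem in well-scaled variables z
zOpt = fminsearch(fScaled, [0; 0]);  % any solver; fminsearch needs no gradient
xOpt = D*zOpt                        % map back to the original units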
Sometimes, the two objectives contradict each other.
How can I balance these two competing goals? Thank you very much!
1 Comment
Answers (1)
Matt J
4 Aug 2023
Edited: Matt J on 4 Aug 2023
You are free to translate as well as scale your optimization variables (or make any other nonlinear 1-1 transformation that might be useful).
For example, this quadratic objective is well-conditioned, with condition number = 1, and doesn't require a change of units,
f(x, y) = (x - 10^10)^2 + (y - 10^(-4))^2
but has its solution at very large x and very small y. I'm not sure why you consider this a problem, but you could remedy it by making the change of variables u = x - 10^10, v = y - 10^(-4), and rewriting the problem as
f(u, v) = u^2 + v^2
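Here is a minimal sketch of that change of variables in code (assuming Optimization Toolbox's fminunc, though any solver would do):
f = @(x) (x(1) - 1e10)^2 + (x(2) - 1e-4)^2;  % well-conditioned, but awkward solution scales
g = @(u) u(1)^2 + u(2)^2;                    % same problem after u = x - [1e10; 1e-4]
uOpt = fminunc(g, [1; 1]);                   % solve in the translated variables
xOpt = uOpt + [1e10; 1e-4]                   % map back to the original variables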
9 Comments
Bruno Luong
5 Aug 2023
Edited: Bruno Luong on 5 Aug 2023
"I am not talking about the conditioning of the Hessian. I am talking about the conditioning of the gradient."
AFAIK the condition number applies to a matrix. It's defined as
norm(A)*norm(inv(A)) https://en.wikipedia.org/wiki/Condition_number
There is no such thing as the conditioning of the gradient, which is a vector and NOT a matrix.
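For concreteness, a quick illustration in MATLAB with an arbitrary example matrix (all three numbers agree for the 2-norm):
A = [1e3 0; 1 1e-2];
c1 = cond(A)                   % MATLAB's built-in condition number
c2 = norm(A)*norm(inv(A))      % the definition quoted above
s = svd(A);
c3 = s(1)/s(end)               % ratio of largest to smallest singular value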
To describe your problem you must start by using math terminology correctly.
"I do SVD to compute the gradient of the objective function, whose ratio of largest singular value to smallest singular value is defined as condition number"
I don't know what SVD you are talking about (what is the matrix? is it the Jacobian of the model?), but if that is the case please explain the process.
What you call conditioning might be something entirely different from what WE think (the matrix is the Hessian), and maybe that explains why you get confusing results.
Note that in the non-linear case the Hessian is NOT J'*J, where J is the Jacobian of the model (at the considered point).
And the Jacobian changes with the point. Do you take the Jacobian at the first guess? At the solution of the preceding optimization? Something else?
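Here is a toy least-squares sketch (hypothetical model and evaluation point) of why J'*J is only an approximation: for f(x) = 0.5*norm(r(x))^2 the true Hessian is J'*J + sum_i r_i(x)*Hess(r_i).
r = @(x) [x(1)*x(2) - 1; x(1) - 2];  % two residuals
J = @(x) [x(2), x(1); 1, 0];         % Jacobian of r
Hr1 = [0 1; 1 0];                    % Hessian of the first residual (constant)
x = [2; 3];                          % arbitrary evaluation point
res = r(x);
GN = J(x)'*J(x);                     % Gauss-Newton approximation
Htrue = GN + res(1)*Hr1;             % second residual is linear, so no extra term
cond(GN), cond(Htrue)                % the two conditionings differ noticeably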
"This condition number is dependant on selection of units."
Of course this we all know, but you did not explain:
- when normalizing the unit and it converges faster; does the conditiong improve or degrade?
Also the conditiong is just a partial view of the whole picture. May be you have in your model somesort of null-space (space of the decision variables that is NOT observable by your data), or you have some constrained problem and active constraints and you need to evalute the conditioning of the Hessian projected on the tangent space (*), in this case the condition number of the full Hessian does NOT reflect the convergence rate.
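A small sketch with made-up numbers of why the projected Hessian is the relevant object when constraints are active:
H = diag([1e8, 1, 1]);  % full Hessian: condition number 1e8
A = [1 0 0];            % active linear constraint pins down the badly scaled direction
Z = null(A);            % orthonormal basis of the tangent space of A*x = b
Hp = Z'*H*Z;            % projected (reduced) Hessian
cond(H)                 % 1e8
cond(Hp)                % 1, so the full condition number is misleading here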
Many things can lead you to a wrong conclusion. If you are not able to show a MWE, the discussion is just vain.
At least show us the details of the normalization process, the problem dimension, the condition number you estimate (at the initial point and at the convergence point), the number of iterations to convergence, the number of active constraints at the solution, etc.
I'll stop here; without more details the discussion is a waste of time.
(*) Actually, the curvature of the constraints also matters.