Preconditioning of optimization problems

Question

This question suggests that you can precondition an optimization problem by a simple multiplicative scaling of the variables in the objective function. However, when I look up literature on preconditioning I see it is typically defined on matrices in the context of solving sets of linear equations.

Could someone explain the connection between the two, and how one finds an appropriate preconditioning vector (e.g. the Jacobi/diagonal scaling mentioned in that question)?

I am guessing that the matrix to be preconditioned in the optimization context is the Hessian. Is this correct? (L-BFGS doesn't explicitly calculate the Hessian.) And if so, how does modifying the objective function precondition the Hessian?

score 2 · Answer 1 · answered Jan 19 '14 at 20:49

The simplest example of an optimization problem that can be solved by casting it as a linear system is the linear least squares problem $$\min_x \|Ax-b\|_2^2$$ Setting the derivative of this expression with respect to $x$ to zero gives a linear system called the normal equations: $$(A^{\rm T}A)\,x = A^{\rm T} b$$ Another relevant optimization problem is nonlinear least squares: $$\min_{\boldsymbol\beta}\sum_{i=1}^m \left[y_i - f(x_i, \boldsymbol\beta)\right]^2$$ This problem can be solved with the Gauss-Newton method, an iterative method that solves a linear system in every step. That system is built from the Jacobian $J$ of $f$ with respect to $\boldsymbol\beta$; each step solves the normal equations of the linearized problem, $$(J^{\rm T}J)\,\Delta\boldsymbol\beta = J^{\rm T}\mathbf r,$$ where $\mathbf r$ is the vector of residuals $r_i = y_i - f(x_i, \boldsymbol\beta)$.
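Both linear systems can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not a production solver: the exponential model $f(x, \boldsymbol\beta) = \beta_0 e^{\beta_1 x}$, the data, the starting guess and the helper names (`solve_normal_equations`, `model`, `jacobian`, `gauss_newton`) are hypothetical choices for the example, and the Gauss-Newton loop has no damping or line search, so it assumes a starting point reasonably close to the solution.

```python
import numpy as np

def solve_normal_equations(A, b):
    """Linear least squares min_x ||Ax - b||^2 via (A^T A) x = A^T b."""
    return np.linalg.solve(A.T @ A, A.T @ b)

def model(x, beta):
    # Hypothetical model for illustration: f(x, beta) = beta0 * exp(beta1 * x)
    return beta[0] * np.exp(beta[1] * x)

def jacobian(x, beta):
    # Partial derivatives of f w.r.t. beta0 and beta1, one row per data point
    e = np.exp(beta[1] * x)
    return np.column_stack([e, beta[0] * x * e])

def gauss_newton(x, y, beta_start, max_iter=50, tol=1e-12):
    beta = np.asarray(beta_start, dtype=float)
    for _ in range(max_iter):
        r = y - model(x, beta)      # residuals r_i = y_i - f(x_i, beta)
        J = jacobian(x, beta)
        # The linear system solved at every step: (J^T J) delta = J^T r
        delta = np.linalg.solve(J.T @ J, J.T @ r)
        beta = beta + delta
        if np.linalg.norm(delta) < tol:
            break
    return beta

# Linear least squares on a small made-up line fit (b = 1 + 2 t exactly)
A = np.column_stack([np.ones(5), np.arange(5.0)])
b = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
x_ls = solve_normal_equations(A, b)
print(x_ls)

# Nonlinear least squares on noise-free synthetic data
x = np.linspace(0.0, 4.0, 20)
y = model(x, [2.0, -0.5])
beta_hat = gauss_newton(x, y, beta_start=[1.8, -0.4])
print(beta_hat)
```

Forming $A^{\rm T}A$ squares the 2-norm condition number of $A$, which is one reason badly scaled variables hurt in this formulation; SVD-based routines such as `np.linalg.lstsq` avoid forming it explicitly.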

Now, when is variable scaling important in this context? One example for nonlinear least squares is photogrammetric camera calibration, where you have to find the parameters of a camera given correspondences between image and world coordinates. Some of the parameters to solve for are angles and others are distances, and some of those distances may be on the scale of millimeters while others are on the scale of kilometers. In this case you have a priori knowledge of the problem that you can use to create the preconditioning vector, for example by dividing each parameter by its expected magnitude so that all scaled parameters are of order one.
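The link to preconditioning can be made concrete with a change of variables. Writing $x = Dz$ with a positive diagonal matrix $D$, the objective $g(z) = f(Dz)$ has gradient $D\,\nabla f(Dz)$ and Hessian $D\,\nabla^2 f\,D$, so multiplying the variables by a scaling vector acts as a symmetric diagonal preconditioner on the Hessian, whether or not the Hessian is ever formed. For linear least squares the Hessian is $2A^{\rm T}A$, and the Jacobi (diagonal) choice $D_{jj} = 1/\sqrt{(A^{\rm T}A)_{jj}} = 1/\|a_j\|_2$ gives $D A^{\rm T}A D$ a unit diagonal. The sketch below assumes NumPy; the matrix, its column scales and the helper names (`jacobi_scaling`, `solve_scaled_least_squares`) are made up for illustration.

```python
import numpy as np

def jacobi_scaling(A):
    """Scaling vector d with d_j = 1 / ||a_j||_2 = 1 / sqrt((A^T A)_jj)."""
    return 1.0 / np.linalg.norm(A, axis=0)

def solve_scaled_least_squares(A, b):
    """Solve min_x ||Ax - b||^2 through the substitution x = D z, D = diag(d)."""
    d = jacobi_scaling(A)
    A_scaled = A * d  # broadcasting scales column j by d[j], i.e. A @ np.diag(d)
    # Normal equations of the scaled problem: (D A^T A D) z = D A^T b
    z = np.linalg.solve(A_scaled.T @ A_scaled, A_scaled.T @ b)
    return d * z  # map back to the original variables: x = D z

rng = np.random.default_rng(0)
# Made-up problem: three unknowns whose columns differ in scale by up to 1e6
A = rng.standard_normal((50, 3)) * np.array([1e-3, 1.0, 1e3])
x_true = np.array([5.0, -2.0, 0.25])
b = A @ x_true

d = jacobi_scaling(A)
H = A.T @ A                        # unscaled normal matrix
H_scaled = (A * d).T @ (A * d)     # equals D A^T A D, which has a unit diagonal
print(f"cond(A^T A)     = {np.linalg.cond(H):.2e}")
print(f"cond(D A^T A D) = {np.linalg.cond(H_scaled):.2e}")

x = solve_scaled_least_squares(A, b)
print(x)
```

Because $\nabla g(z) = D\,\nabla f(Dz)$, a gradient-based method such as L-BFGS can use the same substitution by optimizing over $z$ and multiplying the gradients it receives by $D$; no explicit Hessian is needed.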

It's not a complete answer but I hope it helps.

Thanks. This helps, but I am also specifically interested in algorithms like L-BFGS, where the gradient is not a matrix. Any idea how it is done in that context? — Bitwise, Jan 20 '14 at 01:47
