This question suggests that you can precondition an optimization problem by a simple multiplicative scaling of the variables in the objective function. However, when I look up the literature on preconditioning, I see it is typically defined on matrices in the context of solving systems of linear equations.
Could someone explain the connection between the two, and how one finds an appropriate preconditioning vector (e.g. the Jacobi/diagonal scaling mentioned in that question)?
I am guessing that the matrix to be preconditioned in the optimization context is the Hessian. Is this correct, even though L-BFGS doesn't explicitly calculate the Hessian? And if so, how does the modification to the objective function precondition the Hessian?
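
To make the guess concrete: by the chain rule, substituting x = D y with a diagonal matrix D turns the Hessian H of f(x) into D H D for g(y) = f(D y), so a multiplicative scaling of the variables would act like a symmetric diagonal preconditioner on the Hessian. Below is a minimal NumPy sketch of that idea. The quadratic objective, the matrix values, and the function names `jacobi_scaling` and `scaled_hessian` are assumptions made up for illustration, and the Jacobi choice d_i = 1/sqrt(H_ii) is only one possible scaling.

```python
import numpy as np

def jacobi_scaling(H):
    """Diagonal (Jacobi) scaling vector d_i = 1 / sqrt(H_ii).

    Assumes every diagonal entry of H is positive.
    """
    return 1.0 / np.sqrt(np.diag(H))

def scaled_hessian(H, d):
    """Hessian of g(y) = f(d * y) when f has Hessian H.

    By the chain rule this is D @ H @ D with D = diag(d).
    """
    D = np.diag(d)
    return D @ H @ D

# Hypothetical badly scaled quadratic f(x) = 0.5 * x^T H x
H = np.array([[1.0e4, 50.0],
              [50.0, 1.0]])

d = jacobi_scaling(H)             # scaling vector applied to the variables
H_scaled = scaled_hessian(H, d)   # Hessian seen by the optimizer after scaling

cond_before = np.linalg.cond(H)
cond_after = np.linalg.cond(H_scaled)
print("condition number before scaling:", cond_before)
print("condition number after scaling: ", cond_after)
```

In this sketch the scaled Hessian has a unit diagonal and a much smaller condition number than H. In practice the optimizer would be run on g(y) and the result mapped back through x = d * y. Whether this is exactly what the scaling in that question does is what I am asking.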