I am working on a high-dimensional (N ~ 1000-60000) optimization problem that is currently solved with an L-BFGS algorithm. I have experimented with different diagonal preconditioners, since I know that the gradients in some dimensions are orders of magnitude larger than in others. For the majority of the problem instances I looked at I observed a significant speed-up, but not for all of them.

The longer I work with this simple preconditioner, however, the more I realize that I do not understand why it makes a difference. Shouldn't an L-BFGS algorithm account for the scaling of the different dimensions automatically? Am I doing something fundamentally wrong when I implement the Jacobi preconditioner with the following two modifications of my objective function?
- The first line of code in the objective function is
x = x./PC;
i.e., I scale my variables x by the preconditioner PC (element-wise division), and
- the last line of code in the objective function is
dx = dx./PC;
i.e., I scale the gradient accordingly and return f unchanged. x, dx, and PC are vectors; f is the objective function value.
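To make the two edits concrete, here is a minimal Python sketch of the same change of variables. It assumes the question's MATLAB-style code means x = y ./ PC, where y is the variable the optimizer actually sees; the names `precondition`, `fd_gradient`, `quadratic` and `A` are hypothetical, for illustration only. The wrapper returns g(y) = f(y ./ PC) with gradient ∇g(y) = ∇f(x) ./ PC (chain rule) and compares it against a finite-difference gradient. For a made-up quadratic with Hessian diag(A), choosing PC = sqrt(A) (a Jacobi-style scaling) makes the Hessian in y equal to the identity.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Objective = Callable[[Vector], Tuple[float, Vector]]


def precondition(fun: Objective, pc: Vector) -> Objective:
    """Return g(y) = f(y ./ pc) together with its gradient with respect to y.

    Mirrors the two edits: divide the variables by PC on entry and divide
    the gradient by PC on exit (chain rule: dg/dy = df/dx ./ PC).
    """
    def wrapped(y: Vector) -> Tuple[float, Vector]:
        x = [yi / pi for yi, pi in zip(y, pc)]          # x = y ./ PC
        f, dx = fun(x)                                   # original objective
        return f, [gi / pi for gi, pi in zip(dx, pc)]    # dx ./ PC
    return wrapped


def fd_gradient(fun: Objective, y: Vector, h: float = 1e-6) -> Vector:
    """Central finite-difference gradient, used only to check the wrapper."""
    grad = []
    for i in range(len(y)):
        yp, ym = list(y), list(y)
        yp[i] += h
        ym[i] -= h
        grad.append((fun(yp)[0] - fun(ym)[0]) / (2.0 * h))
    return grad


# Hypothetical badly scaled objective f(x) = 0.5 * sum(a_i * x_i^2) with
# Hessian diag(A); the curvatures differ by four orders of magnitude.
A = [1.0, 1.0e4]


def quadratic(x: Vector) -> Tuple[float, Vector]:
    f = 0.5 * sum(ai * xi * xi for ai, xi in zip(A, x))
    return f, [ai * xi for ai, xi in zip(A, x)]


# Jacobi-style choice for this substitution: PC = sqrt(diag(Hessian)), so the
# Hessian in y, diag(1/PC) * diag(A) * diag(1/PC), is the identity matrix.
PC = [math.sqrt(ai) for ai in A]
scaled = precondition(quadratic, PC)

y = [1.0, 2.0]
g, gy = scaled(y)
print("g(y) =", g, "analytic grad =", gy, "FD grad =", fd_gradient(scaled, y))
```

Under this substitution the optimizer works entirely in y, so the starting point has to be passed in as y0 = x0 .* PC and the result mapped back with x = y ./ PC.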
Thank you all for your time. I would appreciate any help and pointers to relevant literature; what I found focused on conjugate gradient (CG) methods, and I could not relate it to my problem.