Quasi Newton vs. Newton Method (nonlinear least square)

Question

I'm not sure if an old thread appears in the list if I reused and commented it, but to me it doesn't. Therefore I open a new one and just link my question: Difference between Newton's method and Gauss-Newton method Thx

update: In virtue of the desired Q&A format this i my question with respect to both methods:

I'm wondering why one wants to ignore the Hessian in the nonlinear least square quasi newton method? Whats the advantage? Does it ensure better convergence for some reason? I don't see why dropping this term is worth considering? It's not a huge issue to include the hessian...So there must be some deeper reason that for example it seemed to be more robust in iterative methods compared to including the hessian, but convergence is slower...(newton is 2. order converging,or?) compare the update process: $ \Delta b = -\left( J^T J + H r(b) \right )^{-1} J^T r(b) $ with and without the hessian $H$...

Post the question from that link here, as the answer you wrote there isn't an answer and will most likely get deleted soon. — kingW3, Mar 20 '17 at 16:14
You should really include the entire question here. In particular, is your question about Gauss-Newton specifically (which is specialized for nonlinear least squares), or about quasi-Newton methods in general (e.g. BFGS etc.)? — Ian, Mar 20 '17 at 16:14
If you're asking about Gauss-Newton for nonlinear least squares, it saves you the trouble of computing or even approximating $r''(x)$. This saves considerable computation time in practice because it means that you can essentially work only at the level of one derivative. Keep in mind that if the domain is $n$ dimensional then each derivative is roughly $n$ times more expensive to compute than the last. — Ian, Mar 20 '17 at 16:32
But if I know my function to minimize e.g. f(x)=A * exp(B*x) + C for simple monoexponential, then I can just specify the hessian and there is not lots to compute, right? — Diger, Mar 20 '17 at 16:41
It is actually a huge issue to include the Hessian for large scale problems. Even just computing the Hessian is expensive, and solving a linear system involving the Hessian is also expensive. Quasi-Newton methods have a much cheaper cost per iteration by avoiding the Hessian. We would love to use Newton's method but for very large problems we just can't do it, it's too expensive. — littleO, Mar 20 '17 at 16:43
In a small problem with only a few parameters you may as well use full Newton, or perhaps "full quasi-Newton" (i.e. with a finite difference gradient and a finite difference Hessian). In these cases Gauss-Newton is really a "premature optimization". But it matters in large problems. — Ian, Mar 20 '17 at 16:47
premature in what sense? Meaning that the mature version for small problems like above is using full newton, or which one is the method of choice? For exmaple I implemented both cases and if I set H = 0 in the above update process I get convergence, if I dont more often it diverges... — Diger, Mar 20 '17 at 16:51
"Premature optimization" is a term in programming, where the programmer makes the mistake of optimizing some code that didn't really need to be optimized because it would've been fast enough without that optimization. In the process the code typically winds up unnecessarily complex, buggy, obfuscated, less general, or having similar flaws. That said, generally both methods should converge provided the initial condition is suitable. — Ian, Mar 20 '17 at 17:41
Do you have an example of a "big" function, where quasi newton is much faster than newon?? As far as I could see, also even if the hessian would be calculated it is not done so by finite difference, but by one of these methods: the table startig with DFP,... on the site https://en.wikipedia.org/wiki/Quasi-Newton_method — Diger, Mar 21 '17 at 09:58

Quasi Newton vs. Newton Method (nonlinear least square)

0 Answers0