All of the literature I'm reading skips straight (as far as I can tell) to adjusting the step size as you iterate in order to maximize the rate of convergence. In the context of neural network modeling, I'm building up from fixed-step gradient descent to the more involved methods for minimizing error.
For example, say I have a simple twice-differentiable function $f(x) = x^2 + 4$. Is there some way to formalize the rate of convergence and then solve for the fixed step size that maximizes it? The claim is that quadratic objective functions have an optimal fixed step size, and that this can be shown analytically.
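To make the example concrete: for $x^2 + 4$ the gradient is $2x$, so a fixed-step update is $x_{k+1} = x_k - \alpha \cdot 2x_k = (1 - 2\alpha)x_k$, and the distance to the minimizer at $0$ is multiplied by $|1 - 2\alpha|$ on every step. Below is a minimal Python sketch of that experiment. The names `gradient_descent`, `grad_f`, and `contraction_factor` are my own placeholders, not from any library, and the starting point and step sizes are arbitrary choices for illustration.

```python
def gradient_descent(grad, x0, step, iters):
    """Run fixed-step gradient descent and return every iterate."""
    x = x0
    history = [x]
    for _ in range(iters):
        x = x - step * grad(x)  # same step size on every iteration
        history.append(x)
    return history


def grad_f(x):
    """Gradient of f(x) = x**2 + 4."""
    return 2.0 * x


def contraction_factor(step, curvature=2.0):
    """Per-step error multiplier |1 - step * f''| for a 1-D quadratic.

    curvature=2.0 is f''(x) for f(x) = x**2 + 4 (an assumption tied to
    this example only).
    """
    return abs(1.0 - step * curvature)


if __name__ == "__main__":
    # Arbitrary illustrative step sizes; 0.5 is the candidate optimum.
    for step in (0.1, 0.25, 0.5, 0.75, 0.9):
        xs = gradient_descent(grad_f, x0=3.0, step=step, iters=5)
        print(step, contraction_factor(step), abs(xs[-1]))
```

In this sketch the factor is smallest (zero) at $\alpha = 1/2$, and step sizes of $1$ or more no longer converge, which is the kind of statement I would like to see formalized for general quadratics.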
http://en.wikipedia.org/wiki/Newton's_method_in_optimization
– muzzlator Feb 27 '13 at 18:34