Newton's method and machine learning

Question

There is some debate about why Newton's method is not widely used in machine learning. Instead, people tend to use gradient descent.

Some people claim that Newton's method is not used because it involves the second derivative. How so? Indirectly? Why? Doesn't Newton's method neglect the second derivative?
Is there a name for Newton's method with cubic convergence?
Can we claim that Newton's method is a form of gradient descent?

Newton's method for root finding does not need a second derivative; Newton's method for optimisation is root-finding on the derivative, and so needs a second derivative. — Parcly Taxel, Apr 18 '20 at 21:44

score 2 · Accepted Answer · answered Apr 18 '20 at 21:46

In machine learning, the equations you want to solve usually come from optimisation: to minimize $f$, you solve $\nabla f=0$. Since this is already a first derivative, applying Newton's method to it uses the second derivative $\nabla^2 f$ (the Hessian), which is very expensive in high dimensions: with $n$ parameters it has $n^2$ entries.
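To make that concrete, here is a minimal one-dimensional sketch. The objective $f(x)=x^4/4-x$ and the helper name `newton_root` are made up for illustration. The root finder itself only ever asks for a first derivative, but once the function handed to it is $f'$, that "first derivative" is $f''$.

```python
def newton_root(g, dg, x0, steps=20):
    """Newton's method for solving g(x) = 0; uses only g and its first derivative dg."""
    x = x0
    for _ in range(steps):
        x = x - g(x) / dg(x)
    return x


# Hypothetical objective f(x) = x**4 / 4 - x.
# Minimising f means solving f'(x) = x**3 - 1 = 0.
def f_prime(x):
    return x**3 - 1.0


# The derivative of f' is the second derivative of f.
def f_second(x):
    return 3.0 * x**2


# Root-finding on f' requires f'' as the "first derivative" argument.
x_min = newton_root(f_prime, f_second, x0=2.0)
print(x_min)
```

In $n$ dimensions the division by $f''$ becomes a linear solve against the $n \times n$ Hessian, which is where the cost comes from.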

The cubic approach you linked looks unfamiliar. I was hoping it'd be Halley's method, but it seems different.

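Newton's method isn't considered a form of gradient descent: GD steps along $-\nabla f$ with a step size that isn't chosen to approximate the root of $\nabla f$, whereas Newton's method rescales the gradient by the inverse Hessian for exactly that purpose. Newton's method is quadratically convergent near a solution, which is a bit of a double-edged sword, since the same aggressive steps can overshoot from a poor starting point; GD settles for slower but somewhat safer linear convergence.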
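A small sketch of that trade-off, under assumptions of my own: the test function $f(x)=\sqrt{1+x^2}$ (minimum at $x=0$), the learning rate $0.5$, and the starting points are all chosen just for illustration. On this function the Newton update simplifies to $x \mapsto -x^3$, so it converges quickly when $|x_0|<1$ and blows up when $|x_0|>1$, while fixed-step GD creeps towards $0$ from either start.

```python
import math


def grad(x):
    # f(x) = sqrt(1 + x^2), so f'(x) = x / sqrt(1 + x^2)
    return x / math.sqrt(1.0 + x * x)


def hess(x):
    # f''(x) = (1 + x^2)^(-3/2), always positive
    return (1.0 + x * x) ** -1.5


def gradient_descent(x0, lr=0.5, steps=50):
    """Fixed step size: the step is lr times the gradient, ignoring curvature."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def newton(x0, steps=5):
    """Newton step: the gradient is rescaled by the inverse second derivative."""
    x = x0
    for _ in range(steps):
        x = x - grad(x) / hess(x)
    return x


print("GD from 2.0:      ", gradient_descent(2.0))
print("Newton from 0.5:  ", newton(0.5))
print("Newton from 2.0:  ", newton(2.0))
```

In practice, Newton-type methods are usually paired with a line search or damping to guard against the divergent case.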
