There is some debate about why Newton's method is not widely used in machine learning; instead, people tend to use gradient descent.

Daniel S.
    Newton's method for root finding does not need a second derivative; Newton's method for optimisation is root-finding on the derivative, and so needs a second derivative. – Parcly Taxel Apr 18 '20 at 21:44

1 Answer

In machine learning, the usual reason to solve an equation of the form "function equals $0$" is to minimize a function $f$ by setting $\nabla f=0$. Since this is already a first derivative, Newton's method ends up using the second derivative $\nabla^2 f$ (the Hessian), which is very expensive in high dimensions: for $n$ parameters it has $n^2$ entries.
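
For comparison, here are the two update rules for minimizing $f$, written side by side. The iterate notation $x_k$ and the step size $\eta$ are my own labels, not something taken from the question:

$$x_{k+1} = x_k - \eta\,\nabla f(x_k) \qquad \text{(gradient descent)}$$

$$x_{k+1} = x_k - \left[\nabla^2 f(x_k)\right]^{-1}\nabla f(x_k) \qquad \text{(Newton)}$$

Each Newton step therefore needs a linear solve with the Hessian, while a GD step needs only the gradient.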

The cubic approach you linked looks unfamiliar. I was hoping it'd be Halley's method, but it seems different.

Newton's method isn't considered a form of gradient descent, because GD doesn't choose its step size to approximate the root. Newton's method is quadratically convergent near a root, which is a bit of a double-edged sword: it is fast when it works, but a step taken far from the root can overshoot badly. GD settles for slower but somewhat safer linear convergence.
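
To see the difference in convergence rate, here is a minimal one-dimensional sketch. The test function $f(x)=e^x-2x$ (minimum at $x=\ln 2$), the starting point, the learning rate `lr=0.1`, and all function names are assumptions I picked for illustration; none of them come from the question.

```python
import math

# Illustrative test problem (an assumption, not from the question):
# f(x) = exp(x) - 2x, minimised where f'(x) = exp(x) - 2 = 0, i.e. x = ln 2.
X_STAR = math.log(2.0)


def f_prime(x):
    return math.exp(x) - 2.0


def f_second(x):
    return math.exp(x)


def newton_step(x):
    # Root-finding on f': divide the first derivative by the second.
    return x - f_prime(x) / f_second(x)


def gd_step(x, lr=0.1):
    # Fixed step size; no second-derivative information is used.
    return x - lr * f_prime(x)


def run(step, x0=0.0, iters=6):
    """Apply `step` repeatedly and record the distance to the minimiser."""
    errors = []
    x = x0
    for _ in range(iters):
        x = step(x)
        errors.append(abs(x - X_STAR))
    return errors


newton_errors = run(newton_step)
gd_errors = run(gd_step)

for k, (en, eg) in enumerate(zip(newton_errors, gd_errors), start=1):
    print(f"iter {k}: Newton error {en:.2e}   GD error {eg:.2e}")
```

On this problem the Newton error is roughly squared at every iteration, while the GD error shrinks by a roughly constant factor per step. In one dimension the second derivative is a single number, so this toy example hides the cost that makes Newton's method unattractive when there are many parameters.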

J.G.