There is some debate about why Newton's method is not widely used in machine learning. Instead, people tend to use gradient descent.
Some people claim that Newton's method is not used because it involves the second derivative. How so? Does it enter only indirectly, and why? Doesn't Newton's method neglect the second derivative?
Is there a name for Newton's method with cubic convergence?
Can we claim that Newton's method is a form of gradient descent?
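To make the comparison concrete, here is a minimal sketch of the two update rules as I understand them. It assumes the optimization form of Newton's method, where the root-finding iteration is applied to the derivative f'(x), which is where f''(x) would come in. The test function f(x) = (x - 3)^2 + 1, the step size 0.1, and all function names are my own hypothetical choices for illustration.

```python
def gradient_descent_step(x, grad, lr=0.1):
    """One gradient descent update: uses only the first derivative."""
    return x - lr * grad(x)


def newton_step(x, grad, hess):
    """One Newton update for minimization: root-finding applied to f'(x),
    so the step divides by the second derivative f''(x)."""
    return x - grad(x) / hess(x)


# Hypothetical test function f(x) = (x - 3)**2 + 1, minimized at x = 3.
def grad(x):
    return 2.0 * (x - 3.0)  # f'(x)


def hess(x):
    return 2.0  # f''(x), constant for a quadratic


# Newton's method: a single step from x = 0.
x_newton = newton_step(0.0, grad, hess)

# Gradient descent: 50 steps from the same starting point.
x_gd = 0.0
for _ in range(50):
    x_gd = gradient_descent_step(x_gd, grad)

print("Newton after 1 step:", x_newton)
print("Gradient descent after 50 steps:", x_gd)
```

On a quadratic like this one, the Newton step lands on the minimizer in one update, while gradient descent only approaches it. In more than one dimension, f''(x) would become the Hessian matrix, and the division would become a linear solve.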