-2

I want to understand how newton's method is derived from Taylor expansion, and as many answers show that $$f(x+h)=f(x)+h f'(x)+\frac{1}{2} h^2 f''(x)+O\left(h^3\right)$$

and would simply it to : $$f(x+h)=f(x)+h f'(x)+\frac{1}{2} h^2 f''(x)) = 0$$

My question is: Why are we assuming $$f(x+h)=f(x)+h f'(x)+\frac{1}{2} h^2 f''(x)) = 0\quad ?$$

Why not use 1, or 2, or 3 or any number?

Based on my understanding, if we want to calculate the h that will minimize $$ y =h f'(x)+\frac{1}{2} h^2 f''(x)$$, we would need to take a derivative on it for y', instead of assuming $$f(x+h)=f(x)+h f'(x)+\frac{1}{2} h^2 f''(x)) = 0$$

Thanks!

amWhy
  • 209,954
  • "Based on my understanding, if we want to calculate the h that will minimize"? Why would you want to minimize the function that you define as $y$? The goal of the method is to find the roots of $f(x)$, i.e., the values $x_$ such that $f(x_)=0$. The goal is not to minimize $y$ – Daniel S. Apr 18 '20 at 02:18
  • I want to minimize the cost function over h, just like what we do when deriving gradient descent. – user3431800 Apr 18 '20 at 02:23
  • I might sound a little confusing because i really am. I see people derive gradient descent with Taylor expansion, so I thought I could do the same with newton's method, since newton's method says x = x - f(x)/f'(x), and basically the step is f(x)/f'(x) instead of in gradient descent we have x = x - f'(x), but I am only able to do it by assuming the f(x+h) = 0, but it is actually not.. – user3431800 Apr 18 '20 at 02:26

1 Answers1

0

I'm assuming you are studying Newton's method to find the root of $f(x)$.

https://en.wikipedia.org/wiki/Newton%27s_method

Newton's method to find the root of $f(x)$ is defined by the following iterative process

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} $$

To understand the rationale behind the method, let us assume that the root equals $x_*=x_0+h_*$. Then, by definition, $f(x_*)=f(x_0+h_*)=0$. This is the definition of root.

Newton's method is based on two assumptions:

  • $f(x_*)=f(x_0+h_*)=0$

  • $f(x_*) \approx f(x_0) + h_* f'(x_0) + h_*^2 f''(x_0+\theta h_*)/2$

Note that the term $h_*^2$ is assumed to be small, so the contribution of $h_*^2 f''(x_0+\theta h_*)/2$ is small. For instance, if $ h_*=0.01$, then $h_*^2=0.0001$ which is much smaller than $h_*$, so $h_*^2$ multiplied by something can be eliminated. This does not necessarily mean that the second derivative itself is small.

A more rigorous justification as to why the term $h_*^2 f''(x_0 + \theta h_*)/2$ is neglected relates to the fact that by doing so the error of the method decays quadratically with respect to the number of steps

Newton method: quadratic convergence

Then,

$$f(x_*) =0 \approx f(x_0) + h_* f'(x_0) $$

which produces

\begin{equation} h_* \approx -\frac{f(x_0)}{f'(x_0)} \end{equation}

If we denote the right hand side by $h$, we get Newton method, wherein \begin{equation} x_{n+1} = x_n + h = x_n -\frac{f(x_0)}{f'(x_0)} \end{equation}

Please, check the full answer at

Newton iteration converges monotonically

Daniel S.
  • 823
  • I don't get it. Even if by definition that f(x) ~= f(x+h), it does not mean f(x) = 0 nor f(x+h) =0 – user3431800 Apr 18 '20 at 02:17
  • by definition, $f(x+h)=0$. This is how $h$ is defined. $h$ is defined as the value such that after you add it to $x$ you reach the root. – Daniel S. Apr 18 '20 at 02:19
  • again the answer you referred to explains something related to Gauss-Newton, that eliminates the second derivative somehow, which is the part that i do not understand. I am asking about newton's method. Now per my understanding there are 3 methods derived from newton's method, one is Gauss-Newton, the other is Levenberg-Marquardt, and those two are to eliminate the Hessian matrix. But pure newton method should still involve the second derivative. – user3431800 Apr 18 '20 at 02:20
  • Newton method assumes that the contribution of the second derivative is negligible. This is the big insight – Daniel S. Apr 18 '20 at 02:28
  • well that is what confuses me. Because some places they like to include the second derivative in newton's method and show how to get rid of it in guass-newton or other methods.. – user3431800 Apr 18 '20 at 02:29
  • And i really think that if the second derivative is negligible, then show me what it is before it is eliminated...so that i know what i am ignoring here – user3431800 Apr 18 '20 at 02:31
  • please, share your source -- otherwise, at wiki it is assumed that $h^2$ is so small that the contribution is 0 https://en.wikipedia.org/wiki/Newton%27s_method – Daniel S. Apr 18 '20 at 02:31
  • https://math.stackexchange.com/questions/2624361/newtons-method-with-cubic-convergence-three-taylor-terms-instead-of-two you can take a look here .. – user3431800 Apr 18 '20 at 02:33
  • And i really think that if the second derivative is negligible, then show me what it is before it is eliminated ==> it is the term $h^2$ that is assumed to be very small hence this term times the second derivative is small – Daniel S. Apr 18 '20 at 02:34
  • basically my need is to using the Taylor expansion to derive gradient descent's step as well as newton's method step(the h). Per my understanding, newtons method uses the second derivative, which is major difference from gradient descent. If you are tell me that the second derivative could be eliminated without cost, that does not make sense to me.. – user3431800 Apr 18 '20 at 02:41
  • please, share a source where the second derivative is used in the so called Newton method, whatever that means – Daniel S. Apr 18 '20 at 02:46
  • Hey Daniel, I am not saying that it is used. I am saying it is eliminated with a process...but i do not see a process or logic, so i want someone to help with deriving the missing logic. – user3431800 Apr 18 '20 at 02:47
  • the logic is that if h=0.01, then h^2=0.0001 which is much smaller than h, so h^2 multilplied by something can be eliminated – Daniel S. Apr 18 '20 at 02:48
  • Hi Daniel, really appreciate your response and help. But I am trying to understand that if as you say, newton's method can be deducted as x = x - f(x)/f'(x), eliminating the second derivative, why we not use newton's method but gradient descent in machine learning? https://stats.stackexchange.com/questions/253632/why-is-newtons-method-not-widely-used-in-machine-learning I searched a few answers and those people are saying newton's methods does involve second derivative calculation, could you kindly explain why? – user3431800 Apr 18 '20 at 07:24
  • since you are telling me that newton's method will not use that terms and others are saying it does, and based on the derivation here https://math.stackexchange.com/questions/2624361/newtons-method-with-cubic-convergence-three-taylor-terms-instead-of-two, there is a second term. – user3431800 Apr 18 '20 at 07:27
  • and it is also not so hard to understand why I am taking a derivative on y. If the derivative of a function equals 0, it is at its local minimum/maximum. So I want to solve for x by taking a derivative on x. – user3431800 Apr 18 '20 at 07:41
  • @user3431800 let us move this discussion here https://math.stackexchange.com/questions/3632301/newton-method-and-machine-learning – Daniel S. Apr 18 '20 at 21:35