0

In literature about trust region methods I found the following statement:

First they define for $t\in (0,1)$

$f(x_k+p)=f(x_k)+\nabla f(x_k)^Tp+\frac{1}{2}p^T\nabla^2f(x_k+tp)p$

and by using $B_k$ as an approximation to the Hessian (with $B_k$ symmetric)

$m_k(p)=f(x_k)+\nabla f(x_k)^Tp+\frac{1}{2}p^TB_kp$.

Then it says the difference between $m_k(p)$ and $f(x_k+p)$ is $\mathcal{O}(\Vert p\Vert^2)$.

I'm not sure, why this statement about the difference holds.

Obviously the difference between those functions is in the second-order term. The only statement I remember from previous lessons for approximations of Hessians is that the difference is $\mathcal{o}(\Vert x_{k+1}-x_{k}\Vert)$. Since $x_{k+1}$ is in this case equal to $x_k+p$ it follows $\mathcal{o}(\Vert p\Vert)$, but this doesn't lead to the statement.

I would be really happy for advices.

Edit: $f(x_k+p)$ is the Taylor-series expansion around $x_k$ from $f$ and $m_k(p)$ is the model function, which should be minimized in every trust region iteration.

sqlman
  • 172
  • I've edited the formula subsituting $f(x_k)$ to $f_k$ – user Dec 01 '17 at 17:45
  • Sorry but at a third sight there is something no so ckear in the equations, could you please verify the rxact expression you are dealing with. I want to be sure about my interpretation and the context . – user Dec 01 '17 at 18:19
  • Edited formula and added a bit more context. – sqlman Dec 01 '17 at 18:38
  • You haven't made any statement about the possible difference between $B_{k}$ and $\nabla^{2} f(x_{k}+tp)$. You need to have some bound on this, or else anything could happen. – Brian Borchers Dec 01 '17 at 19:52
  • Because there is no further information than this. Just the two functions and that there difference is $\mathcal{O}(\Vert p\Vert^2)$. And i want to understand why, because on the first views it feels that this statement comes out of nowhere. At the moment I assume that its the difference between the Taylor-series from $f$ and $m_k$. Since $m_k$ is until the first order term equal to $f(x_k+p)$ and different in the second order term. – sqlman Dec 01 '17 at 20:04

1 Answers1

1

The first equation describes the function in a ball around $x_k$ and the remainder term should be added:

$$f(x_k+p)=f_k+g_k^Tp+\frac{1}{2}p^T\nabla^2f(x_k+tp)p+o(||p||)$$

while the second equation describes the best approximation of $f$ for $x_k=p$ that is for $t=0$.

This expression of the remainder: $$o(||p||)=O(||p||^2)$$ is called Peano's remainder to be distigushed by Lagrange's remainder.

https://en.wikipedia.org/wiki/Taylor%27s_theorem

user
  • 154,566
  • Thank you, but why do they use then $t\in (0,1)$ in the expression, instead of just $\nabla^2f(x_k)$. I thought this $t$ is there to avoid the remainder. – sqlman Dec 01 '17 at 17:31
  • I've edited my answer because I didn't explain correctly the meaning of 2nd equation. The assumption for t∈(0,1) is not so important I think, it's only to say: we are consider the function "near" "x_k". – user Dec 01 '17 at 17:48