7

In Numerical Optimization by Nocedal and Wright, (http://home.agh.edu.pl/~pba/pdfdoc/Numerical_Optimization.pdf) Chapter 2 on unconstrained optimization, page 25 top, the authors claim that

"The equivalent formula for SR1, $$B_{k+1} = B_k + \frac{(y_k - B_k s_k) (y_k - B_k s_k)^T} {(y_k - B_k s_k)^T s_k}$$

and BFGS, $$B_{k+1} = B_k - \frac{B_ks_ks_k^T}{s_k^TB_ks_k} + \frac{y_ky_k^T}{y_k^Ts_k} $$

applied to the inverse approximation $$H_k = B_k^{-1} (definition) $$

is $$H_{k+1} = (I - \rho_k s_k y_k^T)H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T$$ where $\rho_k = \frac{1}{y_k^T s_k}$"

I have expanded the inverse update given above to

$$H_{k+1} = H_k - \frac{H_ky_ks_k^T}{y_k^T s_k} - \frac{s_k y_k^T H_k}{y_k^T s_k} + \frac{s_k y_k^T H_k y_k s_k^T}{(y_k^T s_k)^2} + \frac{s_k s_k^T}{y_k^T s_k}$$

But from here, I cannot algebraically manipulate that into either of the non-inverse update formulas.

As well, I do not understand why, given that the SR1 and BFGS formulas are different updates with different guarantees, that the authors claim that they have the same inverse update.

===========================

EDIT: I still can't get the BFGS inverse update formula. Here's my work:

I'm trying to do the updates separately since I cannot find a way to make the two rank-1 matrices form the product $UV^T$.

Using the higher order Sherman-Morrison formula, $$B_k' = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}$$ choosing $$U=-\frac{1}{C} B_k s_k s_k^T$$ and $$V = B_k^T$$ gives

$${B'}_k^{-1} = \frac{1}{C} s_k s_k^T$$ $$B_{k+1} = B'_k + \frac{y_k y_k^T}{y_k^T s_k}$$ choosing $$U_2 = \frac{y_k}{K}$$ and $$V_2 = y_k$$

which gives $$B_{k+1}^{-1} = -\frac{{B'}_k^{-1} y_k y_k^T {B'}_k^{-1}}{K} = -\frac{1}{K} (\frac{1}{C} s_k s_k^T) y_k y_k^T (\frac{1}{C} s_k s_k^T)$$ does not work out to give the update that I want.

user2193268
  • 303
  • 4
  • 11

2 Answers2

4

You are right, the inverse approximation (2.21) seems to be for BFGS only. Compare with (6.17), page 140. The inverse approximation for SR1 is given in (6.25), page 144. To obtain inverses one applies the Sherman–Morrison formulae (A.27) and (A.28), page 612.

Edit:

Starting with the RHS of (6.19) and omitting subscript $k$ $$ B-\frac{Bss^TB}{s^TBs}+\frac{yy^T}{y^Ts}=\underbrace{B}_{A}+ \underbrace{\begin{bmatrix}Bs & y\end{bmatrix}}_{U} \underbrace{\begin{bmatrix}-\frac{1}{s^TBs} & 0\\0 & \frac{1}{y^Ts}\end{bmatrix} \begin{bmatrix}s^TB\\ y^T\end{bmatrix}}_{V^T}. $$ Then with $B^{-1}=H$ the inverse of the RHS is: \begin{align*} (A+UV^T)^{-1}&=H-H\begin{bmatrix}Bs & y\end{bmatrix} \left(I+\begin{bmatrix}-\frac{1}{s^TBs} & 0\\0 & \frac{1}{y^Ts}\end{bmatrix} \begin{bmatrix}s^TB\\ y^T\end{bmatrix}H\begin{bmatrix}Bs & y\end{bmatrix}\right)^{-1}\begin{bmatrix}-\frac{1}{s^TBs} & 0\\0 & \frac{1}{y^Ts}\end{bmatrix}\begin{bmatrix}s^TB\\ y^T\end{bmatrix}H\\ &=H-\begin{bmatrix}s & Hy\end{bmatrix} \left(\begin{bmatrix}-s^TBs & 0\\0 & y^Ts\end{bmatrix}+ \begin{bmatrix}s^TB\\ y^T\end{bmatrix}H\begin{bmatrix}Bs & y\end{bmatrix}\right)^{-1}\begin{bmatrix}s^T \\ y^TH\end{bmatrix}\\ &=H+\frac{1}{y^Tss^Ty}\left(\begin{bmatrix}s & Hy\end{bmatrix} \begin{bmatrix}y^T+y^Ty & -s^Ty\\-y^Ts & 0 \end{bmatrix}\begin{bmatrix}s^T \\ y^TH\end{bmatrix}\right)\\ &=H+\frac{1}{y^Tss^Ty}\left(sy^Tss^T+sy^THys^T-Hyy^Tss^T-ss^Ty^TH\right)\\ &=H+\left(1+\frac{y^THy}{y^Ts}\right)\frac{ss^T}{s^Ty}-\frac{Hys^T+(Hys^T)^T}{y^Ts}\\ \end{align*}

Migolas
  • 59
A.Γ.
  • 29,518
  • I've updated the question to show my work on how to derive the BFGS inverse update, which I am stuck at. I've also tried separately applying Sherman-Morrison's form with rank-1 updates but I get a 0 denominator. Sorry for typos, this was typed in a rush. – user2193268 Jun 29 '16 at 00:06
  • @user2193268 Ok, I have edited the answer to show how to organize the usage of the formula for the inverse. The calculations are quite tedious, I leave the rest for you to work it out. – A.Γ. Jun 29 '16 at 01:15
  • Thanks very much! I am able to show the inverse update now. Is there a name for the technique you applied to find U and V, or was that just insight? – user2193268 Jul 01 '16 at 23:22
1

A.Γ.'s argument is correct, and the formula above only applies to the inverse of the BFGS update. However, the algebraic manipulation presented in their reply contains some typos (from the third equality) that it has taken me hours to identify and correct. Hopefully my derivation below helps future users in search of a detailed answer.

To keep things neat, I will try to simplify as much as possible without expanding the matrices. Calling:

\begin{align*} V_1 & = \begin{bmatrix} -\frac{1}{s^TBs} & 0 \\ 0 & \frac{1}{y^Ts} \end{bmatrix} \\ V_2 & = \begin{bmatrix} s^TB\\ y^T \end{bmatrix} \end{align*}

so that $V^T$ is simply $V^T = V_1 V_2$. Plugging H, U, V, $V_1$, and $V_2$ into the Sherman-Morrison-Woodbury formula: \begin{align*} H_{k+1} & = H - H U (I + V^T H U)^{-1} V_1 V_2 H \\ \end{align*}

Introducing \begin{align*} v \equiv HU = (V_2 H)^T = \begin{bmatrix} s & Hy \end{bmatrix} \end{align*}

and defining $M \equiv (I + V^T H U)^{-1} V_1$, we get the much simpler expression for the inverse of the update: \begin{align*} H_{k+1} & = H - v M v^T \end{align*}

To further proceed, we need to simplify M: \begin{align*} M & = (I + V^T H U)^{-1} V_1 \\ & = (V_{1}^{-1} (I + V^THU))^{-1} \\ & = (V_{1}^{-1} + V_2 H V_2^T)^{-1} \end{align*}

What is between parentheses is $M^{-1}$, so let's simplify it: \begin{align*} M^{-1} & = \begin{bmatrix} -s^TBs & 0 \\ 0 & y^Ts \end{bmatrix} + \begin{bmatrix} s^TB\\ y^T \end{bmatrix} H \begin{bmatrix} Bs & y \end{bmatrix} \\ & = \begin{bmatrix} -s^TBs & 0 \\ 0 & y^Ts \end{bmatrix} + \begin{bmatrix} s^TBs & s^Ty \\ s^Ty & y^T H y \end{bmatrix} \\ & = \begin{bmatrix} 0 & s^Ty \\ s^Ty & y^T H y + s^Ty \end{bmatrix} \end{align*}

$M$ is now easily seen to be, inverting the 2x2 matrix $M^{-1}$, and defining $\rho = (s^Ty)^{-1}$: \begin{align*} M = -\rho \begin{bmatrix} \rho y^T H y + 1 & -1 \\ -1 & 0 \end{bmatrix} \end{align*}

For a happy ending, we can expand $v M v^T$ as a quadratic form, using $v = \begin{bmatrix} s & Hy \end{bmatrix}$:

\begin{align*} H_{k+1} & = H - v M v^T \\ & = H + \rho (s s^T + \rho sy^T H ys^T - Hys^T - sy^T H) \\ & = (H - \rho Hys^T - \rho sy^T H + \rho^2 sy^T H ys^T) + \rho ss^T \\ & = (I - \rho s y^T) H (I - \rho y s^T) + \rho ss^T \end{align*}

That's it :)