2

for a matrix $A \in \mathbb{R^{m\times n}}$ and for $x,\epsilon \in \mathbb{R^n}$ and $\epsilon$ small we have that

$\|A(\hat{x}+\epsilon)-b\|_2^2=(A(\hat{x}+\epsilon)-b)^T(A(\hat{x}+\epsilon)-b)=\|A\hat{x}-b\|_2^2+2\epsilon^T(A^TA\hat{x}-A^Tb)+\epsilon^TA^TA\epsilon$

Now the book states that the $A^TA\hat{x} - A^Tb$ can be treated as a derivative of something, and so we can set it equal to 0 to get the normal equation? How is this possible, what is it the derivative of and why do we set it to 0?

1 Answers1

1

You have $f(x)=\frac12\,\|Ax-b\|^2$ and by you computation, only formalized differently $$ \frac{d}{dt} f(x+t·ϵ)|_{t=0}=(Ax-b)^TA·ϵ. $$ or transpose of that for the Gradient. For the (in this case unique) minimum the derivative has to be zero in all directions $ϵ$, which means $$ 0=A^T(Ax-b)=A^TA-A^Tx. $$ For a numerical solution you would employ the QR decomposition of $A$, which reduces, provided $A$ has maximum rank, $Rx=Q^Tb$.

Lutz Lehmann
  • 126,666
  • Where did you get $f(x)$ from and where did you get that derivative?? – blueonken2 Mar 26 '15 at 18:41
  • I'm sorry i don't follow how your solutions leads from mine, could you please explain – blueonken2 Mar 26 '15 at 18:43
  • By expanding the norm as you did and then differentiating the resulting quadratic expression. – Lutz Lehmann Mar 26 '15 at 18:59
  • Using my expression so I don't get confused, what are you differentiating with respect to and why? – blueonken2 Mar 26 '15 at 19:01
  • Note that I replaced, as usual in the definition of a directional derivative, your $ϵ$ with the scaled $tϵ$. And if $v^Tϵ=0$ for all vectors $ϵ$ (even all in some ball of small positive radius), then $v=0$. – Lutz Lehmann Mar 26 '15 at 19:18