1

I have this linear alegbra question that I don't know how to start. The question:

$$A = \left( \begin{matrix}1 & -2 \\3 & 1 \\2 & 3 \\\end{matrix}\right ) $$

$$b= \left (\begin{matrix} 0 \\ 1 \\ 0 \\ \end{matrix}\right )$$

$$r = \left ( \begin{matrix} x \\ y \\ \end{matrix} \right )$$

The matrix equation of $Ar-b$ has no solution because it is inconsistent. If we define the square error to be $\Delta^2 = \delta^T\delta$ where δ=Ar−b, we can still solve the least square sense.

Explain why $\Delta^2$ is a function of x and y and that $\Delta^2 $ is minimised by the solution of $$A^TAr = A^Tb$$

Need some guidance in solving this two parts.

lakshmen
  • 1,005
  • 2
    Several things about your way of stating the problem are not clear. You say "The matrix equation of $Ar-b$". That is not an equation. I surmise that you mean the equation $Ar-b=0$. That could be stated explicitly. By "solution", I likewise surmise that you mean "solution for $r$". Again, you could say that. Then you refer to $\delta^T\delta$. You could have said "where $\delta=Ar-b$", which is again my surmise rather than what you said. – Michael Hardy Aug 21 '13 at 18:54

3 Answers3

2

We have $$A^T A = \left( \begin{matrix}1&3&2\\-2&1&3 \\\end{matrix}\right ) \left( \begin{matrix}1 & -2 \\3 & 1 \\2 & 3 \\\end{matrix}\right ) = \left ( \begin{matrix} 14 & 7\\ 7 & 14 \end{matrix} \right )$$ and $$A^T b = \left( \begin{matrix}1&3&2\\-2&1&3 \\\end{matrix}\right ) \left ( \begin{matrix} 0\\1\\0 \end{matrix} \right )= \left( \begin{matrix} 3\\1 \end{matrix} \right )$$ So we result in a $2\times 2$ linear equation system with an unique solution: $$\left ( \begin{matrix} 14 & 7\\ 7 & 14 \end{matrix} \right ) r = \left( \begin{matrix} 3\\1 \end{matrix} \right)$$

So far for the solution. Now why is $\Delta^2$ a function of $r = (x,y)^T$? Well, because the definition says

$$\Delta^2 = (Ar-b)^T (Ar-b) = (Ar)^T(Ar) - (Ar)^T b - b^T Ar + b^Tb = r^T A^TA r - r^T A^Tb - b^T Ar + b^Tb = r^T (A^TAr-2A^Tb) + b^Tb$$ with constant $A, b$. Thus, $\Delta^2 = \Delta^2(r) = \Delta^2(x,y)$. Using the latter expanded form, minimizing $\Delta^2$ is the same as minimizing $r^T A^TAr - 2r^TA^Tb$. The rest is basic calculus.

AlexR
  • 24,905
2

\begin{align} \min_r &\|Ar-b\|_2^2\\ \phi(r)&=(Ar-b)^T(Ar-b)\\ &=(r^TA^T-b^T)(Ar-b)\\ &=r^TA^TAr-r^TA^Tb-b^TAr+b^Tb\\ &=r^TA^TAr-2b^TAr+b^Tb\\ \text{We want to minimize }\phi (r)\\ \nabla\phi(r)&=0\\ \implies 2A^TAr-2A^Tb+0&=0 \qquad \text{ Reference For This Step Below}\\ \implies A^TAr&=A^Tb\\ &\blacksquare \end{align} Reference for Matrix Calculus

You might also want to investigate a very powerful tool called the Moore-Penrose Pseudoinverse.

Inquest
  • 6,635
  • You still need to prove that the critical point is necessarily a Minimum and neither a Maximum nor a Saddle(?)-Point (~$x^3$, don't know the english word) – AlexR Aug 25 '13 at 14:03
  • @AlexR, agreed. I will add it later. – Inquest Aug 26 '13 at 00:43
1

Assume that $A^TA$ is invertible. Then consider the matrix $M=A\left(A^TA\right)^{-1}A^T$. It is not hard to verify that $$ M^2=M\tag{1} $$ and $$ M^T=M\tag{2} $$ $(1)$ and $(2)$ show that $M$ is an orthogonal projection onto the column space of $A$. To see that it is onto the column space of $A$, note that $MA=A$. It is orthogonal because for any $x,y$ $$ \begin{align} \langle Ax,My-y\rangle &=\langle Ax,My\rangle-\langle Ax,y\rangle\\ &=\langle M^TAx,y\rangle-\langle Ax,y\rangle\\ &=\langle MAx,y\rangle-\langle Ax,y\rangle\\ &=\langle Ax,y\rangle-\langle Ax,y\rangle\\ &=0\tag{3} \end{align} $$

Since $Mb$ is the orthogonal projection of $b$ onto the column space of $A$, it should be the closest point to $b$ of any $Ax$. Therefore, let $$ r=\left(A^TA\right)^{-1}A^Tb\tag{4} $$ which is equivalent to the condition $A^TAr=A^Tb$.

$Ar=Mb$ should be the closest point to $b$ of any $Ax$. In fact, if we use $(4)$ and apply $(3)$, we have $$ \begin{align} \|A(r+x)-b\|^2 &=\langle Mb+Ax-b,Mb+Ax-b\rangle\\ &=\|Mb-b\|^2+2\langle Ax,Mb-b\rangle+\|Ax\|^2\\ &=\|Ar-b\|^2+\|Ax\|^2\tag{5} \end{align} $$ Thus, $(5)$ shows that $\|A(r+x)-b\|^2$ is minimal iff $Ax=0$ and since $A^TA$ is invertible, $Ax=0\iff x=0$. Therefore, $(4)$ does give the least squares solution as implied by the orthogonality.


Another Approach

Consider $$ \begin{align} \|A(r+x)-b\|^2 &=\langle Ar+Ax-b,Ar+Ax-b\rangle\\ &=\langle Ar-b,Ar-b\rangle+2\langle Ar-b,Ax\rangle+\langle Ax,Ax\rangle\\ &=\|Ar-b\|^2+2\langle A^TAr-A^Tb,x\rangle+\|Ax\|^2\tag{6} \end{align} $$ Let $x=t(A^TAr-A^Tb)$, then $(6)$ says $$ \|A(r+x)-b\|^2=\|Ar-b\|^2+2t\|A^TAr-A^Tb\|^2+t^2\|AA^TAr-AA^Tb\|^2\tag{7} $$ For $\|Ar-b\|^2$ to be a minimum, the derivative of $(7)$ must equal $0$. That is, $\|A^TAr-A^Tb\|^2=0$

Therefore, for $\|Ar-b\|^2$ to be a minimum, we must have $$ A^TAr-A^Tb=0\tag{8} $$

robjohn
  • 345,667