12

Could someone please explain why in this Wiki page one says "we know that $\exists(k+1)$ dimension space $(v_1,v_2, \dots, v_n)$" ?

user26857
  • 52,094

3 Answers3

23

One needs to show that if $\mathrm{rank}(B)=k$, then $\|A-B\|_2\geq\|A-A_k\|_2$. This can be done as follows.

Since $\mathrm{rank}(B)=k$, $\dim\mathcal{N}(B)=n-k$ and from $$\dim\mathcal{N}(B)+\dim\mathcal{R}(V_{k+1})=n-k+k+1=n+1$$ (where $V_{k+1}=[v_1,\ldots,v_{k+1}]$ is the matrix of right singular vectors associated with the first $k+1$ singular values in the descending order), we have that there exists an $$x\in\mathcal{N}(B)\cap\mathcal{R}(V_{k+1}), \quad \|x\|_2=1.$$ Hence $$ \|A-B\|_2^2\geq\|(A-B)x\|_2^2=\|Ax\|_2^2=\sum_{i=1}^{k+1}\sigma_i^2|v_i^*x|^2\geq\sigma_{k+1}^2\sum_{i=1}^{k+1}|v_i^*x|^2=\sigma_{k+1}^2. $$ From $\|A-A_k\|_2=\sigma_{k+1}$, one gets hence $\|A-B\|_2\geq\|A-A_k\|_2$. No contradiction required, Quite Easily Done.


EDIT An alternative proof, which works for both the spectral and Frobenius norms, is based on the Weyl's theorem for eigenvalues (or more precisely, its alternative for singular values): if $X$ and $Y$ are $m\times n$ ($m\geq n$) and (as above) the singular values are ordered in the descreasing order, we have $$\tag{1} \sigma_{i+j-1}(X+Y)\leq\sigma_i(X)+\sigma_j(Y) \quad\text{for $1\leq i,j\leq n, \; i+j-1\leq n$} $$ (this follows from the variational characterization of eigen/singular values; see, e.g., Theorem 3.3.16 here). If $B$ has rank $k$, $\sigma_{k+1}(B)=0$. Setting $j=k+1$, $Y:=B$, and $X:=A-B$, in (1) gives $$ \sigma_{i+k}(A)\leq\sigma_i(A-B) \quad \text{for $1\leq i\leq n-k$}. $$ For the spectral norm, it is sufficient to take $i=1$. For the Frobenius norm, this gives $$ \|A-B\|_F^2\geq\sum_{i=1}^{n-k}\sigma_i^2(A-B)\geq\sum_{i=k+1}^n\sigma_i^2(A) $$ with the equality attained, again, by $B=A_k$.

  • Thank you. Maybe i'm missing something, but what i don't understand is this: what has $V_{k+1}$ to do with everything else and how it is connected to rank(B). – Alex Gaspare Apr 18 '14 at 13:41
  • 2
    @AlexGaspare I'm not sure I've understood the question. $V_{k+1}$ is formed from the first $k+1$ right (not left, sorry) singular vectors of $A$, that is, first $k+1$ columns of $V$ from the SVD $A=U\Sigma V^*$; truncating this SVD determines the optimal approximation $A_k$. It does not have much to do with the rank of $B$, except that it has a non-trivial intersection with the nullspace of $B$ by the dimensional argument (if the sum of the dimensions of two subspaces of an $n$-dimensional subspace is larger than $n$, they have non-trivial intersection). – Algebraic Pavel Apr 18 '14 at 13:46
  • i'm sorry to bring this up now, but why is squaring the norm needed again? And why does $||A−B||_2^2≥||(A−B)x||_2^2$ holds? – Alex Gaspare Apr 25 '14 at 12:35
  • 3
    @AlexGaspare Squaring norms is not needed, it's there just to avoid typesetting square roots. The numbers are all non-negative so it does not matter. If $C$ is a matrix and $|x|2=1$, then $|Cx|_2\leq|C|_2$ simply because $|C|_2=\max{|x|_2=1}|Cx|_2$ (so taking a particular $x$ in $\max$ is a lower bound for the $\max$). – Algebraic Pavel Apr 25 '14 at 13:04
  • @AlgebraicPavel: +1, but when you write $|\cdot|2$, which norm are you referring to? As you say that $|A|_2=\sigma\mathrm{max}$, I guess you are using the operator norm, but isn't Eckart-Young theorem (also?) true for Frobenius norm that is often denoted in exactly the same way $|A|_F = |A|_2=\left(\sum \sigma^2_k\right)^{1/2}$? How does the proof work in that case? The wiki article does start with Frobenius norm, but then somehow seems to switch to the operator norm. – amoeba Nov 26 '14 at 16:32
  • @AlgebraicPavel Could you please tell me what you used to get $|Ax|2^2=\sum{i=1}^{k+1}\sigma_i^2|v_i^*x|^2$? I cannot understand the hidden passages... thanks! – gosbi Apr 03 '15 at 17:57
  • 1
    @gosbi It's because $x\in\mathcal{R}(V_{k+1})$, so it can be written as a linear combination of $v_1,\ldots,v_{k+1}$. – Algebraic Pavel Apr 03 '15 at 18:26
  • 1
    @AlgebraicPavel Then, I don't understand where $A$ is, as for what I have understood, $v_i^x$ is just a way to write $x$. I tried to compute it explicitly, but I got quite a different result: $$ | A x |_2 = (Ax)^(Ax) = x^V\Sigma^U^U\Sigma V^x = (V^x)^\Sigma^2(V^x) = \sum_{i=1}^{k+1}\sigma_i^2\mid v_i^ x\mid $$ where the term in the sum is not squared, which makes a great difference in the following passage of the proof. – gosbi Apr 04 '15 at 08:58
  • 1
    @gosbi The problem of the last equality is that you forgot that square. – Algebraic Pavel Apr 04 '15 at 13:59
  • @amoeba Sorry for the delay :-) I've added some details for the Frobenius norm. – Algebraic Pavel Apr 13 '15 at 11:33
  • @AlgebraicPavel Sorry but you can't write ‖−‖2=+1 at the end of the first proof :) It's a sum from k + 1 to n. – Jill-Jênn Vie Oct 23 '20 at 04:42
  • @Jill-JênnVie I'm not sure what you mean. The largest singular value of $A-A_k$ is $\sigma_{k+1}$ so its 2-norm is $\sigma_{k+1}$. – Algebraic Pavel Oct 23 '20 at 08:54
  • Hi. I'll appreciate if someone clarify my following doubt: Why $N(B) \cap R(V_{k}) = {0}$. What i am understanding, and it must be certainly correct (since this same proof is in one of Gilbert Strang's books), is that the argument in line 7 will only work for $V_{k+i}$ where i is a positive integer. But how can one be certain that for $V_{k-i}$ will not work. In other words, why would the inersection between $V_{k-i}$ and $mathcal{N}(B)$ equal {0} where 0 is the zero vector. Thank you – Mangostino Nov 29 '20 at 18:13
  • 1
    @Mangostino Nobody claims that $N(B)\cap R(V_k)={0}$. What is true that for any $B$ of rank $k$, $N(B)\cap R(V_{k+1})\neq{0}$ which is the key for the proof. On the other hand, there exist $B$ of rank $k$ such that $N(B)\cap R(V_k)={0}$ but there are also such matrices for which these intersections are nontrivial. – Algebraic Pavel Dec 02 '20 at 11:13
  • Thanks. Finally got that spine out from my toe. – Mangostino Dec 02 '20 at 14:34
  • why is ()∩(+1) ≠ {0}? – rando Jan 23 '21 at 23:35
  • 1
    @rando If $V$ and $W$ are finite-dimensional subspaces of space of dimension $n$, we have $\mathrm{dim}(V\cap W)=\mathrm{dim}(V)+\mathrm{dim}(W)-\mathrm{dim}(V+W)$. With $\mathrm{dim}(V)=n-k$, $\mathrm{dim}(W)=k+1$, and $\mathrm{dim}(V+W)\leq n$, we have $\mathrm{dim}(V\cap W)\geq (n-k)+(k+1)-n=1$, that is, $V\cap W$ is non-trivial. – Algebraic Pavel Jan 24 '21 at 12:04
  • I understand now, thanks. – rando Jan 24 '21 at 12:46
2

Here's a slightly simplified version of the proof on wiki.

As shown there, $\left \| A - A_k \right \|_2 = \sigma_{k+1}$. Now, suppose $\exists \ B$ such that $\mathrm{rank}(B) \leq k$ and $\left \| A - B \right \|_2 < \left \| A - A_k \right \|_2$. Then, $\text{dim} (\mathrm{null}(B)) \geq n-k$. Let $w \in \mathrm{null}(B)$, then \begin{equation} \left \| A w \right \|_2 = \left \| (A - B)w \right \|_2 \leq \left \| A - B \right \|_2 \left \| w \right \|_2 < \sigma_{k+1} \left \| w \right \|_2 \end{equation} Also, for any $v \in \mathrm{span}\{ v_1,v_2,\cdots,v_{k+1} \}$, $\left \| A v \right \|_2 \geq \sigma_{k+1} \left \| v \right \|_2$. But, \begin{equation} \mathrm{null}(B) \cap \mathrm{span}\{ v_1,v_2,\cdots,v_{k+1} \} \neq \{0\} \end{equation} So, $\exists$ a non-zero vector lying in both spaces. This'll lead to a contradiction.

user26857
  • 52,094
-2

The original statement of Eckart-Young-Mirsky theorem on wiki is based on Frobenius norm, but the proof is based on 2-norm. Though Eckart-Young-Mirsky theorem holds for all norms invariant to orthogonal transforms, I think it is necessary to add a proof purely based on Frobenius norm since it is even easier to prove than that based on 2-norm.

The following proof is replicated from these notes.

Since $||M-X_k||_F = ||U\Sigma V^\intercal - X_k||_F = ||\Sigma - U^\intercal X_k V ||_F$, denoting $N = U^\intercal X_k V$, an $m \times n$ matrix of rank $k$, a direct calculation gives \begin{equation} ||\Sigma-N||_F^2 = \sum_{i,j} |\Sigma_{i,j} - N_{i,j}|^2 = \sum_{i=1}^r |\sigma_i-N_{ii}|^2+\sum_{i>r}|N_{ii}|^2+\sum_{i\neq j} |N_{i,j}|^2 \end{equation} which is minimal when all the non diagonal terms of $N$ equal to zero, and so are all diagonal terms with $i > r$. Obviously, the minimum of the terms left is attained when $N_{ii} = \sigma_i$ for $i = 1,2,\cdots,k$ and all other $N_{ii}$ are zero.

user26857
  • 52,094
  • 1
    I think this proof is incorrect. Since $\mathrm{rank}(N) \le k$, all the $N_{ij}$ are coupled with one another. So it does not follow that the minimum should be attained when $N_{ij}=0$ for $i\ne j$. – Laurent Lessard Oct 18 '16 at 05:43