
Let $$A=U\Sigma V^\top=\sum_{i=1}^r \sigma_i u_i v_i^\top$$ be the SVD of a real matrix $A$ of rank $r$. We want to show that the matrix $X_k$ of rank $k < r$ that minimises $\lVert A - X_k\rVert_F$ is $$A^k=\sum_{i=1}^k \sigma_i u_i v_i^\top .$$

The proof that can be found on Wikipedia (also here) is as follows:

Since $\lVert A-X_k\rVert_F = \lVert U\Sigma V^\top - X_k\rVert_F = \lVert\Sigma - U^\top X_k V \rVert_F$, denoting $N = U^\top X_k V$, an $m \times n$ matrix of rank $k$, a direct calculation gives \begin{equation} \lVert\Sigma-N\rVert_F^2 = \sum_{i,j} |\Sigma_{i,j} - N_{i,j}|^2 = \sum_{i=1}^r |\sigma_i-N_{i,i}|^2+\sum_{i>r}|N_{i,i}|^2+\sum_{i\neq j} |N_{i,j}|^2 \end{equation} which is minimal when all the off-diagonal terms of $N$ are equal to zero, and so are all diagonal terms with $i > r$. Obviously, the minimum of the remaining terms is attained when $N_{i,i} = \sigma_i$ for $i = 1,2,\ldots,k$ and all other $N_{i,i}$ are zero.
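As a sanity check on the two identities used in the quoted calculation, here is a minimal NumPy sketch. It only verifies, for one arbitrary rank-$k$ candidate, that the orthogonal change of variables preserves the Frobenius norm and that the squared norm splits into the three sums; it does not prove anything about the minimiser. The sizes, the seed, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (illustrative choice)
m, n, k = 5, 4, 2                # illustrative sizes, with k < r

# Build A with rank r = 3 < min(m, n), so the sum over i > r is non-trivial.
A = rng.standard_normal((m, 3)) @ rng.standard_normal((3, n))
U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U is m x m, Vt is n x n
r = int(np.linalg.matrix_rank(A))

# Rectangular Sigma (m x n) with the singular values on its diagonal.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

# An arbitrary rank-k candidate X_k and its image N = U^T X_k V.
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
N = U.T @ X @ Vt.T

# Identity 1: ||A - X_k||_F = ||Sigma - N||_F.
lhs = np.linalg.norm(A - X, "fro")
rhs = np.linalg.norm(Sigma - N, "fro")

# Identity 2: the squared norm splits into three sums of squares.
d = np.diag(N)                              # diagonal entries N_ii
diag_r = np.sum((s[:r] - d[:r]) ** 2)       # sum over i <= r of (sigma_i - N_ii)^2
diag_rest = np.sum(d[r:] ** 2)              # sum over i > r of N_ii^2
off_diag = np.sum(N ** 2) - np.sum(d ** 2)  # sum over i != j of N_ij^2
split_total = diag_r + diag_rest + off_diag

print(np.isclose(lhs, rhs), np.isclose(rhs ** 2, split_total))
```

Note that `N` here has rank $k$ by construction, which is the constraint that any minimisation over the three sums still has to respect.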

My understanding of this is that geometrically $\lVert A - X_k\rVert_F^2$ is the sum of the squared distances between a set of orthogonal vectors which form the columns of $\Sigma$ and another set of vectors which form the columns of $N$. In addition, we know that $n-r$ vectors in the first set are zero, and $n-k$ vectors in the second set must be linearly dependent.

What I don't see is the second part of the proof, namely that it's "obvious" that $N$ must be chosen to be diagonal, in other words that the columns of $N$ must point in the same directions as the columns of $\Sigma$. Intuitively, it does seem that the optimal $N$ must be diagonal, but it's not obvious to me. I'd appreciate it if somebody could clarify this point.

Ernest A

1 Answer


There are three terms on the right-hand side, each involving different elements of the matrix $N$, and each a sum of squares. Since the right-hand side is separable, you can minimize each of the three terms separately.

Is it clear to you that

$\min \sum_{i=1}^{r} | \sigma_{i}-N_{i,i} |^2$

is achieved with $N_{i,i}=\sigma_{i}, i=1, 2, \ldots, r$?

Is it clear to you that

$\min \sum_{i>r} | N_{i,i} |^{2}$

is achieved by setting $N_{i,i}=0, i=r+1, \ldots$?

Is it clear to you that

$\min \sum_{i \neq j} | N_{i,j} |^{2}$

is achieved by setting $N_{i,j}=0, i \neq j$?

  • Yes, but the terms are not independent, are they? Since $\sum_j\sum_i N_{i,j}^2$ is fixed for a given matrix $X_k$, it's not clear to me that you can minimise each of the terms separately. – Ernest A Aug 02 '16 at 08:04
  • In each case you've picked values of $N_{i,j}$ that result in the unconstrained minimum, and this solution also happens to satisfy the constraint, so you're all set. It's as if I asked you to minimize $x^{2}$ subject to $0 \leq x \leq 1$. $x=0$ is the unconstrained minimum of $x^{2}$ and satisfies $0 \leq x \leq 1$, so it is also the constrained minimum. – Brian Borchers Aug 02 '16 at 12:10
  • I don't think this is right. If your answer is correct, the right-hand side becomes zero. However, the left-hand side is strictly larger than zero, since it attempts to approximate a rank-$r$ matrix with a matrix of rank $k$ ($k<r$). – inmybrain Nov 12 '17 at 09:06
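The objection in the last comment can be checked numerically. The sketch below (NumPy; sizes, seed, and names are illustrative assumptions) shows that the unconstrained choice $N=\Sigma$ corresponds to $X = A$, which has rank $r$ rather than $k$, while the truncated SVD $A^k$ has a strictly positive error equal to $\sqrt{\sum_{i>k}\sigma_i^2}$. It also samples random rank-$k$ candidates to see that none of them beats $A^k$, which is evidence consistent with the claim, not a proof of it.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed (illustrative choice)
m, n, k = 5, 4, 2                # illustrative sizes, with k < r

A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.linalg.matrix_rank(A))

# N = Sigma means X = U Sigma V^T = A: zero error, but rank r > k,
# so it violates the rank-k constraint.
rank_unconstrained = int(np.linalg.matrix_rank(U @ np.diag(s) @ Vt))

# Truncated SVD: keep only the k largest singular triplets.
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
err_k = np.linalg.norm(A - A_k, "fro")
tail = np.sqrt(np.sum(s[k:] ** 2))   # sqrt of the sum of discarded sigma_i^2

# Random rank-k candidates X = B C with B (m x k) and C (k x n).
best_random = min(
    np.linalg.norm(
        A - rng.standard_normal((m, k)) @ rng.standard_normal((k, n)), "fro"
    )
    for _ in range(2000)
)

print("rank of unconstrained minimiser:", rank_unconstrained, "vs k =", k)
print("error of A^k matches tail:", np.isclose(err_k, tail))
print("no random rank-k candidate beats A^k:", best_random >= err_k)
```

So the three sums cannot all be driven to zero at once under the rank constraint; the constrained minimum leaves the terms $\sigma_{k+1}^2,\ldots,\sigma_r^2$ in the first sum.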