Let $$A=U\Sigma V^\top=\sum_i \sigma_i u_i v_i^\top ,$$ be the SVD of a real matrix $A$ of rank $r$. We want to show that the matrix $X_k$ of rank $k < r$ that minimises $\lVert A - X_k\rVert_F$ is $$A^k=\sum_i^k \sigma_i u_i v_i^\top .$$
The proof that can be found on the Wikipedia (also here) is as follows:
Since $||A-X_k||_F = ||U\Sigma V^\intercal - X_k||_F = ||\Sigma - U^\intercal X_k V ||_F$, denoting $N = U^\intercal X_k V$, an $m \times n$ matrix of rank $k$, a direct calculation gives \begin{equation} ||\Sigma-N||_F^2 = \sum_{i,j} |\Sigma_{i,j} - N_{i,j}|^2 = \sum_{i=1}^r |\sigma_i-N_{ii}|^2+\sum_{i>r}|N_{ii}|^2+\sum_{i\neq j} |N_{i,j}|^2 \end{equation} which is minimal when all the non diagonal terms of $N$ equal to zero, and so are all diagonal terms with $i > r$. Obviously, the minimum of the terms left is attained when $N_{ii} = \sigma_i$ for $i = 1,2,\cdots,k$ and all other $N_{ii}$ are zero.
My understanding of this is that geometrically $\lVert A - X_k\rVert_F$ is the sum of the distances between a set of orthogonal vectors which form the columns of $\Sigma$ and another set of vectors which form the columns of N. In addition, we know that $n-r$ vectors in the first set are zero, and $n-k$ vectors in the second set must be linearly dependent.
What I don't see is the second part of the proof, namely that it's "obvious" that $N$ must be chosen to be diagonal, in other words that the columns of $N$ must point in the same directions as the columns of $\Sigma$. Intuitively, it does seem that the optimal $N$ must be diagonal, but it's not obvious to me. I'd appreciate if somebody could clarify this point?