0

Assume we have a matrix $Y \in R^{n \times k}$ and a matrix $W \in R^{n \times n}$ that gives mutual weight between each $n$ datapoints $y_i$ and $y_j$. Also we define degree matrix $D_{ii} = \sum_{j=1}^n W_{ij}$ as a diagonal matrix.

Why minimizing $\sum_{i=1}^n \sum_{j=1}^n W_{ij} \|y_i-y_j\|^2$ is equivalent to minimizing Trace$(Y^T (D-w) Y)$?

1 Answers1

0

There is no loss of generality in taking $W$ to be symmetric (this follows since $\|y_i-y_j\| = \|y_j - y_i\|$).

Let $Y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$, then the $l$th components of the $y$ vectors are given by $Y e_l$.

Note that $\|y_i - y_j\|^2 = \sum_l [y_i]_l^2 - 2\sum_l [y_i]_l [y_j]_l + \sum_l [y_j]_l^2$, hence \begin{eqnarray} \sum_{i,j} W_{ij} \|y_i - y_j\|^2 &=& \sum_l \left( \sum_{i,j} W_{ij} ([y_i]_l^2 - 2[y_i]_l [y_j]_l + [y_j]_l^2)\right) \\ &=& \sum_l \left( \sum_{i} (\sum_{j} W_{ij}) [y_i]_l^2 - 2\sum_{i,j} W_{ij} [y_i]_l [y_j]_l + \sum_{j} (\sum_{i} W_{ij}) [y_j]_l^2 \right) \\ &=& \sum_l \left( \sum_{i} D_{ii} [y_i]_l^2 - 2\sum_{i,j} W_{ij} [y_i]_l [y_j]_l + \sum_{j} D_{jj} [y_j]_l^2 \right) \\ &=& 2\sum_l \left( \sum_{i} D_{ii} [y_i]_l^2 - \sum_{i,j} W_{ij} [y_i]_l [y_j]_l\right) \\ &=& 2\sum_l \left( \sum_{i,j} D_{ij} [y_i]_l [y_j]_l - \sum_{i,j} W_{ij} [y_i]_l [y_j]_l\right) \\ &=& 2\sum_l \left( \sum_{i,j} (D_{ij} - W_{ij}) [y_i]_l [y_j]_l\right) \\ &=& 2\sum_l (Y e_l)^T (D-W) (Y e_l) \\ &=& 2\sum_l e_l^T Y^T(D-W) Y e_l \\ &=& 2 \operatorname{tr} ( Y^T(D-W) Y ) \end{eqnarray}

(As an aside, note that $D-W$ is diagonally dominant.)

copper.hat
  • 172,524