3

I have run into a problem like this. Looks a bit unusual, but I think should be doable. Find $U$ achieving

$$\min_U \left( \| A - UW \|_2^2 + \| RU - H \|_2^2 \right)$$

$A,U,W,R,H$ are all matrices of dimension so that the operations above make sense. All the other matrices are given, except $U$. The norm is the Frobenius norm.

The difficulty here is that $U$ is multiplied from different sides in different terms.

Any ideas?

PA6OTA
  • 1,972
  • 10
  • 12
  • $U\mapsto A-UW$ and $U\mapsto RU-H$ are both linear operators on matrices, and the Frobenius norm is just the $L^2$ norm on the entries of the matrix. So you just have a quadratic minimization problem on your hands. –  Jul 07 '14 at 19:35
  • Yes, it's quadratic, but how do you express the answer in terms of $A, W, R, H$? – PA6OTA Jul 08 '14 at 18:13

2 Answers2

4

Let's compute the gradient of $g(U) = \|A - U W \|^2$. We'll use the fact that if $P,Q \in \mathbb R^{m \times n}$, then $\langle P, Q \rangle = \text{Tr}(P Q^T)$. \begin{align} g(U + \Delta U) &= \|A - U W - \Delta U W \|^2 \\ &\approx \|A - U W \|^2 - 2 \langle A - U W, \Delta U W \rangle \\ &= g(U) - 2 \text{Tr}\left((A - UW) W^T \Delta U^T \right) \\ &= g(U) - 2 \langle (A - UW)W^T, \Delta U \rangle. \end{align} Comparing this with $g(U + \Delta U) \approx g(U) + \langle \nabla g(U), \Delta U \rangle$, we see that \begin{equation} \nabla g(U) = -2(A - UW) W^T. \end{equation}

Next let's compute the gradient of $h(U) = \|RU - H \|^2$. We'll use the fact that if $P, Q \in \mathbb R^{m \times n}$, then $\langle P, Q \rangle = \text{Tr}(P^T Q)$. \begin{align} h(U + \Delta U) &= \| RU - H + R \Delta U \|^2 \\ &\approx \| RU - H \|^2 + 2 \langle RU - H, R \Delta U \rangle \\ &= h(U) + 2 \text{Tr} \left((RU - H)^T R \Delta U \right) \\ &= h(U) + 2 \langle R^T (RU - H), \Delta U \rangle. \end{align} Comparing this with $h(U + \Delta U) \approx h(U) + \langle \nabla h(U), \Delta U \rangle$, we see that \begin{equation} \nabla h(U) = 2 R^T(RU - H). \end{equation}

Now setting the gradient of $f(U) = g(U) + h(U)$ equal to $0$, we obtain \begin{align} -(A - UW) W^T + R^T (RU - H) = 0 \end{align} This is a Sylvester equation for $U$, and could be solved with standard techniques for Sylvester equations. For example, Matlab has a function called sylvester.

littleO
  • 51,938
2

Multiplication by a fixed matrix is a linear operator on the vector space of matrices, and can be represented in matrix-vector form using vectorization and the Kronecker product: $${\rm vec}(AXB) = (B^T\otimes A)\,{\rm vec}(X),$$ where, if $X$ is an $m\times n$ matrix, then ${\rm vec}(X)$ is an $mn$-dimensional vector. For convenience, in the following I will denote ${\rm vec}(X)$ as $\bar X$. Of course, vectorization is linear: $\overline{X+Y} = \bar X+\bar Y$ and $\overline{aX}=a\bar X$ for scalar $a$. One can also readily verify that $\|X\|_F=\|\bar X\|$.

Then $$\begin{align} \|A-UW\|_F^2+\|RU-H\|_F^2 &= \|\overline{A-UW}\|^2+\|\overline{RU-H}\|^2 \\ &= \|\bar A-\overline{UW}\|^2+\|\overline{RU}-\bar H\|^2 \\ &= \|\bar A-(W^T\otimes I)\bar U\|^2+\|(I\otimes R)\bar U-\bar H\|^2 \\ \end{align}$$

which you can expand into a quadratic in $\bar U$ and minimize using linear algebra.

  • Thanks, Rahul and littleO! Both answers are useful, but I can only check one... – PA6OTA Jul 09 '14 at 16:13
  • littleO's answer is better because it doesn't require you to build a massive $mn$-by-$mn$ matrix. –  Jul 10 '14 at 01:39