Let's compute the gradient of $g(U) = \|A - U W \|^2$, where $\| \cdot \|$ is the Frobenius norm.
We'll use the fact that if $P,Q \in \mathbb R^{m \times n}$, then the Frobenius inner product is
$\langle P, Q \rangle = \text{Tr}(P Q^T)$.
\begin{align}
g(U + \Delta U) &= \|A - U W - \Delta U W \|^2 \\
&\approx \|A - U W \|^2 - 2 \langle A - U W, \Delta U W \rangle \\
&= g(U) - 2 \text{Tr}\left((A - UW) W^T \Delta U^T \right) \\
&= g(U) - 2 \langle (A - UW)W^T, \Delta U \rangle.
\end{align}
Here $\approx$ means that we have dropped the second-order term $\|\Delta U W\|^2$.
Comparing this with $g(U + \Delta U) \approx g(U) + \langle \nabla g(U), \Delta U \rangle$,
we see that
\begin{equation}
\nabla g(U) = -2(A - UW) W^T.
\end{equation}
Next let's compute the gradient of $h(U) = \|RU - H \|^2$.
This time we'll use the equivalent form of the inner product: if $P, Q \in \mathbb R^{m \times n}$,
then $\langle P, Q \rangle = \text{Tr}(P^T Q)$.
\begin{align}
h(U + \Delta U) &= \| RU - H + R \Delta U \|^2 \\
&\approx \| RU - H \|^2 + 2 \langle RU - H, R \Delta U \rangle \\
&= h(U) + 2 \text{Tr} \left((RU - H)^T R \Delta U \right) \\
&= h(U) + 2 \langle R^T (RU - H), \Delta U \rangle.
\end{align}
Comparing this with $h(U + \Delta U) \approx h(U) + \langle \nabla h(U), \Delta U \rangle$,
we see that
\begin{equation}
\nabla h(U) = 2 R^T(RU - H).
\end{equation}
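Both gradient formulas can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix sizes, the random test data, and the helper name `directional_derivative` are illustrative choices, not part of the derivation. It compares a central finite difference of $g$ and $h$ along a random direction $\Delta U$ with the inner products $\langle \nabla g(U), \Delta U \rangle$ and $\langle \nabla h(U), \Delta U \rangle$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small sizes: U is m x k, W is k x n, A is m x n, R is p x m, H is p x k.
m, k, n, p = 4, 3, 5, 6
A = rng.standard_normal((m, n))
W = rng.standard_normal((k, n))
R = rng.standard_normal((p, m))
H = rng.standard_normal((p, k))
U = rng.standard_normal((m, k))
dU = rng.standard_normal((m, k))  # random direction playing the role of Delta U


def g(U):
    return np.linalg.norm(A - U @ W, "fro") ** 2


def h(U):
    return np.linalg.norm(R @ U - H, "fro") ** 2


def grad_g(U):
    # Formula derived above: -2 (A - U W) W^T
    return -2.0 * (A - U @ W) @ W.T


def grad_h(U):
    # Formula derived above: 2 R^T (R U - H)
    return 2.0 * R.T @ (R @ U - H)


def directional_derivative(func, U, dU, t=1e-5):
    # Central difference approximation of d/dt func(U + t dU) at t = 0.
    return (func(U + t * dU) - func(U - t * dU)) / (2.0 * t)


# <grad, dU> is the Frobenius inner product: the sum of elementwise products.
err_g = abs(directional_derivative(g, U, dU) - np.sum(grad_g(U) * dU))
err_h = abs(directional_derivative(h, U, dU) - np.sum(grad_h(U) * dU))
print(err_g, err_h)
```

Because $g$ and $h$ are quadratic in $U$, the central difference has no truncation error, so both printed discrepancies should be at the level of floating-point roundoff.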
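Now setting the gradient of $f(U) = g(U) + h(U)$ equal to $0$ and dividing by $2$, we obtain
\begin{align}
-(A - UW) W^T + R^T (RU - H) = 0,
\end{align}
or, after expanding and rearranging,
\begin{equation}
R^T R \, U + U \, W W^T = A W^T + R^T H.
\end{equation}
This is a Sylvester equation for $U$, and can be solved with standard techniques for Sylvester equations.
For example, MATLAB has a function called sylvester, which solves equations of the form $XU + UY = Z$ for $U$.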
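As a sketch of the same idea in Python, assuming SciPy is available: `scipy.linalg.solve_sylvester(a, b, q)` solves $aX + Xb = q$, so expanding the stationarity condition to $R^T R \, U + U \, W W^T = A W^T + R^T H$ gives the three arguments directly. The sizes and random data below are illustrative assumptions only.

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)

# Hypothetical small sizes: U is m x k, W is k x n, A is m x n, R is p x m, H is p x k.
m, k, n, p = 4, 3, 5, 6
A = rng.standard_normal((m, n))
W = rng.standard_normal((k, n))
R = rng.standard_normal((p, m))
H = rng.standard_normal((p, k))

# solve_sylvester(a, b, q) returns X satisfying a @ X + X @ b = q.
U = solve_sylvester(R.T @ R, W @ W.T, A @ W.T + R.T @ H)

# The gradient of f = g + h should vanish (up to roundoff) at the solution.
grad_f = -2.0 * (A - U @ W) @ W.T + 2.0 * R.T @ (R @ U - H)
print(np.abs(grad_f).max())
```

The Sylvester equation has a unique solution when $R^T R$ and $-W W^T$ share no eigenvalue, which holds for instance whenever either $R^T R$ or $W W^T$ is positive definite.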