Multivariate Calculus: Differentiating the following problem

Question

Suppose $\mu$ is $m \times 1 $, $A$ is $m \times m$, $B$ is always $m \times n$ and $\Sigma$ is $n \times n$. Note that $\Sigma$ is symmetric.

I need to differentiate the following expression:

$$\ell = -\log( \det[B \Sigma B^T]) - \operatorname{tr}([B \Sigma B^T]^{-1} [\mu\mu^T - \mu\mu^T A^T - A(\mu\mu^T)^T + A \mu\mu^T A^T])$$

Now I would like to know how can I obtain the following:

$$\frac{\partial \ell }{\partial A} = \text{?}$$ $$\frac{\partial \ell }{\partial \Sigma} = \text{?}$$ $$\frac{\partial \ell }{\partial B} = \text{?}$$

What would the optimal $A$, $\Sigma$ and $B$ be after differentiating and rearranging the terms to one side?

Update:

I attempted this through the differential and obtained the following. Let $Z = [\mu\mu^T - \mu\mu^T A^T - A(\mu\mu^T)^T + A \mu\mu^T A^T]$

$$d \ell = -\operatorname{tr}\Big(\big[2 B^T \Sigma(B\Sigma B^T)^{-1}\big]dB + \big[ B^T(B\Sigma B^T)^{-1}B\big] d\Sigma + \big[ (B\Sigma B^T)^{-1}Z(B\Sigma B^T)^{-1} B\Sigma +\big((B\Sigma B^T)^{-1}Z(B \Sigma B^T)^{-1}B \Sigma\big)^T \big]dB + \big[ B^T(B\Sigma B^T)^{-1}Z^T (B\Sigma B^T)^{-1}B\big]d \Sigma\Big) - \operatorname{tr}\Big(\Big(\big[B\Sigma B^T\big]^{-1}\big[ 2\mu^T\mu - 2\mu\mu^TA^T\big]\Big)dA\Big)$$

Please kindly verify if it is correct.

The problem that remains is how to rearrange the terms such that the optimal $A, B, \Sigma$ will be on one side.

I tried but I still do not know how to differentiate the trace inverse part $tr([B\Sigma B^T]^{-1} ...)$. — user1769197, May 10 '17 at 16:25
Ummm, do you have additional constraints, like $\Sigma$ being symmetric? (This is a usual one.) — Vim, May 10 '17 at 16:33
Yes. Sorry. I forgot to put that one. I just updated the question highlighting that $\Sigma$ is symmetric. — user1769197, May 10 '17 at 16:35

score 0 · Accepted Answer · answered May 10 '17 at 19:09

We can hand-wave a bit and guess that your optimization problem won't have a solution. The second term is the quadratic form $(\mu - A\mu)^T [B\Sigma B^T]^{-1} (\mu - A\mu)$, so it vanishes whenever $A\mu = \mu$; for example, set $A = \frac{\mu\mu^T}{\mu^T\mu}$ (or simply $A = \mathbb{1}$). We can then make the first term as large or as small as we like by (for example) setting $\Sigma = \sigma \mathbb{1}$, fixing $B$ arbitrarily (with full row rank, so that $B\Sigma B^T$ is invertible) and taking $\sigma$ to $0$ or $\infty$. The calculus bears out that intuition.
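The unboundedness argument can be checked numerically. The following is a minimal sketch, not part of the original answer: the helper `ell`, the dimensions, and the random test data are all assumptions made for illustration, and $A$ is taken to be the rank-one projector onto $\mu$ so that $A\mu = \mu$.

```python
import numpy as np

def ell(A, B, Sigma, mu):
    """Objective from the question: -log det(S) - tr(S^{-1} Z), with S = B Sigma B^T."""
    S = B @ Sigma @ B.T
    r = mu - A @ mu                      # Z = r r^T, so tr(S^{-1} Z) = r^T S^{-1} r
    _, logdet = np.linalg.slogdet(S)
    return -logdet - r @ np.linalg.solve(S, r)

rng = np.random.default_rng(0)
m, n = 3, 4                              # illustrative sizes (assumption)
mu = rng.standard_normal(m)
B = rng.standard_normal((m, n))          # random B, full row rank with probability 1
A = np.outer(mu, mu) / (mu @ mu)         # projector onto mu, so A @ mu equals mu

sigmas = [1e-4, 1e-2, 1.0, 1e2, 1e4]
values = [ell(A, B, s * np.eye(n), mu) for s in sigmas]
for s, v in zip(sigmas, values):
    print(f"sigma = {s:8.0e}   ell = {v: .4f}")
```

With the quadratic term switched off, the objective reduces to $-m\log\sigma - \log\det(BB^T)$, so the printed values should fall steadily as $\sigma$ grows and rise without bound as $\sigma \to 0$.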

Start by rewriting the likelihood to make the calculus easier. Let $Y=(B\Sigma B^T)^{-1}$; note that $Y$ is symmetric and invertible. Then your likelihood becomes

$$ l(Y, A) = \log \vert Y \vert -\mu^T Y \mu - \mu^T A^T Y A \mu + \mu^T A^T Y \mu + \mu^T Y A \mu$$

The derivatives of these expressions can be easily found using standard identities:

$$ \frac{\partial l}{\partial Y} = Y^{-1} - \mu \mu^T - A \mu \mu^T A^T + \mu \mu^T A^T + A \mu \mu^T $$

$$ \frac{\partial l}{\partial A} = -2 Y A \mu \mu^T + 2 Y \mu \mu^T$$
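The two gradient formulas can be sanity-checked against central finite differences. This is a sketch under stated assumptions: the function names, the random test point, and the step size `h` are illustrative choices, and $Y$ is perturbed entry by entry as an unconstrained matrix (evaluated at a symmetric positive definite point).

```python
import numpy as np

def l(Y, A, mu):
    """l(Y, A) = log|Y| - (mu - A mu)^T Y (mu - A mu), the rewritten objective."""
    r = mu - A @ mu
    return np.linalg.slogdet(Y)[1] - r @ Y @ r

def grad_Y(Y, A, mu):
    """Closed-form dl/dY from the answer."""
    M = np.outer(mu, mu)
    return np.linalg.inv(Y) - M - A @ M @ A.T + M @ A.T + A @ M

def grad_A(Y, A, mu):
    """Closed-form dl/dA from the answer."""
    M = np.outer(mu, mu)
    return -2 * Y @ A @ M + 2 * Y @ M

def numeric_grad(f, X, h=1e-6):
    """Central differences, one matrix entry at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(1)
m = 3
mu = rng.standard_normal(m)
A = rng.standard_normal((m, m))
C = rng.standard_normal((m, m))
Y = C @ C.T + np.eye(m)                  # symmetric positive definite test point

err_Y = np.max(np.abs(numeric_grad(lambda X: l(X, A, mu), Y) - grad_Y(Y, A, mu)))
err_A = np.max(np.abs(numeric_grad(lambda X: l(Y, X, mu), A) - grad_A(Y, A, mu)))
print("max abs error, dl/dY:", err_Y)
print("max abs error, dl/dA:", err_A)
```

Both errors should be at the level of finite-difference noise if the closed forms are right.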

Setting these derivatives equal to zero gives a system of equations that cannot be satisfied simultaneously. Since $Y$ is invertible, $\frac{\partial l}{\partial A} = 0$ implies $\mu\mu^T = A \mu\mu^T$. Substituting this into $\frac{\partial l}{\partial Y}$ cancels every term except $Y^{-1}$, so $\frac{\partial l}{\partial Y} = 0$ would require $Y^{-1} = 0$, which is impossible for an invertible matrix. So your optimization problem has no stationary points, and hence no local extrema.
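The same obstruction shows up in the original variables $B$ and $\Sigma$. As a sketch of the chain rule, write $G = \frac{\partial l}{\partial Y}$ (a symbol introduced here, symmetric by the formula above) and ignore the symmetry constraint on $\Sigma$ while differentiating:

$$ dY = -Y\,\big(dB\,\Sigma B^T + B\,d\Sigma\,B^T + B\Sigma\,dB^T\big)\,Y, \qquad dl = \operatorname{tr}(G\,dY) $$

$$ \frac{\partial l}{\partial \Sigma} = -B^T Y G Y B, \qquad \frac{\partial l}{\partial B} = -2\,Y G Y B \Sigma $$

Because $B\Sigma B^T$ is invertible, $B$ must have full row rank, so $\frac{\partial l}{\partial \Sigma} = 0$ forces $G = 0$. That is the same condition $\frac{\partial l}{\partial Y} = 0$ that was shown to be incompatible with $\frac{\partial l}{\partial A} = 0$.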
