Let ∇A(X) denote the derivative of X with respect to the matrix A, and let X^T denote the transpose of a matrix X. Then the following two rules hold.
1) ∇A (trace of AB) = B^T
2) ∇A (trace of A B A^T C) = C A B + C^T A B^T
I accept that both rules are correct, but I don't see how they can both hold at the same time.
For instance, applying 1) with (B A^T C) in place of B, we can say that
∇A (trace of A B A^T C) = ∇A (trace of A (B A^T C)) = (B A^T C)^T = C^T A B^T
However, according to 2) the answer is C A B + C^T A B^T, not C^T A B^T.
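To make the mismatch concrete, here is a rough numerical check, sketched in Python with NumPy. The random square matrices, the size n = 3, the step size, and the helper name numerical_gradient are all my own assumptions for illustration. It compares a finite-difference estimate of the gradient against rule 2 and against the result I got from rule 1.

```python
import numpy as np

def numerical_gradient(f, A, eps=1e-6):
    """Central-difference estimate of d f(A) / d A[i, j] for every entry."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            step = np.zeros_like(A)
            step[i, j] = eps
            grad[i, j] = (f(A + step) - f(A - step)) / (2 * eps)
    return grad

# Arbitrary test matrices (assumed square here only to keep the shapes simple).
rng = np.random.default_rng(0)
n = 3
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

# The scalar function from rule 2: trace of A B A^T C.
f = lambda M: np.trace(M @ B @ M.T @ C)

numeric = numerical_gradient(f, A)
rule_2 = C @ A @ B + C.T @ A @ B.T   # right-hand side of rule 2
my_result = C.T @ A @ B.T            # what I obtained using only rule 1

print(np.allclose(numeric, rule_2, atol=1e-5))     # True
print(np.allclose(numeric, my_result, atol=1e-5))  # False
```

So the numerical gradient agrees with rule 2, and my result is missing exactly the C A B term.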
Is there something wrong with the way I calculated it? I only used rule 1.