
I'm in a deep learning class, and I always seem to mess up derivative questions because I put the matrices in the wrong order, or transpose them when I shouldn't (or fail to transpose them when I should).

Here's one simple question: what is

$$\frac{ \partial (A B) }{ \partial X }$$

where $A \in \mathbb{R}^{M \times N}$, $B \in \mathbb{R}^{N \times P}$, and $X \in \mathbb{R}^{U \times V}$?

My class uses "denominator convention", which according to my notes means the answer should be a tensor with dimensions $U \times V \times P \times M$.

I'm aware of the "Matrix Cookbook", but that usually doesn't seem to contain what I need. If anyone can recommend a good book for learning this material, that would be great. My class doesn't talk about "contravariant, covariant" etc., so I'm not trying to learn differential geometry. I just want to know the matrix algebra equivalent of all of the calculus rules (given that these are matrices/tensors, not just real numbers).

Joe

1 Answer


Let $$C=A\star B$$ where $(A,B,C)$ are tensors (scalars, vectors, matrices, other) and $(\star)$ is any product (Matrix, Hadamard, Frobenius, Kronecker, Dyadic, other) which is compatible with the tensor dimensions.

The only rule that you should memorize is the product rule for differentials $$dC = dA\star B + A\star dB$$ where the order is important when the product is not commutative.

The nice thing about the differential expression is that the quantities $(dA,dB,dC)$ have the same tensorial character as $(A,B,C)$, so no higher-order tensors are required.

For example, if $(A)$ is a matrix and $(B,C)$ are vectors, then $(dA)$ is a matrix and $(dB,dC)$ are vectors.
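The differential rule and the remark about shapes can be checked numerically. The following is a minimal NumPy sketch, not part of the original answer: the sizes, the random seed, and the step size `t` are arbitrary assumptions chosen for illustration. It takes a matrix `A` and a vector `b`, applies small perturbations `dA` and `db`, and compares the first-order prediction `dA @ b + A @ db` with the exact change in `A @ b`.

```python
import numpy as np

rng = np.random.default_rng(1)      # arbitrary seed (assumption)

A = rng.standard_normal((3, 4))     # matrix
b = rng.standard_normal(4)          # vector

t = 1e-4                            # perturbation scale (assumption)
dA = t * rng.standard_normal((3, 4))    # same shape as A
db = t * rng.standard_normal(4)         # same shape as b

# Exact change in the product C = A b under the perturbation.
exact_change = (A + dA) @ (b + db) - A @ b

# First-order prediction from the product rule dC = dA*B + A*dB.
# The order of the factors is kept, since the matrix product
# does not commute.
dc = dA @ b + A @ db

# What the product rule leaves out is the second-order term dA @ db.
remainder = exact_change - dc

print(dc.shape == (A @ b).shape)    # dC has the same shape as C
print(np.max(np.abs(remainder)))    # on the order of t**2
```

The remainder is exactly the term `dA @ db`, which is quadratic in the perturbation, which is why the differential keeps only `dA @ b + A @ db`.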

Further, if the independent variable $(x)$ is a scalar, then the derivative has exactly the same form as the product rule above, i.e. $$\frac{dC}{dx} = \left(\frac{dA}{dx}\right)\star B + A\star\left(\frac{dB}{dx}\right)$$

Index notation is always an option, e.g. for the matrix product in the question $$\eqalign{ C_{ik} &= \sum_{j=1}^N A_{ij}\,B_{jk} \\ dC_{ik} &= \sum_{j=1}^N \Big(dA_{ij}\,B_{jk} + A_{ij}\,dB_{jk}\Big) \\ \frac{\partial C_{ik}}{\partial X_{pq}} &= \sum_{j=1}^N \left[\left(\frac{\partial A_{ij}}{\partial X_{pq}}\right)B_{jk} + A_{ij}\left(\frac{\partial B_{jk}}{\partial X_{pq}}\right)\right] }$$
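The index formula maps directly onto `numpy.einsum`. The sketch below is an illustration under stated assumptions, not something from the original answer: the dependence of $A$ and $B$ on $X$ is a made-up linear one, $A(X) = L_a X R_a$ and $B(X) = L_b X R_b$, with hypothetical fixed matrices `La`, `Ra`, `Lb`, `Rb`, chosen only because their partial derivatives are easy to write down. The result is stored as `dC[p, q, i, k]`, i.e. with shape $U \times V \times M \times P$; the $U \times V \times P \times M$ layout from the question is obtained here by swapping the last two axes, assuming that is what the class notes intend.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed (assumption)
M, N, P, U, V = 2, 3, 4, 5, 6       # arbitrary sizes (assumption)

# Hypothetical fixed matrices defining A(X) and B(X).
La, Ra = rng.standard_normal((M, U)), rng.standard_normal((V, N))
Lb, Rb = rng.standard_normal((N, U)), rng.standard_normal((V, P))

def A_of(X):
    return La @ X @ Ra              # M x N

def B_of(X):
    return Lb @ X @ Rb              # N x P

X = rng.standard_normal((U, V))
A, B = A_of(X), B_of(X)

# Partial derivatives of the linear maps:
# dA[p, q, i, j] = dA_ij/dX_pq = La[i, p] * Ra[q, j], likewise for B.
dA = np.einsum('ip,qj->pqij', La, Ra)
dB = np.einsum('jp,qk->pqjk', Lb, Rb)

# Product rule in index form, summing over j.
dC = (np.einsum('pqij,jk->pqik', dA, B)
      + np.einsum('ij,pqjk->pqik', A, dB))      # U x V x M x P

def numeric_dC(X, eps=1e-5):
    """Central finite differences of C = A(X) @ B(X), entry by entry."""
    out = np.zeros((U, V, M, P))
    for p in range(U):
        for q in range(V):
            E = np.zeros_like(X)
            E[p, q] = eps
            out[p, q] = (A_of(X + E) @ B_of(X + E)
                         - A_of(X - E) @ B_of(X - E)) / (2 * eps)
    return out

max_err = np.max(np.abs(dC - numeric_dC(X)))
print(max_err)                      # small if the formula is right

# Swap the last two axes to get a U x V x P x M array.
dC_notes_layout = dC.transpose(0, 1, 3, 2)
print(dC_notes_layout.shape)
```

Writing the index letters out in the `einsum` strings makes the order and transpose questions explicit: each subscript in the formula appears once in the code, so there is nothing to remember about which factor goes on the left.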

greg
  • Thanks! It’s great to know the product rule still looks familiar with tensors. Any recommendations for books on this material? – Joe Feb 07 '20 at 20:01
  • The standard text is probably Matrix Differential Calculus by Magnus and Neudecker. I also quite like Complex-Valued Matrix Derivatives by Hjorungnes. Besides the Matrix Cookbook, here is another online PDF worth a read if you deal with complex quantities. – greg Feb 07 '20 at 20:18