1

I want to compute the derivative of a matrix function of the following form: $$\frac{\partial f(\boldsymbol{AX})}{\partial \boldsymbol{X}}$$ where $f(\cdot)$ is a scalar function. I expect the result to have the same shape as the matrix $\boldsymbol{X}$.

maple

2 Answers

3

Let $Y\!=\!AX$, so that $f\!=\!f(Y)$.   I assume that you know how to calculate the derivative $\frac{\partial f}{\partial Y}$ and now wish to calculate $\frac{\partial f}{\partial X}$.

So write down the differential in terms of the Frobenius product (:) and switch the independent variable from $Y$ to $X$. $$\eqalign{ df &= \frac{\partial f}{\partial Y} : dY \cr &= \frac{\partial f}{\partial Y} : (AdX) \cr &= (A^T\frac{\partial f}{\partial Y}) : dX \cr\cr \frac{\partial f}{\partial X} &= A^T\frac{\partial f}{\partial Y} \cr }$$
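As a sanity check on the result $\frac{\partial f}{\partial X} = A^T\frac{\partial f}{\partial Y}$, here is a minimal NumPy sketch that compares it against central finite differences. The scalar function $f(Y)=\tfrac12\|Y\|_F^2$ (with $\frac{\partial f}{\partial Y}=Y$), the matrix sizes, and all function names are assumptions chosen purely for illustration; they are not part of the question.

```python
import numpy as np

def f(Y):
    # Hypothetical scalar function for illustration: f(Y) = 0.5 * ||Y||_F^2
    return 0.5 * np.sum(Y ** 2)

def grad_f_Y(Y):
    # Derivative of the illustrative f with respect to Y
    return Y

def grad_X_analytic(A, X):
    # The formula from the answer: df/dX = A^T (df/dY), evaluated at Y = A X
    return A.T @ grad_f_Y(A @ X)

def grad_X_numeric(A, X, eps=1e-6):
    # Central finite differences, one entry of X at a time
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(A @ (X + E)) - f(A @ (X - E))) / (2 * eps)
    return G

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # arbitrary sizes, assumed for the demo
X = rng.standard_normal((3, 2))

G_analytic = grad_X_analytic(A, X)
G_numeric = grad_X_numeric(A, X)

print(G_analytic.shape == X.shape)                    # gradient has the shape of X
print(np.allclose(G_analytic, G_numeric, atol=1e-6))  # formula agrees with finite differences
```

Swapping in a different `f` and `grad_f_Y` pair should leave the check unchanged, since the $A^T$ factor does not depend on the particular scalar function.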

If you do not know how to calculate $\frac{\partial f}{\partial Y}$ and want help with that, then you'll need to give us more information about the function.

If you are uncomfortable with the Frobenius product, you can replace it with the trace function, $\,\,A\!:\!B = {\rm tr}(A^T\!B)$.
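The trace form of the Frobenius product, and the rule $C:(AB) = (A^TC):B$ used in the derivation above to move $A$ across the product, can both be checked numerically. The following is a small sketch with random matrices; the helper name `frob` and the matrix sizes are assumptions made for this example.

```python
import numpy as np

def frob(P, Q):
    # Frobenius product P:Q = sum over all entries of P_ij * Q_ij
    return np.sum(P * Q)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 2))
C = rng.standard_normal((4, 2))
M = rng.standard_normal((4, 2))

# C:M equals tr(C^T M)
trace_form_ok = np.isclose(frob(C, M), np.trace(C.T @ M))

# C:(AB) equals (A^T C):B, which is the step that produces A^T in the gradient
move_A_ok = np.isclose(frob(C, A @ B), frob(A.T @ C, B))

print(trace_form_ok, move_A_ok)
```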

Update

When a scalar function ($f$) is applied element-wise to a matrix argument ($Y$), the result is itself a matrix, and its differential can be expressed in terms of the Hadamard ($\circ$) product as $$ \eqalign { df &= f'\circ dY \cr } $$ where $f'$ is the matrix of element-wise derivatives. We can use the single-entry matrix $E_{ij}$ and the Frobenius (:) product to isolate a single element $$ \eqalign { df_{ij} &= E_{ij}:df \cr &= E_{ij}:(f'\circ dY) \cr &= (E_{ij}\circ f'): dY \cr } $$ Finally, the sigmoid function mentioned in the comments is interesting because the derivative is $f'=(f-f^2)$, where the square is taken element-wise ($f^2=f\circ f$). This allows us to write $$ \eqalign { df_{ij} &= E_{ij}\circ(f-f^2) : dY \cr } $$

Since $df_{ij}=(\frac{\partial f_{ij}} {\partial Y}:dY)$, the derivative of this element with respect to $Y$ is $$ \eqalign { \frac{\partial f_{ij}} {\partial Y} &= E_{ij}\circ(f-f^2) \cr } $$ and with respect to $X$ it's $$ \eqalign { \frac{\partial f_{ij}} {\partial X} &= A^T\,\frac{\partial f_{ij}} {\partial Y} \cr } $$
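The element-wise sigmoid result can be verified numerically as well. The sketch below builds $A^T\big(E_{ij}\circ(f-f^2)\big)$ for one chosen entry $(i,j)$ and compares it with finite differences of $f_{ij}(AX)$. The matrix sizes, the chosen indices, and the function names are assumptions for illustration only.

```python
import numpy as np

def sigmoid(Y):
    # Element-wise logistic sigmoid
    return 1.0 / (1.0 + np.exp(-Y))

def grad_fij_wrt_X(A, X, i, j):
    # A^T (E_ij ∘ (f - f^2)) with f = sigmoid(A X)
    F = sigmoid(A @ X)
    E = np.zeros_like(F)
    E[i, j] = 1.0                      # single-entry matrix E_ij
    return A.T @ (E * (F - F ** 2))

def grad_fij_wrt_X_numeric(A, X, i, j, eps=1e-6):
    # Central finite differences of the single output entry f_ij
    G = np.zeros_like(X)
    for k in range(X.shape[0]):
        for l in range(X.shape[1]):
            D = np.zeros_like(X)
            D[k, l] = eps
            plus = sigmoid(A @ (X + D))[i, j]
            minus = sigmoid(A @ (X - D))[i, j]
            G[k, l] = (plus - minus) / (2 * eps)
    return G

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
X = rng.standard_normal((3, 2))
i, j = 1, 0                            # entry of f(AX) to differentiate (arbitrary choice)

G = grad_fij_wrt_X(A, X, i, j)
G_num = grad_fij_wrt_X_numeric(A, X, i, j)

print(np.allclose(G, G_num, atol=1e-7))  # formula agrees with finite differences
print(np.allclose(G[:, 1], 0.0))         # only column j of X influences f_ij
```

The second check reflects the structure of the formula: since $Y_{ij}=\sum_k A_{ik}X_{kj}$, the entry $f_{ij}$ depends only on column $j$ of $X$, so every other column of the gradient is zero.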

greg
  • Thanks. Could you please explain how you get from $\frac{\partial f}{\partial Y} : (AdX)$ to $ (A^T\frac{\partial f}{\partial Y}) : dX $? – maple Aug 11 '15 at 09:54
  • It is easy to show that $C:AB=A^TC:B$, by considering the equivalent expression in terms of traces ${\rm tr}(C^TAB)={\rm tr}((A^TC)^TB)$ – greg Aug 12 '15 at 01:40
  • Thanks. I have two more questions. 1. In this case, $f$ is a sigmoid function, and the result $f(Y)$ is also a matrix, in which each element is $f_{ij}(Y)=sig(Y_{ij})$. I don't know how to define the derivative of a matrix with respect to a matrix. 2. Could you please recommend some books about this kind of matrix derivative? – maple Aug 12 '15 at 02:18
  • Could you please give me a list of books that cover matrix derivatives like this? – maple Aug 12 '15 at 08:21
  • Hjorungnes' "Complex-Valued Matrix Derivatives" is good. There's also the "Matrix Cookbook" -- a free, online resource. – greg Aug 12 '15 at 17:59
1

It is basically the same as with vectors. The chain rule yields the total derivative (for any matrix $H$, having the same size as $X$) $$ D_X (f(AX)) [H] = f'(AX)[AH] = \langle \nabla f(AX), AH \rangle = \langle A^T \nabla f(AX), H\rangle. $$ Thus, the gradient is $A^T \nabla f(AX)$. Here, the inner product is given by $\langle X,Y \rangle = \operatorname{trace}(X^TY)$, and the gradient is the matrix of partial derivatives ordered as $X$.
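The directional form of this statement can be checked by comparing a finite-difference derivative of $t \mapsto f(A(X+tH))$ at $t=0$ with the inner product $\langle A^T\nabla f(AX), H\rangle$. The sketch below assumes the illustrative choice $f(Y)=\sum_{ij}\sin(Y_{ij})$, so that $\nabla f(Y)=\cos(Y)$ element-wise; that choice, the sizes, and the names are hypothetical.

```python
import numpy as np

def f(Y):
    # Hypothetical scalar function for illustration: sum of sin over all entries
    return np.sum(np.sin(Y))

def grad_f(Y):
    # Gradient of the illustrative f, ordered like Y
    return np.cos(Y)

def inner(P, Q):
    # <P, Q> = trace(P^T Q)
    return np.trace(P.T @ Q)

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
X = rng.standard_normal((3, 2))
H = rng.standard_normal((3, 2))   # arbitrary direction with the same size as X

# Directional derivative of X -> f(AX) along H, by central differences
t = 1e-6
directional_numeric = (f(A @ (X + t * H)) - f(A @ (X - t * H))) / (2 * t)

# The same quantity from the gradient A^T grad_f(AX)
directional_analytic = inner(A.T @ grad_f(A @ X), H)

print(np.isclose(directional_numeric, directional_analytic, atol=1e-6))
```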

user251257