I want to compute the derivative of a matrix function, as follows: $$\frac{\partial f(\boldsymbol{AX})}{\partial \boldsymbol{X}}$$ where $f(\cdot)$ is a scalar function. I think the result should have the same shape as the matrix $\boldsymbol{X}$.
2 Answers
Let $Y\!=\!AX$, so that $f\!=\!f(Y)$. I assume that you know how to calculate the derivative $\frac{\partial f}{\partial Y}$ and now wish to calculate $\frac{\partial f}{\partial X}$.
So write down the differential in terms of the Frobenius product (:) and switch the independent variable from $Y$ to $X$. $$\eqalign{ df &= \frac{\partial f}{\partial Y} : dY \cr &= \frac{\partial f}{\partial Y} : (AdX) \cr &= (A^T\frac{\partial f}{\partial Y}) : dX \cr\cr \frac{\partial f}{\partial X} &= A^T\frac{\partial f}{\partial Y} \cr }$$
If you do not know how to calculate $\frac{\partial f}{\partial Y}$ and want help with that, then you'll need to give us more information about the function.
If you are uncomfortable with the Frobenius product, you can replace it with the trace function, $\,\,A\!:\!B = {\rm tr}(A^T\!B)$.
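The formula $\frac{\partial f}{\partial X} = A^T\frac{\partial f}{\partial Y}$ can be sanity-checked numerically. The following is only a sketch, assuming NumPy is available and picking $f(Y)=\sum_{ij}\tanh(Y_{ij})$ as a stand-in scalar function (the original question does not specify $f$). The helper names `f`, `df_dY`, `grad_X`, and `numerical_grad_X` are hypothetical.

```python
import numpy as np

def f(Y):
    # Illustrative scalar function (an assumption): f(Y) = sum of tanh(Y_ij)
    return np.sum(np.tanh(Y))

def df_dY(Y):
    # Known derivative of the illustrative f with respect to Y
    return 1.0 - np.tanh(Y) ** 2

def grad_X(A, X):
    # Chain-rule result from the answer: df/dX = A^T (df/dY), with Y = A X
    return A.T @ df_dY(A @ X)

def numerical_grad_X(A, X, eps=1e-6):
    # Central finite differences, one entry of X at a time
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            step = np.zeros_like(X)
            step[i, j] = eps
            G[i, j] = (f(A @ (X + step)) - f(A @ (X - step))) / (2 * eps)
    return G

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
X = rng.standard_normal((3, 2))

analytic = grad_X(A, X)
numeric = numerical_grad_X(A, X)
max_abs_error = np.max(np.abs(analytic - numeric))
print(analytic.shape, max_abs_error)
```

The gradient has the same shape as $X$, as the question expected, and the two estimates should agree up to finite-difference error.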
Update
When a scalar function ($f$) is applied element-wise to a matrix argument ($Y$), the result is itself a matrix, and its differential can be expressed in terms of the Hadamard ($\circ$) product as $$ \eqalign { df &= f'\circ dY \cr } $$ We can use the single-entry matrix $E_{ij}$ and the Frobenius (:) product to isolate a single element. The last step below uses the fact that the Hadamard and Frobenius products commute with each other, $A:(B\circ C) = (A\circ B):C$. $$ \eqalign { df_{ij} &= E_{ij}:df \cr &= E_{ij}:(f'\circ dY) \cr &= (E_{ij}\circ f'): dY \cr } $$ Finally, the sigmoid function mentioned in the comments is interesting because its derivative is $f'=(f-f^2)$, which allows us to write $$ \eqalign { df_{ij} &= \big(E_{ij}\circ(f-f^2)\big) : dY \cr } $$
Since $df_{ij}=(\frac{\partial f_{ij}} {\partial Y}:dY)$, the derivative of this element with respect to $Y$ is $$ \eqalign { \frac{\partial f_{ij}} {\partial Y} &= E_{ij}\circ(f-f^2) \cr } $$ and with respect to $X$ it's $$ \eqalign { \frac{\partial f_{ij}} {\partial X} &= A^T\,\frac{\partial f_{ij}} {\partial Y} \cr } $$
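As a rough numerical check of the element-wise result, the sketch below (assuming NumPy; the function names `sigmoid`, `dfij_dX`, and `numerical_dfij_dX` are illustrative, not from the answer) builds $A^T\big(E_{ij}\circ(f-f^2)\big)$ for one chosen element and compares it with central finite differences.

```python
import numpy as np

def sigmoid(Y):
    # Element-wise logistic sigmoid
    return 1.0 / (1.0 + np.exp(-Y))

def dfij_dX(A, X, i, j):
    # d f_ij / dX = A^T (E_ij ∘ (f - f^2)), where f = sigmoid(A X)
    F = sigmoid(A @ X)
    E = np.zeros_like(F)
    E[i, j] = 1.0  # single-entry matrix E_ij
    return A.T @ (E * (F - F ** 2))

def numerical_dfij_dX(A, X, i, j, eps=1e-6):
    # Central finite differences of the single output element f_ij
    G = np.zeros_like(X)
    for p in range(X.shape[0]):
        for q in range(X.shape[1]):
            step = np.zeros_like(X)
            step[p, q] = eps
            plus = sigmoid(A @ (X + step))[i, j]
            minus = sigmoid(A @ (X - step))[i, j]
            G[p, q] = (plus - minus) / (2 * eps)
    return G

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
X = rng.standard_normal((3, 2))

analytic_ij = dfij_dX(A, X, i=2, j=1)
numeric_ij = numerical_dfij_dX(A, X, i=2, j=1)
print(np.max(np.abs(analytic_ij - numeric_ij)))
```

Only column $j$ of the result is nonzero, since $f_{ij}$ depends on $X$ only through the $j$-th column of $X$.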
-
Thanks, and could you please tell me how you turn it from $\frac{\partial f}{\partial Y} : (AdX)$ to $ (A^T\frac{\partial f}{\partial Y}) : dX $ ? – maple Aug 11 '15 at 09:54
-
It is easy to show that $C:AB=A^TC:B$, by considering the equivalent expression in terms of traces ${\rm tr}(C^TAB)={\rm tr}((A^TC)^TB)$ – greg Aug 12 '15 at 01:40
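The same identity can also be verified entry by entry, without traces. A short sketch in index notation, assuming only the definition $A:B=\sum_{i,j}A_{ij}B_{ij}$: $$ \eqalign { C:AB &= \sum_{i,j} C_{ij}\,(AB)_{ij} \cr &= \sum_{i,j,k} C_{ij}\,A_{ik}\,B_{kj} \cr &= \sum_{k,j} \Big(\sum_{i} A_{ik}\,C_{ij}\Big) B_{kj} \cr &= \sum_{k,j} (A^TC)_{kj}\,B_{kj} \cr &= A^TC:B \cr } $$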
-
Thanks, I have two more questions. 1. In this case, $f$ is a sigmoid function, and the result $f(Y)$ is also a matrix, in which each element is $f_{ij}(Y)=sig(Y_{ij})$. I don't know how to define a derivative from a matrix to a matrix. 2. Could you please recommend some books about this kind of matrix derivative? – maple Aug 12 '15 at 02:18
-
Could you please give me a list of books on matrix derivatives like this? – maple Aug 12 '15 at 08:21
-
Hjorungnes' "Complex-Valued Matrix Derivatives" is good. There's also the "Matrix Cookbook" -- a free, online resource. – greg Aug 12 '15 at 17:59
It is basically the same as with vectors. The chain rule yields the total derivative (for any matrix $H$, having the same size as $X$) $$ D_X (f(AX)) [H] = f'(AX)[AH] = \langle \nabla f(AX), AH \rangle = \langle A^T \nabla f(AX), H\rangle. $$ Thus, the gradient is $A^T \nabla f(AX)$. Here, the inner product is given by $\langle X,Y \rangle = \operatorname{trace}(X^TY)$, and the gradient is the matrix of partial derivatives ordered as $X$.
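The directional form of this statement can be checked numerically as well. The sketch below assumes NumPy and uses $f(Y)=\tfrac12\|Y\|_F^2$ (so $\nabla f(Y)=Y$) purely as an illustrative choice. It compares a finite-difference estimate of $D_X(f(AX))[H]$ with the inner product $\langle A^T\nabla f(AX), H\rangle$.

```python
import numpy as np

def f(Y):
    # Illustrative scalar function (an assumption): half the squared Frobenius norm
    return 0.5 * np.sum(Y ** 2)

def grad_f(Y):
    # Gradient of the illustrative f with respect to its argument
    return Y

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
X = rng.standard_normal((3, 4))
H = rng.standard_normal((3, 4))  # arbitrary direction, same size as X

# Directional derivative of X -> f(A X) along H, by central differences
t = 1e-6
directional_fd = (f(A @ (X + t * H)) - f(A @ (X - t * H))) / (2 * t)

# Same quantity via the gradient A^T grad_f(A X) and <U, V> = trace(U^T V)
gradient = A.T @ grad_f(A @ X)
directional_inner = np.trace(gradient.T @ H)

print(directional_fd, directional_inner)
```

Here `H` plays the role of the argument of the linear operator $D_X(f(AX))$: the derivative maps each direction $H$ to a number, and the gradient is the matrix that represents this map under the trace inner product.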
-
The total derivative is a linear operator; $H$ is just its argument. I will expand the answer. – user251257 Jul 29 '15 at 01:29
-
How should I understand the argument of the total derivative? As far as I know, a total derivative doesn't need any argument. Is there any reference about this? – maple Jul 29 '15 at 01:43
-
See the Fréchet derivative. In general, the derivative is a linear operator. In the case of $\mathbb R^n\to \mathbb R^m$ you can write it as multiplication by the Jacobian. In the case of matrices this is not always possible. – user251257 Jul 29 '15 at 01:52