
Let $A \in \mathbb{R}^{n \times n}$ be an invertible matrix, $v \in \mathbb{R}^{n}$, and $\kappa: \mathbb{R}^{n} \rightarrow \mathbb{R}$. What is $\frac{\partial \kappa(A^{-1}v)}{\partial A}$?

I've been trying all sorts of equations from the Matrix Cookbook, but none of them leads to success.

ASML
  • 31

2 Answers


You can decompose your function as

$$ f = \kappa \circ h \circ g $$

where

$$ g(X) = X^{-1} \quad ; \quad h(X) = Xv . $$

In differential form,

$$ d \kappa = \langle \nabla \kappa(\mathbf{x}), d\mathbf{x} \rangle \quad ; \quad d h = (dX) v \quad ; \quad d g = - X^{-1} (dX) X^{-1} . $$

Then, by applying the chain rule, we get the differential of $f$:

$$ d f = -\Big\langle \nabla \kappa(A^{-1}v), A^{-1} (dA) A^{-1} v \Big\rangle. $$

You can write the derivative as a matrix

$$ \frac{\partial f}{\partial A} (A) = (x_{i,j} )^n_{i,j = 1}, $$

where each entry has the value

$$ x_{i,j} = -\Big\langle \nabla \kappa(A^{-1}v), A^{-1} X_{i,j} A^{-1} v \Big \rangle $$

with $X_{i,j}$ being a matrix with $1$ at position $i,j$ and $0$ everywhere else.
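The entrywise formula above can be checked numerically. The following is a minimal sketch, not part of the original answer: it assumes a hypothetical test function $\kappa(u) = \exp(-\tfrac{1}{2}u^Tu)$ (chosen only because its gradient, $-\kappa(u)\,u$, is easy to write down), and the helper names `entrywise_gradient` and `finite_difference_gradient` are made up for illustration. It builds the matrix $(x_{i,j})$ from the formula and compares it with central finite differences.

```python
import numpy as np

def kappa(u):
    # Hypothetical test function (an assumption, not given in the question).
    return np.exp(-0.5 * u @ u)

def grad_kappa(u):
    # Gradient of the test function above: -kappa(u) * u.
    return -kappa(u) * u

def entrywise_gradient(A, v):
    # x_ij = -<grad kappa(A^{-1} v), A^{-1} X_ij A^{-1} v>
    n = A.shape[0]
    A_inv = np.linalg.inv(A)
    x = A_inv @ v
    g = grad_kappa(x)
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            X_ij = np.zeros((n, n))
            X_ij[i, j] = 1.0  # 1 at position (i, j), 0 everywhere else
            G[i, j] = -g @ (A_inv @ X_ij @ x)
    return G

def finite_difference_gradient(A, v, eps=1e-6):
    # Central differences of f(A) = kappa(A^{-1} v), one entry of A at a time.
    n = A.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n))
            E[i, j] = eps
            f_plus = kappa(np.linalg.solve(A + E, v))
            f_minus = kappa(np.linalg.solve(A - E, v))
            G[i, j] = (f_plus - f_minus) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n = 4
# Shift by n * I so the random test matrix is comfortably invertible.
A = rng.standard_normal((n, n)) + n * np.eye(n)
v = rng.standard_normal(n)

max_err = np.max(np.abs(entrywise_gradient(A, v) - finite_difference_gradient(A, v)))
print("largest entrywise discrepancy:", max_err)
```

The double loop mirrors the formula literally and costs $O(n^4)$ or worse, so it is only meant as a sanity check on small matrices.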

Nik Bren
  • 1,869
  • Thanks a lot for your help! With respect to what is the gradient taken? I was hoping for an expression in terms of matrices, because this is only one term of a larger equation. Does it help that I know $\frac{\partial \kappa(u)}{\partial u}= -\kappa(u)\cdot u$? – ASML Oct 24 '17 at 21:22

For convenience, let's define two new vector variables
$$\eqalign{ x &= A^{-1}v \cr g &= \frac{\partial\kappa}{\partial x} \cr }$$

Also, let's use a colon to denote the trace/Frobenius product, i.e.
$$A:BC = {\rm tr}(A^TBC)$$

The properties of the trace give rise to lots of rules for rearranging the terms in a Frobenius product, e.g.
$$\eqalign{ A:BC &= BC:A \cr &= AC^T:B \cr &= B^TA:C \cr }$$

Write the differential and gradient of the function in terms of these new variables
$$\eqalign{ d\kappa &= g:dx \cr &= g:dA^{-1}\,v \cr &= -gv^T:A^{-1}\,dA\,A^{-1} \cr &= -A^{-T}gv^TA^{-T}:dA \cr \frac{\partial\kappa}{\partial A} &= -A^{-T}gv^TA^{-T} \cr }$$

From your other comments, we have an expression for $g$ which we can substitute
$$\eqalign{ \frac{\partial\kappa}{\partial A} &= -A^{-T}(-\kappa x)v^TA^{-T} \cr &= \kappa A^{-T}A^{-1}vv^TA^{-T} \cr }$$
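As a numerical sanity check of the derivation above, here is a minimal sketch that is not part of the original answer. It assumes the hypothetical test function $\kappa(u) = \exp(-\tfrac{1}{2}u^Tu)$, which satisfies the relation $\frac{\partial\kappa}{\partial u} = -\kappa u$ mentioned in the comments; the helper names `frob` and `gradient_formula` are made up for illustration. It checks the Frobenius rearrangement rules, the general gradient $-A^{-T}gv^TA^{-T}$, and the substituted form written as $\kappa A^{-T}xx^T$.

```python
import numpy as np

def kappa(u):
    # Hypothetical test function with gradient -kappa(u) * u (an assumption).
    return np.exp(-0.5 * u @ u)

def frob(P, Q):
    # Frobenius product P : Q = tr(P^T Q), computed as an elementwise sum.
    return np.sum(P * Q)

def gradient_formula(A, v, g):
    # General result of the derivation: -A^{-T} g v^T A^{-T}.
    A_inv_T = np.linalg.inv(A).T
    return -A_inv_T @ np.outer(g, v) @ A_inv_T

rng = np.random.default_rng(1)
n = 5

# Rearrangement rules: P:BC = PC^T:B = B^T P:C on random matrices.
P, B, C = (rng.standard_normal((n, n)) for _ in range(3))
lhs = frob(P, B @ C)
rules_ok = bool(np.isclose(lhs, frob(P @ C.T, B)) and np.isclose(lhs, frob(B.T @ P, C)))

# Shift by n * I so the random test matrix is comfortably invertible.
A = rng.standard_normal((n, n)) + n * np.eye(n)
v = rng.standard_normal(n)
x = np.linalg.solve(A, v)   # x = A^{-1} v
g = -kappa(x) * x           # g = d kappa / d x for the test function

G_general = gradient_formula(A, v, g)
G_closed = kappa(x) * np.linalg.inv(A).T @ np.outer(x, x)  # kappa A^{-T} x x^T

# Directional check: d kappa should equal G : dA to first order.
dA = rng.standard_normal((n, n))
eps = 1e-6
fd = (kappa(np.linalg.solve(A + eps * dA, v))
      - kappa(np.linalg.solve(A - eps * dA, v))) / (2 * eps)
directional = frob(G_general, dA)

print("Frobenius rules hold:", rules_ok)
print("general vs. closed form agree:", np.allclose(G_general, G_closed))
print("finite-difference discrepancy:", abs(fd - directional))
```

Checking a single random direction `dA` is cheaper than perturbing every entry of $A$, and it tests exactly the statement $d\kappa = \frac{\partial\kappa}{\partial A}:dA$ that the derivation relies on.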

greg
  • 35,825
  • That's exactly what I was looking for! Thank you so much! I find all those different layout conventions so confusing and I'm never sure whether the chain rule is valid for higher-order tensors...but you made it look so easy! Didn't know the trace was so useful! Thanks again!! – ASML Oct 25 '17 at 00:33
  • It occurs to me that you can make this result look a bit cleaner by writing it as $$\kappa A^{-T}xx^T$$ – greg Oct 25 '17 at 01:03
  • I went through your derivation: The only thing that's not clear to me is why we can write $\textrm{d}\kappa$ as $\textrm{tr}(g^\top\textrm{d}x)$. – ASML Oct 25 '17 at 01:14
  • One of the defining characteristics of a gradient is that (fully) contracting it with the differential of the independent variable yields the differential of the dependent variable. Here's an example using index notation and high-order tensors for generality: $$dF_{ijk} = \Bigg(\frac{\partial F_{ijk}}{\partial X_{npqr}}\Bigg)\,dX_{npqr}$$ The thing in parentheses being the gradient. Of course, things are much simpler when working with mere vectors and matrices. – greg Oct 25 '17 at 02:08
  • In the present case, $$d\kappa = g_k dx_k=\Big(\frac{\partial\kappa}{\partial x_k}\Big) dx_k$$ which can also be written as a trace or scalar product, instead of index notation. – greg Oct 25 '17 at 02:15
  • Wish I could upvote your answer multiple times. I really want to learn how to use this technique in my daily work; it's so elegant. For instance: Now that we know the answer for $\frac{\partial \kappa}{\partial A}$, can we reuse this knowledge to infer $\frac{\partial \kappa}{\partial AA^\top}$? Naively, $\frac{\partial \kappa}{\partial A}=\frac{\partial \kappa}{\partial AA^\top}\cdot \frac{\partial AA^\top}{\partial A}$, but that obviously doesn't make any sense, because the second factor is a 4th order tensor; this is exactly the issue I keep running into when trying to use the chain rule. – ASML Oct 25 '17 at 05:55
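The contraction identity from the comments above, $d\kappa = g_k\,dx_k$, can also be illustrated numerically. This is a minimal sketch under an assumption: $\kappa(u) = \exp(-\tfrac{1}{2}u^Tu)$ is a hypothetical test function, used only because it satisfies $\frac{\partial\kappa}{\partial u} = -\kappa u$. The gap between the actual change in $\kappa$ and the contraction $g_k\,dx_k$ should shrink roughly quadratically with the step size.

```python
import numpy as np

def kappa(u):
    # Hypothetical test function with gradient -kappa(u) * u (an assumption).
    return np.exp(-0.5 * u @ u)

rng = np.random.default_rng(2)
x = rng.standard_normal(3)
g = -kappa(x) * x                  # gradient of kappa at x
direction = rng.standard_normal(3)

errors = []
for step in (1e-2, 1e-3, 1e-4):
    dx = step * direction
    actual = kappa(x + dx) - kappa(x)        # true change in kappa
    predicted = np.einsum("k,k->", g, dx)    # index contraction g_k dx_k
    errors.append(abs(actual - predicted))
    print(f"step={step:.0e}  |actual - g_k dx_k| = {errors[-1]:.3e}")
```

`np.einsum("k,k->", g, dx)` spells out the summation over the repeated index $k$; for vectors it is the same number as `g @ dx` or the trace form ${\rm tr}(g^T dx)$.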