
I would like a detailed, step-by-step explanation, please. This is one step from the broader proof of $\nabla_A \operatorname{tr}(AB)=B^T$, whose preceding steps I understand up to this point. This is a totally new area for me, so please be explicit.

What I am struggling with in this and similar proofs, I think, is the action of the derivative on indices. I know the product rule and the other rules of differential calculus pretty well, but I have not applied them to indexed quantities, or at least not at this level of detail. I can see that $b_{ji}$ is an entry of the transpose, but how did the indices end up that way? What exact process produced them? It is hard to find resources that work through the tedious calculations that (I suspect) are required to truly follow this.

Thanks in advance!

Joe

2 Answers


Let $$f_k(a_{11},a_{12},\ldots,a_{nm},b_{11},b_{12},\ldots,b_{mn}):=\sum_{l=1}^m a_{kl}b_{lk}$$ so that $\operatorname{tr}(AB)=\sum_{k=1}^n f_k$ and $$\frac{\partial}{\partial a_{ij}}\sum_{k=1}^n f_k=\sum_{k=1}^n \frac{\partial}{\partial a_{ij}}f_k$$ is the derivative in question. Note that $f_k$ does not depend on $a_{ij}$ if $k\neq i$, so we have $$\sum_{k=1}^n \frac{\partial}{\partial a_{ij}}f_k=\frac{\partial}{\partial a_{ij}}f_i.$$ The remaining derivative is given by \begin{align*} &\frac{\partial}{\partial a_{ij}}f_i(a_{11},a_{12},\ldots,a_{nm},b_{11},b_{12},\ldots,b_{mn})=\frac{\partial}{\partial a_{ij}}\sum_{l=1}^m a_{il}b_{li} =\sum_{l=1}^m \frac{\partial}{\partial a_{ij}}a_{il}b_{li}\\=&\frac{\partial}{\partial a_{ij}}a_{ij}b_{ji}=b_{ji}. \end{align*} The sum over $l$ collapses to the single term $l=j$ because $\frac{\partial}{\partial a_{ij}}a_{il}$ equals $1$ if $l=j$ and $0$ otherwise, while $b_{li}$ does not depend on $a_{ij}$ at all.

Nightgap
  • thanks for responding. I am stuck here: $\sum_{k=1}^n \frac{\partial}{\partial a_{ij}} f_k = \frac{\partial}{\partial a_{ij}} f_i$. Are there steps to be shown here that would help me understand how the index on $f$ changed from $k$ to $i$? – Joe Aug 22 '19 at 19:46
  • The argument is that if $k\neq i$ we have that $\frac{\partial}{\partial a_{ij}}f_k=0$ since $f_k$ does not depend on the variable $a_{ij}$ in this case. This is just like computing the derivative of $f(x,y):=x$ with respect to $y$, i.e. $\frac{\partial}{\partial y}f=0$ since $f$ does not depend on $y$. – Nightgap Aug 22 '19 at 19:50
  • Right, so that is also how we got rid of the summation in $k$? But then why did we switch to $i$ as the index and retain the partial derivative on the right hand side? Didn't we already apply the partial (losing $f_k$ in the process)? – Joe Aug 22 '19 at 19:53
  • Yes, that's the way the sum disappears. Maybe it helps you to write $$\frac{\partial}{\partial a_{ij}}\sum_{k=1}^n f_k=\frac{\partial}{\partial a_{ij}}f_1+\ldots+\frac{\partial}{\partial a_{ij}}f_i+\ldots+\frac{\partial}{\partial a_{ij}}f_n=0+\ldots+0+\frac{\partial}{\partial a_{ij}}f_i+0+\ldots+0=\frac{\partial}{\partial a_{ij}}f_i.$$ – Nightgap Aug 22 '19 at 19:56
  • Checking my understanding (of using $f$), would $f_1$ equal $a_{11}b_{11}$? – Joe Aug 22 '19 at 20:12
  • No, $f_1(\ldots)$ is $a_{11}b_{11}+a_{12}b_{21}+a_{13}b_{31}+\ldots+a_{1m}b_{m1}$. – Nightgap Aug 22 '19 at 20:51
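
To make the comments above concrete, here is the smallest non-trivial case written out in full, assuming $n=m=2$ (a choice made only for illustration). The two summands of the trace are $$f_1=a_{11}b_{11}+a_{12}b_{21},\qquad f_2=a_{21}b_{12}+a_{22}b_{22}.$$ Differentiating with respect to $a_{12}$ (so $i=1$, $j=2$): $f_2$ contains no $a_{12}$, so it contributes $0$, and within $f_1$ only the term $a_{12}b_{21}$ contains $a_{12}$, so $$\frac{\partial}{\partial a_{12}}(f_1+f_2)=\frac{\partial}{\partial a_{12}}\left(a_{11}b_{11}+a_{12}b_{21}\right)+0=b_{21}.$$ This is $b_{ji}$ with $i=1$, $j=2$: the row index of $a$ picks which $f_k$ survives, and the column index of $a$ picks which term inside it survives.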

Using Einstein summation notation (repeated indices are summed over), we have that $$\operatorname{Tr}(AB)=A_{ij}B_{ji},$$ which means that $$\frac{\partial}{\partial A_{pq}} A_{ij}B_{ji}=\delta_{ip}\delta_{jq}B_{ji}=B_{qp}.$$ This happens because $$\frac{\partial A_{ij}}{\partial A_{pq}}=\delta_{ip}\delta_{jq},$$ where $\delta$ is the Kronecker delta. The same thing happens in the case of functions $\mathbb{R}^n \to \mathbb{R}$. For example, let $f(x,y,z)=x^2+y^2+z$. Then we have that $\partial_x f = 2x$, $\partial_y f=2y$ and $\partial_z f=1$. It's the same, but instead of different letters, we have indices. (It's like using $x_1$, $x_2$ and $x_3$ instead of $x$, $y$ and $z$.)
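
The contraction $\delta_{ip}\delta_{jq}B_{ji}=B_{qp}$ can also be spelled out mechanically. The following is only a sketch in plain Python, with the implied sums over $i$ and $j$ written as explicit loops; the helper names `delta` and `contracted_gradient` and the sample matrix are hypothetical, chosen for this illustration.

```python
def delta(a, b):
    """Kronecker delta: 1 if the two indices agree, 0 otherwise."""
    return 1 if a == b else 0


def contracted_gradient(B, n, m):
    """Return G with G[p][q] = sum over i, j of delta_ip * delta_jq * B[j][i].

    B is m x n (so that AB is square for an n x m matrix A); G is n x m.
    """
    return [
        [
            sum(delta(i, p) * delta(j, q) * B[j][i]
                for i in range(n) for j in range(m))
            for q in range(m)
        ]
        for p in range(n)
    ]


# Illustrative sample data (an assumption): B is 3 x 2, so m = 3 and n = 2.
B = [[7, 8],
     [9, 10],
     [11, 12]]

grad = contracted_gradient(B, n=2, m=3)

# The deltas pick out exactly one term of the double sum, namely B[q][p].
assert grad == [[7, 9, 11], [8, 10, 12]]
```

Every term in the double sum vanishes except the one with $i=p$ and $j=q$, which is how the free indices $p,q$ end up on $B$ in swapped order.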

Botond