How exactly could I take the derivative of the following expression?

$$ y = xA^T + b$$

Let's say that I have $x \in \mathbb{R}^{n}$, $A \in \mathbb{R}^{m \times n}$, $y \in \mathbb{R}^{m}$, and $b \in \mathbb{R}^{m}$, and I wish to take the derivative of $y$ with respect to $A$ and $b$, i.e. $\frac{\partial y}{\partial A}$ and $\frac{\partial y}{\partial b}$. I understand that $\frac{\partial y}{\partial A}$ would be a rank-3 tensor containing $\frac{\partial y_i}{\partial A_{jk}}$, although I'm not entirely sure how to get to the solution. I've tried looking through the Matrix Cookbook, but the only related result (that I can find at least) is for $\frac{\partial x^Ta}{\partial x}$, where $x$ and $a$ are both vectors. So, I'm a little confused!

With regard to the second term, $\frac{\partial y}{\partial b}$, I would assume that this is just the identity matrix ($\mathbb{I} \in \mathbb{R}^{m \times m}$), as the term is just element-wise addition?

Thank you in advance!

user550103
  • 2,688

1 Answer

$\def\E{{\cal E}}\def\p#1#2{\frac{\partial #1}{\partial #2}}\def\v#1{\operatorname{vec}(#1)}$Doing the calculation using index notation (i.e. element-wise) is your best option. Writing $x$ as a column vector, so that $y = Ax + b$, and summing over the repeated index $j$: $$\eqalign{ y_i &= A_{ij}x_j + b_i \\ \p{y_i}{b_k} &= \p{b_i}{b_k} = \delta_{ik} \quad\implies \p{y}{b} = I \\ \p{y_i}{A_{k\ell}} &= \p{A_{ij}}{A_{k\ell}}x_j = \delta_{ik}\delta_{j\ell}\;x_j = \delta_{ik}\,x_\ell \\ }$$

You could also vectorize the equation using Kronecker products, where $\v{A}$ stacks the columns of $A$ and $I$ is the $m\times m$ identity $$\eqalign{ a &\doteq \v{A} \\ y &= (x^T\otimes I) a + b \\ \p{y}{a} &= (x^T\otimes I) \p{a}{a} = (x^T\otimes I) I = (x^T\otimes I) \\ }$$

Or you could use indexed matrices (sort of a half-index notation) $$\eqalign{ y &= Ax + b \\ \p{y}{A_{jk}} &= \left(\p{A}{A_{jk}}\right)x = E_{jk}\,x \\ }$$ where $E_{jk}$ is a matrix containing all zeros except for a single ${\tt1}$ at the $(j,k)$ element.
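
As a sanity check, the index-notation and Kronecker results above can be compared against finite differences. This is only a minimal NumPy sketch, not part of the original answer: the sizes `m`, `n`, the random test data, and the helper `f` are assumptions chosen for illustration. It treats $x$ as a column vector ($y = Ax + b$), takes $\operatorname{vec}(A)$ in column-major order, and lets $x$ enter the Kronecker product as a row, $x^T \otimes I_m$, so that the shapes conform.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4                      # illustrative sizes (assumption)
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
b = rng.standard_normal(m)

def f(A, b):
    """Hypothetical helper: y = A x + b for the fixed x above."""
    return A @ x + b

# Analytic results from the index-notation derivation.
dy_db = np.eye(m)                              # dy_i/db_k  = delta_ik
dy_dA = np.einsum("ik,l->ikl", np.eye(m), x)   # dy_i/dA_kl = delta_ik * x_l

# Forward differences, perturbing one entry A[k, ell] at a time.
eps = 1e-6
num_dA = np.zeros((m, m, n))
for k in range(m):
    for ell in range(n):
        dA = np.zeros((m, n))
        dA[k, ell] = eps
        num_dA[:, k, ell] = (f(A + dA, b) - f(A, b)) / eps

# Same idea for b, perturbing one entry b[k] at a time.
num_db = np.zeros((m, m))
for k in range(m):
    db = np.zeros(m)
    db[k] = eps
    num_db[:, k] = (f(A, b + db) - f(A, b)) / eps

# Kronecker form: J = x^T kron I_m has shape (m, m*n) and acts on vec(A).
J = np.kron(x[None, :], np.eye(m))
a = A.reshape(-1, order="F")     # column-major vec(A)

print(np.allclose(num_dA, dy_dA, atol=1e-6))
print(np.allclose(num_db, dy_db, atol=1e-6))
print(np.allclose(J @ a + b, f(A, b)))
```

Flattening the last two axes of `dy_dA` in column-major order, `dy_dA.reshape(m, m * n, order="F")`, gives the same matrix as `J`, which is one way to see that the rank-3 tensor and the Kronecker Jacobian carry the same information.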

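Or you can go into full-tensor mode by giving a name $(\E)$ to the fourth-order tensor that we encountered using index notation, i.e. $$\eqalign{ \E_{ijk\ell} &\doteq \delta_{ik}\delta_{j\ell} \quad\implies\quad \p{y}{A} = \E x, \quad\text{i.e.}\quad \p{y_i}{A_{k\ell}} = \E_{ijk\ell}\,x_j \\ }$$ where the product $\E x$ contracts the second index of $\E$ with $x$.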
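
To make the contraction concrete, here is a small NumPy sketch (an illustration only, with assumed sizes `m`, `n` and an assumed test vector `x`) that builds $\mathcal{E}_{ijk\ell} = \delta_{ik}\delta_{j\ell}$ with `np.einsum` and contracts its second index with $x$.

```python
import numpy as np

m, n = 2, 3                                  # illustrative sizes (assumption)
x = np.arange(1.0, n + 1)                    # arbitrary test vector (assumption)

# E[i, j, k, l] = delta_ik * delta_jl, built as an outer product of identities.
E = np.einsum("ik,jl->ijkl", np.eye(m), np.eye(n))

# Contract the second index of E with x: G[i, k, l] = sum_j E[i, j, k, l] * x[j].
G = np.einsum("ijkl,j->ikl", E, x)

# Closed form from the index-notation result: delta_ik * x_l.
expected = np.einsum("ik,l->ikl", np.eye(m), x)

print(E.shape, G.shape)
print(np.array_equal(G, expected))
```

Each slice `G[i]` is an $m \times n$ matrix that is zero except for row `i`, which holds `x`.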

greg
  • 35,825
  • Probably a stupid, practical rather than mathematical question: do you know how to implement these tensors in programming languages like Matlab or Python? Thank you in advance. – user550103 Feb 03 '21 at 12:33
  • @user550103 The simplest way in Matlab is `E = reshape(speye(m*n), m,n,m,n)`. In Julia I sometimes use an array comprehension: `E = [1*(i==k)*(j==l) for i=1:m,j=1:n,k=1:m,l=1:n]`. You can do something similar in Python / NumPy. – greg Feb 03 '21 at 13:53
  • Thank you very much, greg. That leads me to one more question: in Matlab, say, how would I multiply the tensor $\mathcal{E}$ with a vector $x$ to obtain the gradient? I must admit that I still don't understand tensors well or know how to use them (I really hope to understand more by experimenting in practice). – user550103 Feb 03 '21 at 14:11
  • In Julia I use Einsum which I believe was ported from a NumPy package with the same name. Matlab must have libraries for dealing with tensors, but I'm not familiar with those. You might learn more by writing your own functions -- just reshape the tensors into matrices, do a matrix multiply, and reshape the result back into a tensor. The only tricky part is specifying the correct dimensions for the reshape commands. – greg Feb 03 '21 at 14:38
  • Thank you so much, greg!! I think I see the concept now. – user550103 Feb 03 '21 at 14:47
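
For readers following the comments, the reshape, multiply, reshape idea might look roughly like this in NumPy. It is only a sketch under stated assumptions: the function name `contract_with_x` and the sizes are hypothetical, and NumPy reshapes in row-major order by default (Matlab is column-major), although reshaping an identity matrix gives the same $\delta_{ik}\delta_{j\ell}$ tensor in either convention.

```python
import numpy as np

def contract_with_x(E, x):
    """Hypothetical helper: G[i, k, l] = sum_j E[i, j, k, l] * x[j],
    computed with a single matrix-vector product."""
    m, n = E.shape[0], E.shape[1]
    # Move the contracted index j to the end: axes become (i, k, l, j).
    E_mat = np.moveaxis(E, 1, -1).reshape(m * m * n, n)
    # Ordinary matrix-vector multiply, then restore the (i, k, l) shape.
    return (E_mat @ x).reshape(m, m, n)

m, n = 2, 3                                   # illustrative sizes (assumption)
x = np.array([10.0, 20.0, 30.0])              # arbitrary test vector (assumption)

# Row index i*n + j and column index k*n + l of the identity coincide
# exactly when i == k and j == l, so the reshape yields delta_ik * delta_jl.
E = np.eye(m * n).reshape(m, n, m, n)

G = contract_with_x(E, x)
print(np.array_equal(G, np.einsum("ijkl,j->ikl", E, x)))
```

The tricky part mentioned in the comments shows up in the `np.moveaxis` call: the index being summed has to be the last axis before the tensor is flattened into a matrix, otherwise the reshape mixes up the indices.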