Differentiation of a matrix with respect to a vector

Question

Could someone show and explain the differentiation of the following

$$\frac{\partial(x^t Ax)}{\partial x} $$

where $x$ is a column vector and $A$ is a symmetric square matrix. I'm in high school, so I have no experience with vector calculus; I only need this derivation for a partial derivative in a Lagrange function. The result should be $2Ax$.

Thanks a lot, Tom

$x$ is a column vector and $A$ is a symmetric square matrix? — grand_chat, Oct 08 '17 at 06:18

grand_chat · Accepted Answer · 2017-10-08T15:06:46.520

If you're not comfortable with vector calculus, try out the assertion on a small case, say $n=2$. Then you are trying to differentiate the scalar (one-dimensional) quantity $$\begin{align} Q(x_1,x_2):=x^TAx&= \begin{matrix}(x_1 &x_2)\end{matrix} \left(\begin{matrix}A_{1,1} &A_{1,2}\\ A_{2,1} &A_{2,2}\end{matrix}\right) \left(\begin{matrix}x_1 \\x_2\end{matrix}\right)\\ &=x_1A_{1,1}x_1 + x_1A_{1,2}x_2 + x_2 A_{2,1}x_1 + x_2A_{2,2}x_2. \end{align} $$ Taking the partial derivative of $Q$ with respect to $x_1$ gives $$ \frac{\partial Q}{\partial x_1}=2A_{1,1}x_1+A_{1,2}x_2 + x_2A_{2,1}=2(A_{1,1}x_1 + A_{1,2}x_2)\tag1$$ since $A_{1,2}=A_{2,1}$. We recognize the RHS of (1) as the first element in the column vector $2Ax$, which is the product of the scalar $2$ with the matrix $A$ and the column vector $x$.

A similar calculation shows that $\frac{\partial Q}{\partial x_2}$ is the second element in $2Ax$.

EDIT: Now that you see how the $n=2$ case works, the general case is similar. Write out the matrix product: $$ Q(x_1,x_2,\ldots,x_n):=x^TAx=\sum_{i=1}^n\sum_{j=1}^n x_iA_{i,j}x_j. $$ For a fixed $k$ you compute the partial derivative of $Q$ with respect to $x_k$ by considering which of the indices $i,j$ are equal to $k$ (terms in which neither index equals $k$ do not depend on $x_k$ and drop out): $$\begin{align} \frac{\partial Q}{\partial x_k}&= \frac{\partial}{\partial x_k}(x_kA_{k,k}x_k)+ \frac{\partial}{\partial x_k}\Big(\sum_{j\ne k}x_kA_{k,j}x_j\Big)+ \frac{\partial}{\partial x_k}\Big(\sum_{i\ne k}x_iA_{i,k}x_k\Big)\\ &=2A_{k,k}x_k +\sum_{j\ne k}A_{k,j}x_j + \sum_{i\ne k} x_iA_{i,k}\\ &=2\sum_{j=1}^n A_{k,j}x_j, \end{align} $$ the last equality arising after relabeling index $i$ as $j$ and using the fact $A_{j,k}=A_{k,j}$. We recognize the final quantity as the $k$th element in the column vector $2Ax$.
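If you want to convince yourself numerically, here is a minimal sketch in plain Python (no libraries). The function names `quad_form`, `mat_vec` and `numerical_gradient`, as well as the sample matrix and point, are illustrative assumptions and not part of the derivation above. The sketch approximates each $\partial Q/\partial x_k$ with a central finite difference and compares it with the $k$th entry of $2Ax$.

```python
def quad_form(A, x):
    """Return Q(x) = x^T A x, written as the double sum over i and j."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def mat_vec(A, x):
    """Return the matrix-vector product A x as a list."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def numerical_gradient(A, x, h=1e-6):
    """Approximate each partial derivative of Q by a central difference."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * h))
    return grad


# Hypothetical symmetric 3x3 matrix and point, chosen only for illustration.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 4.0]]
x = [1.0, -2.0, 0.5]

approx = numerical_gradient(A, x)        # finite-difference estimate
exact = [2 * v for v in mat_vec(A, x)]   # the claimed gradient 2Ax

print("finite difference:", approx)
print("2Ax:              ", exact)
```

The two printed vectors should agree up to rounding error. This is only a spot check at one point, not a proof; the index calculation above is what establishes the identity.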

Writing out the matrix product is typically the way to prove these vector calculus identities. You should be aware that different authors use different conventions for notation in these identities, depending on whether the derivative of a scalar with respect to a vector is seen as a column vector or as a row vector. See https://en.wikipedia.org/wiki/Matrix_calculus for a very detailed discussion.

Thanks for the answer. Do you think it'd be feasible to prove the assertion via induction for integers n greater than or equal to 2 just to be more rigorous? — Thomas Simpson, Oct 08 '17 at 08:07
@Nickkkk I've added some remarks on the general case. There's a pattern in the $n=2$ case that you can replicate for general $n$. — grand_chat, Oct 08 '17 at 15:03

vita nova · Answer 2 · 2017-10-08T09:17:01.670

First of all, the convention in linear algebra is to express a vector $x$ as an $n\times 1$ matrix. Conforming to this reduces confusion when doing complicated matrix algebra.

I convinced myself with the following two steps while struggling through my first year of grad school, and I hope they help you.

Step 1

Before getting to the expression, I guess you have already met $$ \dfrac {\partial Ax}{\partial x^T}=A $$ Let me explain this one first.

Define $g(x): \mathbb R^n \rightarrow \mathbb R^m$, where the $i^{th}$ element of $g$ is $g_i(x_1,x_2,...,x_n)$. The $m\times n$ matrix of first partial derivatives is called the "Jacobian": $$ \dfrac {\partial g(x)}{\partial x^T}=\dfrac {\partial (g_1,g_2,...,g_m)}{\partial (x_1,x_2,...,x_n)} $$ where the element in row $i$ and column $j$ is $g_{ij}=\dfrac {\partial g_i(x_1,x_2,...,x_n)}{\partial x_j}$.

Now, let $g(x)=Ax$, where $A$ is an $m\times n$ matrix. Then $$ g_i(x_1,x_2,...,x_n)=a_{i1}x_{1}+a_{i2}x_{2}+...+a_{in}x_{n} $$ and we get $\dfrac {\partial g_i(x_1,x_2,...,x_n)}{\partial x_j}=a_{ij}$, which means $\dfrac {\partial Ax}{\partial x^T}=A$.
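As a quick check of this step, the sketch below builds the Jacobian of $g(x)=Ax$ entry by entry with central finite differences, in plain Python. The helper names `linear_map` and `numerical_jacobian` and the sample $2\times 3$ matrix are hypothetical, chosen only for illustration. Row $i$, column $j$ of the result approximates $\partial g_i/\partial x_j$, so it should reproduce $A$.

```python
def linear_map(A, x):
    """Return g(x) = A x, where A is a list of m rows of length n."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def numerical_jacobian(g, x, h=1e-6):
    """Approximate the m x n Jacobian of g at x by central differences."""
    m = len(g(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        g_plus = g(x_plus)
        g_minus = g(x_minus)
        for i in range(m):
            # Entry (i, j) is the partial derivative of g_i with respect to x_j.
            J[i][j] = (g_plus[i] - g_minus[i]) / (2 * h)
    return J


# Hypothetical 2x3 matrix (m = 2, n = 3) and an arbitrary point.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [0.5, -1.0, 2.0]

J = numerical_jacobian(lambda v: linear_map(A, v), x)
for row in J:
    print(row)
```

Because $g$ is linear, the result does not depend on the point $x$ chosen; any other point should give the same matrix up to rounding.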

Step 2

Note that $x^TAx=x^T(Ax)$. Using the product rule $$ \dfrac {\partial g^T h}{\partial x^T}= h^T \dfrac {\partial g}{\partial x^T}+g^T \dfrac {\partial h}{\partial x^T} $$ with $g=x$ and $h=Ax$, we get $$ \begin{align} \dfrac {\partial x^TAx}{\partial x^T}&=(Ax)^T\dfrac {\partial x}{\partial x^T}+x^T\dfrac {\partial (Ax)}{\partial x^T}\\ &=x^TA^T+x^TA\\ &=x^T(A+A^T)\\ &=2x^TA \end{align} $$ where the second line uses $\dfrac {\partial x}{\partial x^T}=I$ and the last line comes from the symmetry of $A$.
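To see where the symmetry assumption enters, here is a small plain-Python sketch with a deliberately non-symmetric matrix. All names (`quad_form`, `row_times_matrix`, `numerical_row_derivative`) and the sample values are illustrative assumptions. It compares a finite-difference derivative with the row vectors $x^T(A+A^T)$ and $2x^TA$; only the former should match when $A \ne A^T$.

```python
def quad_form(A, x):
    """Return the scalar x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def row_times_matrix(x, M):
    """Return the row vector x^T M as a list."""
    n = len(x)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(len(M[0]))]


def numerical_row_derivative(A, x, h=1e-6):
    """Approximate the 1 x n derivative of x^T A x by central differences."""
    out = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        out.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * h))
    return out


# Hypothetical non-symmetric 2x2 matrix, chosen so the two formulas differ.
A = [[1.0, 2.0],
     [0.0, 3.0]]
x = [1.0, 2.0]
n = len(x)

A_plus_At = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]

general = row_times_matrix(x, A_plus_At)                    # x^T (A + A^T)
symmetric_only = [2 * v for v in row_times_matrix(x, A)]    # 2 x^T A
approx = numerical_row_derivative(A, x)

print("finite difference:", approx)
print("x^T (A + A^T):    ", general)
print("2 x^T A:          ", symmetric_only)
```

For this matrix the finite-difference result agrees with $x^T(A+A^T)$ but not with $2x^TA$, which is why the last step of the derivation needs $A=A^T$.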

ps. many Ph.D. students in econ or some other fields still don't know why it is true, based on my observation...

edited Oct 08 '17 at 09:17

answered Oct 08 '17 at 09:02


Hey, thanks for the explanation before the problem; that definitely helped lessen my confusion in this world of foreign mathematics. It seems like you took the partial derivative with respect to the transpose of $x$ instead of $x$, though. Did you incorrectly transcribe the question, or am I missing something?
A Ph.D. in economics is my goal anyway, so I guess it's good that I've got the leg up now :) – Thomas Simpson Oct 08 '17 at 09:52
Well, I think the way you write $\frac{\partial(x^T Ax)}{\partial x}$ is not correct: the denominator cannot be an $n\times 1$ vector. – vita nova Oct 08 '17 at 10:08
http://faculty.washington.edu/ezivot/econ424/portfolioTheoryMatrix.pdf Page 12 lists a Lagrangian function partial differential with respect to column vector x , specifically (1.15) – Thomas Simpson Oct 08 '17 at 10:33
x is a 3x1 vector and Sigma is a 3x3 matrix – Thomas Simpson Oct 08 '17 at 10:48
Yeah, I've seen the notes, and this is why I don't like the way they do this... Note that when $m=1$, the Jacobian is $1\times n$, but economists like to transpose it to $n\times 1$. Then the r.h.s. of $\frac{\partial(x^T Ax)}{\partial x^T}=2x^T A$ becomes $2Ax$, since $A$ is symmetric. To make it consistent, they also transpose the denominator on the l.h.s. It does help interpretation, since they can say the derivative w.r.t. $x_j$ is the $j$th row of the f.o.c., but it sacrifices accuracy. – vita nova Oct 08 '17 at 11:51
