
Given a matrix $X$ (not necessarily square) and a vector $b$, how can I derive the following equality? $$\frac{\partial\, b^t X^t X b}{\partial b} = (X^t X ) b $$

Why is this wrong? $$\frac{\partial\, b^t X^t X b}{\partial b} = \frac{\partial\, (X b )^t X b}{\partial b} = \frac{\partial\, ( X b )^t}{\partial b} X b + (X b)^t\frac{\partial\, X b}{\partial b} = X^t X b + (X b)^t X $$

Also, how can I compute this directly, without expanding the product in the numerator? $$ \frac{\partial\, (y-X b)^t (y-X b) }{\partial b}$$

Micah

2 Answers


Neither of those results is quite right. The derivative of $f = b^T X^T X b$ is

$$ \eqalign { \frac {\partial f} {\partial b} &= 2 X^TXb \cr } $$

To derive this, express the function in terms of the Frobenius product and rearrange the differential until you isolate $db$ on the RHS.

$$ \eqalign { f &= Xb:Xb \cr df &= 2(Xb):d(Xb) \cr &= 2Xb:X\,db \cr &= 2X^TXb:db \cr }$$

Since $df = \frac{\partial f}{\partial b}:db$, the gradient is whatever multiplies $db$ in the last line.

You can also get to the same result using index notation. You'll end up with an expression with two terms, $b_iX^T_{ij}X_{jk}\,db_k + db_i\,X^T_{ij}X_{jk}b_k$.

Then you just have to remember that $X^TX$ is symmetric so that $X^T_{ij}X_{jk} = X^T_{kj}X_{ji}$, which allows you to combine the two terms.
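To spell out that last step, here is a short sketch in the same index notation. The shorthand $S = X^TX$ is introduced only for this illustration. Relabelling the dummy indices in the second term and then using $S_{ki} = S_{ik}$ gives

$$ \eqalign { df &= b_i S_{ik}\,db_k + db_i\,S_{ik} b_k \cr &= b_i S_{ik}\,db_k + b_i S_{ki}\,db_k \cr &= 2\,S_{ki} b_i\,db_k \cr } $$

so that $\partial f/\partial b_k = 2\,(X^TXb)_k$. This also suggests what goes wrong in the attempt in the question: its two terms, $X^tXb$ and $(Xb)^tX$, are a column vector and a row vector with the same components, since $(Xb)^tX = (X^tXb)^t$. Written in one consistent layout they add up to $2X^tXb$.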

For your second question, any time you have a function which is the product of two identical factors, i.e. $f = w:w$, the differential has the form $df = 2w:dw$. This result was applied in the preceding derivation.
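As a worked sketch of that rule for the second question, take $w = y - Xb$ and assume $y$ does not depend on $b$, so that $dw = -X\,db$:

$$ \eqalign { f &= (y-Xb):(y-Xb) \cr df &= 2(y-Xb):d(y-Xb) \cr &= -2(y-Xb):X\,db \cr &= -2X^T(y-Xb):db \cr } $$

Reading off the factor in front of $db$ gives $\frac{\partial f}{\partial b} = -2X^T(y-Xb) = 2X^TXb - 2X^Ty$, without ever expanding the product.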

Updated

A quick review of the algebra of Frobenius products might make the above answer less "incomprehensible". It's nothing too deep, and flows easily from the definition,

$$ A:B \equiv {\rm tr}(A^TB) $$

Just as there are mixed-product rules for Kronecker products

$$ \eqalign { (AB)\otimes(XY) &= (A\otimes X)(B\otimes Y) \cr }$$

there are mixed-product rules for Frobenius products

$$ \eqalign { (AB):(X) &= (A):(XB^T) \cr &= (B):(A^TX) \cr } $$

Basically you can move a matrix to the opposite side of the Frobenius product if you transpose it, and retain its relative position (RHS or LHS) on the other side.

Similar to the rule for transposing Kronecker products

$$ \eqalign { A^T\otimes B^T &= (A\otimes B)^T \cr } $$

there's a rule for Frobenius products

$$ \eqalign { A^T:B^T &= (A:B)^T \cr } $$

Frobenius products are also commutative, distributive, and follow the standard product rule under differentiation

$$ \eqalign { A:B &= B:A \cr A:(B+C) &= (A:B) + (A:C) \cr d(A:B) &= (dA:B) + (A:dB) \cr } $$

which makes algebraic manipulations quite simple. For example,

$$ \eqalign { d(w:w) &= (dw:w) + (w:dw) \cr &= (w:dw) + (w:dw) \cr &= 2 w:dw \cr } $$
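If you would rather convince yourself numerically, the mixed-product and commutativity rules can be spot-checked on random matrices. This is only an illustrative sketch in plain Python; the helper names (`matmul`, `transpose`, `frob`, `rand_matrix`) and the matrix shapes are choices made for this example, not part of any library.

```python
import random

def matmul(A, B):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]

def frob(A, B):
    """Frobenius product A:B = tr(A^T B), i.e. the sum of elementwise products."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def rand_matrix(rows, cols, rng):
    """Random matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
A = rand_matrix(3, 4, rng)  # 3x4
B = rand_matrix(4, 2, rng)  # 4x2
X = rand_matrix(3, 2, rng)  # same shape as the product AB

lhs = frob(matmul(A, B), X)             # (AB):X
mid = frob(A, matmul(X, transpose(B)))  # A:(X B^T)
rhs = frob(B, matmul(transpose(A), X))  # B:(A^T X)

print(lhs, mid, rhs)  # the three values should agree up to rounding
```

The three printed numbers agreeing is of course not a proof, but it is a cheap way to catch a misplaced transpose when applying the rules by hand.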

lynne
  • @lynne, clearly your result is correct, but your proof is almost incomprehensible. Give instructions for use. Moreover, your result is the gradient and not the derivative. –  Nov 16 '14 at 20:31
  • @loupblanc I've updated the answer to be more comprehensive. Given a scalar function of a vector argument, I'm not aware of any derivative other than the gradient. What did you have in mind? – lynne Nov 17 '14 at 05:51
  • @lynne, the derivative of $f$ at $b$ is the linear map $h\in \mathbb{R}^n\mapsto 2(X^TXb)^Th$. –  Nov 17 '14 at 11:10

It's known that $$ \frac{\partial x^tAx}{\partial x}=(A+A^t)x $$ (see for example here).

Thus, $$\begin{align} \frac{\partial b^tX^tXb}{\partial b}&=(X^tX+(X^tX)^t)b &\\ &=2X^tXb &\text{since $X^tX$ is symmetric.} \end{align} $$

For the second expression, expanding gives $$ (y - Xb)^t(y - Xb) = y^ty - y^tXb - b^tX^ty + b^tX^tXb = y^ty - 2b^t(X^ty) + b^t(X^tX)b, $$ where the two cross terms combine because $y^tXb$ is a scalar and therefore equals its transpose $b^tX^ty$. Hence $$ \frac{\partial (y - Xb)^t(y - Xb) }{\partial b}= -2X^ty + 2(X^tX)b, $$ where we have again used the fact that $X^tX$ is symmetric.
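Both formulas are easy to sanity-check against central finite differences. The following is a minimal sketch in plain Python, not a definitive implementation; the function names (`quad`, `rss`, `grad_quad`, `grad_rss`, `central_diff`), the step size `eps`, and the random test data are all assumptions made for this illustration.

```python
import random

def matvec(M, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    """Transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]

def quad(X, b):
    """f(b) = b^t X^t X b, computed as the squared norm of Xb."""
    return sum(t * t for t in matvec(X, b))

def rss(X, y, b):
    """g(b) = (y - Xb)^t (y - Xb)."""
    return sum((yi - ti) ** 2 for yi, ti in zip(y, matvec(X, b)))

def grad_quad(X, b):
    """Closed-form gradient 2 X^t X b."""
    return [2 * g for g in matvec(transpose(X), matvec(X, b))]

def grad_rss(X, y, b):
    """Closed-form gradient -2 X^t y + 2 X^t X b."""
    Xt = transpose(X)
    return [-2 * a + 2 * c for a, c in zip(matvec(Xt, y), matvec(Xt, matvec(X, b)))]

def central_diff(func, b, eps=1e-6):
    """Approximate the gradient of func at b by central differences."""
    grad = []
    for k in range(len(b)):
        up = list(b)
        up[k] += eps
        down = list(b)
        down[k] -= eps
        grad.append((func(up) - func(down)) / (2 * eps))
    return grad

rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]  # 5x3, not square
y = [rng.uniform(-1, 1) for _ in range(5)]
b = [rng.uniform(-1, 1) for _ in range(3)]

num_quad = central_diff(lambda v: quad(X, v), b)
num_rss = central_diff(lambda v: rss(X, y, v), b)
err_quad = max(abs(a - n) for a, n in zip(grad_quad(X, b), num_quad))
err_rss = max(abs(a - n) for a, n in zip(grad_rss(X, y, b), num_rss))

print(err_quad, err_rss)  # both should be tiny (rounding-level) if the formulas hold
```

Because both functions are quadratic in $b$, the central difference has no truncation error, so any sizeable discrepancy would point to a wrong formula (for instance a missing factor of 2) rather than to the step size.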

alexjo