I am currently trying to differentiate the function

$$SS(\beta) = (y - X\beta)^T(y - X\beta)$$

with respect to the vector $\beta$ using the notation of the Matrix Cookbook. Here, $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, and $\beta \in \mathbb{R}^p$.

First, $SS(\beta)$ is a scalar and $\beta$ is a $p$-dimensional column vector. Therefore, the derivative should be a $p$-dimensional column vector as well (see page 8 in the cookbook). Using identity (37) in the cookbook (the product rule), I find

$$ \frac{\partial}{\partial\beta} SS(\beta) = \Big[ \frac{\partial}{\partial\beta} (y - X\beta)^T \Big] \cdot (y - X\beta) + (y - X\beta)^T \cdot \frac{\partial}{\partial\beta} (y - X\beta). $$

For the first derivative, we can use identity (44) (the derivative of the transpose equals the transpose of the derivative). Finally, we use

$$ \frac{\partial}{\partial\beta}(y - X\beta) = -\frac{\partial}{\partial\beta} X\beta = -X\beta. $$

Together, this yields $$ \frac{\partial}{\partial\beta} SS(\beta) = -X^T (y - X\beta) + (y - X\beta)^T X. $$

This is generally not equal to $-2X^T(y - X\beta)$, which is what I expected. Where did I go wrong here?

  • The easiest way to do this calculation is to just use the chain rule. Let $f(\beta) = \| X \beta - y \|^2$. Note that $f(\beta) = g(h(\beta))$ where $h(\beta) = X \beta - y$ and $g(u) = \|u\|^2$. The derivatives of $h$ and $g$ are $h'(\beta) = X$ and $g'(u) = 2u^T$. By the chain rule, $f'(\beta) = g'(h(\beta)) h'(\beta) = 2(X\beta - y)^T X$. The gradient of $f$ is $\nabla f(\beta) = f'(\beta)^T = 2X^T(X \beta - y)$. – littleO Jan 04 '22 at 18:56
  • @littleO Thank you, I have seen a variety of derivations now. However, I still don't see where the above derivation goes wrong. I think this insight would be important for my progress. – rkvymvqt Jan 04 '22 at 19:17
  • @littleO I see, but that's a typo (it should of course be equal to X) and is not reflected in the final result. – rkvymvqt Jan 04 '22 at 19:32
  • Your product rule is wrong. When $u, v$ are vectors that are also functions of another vector $x$, the product rule states that $$\dfrac{\partial u^T v}{\partial x} = u^T\dfrac{\partial v}{\partial x} + v^T\dfrac{\partial u}{\partial x}$$ so as to keep the dimensions consistent. Alternatively, you can refer to (78) in the cookbook for the general case of your formula. – dezdichado Jan 04 '22 at 19:50
  • @dezdichado Thank you! Do you know if I can find this version of the product rule in the cookbook as well? – rkvymvqt Jan 05 '22 at 10:45
  • (78) is the general version of that product rule. Besides, it's generally better to use the chain rule to simplify computations as much as possible. – dezdichado Jan 05 '22 at 15:41
  • Related: https://math.stackexchange.com/questions/165930/derivative-of-a-particular-matrix-valued-function-with-respect-to-a-vector – user550103 Jan 06 '22 at 07:35
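
As a sanity check on the expected result $-2X^T(y - X\beta)$, here is a minimal Python sketch that compares it against a central finite-difference approximation of the gradient of $SS(\beta)$. The random data, the sizes `n = 5` and `p = 3`, the seed, and all helper names (`matvec`, `transpose`, `ss`, `grad_closed_form`, `grad_finite_diff`) are assumptions made purely for illustration; none of them come from the cookbook. Plain lists are used so that the snippet needs only the standard library.

```python
import random


def matvec(A, v):
    """Matrix-vector product A v, with A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def residual(X, y, beta):
    """Residual vector y - X beta."""
    return [yi - fi for yi, fi in zip(y, matvec(X, beta))]


def ss(X, y, beta):
    """SS(beta) = (y - X beta)^T (y - X beta), a scalar."""
    return sum(r * r for r in residual(X, y, beta))


def grad_closed_form(X, y, beta):
    """Expected gradient -2 X^T (y - X beta), a vector of length p."""
    return [-2.0 * g for g in matvec(transpose(X), residual(X, y, beta))]


def grad_finite_diff(X, y, beta, h=1e-5):
    """Central-difference approximation of the gradient of SS at beta."""
    grad = []
    for j in range(len(beta)):
        up = beta[:]
        down = beta[:]
        up[j] += h
        down[j] -= h
        grad.append((ss(X, y, up) - ss(X, y, down)) / (2 * h))
    return grad


# Illustrative random problem (assumed sizes and seed).
rng = random.Random(0)
n, p = 5, 3
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
beta = [rng.gauss(0, 1) for _ in range(p)]

analytic = grad_closed_form(X, y, beta)
numeric = grad_finite_diff(X, y, beta)
max_abs_diff = max(abs(a - b) for a, b in zip(analytic, numeric))
print("closed form :", analytic)
print("finite diff :", numeric)
print("max abs diff:", max_abs_diff)
```

Because $SS$ is quadratic in $\beta$, the central difference is exact up to rounding, so the two vectors should agree to many digits. Both have length $p$, which fits the dimension argument in the comments: each term of the dimension-consistent product rule is the same $1 \times p$ row $-(y - X\beta)^T X$, and transposing their sum gives the column vector $-2X^T(y - X\beta)$.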
