Let $A=A^T\in \mathbb R^{k\times k}$ be a nonzero symmetric matrix and define $f:\mathbb R^k\to\mathbb R$ by $$f(x):=x^TAx.$$ Why is $df(x)\xi=2x^TA\xi$ for all $x,\xi\in\mathbb R^k$?
To begin, try the cases $k=1$ and $k=2$. – janmarqz Dec 16 '14 at 17:07
Possibly a duplicate, see here (very similar) – alexjo Dec 16 '14 at 17:11
2 Answers
It's because
\begin{align}df(x)\xi &= \frac{d}{dt}\Big|_{t = 0} f(x + t\xi)\\ &= \frac{d}{dt}\Big|_{t = 0} (x + t\xi)^T A(x + t\xi)\\ &= \frac{d}{dt}\Big|_{t = 0} (x^T + t\xi^T) A(x + t\xi)\\ &= \frac{d}{dt}\Big|_{t = 0} (x^T + t\xi^T)(Ax + tA\xi)\\ &= \frac{d}{dt}\Big|_{t = 0} \left(x^TAx + t(\xi^T Ax + x^TA\xi) + t^2\xi^TA\xi\right)\\ &= \xi^T Ax + x^TA\xi\\ &= 2x^TA\xi. \end{align}
The last step uses that $\xi^TAx$ is a scalar, so $\xi^TAx = (\xi^TAx)^T = x^TA^T\xi = x^TA\xi$ by the symmetry of $A$.
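As a sanity check on this computation, here is a minimal numerical sketch in Python with NumPy. The names `f`, `directional_derivative`, and the step size `t` are illustrative choices and not part of the original argument. It approximates $\frac{d}{dt}|_{t=0} f(x+t\xi)$ by a central difference and compares it with $2x^TA\xi$ for a randomly generated symmetric $A$.

```python
import numpy as np

def f(x, A):
    # The quadratic form f(x) = x^T A x.
    return x @ A @ x

def directional_derivative(x, xi, A, t=1e-4):
    # Central-difference approximation of d/dt f(x + t*xi) at t = 0.
    # The step size t is an arbitrary illustrative choice.
    return (f(x + t * xi, A) - f(x - t * xi, A)) / (2 * t)

rng = np.random.default_rng(0)
k = 4
B = rng.standard_normal((k, k))
A = B + B.T                      # symmetrize so that A = A^T
x = rng.standard_normal(k)
xi = rng.standard_normal(k)

numeric = directional_derivative(x, xi, A)
exact = 2 * x @ A @ xi           # the claimed formula 2 x^T A xi

print(numeric, exact)
```

Because $f$ is quadratic in $t$ along the line $x+t\xi$, the $t^2$ terms cancel in the central difference, so the two printed values should agree up to floating-point rounding.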
Because both terms are real scalars, and $a^T=a$ when $a$ is a scalar. – Michael Grant Dec 16 '14 at 17:52
There's a nice answer linked by alexjo using coordinates. Here's an answer without coordinates, using the fact that we know the derivative of a linear map:
Consider $h(x,y) = x^T A y$, with $h: \mathbb{R}^k \times \mathbb{R}^k \rightarrow \mathbb{R}$.
We can restrict $h$ to each factor with $h(x,-) : \{x\} \times \mathbb{R}^k \rightarrow \mathbb{R}$ and $h(-,y): \mathbb{R}^k \times \{y\} \rightarrow \mathbb{R}$.
Then, by linearity of the differential, $dh_{x,y} (\xi_x\oplus \xi_y) = dh_{x,y}(\xi_x \oplus 0) + dh_{x,y}(0\oplus \xi_y) = d(h(-,y))_x (\xi_x) + d(h(x,-))_y (\xi_y)$.
Because $h(-,y)(x) = x^T A y = y^T A x$ (using $A = A^T$) is linear in $x$, we get $d(h(-,y))_x(\xi_x) = y^T A \xi_x$.
Similarly, because $h(x,-)(y) = x^T A y$ is linear in $y$, we get $d(h(x,-))_y(\xi_y) = x^T A \xi_y$.
Thus $dh_{x,y}(\xi_x \oplus \xi_y) = y^T A \xi_x + x^T A \xi_y$.
Finally, we have $f(x) = h(x,x)$, that is, $f = h \circ \Delta$ for $\Delta: \mathbb{R}^k \rightarrow \mathbb{R}^k \times \mathbb{R}^k$ given by $\Delta(x) = (x,x)$.
Since $\Delta$ is linear, we have $d\Delta_x(\xi) = \xi \oplus \xi$.
Thus, by the chain rule, $df_x(\xi) = (dh_{x,x} \circ d\Delta_x)(\xi) = dh_{x,x}(\xi \oplus \xi) = x^T A \xi + x^T A \xi = 2 x^T A \xi$.
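The decomposition above can also be checked numerically. The following is only a sketch in Python with NumPy. The function names `h`, `dh`, and `df` mirror the symbols in this answer but are otherwise hypothetical, and the step size `t` is an arbitrary choice. It compares the formula for $dh_{x,y}$ with a central difference of $h$, and then evaluates $dh_{x,x}(\xi\oplus\xi)$ against $2x^TA\xi$.

```python
import numpy as np

def h(x, y, A):
    # The bilinear map h(x, y) = x^T A y.
    return x @ A @ y

def dh(x, y, xi_x, xi_y, A):
    # Differential of h at (x, y) applied to (xi_x, xi_y).
    # Writing the first term as y^T A xi_x assumes A is symmetric.
    return y @ A @ xi_x + x @ A @ xi_y

def df(x, xi, A):
    # Chain rule through the diagonal map Delta(x) = (x, x),
    # whose differential sends xi to (xi, xi).
    return dh(x, x, xi, xi, A)

rng = np.random.default_rng(1)
k = 3
B = rng.standard_normal((k, k))
A = B + B.T                                   # symmetric test matrix
x, y, xi_x, xi_y = rng.standard_normal((4, k))

t = 1e-4                                      # illustrative step size
dh_numeric = (h(x + t * xi_x, y + t * xi_y, A)
              - h(x - t * xi_x, y - t * xi_y, A)) / (2 * t)

print(dh_numeric, dh(x, y, xi_x, xi_y, A))    # differential of h
print(df(x, xi_x, A), 2 * x @ A @ xi_x)       # differential of f
```

Each printed pair should agree up to rounding, which is consistent with the coordinate-free argument.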