1

Why is $$\frac{\delta (e^Te)}{\delta e} = 2e^T,$$ where $e$ is a vector? Doesn't the product rule apply here?

From my understanding, shouldn't it go like this:
$$\frac{\delta (e^Te)}{\delta e} = e\frac{\delta e^T}{\delta e} + e^T\frac{\delta e}{\delta e} = e + e^T$$, which is completely invalid since $e$ and $e^T$ have different dimensions. Where did my formula go astray?

2 Answers

4

Let us write the indices down to understand what is going on. Denote the components of the vector $e$ as $e^i$. $$\frac{\delta (e^{T}e)}{\delta e}=\frac{\delta}{\delta e^{i}}e_{j}e^{j}=\underbrace{e_{j}\delta^{j}_{i}+\delta_{ij}e^{j}}_{\text{product rule}}=e_{i}+e_{i}=2e_{i}=2e^T$$ Here we use Einstein notation, in which summation over repeated indices is always implied, $e_{i}$ corresponds to $e^{T}$, $\delta_{ij}$ is the Kronecker delta (the unit matrix), and we use the facts that $\frac{\delta e^{i}}{\delta e^{j}}=\delta^{i}_{j}$ and $\delta_{j}^{i}f^{j}=f^{i}$. Note that $\partial_{i}x^{j}=\delta^{j}_{i}$, but $\partial_{i}x_{j}=\delta_{ij}$ (see the clarification on Einstein notation down below).

The reason you have $2e^T$ instead of $e+e^{T}$ is that you take the variation of a scalar quantity with respect to the components of a vector; what you obtain must contain exactly one number for each component of the vector with respect to which you take the variation. Speaking in fancy terms, if you take the variation of a scalar with respect to a vector, you get a covector, and vice versa.
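As a quick numerical sanity check of the "one number per component" statement, here is a minimal sketch in plain Python that compares the entries $2e_i$ with central finite differences of $e^Te$. The helper names `sq_norm` and `numeric_gradient` and the sample vector are made up for illustration; they are not part of the derivation.

```python
# Sketch: compare the analytic components 2*e_i with central finite differences.
# `sq_norm` and `numeric_gradient` are hypothetical helper names.

def sq_norm(e):
    """Return e^T e for a vector given as a list of floats."""
    return sum(c * c for c in e)

def numeric_gradient(f, e, step=1e-6):
    """Approximate the derivative of f with respect to each component e^i."""
    grad = []
    for i in range(len(e)):
        plus = list(e)
        minus = list(e)
        plus[i] += step
        minus[i] -= step
        grad.append((f(plus) - f(minus)) / (2 * step))
    return grad

e = [1.0, -2.0, 0.5]                  # arbitrary sample vector
approx = numeric_gradient(sq_norm, e)
exact = [2 * c for c in e]            # the claimed result, component by component

print(approx)
print(exact)
```

The two printed lists should agree up to rounding error, with one entry per component of $e$. A check like this cannot tell a row from a column; that distinction is exactly what the upper and lower indices keep track of.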

In general, when taking derivatives of complex expressions involving vectors, matrices, tensors etc., the most robust way to get the right answer is often to use Einstein notation and several simple rules. Summation over a pair of repeated indices, one top and one bottom, is called contraction.

Let us denote $\frac{\partial }{\partial x^{i}}$ by $\partial_{i}$ and the components of a vector by $x^{i}$, where $i$ takes an appropriate number of values, depending on the dimension of the space. Although it is not strictly necessary when working with regular Euclidean space, it is convenient to denote $x=x^{i}$, $x^{T}=x_{i}$, and to use the rule that only a pair of indices where one is on the top and the other on the bottom can be summed over (you can matrix multiply $x^{T}x$, but not $xx$ or $x^{T}x^{T}$).

Then $$\frac{\partial }{\partial x^{i}}x^{j}=\delta^{j}_{i}$$

(The derivative of a component of a vector with respect to a component of that vector is 1 when the indices coincide and 0 otherwise.) Matrices of linear operators correspond to objects with one top and one bottom index, which act on vectors with appropriate contractions: $$(A\vec{x})^{i}=A^{i}_{j}x^{j}$$ $$(\vec{x}^{T}A)_{j}=A^{i}_{j}x_{i}$$

Objects with two bottom indices are bilinear forms, since they can be contracted with two vectors to give a scalar; objects with two top indices are "bivectors". Derivatives with respect to vector components have bottom indices, since they can be contracted with vectors to produce a scalar quantity, the divergence. Likewise, derivatives with respect to covector components (i.e. components of transposed vectors) have top indices.

In this notation, the derivative of an arbitrarily complex expression can be calculated using three ingredients. First, $\frac{\partial }{\partial x^{i}}x^{j}=\delta^{j}_{i}$. Second, the fact that $\delta^{i}_{j}$ is the unit matrix: when contracted with another expression over one index, it just substitutes its other index into that expression, at the same height at which that index stood in the $\delta$ (so $\delta_{ij}$ can "lower indices": $\delta_{ij}e^{j}=e_{i}$, and $\delta^{ij}$ can "raise indices": $\delta^{ij}e_{j}=e^{i}$). Third, the product rule: $$\partial_{i}(A(x)B(x))=(\partial_{i}A(x))B(x)+A(x)(\partial_{i}B(x))$$ The last thing to remember is that in this notation the order of the factors inside a monomial does not matter, since all summations are uniquely defined by the indices, so $A^{i}_{j}b_{i}=b_{i}A^{i}_{j}$, and that a monomial can't contain more than two identical indices.
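As an illustration of how these rules combine, consider the quadratic form $x^{T}Ax$. This example is an addition for illustration, and it assumes that $A$ is a constant bilinear form with components $A_{jk}$: $$\partial_{i}(A_{jk}x^{j}x^{k})=A_{jk}\delta^{j}_{i}x^{k}+A_{jk}x^{j}\delta^{k}_{i}=A_{ik}x^{k}+A_{ji}x^{j}$$ For $A_{jk}=\delta_{jk}$ this reduces to $\delta_{ik}x^{k}+\delta_{ji}x^{j}=2x_{i}$, which matches the result for $e^{T}e$ above.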

  • So I guess the product rule can't be applied directly here, right? Also, could you explain why it isn't 2e? It's actually 2e^T according to my book :( – MathematicsBeginner Jan 15 '24 at 10:07
  • 2
    First, the product rule is exactly what I have applied; I have even explicitly indicated the moment when I apply it. Second, despite my mistake in the description of the result (naturally, the right answer is $2e^{T}$), Einstein notation is so clever that it took care of it and forced the right calculation. You can notice that the calculation results in $e_{i}$, $e$ with the lower index. This corresponds to $e^{T}$. The vector $e$ has an upper index, $e^{i}$. – Daigaku no Baku Jan 15 '24 at 10:19
1

So you are looking for the differential of the map $x\mapsto x^Tx$, where $x\in R^n$ is written as a column vector. For fixed $x$ this differential is the linear map from $R^n$ to $R$ defined by $h\mapsto h^T x+x^Th.$ If you decide to consider $R^n$ as a Euclidean space, i.e. a space with an inner product, here defined by $\langle x,y\rangle=x^Ty,$ then $x^Tx=\|x\|^2.$ In general, the differential at a point $x$ of a real function $f$ defined on a Euclidean space $E$, namely the linear form $h\mapsto f'(x)(h)$, is usually represented by a vector of $E$ called the gradient. This is due to the fact that any linear form $h\mapsto \ell(h)$ on $E$ can be represented by a unique vector $g_{\ell}\in E$ such that $\ell(h)=\langle g_{\ell},h\rangle$. By tradition one abuses notation and writes $g=g_{\ell}=\ell.$ Coming back to your problem, here $R^n$ is Euclidean and the linear map $h\mapsto h^Tx+x^Th$ can be written as the scalar product $2\langle x,h\rangle =2x^Th$. The gradient is $2x$, and its action on $h$ is written $2x^Th=2h^Tx.$ This is why your book says (somewhat loosely) that the differential is $2x^T.$
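To make the role of the linear map $h\mapsto 2x^Th$ concrete, here is a small sketch in plain Python. It checks that the increment $f(x+h)-f(x)$ of $f(x)=x^Tx$ splits into the linear part $2\langle x,h\rangle$ plus the second-order remainder $\langle h,h\rangle$. The function names and the sample vectors are assumptions chosen for illustration.

```python
# Sketch: f(x + h) - f(x) = 2<x, h> + <h, h> for f(x) = x^T x.
# `inner` and `f` are hypothetical helper names.

def inner(x, y):
    """Euclidean inner product <x, y> = x^T y for lists of floats."""
    return sum(a * b for a, b in zip(x, y))

def f(x):
    """The map x -> x^T x."""
    return inner(x, x)

x = [1.0, 2.0, -1.0]                  # arbitrary base point
h = [0.5, -0.25, 2.0]                 # arbitrary increment

x_plus_h = [a + b for a, b in zip(x, h)]
increment = f(x_plus_h) - f(x)        # full change of f
linear_part = 2 * inner(x, h)         # the differential applied to h
remainder = inner(h, h)               # second-order term

print(increment, linear_part + remainder)
```

The two printed numbers should coincide. Only `linear_part` depends linearly on $h$, and that linear piece is what the gradient $2x$ (or the row vector $2x^T$) encodes.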