I'm reading the tutorial The Matrix Calculus You Need For Deep Learning: https://arxiv.org/abs/1802.01528. On page 25, the derivative of the ReLU function $\max(0, \mathbf{x})$, where the variable $\mathbf{x}$ is a vector in $\mathbb{R}^n$, is given as follows:
My question is: why is the derivative a vector instead of a diagonal matrix, as follows?
\begin{align*} \frac{\partial}{\partial \mathbf{x}}\max(0, \mathbf{x}) &= \operatorname{diag}\left( \frac{\partial}{\partial x_1}\max(0, x_1), \frac{\partial}{\partial x_2}\max(0, x_2), \dotsc, \frac{\partial}{\partial x_n}\max(0, x_n) \right) \end{align*}
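Written out entry by entry, that matrix would be the following (a sketch; ReLU is not differentiable at $x_i = 0$, so the value $0$ used there is only a convention):

\begin{align*} \left[\frac{\partial}{\partial \mathbf{x}}\max(0, \mathbf{x})\right]_{ij} = \frac{\partial}{\partial x_j}\max(0, x_i) = \begin{cases} 1 & \text{if } i = j \text{ and } x_i > 0 \\ 0 & \text{otherwise} \end{cases} \end{align*}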
The result of the ReLU function $\max(0, \mathbf{x})$ is a vector, and the derivative of a vector with respect to a vector variable is a Jacobian matrix. In this case the Jacobian happens to be diagonal, because each output component $\max(0, x_i)$ depends only on the corresponding input $x_i$.
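As a numerical sanity check of the diagonal-Jacobian claim, here is a minimal sketch in plain Python. The helper names (`relu`, `relu_grad_elementwise`, `numerical_jacobian`, `diag`) and the step size `h` are illustrative assumptions, not anything from the tutorial. It estimates the Jacobian of ReLU by central finite differences and compares it with the diagonal matrix built from the elementwise derivatives.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def relu(x: Vector) -> Vector:
    """Elementwise ReLU: max(0, x_i) for each component."""
    return [max(0.0, xi) for xi in x]


def relu_grad_elementwise(x: Vector) -> Vector:
    """Elementwise derivative d/dx_i max(0, x_i); uses 0 at x_i == 0 by convention."""
    return [1.0 if xi > 0 else 0.0 for xi in x]


def numerical_jacobian(f: Callable[[Vector], Vector], x: Vector, h: float = 1e-6) -> Matrix:
    """Central-difference estimate of J[i][j] = d f_i / d x_j."""
    n = len(x)
    m = len(f(x))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Perturb only component j, in both directions.
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def diag(v: Vector) -> Matrix:
    """Square matrix with v on the diagonal and zeros elsewhere."""
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]


if __name__ == "__main__":
    # Avoid x_i == 0: ReLU has no derivative there, and a central difference would give 0.5.
    x = [-2.0, 0.5, 3.0, -0.1]
    jac = numerical_jacobian(relu, x)
    expected = diag(relu_grad_elementwise(x))
    for row_est, row_exp in zip(jac, expected):
        print([round(v, 6) for v in row_est], row_exp)
```

Every off-diagonal entry of the estimate is exactly zero, because perturbing $x_j$ leaves $\max(0, x_i)$ for $i \neq j$ unchanged, so the diagonal alone carries all the information in the Jacobian.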
Page 7 of the same tutorial presents the general rule below. I don't see why it does not apply to the derivative of the ReLU function.

