You're not missing anything; you're just noticing how sloppy the math is in your current field of study.
When operating on a vector argument, functions are applied element-wise. The differential of such a function is given by
$$f = f(x) \quad\implies\quad df = f'(x)\odot dx$$
where $\odot$ denotes the elementwise/Hadamard product and $f'(x)$ is the ordinary scalar derivative, which is also applied element-wise.
The Hadamard product between two vectors can always be eliminated by converting one of the vectors into a diagonal matrix, e.g.
$$\eqalign{
a\odot b = Ab \quad\Longleftarrow\quad A = {\rm Diag}(a)
}$$
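As a quick numerical sanity check of this identity, here is a minimal NumPy sketch. The vectors `a` and `b` are arbitrary values chosen for illustration; they are not from the question.

```python
import numpy as np

# Arbitrary illustrative vectors (assumed values, not from the question).
a = np.array([1.0, -2.0, 3.0])
b = np.array([4.0, 5.0, -6.0])

# Hadamard (elementwise) product a ⊙ b.
hadamard = a * b

# Same result via an ordinary matrix-vector product with A = Diag(a).
A = np.diag(a)
matrix_form = A @ b

print(np.allclose(hadamard, matrix_form))
```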
Eliminating the Hadamard product from the differential yields the gradient as
$$\eqalign{
df &= f'(x)\odot dx = {\rm Diag}\big(f'(x)\big)\,dx \\
\frac{\partial f}{\partial x} &= F' = {\rm Diag}\big(f'(x)\big) \\
}$$
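The diagonal structure can be checked against finite differences. The sketch below assumes $f=\tanh$, whose scalar derivative is $1-\tanh^2(x)$. The helper `numerical_jacobian` is a hypothetical name introduced here for illustration, and the test point `x` is arbitrary.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference approximation of the Jacobian of f at x.

    Hypothetical helper for illustration: column j holds the partial
    derivatives of every output component with respect to x[j].
    """
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

# Arbitrary test point (assumed values).
x = np.array([-1.0, 0.5, 2.0])

# Jacobian of elementwise tanh, estimated numerically.
J_numeric = numerical_jacobian(np.tanh, x)

# Analytic result: Diag(f'(x)) with f'(x) = 1 - tanh(x)^2.
J_analytic = np.diag(1.0 - np.tanh(x) ** 2)

print(np.allclose(J_numeric, J_analytic, atol=1e-8))
```

The off-diagonal entries vanish because perturbing $x_j$ leaves every other component of $f(x)$ unchanged.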
These ideas apply not just to $\,\tanh(x)\,$ but to any function applied element-wise, including
$\,\max(0,x),\,$ also known as $\,\operatorname{ReLU}(x).$
I notice that some of the comments mention broadcasting to explain/excuse the sloppy mathematics that afflicts the field of neural nets/machine learning. But broadcasting is something different.
Broadcasting simply pads the dimensions of a
scalar/vector/matrix/tensor via repeated dyadic multiplication with all-ones vectors. For example
$$\eqalign{
&A\in {\mathbb R}^{m\times n} \qquad
&v\in {\mathbb R}^{m\times 1} \qquad
{\tt1}\in {\mathbb R}^{n\times 1} \\
&A\odot v
\qquad&\big({\rm incompatible}\big) \\
&A \odot (v{\tt1}^T)
\qquad&\big({\rm compatible\,via\,broadcast}\big) \\
}$$
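For comparison, this is how the padding looks in NumPy, which performs the same expansion implicitly. It is a minimal sketch with arbitrary sizes `m = 3`, `n = 4` and made-up entries.

```python
import numpy as np

# Arbitrary illustrative sizes and values (assumptions, not from the question).
m, n = 3, 4
A = np.arange(m * n, dtype=float).reshape(m, n)   # m x n matrix
v = np.array([[1.0], [2.0], [3.0]])               # m x 1 column vector
ones = np.ones((n, 1))                            # n x 1 all-ones vector

# Explicit padding: v 1^T is an m x n matrix whose columns are all v.
explicit = A * (v @ ones.T)

# NumPy's implicit broadcasting stretches v across the columns of A.
implicit = A * v

print(np.array_equal(explicit, implicit))
```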
Broadcasting works for simple multiplication and division, but is worthless (and confusing) when calculating gradients and Jacobians.