Let's make it clear. Given $f\in C^1(\mathbb R^n;\mathbb R^k)$, the derivative, total derivative, Fréchet derivative, differential, total differential, pushforward, or whatever you call it, of $f$ at a point $x$ is the linear map
$$
df_x=Df_x=f'(x)=\frac{df}{dx}:\mathbb R^n\to\mathbb R^k
$$
such that
$$
f(y)-f(x)-df_x(y-x) = o(|y-x|) \quad\text{as } y\to x .
$$
In the standard bases of $\mathbb R^n$ and $\mathbb R^k$, the derivative is represented by the Jacobian matrix $J_f(x)$.
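To see the defining property numerically, here is a minimal sketch in Python with NumPy. The map `f`, its hand-computed `jacobian`, the point `x`, and the `direction` are all hypothetical choices made for illustration; nothing about them comes from the question. The ratio $|f(y)-f(x)-J_f(x)(y-x)|/|y-x|$ should shrink as $y\to x$.

```python
import numpy as np

def f(x):
    # Hypothetical example map from R^2 to R^3.
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def jacobian(x):
    # Hand-computed Jacobian of f: one row per component, one column per variable.
    return np.array([
        [x[1], x[0]],
        [np.cos(x[0]), 0.0],
        [0.0, 2.0 * x[1]],
    ])

x = np.array([0.7, -1.3])
direction = np.array([0.6, 0.8])  # a unit vector

ratios = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    y = x + t * direction
    # Remainder of the first-order approximation of f at x.
    remainder = f(y) - f(x) - jacobian(x) @ (y - x)
    ratios.append(np.linalg.norm(remainder) / np.linalg.norm(y - x))

# The ratios should decrease towards zero, which is what o(|y - x|) means.
print(ratios)
```

This only probes one direction and one point, so it illustrates the definition rather than proving anything.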
In the case of a scalar function, $k=1$ and the derivative is a linear map $df_x:\mathbb R^n\to\mathbb R$, that is, an element of the dual space $(\mathbb R^n)^*$.
If you fix a non-degenerate bilinear form $B:\mathbb R^n\times\mathbb R^n\to\mathbb R:(v,w)\mapsto B(v,w)$, then for every linear functional $\phi\in(\mathbb R^n)^*$ there is a unique vector $v_\phi\in\mathbb R^n$ such that $\phi(w)=B(v_\phi,w)$ for all $w\in\mathbb R^n$.
The gradient is the vector that represents the derivative with respect to a chosen non-degenerate form.
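In coordinates, finding the representing vector amounts to solving a linear system. As a rough sketch, assume the form is $B(v,w)=v^TMw$ for an invertible matrix $M$ and the functional is $\phi(w)=c^Tw$; then $v_\phi$ solves $M^Tv_\phi=c$. The matrix `M` and vector `c` below are made-up examples, and `M` is deliberately neither symmetric nor positive definite, since only non-degeneracy is needed.

```python
import numpy as np

# Hypothetical non-degenerate form B(v, w) = v^T M w (M is invertible, det = -1).
M = np.array([
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0],
])
# Hypothetical linear functional phi(w) = c^T w.
c = np.array([2.0, -1.0, 3.0])

def B(v, w):
    return v @ M @ w

def phi(w):
    return c @ w

# v^T M w = c^T w for all w  <=>  M^T v = c.
v_phi = np.linalg.solve(M.T, c)

# Spot-check the identity phi(w) = B(v_phi, w) on a few random vectors.
rng = np.random.default_rng(0)
checks = [np.isclose(B(v_phi, w), phi(w)) for w in rng.standard_normal((5, 3))]
print(all(checks))
```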
In our case, the linear functional that we want to represent is the derivative $df_x$ and the bilinear form is $B(v,w)=\langle Av,w\rangle=v^TAw$, where $A$ is a symmetric positive definite matrix and $\langle\,\cdot\,,\,\cdot\,\rangle$ is the standard scalar product.
If $\nabla f(x)=J_f(x)^T$ denotes the standard gradient (the one with respect to the standard scalar product, i.e. the transpose of the $1\times n$ Jacobian), then
$$
\nabla^A f(x) = A^{-1}\nabla f(x)
$$
is the vector you are looking for. In fact,
$$
B(\nabla^Af(x),w)=\langle A\nabla^Af(x),w\rangle
=\langle AA^{-1}\nabla f(x),w\rangle
=\langle \nabla f(x),w\rangle = df_x(w).
$$
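The identity above can be checked numerically. The following is a sketch only: the scalar function `f`, its hand-computed standard gradient `grad_f`, the symmetric positive definite matrix `A`, and the test point are all hypothetical. It computes $\nabla^Af(x)$ by solving $A\,\nabla^Af(x)=\nabla f(x)$ and compares $B(\nabla^Af(x),w)$ with a central-difference estimate of the directional derivative $df_x(w)$.

```python
import numpy as np

def f(x):
    # Hypothetical scalar function on R^3.
    return np.sin(x[0]) + x[1] ** 2 * x[2]

def grad_f(x):
    # Standard gradient of f, computed by hand.
    return np.array([np.cos(x[0]), 2.0 * x[1] * x[2], x[1] ** 2])

# Hypothetical symmetric positive definite matrix
# (symmetric, positive diagonal, strictly diagonally dominant).
A = np.array([
    [2.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 3.0],
])

def B(v, w):
    # B(v, w) = <A v, w>
    return (A @ v) @ w

x = np.array([0.3, -1.1, 0.8])

# Gradient with respect to B: solve A g = grad f(x) instead of forming A^{-1}.
grad_A = np.linalg.solve(A, grad_f(x))

# Central-difference estimate of df_x(w) for a random direction w.
rng = np.random.default_rng(1)
w = rng.standard_normal(3)
h = 1e-6
df_x_w = (f(x + h * w) - f(x - h * w)) / (2.0 * h)

print(B(grad_A, w), df_x_w)
```

The two printed numbers should agree up to finite-difference error. Using `np.linalg.solve` rather than an explicit inverse is just the usual numerical preference; mathematically it is the same $A^{-1}\nabla f(x)$.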