If you like to think of the gradient as a vector, then it shouldn't matter whether its components are written in a row or in a column.
From a more geometric perspective, though, there is a natural way of writing out a gradient. For a scalar function $f$, the gradient is the row:
$$
\nabla f = (\partial_x f, \partial_y f, \partial_z f);
$$
while for a vector-valued function
$$
v = \left (\begin{array}{c} a\\ b \\ c
\end{array} \right),
$$
it is:
$$
\nabla v = \left (\begin{array}{ccc} \partial_x a & \partial_y a & \partial_z a \\ \partial_x b & \partial_y b & \partial_z b \\ \partial_x c & \partial_y c & \partial_z c
\end{array} \right).
$$
This is because the differential of a (vector-valued) function is a (vector-valued) differential form, rather than a vector. A form is something that, in the scalar $f$ case, looks like a vector, but is really a linear map that sends vectors to scalars.
So forms naturally look like transposed vectors! A form $\omega = (x,y,z)$ sends the vector $v$ to a scalar, defined simply by the matrix product:
$$
\omega(v) = (x,y,z) \left(\begin{array}{c} a\\ b \\ c
\end{array} \right) = xa+yb+zc.
$$
The differential, or "gradient", $df$ is a differential form because it sends each vector to the directional derivative of $f$ along that vector, which is a scalar:
$$
df(v) = (\partial_xf,\partial_yf,\partial_zf) \left(\begin{array}{c} a\\ b \\ c
\end{array} \right) = a\partial_x f + b\partial_y f + c\partial_z f.
$$
For vector-valued functions, just do this for every component: each component contributes one row, so you get gradients with multiple rows.
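The "do this for every component" recipe can be sketched in a few lines of plain Python. The function `F` and the names `jacobian` and `apply_rows` are hypothetical and chosen only for illustration; the partial derivatives are written out by hand rather than computed automatically.

```python
# Hypothetical vector-valued function F(x, y, z) = (a, b, c), for illustration only.
def F(p):
    x, y, z = p
    return (x * y, y * z, x + z)

def jacobian(p):
    """Rows are the differentials da, db, dc; columns are d/dx, d/dy, d/dz."""
    x, y, z = p
    return [
        (y, x, 0.0),      # da = y dx + x dy
        (0.0, z, y),      # db = z dy + y dz
        (1.0, 0.0, 1.0),  # dc = dx + dz
    ]

def apply_rows(matrix, w):
    """Apply each row (a form) to the column vector w, one scalar per row."""
    return tuple(sum(m * c for m, c in zip(row, w)) for row in matrix)

p = (1.0, 2.0, 3.0)
w = (1.0, 0.0, -1.0)

result = apply_rows(jacobian(p), w)  # directional derivative of each component
print(result)
```

Each entry of `result` is one form applied to `w`, so the whole thing is just the matrix-vector product of the multi-row gradient with a column.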
"Transposing it" is usually, underneath, transforming the differential form $df$ into a vector. This is something involving the metric and, in the classical physics/engineering case, is fortunately quite a trivial operation. Its meaning, though, is that you would like to work with a (regular) vector, instead of with a differential form.
(I wrote all the examples in three dimensions, but they work in general.)
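To see why the remark about the metric matters, it may help to look at a case where turning $df$ into a vector is not just a transposition. The following is a sketch in plane polar coordinates $(r,\theta)$, which are my choice of example and not part of the discussion above. The differential is still the row of partial derivatives,
$$
df = (\partial_r f, \partial_\theta f),
$$
but the metric is $g = \mathrm{diag}(1, r^2)$, so the components of the gradient vector in the coordinate basis are obtained by applying $g^{-1}$:
$$
\nabla f = g^{-1} (df)^T = \left (\begin{array}{c} \partial_r f \\ r^{-2}\, \partial_\theta f
\end{array} \right).
$$
In Cartesian coordinates $g$ is the identity matrix, which is why the same operation reduces there to a plain transposition.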