I think it's easiest to first give a description of what $R(X,Y)Z$ represents geometrically. To be precise, let $p \in M$ and let $u,v,w \in T_pM$ with $u, v$ linearly independent; we'll give a geometric description of $R(u,v)w$. To do this, extend $u$ and $v$ to coordinate fields $U, V$ on a neighborhood of $p$. That is, I'm supposing we have a chart $\phi : \Omega \to M$, where $\Omega \subseteq \mathbb{R}^n$ is open, such that $\phi(0) = p$ and $d\phi_0(\partial_1) = u$, $d\phi_0(\partial_2) = v$.
Then let $\sigma$ denote the square of side length $\varepsilon$ in the $x_1 x_2$-plane of $\mathbb{R}^n$ (all other coordinates zero) tracing out the vertices
$$
(0,0) \to (\varepsilon, 0) \to (\varepsilon, \varepsilon) \to (0, \varepsilon) \to (0,0)
$$
in that order, and let $\gamma = \phi \circ \sigma$ be the image of this square in $M$. Then if $P_\gamma$ denotes parallel transport around $\gamma$, one can show
$$
P_\gamma w = w - \varepsilon^2 R(u, v)w + O(\varepsilon^3)
$$
where I don't guarantee the sign is correct (it depends on the sign convention for $R$ and on the direction in which the loop is traversed). So $R(u,v)w$ gives the leading-order correction to parallel translation around small coordinate loops, and the correction is proportional to the "area" of the loop.
Now suppose that $M$ is two-dimensional and let $u, v$ be orthonormal vectors at $p$. Since $\langle u, v \rangle = 0$, our formula above gives
$$
\langle P_{\gamma} u, v \rangle = -\varepsilon^2 \langle R(u,v)u, v \rangle + O(\varepsilon^3).
$$
On the other hand, $\langle P_\gamma u, v \rangle$ is equal to $\sin \theta_\gamma$, where $\theta_\gamma$ is the angle by which parallel translation around our path $\gamma$ rotates vectors, and $\sin \theta_\gamma \approx \theta_\gamma$ for small loops. Note that in this two-dimensional case $\theta_\gamma$ completely describes the path-dependence of parallel transport. Up to a convention-dependent sign, $\langle R(u,v)u, v \rangle$ is by definition the sectional curvature $\kappa$ of $M$ (recall that $u, v$ are orthonormal). So our formula says roughly that
$$
\theta_\gamma \approx - A(\gamma) \kappa,
$$
where $A(\gamma)$ is the area inside $\gamma$ and the approximation holds for small curves $\gamma$, and I don't guarantee the sign (which depends on defining the sign of $\theta_\gamma$ precisely anyway).
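Here is a minimal numerical sanity check of this on the unit sphere, where $\kappa = 1$. Everything in it is my own choice for illustration rather than part of the argument above: the longitude/latitude chart, the RK4 integrator, and the helper names (`chart`, `transport_side`, `holonomy_angle`) are all assumptions. It relies on the fact that, for the unit sphere sitting in $\mathbb{R}^3$, a tangent field $w$ along a curve $\gamma$ is parallel exactly when $w' = -(w \cdot \gamma')\,\gamma$.

```python
import math

def chart(x, y):
    """Longitude/latitude chart on the unit sphere; chart(0, 0) = (1, 0, 0)."""
    return (math.cos(y) * math.cos(x), math.cos(y) * math.sin(x), math.sin(y))

def chart_velocity(x, y, dx, dy):
    """Velocity of t -> chart(x + t*dx, y + t*dy) at t = 0 (chain rule)."""
    gx = (-math.cos(y) * math.sin(x), math.cos(y) * math.cos(x), 0.0)
    gy = (-math.sin(y) * math.cos(x), -math.sin(y) * math.sin(x), math.cos(y))
    return tuple(dx * a + dy * b for a, b in zip(gx, gy))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def axpy(w, h, k):
    """Return w + h * k componentwise."""
    return tuple(wi + h * ki for wi, ki in zip(w, k))

def transport_side(w, start, end, steps=200):
    """Parallel transport w along the image of the segment start -> end.

    Integrates w' = -(w . gamma') gamma with classical RK4, t in [0, 1].
    """
    (x0, y0), (x1, y1) = start, end
    dx, dy = x1 - x0, y1 - y0

    def rhs(t, w):
        x, y = x0 + t * dx, y0 + t * dy
        coeff = -dot(w, chart_velocity(x, y, dx, dy))
        return tuple(coeff * g for g in chart(x, y))

    h = 1.0 / steps
    for i in range(steps):
        t = i * h
        k1 = rhs(t, w)
        k2 = rhs(t + h / 2, axpy(w, h / 2, k1))
        k3 = rhs(t + h / 2, axpy(w, h / 2, k2))
        k4 = rhs(t + h, axpy(w, h, k3))
        w = tuple(wi + h / 6 * (a + 2 * b + 2 * c + d)
                  for wi, a, b, c, d in zip(w, k1, k2, k3, k4))
    return w

def holonomy_angle(eps):
    """Angle (from u toward v) by which u is rotated after one trip around
    the coordinate square (0,0) -> (eps,0) -> (eps,eps) -> (0,eps) -> (0,0)."""
    corners = [(0.0, 0.0), (eps, 0.0), (eps, eps), (0.0, eps), (0.0, 0.0)]
    u, v = (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)  # d(chart) of the coordinate fields at p
    w = u
    for start, end in zip(corners, corners[1:]):
        w = transport_side(w, start, end)
    return math.atan2(dot(w, v), dot(w, u))

if __name__ == "__main__":
    for eps in (0.2, 0.1, 0.05):
        # second column: measured angle; third column: area of the square on the sphere
        print(eps, holonomy_angle(eps), eps * math.sin(eps))
```

For small $\varepsilon$ the measured angle should track $\varepsilon \sin \varepsilon$, which is the area of the image square on the sphere and equals $\varepsilon^2$ to leading order. With the orientation choices made in this sketch the rotation comes out from $u$ toward $v$; a different choice of loop direction or of the sign of $\theta_\gamma$ would flip it.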
One thing I like about this picture is that it makes it more or less obvious that something like the Gauss-Bonnet formula should hold, since to parallel transport around a big loop in a surface $M$ you can instead imagine breaking its inside into small loops and parallel transporting around those.
In fact the Gauss-Bonnet formula is a good way of making this intuition precise for surfaces: the sectional curvature $\kappa(u,v)$ of the plane spanned by $u$ and $v$ measures the path dependence of parallel transport for small curves living in an $\mathbb{R}^2$ coordinate chart through $u$ and $v$.
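To spell out the bookkeeping as a rough sketch (the region $D$, the sample points $p_i$ and the small loops $\gamma_i$ are my notation, and I am ignoring the overall sign as before): if a loop $\gamma$ in a surface bounds a disk-like region $D$ and we cut $D$ into small coordinate squares $\gamma_i$, each interior edge is traversed twice in opposite directions, so the small rotation angles simply add up:
$$
\theta_\gamma = \sum_i \theta_{\gamma_i} \approx \sum_i \kappa(p_i)\, A(\gamma_i) \longrightarrow \int_D \kappa \, dA \pmod{2\pi}.
$$
As a sanity check on the unit sphere, where $\kappa = 1$, take $\gamma$ to be the circle at polar angle $\varphi_0$ from the north pole. The cap it bounds has area $2\pi(1 - \cos\varphi_0)$, so the rotation angle should be $2\pi(1 - \cos\varphi_0)$ mod $2\pi$. At the equator this is $2\pi$, i.e. no net rotation, as it should be since the equator is a geodesic.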
You can instead think of the sectional curvature $\kappa(u,v)$ (which, for orthonormal $u, v$, is just $\langle R(u,v)u, v \rangle$ up to the sign convention) as a Gaussian curvature, which is a picture that better explains the use of the term "curvature". (In general $\kappa(u,v)$ is the Gaussian curvature at $p$ of the surface swept out by geodesics through $p$ whose initial tangent vectors lie in the plane spanned by $u$ and $v$.) The "informal picture" of Gaussian curvature given on Wikipedia is a good picture for this viewpoint.