Since we're explicitly considering an embedding, you can use the concept of a vector manifold to attack this problem. There are many benefits to doing so: in particular, you can use geometric (Clifford) algebra to analyze the problem.
What Henry has done is clever: he's using the normal to the surface to characterize it, which is something you can always do for a surface in $\mathbb R^3$, but not something you can do generally. For instance, if your manifold $M$ has $\dim M = 2$ in $\mathbb R^4$, there is no single normal vector to construct: the normal space at each point is itself two-dimensional.
Geometric algebra allows you to characterize an orientable manifold by its pseudoscalar. This is what I will do here.
The unit sphere can be characterized by two angles $\theta, \phi$. We assign each point a position vector in $\mathbb R^3$, which the embedding gives us directly. Call this vector $r = \sin\theta\cos\phi\, e_1 + \sin\theta\sin\phi\, e_2 + \cos\theta\, e_3$ (superscripts index components here, so $r^2$ is the second component of this vector, $y$ as usual, not a square). We can then take, at any $r$, partial derivatives with respect to our angular coordinates to get tangent vectors.
$$\begin{align*}e_\theta &= \frac{\partial r}{\partial \theta} = \cos \theta \cos \phi \, e_1+ \cos \theta \sin \phi \, e_2 - \sin \theta \, e_3 \\ e_\phi &= \frac{\partial r}{\partial \phi} = \sin \theta (-\sin \phi \, e_1 + \cos \phi \, e_2)\end{align*}$$
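As a sanity check on these derivatives, here is a minimal numerical sketch in plain Python. It assumes the usual parametrization $(\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta)$ of the unit sphere, which is what the formulas above are consistent with; the function names and the sample angles are mine, chosen only for illustration.

```python
import math

def position(theta, phi):
    """Assumed parametrization of the unit sphere."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def e_theta(theta, phi):
    """Analytic derivative of position with respect to theta."""
    return (math.cos(theta) * math.cos(phi),
            math.cos(theta) * math.sin(phi),
            -math.sin(theta))

def e_phi(theta, phi):
    """Analytic derivative of position with respect to phi."""
    return (-math.sin(theta) * math.sin(phi),
            math.sin(theta) * math.cos(phi),
            0.0)

def central_difference(f, theta, phi, which, h=1e-6):
    """Numerical partial derivative of f along 'theta' or 'phi'."""
    if which == "theta":
        hi, lo = f(theta + h, phi), f(theta - h, phi)
    else:
        hi, lo = f(theta, phi + h), f(theta, phi - h)
    return tuple((a - b) / (2 * h) for a, b in zip(hi, lo))

theta, phi = 0.7, 1.9  # arbitrary sample point away from the poles
numeric = (central_difference(position, theta, phi, "theta")
           + central_difference(position, theta, phi, "phi"))
analytic = e_theta(theta, phi) + e_phi(theta, phi)

# Largest componentwise disagreement between numeric and analytic tangents.
max_err = max(abs(a - b) for a, b in zip(numeric, analytic))

# Both tangent vectors should be orthogonal to the position vector.
p = position(theta, phi)
dot_theta = sum(a * b for a, b in zip(p, e_theta(theta, phi)))
dot_phi = sum(a * b for a, b in zip(p, e_phi(theta, phi)))
print(max_err, dot_theta, dot_phi)
```

All three printed numbers should be zero up to rounding and finite-difference error.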
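For this problem, it helps to rewrite these in terms of the Cartesian coordinates $x, y, z$ (writing $e_x, e_y, e_z$ for $e_1, e_2, e_3$), using $z = \cos\theta$ and $\sqrt{x^2+y^2} = \sin\theta$.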
$$\begin{align*}e_\theta &= \frac{zx}{\sqrt{x^2 + y^2}} e_x+ \frac{zy}{\sqrt{x^2+y^2}}e_y - \sqrt{x^2+y^2} e_z \\ e_\phi &= - y e_x + x e_y\end{align*}$$
We can now characterize the surface by its pseudoscalar $i$, which is formed by a wedge product of the tangent vectors.
$$i = e_\theta \wedge e_\phi = \sqrt{x^2 + y^2} (z e_{xy} + y e_{zx} + x e_{yz})$$
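If you want to check the wedge product without expanding it by hand, the following is a small sketch in plain Python. It stores a bivector in $\mathbb R^3$ as its three components on $(e_{xy}, e_{zx}, e_{yz})$; that storage convention, the helper names, and the sample point are my own choices for illustration.

```python
import math

def wedge(a, b):
    """Wedge product a ^ b of two vectors in R^3.

    Returns the components on (e_xy, e_zx, e_yz).
    """
    ax, ay, az = a
    bx, by, bz = b
    return (ax * by - ay * bx,   # e_xy
            az * bx - ax * bz,   # e_zx
            ay * bz - az * by)   # e_yz

def tangent_vectors(x, y, z):
    """e_theta and e_phi written in Cartesian coordinates (needs x, y not both 0)."""
    s = math.hypot(x, y)
    e_theta = (z * x / s, z * y / s, -s)
    e_phi = (-y, x, 0.0)
    return e_theta, e_phi

# Sample point on the unit sphere (arbitrary, away from the poles).
theta, phi = 0.7, 1.9
x = math.sin(theta) * math.cos(phi)
y = math.sin(theta) * math.sin(phi)
z = math.cos(theta)
s = math.hypot(x, y)

i_numeric = wedge(*tangent_vectors(x, y, z))
# Components sqrt(x^2 + y^2) * (z, y, x) on (e_xy, e_zx, e_yz).
i_expected = (s * z, s * y, s * x)
max_diff = max(abs(a - b) for a, b in zip(i_numeric, i_expected))
print(i_numeric, i_expected, max_diff)
```

The two triples should agree to rounding error, i.e. the pseudoscalar has components $\sqrt{x^2+y^2}\,(z, y, x)$ on $(e_{xy}, e_{zx}, e_{yz})$.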
The pseudoscalar represents, for a surface, an area element. It is a function of position, as you can see. Because it has orientation, the pseudoscalar itself can tell you about the orientation of a manifold (indeed, if you were to find that the pseudoscalar is multivalued, you would rightly conclude that the manifold is non-orientable). The pseudoscalar is also used to perform projections under the geometric product. If a vector field lies entirely in the tangent space, then its wedge product with the pseudoscalar should be zero, since it is then a linear combination of $e_\theta$ and $e_\phi$. We can see that this is indeed the case.
$$i \wedge X \propto (z e_{xy} + y e_{zx} + x e_{yz}) \wedge (x e_y - y e_x) = yx e_{zxy} - xy e_{yzx} = (yx -xy) e_{xyz} = 0$$
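The same test is easy to run numerically. This is only a sketch: it again assumes a bivector stored as components on $(e_{xy}, e_{zx}, e_{yz})$, uses a helper name of my own, and adds the radial vector as a contrast case that is not part of the argument above.

```python
import math

def bivector_wedge_vector(B, v):
    """Coefficient of e_xyz in B ^ v, for B given on (e_xy, e_zx, e_yz).

    Uses e_xy ^ e_z = e_zx ^ e_y = e_yz ^ e_x = e_xyz.
    """
    B_xy, B_zx, B_yz = B
    vx, vy, vz = v
    return B_xy * vz + B_zx * vy + B_yz * vx

# Sample point on the unit sphere (arbitrary choice).
theta, phi = 0.7, 1.9
x = math.sin(theta) * math.cos(phi)
y = math.sin(theta) * math.sin(phi)
z = math.cos(theta)

# Pseudoscalar up to the positive factor sqrt(x^2 + y^2).
i_dir = (z, y, x)

X = (-y, x, 0.0)        # the field x e_y - y e_x
radial = (x, y, z)      # contrast case: points straight out of the sphere

tangent_result = bivector_wedge_vector(i_dir, X)
radial_result = bivector_wedge_vector(i_dir, radial)
print(tangent_result, radial_result)
```

The first number vanishes, matching the calculation above, while the radial vector gives $x^2 + y^2 + z^2 = 1$, so the test really does distinguish tangent from non-tangent vectors.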
This approach is a bit more complicated for $\mathbb R^3$ than the standard approach, and the Clifford algebra aspects of it may take some getting used to, but it's a powerful formalism when you have access to an embedding, and we could've just as easily used it for a 2-dimensional surface in $\mathbb R^4$ with almost no change in approach.
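To make the $\mathbb R^4$ remark concrete, here is a rough sketch of the same tangency test in four dimensions. The surface (a torus $(\cos u, \sin u, \cos v, \sin v)$), the test vectors, and all function names are my own illustrative choices, not something from the question; blades are stored as dictionaries keyed by increasing index tuples.

```python
import math
from itertools import combinations

def wedge_vv(a, b):
    """Bivector a ^ b in R^n, keyed by index pairs (i, j) with i < j."""
    n = len(a)
    return {(i, j): a[i] * b[j] - a[j] * b[i]
            for i, j in combinations(range(n), 2)}

def wedge_bv(B, v):
    """Trivector B ^ v in R^n, keyed by index triples (i, j, k), i < j < k."""
    n = len(v)
    return {(i, j, k): B[(i, j)] * v[k] - B[(i, k)] * v[j] + B[(j, k)] * v[i]
            for i, j, k in combinations(range(n), 3)}

# Hypothetical example surface: p(u, v) = (cos u, sin u, cos v, sin v).
u, v = 0.4, 1.1
p = (math.cos(u), math.sin(u), math.cos(v), math.sin(v))
e_u = (-math.sin(u), math.cos(u), 0.0, 0.0)   # dp/du
e_v = (0.0, 0.0, -math.sin(v), math.cos(v))   # dp/dv

i = wedge_vv(e_u, e_v)  # pseudoscalar of the surface: six components in R^4

# A vector in the tangent plane, and one that is not (p is orthogonal to both tangents).
X_tangent = tuple(2.0 * a - 3.0 * b for a, b in zip(e_u, e_v))
X_normal = p

tangent_size = max(abs(c) for c in wedge_bv(i, X_tangent).values())
normal_size = max(abs(c) for c in wedge_bv(i, X_normal).values())
print(tangent_size, normal_size)
```

The only real change from the $\mathbb R^3$ case is bookkeeping: the bivector has six components and the trivector four, but the criterion $i \wedge X = 0$ is used in exactly the same way, and no normal vector is ever needed.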