With just a "point cloud", about which you know nothing, there's nothing you can do besides checking each point. But with a curve like that, you might be able to do something. For example, let's say your curve is continuous and has a parametrization $f(t)$. Then what you have is really $\{ f(t_n) \}_{n=1}^N$ for some unknown values $t_n$. If we consider $f$ is a parametrization by arclength, then one way to approximate the $t_n$ is to start at the endpoint with $t_1=0$ and then have $t_k = t_{k-1} + \| f(t_k)-f(t_{k-1}) \|$ for $k=2,\dots,N$. There are better ways of doing this, but going forward, let us just say that we have the $t_n$ from some procedure which only involves processing the curve.
Now let $x$ be your point off the curve. Then $g(t) = \| x - f(t) \|^2$ is as smooth as $f$ is. Let's assume $g$ is twice differentiable. (This cannot be checked without knowing more about where the problem came from.)
Now consider attempting to find the minimum of $g$ with Newton's method. We can't use Newton's method as it is ordinarily defined, because we can't evaluate $g$ at arbitrary $t$. We also can't write the first or second derivative of $g$ for the same reason.
But we do have discrete approximations. For the first derivative we can use the usual forward or backward difference. For the second derivative the best thing to do is based on Taylor expanding $g(t_{n+1})$ and $g(t_{n-1})$ about $t_n$. Then you can get the appropriate weights for the best approximation of $g''(t_n)$ using only $g(t_{n+1}),g(t_n),g(t_{n-1})$.
The other problem that we will encounter is that Newton's method will want to send us to points where we cannot evaluate $g$. But this is easy enough to fix (at least mathematically): just find the closest $t_n$ to the $t$ that Newton's method is giving you. The ability to do this efficiently will be an important contribution to the choice of an appropriate data structure for this problem. Given a nice enough data structure, this can be done in $\log(N)$ time using a binary search.
Another problem with this approach based on parametrization is that the up-front expense of finding the $t_n$ is actually the same amount of work as it would be to brute force one case of the problem. This approach only saves any time if you need to find the closest point on the curve to many different points, because in this case we can re-use the work that was done to find the $t_n$.
Yet another problem with almost any approach other than full brute force is that $g$ may fail to be convex. In this case, as usual for non-convex optimization, you have to worry about the possibility that your algorithm is detecting a local minimum rather than the global one.
Goodness of fit: SSE: 58.07 R-square: 1 Adjusted R-square: 1 RMSE: 0.0241
– user1011182 Sep 18 '14 at 16:50