Shortest Distance between a Point and a Numerical 2D Curve

Question

I have a 2D Curve. I have all the numerical values for the line within a certain range. I do not have an equation for this line.

At several points in this 2D space I want to calculate the shortest distance from a point, P, to this non-linear line for which I have the values.

How can I do this? Any suggestion will be very much appreciated.

Sketch attached. [Figure: sketch of the problem]

A 'nonlinear line' is a contradiction in terms. It sounds like what you have is a curve. — Semiclassical, Sep 17 '14 at 17:40
Do you want to find the closest one of your points to the given point, or the closest point on the continuous curve (that you don't have in full detail) to the given point? — Ian, Sep 17 '14 at 18:04
To put the issue raised by the answers in my own words: If you have only a set of points drawn from a 2D curve, then you don't have the curve. So you'll either have to minimize the distances pointwise, or you'll have to approximate the curve in some fashion and find the closest approach to that. — Semiclassical, Sep 17 '14 at 20:11
So for example if I were to curve fit.... then use the curve fit expression over the valid range? Would that help? — user1011182, Sep 18 '14 at 16:46
Just did a curve fit in MATLAB. My discrete curve has 100,000 points. The 'y' values vary over the range from 0 to 10 on the 'x' axis. MATLAB gives the following exact fit (does this help?). Linear model Poly4: $f(x) = p_1 x^4 + p_2 x^3 + p_3 x^2 + p_4 x + p_5$. Coefficients (with 95% confidence bounds): $p_1 = 0.00112$ $(0.001117, 0.001124)$; $p_2 = -0.02883$ $(-0.0289, -0.02877)$; $p_3 = 0.2817$ $(0.2812, 0.2821)$; $p_4 = 0.3563$ $(0.3553, 0.3573)$; $p_5 = -0.1102$ $(-0.1109, -0.1095)$.
Goodness of fit: SSE: 58.07 R-square: 1 Adjusted R-square: 1 RMSE: 0.0241 — user1011182, Sep 18 '14 at 16:50
A curve fit should probably be for $(x(t),y(t))$, not $y(x)$. In the first case you get $d(z_1,z_2,t)=(z_1-x(t))^2+(z_2-y(t))^2$ which you minimize with respect to $t$; in the second case you get $d(z_1,z_2,x)=(z_1-x)^2+(z_2-y(x))^2$ which you minimize with respect to $x$. Either way is technically fine. — Ian, Sep 18 '14 at 19:31
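As a rough illustration of the second formulation in the comment above, the sketch below minimizes $d(z_1,z_2,x)=(z_1-x)^2+(z_2-y(x))^2$ over $x$ using the quartic coefficients quoted in the thread. The function names, the sample count, the tolerance, and the choice of a grid scan followed by golden-section refinement are all assumptions made for illustration; the grid scan is there because $d$ need not be convex in $x$.

```python
import math

# Coefficients p1..p5 quoted in the comment thread, highest degree first.
COEFFS = [0.00112, -0.02883, 0.2817, 0.3563, -0.1102]

def f(x):
    """Evaluate the fitted quartic with Horner's rule."""
    y = 0.0
    for c in COEFFS:
        y = y * x + c
    return y

def closest_on_fit(z1, z2, lo=0.0, hi=10.0, samples=1000, tol=1e-9):
    """Return (x, distance) for the point (x, f(x)) nearest to (z1, z2).

    lo and hi are the range over which the fit is stated to be valid;
    samples and tol are illustrative choices, not values from the thread.
    """
    def d(x):
        return (z1 - x) ** 2 + (z2 - f(x)) ** 2

    # Coarse scan to pick the grid cell containing the smallest sampled value.
    step = (hi - lo) / samples
    k = min(range(samples + 1), key=lambda i: d(lo + i * step))
    a = max(lo, lo + (k - 1) * step)
    b = min(hi, lo + (k + 1) * step)

    # Golden-section refinement inside the bracket [a, b].
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c1 = b - inv_phi * (b - a)
        c2 = a + inv_phi * (b - a)
        if d(c1) < d(c2):
            b = c2
        else:
            a = c1
    x = 0.5 * (a + b)
    return x, math.sqrt(d(x))
```

If the grid is too coarse relative to how quickly the curve bends, the scan can settle in the wrong cell, so the result should be treated as a local minimum near the best sample.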

Ian · Answer 1 · 2014-09-17T19:59:18.020

With just a "point cloud", about which you know nothing, there's nothing you can do besides checking each point. But with a curve like that, you might be able to do something. For example, let's say your curve is continuous and has a parametrization $f(t)$. Then what you have is really $\{ f(t_n) \}_{n=1}^N$ for some unknown values $t_n$. If we take $f$ to be a parametrization by arclength, and the points are stored in order along the curve, then one way to approximate the $t_n$ is to start at one endpoint with $t_1=0$ and then set $t_k = t_{k-1} + \| f(t_k)-f(t_{k-1}) \|$ for $k=2,\dots,N$. Here $f(t_k)$ simply denotes the $k$-th data point, so only the known points are needed. There are better ways of doing this, but going forward, let us just say that we have the $t_n$ from some procedure which only involves processing the curve.
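The chord-length recursion described above can be sketched in a few lines. The function name and the sample data are hypothetical, and the sketch assumes the points are listed in order along the curve.

```python
import math
from itertools import accumulate

def chord_length_parameters(points):
    """Return [t_1, ..., t_N] with t_1 = 0 and t_k = t_{k-1} + |p_k - p_{k-1}|."""
    if not points:
        return []
    # Distance between each pair of consecutive data points.
    steps = (math.dist(a, b) for a, b in zip(points, points[1:]))
    # Running sum of the steps, starting from 0.
    return list(accumulate(steps, initial=0.0))

# Hypothetical sample data: three points with chord lengths 5 and 6.
curve = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
t = chord_length_parameters(curve)  # [0.0, 5.0, 11.0]
```

This is a single pass over the data, and the resulting list is increasing, which is what makes a later binary search on it possible.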

Now let $x$ be your point off the curve. Then $g(t) = \| x - f(t) \|^2$ is as smooth as $f$ is. Let's assume $g$ is twice differentiable. (This cannot be checked without knowing more about where the problem came from.)

Now consider attempting to find the minimum of $g$ with Newton's method. We can't use Newton's method as it is ordinarily defined, because we can't evaluate $g$ at arbitrary $t$. We also can't write the first or second derivative of $g$ for the same reason.

But we do have discrete approximations. For the first derivative we can use the usual forward or backward difference. For the second derivative the best thing to do is based on Taylor expanding $g(t_{n+1})$ and $g(t_{n-1})$ about $t_n$. Then you can get the appropriate weights for the best approximation of $g''(t_n)$ using only $g(t_{n+1}),g(t_n),g(t_{n-1})$.
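For reference, here is one way the Taylor-expansion argument works out when the $t_n$ are not equally spaced. The spacings $h_- = t_n - t_{n-1}$ and $h_+ = t_{n+1} - t_n$ are notation introduced here, not part of the original answer. Expanding about $t_n$,

$$g(t_{n+1}) = g(t_n) + h_+ g'(t_n) + \tfrac12 h_+^2 g''(t_n) + O(h_+^3), \qquad g(t_{n-1}) = g(t_n) - h_- g'(t_n) + \tfrac12 h_-^2 g''(t_n) + O(h_-^3).$$

Multiplying the first expansion by $h_-$ and the second by $h_+$ and adding cancels the $g'(t_n)$ terms, which gives

$$g''(t_n) \approx \frac{2\,\bigl[\,h_-\, g(t_{n+1}) - (h_- + h_+)\, g(t_n) + h_+\, g(t_{n-1})\,\bigr]}{h_-\, h_+\, (h_- + h_+)}.$$

For equal spacing $h_- = h_+ = h$ this reduces to the familiar $\bigl(g(t_{n+1}) - 2g(t_n) + g(t_{n-1})\bigr)/h^2$. On an unequal grid the approximation is only first-order accurate in general, since the leading error term is proportional to $(h_+ - h_-)\,g'''(t_n)$.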

The other problem that we will encounter is that Newton's method will want to send us to points where we cannot evaluate $g$. But this is easy enough to fix (at least mathematically): just find the closest $t_n$ to the $t$ that Newton's method is giving you. The ability to do this efficiently is an important consideration in choosing an appropriate data structure for this problem. Given a nice enough data structure (for example, the $t_n$ stored in a sorted array), this can be done in $O(\log N)$ time using a binary search.
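A minimal sketch of the snapped Newton iteration described above, under several assumptions: the $t_n$ are strictly increasing, there are at least three points, and both derivatives are estimated from the three values $g(t_{n-1}), g(t_n), g(t_{n+1})$. The three-point first derivative is a swapped-in choice (the answer suggests a forward or backward difference), and all function names and the iteration cap are hypothetical.

```python
import bisect

def nearest_index(t, value):
    """Index of the entry of the sorted list t closest to value, in O(log N)."""
    i = bisect.bisect_left(t, value)
    if i == 0:
        return 0
    if i == len(t):
        return len(t) - 1
    return i if t[i] - value < value - t[i - 1] else i - 1

def discrete_newton(points, t, x, start, max_iter=50):
    """Snapped Newton iteration on g(t_n) = |x - p_n|^2; returns an index.

    Assumes t is strictly increasing and len(points) >= 3. The result is
    only a local minimum near the starting index.
    """
    def g(i):
        return (points[i][0] - x[0]) ** 2 + (points[i][1] - x[1]) ** 2

    last = len(t) - 2                      # last index with two neighbours
    n = min(max(start, 1), last)
    for _ in range(max_iter):
        hm, hp = t[n] - t[n - 1], t[n + 1] - t[n]
        scale = hm * hp * (hm + hp)
        # Three-point difference quotients from Taylor expansion about t[n].
        d1 = (hm * hm * g(n + 1) + (hp * hp - hm * hm) * g(n)
              - hp * hp * g(n - 1)) / scale
        d2 = 2.0 * (hm * g(n + 1) - (hm + hp) * g(n) + hp * g(n - 1)) / scale
        if d2 <= 0.0:
            break                          # g is not locally convex here; stop
        # Snap the Newton target back onto the grid with a binary search.
        m = min(max(nearest_index(t, t[n] - d1 / d2), 1), last)
        if m == n:
            break                          # the Newton step no longer moves us
        n = m
    # Compare with the immediate neighbours before returning.
    return min((n - 1, n, n + 1), key=g)
```

Only a handful of values of $g$ are evaluated per iteration, so after the one-off cost of computing the $t_n$, each query is cheap. The starting index matters when $g$ is not convex.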

Another problem with this approach based on parametrization is that the up-front expense of finding the $t_n$ is actually the same amount of work as it would be to brute force one case of the problem. This approach only saves any time if you need to find the closest point on the curve to many different points, because in this case we can re-use the work that was done to find the $t_n$.

Yet another problem with almost any approach other than full brute force is that $g$ may fail to be convex. In this case, as usual for non-convex optimization, you have to worry about the possibility that your algorithm is detecting a local minimum rather than the global one.

Thanks for the detailed comment – user1011182 Sep 18 '14 at 16:52

score 1 · Answer 2 · answered Sep 17 '14 at 18:11

If your line is short enough (less than $10^6$ points), you really can just do this by brute force: calculate the distance from $\vec{p}$ to each point in your discretized line, and take the minimum of these distances. This shouldn't be hard even on a basic desktop.

Since you only have a discrete collection of points, this really is as good as you can do. Anything fancier would have to assume something more about your curve.
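The brute-force approach is short enough to write out. This is a sketch with a hypothetical function name. It returns the nearest sampled point, not the nearest point on the underlying continuous curve.

```python
import math

def closest_sample(points, p):
    """Return (distance, index) of the sampled curve point nearest to p."""
    # One distance evaluation per sample: O(N) work per query point.
    best = min(range(len(points)), key=lambda i: math.dist(points[i], p))
    return math.dist(points[best], p), best

# Hypothetical data: the middle sample is the nearest one to (1, 3).
distance, index = closest_sample([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)], (1.0, 3.0))
# distance == 2.0, index == 1
```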

Danny W.

Thanks for the comment. My line has 100k points, but I am working out the shortest distance for a number of points. Will this be a big deal? – user1011182 Sep 18 '14 at 16:52
It matters how many points you want to calculate. If you want the minimal distance for each of $10^5$ query points, that is $(10^5)^2 = 10^{10}$ distance evaluations, which is probably too much for a computer to brute-force. In this case, you could probably do a smarter approximate search by looking first at every $n$th point, and then finding which of those is the closest and searching more carefully around there. – Danny W. Sep 18 '14 at 18:50
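The two-stage search suggested in the comment above might look like the following sketch. The function name, the default stride, and the choice to search one stride on either side of the best coarse sample are assumptions for illustration.

```python
import math

def coarse_to_fine(points, p, stride=100):
    """Approximate nearest sample: scan every stride-th point, then refine.

    Returns (distance, index). The answer can be wrong if the true nearest
    sample is not within one stride of the best coarse sample.
    """
    # Stage 1: look only at every stride-th point.
    coarse = range(0, len(points), stride)
    c = min(coarse, key=lambda i: math.dist(points[i], p))
    # Stage 2: search every point within one stride of the coarse winner.
    lo = max(0, c - stride)
    hi = min(len(points), c + stride + 1)
    best = min(range(lo, hi), key=lambda i: math.dist(points[i], p))
    return math.dist(points[best], p), best
```

With $N$ samples this costs roughly $N/\text{stride} + 2\cdot\text{stride}$ distance evaluations per query instead of $N$, at the price of possibly missing the global minimum when the curve folds back on itself.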

armando · Answer 3 · 2014-09-17T20:13:36.640

I would solve this problem using optimization and the algorithm called "divide and conquer". I am a native French speaker, so I am not sure that is what the method is called in English; in French it is "Diviser pour mieux régner". This will work well for your problem, since your 2D curve is a finite sample. Let tab(N1,N2) be the 2D curve, where N1 and N2 are the dimensions of the curve. Call lhs (the left-hand side of tab) the head of tab, and rhs (the right-hand side of tab) the tail of tab.

BEGIN

c = False

while (c = False):

    if distance(p, lhs) > distance(p, rhs) then:

        tab := tab[N1/2, N2/2] to tab[tail1, tail2]

    else if distance(p, lhs) < distance(p, rhs) then:

        tab := tab[head1, head2] to tab[N1/2, N2/2]

    else:

        shortest distance = distance(p, tab[N1/2, N2/2])

        c = True

END

head1 and head2 are the head of tab; remember to refresh these in the while loop. Sorry, I have only described the idea here; for a real implementation you should treat lhs as tab[head1,head2] and rhs as tab[tail1,tail2].

Initialise head1=0, head2=0, tail1=N1, tail2=N2 and refresh them in the while loop after each decision. This method will work even if your 2D list is large.
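One possible reading of the procedure above in runnable form is sketched below. It makes several assumptions that are not in the answer: the curve is stored as a flat list of (x, y) points addressed by a single index (replacing the head1/head2 and tail1/tail2 pairs), and the loop stops once only two candidates remain so that it always terminates. The function name is hypothetical.

```python
import math

def halving_search(points, p):
    """Heuristic halving search along an ordered curve; returns (distance, index).

    Each pass discards the half of the index range whose end is farther
    from p, so the loop runs about log2(N) times. The result is reliable
    only if the distance to p has a single minimum along the curve.
    """
    head, tail = 0, len(points) - 1
    while tail - head > 1:
        mid = (head + tail) // 2
        d_head = math.dist(points[head], p)
        d_tail = math.dist(points[tail], p)
        if d_head > d_tail:
            head = mid            # keep the half ending at the tail
        elif d_head < d_tail:
            tail = mid            # keep the half starting at the head
        else:
            return math.dist(points[mid], p), mid   # tie: take the middle
    best = min((head, tail), key=lambda i: math.dist(points[i], p))
    return math.dist(points[best], p), best
```

On a curve that bends back towards the query point, the discarded half can contain the true nearest sample, so it is worth checking the result against a brute-force scan on a few test points before relying on it.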

Ian says that an O(n) search is necessary. Is this supposed to be O(log(n))? — user877329, Sep 25 '23 at 17:32
