Optimization for fat line intersecting most points

Question

Let's say I have a bunch of $(X,Y)$ points in 2D space. I want to find the line $y=mx+b$ which intersects the most points. We can add some kind of buffer (a delta) so if the line $y=mx+b$ is within delta of the point, then it also intersects the point.

I've never taken any optimization theory, but I'd assume this is a pretty simple optimization problem. I'm having some trouble formalizing the objective function to maximize/minimize, so any help with that would be awesome.

Thanks, Michael

You may be looking for the "line of best fit". Or say if the points are vertices of a regular hexagon, would any side suffice for your purposes? — Macavity, Mar 10 '14 at 13:51
A line intersecting one of the sides would be more appropriate in my case — Michael, Mar 10 '14 at 13:57
@Michael: But a line intersecting one of the sides wouldn't be near any of the points. — hmakholm left over Monica, Mar 10 '14 at 15:26

Nameless · Accepted Answer · 2014-03-10T19:03:51.243

Okay, so formally writing this down should be easy. The problem you describe is $$\max_{m,b} \sum_i \boldsymbol{1}\{mx_i+b-\delta\le y_i\le mx_i+b+\delta\},$$ where $\boldsymbol{1}$ is the indicator function, $\delta\ge 0$ and $i$ is the number of the observation. If you have $N$ observations, then the best thing would be if all observations $y_i$ were in the interval $[mx_i+b-\delta,mx_i+b+\delta]$, so that the objective function above takes value $N$. The fewer $y_i$ fall into this interval, the lower the value of the objective function.

To be honest I have no idea how to estimate this though. Least squares obviously is not the way to go here. Note that when you estimate $m$ and $b$, the solution might not be unique. You can then decrease $\delta$ and run it again to see if that reduces the set of solutions. In the alternative, you can use a secondary criterion to discriminate between multiple solutions (like least squares).

score 0 · Answer 2 · answered Mar 10 '14 at 13:53

Your optimization problem can be formulated as $$\min_{m,b \in \mathbb{R}} \|Y-mX-b\|_0,$$ where $\|\cdot\|_0$ is the $\ell^0$ semi-norm defined as $\|v\|_0 = \#\{v_i \neq 0\}$ (See here).

The only drawback is that $\|\cdot\|_0$ is not convex, thus you cannot employ the classical convex optimization tools.

Optimization for fat line intersecting most points

2 Answers2