2

Let's say I have a bunch of $(X,Y)$ points in 2D space. I want to find the line $y=mx+b$ which intersects the most points. We can add some kind of buffer (a delta) so if the line $y=mx+b$ is within delta of the point, then it also intersects the point.

I've never taken any optimization theory, but I'd assume this is a pretty simple optimization problem. I'm having some trouble formalizing the objective function to maximize/minimize, so any help with that would be awesome.

Thanks, Michael

Yiyuan Lee
  • 14,435
Michael
  • 143

2 Answers2

2

Okay, so formally writing this down should be easy. The problem you describe is $$\max_{m,b} \sum_i \boldsymbol{1}\{mx_i+b-\delta\le y_i\le mx_i+b+\delta\},$$ where $\boldsymbol{1}$ is the indicator function, $\delta\ge 0$ and $i$ is the number of the observation. If you have $N$ observations, then the best thing would be if all observations $y_i$ were in the interval $[mx_i+b-\delta,mx_i+b+\delta]$, so that the objective function above takes value $N$. The fewer $y_i$ fall into this interval, the lower the value of the objective function.

To be honest I have no idea how to estimate this though. Least squares obviously is not the way to go here. Note that when you estimate $m$ and $b$, the solution might not be unique. You can then decrease $\delta$ and run it again to see if that reduces the set of solutions. In the alternative, you can use a secondary criterion to discriminate between multiple solutions (like least squares).

Nameless
  • 4,045
  • 2
  • 20
  • 36
0

Your optimization problem can be formulated as $$\min_{m,b \in \mathbb{R}} \|Y-mX-b\|_0,$$ where $\|\cdot\|_0$ is the $\ell^0$ semi-norm defined as $\|v\|_0 = \#\{v_i \neq 0\}$ (See here).

The only drawback is that $\|\cdot\|_0$ is not convex, thus you cannot employ the classical convex optimization tools.

Paglia
  • 1,274