Given a set of points, is there a method to find a rectangle which is the best approximation of the points? For example, in this picture
the blue rectangle is the best fitted rectangle.
-
2Drawing a rectangle has five degrees of freedom. Measure the distances of each data point to the nearest point of the rectangle. With an appropriate loss function (e.g. sum of squares of distances), optimise the rectangle to minimise the loss. I would expect you to need to use numerical methods – Henry Mar 28 '18 at 09:12
3 Answers
Elaborating from Henry's comment, consider that the rectangle is defined by four lines the equations of which being $$y=a+b x \qquad y=c+bx \qquad y=d -\frac x b\qquad y=e -\frac x b$$ You have $n$ data points $(x_i,y_i)$ and the square of the distance from $(x_i,y_i)$ to the line $y=p+mx$ is given by $$d_i^2=\frac{(y_i-mx_i-p)^2}{m^2+1}$$ So, the shortest distance to consider is $$d_i^2=\min\left(\frac{(y_i-a-b x_i)^2}{b^2+1},\frac{(y_i-c-b x_i)^2}{b^2+1},\frac{(b (y_i-d)+x_i)^2}{b^2+1},\frac{(b (y_i-e)+x_i)^2}{b^2+1}\right)$$ and you need to minimize $$\Phi(a,b,c,d,e)=\sum_{i=1}^n d_i^2$$ which is not simple but perfectly doable numerically provided good guesses for the five parameters ("quite easy" to obtain from the scatter plot of the data).
- 260,315
This is about the rectangle's orientation.
If you have $n$ points $P_i$, then the $n(n-1)$ vectors $P_i-P_j$ often point in four perpendicular directions - along the sides of the rectangle. Take all $n(n-1)$ directions, modulo $\pi/2$, so these four perpendicular directions become the same value, and histogram them all.
Here is a set of forty points, ten along each side; and a histogram of the $40*39=1560$ directions.

You might delete the largest distances, which will be opposite sides of the rectangle, and the smallest distances, which will be affected most by noise.
Here is a histogram where the distances are between 12th percentile and 37th percentile.

- 50,853
How about partitioning your data set into disjoint subsets and calculating the line of best fit for each edge of the rectangle. That might work?!! Otherwise I would agree with Henry using calculus to minimize the square of the errors.