
I am using simple linear regression to fit a line through a set of points in 2D space.

With linear regression (in my interpretation), points $(x_i, y_i)$ are fit using a model function $y=f(x)=\alpha+\beta x$, where $\alpha$ and $\beta$ are selected to minimize the sum of squared differences between $f(x_i)$ and $y_i$.
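For concreteness, the standard fit described above can be sketched in a few lines of Python. This is only an illustration using the textbook closed-form expressions for simple linear regression; the function name `ols_fit` and the sample points in the usage note are made up for the example.

```python
def ols_fit(xs, ys):
    """Ordinary least squares fit of y = alpha + beta * x.

    Minimizes the sum of squared vertical differences f(x_i) - y_i.
    `ols_fit` is a hypothetical helper name used only for this sketch.
    """
    n = len(xs)
    Sx, Sy = sum(xs), sum(ys)
    Sxx = sum(x * x for x in xs)
    Sxy = sum(x * y for x, y in zip(xs, ys))

    denom = n * Sxx - Sx ** 2
    if denom == 0:
        # All x_i are equal: the best fit is a vertical line,
        # which y = alpha + beta * x cannot represent.
        raise ValueError("all x values are equal; slope is undefined")

    beta = (n * Sxy - Sx * Sy) / denom   # slope
    alpha = (Sy - beta * Sx) / n         # intercept
    return alpha, beta
```

Calling `ols_fit([0, 1, 2, 3], [0, 1, 1, 3])` returns the intercept and slope of the vertical-offset fit, and the `ValueError` branch corresponds to the degenerate case in which all points share the same x coordinate.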

I am actually interested in minimizing the sum of squared distances between the points and the fitted line, not the differences between the y coordinates of the points and of the line at the corresponding x. In the degenerate case in which all points have the same x coordinate $X_0$, the line $x = X_0$ would be a perfect fit, but it cannot be represented by the model function.

Degenerate case aside, are the two problems equivalent? Do they result in the same fitted line? Or does an alternative closed form solution exist for my problem?

Gianni
    They are not the same problem, and will not give you the same line. I am not sure if a closed-form solution exists, but if you can write out the cost function, you can code it up in your language of choice and find your solution. I would certainly expect the answer to be similar to the standard problem. – Adrian Keister Jul 12 '19 at 15:05

1 Answer


This is not the same problem at all.

For the case you are considering the distance is given by $$d_i=\frac{|a+b x_i-y_i| }{\sqrt{1+b^2}}$$ So, you need to minimize $$S=\sum_{i=1}^n\frac{(a+b x_i-y_i)^2 }{{1+b^2}}$$ Computing the partial derivatives (which will be set equal to $0$) $$\frac{\partial S}{\partial a}=\frac 2{1+b^2}\sum_{i=1}^n (a+b x_i-y_i)\tag 1$$ $$\frac{\partial S}{\partial b}=\frac 2{1+b^2}\sum_{i=1}^n x_i(a+b x_i-y_i)-\frac {2b}{(1+b^2)^2}\sum_{i=1}^n (a+b x_i-y_i)^2\tag 2$$

From $(1)$, we can extract $$a=\frac 1n \sum_{i=1}^n (y_i-b x_i)=\frac{S_y}n-\frac{S_x}n b\tag 3$$ and simplifying $(2)$ we need to find $b$ such that $$\sum_{i=1}^n x_i(a+b x_i-y_i)-\frac {b}{1+b^2}\sum_{i=1}^n (a+b x_i-y_i)^2=0\tag 4$$ which, after substituting $a$ from $(3)$, reduces to a quadratic equation in $b$. If I am not mistaken, $(4)$ can be written as $$\alpha b^2+\beta b +\gamma =0$$ where $$\alpha=n S_{xy}-S_x S_y\qquad \beta=S_y^2-S_x^2+n(S_{xx}-S_{yy})\qquad \gamma=S_xS_y-n S_{xy}$$
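As a rough illustration, the quadratic above can be solved directly in Python. This is a sketch, not a definitive implementation: the function name `orthogonal_fit` is made up, and the choice of root is my own addition. Since $\gamma=-\alpha$, the two roots multiply to $-1$, so they describe two perpendicular lines; the sketch assumes the minimizer is the root with the same sign as $\alpha$ (the other one being the worst-fitting direction). The case $\alpha=0$ is simply rejected here to keep the example short.

```python
import math

def orthogonal_fit(xs, ys):
    """Fit y = a + b * x by minimizing squared perpendicular distances.

    Solves alpha*b^2 + beta*b + gamma = 0 with gamma = -alpha, then
    recovers a from equation (3). `orthogonal_fit` is a hypothetical name.
    """
    n = len(xs)
    Sx, Sy = sum(xs), sum(ys)
    Sxx = sum(x * x for x in xs)
    Syy = sum(y * y for y in ys)
    Sxy = sum(x * y for x, y in zip(xs, ys))

    alpha = n * Sxy - Sx * Sy
    beta = Sy ** 2 - Sx ** 2 + n * (Sxx - Syy)

    if alpha == 0:
        # Simplification for this sketch: with alpha = 0 the best line is
        # horizontal, vertical, or not unique, which is not handled here.
        raise ValueError("alpha is zero; slope not determined by this formula")

    # Discriminant is beta^2 - 4*alpha*gamma = beta^2 + 4*alpha^2 (always > 0).
    # Assumption: the '+' root, whose sign matches alpha, is the minimizer.
    b = (-beta + math.sqrt(beta ** 2 + 4 * alpha ** 2)) / (2 * alpha)
    a = (Sy - b * Sx) / n  # equation (3)
    return a, b
```

One quick sanity check is that swapping the roles of x and y should give the reciprocal slope, because perpendicular distances do not depend on which axis is called x. Ordinary regression does not have this symmetry.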

If you look here, at the top of page 205, you will find an explicit formula for $b$.