0

I would like to understand more about the Least Squares, but it remains unclear to me, where these equations come from.

for a linear approximation $y=ax+b$

$a=\frac{m\sum_{i=1}^{m} x_iy_i-(\sum_{i=1}^{m}x_i)(\sum_{i=1}^{m}y_i)} {m\sum_{i=1}^{m}x_i^2 - (\sum_{i=1}^{m}x_i)^2}$

and

$b = \frac{(\sum_{i=1}^{m}x_i^2)(\sum_{i=1}^{m}y_i)-(\sum_{i=1}^{m}x_iy_i)(\sum_{i=1}^{m}x_i)}{m\sum_{i=1}^{m}x_i^2 - (\sum_{i=1}^{m}x_i)^2}$

What I have tried so far:

In the least squares method, the quadratic error should be minimized.

This means

$E(a,b)=\sum_{i=1}^{m}[y_i-(ax_i+b)]^2$

should be minimal.

For a minimum to exist, the partial derivatives of this expression must be zero.

$\frac{\partial{E}}{\partial{a}}$ = $2 \sum_{i=1}^{m}[y_i-(ax_i+b)]\hspace{2pt}(-x_i)=0$

$\frac{\partial{E}}{\partial{b}}$ = $2 \sum_{i=1}^{m}[y_i-(ax_i+b)]\hspace{2pt}(-1)=0$

Rearranging these, yields the following:

$a\sum_{i=1}^{m}x_i^2 + b\sum_{i=1}^{m}x_i = \sum_{i=1}^{m}x_iy_i$

$a\sum_{i=1}^{m}x_i+bm = \sum_{i=1}^{m}y_i$

This is where I'm stuck. I see the terms $x_iy_i$, etc. that also appear in the expressions in question - but I don't know how to rearrange to get there.

  • Things are much, much nicer if we make good use of matrix and vector notation. – littleO Jan 02 '24 at 11:10
  • Note that m appears in the answers for both a and b, as well as the sums you have in your equations. So maybe just temporarily name all of these using some new letters, then solve the system using maybe Cramer's rule, then sub back and simplify. – coffeemath Jan 02 '24 at 11:28

1 Answers1

1

You are doing everything right, but for such derivation it is easier to use matrix calculus (link). It is quite similar to usual $\mathbb R\to\mathbb R$ function differentiating rules. This is the one we are going to use: $$d\langle A\textbf{x}, \textbf{x}\rangle=\langle (A + A^T)\textbf{x}, d\textbf{x}\rangle $$ Let me briefly show: $$f(\textbf{x}) = \|A\textbf{x}-\textbf{b}\|_2^2 \to \min$$ To use the rule above we need other function presentation: $$f(\textbf{x}) = \langle A\textbf{x} - \textbf{b}, A\textbf{x} - \textbf{b} \rangle = \langle A\textbf{x}, A\textbf{x} \rangle - 2\langle A\textbf{x}, \textbf{b} \rangle + \langle \textbf{b}, \textbf{b} \rangle= $$ $$= \langle A^TA\textbf{x}, \textbf{x} \rangle - 2\langle A^T\textbf{b}, \textbf{x}\rangle + \langle \textbf{b}, \textbf{b} \rangle$$ Now it is easy to apply the rule: $$df(\textbf{x}) = 2 \langle A^TA \textbf{x}, d\textbf{x} \rangle - 2\langle A^T\textbf{b}, d\textbf{x}\rangle \Rightarrow \nabla f = 2(A^TA\textbf{x} - A^T\textbf{b})$$ To get the extremum point we need to find gradient zeros: $$A^TA\textbf{x} = A^T\textbf{b} \Rightarrow \textbf{x} = (A^TA)^{-1}A^T\textbf{b}$$ We get the same result as least squares method does. But the proof is not finished. We need to find the Hessian and proof that it is positive defined.

UPD: For your example: $$A = \begin{bmatrix} x_1 & 1\\ x_2 & 1 \\ \vdots & \vdots\\ x_m & 1 \end{bmatrix}$$ Then: $$ A^TA = \begin{bmatrix} x_1 & x_2 & \dots & x_m\\ 1 & 1 & \dots & 1 \end{bmatrix} \begin{bmatrix} x_1 & 1\\ x_2 & 1 \\ \vdots & \vdots\\ x_m & 1 \end{bmatrix}= \begin{bmatrix} \sum_{k=1}^m x_k^2 & \sum_{k=1}^m x_k\\ \sum_{k=1}^m x_k & 1 \end{bmatrix} $$ Now it is easy to get expressions for $a$ and $b$.