I am trying to understand linear regression. The typical model takes the form $$y_{i}=ax_{i} +b + \epsilon_{i}, \ \ \ i=1,\dots,N$$ where the $\epsilon_{i}$ are i.i.d. Gaussian random variables. The objective is to minimize $$\sum_{i=1}^{N} (y_{i} - ax_{i} - b - \epsilon_{i})^{2}.$$

Writing $S$ for the sum of squares above and dropping the common factor of 2, the gradient is: $$\frac{1}{2}\frac{\partial S}{\partial a} = -\sum_{i=1}^{N} y_{i} x_{i} + a\sum_{i=1}^{N} x_{i}^{2} +b \sum_{i=1}^{N} x_{i} + \sum_{i=1}^{N} x_{i}\epsilon_{i}$$ $$\frac{1}{2}\frac{\partial S}{\partial b} = -\sum_{i=1}^{N} y_{i} + a\sum_{i=1}^{N} x_{i} +bN + \sum_{i=1}^{N} \epsilon_{i}$$ My question concerns the terms involving $\epsilon_{i}$. What arguments allow us to state that these terms are equal to zero?

R. Ho
    They are not zero, but they vanish upon taking expectations of both sides. This is appropriate because you are looking for an unbiased estimator for $a,b$ anyway. (Look up the Gauss-Markov theorem, which is really the connection between "linear algebra" least squares and "statistics" least squares.) – Ian Sep 08 '16 at 11:59

1 Answer

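Usually we assume $\epsilon_{i}\sim N(0,\sigma^2)$, so $E[\epsilon_{i}]=0$. The sum $\sum_{i=1}^{N}\epsilon_{i}$ is not exactly zero for any particular sample, but its expectation is $N E[\epsilon_{i}]=0$, and by the law of large numbers $\frac{1}{N}\sum_{i=1}^{N}\epsilon_{i} \to 0$ as the sample size grows. If the $\epsilon_{i}$ are also independent of the $x_{i}$, the same argument gives $E\left[\sum_{i=1}^{N} x_{i}\epsilon_{i}\right]=0$.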
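
To see this numerically, here is a minimal simulation sketch. It is only an illustration: the true values $a=2$, $b=1$, $\sigma=1$, the uniform distribution of the $x_{i}$, and the function name `simulate` are all assumptions of mine, not part of the question. It draws data from the model, fits $a$ and $b$ with the usual closed-form least-squares formulas (which drop the $\epsilon$ terms), and reports the sample averages of $\epsilon_{i}$ and $x_{i}\epsilon_{i}$.

```python
import random


def simulate(n, a=2.0, b=1.0, sigma=1.0, seed=0):
    """Draw n points from y = a*x + b + eps and fit a, b by least squares.

    The parameter values and the uniform x distribution are illustrative
    assumptions, not taken from the question.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]  # eps_i ~ N(0, sigma^2)
    ys = [a * x + b + e for x, e in zip(xs, eps)]

    # Closed-form least-squares estimates (normal equations without eps terms).
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a_hat = sxy / sxx
    b_hat = mean_y - a_hat * mean_x

    # Sample averages of the two noise terms that appear in the gradient.
    mean_eps = sum(eps) / n
    mean_x_eps = sum(x * e for x, e in zip(xs, eps)) / n
    return a_hat, b_hat, mean_eps, mean_x_eps


for n in (100, 10_000, 100_000):
    a_hat, b_hat, mean_eps, mean_x_eps = simulate(n)
    print(n, round(a_hat, 4), round(b_hat, 4),
          round(mean_eps, 4), round(mean_x_eps, 4))
```

For a single sample the noise averages are small but nonzero, and they shrink as $N$ increases, while the estimates approach the true $a$ and $b$. That is the sense in which the $\epsilon$ terms "vanish": in expectation and in the large-sample limit, not exactly.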

wenxi