
Why is

$$SSE=\sum(y-a-bx)^2$$

called the unexplained variation? I have real trouble understanding this concept, which leads to the definition of the coefficient of determination. The books keep saying that the coefficient of determination is the fraction of the total variation that is explained by the variation in $x$, and that:

$$r^2=\frac{SSY-SSE}{SSY}$$

In the numerator, SSE is the amount unexplained by the variation in $x$? Yet the formula for SSE certainly uses the variable $x$.

You can see how confused I am by this concept.

Anything that will help me understand this a bit would be appreciated.

D.

David

2 Answers


By "unexplained" we mean the variation in $y$ that comes from the error term, which our linear model does not account for. If you remember, we make certain assumptions about the nature of the error term so that OLS has optimal properties.

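Literally, SSE stands for the sum of squares of the error term. An alternative way to write your first equation is

$$SSE=\sum e_i^2$$

where $e_i = y_i - a - bx_i$ is the residual for observation $i$. Up to a divisor ($n-2$ in simple regression), this is the estimated variance of the error term. Why do you not see the mean in the above? Because one of our assumptions is that the error term has 0 mean, and the least-squares residuals of a model with an intercept always sum to zero.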
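It may help to see where SSE sits inside the total variation. The following is a sketch that assumes $a$ and $b$ are the least-squares estimates and that the model includes the intercept $a$; the symbols $\hat{y}_i$, $\bar{y}$ and the label $SSR$ are my own notation, not the question's. Writing $\hat{y}_i = a + bx_i$ for the fitted value, the total variation splits into two pieces:

$$\underbrace{\sum (y_i-\bar{y})^2}_{SSY} = \underbrace{\sum (\hat{y}_i-\bar{y})^2}_{SSR} + \underbrace{\sum (y_i-\hat{y}_i)^2}_{SSE}$$

so that

$$r^2=\frac{SSY-SSE}{SSY}=\frac{SSR}{SSY}$$

The first piece, $SSR$, depends on the data only through the fitted line, so it is the part of the variation in $y$ that the line (and hence $x$) accounts for. $SSE$ is whatever is left over after the line has done its best.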

JohnK

In the numerator, $SSE$ is the amount unexplained by the variation in x? Yet the formula for SSE certainly uses the variable $x$.

Well, the formula uses $x$ because $x$ is needed to estimate $y$. $a+bx$ is your estimate of $y$. The better your estimate, the closer $y-(a+bx)$, or $y-a-bx$, will be to zero. However, $a+bx$ won't perfectly model $y$, so usually $y-a-bx$ will be some nonzero number. This is the error of your model: the part of $y$ that is left over after $x$ has been used. The sum of squares of this error over all data points is your $SSE$, which is why it is called the unexplained variation.
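To make this concrete, here is a minimal sketch in plain Python that fits the least-squares line and then computes $SSY$, $SSE$ and $r^2$ exactly as defined in the question. The function names (`fit_line`, `variation_breakdown`) and the data are made up for illustration, and the closed-form slope and intercept are the usual simple-regression formulas.

```python
def fit_line(xs, ys):
    """Return the least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def variation_breakdown(xs, ys):
    """Return (SSY, SSE, r^2) for the least-squares line through the data."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    # Total variation: spread of y around its own mean, ignoring x entirely.
    ssy = sum((y - y_bar) ** 2 for y in ys)
    # Unexplained variation: what is left after x has been used to predict y.
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    r2 = (ssy - sse) / ssy
    return ssy, sse, r2


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

ssy, sse, r2 = variation_breakdown(xs, ys)
print("SSY =", ssy)
print("SSE =", sse)
print("r^2 =", r2)
```

If the points lie exactly on a line, `sse` comes out as zero (up to rounding) and `r2` is 1; the more scattered the points are around the line, the larger the share of `ssy` that ends up in `sse`.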

casandra