There is something about the unbiased sample variance that I can't seem to understand. $$S^2=\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2=\frac{\sum_{i=1}^{N}x_i^2}{N-1}-\frac{\left(\sum_{i=1}^{N}x_i\right)^2}{N(N-1)}$$ I can't understand the right-hand side. Given that $V(X)=E(X^2)-E(X)^2$, shouldn't the $N$ in the denominator of the second term be squared, since $$E(X)=\frac{\sum_{i=1}^{N}x_i}{N}\,?$$ Also, there is no $N$ in the denominator of the first term; why is that? Isn't $$E(X^2)=\frac{\sum_{i=1}^{N}x_i^2}{N}\,?$$ I know the top formula is correct because I have checked it on examples; I'm just failing to see why.
1 Answer
There's a whole slew of questions in this posting.
Get clear about one thing before attempting to think about the rest of this: the "unbiased sample variance" is not supposed to be the variance of a probability distribution; rather it is supposed to be an estimate, based on a sample taken from a large population, of the variance of the population. Some things that apply to variances of probability distributions simply do not apply to the "unbiased sample variance"; for example: the variance of the sum of independent random variables is the sum of their separate variances. That does not apply to this!
The quantity $\bar x =(x_1+\cdots+x_N)/N$ is the sample average, not the population average. Suppose $\mu$ is the population average. Then the expected value of $$ \frac 1 N \sum_{i=1}^N (x_i-\mu)^2 \tag 1 $$ would be $\sigma^2$. Putting $\bar x$ in place of $\mu$ moves that subtracted quantity closer to the observed values $x_1,\ldots,x_N$, thereby replacing the sum $(1)$ with a smaller number. Then changing $N$ to $N-1$ makes it larger again, so that it is again something whose average value is $\sigma^2$. That last step is "Bessel's correction", named after Friedrich Bessel, who is better known as the eponym of Bessel functions.
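The effect of replacing $\mu$ by $\bar x$, and of then changing $N$ to $N-1$, can be checked exactly on a toy case. The sketch below is only an illustration under assumptions that are not in the question: the population is a fair six-sided die, samples of size $N=3$ are drawn with replacement, and the helper names (`mean`, `sum_sq_dev`) are made up for this example. It enumerates all $6^3$ equally likely samples and averages each estimator using exact fractions.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Exact arithmetic mean of an iterable of Fractions."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def sum_sq_dev(sample, center):
    """Sum of squared deviations of the sample from a given center."""
    return sum((Fraction(x) - center) ** 2 for x in sample)


# Assumed toy population: a fair die. Assumed sample size: N = 3.
population = [1, 2, 3, 4, 5, 6]
N = 3

mu = mean(Fraction(x) for x in population)                  # population mean
sigma2 = mean((Fraction(x) - mu) ** 2 for x in population)  # population variance

# Every ordered sample of size N drawn with replacement is equally likely.
samples = list(product(population, repeat=N))

# Formula (1): deviations from the true mean mu, divided by N.
avg_known_mu = mean(sum_sq_dev(s, mu) / N for s in samples)

# Deviations from the sample mean, divided by N (no correction).
avg_biased = mean(
    sum_sq_dev(s, mean(Fraction(x) for x in s)) / N for s in samples
)

# Deviations from the sample mean, divided by N - 1 (Bessel's correction).
avg_unbiased = mean(
    sum_sq_dev(s, mean(Fraction(x) for x in s)) / (N - 1) for s in samples
)

print("sigma^2                    =", sigma2)
print("average of (1)             =", avg_known_mu)
print("average with x-bar, 1/N    =", avg_biased)
print("average with x-bar, 1/(N-1)=", avg_unbiased)
```

In this toy case the version with $\bar x$ and $1/N$ averages to $\frac{N-1}{N}\sigma^2$, which is too small, while dividing by $N-1$ brings the average back to $\sigma^2$.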
Now let's do some algebra: $$ \begin{align} \sum_{i=1}^N (x_i-\bar x)^2 & = \sum _{i=1}^N (x_i^2 - 2\bar x x_i + \bar x^2) \\[12pt] & = \left(\sum_{i=1}^N (x_i^2)\right) - 2\bar x\left(\sum_{i=1}^N x_i\right) + \sum_{i=1}^N (\bar x^2). \tag 2 \end{align} $$ The reason that $2\bar x$ can be pulled out of the sum is simply that it does not change as $i$ goes from $1$ to $N$. The same happens with the very last sum above: $$ \sum_{i=1}^N (\bar x^2) = \bar x^2\sum_{i=1}^N 1 = N\bar x^2. $$ The second sum in $(2)$ is $$ \sum_{i=1}^N x_i = N\bar x. $$ Hence $(2)$ becomes $$ \left(\sum_{i=1}^N (x_i^2)\right) - 2N\bar x^2 + N\bar x^2 = \left(\sum_{i=1}^N (x_i^2)\right) - N\bar x^2. \tag 3 $$ Now one can divide both sides by $N-1$.
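But the last term in $(3)$ can be written thus: $$ N \bar x^2 = N\left(\frac{\sum_{i=1}^N x_i}{N}\right)^2 = \frac 1 N\left(\sum_{i=1}^N x_i\right)^2. $$ Dividing $(3)$ by $N-1$ therefore gives $$ S^2 = \frac{\sum_{i=1}^N x_i^2}{N-1} - \frac{\left(\sum_{i=1}^N x_i\right)^2}{N(N-1)}, $$ which is the right-hand side in the question. The second term has only one factor of $N$ in its denominator because the $N$ multiplying $\bar x^2$ cancels one of the two factors of $N$ in $\bar x^2$.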
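As a quick sanity check on the algebra, here is a minimal sketch that computes $S^2$ both from the definition and from the expanded form, using exact fractions so that rounding plays no role. The function names and the data set are made up for illustration. It also computes the quantity the question expected, $\frac1N\sum x_i^2 - \bar x^2$, to show that it differs from $S^2$ only by the factor $\frac{N}{N-1}$.

```python
from fractions import Fraction


def sample_variance_definition(xs):
    """S^2 as (1/(N-1)) * sum of (x_i - x_bar)^2."""
    n = len(xs)
    x_bar = sum(xs, Fraction(0)) / n
    return sum(((x - x_bar) ** 2 for x in xs), Fraction(0)) / (n - 1)


def sample_variance_expanded(xs):
    """S^2 as sum(x_i^2)/(N-1) - (sum x_i)^2 / (N(N-1))."""
    n = len(xs)
    sum_x = sum(xs, Fraction(0))
    sum_x2 = sum((x * x for x in xs), Fraction(0))
    return sum_x2 / (n - 1) - sum_x ** 2 / (n * (n - 1))


def mean_square_minus_square_mean(xs):
    """(1/N) * sum(x_i^2) - x_bar^2, the form the question expected."""
    n = len(xs)
    x_bar = sum(xs, Fraction(0)) / n
    return sum((x * x for x in xs), Fraction(0)) / n - x_bar ** 2


# Arbitrary example data, chosen only for illustration.
data = [Fraction(v) for v in (2, 4, 4, 4, 5, 5, 7, 9)]
n = len(data)

by_definition = sample_variance_definition(data)
by_expansion = sample_variance_expanded(data)
divided_by_n = mean_square_minus_square_mean(data)

print("definition:        ", by_definition)
print("expanded form:     ", by_expansion)
print("1/N version:       ", divided_by_n)
print("1/N version * N/(N-1):", divided_by_n * Fraction(n, n - 1))
```

For this data both forms of $S^2$ come out to $32/7$, while the $1/N$ version gives $4$; multiplying $4$ by $N/(N-1) = 8/7$ recovers $32/7$.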
A very concise and informative answer, thanks! – George1811 May 15 '14 at 19:14