Pooled sample variance, how to prove

Question

I did read the related question, and if it did contain the answer to my question, it must have been above my level. This is my very first post, so I'll stick to letters for now.

The pooled sample variance for two stochastic variables with the same variance is defined as:

$$\frac{\sum_{i=1}^{n}(X_i-\bar{X})^2+\sum_{j=1}^{m}(Y_j-\bar{Y})^2}{n+m-2}$$

Why on earth would you use this cumbersome expression? Why not simply add the two sample variances and divide by two?

Like this: $$\frac{(m-1)\sum_{i=1}^{n}(X_i-\bar{X})^2+(n-1)\sum_{j=1}^{m}(Y_j-\bar{Y})^2}{2(n-1)(m-1)}$$

I did the math and...the expected value of this is "also" equal to the variance. It looks more complicated....but it certainly feels more intuitive.

Is there a reason for using the first expression, and not the second?

Thanks a lot!

/Magnus

See the answer to this recent question on stats.SE for a more general version of this idea. — Dilip Sarwate, Oct 23 '14 at 13:41
See also this question, where it is shown that for normally distributed data this expression is the minimum variance estimator of the variance, and assuming a uniform prior for the means it is also the maximum likelihood estimator of the variance. — joriki, May 26 '20 at 15:34

score 1 · Accepted Answer · edited Sep 02 '16 at 02:06

The cumbersome expression you are referring to is nothing more than a weighted average. The weights are proportional to the respective sample sizes minus one, $n-1$ and $m-1$ (the $-1$ is a correction that yields more desirable statistical properties, in particular unbiasedness of the estimators). Indeed, you can see this if you write the expression for the pooled variance as follows, where $s_1^2$ and $s_2^2$ are the two sample variances:

$$s_p^2=\dfrac{(n-1)s_1^2+(m-1)s_2^2}{n+m-2}=\dfrac{n-1}{(n-1)+(m-1)}s_1^2+\dfrac{m-1}{(n-1)+(m-1)}s_2^2$$

If the sample sizes are equal, then the above expression is indeed, as you intuitively expect, equal to the average of the variances:

$$s_p^2=\dfrac{n-1}{(n-1)+(n-1)}s_1^2+\dfrac{n-1}{(n-1)+(n-1)}s_2^2=\dfrac{1}{2}s_1^2+\dfrac{1}{2}s_2^2$$

in case $n=m$.
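When the sample sizes differ, the two estimators no longer coincide. Because each sample variance has expectation equal to the common variance, any weights that sum to one give an unbiased estimator, so the difference shows up in the spread rather than in the mean. The following is a minimal Python sketch (not part of the original answer) that compares the two by simulation. The function names `pooled_variance`, `simple_average_variance` and `simulate`, the normal data, and the default sizes and seed are all illustrative assumptions.

```python
import random
import statistics


def pooled_variance(xs, ys):
    """Weighted average of the two sample variances, weights n-1 and m-1."""
    n, m = len(xs), len(ys)
    s1_sq = statistics.variance(xs)  # sample variance, divides by n-1
    s2_sq = statistics.variance(ys)  # sample variance, divides by m-1
    return ((n - 1) * s1_sq + (m - 1) * s2_sq) / (n + m - 2)


def simple_average_variance(xs, ys):
    """Unweighted average of the two sample variances."""
    return 0.5 * (statistics.variance(xs) + statistics.variance(ys))


def simulate(n=5, m=50, sigma=1.0, trials=2000, seed=0):
    """Draw two normal samples with a common variance (assumed setup)
    and record both estimators over many repetitions."""
    rng = random.Random(seed)
    pooled, simple = [], []
    for _ in range(trials):
        # The two means differ on purpose; only the variance is shared.
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        ys = [rng.gauss(3.0, sigma) for _ in range(m)]
        pooled.append(pooled_variance(xs, ys))
        simple.append(simple_average_variance(xs, ys))
    return {
        "pooled_mean": statistics.mean(pooled),
        "simple_mean": statistics.mean(simple),
        "pooled_var": statistics.variance(pooled),
        "simple_var": statistics.variance(simple),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions, both reported means should land close to `sigma**2`, while the pooled estimator should show the smaller spread when $n$ and $m$ are very different. That is in line with the comment above about minimum variance for normally distributed data.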

edited Sep 02 '16 at 02:06 · DharmaTurtle

answered Oct 23 '14 at 12:28 · Jimmy R.

$\frac{S_1^2}{n}$ and $\frac{s_2^2}{m}$ are biased estimators. Sample variances themselves are not biased estimators. Sorry I had to brush up. Let me know if I am right this time. – Satish Ramanathan Oct 23 '14 at 13:07
Understood, good explanation. Such a clear understanding of yours!! – Satish Ramanathan Oct 23 '14 at 13:15
Right, I read up on weighted averages, and I think it makes sense. If I read this correctly...this measures the average squared distance between an observed value and that value's sample mean, irrespective of what that mean may be? I'd much rather find a common sample mean, but I guess that's not always possible :) – Magnus Oct 23 '14 at 13:57
Could you explain why it yields more desirable statistical properties? Specifically, what is the value it tries to estimate? – Maciej Jałocha Feb 20 '24 at 10:17
