$15$ boys: mean of $47.6$, standard deviation of $11.2$. $10$ girls: mean of $49.1$, standard deviation of $15.4$. What is the overall standard deviation of the marks?

How would I go about accurately finding the standard deviation from this question? I already have the mean, but I have no idea how to calculate the standard deviation.

  • Next time, you can google first, to save time: https://stats.stackexchange.com/questions/25848/how-to-sum-a-standard-deviation – Matti P. Feb 25 '19 at 12:50
  • @MattiP. the formula given there seems incorrect. It seems the author is claiming that you can simply average the variances, but this is clearly false. Suppose you had two samples with different means but both with variance $0$. Then the average of the variances is clearly $0$ but the variance of the union of the data is not $0$. – lulu Feb 25 '19 at 12:54
  • @lulu In that Cross Validated post, there are comments and other answers pointing out what you just said. Having said that, I agree that it's not a good source to link to. – Lee David Chung Lin Feb 25 '19 at 13:11
  • @LeeDavidChungLin You are correct, I didn't look past the accepted answer (which, to stress, is incorrect). – lulu Feb 25 '19 at 13:23

1 Answer

For greater generality, let's say you had two samples $\{b_i\}_{i=1}^B$ and $\{g_j\}_{j=1}^G$. Then we define the statistical data $$\mu_B=\frac 1B \times \sum b_i\quad \quad \mu_G=\frac 1G \times \sum g_j$$

$$\sigma_B^2=\frac 1{B-1}\times \sum (b_i-\mu_B)^2\quad \quad \sigma_G^2=\frac 1{G-1}\times \sum (g_j-\mu_G)^2$$

Here I am assuming that you are using the sample variance.

The total sample then has $B+G$ elements and of course has average $$\mu = \frac 1{B+G} \times (B\mu_B+G\mu_G)$$

We want to compute $$\sigma^2=\frac 1{B+G-1}\times \left(\sum (b_i-\mu)^2+\sum (g_j-\mu)^2\right)$$

in terms of the standard statistical data for the individual samples.

Let's evaluate $\sum (b_i-\mu)^2$:

$$\sum (b_i-\mu)^2=\sum (b_i-\mu_B+(\mu_B-\mu))^2=\sum (b_i-\mu_B)^2+2(\mu_B-\mu)\sum (b_i-\mu_B)+B(\mu_B-\mu)^2$$

Now, $\sum (b_i-\mu_B)=0$ so we get $$\sum (b_i-\mu)^2=(B-1)\sigma_B^2+B(\mu_B-\mu)^2$$

Of course we also have $$\sum (g_j-\mu)^2=(G-1)\sigma_G^2+G(\mu_G-\mu)^2$$
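The decomposition above can be checked numerically. The following is only a sketch: the marks are made up for illustration (they are not the data from the question), and the helper names `sum_sq_about` and `decomposed` are hypothetical. It relies on `statistics.variance`, which uses the $n-1$ denominator assumed above.

```python
import statistics

# Hypothetical marks, made up purely to check the identity numerically.
boys = [41.0, 55.0, 47.0, 62.0, 38.0]
girls = [50.0, 44.0, 67.0]


def sum_sq_about(values, center):
    """Left-hand side: sum of squared deviations from an arbitrary center."""
    return sum((v - center) ** 2 for v in values)


def decomposed(values, center):
    """Right-hand side: (n-1)*s^2 + n*(group mean - center)^2."""
    n = len(values)
    group_mean = statistics.mean(values)
    return (n - 1) * statistics.variance(values) + n * (group_mean - center) ** 2


# Overall mean of the combined sample.
mu = statistics.mean(boys + girls)

for group in (boys, girls):
    # The two numbers on each line should agree up to rounding error.
    print(sum_sq_about(group, mu), decomposed(group, mu))
```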

Combining all this we have $$\boxed{\sigma^2=\frac 1{B+G-1}\times \left((B-1)\sigma_B^2+B(\mu_B-\mu)^2+(G-1)\sigma_G^2+G(\mu_G-\mu)^2\right)} $$
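As a rough illustration, here is the boxed formula applied to the numbers in the question. The function name `combined_sd` is made up for this sketch, and it assumes, as above, that the quoted standard deviations are sample standard deviations (denominator $n-1$); if they were computed with denominator $n$, the result would differ.

```python
import math


def combined_sd(n_b, mean_b, sd_b, n_g, mean_g, sd_g):
    """Overall mean and sample SD of two groups from their summary statistics.

    Assumes sd_b and sd_g are sample standard deviations (n - 1 denominator).
    """
    n = n_b + n_g
    mu = (n_b * mean_b + n_g * mean_g) / n
    # Within-group terms plus the correction terms for the shifted mean.
    total_ss = (
        (n_b - 1) * sd_b ** 2 + n_b * (mean_b - mu) ** 2
        + (n_g - 1) * sd_g ** 2 + n_g * (mean_g - mu) ** 2
    )
    return mu, math.sqrt(total_ss / (n - 1))


mu, sd = combined_sd(15, 47.6, 11.2, 10, 49.1, 15.4)
print(round(mu, 2), round(sd, 2))  # 48.2 12.75
```

Under that assumption the overall mean is $48.2$ and the overall standard deviation is about $12.75$.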

Worth noting: the "correction term" $B(\mu_B-\mu)^2+G(\mu_G-\mu)^2$ measures how far the two group means lie from the overall mean, weighted by group size.

Sanity check: note that if we had $\mu_B=\mu_G$ then $\mu=\mu_B=\mu_G$, the terms involving the means drop out, and this is (essentially) just the average of the variances weighted by $B-1$ and $G-1$, as you'd expect.
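Both edge cases can be tried on small raw samples: the equal-means case just mentioned, and the case from the comments under the question where both variances are $0$ but the means differ. This is a sketch with made-up data; `combined_variance` is a hypothetical helper that feeds each sample's size, mean and sample variance into the boxed formula, and the result is compared with `statistics.variance` on the pooled data.

```python
import statistics


def combined_variance(a, b):
    """Boxed formula, using only each sample's size, mean and sample variance."""
    n_a, n_b = len(a), len(b)
    m_a, m_b = statistics.mean(a), statistics.mean(b)
    mu = (n_a * m_a + n_b * m_b) / (n_a + n_b)
    total_ss = (
        (n_a - 1) * statistics.variance(a) + n_a * (m_a - mu) ** 2
        + (n_b - 1) * statistics.variance(b) + n_b * (m_b - mu) ** 2
    )
    return total_ss / (n_a + n_b - 1)


# Case 1: different means, both variances 0. Averaging the variances
# would give 0, but the pooled data clearly has nonzero variance.
a, b = [5.0, 5.0, 5.0], [9.0, 9.0, 9.0]
print(combined_variance(a, b), statistics.variance(a + b))  # both 4.8

# Case 2: equal means, so only the within-group terms survive.
c, d = [4.0, 6.0], [3.0, 5.0, 7.0]
print(combined_variance(c, d), statistics.variance(c + d))  # both 2.5
```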

lulu