For this question from a large amount of data I have calculated that the mean is 44.22, the sample size is 100 and the standard deviation is 22.0773.

From this I am asked to make 98% confidence intervals for (1) the true mean µ of the module mark and (2) the true variance of the module mark.

And for each: (a) Determine what quantity to look at, and which distribution table to use, justifying your choice. (b) Determine the number of degrees of freedom, justifying your answer. (c) Calculate the actual intervals.

So far for 1, I have used the z table to look up $99\%$, since I need $1\%$ to the right of $2.33$ and $1\%$ to the left of $-2.33$, so that $98\%$ lies between $\pm2.33$. This gives me

$$ \bar x \pm 2.33\frac{\sigma}{\sqrt{n}} $$

This gives a confidence interval of 39.08 to 49.36. Is this correct? And how would I determine the degrees of freedom and go about answering part 2?

P.Math
  • 99
  • 1
    Given part 2, you do not know the population standard deviation. This means that your procedure for the mean is wrong because you need to use the $t$ distribution instead. The procedure for the $t$ distribution will answer your question about the degrees of freedom. – Ian May 09 '16 at 00:24

1 Answer

Confidence interval for population mean. A 98% confidence interval for $\mu$ is of the form $\bar X \pm t^* s/\sqrt{n},$ where $t^*$ cuts off 1% from the upper tail of Student's t distribution with $df = n-1.$ So you are almost correct for that part. Here $t^* = 2.365.$ I get the CI $(39.00, 49.44)$ from the following brief R session.

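 a = 44.22;  s = 22.0773;  n = 100    # sample mean, sample SD, sample size
 t.c = qt(.99, n-1);  pm = c(-1,1)    # t critical value (99 df); signs for the two endpoints
 a + pm*t.c*s/sqrt(n)                 # 98% CI for the mean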
 ## 38.99959 49.44041
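
As a hand check of the R output, the same interval can be worked from a printed table, taking $t^* \approx 2.365$ for $df = 99$ as quoted above (a table that skips $df = 99$ would give a slightly different value):

$$ \bar X \pm t^* \frac{s}{\sqrt{n}} = 44.22 \pm 2.365 \cdot \frac{22.0773}{\sqrt{100}} \approx 44.22 \pm 5.22, $$

which is approximately $(39.00, 49.44).$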

Confidence interval for population variance. To get a 98% CI for $\sigma^2,$ one can use the fact that $(n-1)s^2/\sigma^2 \sim \mathsf{Chisq}(df = n - 1).$ So if $L$ and $U$ cut 1% from the lower and upper tails of $\mathsf{Chisq}(99),$ we have

$$P\left(L \le \frac{(n-1)s^2}{\sigma^2} \le U \right) = .98,$$

and solving the inequalities for $\sigma^2$ gives

$$\left((n-1)s^2/U,\; (n-1)s^2/L\right)$$

as a 98% CI for $\sigma^2.$ In particular, for your data the CI is $(358.4, 697.0).$ (You could get $L$ and $U$ from a suitable printed table of the chi-squared distribution.) Notice that $s^2 = 487.4$ is contained in this interval, but is *not* at the very center of the interval, because the chi-squared distribution is not symmetrical. Take the square root of both endpoints if you want a CI for $\sigma.$

 (n-1)*s^2/qchisq(c(.99,.01),n-1)    # divide by U (lower endpoint), then by L (upper endpoint)
 ## 358.3833 697.0011
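
For anyone who wants to see the cutoffs $L$ and $U$ separately, and the interval for $\sigma$ as well, here is a minimal sketch in R. The names `ci_var` and `ci_sd` are only illustrative, and the inputs are the summary statistics quoted in the question.

```r
# Summary statistics quoted in the question
s  <- 22.0773   # sample standard deviation
n  <- 100       # sample size
df <- n - 1     # degrees of freedom of the chi-squared pivot

# L and U cut 1% from the lower and upper tails of Chisq(99)
L <- qchisq(0.01, df)
U <- qchisq(0.99, df)

# 98% CI for the variance: the larger cutoff U gives the lower endpoint
ci_var <- c(df * s^2 / U, df * s^2 / L)

# Square roots of the endpoints give a 98% CI for the standard deviation
ci_sd <- sqrt(ci_var)

print(c(L = L, U = U))
print(ci_var)
print(ci_sd)
```

The endpoints swap roles because dividing by the larger quantile produces the smaller bound, which is why `U` appears in the lower endpoint.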

Note: The so-called 'rule of 30' may be roughly OK for 95% CIs because when $n \ge 30$ we have $df \ge 29$ and the values that cut off .025 from the upper tail of the t distribution are 'near' 2.0, just as 1.96 is 'near' 2.0:

 qt(.975, 29:35)  # df from 29 through 35
 ## 2.045230 2.042272 2.039513 2.036933 2.034515 2.032245 2.030108

However, this sort of approximation does not generally work well for CIs at levels other than 95%. None of the values below is really 'near' 2.326:

 qt(.99, 29:35)
 ## 2.462021 2.457262 2.452824 2.448678 2.444794 2.441150 2.437723
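
To see what the choice between z and t means for the data in the question ($n = 100$, so $df = 99$), one can compare the two critical values and the resulting half-widths directly. This is a rough sketch; the names `z.c`, `se` and `half` are only illustrative.

```r
# Summary statistics quoted in the question
s <- 22.0773;  n <- 100

z.c <- qnorm(0.99)       # standard normal value cutting 1% from the upper tail
t.c <- qt(0.99, n - 1)   # t value with 99 df cutting 1% from the upper tail
se  <- s / sqrt(n)       # estimated standard error of the mean

# Half-widths of the 98% interval under each choice
half <- c(z = z.c * se, t = t.c * se)

print(c(z = z.c, t = t.c))
print(half)
```

Even at $df = 99$ the t-based interval is slightly wider than the z-based one, which accounts for the difference between the asker's interval and the one given above.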

If the R code mystifies you, look up these values in your printed t table instead.

BruceET
  • 51,500
  • 1
    Thank you very much for this answer. I have just noticed I left out a piece of information from the question: 'assuming the marks are normally distributed, make the 98% confidence intervals'. Does this mean that the z table has to be used, even though the data I have provides the sample standard deviation instead of the population standard deviation? – P.Math May 09 '16 at 01:16
  • Near normality of the data makes it OK to use the t distribution when σ is unknown. See the note I added to my Answer about using z instead of t as an approximation for the CI. When σ is known, use z methods. When σ is unknown and estimated by s, use t methods, regardless of the size of $n$. – BruceET May 09 '16 at 01:57