
I'm reading a book right now on elementary statistics and I'm confused at this passage:

The normal approximation is crucial to the precision of these confidence intervals. Section 4.4 provides a more detailed discussion about when the normal model can safely be applied. When the normal model is not a good fit, we will use alternative distributions that better characterize the sampling distribution.

Conditions for $\bar{x}$ being nearly normal and SE being accurate

Important conditions to help ensure the sampling distribution of $\bar{x}$ is nearly normal and the estimate of SE sufficiently accurate:

• The sample observations are independent.
• The sample size is large: n > 30 is a good rule of thumb.
• The population distribution is not strongly skewed. This condition can be difficult to evaluate, so just use your best judgement.

I think I'm missing the bigger picture here. Why do we need the sampling distribution to be normal, or nearly normal, in order to calculate the standard error, confidence intervals and p-values?

Jwan622

1 Answer


The question could do with a little more context: if the text in question is this one, the quotation is specifically discussing the rules for normal confidence intervals, where the confidence intervals for the mean are calculated on the assumption that the sample mean has an approximately normal distribution. If it doesn't, the numbers (such as the multiplier of about $1.96$ for a 95% interval) will not be right.

One extreme example is the uniform distribution on $[a,b]$, whose standard deviation is $(b-a)/\sqrt{12}$ (its variance is $\int_a^b \left(x-\tfrac{a+b}{2}\right)^2 \frac{dx}{b-a} = (b-a)^2/12$). The probability that a single observation lies within one standard deviation of the mean is then $2/\sqrt{12} \approx 0.577$, while every observation lies within two standard deviations, in complete contrast to the normal distribution, where the corresponding probabilities are about $0.683$ and $0.954$. However, if we take enough independent observations, the Central Limit Theorem ensures that the distribution of the sample mean is well approximated by a normal distribution.
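A quick simulation sketch of this contrast, using only Python's standard library. The choice of $[a,b]=[0,1]$, the seed, $n = 30$ and the replication counts are my own assumptions for illustration, and `fraction_within` is a hypothetical helper. Single draws should give roughly $0.577$ and exactly $1$, while means of 30 draws should land near the normal $0.683$ and $0.954$.

```python
import math
import random
import statistics

def fraction_within(values, center, width):
    """Fraction of values lying within +/- width of center."""
    return sum(abs(v - center) <= width for v in values) / len(values)

rng = random.Random(0)  # fixed seed so the run is reproducible
a, b = 0.0, 1.0
mu = (a + b) / 2
sigma = (b - a) / math.sqrt(12)  # standard deviation of Uniform[a, b]

# Individual observations: far from the normal 68% / 95% pattern.
draws = [rng.uniform(a, b) for _ in range(100_000)]
one_sd_single = fraction_within(draws, mu, sigma)      # near 2/sqrt(12), about 0.577
two_sd_single = fraction_within(draws, mu, 2 * sigma)  # every draw, since 2*sigma > (b-a)/2

# Means of n = 30 observations: roughly normal by the CLT,
# with standard error sigma / sqrt(n).
n = 30
se = sigma / math.sqrt(n)
means = [statistics.fmean([rng.uniform(a, b) for _ in range(n)]) for _ in range(20_000)]
one_se_mean = fraction_within(means, mu, se)      # near 0.683
two_se_mean = fraction_within(means, mu, 2 * se)  # near 0.954

print(f"single draws: {one_sd_single:.3f} within 1 SD, {two_sd_single:.3f} within 2 SD")
print(f"means (n={n}): {one_se_mean:.3f} within 1 SE, {two_se_mean:.3f} within 2 SE")
```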

On the other hand, there are distributions where even a large number of independent observations does not give an approximately normal distribution for the sample mean. If the data had a Cauchy distribution, for example, the mean of any number of observations has the same Cauchy distribution as a single observation. The CLT does not apply here because the Cauchy distribution has no finite mean or variance: its tails contain so much probability that occasional huge observations keep dominating the average, instead of the observations clumping in the way that produces a normal sample mean.
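One way to see this numerically is to track the spread of many sample means as $n$ grows. The sketch below (the seed, the sample sizes and the helper names `cauchy` and `iqr_of_sample_means` are assumptions for illustration) draws standard Cauchy values with the inverse CDF $\tan(\pi(U - 1/2))$ and measures the interquartile range of the sample means. For normal data the IQR shrinks like $1.349/\sqrt{n}$; for Cauchy data it should stay near $2$, the IQR of a single standard Cauchy observation.

```python
import math
import random
import statistics

def cauchy(rng):
    """Standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def normal(rng):
    """Standard normal draw, for comparison."""
    return rng.gauss(0.0, 1.0)

def iqr_of_sample_means(draw, n, reps, rng):
    """Interquartile range of `reps` sample means, each computed from n draws."""
    means = [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

rng = random.Random(1)
results = {}
for n in (1, 10, 100):
    results[n] = (iqr_of_sample_means(cauchy, n, 2000, rng),
                  iqr_of_sample_means(normal, n, 2000, rng))
    print(f"n={n:>3}: Cauchy IQR {results[n][0]:.2f}, normal IQR {results[n][1]:.2f}")
```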


The moral of the story is that you can calculate confidence intervals (and $p$-values) for the mean for any distribution (by using its cumulative distribution function and the inverse of that function, for example), and it is possible to extend this to the sample mean. In the simple cases encountered in beginning statistics, though, it is usually enough to take enough independent observations: the CLT then says that assuming the sample mean has a normal distribution is reasonably accurate, and this is why the normal distribution is so important. Hence a lot of the time you can get away with having only the numbers for the normal distribution to hand (although professional statisticians are very happy to tell you that there are numerous important cases where this doesn't work and you have to pay them to tell you the right answer!).
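Here is a minimal sketch of what "just the numbers for the normal distribution" buys you, assuming the conditions in the question hold so that $\bar{x}$ is treated as normal with standard error $s/\sqrt{n}$. It uses `statistics.NormalDist` from Python's standard library (3.8+): `inv_cdf` supplies the multiplier (about $1.96$ for 95%) and `cdf` supplies the tail probability for a $p$-value. The function names and the simulated exponential data (true mean $1$, $n = 50$) are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist

def normal_confidence_interval(sample, level=0.95):
    """CI for the population mean, treating x-bar as normal (CLT) with SE = s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)     # inverse CDF: about 1.96 for level 0.95
    return xbar - z * se, xbar + z * se

def normal_p_value(sample, mu0):
    """Two-sided p-value for H0: mean == mu0, using the same normal approximation."""
    n = len(sample)
    z = (statistics.fmean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))     # CDF gives the tail probability

# Simulated, mildly skewed data (exponential, true mean 1), with n = 50 > 30.
rng = random.Random(3)
sample = [rng.expovariate(1.0) for _ in range(50)]
low, high = normal_confidence_interval(sample)
p = normal_p_value(sample, 1.0)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f}); p-value for mu0 = 1: {p:.3f}")
```

The design choice is deliberate: the only distribution-specific numbers come from the standard normal, which is exactly the shortcut the CLT justifies when the sample is large and the data are not wildly skewed.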

Chappers
  • A couple of questions: Where does this come from? "One extreme example is the uniform distribution on [a,b], where the standard deviation is $(b-a)/\sqrt{12}$." What does "uniform distribution on [a,b]" mean? Where did the $\sqrt{12}$ come from?

    Also, what does this mean: "Hence one can get away with having just the numbers for the normal distribution to hand a lot of the time"

    – Jwan622 Mar 20 '17 at 16:22
  • So, is it right to say that if the sampling distribution of sample means is not normally distributed, we cannot know that 95% of the time, the true population mean falls within 2 SD of the sample mean? – Jwan622 Mar 20 '17 at 16:26
  • https://en.wikipedia.org/wiki/Uniform_distribution_(continuous) – Chappers Mar 20 '17 at 17:30
  • For the second: if you don't have a computer to hand to calculate the confidence intervals of the distribution, or (very commonly) you don't actually know what the distribution is, the normal approximation is usually good enough to give a first estimate of the confidence intervals. Hence someone calculates the appropriate numbers for the (standard) normal distribution once, and then you just look them up in a table, which is (or at least used to be) quicker than doing the calculation by hand: imagine adding up all the probabilities for the binomial distribution with $1000$ trials! – Chappers Mar 20 '17 at 17:39
  • And for the second comment: yes, you cannot know that. Even if the data really are normally distributed, when you don't know the population standard deviation and use the sample standard deviation $s$ in its place, the standardized mean $(\bar{x}-\mu)/(s/\sqrt{n})$ does not follow a normal distribution: instead, it has a $t$-distribution with $n-1$ degrees of freedom, which has more probability further away from the mean (as can be seen in the plots in the Wikipedia article), and so gives wider 95% confidence intervals and so on. – Chappers Mar 20 '17 at 17:46
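Both of the last two comments can be checked with a short standard-library simulation; this is only a sketch, and the seed, sample sizes and function names are assumptions for illustration. `normal_interval_coverage` estimates how often the "normal" interval $\bar{x} \pm 1.96\,s/\sqrt{n}$ really contains the true mean when the data are normal but $\sigma$ is unknown. For $n = 5$ the $t$-distribution with 4 degrees of freedom predicts only about 88% coverage (the correct multiplier would be about $2.776$), while for $n = 100$ it should be close to 95%. The binomial helpers compare adding up all $521$ terms of $P(X \le 520)$ for $1000$ fair trials against a single normal-CDF lookup with a continuity correction.

```python
import math
import random
import statistics
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def normal_interval_coverage(n, reps, rng):
    """Share of intervals xbar +/- 1.96 * s / sqrt(n) containing the true mean 0,
    for samples of size n from a standard normal population (sigma treated as unknown)."""
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        half_width = Z95 * statistics.stdev(sample) / math.sqrt(n)
        hits += abs(statistics.fmean(sample)) <= half_width
    return hits / reps

def binomial_cdf_exact(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by adding up every term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def binomial_cdf_normal(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction of 0.5."""
    return NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(k + 0.5)

rng = random.Random(2)
coverage_n5 = normal_interval_coverage(5, 20_000, rng)     # well below 0.95
coverage_n100 = normal_interval_coverage(100, 5_000, rng)  # close to 0.95

exact = binomial_cdf_exact(520, 1000, 0.5)    # 521 terms to add by hand
approx = binomial_cdf_normal(520, 1000, 0.5)  # one table lookup

print(f"coverage with n=5: {coverage_n5:.3f}, with n=100: {coverage_n100:.3f}")
print(f"P(X <= 520): exact {exact:.4f}, normal approximation {approx:.4f}")
```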