8

Does that mean if I draw samples from the population that 90% of the time I'll get a number between 1 and 9?

Added: assume normal distribution for the population.

  • When most people say "I'm 90% confident that..." I don't believe the number has any content. It means "I'm pretty sure that...". I would have gotten rich betting against these at 9 to 1 odds over my lifetime. – Ross Millikan Sep 07 '12 at 22:38
  • It means that the statistician has used a calculational procedure that has the following property. If the procedure is carried out many many times, about $90\%$ of the time the statistician will turn out to be right about the interval she announces, and $10\%$ of the time she will turn out to be wrong. – André Nicolas Sep 07 '12 at 23:04
  • @AndréNicolas You have given the frequentist definition of a confidence interval, which is probably what the OP is driving at. But if he is serious about fixing the endpoints at 1 and 9, then I think the only statistical interpretation comes from the Bayesian framework, where the mean is a parameter that is given a prior distribution and the statement is actually about a credible region for the mean based on the posterior distribution. There you can calculate the probability content of the interval [1,9]. – Michael R. Chernick Sep 08 '12 at 00:29
  • Also posted at http://mathoverflow.net/questions/106629/what-does-it-mean-when-a-statistician-says-im-90-confident-that-the-mean-of-the – JRN Sep 08 '12 at 05:18

4 Answers

8

No. In a typical setting such a statement is based on the mean of a random sample. Each possible sample of that size from the given population has a certain sample mean. Some of these are very close to the population mean, and some are quite far away. (Imagine, for instance, trying to estimate the average height of an adult male by taking a random sample of adult males and just happening to get a sample consisting entirely of pro basketball players!) However, most of the possible samples will have sample means quite close to the population mean.

What the statement means, then, is that if the population mean is not between $1$ and $9$, the statistician must have drawn one of the very unrepresentative samples $-$ one that’s so unrepresentative that only $10$% of the possible samples are equally unrepresentative (or worse).

Added: Let’s say that there are $N$ possible samples of a given size from that population. The means of those samples will cover a range, from the smallest possible sample mean to the largest. The actual population mean will be somewhere in the middle. Now draw two lines, the first cutting off the $5$% of the samples with the smallest means, the second cutting off the $5$% with the largest means. Here’s a rough sketch of the situation, with $S$ for the smallest and $L$ for the largest possible sample means:

                 first cut                     second cut
   x-----------------|-----------------------------|------------------x  
   S<-------5%------>C<------------90%------------>D<-------5%------->L

The percentages are the percentages of all possible samples having means in the indicated ranges. If you draw a sample at random, on average you’ll get a sample with a mean between $C$ and $D$ $90$% of the time, because $90$% of all possible samples have means between $C$ and $D$, and the samples are all equally likely to be picked when you pick at random.

Similarly, on average you’ll get a sample with a mean between $S$ and $C$ about $5$% of the time, and one with a mean between $D$ and $L$ about $5$% of the time.

The statistician is saying that if the population mean is not between $1$ and $9$, then his sample was either below the first cut, $C$, or above the second cut, $D$. In other words, either he got one of the $10$% of samples that are least like the population, or the population mean is between $1$ and $9$. ‘I’m $90$% confident that the population mean is between $1$ and $9$’ is verbal shorthand for all of that explanation.
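
The two cuts in the sketch can be checked with a small simulation. This is only an illustrative sketch under assumptions that are not in the answer: a normal population with hypothetical mean 5 and standard deviation 10, samples of size 20, and a population mean that is known to the simulation (which it never is in practice).

```python
import random
from statistics import NormalDist, fmean

# Hypothetical population and sample size, chosen only for illustration.
MU, SIGMA, N = 5.0, 10.0, 20

# For a normal population, the sample mean is normal with sd SIGMA / sqrt(N).
sampling_dist = NormalDist(MU, SIGMA / N ** 0.5)
C = sampling_dist.inv_cdf(0.05)  # first cut: 5% of sample means lie below it
D = sampling_dist.inv_cdf(0.95)  # second cut: 5% of sample means lie above it

rng = random.Random(1)
means = [fmean([rng.gauss(MU, SIGMA) for _ in range(N)]) for _ in range(10_000)]

below = sum(m < C for m in means) / len(means)
above = sum(m > D for m in means) / len(means)
middle = 1 - below - above

print(f"C={C:.2f} D={D:.2f} below={below:.3f} middle={middle:.3f} above={above:.3f}")
```

The fraction labelled `middle` should come out close to 0.90, with roughly 0.05 in each tail. A statistician who sees only one sample cannot compute `C` and `D`, which is why the statement has to be phrased in terms of how unrepresentative the sample would have to be.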

Brian M. Scott
  • wow... your assumption about the setting of such a statement is correct. Thank you for answering, but your final statement explaining this makes it a little more confusing X_X – user133466 Sep 07 '12 at 23:09
  • @user133466: I’ll expand the explanation a bit. – Brian M. Scott Sep 07 '12 at 23:14
  • can you turn around and say that 90% of the samples of size n have a mean between 1 and 9? – user133466 Sep 07 '12 at 23:27
  • @user133466: Not legitimately. The problem is that we don’t actually know what the population mean is, so we can’t make statements about what percentage of samples have means in a given range. What we can say is that if the population mean is not in a certain range, then our sample is a very unlikely one, and therefore we’re pretty confident that the population mean is in that range. – Brian M. Scott Sep 07 '12 at 23:28
  • how is this interval useful to us? I can only make the claim that I'm pretty sure that the population mean is between 1 and 9 – user133466 Sep 08 '12 at 00:01
  • @user133466: No, you can make a stronger claim: you can say that if the population mean is not between $1$ and $9$, then your sample is at best in among the $10$% of samples that are least representative of the population. This is a much more specific statement than ‘I’m pretty sure that the population mean is between $1$ and $9$’. And it might be among the least representative $1$%, or even worse. The thing to remember is that it’s the samples that vary; the population mean is a fixed quantity. – Brian M. Scott Sep 08 '12 at 00:04
  • how about this: It means if repeated samples were taken from the population and a mean computed for each sample, 90% of the samples would include the unknown mean between 1 and 9. – user133466 Sep 08 '12 at 17:57
  • This is similar to the fact that in hypothesis testing, you either reject the null hypothesis, or you don't. You never accept the null hypothesis. – M Turgeon Sep 16 '12 at 16:36
  • Is the sample size here fixed? Or is it allowed to vary in the set of possible samples (which would mean it would be every possible combination of the population ranging from size 0 to entire population size)? That diagram seems to imply that 90% of all possible samples would result in a sample mean that is between 1 and 9, and 10% of all sample means would not. – CMCDragonkai Apr 26 '18 at 04:44
3

When a statistician makes such a statement, it usually means they are confused.

Most statisticians are frequentists, and the frequentist paradigm disallows such statements, though frequentists say things like this all the time.

To a Bayesian statistician, the statement means that the uncertainty in the population mean has been modeled as a probability distribution. Starting from a prior distribution that expresses what is known or believed before acquiring data, the distribution is updated in light of data according to Bayes' theorem. The updated distribution contains 90% of its mass between 1 and 9.
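
As a rough sketch of that update, here is the textbook conjugate case: a normal likelihood with known standard deviation and a normal prior on the mean. Every number below (sample mean, sample size, `sigma`, and the prior) is a made-up assumption for illustration, and `posterior_for_mean` is a hypothetical helper, not something from the question.

```python
from statistics import NormalDist

def posterior_for_mean(xbar, n, sigma, prior_mean, prior_sd):
    """Posterior of a normal mean with known sigma and a normal prior."""
    prior_precision = 1 / prior_sd ** 2   # weight carried by the prior
    data_precision = n / sigma ** 2       # weight carried by the n observations
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + data_precision * xbar) / post_precision
    return NormalDist(post_mean, post_precision ** -0.5)

# Hypothetical data summary and a deliberately vague prior.
post = posterior_for_mean(xbar=5.0, n=20, sigma=10.0, prior_mean=0.0, prior_sd=100.0)

# Posterior probability that the population mean lies in [1, 9].
mass = post.cdf(9) - post.cdf(1)
print(f"posterior mean={post.mean:.3f} sd={post.stdev:.3f} mass in [1, 9]={mass:.3f}")
```

Changing the prior changes `mass`, which is the point: the Bayesian number is a probability about the mean, but only relative to the prior that was chosen.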

John D. Cook
  • It’s a perfectly acceptable informal statement that $(1,9)$ is a $90$% confidence interval, a statement that is certainly not disallowed by the frequentist view. – Brian M. Scott Sep 07 '12 at 22:42
  • @Brian: But your restatement is not the original title question, which treats the mean as a random variable. – Henry Sep 07 '12 at 23:44
  • @Henry: I take the original question to be the one in the title, and the one in the body to be the common misinterpretation. – Brian M. Scott Sep 07 '12 at 23:46
  • @Henry & John, why does the formalist paradigm forbid the title statement? It is the statement a formalist would make by leaving implicit some of the details (the statistical model and CI procedure that were used), and there are equivalent omissions when a Bayesian utters the same sentence. – zyx Sep 14 '12 at 08:44
  • @zyx: A Bayesian would regard 9-1 as fair odds on the mean being in the 90% credible interval, i.e. 90% is a probability. A frequentist would regard that as meaningless, as the confidence interval presupposes the hypothesis. – Henry Sep 14 '12 at 14:01
  • @Henry, the question title does not refer to probability, only confidence, so the frequentist who utters that phrase is not placed in the position of imputing a meaningless (from his point of view) random nature to a model parameter. – zyx Sep 14 '12 at 14:30
2

When you draw a sample, you don't get a number; you get a list of numbers. Or more precisely, a sample is a list of numbers.

No, it is not true that 90% of those numbers in the list are between $1$ and $9$, nor is it true that $90\%$ of the time, when you take a sample, anything in particular will be between $1$ and $9$.

A confidence interval depends on the list of numbers that you get in a sample. Say you take a sample of $20$ numbers, and the resulting confidence interval for the population mean is the interval from $1$ to $9$. Typically a much smaller proportion than $90\%$ of the numbers in the sample are between $1$ and $9$, and not infrequently, none of them are.

Now say you take another random sample of $20$ numbers from the same population, and the resulting confidence interval is from $2$ to $8.5$. And then you take another random sample of $20$ from the same population, and the confidence interval you get is from $1.5$ to $11$. And so on. Then $90\%$ of the time, the interval you get will include the population mean. That is what it means.
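
A quick simulation shows the same thing. It is a minimal sketch under assumptions I am adding for illustration: a normal population with mean 5 and standard deviation 10, a standard deviation that is treated as known (so a z-interval can be used), and samples of size 20. The helper names `z_interval` and `coverage` are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, level=0.90):
    """Two-sided confidence interval for the mean, assuming sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = fmean(sample)
    return centre - half_width, centre + half_width

def coverage(mu, sigma, n, trials, level=0.90, seed=0):
    """Fraction of repeated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma, level)
        hits += lo <= mu <= hi  # the interval moves from sample to sample; mu does not
    return hits / trials

rate = coverage(mu=5.0, sigma=10.0, n=20, trials=10_000)

# Share of individual population values between 1 and 9, for comparison.
population = NormalDist(5.0, 10.0)
share = population.cdf(9) - population.cdf(1)

print(f"intervals covering the mean: {rate:.3f}; population values in [1, 9]: {share:.3f}")
```

With these assumed numbers a typical interval has half-width of about 3.7, so it looks much like the interval from $1$ to $9$, yet only about a third of the individual values in the population fall in that range. The $90\%$ refers to how often the procedure captures the mean, not to where the data lie.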

  • thank you for the explanation. My mistake sorry. – Seyhmus Güngören Sep 08 '12 at 10:01
  • Just a question: if you draw a sample, that can also be a single sample. It doesn't make sense to do estimation with one sample, but it does, for example, for detection. Meanwhile I am confused: whenever we draw 20 samples, how do we calculate a new confidence interval? As far as I know, a confidence interval is fixed and should not depend on the samples you have. – Seyhmus Güngören Sep 08 '12 at 10:45
  • @SeyhmusGüngören : Read the account of confidence intervals in any textbook. You will find that a confidence interval always depends on the sample. Your confusion is merely the condition of someone who's never done that. – Michael Hardy Sep 08 '12 at 17:32
  • I agree. I have read it already. It depends on the sample size. No doubt on this matter. However, according to my understanding, whenever the sample size is fixed then so is the confidence interval. As a result, given 20 samples, we have one confidence interval; for example, for 90% we have $1-9$. Whenever we draw another 20 samples this interval does not change. However, according to your answer this does change. – Seyhmus Güngören Sep 08 '12 at 18:06
  • One more thing. What I am saying is based on the assumption that population mean is known. I think you assume that it is unknown and whenever you draw a sample you refine the population mean. Therefore confidence intervals are also approaching to the true intervals. That's why whenever a sample is received confidence interval also changes, in the direction of the true values. – Seyhmus Güngören Sep 08 '12 at 18:45
  • @SeyhmusGüngören : I've rarely if ever seen anyone more confused about a topic, when all the confusion would be resolved by simply reading a basic account in a beginning textbook. Confidence intervals depend on data, and not only on sample sizes, and if they did depend only on sample sizes, there'd be no reason to consider confidence intervals in the first place. You're just demonstrating that you're completely clueless. There is no "true interval" that confidence intervals converge to as the sample size increases. And you're misusing the word "sample" even after I corrected you. – Michael Hardy Sep 08 '12 at 20:13
  • @SeyhmusGüngören : A well-behaved confidence interval for a population mean converges to an interval of length $0$, containing only the population mean, as the sample size grows. Please stop making a fool of yourself. Read a basic textbook account of what confidence intervals are. – Michael Hardy Sep 08 '12 at 20:15
  • I suggest you to be kind. I remember another discussion of you with another person who is at least as well knowledged as you. You behaved in the same way to him as well. – Seyhmus Güngören Sep 08 '12 at 20:20
  • @SeyhmusGüngören : Your posted incorrect answer and your other incorrect and confused statement cause me to suspect something. Consider a normal distribution with (population) mean $\mu$ and variance $\sigma^2$. Say this distribution puts probability $0.9$ in the interval $\mu\pm a\sigma$. My suspicion is that you think either that that's what a confidence interval is, or that that is what a confidence interval is supposed to estimate. Both of those ideas are completely wrong. – Michael Hardy Sep 08 '12 at 20:44
  • Yes, sure, my post was incorrect. It was my mistake, as I told you. In my post I tried to say (on average), meaning the sample average. It was still incorrect. Additionally $P(1<X<9)=1/9$ was also incorrect. I realized my mistake immediately. Coming back to your last comment, I don't think that is the confidence interval at all. A confidence interval is defined over the distribution of an unknown, to-be-estimated parameter $\theta$. This can be either the mean $\mu$ or another unknown. Whenever a sample is drawn, this gives one estimate of that parameter. – Seyhmus Güngören Sep 08 '12 at 20:51
  • Having, say, $N$ estimates of that parameter from samples that we have drawn (say each having length $K$), we can now estimate the distribution of $\Theta$, our estimator of $\theta$. Now the confidence interval is $P(\gamma_1<\Theta<\gamma_2)=0.9$. Is this correct or not? – Seyhmus Güngören Sep 08 '12 at 20:55
0

Assuming a normal distribution for the population would only add a specific formula for obtaining a confidence interval. If you truly want the Bayesian posterior probability for the interval [1, 9], then the normal distribution is needed for the likelihood piece, but you also need to specify a prior distribution for the mean.
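
For concreteness, the usual textbook form of that formula when the variance is unknown is shown below. The notation is mine, not the question's: $\bar{x}$ is the sample mean, $s$ the sample standard deviation, and $n$ the sample size.

$$\bar{x} \pm t_{0.95,\,n-1}\,\frac{s}{\sqrt{n}}$$

Here $t_{0.95,\,n-1}$ is the $95$th percentile of Student's $t$ distribution with $n-1$ degrees of freedom, which leaves $5\%$ in each tail and so gives a two-sided $90\%$ interval.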

Michael R. Chernick