Bayesian statistics - explanation of evidence

Question

Despite trying to read multiple resources about Bayesian statistics, I cannot find a (free) resource that explains what exactly $P(D)$ is. Most resources explain it conceptually rather than numerically. Some call it the "evidence", some the "normalizing factor", and some the "marginal distribution". None of them gives an exact numeric value for this expression when working through examples.

Therefore, I would like to make my own numerical example.

Let's say that we are tossing a coin which is not fair, as $\theta = 0.7$. However, we believe that the coin is fair, thus our $P(\theta) = 0.5$. After flipping the coin 1000 times, we obtained 720 heads. Thus, $P(D|\theta) = 0.72$.

The formula is given by:

$$P(\theta|D) = \frac{P(D|\theta)P(\theta)}{P(D)}$$

Thus we have:

$$P(\theta|D) = \frac{0.72*0.5}{P(D)}$$

Given these values, I am unsure what $P(D)$ is and what goes into the denominator of the fraction. I would appreciate an explanation, thank you.

I have no idea what $D$ means here. What is your definition? It can't be "the event that you get exactly $720$ heads when you toss the biased coin $1000$ times"...that's some binomial expression. — lulu, Sep 25 '23 at 11:24
To use Bayesian methods you must have a prior distribution. If you are certain that $\theta =\frac 12$, then there is no mystery here. You just got the result that you got, probable or not. Usually, however, you were not absolutely sure of your value, so your prior had a distribution (unspecified in your example). Clarity is central...whatever assumptions you are making must be clear and explicit. — lulu, Sep 25 '23 at 11:26
@lulu From my understanding, you get a distribution because you repeat this process many times, not just once, i.e. there are iterations. However, in every iteration, you calculate the result of this fraction. To my understanding, everything that goes in the fraction is a single number. How am I going to put the distribution in the fraction? — J. Doe, Sep 25 '23 at 11:30
This doesn't make sense. "data" doesn't have probability, only events do. And the probability of observing exactly $720$ heads out of $1000$ tosses is very small, about $.01$ . Bayes doesn't generate a distribution...you need to get the prior some other way. All Bayes does is to let you refine your prior using observed results. As I say, if your prior was certainty regarding $\theta$ then the evidence changes nothing...it's just the outcome you got. — lulu, Sep 25 '23 at 11:33
@lulu I fail to understand this, unfortunately. Take a look at this resource, for example: https://www.quantstart.com/articles/Bayesian-Statistics-A-Beginners-Guide/
It is clearly stated that $P(\theta)$ is the prior. If it's a distribution and not a single number, then how can I put it in the fraction to calculate the result of the fraction? — J. Doe, Sep 25 '23 at 11:37
Again, you have not stated any prior, so anything I say would be pure guesswork. For a simple example, suppose your prior was that $\theta \in \{.65, .7, .75\}$ with equal probability. Then this experiment would be evidence that the $.65$ probably isn't right. Bayes lets you re-estimate your probabilities, and you'd get a lower value for $P(\theta = .65)$ and higher values for the other two. Of course, other priors are possible. — lulu, Sep 25 '23 at 11:42
So if I understood correctly, in this specific example, you would compute the result of the above written fraction thrice - once with $P(\theta) = .65$, once with $.7$ and once with $.75$. Did I get this correctly?
In any case, I still don't understand what would go in the denominator of the fraction. — J. Doe, Sep 25 '23 at 11:46
Not at all. You shouldn't be writing $P(\theta)$. You should be writing things like $P(\theta = .65)$, which is $\frac 13$ initially but which will get revised given the data. Since, I guess, you want $\theta$ to be a random variable, you can't speak of $P(\theta)$ as such. — lulu, Sep 25 '23 at 11:47
I suggest: work the problem I proposed. Then do it again, only now assume that you were quite confident (but not certain) of the $.7$ For instance, take $P(\theta=.7)=.9$ and $P(\theta = .65)=.05=P(\theta = .75)$. You should still see some gains in the higher values but now, your confidence was so high (and the result hardly surprising) that you won't move much. — lulu, Sep 25 '23 at 11:49
I am afraid that I don't understand this at all. I can't "not write" $P(\theta)$ when it is literally in the fraction which I am trying to understand. Are you trying to say that the fraction in the link I provided is wrong and I should be doing it some other way instead? — J. Doe, Sep 25 '23 at 11:51
I have no idea what's in the link. All I know is that you haven't defined your terms. I think I've made sensible guesses as to what your terms mean, but they are still guesses. Without clear definitions, there's nothing to say. Surely you agree that, if $X$ is a random variable, $P(X)$ has no meaning. $P(X=x_0)$ has a meaning, of course. But not $P(X)$. — lulu, Sep 25 '23 at 11:54
Quickly glanced at the link...as you should expect, it's very vague and informal (not bad for a short discussion of a complicated topic, but still). Why use it? In any case, the author clearly refers to "The probability of seeing data under a particular value of $\theta$" which makes sense. For any $\theta_0$ you can look at $P(\theta = \theta_0)$ as I have said. But it's a bad abuse of notation to refer to that as $P(\theta)$. How is the reader to guess which "particular value" you had in mind? — lulu, Sep 25 '23 at 11:57
Thanks for checking out the link. I was trying to summarise it in the comment. Alright, so $P(\theta)$ should actually be written out as $P(\theta = \theta_{0})$. I also get how you would calculate $P(D)$, since it is summed/integrated over every value of $\theta$ you might have, as indicated in the link. — J. Doe, Sep 25 '23 at 12:08
Once again, you need to have a prior distribution. I can not stress that enough. Without that, there is nothing to be done. If you have that then, as you say, you can simply sum or integrate to get the total probability that you observe the given result. — lulu, Sep 25 '23 at 12:10

lulu · Accepted Answer · 2023-09-25T12:44:22.483

To summarize the discussion in the comments: The source is very vague and informal, though not actually inaccurate. In particular, they abuse notation in an unhelpful manner...using $P(\theta)$ to denote the probability that $\theta$ is some "particular value", which value is then suppressed in the notation. Indeed, to use Bayes in the traditional manner, one must already have a prior distribution in mind. Bayes then lets you use the observed data to improve your distribution.

To illustrate, I'll work two examples based on the OP's scenario. In both cases I'll assume that we know, a priori, that $\theta$ is one of $\{.65, .7, .75\}$ but I'll sketch the analysis of two different distributions on that set.

Example I (uniform). Each of the values has the same prior probability, $\frac 13$.

Of course, given $\theta=\theta_0$, the probability of observing exactly $720$ heads out of $1000$ tosses is $$P(D\,|\,\theta_0)=\binom {1000}{720}\theta_0^{720}\times (1-\theta_0)^{280}$$ Thus the total probability of observing that result is given by the sum $$P(D)=\frac 13\times \left(\binom {1000}{720}.65^{720}\times (.35)^{280}+\binom {1000}{720}.7^{720}\times .3^{280}+\binom {1000}{720}.75^{720}\times (.25)^{280}\right)$$
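For readers who want the actual numbers, the sum above can be evaluated with a few lines of Python. This is only a minimal sketch: the helper name `binom_pmf` and the variable names are illustrative choices, not anything from the original discussion, and the likelihood is computed in log space via `math.lgamma` purely to avoid huge intermediate values.

```python
import math

def binom_pmf(k, n, theta):
    """P(exactly k heads in n tosses | heads probability theta)."""
    # log of the binomial coefficient C(n, k)
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_p = log_coeff + k * math.log(theta) + (n - k) * math.log(1 - theta)
    return math.exp(log_p)

n, k = 1000, 720
thetas = [0.65, 0.70, 0.75]        # the three candidate values of theta
prior = [1 / 3, 1 / 3, 1 / 3]      # uniform prior over the candidates

# P(D | theta_0) for each candidate
likelihoods = [binom_pmf(k, n, t) for t in thetas]

# P(D) = sum over theta_0 of P(D | theta_0) * P(theta = theta_0)
evidence = sum(p * l for p, l in zip(prior, likelihoods))

for t, l in zip(thetas, likelihoods):
    print(f"P(D | theta={t:.2f}) = {l:.3e}")
print(f"P(D) = {evidence:.3e}")
```

Under these assumptions $P(D)$ comes out a little under $.005$. So it is a single number, as the question asks, but one that depends on the prior you chose.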

To get the revised probability that $\theta = \theta_0$, given the data, we use Bayes. In this case we get $$P(\theta = .65\,|\,D)=.0000298\quad P(\theta = .7\,|\,D)=.798\quad P(\theta = .75\,|\,D)=.202$$
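Spelling out the Bayes step for one value may help. This is just the formula from the question with the particular value $\theta_0 = .7$ made explicit; the binomial coefficient appears in every term, so it cancels: $$P(\theta = .7\,|\,D)=\frac{P(D\,|\,.7)\,P(\theta=.7)}{P(D)}=\frac{\frac 13\times .7^{720}\times .3^{280}}{\frac 13\times\left(.65^{720}\times .35^{280}+.7^{720}\times .3^{280}+.75^{720}\times .25^{280}\right)}\approx .798$$ The other two values are obtained the same way, changing only the numerator.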

Thus (qualitatively) you can now pretty confidently reject the possibility that $\theta = .65$ though there is still a solid chance that $\theta = .75$

Example II (nearly certain that $\theta= .7$). Let's say the distribution is now $(.05, .9, .05)$ instead of uniform. The computation is exactly the same only now, instead of a constant factor of $\frac 13$ everywhere, we have weights. Thus $$P(D)=.05\times \binom {1000}{720}.65^{720}\times (.35)^{280}+.9\times \binom {1000}{720}.7^{720}\times .3^{280}+.05\times \binom {1000}{720}.75^{720}\times (.25)^{280}$$

Applying Bayes, we now get the revised probabilities to be $$P(\theta = .65\,|\,D)=.00000205\quad P(\theta = .7\,|\,D)=.986\quad P(\theta = .75\,|\,D)=.014$$
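Both updates follow the same recipe: weight each likelihood by its prior, then divide by the total. A small Python sketch of that recipe is given here; the function name `posterior` and its signature are hypothetical, chosen only for illustration. It drops the binomial coefficient, since that factor is common to every term and cancels, and it shifts the log-likelihoods before exponentiating to avoid underflow.

```python
import math

def posterior(thetas, prior, k, n):
    """Discrete Bayes update for k heads in n tosses.

    The binomial coefficient is omitted: it is the same for every theta
    and cancels when dividing by the evidence.
    """
    # log of theta^k * (1 - theta)^(n - k) for each candidate value
    log_like = [k * math.log(t) + (n - k) * math.log(1 - t) for t in thetas]
    # subtract the maximum before exponentiating to avoid underflow
    shift = max(log_like)
    weights = [p * math.exp(ll - shift) for p, ll in zip(prior, log_like)]
    total = sum(weights)  # proportional to P(D)
    return [w / total for w in weights]

thetas = [0.65, 0.70, 0.75]
post_uniform = posterior(thetas, [1 / 3, 1 / 3, 1 / 3], k=720, n=1000)
post_confident = posterior(thetas, [0.05, 0.90, 0.05], k=720, n=1000)

for name, post in [("uniform", post_uniform), ("confident", post_confident)]:
    print(name, [f"{p:.3g}" for p in post])
```

With the two priors used here, the posterior weight on $\theta = .7$ should come out near $.798$ and $.986$ respectively.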

Thus, in this case, the data has simply confirmed your prior strong belief that $\theta = .7$
