Conditional probabilities with impossible outcomes

Question

Suppose we want to predict the outcome of a race between three runners $A$, $B$, $C$. We know the prior probabilities for head-to-head runs: $p_{A>B}$, $p_{A>C}$, $p_{B>C}$, where $A>B$ means that A finishes before B. How do we get the probability $p_{A>B>C}$?

I thought I could just use a tree diagram, but I run into the following problem: some outcomes are impossible. For example, if my first edge is "A finishes before B" and my second edge is "A finishes after C", then "B finishes before C" is no longer possible. I thought about just setting the probability of impossible outcomes to 0, but then my outcome probabilities differ depending on which of the three probabilities I start with.

I think I'm making some basic mistake here.

I've seen a lot of similar questions but I don't think they answer this. I've also tried looking at how horse race outcomes are predicted, but couldn't find an answer that I understood.

Phil H · Accepted Answer · 2018-09-20T15:41:29.417

4

I don't think you can determine $P(A>B>C)$ from the pairwise probabilities alone. You would need to know how often each complete finishing order of $A$, $B$ and $C$ occurs, and different sets of finishing orders can give different values of $P(A>B>C)$ for the same pairwise probabilities.

For example, in one series of $10$ races the finishing orders were $ABC, ABC, ABC, ABC, ABC, ACB, ACB, BAC, BAC, BCA$, which gives $P(A>B) = 0.7$, $P(A>C) = 0.9$ and $P(B>C) = 0.8$.

In another series of $10$ races the finishing orders were $ABC, ABC, ABC, ABC, ABC, ABC, ACB, BAC, BAC, CBA$, which gives the same $P(A>B) = 0.7$, $P(A>C) = 0.9$ and $P(B>C) = 0.8$ as the first series.

However, $P(A>B>C)$ is $0.5$ for the first series and $0.6$ for the second.
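
The two series can be checked mechanically. The following is a minimal sketch, not part of the original answer; the function name `pairwise_and_full` and the string encoding of a finishing order (first place on the left) are assumptions made for illustration.

```python
from fractions import Fraction


def pairwise_and_full(series):
    """Return (P(A>B), P(A>C), P(B>C), P(A>B>C)) as exact fractions.

    Each entry of `series` is a finishing order such as "ACB",
    read left to right from first place to last place.
    """
    n = len(series)

    def freq(x, y):
        # Fraction of races in which runner x finished ahead of runner y.
        return Fraction(sum(r.index(x) < r.index(y) for r in series), n)

    full = Fraction(series.count("ABC"), n)
    return freq("A", "B"), freq("A", "C"), freq("B", "C"), full


# The two ten-race series quoted above.
series_1 = ["ABC"] * 5 + ["ACB"] * 2 + ["BAC"] * 2 + ["BCA"]
series_2 = ["ABC"] * 6 + ["ACB"] + ["BAC"] * 2 + ["CBA"]

print(pairwise_and_full(series_1))
print(pairwise_and_full(series_2))
```

The first three entries agree for both series (7/10, 9/10, 4/5), while the last entry is 1/2 for the first series and 3/5 for the second.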

Then again, for a given number of races, one could enumerate all the sets of finishing orders that are consistent with the given pairwise probabilities and treat each one as equally likely. But to do that, one has to know the number of races.
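
For a small number of races this enumeration can be done by brute force. The sketch below is an illustration only: the function name `consistent_counts` is hypothetical, and it counts distinct tallies of finishing orders (how many times each order occurred), ignoring the sequence in which the races were run. Whether each tally should be weighted equally is a modelling choice that the answer leaves open.

```python
from itertools import product

ORDERS = ["ABC", "ACB", "BAC", "BCA", "CAB", "CBA"]


def consistent_counts(n_races, wins_ab, wins_ac, wins_bc):
    """List every tuple of counts (one per entry of ORDERS) that matches
    the given pairwise win counts over n_races races."""
    found = []
    for counts in product(range(n_races + 1), repeat=len(ORDERS)):
        if sum(counts) != n_races:
            continue
        tally = dict(zip(ORDERS, counts))

        def ahead(x, y):
            # Number of races in which x finished before y.
            return sum(c for order, c in tally.items()
                       if order.index(x) < order.index(y))

        observed = (ahead("A", "B"), ahead("A", "C"), ahead("B", "C"))
        if observed == (wins_ab, wins_ac, wins_bc):
            found.append(counts)
    return found


# Ten races with A>B in 7, A>C in 9 and B>C in 8 of them.
solutions = consistent_counts(10, 7, 9, 8)
for counts in solutions:
    print(dict(zip(ORDERS, counts)))
```

For ten races with these pairwise counts, only three tallies are consistent, with $ABC$ occurring 5, 5 and 6 times respectively. Two of them are the series listed above.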

edited Sep 20 '18 at 15:41

answered Sep 20 '18 at 15:28


What if we assume there is only one race? Say we have used the race series you've suggested above in order to determine the pairwise win probabilities, and we now only want to predict the next outcome. Does it still matter if the probabilities came from the first or from the second series? – Darina Sep 21 '18 at 11:52
I think the situation is similar to this scenario. A room has 50 people, 25 male and 25 female. Also, 25 are black and 25 are white. What is the probability of randomly selecting a black female? We don't know the intersection of gender and color, so there could be 25 black females in the room (p = 0.5) or none (p = 0, if all 25 black people are male). This is the equivalent of the information we are missing in your problem. While the pairwise probabilities are the same, taking P(A>B>C) from either series is subject to error. – Phil H Sep 22 '18 at 02:15
Thanks. Thinking further, for your example, there is no probability, but a uniform probability distribution between 0 and 0.5. Could we do the same with the races? There are infinitely many races that lead to the same priors, but we should be able to find a distribution for P(A>B>C), right? Should this be a new question? – Darina Sep 24 '18 at 09:22
This is what I alluded to in my comment to Christian in his answer. While theoretically my example would result in a 0.25 probability of picking a black female, in reality, for a given specific configuration it could be off by 0.25. Even so, we could take the mean of a uniform distribution of possible intersections and determine a probability. It may be a good idea to write it as a new question as new eyes can add more to this idea. – Phil H Sep 24 '18 at 13:00

score 4 · Answer 2 · answered Sep 20 '18 at 18:21

Let
$$S_1:\ A>B>C,\quad S_2:\ A>C>B,\quad S_3:\ B>A>C,\quad S_4:\ B>C>A,\quad S_5:\ C>A>B,\quad S_6:\ C>B>A$$
be the six possible rankings (in lexicographic order) and $p_i$ $(1\leq i\leq 6)$ their probabilities. Then we have the four equations
$$\begin{aligned}P[A>B]&=p_1+p_2+p_5\\ P[A>C]&=p_1+p_2+p_3\\ P[B>C]&=p_1+p_3+p_4\\ 1&=p_1+p_2+p_3+p_4+p_5+p_6\ .\end{aligned}$$
Here the left-hand sides are given. These four equations are insufficient to determine the six $p_i$ individually; in particular, $p_1$ is not determined by the given data. For example, adding the first and third equations and subtracting the fourth gives
$$p_1=P[A>B]+P[B>C]+p_6-1\ ,$$
but we have no information about $p_6$.
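
As a worked illustration (the numbers are borrowed from the ten-race example in the accepted answer, not from this answer), take $P[A>B]=0.7$, $P[A>C]=0.9$, $P[B>C]=0.8$. Using only the four equations and $p_i\geq 0$:

$$\begin{aligned}
p_1 &= 0.7+0.8+p_6-1 = 0.5+p_6 \;\geq\; 0.5,\\
p_4 &= P[B>C]-p_1-p_3 = P[B>C]-P[A>C]+p_2 = p_2-0.1 \;\geq\; 0 \quad\Longrightarrow\quad p_2\geq 0.1,\\
p_5 &= P[A>B]-p_1-p_2 \;\geq\; 0 \quad\Longrightarrow\quad p_1 \leq 0.7-p_2 \leq 0.6 .
\end{aligned}$$

So for these pairwise probabilities $p_1$ can be anywhere in $[0.5,\,0.6]$. Both endpoints are attained by the two race series in the accepted answer.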

That's a neat presentation. We do not know $P(A>B \cap A>C)$, the probability that both $A>B$ and $A>C$ occur, but I was deliberating whether it would be correct to take the mean of its possible range and then apply $P(B>C)$ to it. This is easy to calculate when $P(A>B) + P(A>C)\ge 1$ but a little trickier when it's $<1$. – Phil H, Sep 20 '18 at 20:17

score 2 · Answer 3 · answered Sep 20 '18 at 15:55

Along the lines of what the other responder has said, you can condition the probabilities as shown in the table below.

The first finishing order is $A>B>C$, whose weight is the product $P(A>B)\cdot P(B>C)\cdot P(A>C)$. There are six such finishing orders: $\{ABC, ACB, BAC, BCA, CAB, CBA\}$. To condition, divide the weight of the order you want by the sum of all six weights, i.e. $\frac{ABC}{\text{sum of all}}$.
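
A minimal sketch of this normalisation follows. It is one reading of the answer, not a confirmed implementation: it assumes the three pairwise outcomes are treated as independent and that $P(Y>X) = 1 - P(X>Y)$ (no ties). The function name `normalized_order_probs` is hypothetical, and the input values are borrowed from the accepted answer's example.

```python
from itertools import permutations


def normalized_order_probs(p_ab, p_ac, p_bc):
    """Weight each finishing order by the product of its three pairwise
    probabilities, then divide by the sum over all six orders.

    Assumption: the pairwise outcomes are treated as independent, so the
    two cyclic (impossible) combinations are simply dropped.
    """
    pair = {("A", "B"): p_ab, ("A", "C"): p_ac, ("B", "C"): p_bc}

    def p_before(x, y):
        # P(x finishes before y); the reverse direction is the complement.
        return pair[(x, y)] if (x, y) in pair else 1.0 - pair[(y, x)]

    weights = {}
    for first, second, third in permutations("ABC"):
        weights[first + second + third] = (
            p_before(first, second)
            * p_before(second, third)
            * p_before(first, third)
        )
    total = sum(weights.values())
    return {order: w / total for order, w in weights.items()}


probs = normalized_order_probs(0.7, 0.9, 0.8)
print(probs["ABC"])
```

With these inputs the weight of $ABC$ is $0.7\cdot 0.8\cdot 0.9 = 0.504$ and the six weights sum to $0.89$, so the normalised value is about $0.566$. This is one particular choice among the distributions consistent with the pairwise probabilities; as the other answers point out, the pairwise probabilities alone do not fix $P(A>B>C)$.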
