Applying law of total probability to conditional probability

Question

I was solving problems based on Bayes' theorem from the book "A First Course in Probability" by Sheldon Ross. The problem reads as follows:

An insurance company believes that there are two types of people: accident prone and not accident prone. Company statistics show that an accident-prone person will have an accident in any given year with probability $0.4$, whereas the probability is $0.2$ for a person who is not accident prone. If we assume that $30\%$ of the population is accident prone, what is the conditional probability that a new policyholder will have an accident in his or her second year of policy ownership, given that the policyholder has had an accident in the first year?

The solution given is as follows:

Book Solution
$$ \begin{align} P(A)=0.3 & & (given)\\ \therefore P(A^c)=1-P(A)=0.7 & & \\ P(A_1|A)=P(A_2|AA_1)=0.4 & &(given)\\ P(A_1|A^c)=P(A_2|A^cA_1)=0.2 & & (given) \end{align} $$ $$ P(A_1)=P(A_1|A)P(A)+P(A_1|A^c)P(A^c) =(.4)(.3)+(.2)(.7)=.26 \\ P(A|A_1)=\frac{(.4)(.3)}{.26}=\frac{6}{13} \\ P(A^c|A_1)=1-P(A|A_1)=\frac{7}{13} $$ $$ \begin{align} P(A_2|A_1)& =P(A_2|AA_1)P(A|A_1)+P(A_2|A^cA_1)P(A^c|A_1) &&...(I)\\ &=(.4)\frac{6}{13}+(.2)\frac{7}{13}\approx .29\\ \end{align} $$

I don't understand statement $(I)$.

My Solution
Shouldn't it be like this: $$P(A_2|A_1)=P(A_2|AA_1)P(AA_1)+P(A_2|A^cA_1)P(A^cA_1)$$ Continuing further:
$$ \begin{align} P(A_2|A_1)&=P(A_2|AA_1)P(A_1|A)P(A)+P(A_2|A^cA_1)P(A_1|A^c)P(A^c)\\ &=(.4)(.4)(.3)+(.2)(.2)(.7)=0.076 \end{align} $$

Am I wrong? If yes, where did I go wrong?

Added Later

After going through the comments and thinking more, it seems that I am struggling to apply the law of total probability (and my solution above may very well be wrong). The basic form of the law of total probability that I have come across so far is: $$P(A)=P(A|\color{red}{B})P(\color{red}{B})+P(A|\color{magenta}{B^c})P(\color{magenta}{B^c})$$ This is the first time I am facing an application of this law to a conditional probability, as done in the book solution: $$P(A_2|A_1)=P(A_2|AA_1)P(A|A_1)+P(A_2|A^cA_1)P(A^c|A_1)$$ as it involves three events ($A,A_1,A_2$). The book did not explain this. Though in the current problem it looks "somewhat" intuitive,

can someone generalize it, so as to make my understanding more clear? Say for $n$ events?
Also, in $P(A_2|A_1)=P(A_2|\color{red}{AA_1})P(\color{red}{A|A_1})+P(A_2|\color{magenta}{A^cA_1})P(\color{magenta}{A^c|A_1})$, I feel red colored stuff should be same and pink colored stuff should be same, as in case of simple form law of total probability.
I felt it should be $P(A_2|\color{red}{(A_1|A)})P(\color{red}{A_1|A})+P(A_2|\color{magenta}{(A_1|A^c)})P(\color{magenta}{A_1|A^c})$. Am I absolutely stupid here?
For a moment I felt it's related to: $P(E_1E_2E_3...E_n)=P(E_1)P(E_2|E_1)P(E_3|E_1E_2)...P(E_n|E_1...E_{n-1})$. Is it so?

I am now screwed at my ability to apply law of total probability. Please enlighten me.

As stated the problem is not clear. Did you mean to say that "accident prone people have at least one accident in each given year with probability $.4$" (and $.2$ for the rest)? As it stands, the probability is only given for the first year. — lulu, Jul 31 '17 at 15:27
Just to say, assuming I am reading the problem correctly, your answer is obviously too low. The least the probability could be is $.2$ and we believe it is higher than that since the first year accident is evidence that our fellow is accident prone. — lulu, Jul 31 '17 at 15:30
There is one reflection that may clarify the difference: notice that in your proposed equation, $P(A_2\vert A_1) = P(\cdot)\, P(AA_1) + P(\cdot)\, P(A^cA_1)$, you are throwing out the window some critical information that is already given to you: $A_1$ has actualized itself. — Antoni Parellada, Jul 31 '17 at 15:36
@lulu it is "any given year". Actually, the book gives a first problem to find $P(A_1)$ in an earlier chapter, and a later chapter refers back to that problem, asking to find $P(A_2|A_1)$. However, the second problem changed "first year" to "any given year". By mistake, I typed that part from the first problem. I have modified the original question to correct it. — RajS, Jul 31 '17 at 15:55
Sure, I figured that was the meaning. Do you understand my argument that the answer can not be less than $.2$ ? — lulu, Jul 31 '17 at 16:02
It seems that I am struggling to apply the law of total probability. The basic form I have faced until now is $P(A)=P(A|B)P(B)+P(A|B^c)P(B^c)$. This is the first time I am facing the above application of the law of total probability to $P(A_2|A_1)$, as it involves three events ($A,A_1,A_2$). The book did not explain this. Though in the current problem it looks intuitive, can someone generalize it, so as to make my understanding clearer? Say for $n$ events? For a moment I felt it's related to: $P(E_1E_2E_3...E_n)=P(E_1)P(E_2|E_1)P(E_3|E_1E_2)...P(E_n|E_1...E_{n-1})$. Is it so? — RajS, Jul 31 '17 at 16:38
More precisely, in $P(A_2|A_1)=P(A_2|\color{red}{AA_1})P(\color{red}{A|A_1})+P(A_2|\color{magenta}{A^cA_1})P(\color{magenta}{A^c|A_1})$, I feel red colored stuff should be same and pink colored stuff should be same. I felt it should be $P(A_2|\color{red}{(A_1|A)})P(\color{red}{A_1|A})+P(A_2|\color{magenta}{(A_1|A^c)})P(\color{magenta}{A_1|A^c})$. Am I absolutely stupid here? I must be absolutely screwed at my concepts if I am wrong with this. Please enlighten me. — RajS, Jul 31 '17 at 16:51
@lulu and Antoni, I have clearly stated some doubts at the end of the original question. Can you please have a look? — RajS, Jul 31 '17 at 17:27

score 17 · Accepted Answer · answered Aug 01 '17 at 05:14

can someone generalize it, so as to make my understanding more clear? Say for $n$ events?

If $(B_k)_n$ is a sequence of $n$ events that partition the sample space (or if at least $(B_k\cap A_1)_n$ partitions $A_1$) then, $\mathsf P(A_2\mid A_1) = \sum_{k=1}^n \mathsf P(A_2\mid A_1\cap B_k)\mathsf P(B_k\mid A_1)$
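
A minimal numeric sketch of this identity, assuming a made-up joint distribution over a three-block partition. The helper names `random_joint`, `prob` and `cond` are illustrative only, and the random weights have nothing to do with the insurance problem:

```python
import itertools
import random

def random_joint(n_blocks, seed=0):
    """Return a hypothetical joint pmf over outcomes (k, a1, a2).

    k indexes the partition block B_k; a1 and a2 flag whether A_1 and A_2 occur.
    """
    rng = random.Random(seed)
    outcomes = list(itertools.product(range(n_blocks), (False, True), (False, True)))
    weights = [rng.random() + 0.01 for _ in outcomes]  # strictly positive weights
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}

def prob(pmf, event):
    """Probability of the set of outcomes satisfying the predicate `event`."""
    return sum(p for o, p in pmf.items() if event(o))

def cond(pmf, event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(pmf, lambda o: event(o) and given(o)) / prob(pmf, given)

n = 3
pmf = random_joint(n)
A1 = lambda o: o[1]
A2 = lambda o: o[2]

lhs = cond(pmf, A2, A1)
rhs = sum(
    cond(pmf, A2, lambda o, k=k: A1(o) and o[0] == k)  # P(A2 | A1 ∩ B_k)
    * cond(pmf, lambda o, k=k: o[0] == k, A1)          # P(B_k | A1)
    for k in range(n)
)
print(abs(lhs - rhs) < 1e-12)
```

Changing `n` or the seed should leave the two sides equal up to floating-point error, since the identity holds for any partition with positive block probabilities inside $A_1$.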

Also, in $P(A_2|A_1)=P(A_2|\color{red}{AA_1})P(\color{red}{A|A_1})+P(A_2|\color{magenta}{A^cA_1})P(\color{magenta}{A^c|A_1})$, I feel red colored stuff should be same and pink colored stuff should be same, as in case of simple form law of total probability.

They are not the same in the case of the simple form. So why should they be?

Where $\Omega$ is the entire sample space, then:

$$\begin{align}\mathsf P(A_2) &= \mathsf P(A_2\mid \Omega)\\ &=\mathsf P(A_2\mid \color{red}{A}, \Omega)\,\mathsf P(\color{red}{A}\mid \Omega)+\mathsf P(A_2\mid \color{magenta}{A^c}, \Omega)\,\mathsf P(\color{magenta}{A^c}\mid \Omega)\\ &=\mathsf P(A_2\mid \color{red}{A})\,\mathsf P(\color{red}{A})+\mathsf P(A_2\mid \color{magenta}{A^c})\,\mathsf P(\color{magenta}{A^c})\end{align}$$

I felt it should be $P(A_2|\color{red}{(A_1|A)})P(\color{red}{A_1|A})+P(A_2|\color{magenta}{(A_1|A^c)})P(\color{magenta}{A_1|A^c})$. Am I absolutely stupid here?

:) Well, I would not say absolutely. But seriously, it is a rather common misunderstanding.

The conditioning bar is not a set operation. It separates the event from the condition that the probability function is being measured over. There can only be one inside any probability function; they do not nest.
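
One way to see this is to model events as plain sets of outcomes. The sketch below is an illustration, not part of the book's solution: it builds the eight outcomes `(prone, acc1, acc2)` of the insurance problem, assuming (as the book does) that the two years are independent once the type is known. The function `P` is a hypothetical helper that takes an event and an optional condition, both ordinary sets.

```python
from fractions import Fraction as F
from itertools import product

# Numbers from the problem statement.
P_PRONE = F(3, 10)
P_ACC = {True: F(4, 10), False: F(2, 10)}  # yearly accident probability by type

def weight(prone, acc1, acc2):
    """Probability of one outcome, assuming years are independent given the type."""
    p = P_PRONE if prone else 1 - P_PRONE
    for acc in (acc1, acc2):
        p *= P_ACC[prone] if acc else 1 - P_ACC[prone]
    return p

pmf = {o: weight(*o) for o in product((True, False), repeat=3)}

def P(event, given=None):
    """P(event | given); both arguments are plain sets of outcomes."""
    given = set(pmf) if given is None else given
    return sum(pmf[o] for o in event & given) / sum(pmf[o] for o in given)

A = {o for o in pmf if o[0]}   # accident prone
A1 = {o for o in pmf if o[1]}  # accident in year 1
A2 = {o for o in pmf if o[2]}  # accident in year 2

# Conditioning on "A and A1" means intersecting the two sets: A & A1.
print(P(A2, A & A1))  # 2/5
print(P(A2, A1))      # 19/65
```

There is no set that could stand for "$A_1\mid A$", so there is nothing to pass as a condition other than an intersection such as `A & A1`.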

For a moment I felt it's related to: $P(E_1E_2E_3...E_n)=P(E_1)P(E_2|E_1)P(E_3|E_1E_2)...P(E_n|E_1...E_{n-1})$. Is it so?

Yes, this is so. Specifically: $$\begin{align}\mathsf P(A_2,A,A_1)&=\mathsf P(A_2\mid A,A_1)\,\mathsf P(A\mid A_1)\,\mathsf P(A_1)\\ \mathsf P(A_2,A^\mathsf c,A_1)&=\mathsf P(A_2\mid A^\mathsf c,A_1)\,\mathsf P(A^\mathsf c\mid A_1)\,\mathsf P(A_1)\end{align}$$

$$\begin{align}\mathsf P(A_2\mid A_1) ~ & = \mathsf P((A\cup A^\mathsf c){\cap} A_2\mid A_1) && \text{Union of Complements} \\[1ex] & = \mathsf P((A{\cap}A_2)\cup(A^\mathsf c{\cap}A_2)\mid A_1) && \text{Distributive Law} \\[1ex] & = \mathsf P(A{\cap}A_2\mid A_1) + \mathsf P(A^\mathsf c{\cap}A_2\mid A_1) && \text{Additive Rule for Union of Exclusive Events} \\[1ex] & = \dfrac{\mathsf P(A{\cap}A_1{\cap}A_2)+\mathsf P(A^\mathsf c{\cap}A_1{\cap}A_2)}{\mathsf P(A_1)} && \text{by Definition} \\[1ex] & = \dfrac{\mathsf P(A_2\mid A{\cap}A_1)\,\mathsf P(A{\cap}A_1)+\mathsf P(A_2\mid A^\mathsf c{\cap}A_1)\,\mathsf P(A^\mathsf c{\cap}A_1)}{\mathsf P(A_1)} && \text{by Definition} \\[1ex] & = {\mathsf P(A_2\mid A{\cap}A_1)\,\mathsf P(A\mid A_1)+\mathsf P(A_2\mid A^\mathsf c{\cap}A_1)\,\mathsf P(A^\mathsf c\mid A_1)} && \text{by Definition of Conditional Probability} \end{align}$$
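
As a rough check of the last three lines of this derivation with the problem's numbers, here is a short sketch in exact fractions. The variable names are made up for illustration, and it relies on the book's assumption that $\mathsf P(A_2\mid A\cap A_1)=0.4$ and $\mathsf P(A_2\mid A^\mathsf c\cap A_1)=0.2$:

```python
from fractions import Fraction as F

p_A = F(3, 10)                           # P(A): accident prone
p_acc_A, p_acc_Ac = F(4, 10), F(2, 10)   # yearly accident probability by type

# Joint probabilities P(A ∩ A1 ∩ A2) and P(A^c ∩ A1 ∩ A2) via the chain rule.
p_A_A1_A2 = p_acc_A * p_acc_A * p_A
p_Ac_A1_A2 = p_acc_Ac * p_acc_Ac * (1 - p_A)
numerator = p_A_A1_A2 + p_Ac_A1_A2           # P(A1 ∩ A2), the 0.076 from the question

p_A1 = p_acc_A * p_A + p_acc_Ac * (1 - p_A)  # P(A1) = 0.26

# Last line of the derivation: the weights are P(A | A1) and P(A^c | A1).
p_A_given_A1 = p_acc_A * p_A / p_A1
via_posterior = p_acc_A * p_A_given_A1 + p_acc_Ac * (1 - p_A_given_A1)

print(numerator, p_A1, numerator / p_A1, via_posterior)  # 19/250 13/50 19/65 19/65
```

The value $0.076$ computed in the question is therefore the joint probability $\mathsf P(A_1\cap A_2)$; dividing it by $\mathsf P(A_1)=0.26$ gives the book's $19/65\approx 0.29$.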

Graham Kemp

This is a very, very satisfying answer. Why does no book state the law of total probability with an explicit $\Omega$ in it, at least once, like you did: $\mathsf P(A_2)=\mathsf P(A_2\mid \color{red}{A} \Omega)P(\color{red}{A}\mid \Omega)+\mathsf P(A_2\mid \color{magenta}{A^c} \Omega)\mathsf P(\color{magenta}{A^c}\mid \Omega)$. Is it extremely obvious, or am I too dull, or am I a bad reader? Anyway, this clears everything up for me. Thanks a billion!!! – RajS Aug 02 '17 at 06:12
Would you mind explaining your first step? Namely: $\mathsf P(A_2\mid A_1) = \sum_{k=1}^n \mathsf P(A_2\mid A_1\cap B_k)\mathsf P(B_k\mid A_1)$. If we apply TPT directly, we, instead, get: $\mathsf P(A_2\mid A_1) = \sum_{k=1}^n \mathsf P(A_2\mid A_1\mid B_k)\mathsf P(B_k\mid A_1)$. How come $P(A_2\mid A_1\mid B_k) = P(A_2\mid A_1\cap B_k)$? – alwaysiamcaesar Mar 15 '19 at 05:36
There is no such thing as $P(A_2\mid A_1\mid B_k)$ . $~$ The '$\mid$' symbol is placed between the event measured and the condition it is being measured over; there can only be at most one in any probability measure function. $P(A_2\mid A_1\cap B_k)$ (sometimes abbreviated as $P(A_2\mid A_1,B_k)$) is the conditional probability of event $A_2$ under condition of the intersection of events $A_1$ and $B_k$. @alwaysiamcaesar – Graham Kemp Mar 16 '19 at 05:59
It's been a while, but I was wondering if you could provide a reference for the first relationship in your answer? I just recently saw it used in a couple of places and I would like to read more on it. – David Mar 30 '20 at 23:30
@David $\mathsf P(A_2\mid A_1)=\sum_{k=1}^n \mathsf P(A_2\mid A_1\cap B_k)\,\mathsf P(B_k\mid A_1)$ is just an application of the Law of Total Probability. ${(B_k)}_n$ partitioning the outcome set $\Omega$ means that the events are mutually exclusive and exhaustive. (Each pairwise intersection is empty, and the union of all $n$ forms $\Omega$.) – Graham Kemp Mar 30 '20 at 23:41
Right, I guess this is just foreign to me, and when I read https://en.wikipedia.org/wiki/Law_of_total_probability, it stated that "In probability theory, the law (or formula) of total probability is a fundamental rule relating MARGINAL probabilities to conditional probabilities." But in your example, I only see conditional probabilities and no marginals. – David Mar 30 '20 at 23:45
In addition, does $A_1$ and $A_2$ form a different partitioning of the sample space, $\Omega$? – David Mar 30 '20 at 23:49
The Law of Total Probability may be extended to conditionals. $A_1, A_2$ need not be events within a partition; they are just any two events. – Graham Kemp Mar 30 '20 at 23:51
$\begin{align}\mathsf P(A\mid C)&=\dfrac{\mathsf P(A\cap C)}{\mathsf P(C)}\\[1ex]&=\dfrac{\sum_k\mathsf P(A\cap B_k\cap C)}{\mathsf P(C)}\\[1ex]&=\dfrac{\sum_k\mathsf P(A\mid B_k\cap C)\,\mathsf P(B_k\cap C)}{\mathsf P(C)}\\[1ex]&=\sum_k\mathsf P(A\mid B_k\cap C)\,\mathsf P(B_k\mid C)\end{align}$ – Graham Kemp Mar 30 '20 at 23:57
That derivation was really helpful. Can the law of total probability also be extended to joint probabilities? It would seem so. – David Mar 31 '20 at 00:19
Yes, of course. It holds for any event, including an intersection. $$\mathsf P(A\cap D\mid C)=\sum_k\mathsf P(A\cap D\mid B_k\cap C)\,\mathsf P(B_k\mid C)$$ – Graham Kemp Mar 31 '20 at 01:03
Searched everywhere, this derivation was apt! Thanks! – Anish Aralikatti Sep 11 '22 at 06:22

Satish Ramanathan · Answer 2 · 2017-08-01T02:03:33.970

Another rationale for the answer is to get the cue from the statements:

1) Probability that an accident-prone driver will have an accident in a given year is 0.4

2) Probability that a non-accident-prone driver will have an accident in a given year is 0.2

3) Probability that a person is accident prone is 0.3

These are all given:

What we can derive is the joint probability $P(\text{Having an accident and Accident Prone}) =.4\times .3$

$P(\text{Having an accident and Not Accident Prone}) = .2\times .7$

Now use Bayes theorem to find $P(\text{if the person is accident prone/he has had an accident}) = \frac{.4\times .3} {.4\times .3+.2\times .7}$

$P(\text{if the person is not accident prone/he has had an accident}) = \frac{.2\times .7} {.4\times .3+.2\times .7}$

Now the new person has had an accident in the first year. (Given).

We need to find out the probabilities of whether he could be classified as accident prone or not accident prone which is what you found in the last two steps.

Having found that, now use the Total Probability rule to find the probability that he will have an accident in the second year (which, once his type is known, does not depend on the first year) using the first two facts.

$P(\text{Accident on the second year/he has had an accident}) = P(\text{accident on a given year/Accident Prone})\times P(\text{if the person is accident prone/he has had an accident}) + P(\text{accident on a given year/Not Accident Prone})\times P(\text{if the person is not accident prone/he has had an accident})$
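
The same reasoning can be checked empirically. The following is only a simulation sketch under the problem's stated assumptions (30% accident prone, yearly accident probability 0.4 or 0.2 depending on type, years independent given the type); the function name `simulate`, the sample size, and the seed are arbitrary choices.

```python
import random

def simulate(n_people=200_000, seed=1):
    """Estimate P(accident in year 2 | accident in year 1) by simulation."""
    rng = random.Random(seed)
    first_year = both_years = 0
    for _ in range(n_people):
        prone = rng.random() < 0.3       # type of the policyholder
        p = 0.4 if prone else 0.2        # yearly accident probability for that type
        acc1 = rng.random() < p
        acc2 = rng.random() < p
        if acc1:                         # keep only people with a first-year accident
            first_year += 1
            both_years += acc2
    return both_years / first_year

estimate = simulate()
print(round(estimate, 3))
```

The estimate should land near $19/65\approx 0.292$, well above the $0.076$ obtained when the division by the probability of a first-year accident is left out.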

I have clearly stated some doubts at the end of the original question. Can you please have a look? — RajS, Jul 31 '17 at 17:28
https://en.wikipedia.org/wiki/Law_of_total_probability. I want to highlight that the law of total probability given in the first few paragraphs is for a sample space with a finite or countably infinite partition, and gives P(A) for an event A in that sample space. The same thing is extended to conditional probabilities in the last paragraph of the first section. This is what you are dealing with. You are too bogged down by notation. As long as you understand the events and the partitions of the sample space, you can solve most problems of this nature. — Satish Ramanathan, Jul 31 '17 at 17:46
