1

This question is from the second part of the problem posted on how to find the expected number of boxes with no balls. Essentially, the question asks "Given 10 balls that are randomly distributed among 5 boxes, what is the expected number of boxes with exactly 1 ball?"

I looked at Andre's answer, the second part labeled "Expected Number with 1 Ball."

In his comments he states that the indicator random variables he introduced aren't independent, which I agree with, but it seems his calculation for $P(Y_i=1)$ assumes that the random variables are indeed IID. I think I have the same concern as the first comment under his answer, excluding the part about linearity of expectation.

Doesn't the number of balls in box $i$ affect the number of balls, and therefore the probability distribution, in box $j$, where $j\neq i$? If so, then where is that accounted for in the following probability formulation that he posted?

$$ P(Y_i=1)=\binom{10}{1}\left(\frac{1}{5}\right)^1\left(\frac{4}{5}\right)^9 $$ The $\frac{1}{5}\frac{4}{5}^9$ is the probability of 1 of the 10 distinguishable balls being placed in box $i$ and the other 9 balls in the other 4 boxes. There are 10 balls, hence we multiply by $\binom{10}{1}$. But I'm not seeing how his probability accounts for whether or not the other 4 fixes already has a/some ball(s).

24n8
  • 1,455
  • Do not confuse what he said. Let the balls be distinctly labeled. The ten events corresponding to whether or not ball $i$ is in bucket $1$ are independent. The five events corresponding to bucket $i$ has exactly one ball in it are what are not independent. The value of $\binom{10}{1}\left(\frac{1}{5}\right)^1\left(\frac{4}{5}\right)^9$ he calculated is the probability that box $i$ has exactly $1$ ball in it and follows trivially from the binomial distribution. – JMoravitz May 04 '20 at 19:56
  • The punchline is that expected value is linear, which implies that we don't care exactly how these events are dependent upon one another. It doesn't matter whether or not $A$ and $B$ are independent or dependent random variables. Regardless how $A$ and $B$ are related, you will always have $E[A+B]=E[A]+E[B]$ – JMoravitz May 04 '20 at 19:58
  • @JMoravitz Right, I do understand these points, but I'm still confused because in my mind it just doesn't seem $P(Y_i=1)$ in its current state accounts for whether the other 4 boxes are filled or not. – 24n8 May 04 '20 at 19:59
  • Exactly! It doesn't! And... it doesn't make any difference! That is the whole point. – JMoravitz May 04 '20 at 20:00
  • @JMoravitz Hmm I'm still trying to wrap my head around this. Like this approach assumes there are 10 balls available to choose from for the current box. I feel like the number of balls available to choose from should be dependent on the distribution of balls in the other boxes. – 24n8 May 04 '20 at 20:17

1 Answers1

1

To reiterate some points... first, let $Y_i$ be the random variable which takes value $1$ if box $i$ contains exactly one ball and value $0$ otherwise.

It follows that the random variable $Y$ which counts the total number of boxes with exactly one ball in them is then equal to $Y=Y_1+Y_2+Y_3+Y_4+Y_5$.

Now... this is very important so listen closely... it doesn't matter whether or not these $Y_i$ are dependent or independent in the slightest! Regardless how they interact with one another, we will have $E[Y]=E[Y_1+Y_2+Y_3+Y_4+Y_5]=E[Y_1]+E[Y_2]+E[Y_3]+E[Y_4]+E[Y_5]$. I'll say it a third or fourth time... this works regardless how these random variables interact with one another. It works if they are independent. It works if they are dependent. It works if some are independent and others are dependent. It works!

You may have confused yourself with other related properties, such as $E[X\times Y]=E[X]\times E[Y]$ which might only be guaranteed $X$ and $Y$ are independent and might fail otherwise. This is not the same property as what we are using above. I'll say it a fifth time... the property above about the expected value of a sum being the sum of expected values works always.


Now... we turn our attention to finding $Pr(Y_i=1)$. For this, we make an assumption as to the random processes involved in distributing the balls into the boxes. The most sensible assumption is that we one at a time take a ball and then uniformly at random select a box to put the ball in. We then take the next ball and independently and again uniformly select which box to put the ball into.

Yes... the balls may themselves be identical to the casual observer. We may choose to imagine however what happens if they balls happened to be uniquely labeled and distinguishable. Doing so should not in any way affect the probabilities involved, though they may affect the size of the sample space or in what format the outcomes appear. What may have before been treated as "there was one ball in the first box and nine balls in the second box" may now be distinguished between the possibilities "the first ball is in the first box and the remaining nine balls are in the second box" versus "the second ball is in the first box and the remaining nine balls are in the second box" etc...

Recall, that to use counting techniques to approach probability questions, we require that the sample space consist of outcomes each of which are equally likely to occur. Using the balls as distinguishable and the boxes as distinguishable, these outcomes are now able to be seen as equally likely (rigorously shown via the assumptions made before about how we decided how to distribute the balls).

So... in calculating $Pr(Y_i=1)$, we see that there are $5^{10}$ different possibilities in the sample space, the $5^{10}$ different ways in which we can distribute balls into boxes, calculated by routine application of the rule of product. Among those, there are $10$ ways to choose which one ball goes into our box $i$. For the remaining nine balls, there are $4$ options for which box to place each into, which by rule of product arrives at a total number of $4^9$ ways to distribute the remaining balls.

Note... we don't bother keeping track of whether or not any other boxes received one ball at the moment! All we are interested in is whether or not box $i$ got one ball. If any others do or don't, it does not matter to us! Here, we are solely interested in finding the probability that the $i$'th box received one ball. All other information is ignored.

This result is extended/explained further by the binomial distribution.

So... taking the ratio, we find that for each $i$ we have $$Pr(Y_i=1)=\dfrac{10\times 4^9}{5^{10}}$$


Now, finishing the problem, we were interested in $E[Y]$ which we remember is $E[Y_1+Y_2+Y_3+Y_4+Y_5]$ which by linearity of expectation, which again remember works regardless of how the random variables are correlated, simplifies as $E[Y_1]+E[Y_2]+E[Y_3]+E[Y_4]+E[Y_5]$. Now, as these random variables $Y_i$ are all Bernoulli (in other words, only takes values $0$ or $1$) this simplifies further as $Pr(Y_1=1)+Pr(Y_2=1)+Pr(Y_3=1)+Pr(Y_4=1)+Pr(Y_5=1)$ which by symmetry simplifies as $5\times Pr(Y_i=1)$ giving the final result of:

$$E[Y]=5\times \dfrac{10\times 4^9}{5^{10}}$$

JMoravitz
  • 79,518
  • I think this makes sense now. Also, I wasn't confused about the linearity of expectation. That part has always been clear, and I understand it holds regardless of independence/dependence of the random variables. It was the probability distribution part that confused me. But I think I brainfarted when I was thinking about it. – 24n8 May 04 '20 at 20:44
  • Is this approach trivially adaptable to say expected number of boxes with exactly $m$ balls? It seems all we have to do is change $\binom{10}{1}$ to $\binom{10}{m}$? – 24n8 May 04 '20 at 20:45
  • 1
    You'll need to adjust the exponents as well. Rather than talking about a bernoulli variable, letting $Z_i$ be the random variable which counts the number of balls in box $i$, you have $Pr(Z_i=m)=\dfrac{\binom{10}{m}4^{10-m}}{5^{10}}$, so yes, the expected number with exactly $m$ balls will be $5\times\dfrac{\binom{10}{m}\times 4^{10-m}}{5^{10}}$ – JMoravitz May 04 '20 at 20:47
  • Oops yes, that's right. What if we want to find the expected number of boxes with $(Z_i=m) \cup (Z_i = j) | m \neq j $. The events $Z_i = m$ and $Z_i = j$ are mutually exclusive, so it seems the solution is simply $$ 5\frac{\binom{10}{m} 4^{10-m}}{5^{10}} + 5\frac{\binom{10}{j} 4^{10-j}}{5^{10}} $$.

    What if we make this a little more difficult and asked for the expected number of pairs of boxes that has $m$ balls in total?

    – 24n8 May 04 '20 at 20:55