3

Here's a variation of the coupon collector problem from Ross that I'm stuck on:

There are $n$ types of coupons. Each newly obtained coupon is, independently, type $i$ with probability $p_i, i = 1,...,n$. Find the expected number and the variance of the number of distinct types obtained in a collection of k coupons.

I thought maybe I would start with an indicator:

Let $X$ be the total number of distinct coupons and $X_i$ the probability that $i^{th}$ is a new coupon. Then $\sum_{i=1}^{n} X_i = X$. Each indicator is 1 w.p. $x/k$. I don't think this is correct though.

yoshi
  • 3,509
  • Instead of thinking about new coupons, think about the chance that coupon $i$ is represented in your set. The expected number is the sum of these, by linearity of expectation. – Ross Millikan Apr 19 '16 at 15:10
  • So if I have $x$ distinct coupons in my set of $k$, then $X_i$ is 1 w.p. $x/k$? That can't be right though -- If I go to take the expectation, I'll get $x$. But my expectation should be a number not an RV. – yoshi Apr 19 '16 at 15:15
  • 2
    Coupon $i$ is not present with probability $(1-p_i)^n$, so present with probability $1-(1-p_i)^n$. If you add these all up you have the expected number of coupons present. – Ross Millikan Apr 19 '16 at 15:24
  • Ah okay I see -- thx! – yoshi Apr 19 '16 at 15:40
  • 1
    @Ross Millikan So the answer is $E[X] = n - \sum_{i=1}^n (1-p_i)^k$ since there are $n$ distinct coupons and we draw $k$ of them? The power has $k$ not $n$ because we are saying that coupon was not found in the $k$ draws right? But we add $n$ indicator random variables to see how many distinct coupons were there. – Aditya P Sep 24 '19 at 13:48

1 Answers1

4

The solution for the expectation has already been given in the comments. The calculation is slightly simplified by considering the number $Y=n-X$ of coupon types not collected, with $\mathbb E[X]=n-\mathbb E[Y]$ and $\operatorname{Var}(X)=\operatorname{Var}(Y)$. Let $Y_i$ denote the indicator variable of the event that coupon type $i$ is not obtained. Its expectation is the probability $(1-p_i)^k$ of not obtaining a coupon of type $i$, so the expected number of coupons not obtained is

$$\mathbb E[Y]=\mathbb E\left[\sum_iY_i\right]=\sum_i(1-p_i)^k\;.$$

The variance is calculated analgously by expressing it in terms of expectations:

\begin{align} \operatorname{Var(Y)}&=\mathbb E\left[\left(\sum_iY_i\right)^2\right]-\mathbb E\left[\sum_iY_i\right]^2\\ &=\sum_{i,j}\mathbb E[Y_iY_j]-\mathbb E[Y_i]\mathbb E[Y_j]\\ &=\sum_i(1-p_i)^k\left(1-(1-p_i)^k\right)+\sum_{i\ne j}\left((1-p_i-p_j)^k-(1-p_i)^k(1-p_j)^k\right)\;. \end{align}

joriki
  • 238,052
  • So in the last step, you apply only the summation over j right? then $Y_i$ is a constant and is taken out of the expectation, so we get $\sum_{i} Y_i E[Y] + \cdots$. But how do you get those terms that you got? – Aditya P Sep 04 '19 at 14:55
  • 1
    @AdityaP: The sum is split into a component for $i=j$ and a component for $i\ne j$. For $i=j$, we have $Y_iY_j=Y_i^2=Y_i$, so $\sum_{i=j}\left(\mathbb E[Y_iY_j]-\mathbb E[Y_i]\mathbb E[Y_j]\right)=\sum_i\left(\mathbb E[Y_i]-\mathbb E[Y_i]^2\right)=\sum_i(1-p_i)^k\left(1-(1-p_i)^k\right)$. For $i\ne j$, the probability that neither $i$ nor $j$ is obtained in $k$ draws is $(1-p_i-p_j)^k$. – joriki Oct 17 '20 at 12:18