4

Humans have two copies of each of $23$ chromosome, for a total of $46$ chromosomes. If you want to sequence someone's DNA, you can just use a normal cell, since they have all the chromosomes. But if you have a sample of gamete cells, each one has only one copy of each chromosome. So, how would you find the expected number of gamete cells you'd have to sequence to get the person's entire genome?

If humans only had $1$ chromosome, then this would be a simple coupon collector problem, which asks the expected number of times you'd choose randomly between two coupons until you get them both. In this case, there are $23$ pairs of coupons, and you randomly select one from each pair, and you want to know how many times you expect to do this until you get all $46$ coupons.

Nishant
  • 9,155

3 Answers3

4

If you have $n$ cells, the chance that they all have he same copy of a given chromosome is $2^{-n+1}$, so the chance that you have both copies of a given chromosome is $1-2^{-n+1}$ The chance that you have both copies of all $23$ chromosomes is $(1-2^{-n+1})^{23}$ This gives almost $50\%$ at $6$ cells, over $90\%$ at $9$ cells, $99.4\%$ at $13$ cells and over $99.9999\%$ (less than one chance in a million of failure) at $26$ cells. It doesn't take many at all. What makes this different from coupon collection is getting one cell pair doesn't make getting another cell pair less likely.

Ross Millikan
  • 374,822
2

After the first gamete, you have one chromosome from each pair. Each subsequent step then completes a random subset of the uncompleted chromosome pairs. Let $E_n$ be the expected time it takes to complete $n$ half-completed pairs (so the answer you want is $1+E_{23}$). There is a $\binom{n}{n-k}(\frac12)^{n-k}(1-\frac12)^k$ chance that on the next step you will have $k$ more pairs to go, so that $$ E_n = 1+\frac1{2^n}\sum_{k=0}^n\binom{n}{n-k}E_k $$ The above equation has $E_n$ on both sides: solving for $E_n$, we get a formula for this in terms of earlier values: $$ E_n = \frac{2^n}{2^n-1}+\frac{1}{2^n-1}\sum_{k=0}^{n-1}\binom{n}{k}E_k $$ This allows you to compute $E_n$ recursively. Using Mathematica, I found that $1+E_{23}\approx 6.8877$.

Mike Earnest
  • 75,930
  • Great! I'm accepting this because it actually answers my question about the expected value. I guess hoping for a closed form was silly. – Nishant Sep 22 '14 at 01:57
1

You can get this as a single finite sum without recursion by expanding Ross's expression binomially. After the first cell, the expected number of additional cells required is

\begin{align} \sum_{n=0}^\infty\left(1-\left(1-2^{-n}\right)^{23}\right) &=\sum_{n=0}^\infty\left(1-\sum_{k=0}^{23}\binom{23}k\left(-2^{-n}\right)^k\right) \\ &=-\sum_{n=0}^\infty\sum_{k=1}^{23}\binom{23}k\left(-2^{-n}\right)^k \\ &=-\sum_{k=1}^{23}\binom{23}k(-1)^k\sum_{n=0}^\infty\left(2^{-k}\right)^n \\ &=-\sum_{k=1}^{23}\binom{23}k\frac{(-1)^k}{1-2^{-k}} \\ &\approx5.8874 \end{align}

(Wolfram|Alpha computation), so the total number of expected cells is about $6.8874$, in agreement with Mike's recursive result up to $3$ digits.

joriki
  • 238,052