2

Consider an urn with $k$ balls, labeled $1$ through $k$. One step shall be defined as the following procedure: randomly drawing one ball at equal probability for all balls in the urn, removing all balls with labels showing numbers greater than the number of the ball drawn, putting the ball drawn back. (Side remark: it is possible that the number of balls doesn't change by doing a step).

Question: What is the expected value for the number of steps $n$ needed to draw the ball with label $1$?

Answer (so far): If we look at the problem via tree diagrams we can conclude, that the cumulant probability to draw the ball labeled $1$ within $m+1$ steps is, starting with $k$ balls is given by $S(m,k)/k$ with: $$S(m,k_0)=\sum_{k_1=1}^{k_0}\cdots\sum_{k_m=1}^{k_{m-1}}(\prod_{j=1}^{m}k_j)^{-1}.$$

Replacing $1/i$ with $\int_0^1dx\;x^{i-1}$ the arising terminating geometric series $\sum_{i=0}^{k-1}x^i$ with $\frac{1-x^k}{1-x}$ and integrating out $\int_0^xdy\;(1-y)^{-1}$ this can be simplified to

$$S(n+1,k)\cdot n!=\int_0^1\frac{1-x^k}{1-x}\ln^n(\frac1{1-x}),$$

where of course the word 'simplified' is up to personal preference. This form only remains valid for nonnegative $n$ and as the number of steps is given by $n+2$ we need to know in addition to the above, that $S(0,k)=1$ (ie. that the probability of finding $1$ in the first step is $1/k$).

From all that we can in principle find the Expactation value by noting, that for discrete random variables with cumulative $F(n)$ the mean can be defined as $E=\sum_{n=0}^\infty \bigl(1-F(n)\bigr)$.

Unfortunately I'm not able enough to calculate this sum. $F$ relates to $S$ by $F(n)=S(n-1,k)/k$.


EDIT: The answer provided by WhatsUp is correct. I was unsatisfied that the formula $E_m = 1+\frac1m\sum_{i=1}^mE_i$ just came up from nowhere and derived it myself.

From tree diagrams one sees that the probability $P_k(n)$ to find $1$ in the $n$-th step with $k$ balls before the first step is given by $1/k$ if $n=1$ and $P_k(n+1)=\frac1k\sum_{k_1=2}^k\cdots\sum_{k_n=2}^{k_{n-1}}\prod_{j=1}^n\frac1{k_j}$ otherwise. This directly gives $P_k(n)=\frac1k\sum_{i=2}^kP_i(n-1)$.

Inserting that into the mean formula $E_k=\sum_{n=1}^\infty n\cdot P_k(n)$ gives $$\frac1k+\frac1k\sum_{n=1}^\infty(n+1)\sum_{i=2}^kP_i(n)\\ =\frac1k + \frac1k\sum_{i=2}^k\sum_{n=1}^\infty n\cdot P_i(n) + \sum_{n=2}^\infty\frac1k\sum_{i=2}^kP_i(n-1).$$ This can be found to be equal to $1+\sum_{i=2}^kE_i/k$ by noting that the last summand has to amount to $1-P_k(1)$.


Of course then $E_k=\frac1{k-1}(k+\sum_{i=2}^kE_i)$ follows like outlined by WhatsUp and $E_k=1+\sum_{i=1}^{k-1}\frac1i$ can be shown by induction.

3 Answers3

3

For $m\geq 1$, let $E_m$ be the expected number of steps until only one ball remains, if one starts with $m$ balls.

We then have $E_1 = 0$ and $E_m = 1 + \frac 1 m \sum_{i = 1}^m E_i$ for $m > 1$.

This gives $E_m = \frac 1{m - 1}\left(m + \sum_{i = 1}^{m - 1} E_i\right)$ for $m > 1$.

Therefore the sequence $(E_m)_{m\geq 1}$ looks like: $$0, 2, \frac{5}{2}, \frac{17}{6}, \frac{37}{12}, \frac{197}{60}, \frac{69}{20}, \dotsc$$

Guess what it is? It's simply $E_m = 1 + \sum_{i = 1}^{m - 1}\frac 1 i$, for any $m > 1$.


Proof by induction: For $m = 2$, it is clear, as $E_2 = 2 = 1 + \frac 1 1$.

Suppose it's true for $m$. Then we have, for $m + 1$:

\begin{eqnarray*} E_{m + 1} &=& \frac 1 m\left(m + 1 + \sum_{i = 1}^m E_i\right) \\ &=& 1 + \frac 1 m + \frac 1 m \sum_{i = 2}^m\left(1 + \sum_{j = 1}^{i - 1}\frac 1 j\right)\\ &=& 2 + \frac 1 m \sum_{i = 2}^m\sum_{j = 1}^{i - 1}\frac 1 j\\ &=& 2 + \frac 1 m \sum_{j = 1}^{m - 1}\sum_{i = j + 1}^m\frac 1 j\\ &=& 2 + \frac 1 m \sum_{j = 1}^{m - 1}\frac{m - j}{j}\\ &=& 1 + \frac 1 m + \sum_{j = 1}^{m - 1} \frac 1 j\\ &=& 1 + \sum_{j = 1}^m\frac 1 j. \end{eqnarray*}


So the expected number of steps needed to draw the ball with label 1 is exactly $1 + \sum_{i = 1}^{k - 1}\frac 1 i$, when we start with $k$ balls.

For $k > 1$ this is the same as the expected number of steps until only one ball remains, and for $k = 1$ it is also valid, since we need to perform $1$ draw to get that ball.

WhatsUp
  • 22,201
  • 19
  • 48
  • @InterstellarProbe Sorry, I see your point now. Yes, you are right. I somehow understood it as the number of draws needed to reach only one ball in the urn. – WhatsUp Dec 20 '19 at 21:06
  • @InterstellarProbe Ja, that's probably the way it is designed. – WhatsUp Dec 20 '19 at 21:08
  • The first formula for $E_m$ isn't obvious for me. – Abraxas Knister Dec 20 '19 at 22:21
  • @AbraxasKnister If $m > 1$, then you need at least one draw. After the first draw, there are $m$ possibilities, with equal probability $\frac 1 m$, and the $i$-th possibility is that you get the ball with label $i$ in the first draw, after which $E_i$ more draws are expected. – WhatsUp Dec 20 '19 at 22:31
  • I see. This solution is to devastingly simple... – Abraxas Knister Dec 21 '19 at 16:36
  • @WhatsUp: See my answer for another take on this that sheds some light on why this sum appears. – joriki Dec 21 '19 at 18:32
2

In WhatsUp’s nice proof by induction, the harmonic numbers seem to somewhat magically appear out of some algebraic manipulations. We can make some probabilistic sense of this algebra as follows.

The expected number of draws before we draw the ball labeled $1$ is the sum of the expected numbers of draws we will draw each of the other balls. We can prove by induction that the ball labeled $j$ is expected to be drawn $\frac1{j-1}$ times, independent of $k$. This trivially holds for $k=1$ (as in this case there are no other balls). Assume that it holds up to $k-1$. If we have $k$ balls, the ball labeled $j$ will be drawn with probability $\frac1k$. It will still be there as one of $k$ balls with probability $\frac1k$, and it will still be there as one of less than $k$ balls with probability $\frac{k-j}k$. Thus the number $E$ of times that we expect to draw it satisfies

$$ E=\frac1k+\frac1k\cdot E+\frac{k-j}k\cdot\frac1{j-1}\;, $$

with solution $E=\frac1{j-1}$. Summing this over all other balls yields $\sum_{j=2}^k\frac1{j-1}=\sum_{j=1}^{k-1}\frac1j=H_{k-1}$, and then we have to add the $1$ time that we are certain to draw the ball labeled $1$, for a total of $H_{k-1}+1$.

joriki
  • 238,052
  • 1
    Nice idea. This proof looks a bit more natural, although the first part still requires some calculation. I wonder whether we can also replace it with an argument. – WhatsUp Dec 21 '19 at 20:22
  • Why do we need to prove it? Doesn't it hold because $E=1/k + \frac{1-j+k} kE$? – Abraxas Knister Dec 22 '19 at 17:31
  • @AbraxasKnister: Well, the $E$ on the left and the $\frac1kE$ part on the right are the expectation for the present state whereas the $\frac{k-j}kE$ part on the right is the expectation of reduced states, where the induction assumption is that this is $\frac1j$. In a sense you're right: If we replace the value $\frac1j$ from the induction assumption by $E$ and then obtain $E=\frac1j$, that shows that $E=\frac1j$ also solves the equation before that replacement. I'd considered arguing that way, but then it seemed more complicated and I went for the slightly more complicated algebra instead. – joriki Dec 22 '19 at 23:00
  • (Side remark: actually $\frac1{j-1}$). I now think that doing it that way is only possible if we know beforehand that $E$ doesn't depend on $k$. Else we would have to consider $E_{k,j}$'s (expected number of draws of $j$ with $k$ balls at first) and sum them up $E_{k,j}=\frac1k(1+\sum_{i=j}^k E_{i,j})$. – Abraxas Knister Dec 22 '19 at 23:18
  • @AbraxasKnister: Yes, $\frac1{j-1}$, sorry. About the rest: I'm not sure what you mean by "doing it that way". Certainly my approach didn't assume the independence of $E$ from $k$; rather, I proved it by induction. And if you make the argument that I made in my previous comment, you can also make your approach work in the same way, using induction and not assuming anything and not summing anything. It's just that you have to make that somewhat more involved argument why an equation in which you've replaced an occurrence of the solution by the variable has the same solution. – joriki Dec 22 '19 at 23:31
  • 1
    I have worked out the factorial moment generating function (or PGF or whatever) and it turns out to be $ t \cdot \prod_{j=2}^k \frac{j-1}{j-t} $. Since the PGF of a sum of (independent!) random variables is the product of the PGFs of the summands we can think of the random variable in question to be $X=\sum_{i=1}^k X_i$, where $m_i(t)=(i-1)/(i-t)$ or $m_1(t)=t$. The former has mean $1/(i-1)$, the latter has mean $1$ (and is a trivial random variable with $P(1)=1$). It would be nice if we could show that indeed $X_i$ is the number of times we draw $i$. – Abraxas Knister Dec 24 '19 at 08:30
  • @WhatsUp The argument is as follows: the random variable "number of steps" is actually the sum of the independent variables "number of draws of $j$". Those random variables are distributed geometrically: $P(n)=p j^{-n}$ where $p=(j-1)/j$ is the probability to not draw $j$. The random variable describing the number of draws of $1$ is distributed $P(n)=0$ if $n\neq 1$ and $P(1)=1$. (The fact that those variables are all independent is actually a bit more than needed and allows us to get the full statistical content of the random variable in question in the form of its PGF; see my own answer). – Abraxas Knister Dec 24 '19 at 12:59
1

WhatsUp's and joriki's answers both have in common that they caracterise the random variable in question (number of draws needed to draw $1$) only by giving it's mean. In my previous attempt I tried to give a full characterisation of this random variable and arrived at its cumulative probability function. From there I couldn't go further. Noting that the probability $P_k(n)$ to last $n$ steps at $k$ balls before the first step is given recursively by $P_k(n)=1/k\sum_{i=2}^k P_i(n-1)$ and $P_k(1)=1/k$ opens up the possibility to derive WhatsUp's solution (as noted at the end of the Edit in my post) but also grants to derive the generating functions.


The factorial moment generating function $M(t)$ is the expected value $\sum_{n=0}^\infty t^n P(n)$. The factorial moments can be obtained from it by $$ E[N(N-1)\cdot\ldots\cdot(N-k+1)]=\lim_{t\uparrow1}\frac{d^k}{dt^k}M(t), $$ that is for example $E=M'(1)$ and $Var=M''(1)+M'(1)-M'(1)^2$.


For the present random variable $M_k(t) = \frac{(k-1)!\;\cdot\; t}{(k-t)\cdots(2-t)}$ is the factorial moment generating function for $k$ balls before the first step. From this value the already known $E_k=1+\sum_{i=1}^{k-1}\frac1i$ follows.

Proof: \begin{align*} M_k(t) &= \sum_{n=1}^\infty t^nP_k(n) \\ &= t\cdot\frac1k + \sum_{n=2}^\infty t^n\cdot\frac1k\sum_{i=2}^kP_i(n-1)\\ &= \frac tk\Bigl(1+\sum_{n=1}^\infty t^n\sum_{i=2}^k P_i(n) \Bigr)\\ &= \frac tk\Bigl(1+\sum_{i=2}^k M_i(t)\Bigr)\\ \Leftrightarrow\qquad M_k(t) &= \frac t{k-t}\Bigl(1+\sum_{i=2}^{k-1}M_i(t)\Bigr) \end{align*} Induction. Begin: $k=2$, $M_2(t)=\frac t{2-t}$. Step: \begin{align*} M_{k+1}(t) &= \frac t{k+1-t}\Bigl(1+\sum_{i=2}^kM_i(t)\Bigr) \\ &= \frac t{k+1-t}\Bigl(1+M_k(t) + \frac {k-t}t M_k(t) - 1\Bigr) \\ &= \frac t{k+1-t}\Bigl(\frac{(k-1)!\cdot t}{(k-t)\cdots(2-t)}\frac{t+k-t}t\Bigr)\\ \end{align*}


EDIT: We can go further.

The generating function of a finite sum of independent variables is just the product of the generating functions.

Therefore we can think of the random variable in question as of a sum $X=\sum_{i=1}^k X_i$ where $$ M_{X_i}(t)=\begin{cases} t & i=1\\ \frac{i-1}{i-t} & i\neq 1 \end{cases} $$ $X_1=1$ is constant and $X_{i\neq1}$ is distributed geometrically $P(n)=(1-p)p^n$ where $p=1/i$. joriki's answer suggests to interpret those $X_i$ as the number of times the number $i$ is drawn.