2

I am mainly confused about my reasoning/solution

Problem: I have a bag of numbers ranging from $1,2... n$. I randomly choose a number and then I throw out all numbers less than that. Then I keep choosing numbers until I've found the largest one. What's the expected number of things I have chosen?

My Argument: I expect on average to have chosen a number of $n/2$ on each pick. Hence, everytime I choose a number, I expect to discard half of them. Hence, the $E(X) = \log_2(n)$

Note that I devised this problem as an equivalent one while trying to solve the "cluster of cars on one long road" question. If this is not equivalent, please let me know but I am confident that it is. As a result of this equivalency, I know my solution is incorrect. However, I cannot pinpoint where the issue arises.

Probability problem: cars on the road

JobHunter69
  • 3,355
  • If $n=2$ then it takes one or two trials with equal probability, so $E[X_2]=\frac 32$. – lulu Sep 29 '19 at 23:56
  • @lulu Right, that's a good point, but I still don't understand what's wrong with my reasoning? – JobHunter69 Sep 29 '19 at 23:58
  • I really can't follow your argument. It isn't true that I expect to throw out $\frac n2$ on the first turn...sticking with $n=2$ I expect to throw out $\frac 32$ on the first turn. – lulu Sep 30 '19 at 00:00
  • @lulu If you randomly draw a number, wouldn't you expect that number to be n/2? Hence, if you draw n/2, you throw out n/2 numbers. – JobHunter69 Sep 30 '19 at 00:01
  • Again, that argument does not work for $n=2$. – lulu Sep 30 '19 at 00:03
  • There's an easy recursion available...as $E[X_n]=1+\frac 1n \times \sum_{i=0}^{n-1}E[X_{n-1}]$. Of course there may also be a simple closed formula, but at least the recursion let's you test lots of cases. – lulu Sep 30 '19 at 00:04
  • @lulu: I derived that and put that into a spreadsheet. It grows more slowly than $\log_2(n)$. In particular, $E(32)\approx 4.058$ and $E(300)$ is only $6.28$ – Ross Millikan Sep 30 '19 at 00:12
  • @RossMillikan Yes, I agree with that value. Are you citing that as a reason to doubt the recursion? – lulu Sep 30 '19 at 00:14
  • @lulu: Not at all. I am surprised that it grows so slowly. It must be that you have a fair chance to come on a big number quickly, so throw away a lot of balls. – Ross Millikan Sep 30 '19 at 00:15
  • @RossMillikan Right, I agree. It's not intuitive. In fact, I simulated the process for a random level (I picked $26$ for some reason I can't recall), and it seems to work just fine. Of course, my simulator follows the logic of the recursion (each choice gets you to the same operation with a lower cap) so perhaps I just re-coded a conceptual flaw. – lulu Sep 30 '19 at 00:17
  • 1
    @lulu That recursion gives $E[X_n]= \sum_{i=1}^{n}\frac1i = H_n$ – Henry Sep 30 '19 at 00:31
  • @RossMillikan it is smaller than $\log_2(n)$ because it is about $\log_e(n) + \gamma +\frac1{2n}$ with $\gamma \approx 0.5772$ – Henry Sep 30 '19 at 00:31
  • @Henry Ah, of course it does. That's the problem with writing everything numerically...harder to spot obvious patterns. Thanks! – lulu Sep 30 '19 at 00:32
  • @lulu: I found it quasi-numerically. Multiplying the numbers by $n!$ gives a sequence of integers which turns out to be OEIS A000254 and one of the formulae there is $n!, H_n$ at which point it became clearer – Henry Sep 30 '19 at 00:40
  • Just to clarify, since this question has also been asked here with the chosen number itself being returned, and the formulation here is slightly unclear: In this question, the chosen ball is removed from the bag, together with all lower ones. – joriki Jan 12 '20 at 22:16

2 Answers2

2

Here is the result of work in the comments by lulu, Henry, and me.

The expected number of balls is $E[n]=\sum_{i=1}^n\frac 1i=H_n\approx \log n + \gamma + \frac 1{2n}$, the $n^{th}$ harmonic number.

The recursion is $E[n]=1+\frac 1n\sum_{i=0}^{n-1}E[i]$ because you can imagine putting ball $n$ randomly into an arrangement of $n-1$ balls. $i$ represents the number of balls before ball $n$. You will then throw away $E[i]$ balls plus ball $n$. We define $E[0]=0$ to make this come out properly when ball $n$ is put first in line.

We have $E[1]=1$ as the base case for strong induction. Then if $E[j]=H_j$ for all $j$ from $1$ to $k$, $$\begin {align}E[k+1]&=1+\frac 1{k+1}\sum_{i=0}^{k}H_i\\ &=1+\frac 1{k+1}\left(1+(1+\frac 12)+(1+\frac 12+\frac 13)+\ldots H_{k}\right)\\ &=1+\frac 1{k+1}\left(k+\frac{k-2}2+\frac{k-3}3+\ldots\frac{k-(k-1)}{k}\right)\\ &=1+H_k-\frac k{k+1}\\&=H_{k+1}\end {align}$$ To go from the second line to the third we count the number of terms $\frac 1m$ and note that it is $n-m$

Ross Millikan
  • 374,822
1

Others in the comments have computed the correct solution. Here I will only comment on what I think went wrong with the "intuitive" solution of $\log_2 n$.

For slightly easier exposition, I change the game s.t. we throw out all numbers larger than the drawn number. This way the numbers left are always $\{1, 2, \dots, k\}$ and it is slightly easier (for me) to visualize.

Imagine $n=2^{10}=1024$ and I play the game $G$ as stated. Meanwhile, you start with an identical bag and play a different game $G'$: at every step, you throw out exactly the larger half of all your remaining numbers. So your game will take exactly $11$ steps to finish. Which of us will finish first?

On your first draw, you draw $512$. On my first draw, I have about $1/2$ chance to do worse than you (i.e. throw out fewer) by drawing $d > 512$ and about $1/2$ chance to do better by drawing $d < 512$. However, when I do worse, I lost "less than a step" compared to your schedule, whereas when I do better, I can be gaining up to a step ($d \in [256, 511]$) or up to $2$ steps ($d \in [128, 255]$), or up to $3$ steps ($d \in [64, 127]$), etc. compared to your schedule. Apologies if my boundaries above are not exactly correct, but hopefully you get the idea. So intuitively, for large $n$ I should win easily. This also means $E[X] < \log_2 n$.

Another way to look at it: Let $K_j$ be the largest remaining number (also number of remaining numbers) after $j$ steps, with $K_0 = n$. For game $G'$, we have $\log_2 K_{j+1} - \log_2 K_j = -1$ exactly. If it were true that for game $G$ we have $E[\log_2 K_{j+1} | K_j] - \log_2 K_j = -1$, then it is plausible that the answer would be $\log_2 n$. However, in fact $E[\log_2 K_{j+1} | K_j] - \log_2 K_j < -1$. For this reason $G$ finishes faster than $G'$.

antkam
  • 15,363
  • Actually, I don't really understand your first argument. What do you mean by lag behind less than one step? Wouldn't you, by symmetry, lag behind by more than one step? – JobHunter69 Sep 30 '19 at 02:05
  • What I meant is: if I do worse, I would lag behind, but the amount by which I lag behind would be "less than one step". (I would only lag behind by one step if I don't discard any numbers at all.) Your steps are measured by $\log_2 K_j$, which decreases by $1$ every step. If we measure my steps also by my $\log_2 K_j$, then it can decrease anywhere from almost $0$ to as high as $\log_2 K_j$ itself, and my average decrease is much more than $1$. – antkam Sep 30 '19 at 02:22
  • oh I see, that makes sense – JobHunter69 Sep 30 '19 at 04:21