Expected value of time between successes for binomial random variable

Question

Let $ X ~ Bin(n, p) $ and call $ X_{1}, ... X_{i} $ the index of each success, with $ X_{1} $ being the index of the first success.

I would like to compute the expected value of $ X_{i} - X_{i-1} $, that is, how many trials there are between successess on average.

When trying to do this, I found that this was very similar to a Negative Binomial Distribution or to a repetition of Geometric Distribution r.vs. However, if I understand it correctly, the difference is that there is a fixed number of trials. When $ n $ is large, the value is equal to the expected value in a geometric distribution, but when $ n $ is small (for instance $ n = 20 $), the value is lower.

To make things more concrete, I built a simple simulation in Python (the example used is a sequence of biased coin tosses).

If $ n $ is set to a large number, then $ E(X_{i} - X_{i-1}) = 1/p$, which is the expected value of the Geometric Distribution. However, if $ n $ is set to a small number, such as 20, then $ E(X_{i} - X_{i-1}) $ is lower. For instance, if $ p = 0.2 $ and $ n = 20 $ I find an average value around 4 (instead of 5). Yet, the average number of successes is still equal to $ n*p $ (in my example, 4).

I re-coded my simulation to adapt to and add Jeremy's answer.

I also added Jeremy's expected average computation.

def experiment(n_tosses, p):

  diffs = []
  success_indices = []

  for i in range(n_tosses):
    toss = np.random.binomial(1,p)
    if toss == 1:
      success_indices.append(i)

  if len(success_indices) > 1:
    for i in range(1, len(success_indices)):
      index = success_indices[i]
      last_index = success_indices[i-1]
      diff = index - last_index
      diffs.append(diff)
    return {"interval" : np.mean(diffs),
            "coverage" : sum(diffs),
            "successes" : len(success_indices)}

  elif len(success_indices) == 1:
    return {"interval" : 0,
            "coverage" : 1,
            "successes" : 1}

  else:
    return {"interval" : 0,
            "coverage" : 0,
            "successes" : 0}


def conduct_experiments(n_tosses, p, excl_zeros=False, n_exp=1000):
  experiments = [experiment(n_tosses, p=p) for i in range(n_exp)]
  successes = []
  intervals = []
  coverages = []
  for e in experiments:
    if (excl_zeros and e["successes"] > 1) or not excl_zeros:
        successes.append(e["successes"])
        intervals.append(e["interval"])
        coverages.append(e["coverage"])
  print "interval", np.mean(intervals)
  print "successes", np.mean(successes)
  print "coverage", np.mean(coverages)
  return coverages

def expected_interval(n_tosses, p):
    n = n_tosses
    sum_successes = 0
    for i in range(2, n+1):
        probs = p**i * (1 - p)**(n-i)
        sum_poss = 0
        for k in range(i-1, n):
            term = (n - k) * comb(k, i-1)
            sum_poss += term
        success_term = probs * sum_poss
        sum_successes += success_term

    return sum_successes

Results screenshot

So this is more a feeling than being backed up with specific analysis, but by limiting the number of trials to a small number like 20, you are not giving longer delays between successes a chance to develop. For example, if you have a string of 20 trials with one or zero successes (which with your $p=0.2$ parameter has a 7% chance of occurring, you do not record any intervals — Jeremy Dover, Sep 13 '17 at 12:23
Yes, that is completely right, I agree that this is what lowers the expected value but I am completely unable to model it (if that is possible but I assume it is). — Konrad, Sep 13 '17 at 13:23
In order for the $X_i$'s to be well defined you need $n\rightarrow\infty$. Then, $E(X_i-X_{i-1})=1/p$ -- see https://math.stackexchange.com/questions/102673/what-is-the-expected-number-of-trials-until-x-successes — A.G., Sep 15 '17 at 04:14
Hi A.G., I have taken a look but I don't completely understand. If I come up with specific constraints, why shouldn't I be able to compute E? It is different from what you would get in a NegBin distribution, for sure, but I don't see why it can't be computed. — Konrad, Sep 15 '17 at 09:42

Jeremy Dover · Accepted Answer · 2017-09-16T01:08:09.420

Looking at your code, you have biased your experiment against long strings of failures. For example, if you have a string of 20 failures (which given your parameters occurs slightly more than 1% of the time), you discard the trial. If you have a string with 1 failure and 19 successes (which occurs about 5% of the time), you record the trials but assign a mean value of 0.

To provide a fairer basis for experiment, you should probably run your trials until a certain number of successes is reached, regardless of the number of tosses required.

UPDATE: Based on OP's comment, I wish to update my answer:

I believe the reason that the expected value of the statistic is significantly lower for a smaller number of trials $n$ than for a large number is due to edge effect when there is 0 or 1 successes in the string of 20. With the parameters included ($n=20$, $p=0.2$) approximately 7% of the experiment repetitions will be in this case. If the experiment were run to a fixed number of successes, such a string of failures would create a long interval. However since the experiment is run to a fixed number of trials, this string of failures contributes 0 to the mean.

Obviously as $n$ increases, these effects become diminished in the average and the expectation approaches what is expected.

UPDATE: Now that I understand better, I think it is possible to calculate the expected value of the statistic $Y$, the mean of the interval times between successes in $n$ binomial trials with a probability $p$ of success. As above, let $i$ be the number of successes in any given experiment. From previous discussion, if $i=0,1$, we define $Y=0$.

Suppose $i \ge 2$. Then the mean interval time for any given experiment can be written as $$Y=\frac1{i-1}\sum_{j=1}^{i-1} (X_{j+1}-X_j) = \frac{1}{i-1}(X_i-X_1)$$ as the sum telescopes. So for any given experiment, we can calculate our statistic $Y$ knowing the number of successes $i$ and the time interval between the first and last success, which we define to be $k$. Note that $k$ necessarily satisfies $i-1 \le k \le n-1$.

For a fixed $i \ge 2$ and $k$, there are $$(n-k) {{k-1}\choose{i-2}}$$ outcomes of the experiment with $i$ successes and first-to-last interval $k$. To see this, note that once $k$ is chosen, there are $n-k$ possibilities for the pair $(X_1,X_i)$, namely $(1,k+1) \ldots (n-k,n)$. Once this interval is chosen, the remaining $i-2$ successes must be chosen from the $k-1$ trials strictly between $X_1$ and $X_i$, which can be done in ${{k-1}\choose{i-2}}$ ways.

For this fixed $i$ and $k$, each of these outcomes has probability $p^i(1-p)^{n-i}$ of occurring, and the statistic $Y$ has value $\frac{k}{i-1}$. Thus we can sum over the possible values of $i$ and $k$ to get our expected value

$$E(Y) = \sum_{i=2}^n \sum_{k=i-1}^{n-1} (n-k) {{k-1}\choose{i-2}} \frac{k}{i-1} p^i(1-p)^{n-i}$$

which has the easy simplification

$$E(Y) = \sum_{i=2}^n p^i(1-p)^{n-i} \sum_{k=i-1}^{n-1} (n-k) {{k}\choose{i-1}}$$

This can probably be simplified a little bit further, but I'm kinda tired :-)

UPDATE: One can also conduct this experiment by discarding results with less than two successful trials, rather than assigning $Y=0$ in those cases, creating a new random variable $Z$. To calculate the resulting expectation, the analysis is the same as above, except that contributions to the expectation from the 0 and 1 success cases need to be discarded (but they were 0 anyway), and the remaining probabilities need to be conditioned on the event that the number of successes is at least two. The expectation for this modified protocol can be obtained from the value $E(Y)$ via: $$E(Z) = \frac{E(Y)}{1-(1-p)^n-np(1-p)^{n-1}}$$

Thanks Jeremy. I don't think that my code has bias, it's just following a specific protocol. Things would be a whole lot easier if I were able to run the experiment until a certain number of successes (then it just follows a NegBin distribution). Unfortunately, the real-life problem I am asking this question for does have a limited number of "trials" and I can't modify the "experiment" (again because it's a real-life scenario). But maybe trying to see the problem in that perspective would help, that's what I've been trying to do but with limited success. — Konrad, Sep 13 '17 at 13:25
By the way note that the trial is not "discarded", it is included in the computation of the mean. — Konrad, Sep 13 '17 at 13:35
Apologies for comment and answer...meant the answer only. I may have misunderstood the question, and am sure my answer came across more negative than intended. I will edit the answer to more accurately reflect my thoughts. — Jeremy Dover, Sep 13 '17 at 22:22
Don't worry I didn't perceive your answer and comment negatively at all, I really appreciate your help! I completely understand what you are saying and it seems absolutely right. However, is there a way to properly estimate what I want given the edge effects you mentioned? Can you explain how you found the value of 7% and how that may be useful? — Konrad, Sep 14 '17 at 10:18
Figured out how you found the 7% (binomial PMF), I had miscomputed earlier. I'll try to think a way of how to implement that into the model maybe. — Konrad, Sep 14 '17 at 10:22
I did something that I thought might work, but, well, it didn't. I'll let you take a look at my edited post. — Konrad, Sep 14 '17 at 11:43
Your answer is perfectly spot on. Turns out my actual data does not exactly follow a binomial distribution for some reason, but your answer perfectly finds the value found in my binomial simulation. There's just a small mistake: you need to put (1 - p) to the power of (n - i), not (n - 1). Thanks a lot for this incredible reply! — Konrad, Sep 15 '17 at 13:10
If you still have time, I noted that your formula works perfectly for a p >= 0.2 approximately, but when p is lower it seems to give different results. — Konrad, Sep 15 '17 at 16:14
Looked an did not see an error in either my analysis or your computation. Is it possible you're hitting precision problems in calculating with Python? — Jeremy Dover, Sep 15 '17 at 19:45
I don't see why that would be the case (I'm not trying with a very low p or anything). I added a screenshot, you'll see it's pretty far off, but only when p is about lower than 0.2 otherwise it's spot-on. — Konrad, Sep 15 '17 at 19:56
The discrepancy is the excl_zeroes flag being set to true. When I calculated the statistic, I assume that 0,1 successes means Y=0. You don't see the difference when $p=0.2$ is larger since 0 successes occurs less than 1% of the time. But with $p=0.1$, this case occurs 12% of the time, which is skewing the statistic. If you want to discard the cases with 0 or 1 successes, we would need to recalculate the expectation. — Jeremy Dover, Sep 15 '17 at 21:35
Oooh, I see. I thought that what you gave only applied to cases where we have i > 1, but actually it's taken into account. I had not fully understood that. I will further look at your answer to better understand that part. Thanks again for your reply, it was extremely helpful. — Konrad, Sep 15 '17 at 22:47
Glad to help. If you wish to discard the cases of 0 or 1 success, you can condition the probabilities in the expectation accordingly. I'll just go ahead and update my answer, since the calculation is trivisl. — Jeremy Dover, Sep 16 '17 at 00:55

Expected value of time between successes for binomial random variable

1 Answers1