Let $X \sim \mathrm{Bin}(n, p)$ and let $X_{1}, \dots, X_{i}$ denote the indices of the successes, with $X_{1}$ being the index of the first success.
I would like to compute the expected value of $X_{i} - X_{i-1}$, that is, how many trials there are between consecutive successes on average.
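Writing $K$ for the observed number of successes (the realized value of $X$), the average gap within a single run telescopes. So for $K \ge 2$ it depends only on the first and last success:

$$
\frac{1}{K-1}\sum_{i=2}^{K}\left(X_{i}-X_{i-1}\right)=\frac{X_{K}-X_{1}}{K-1}.
$$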
When trying to do this, I found that it was very similar to a negative binomial distribution, or to a sequence of geometric random variables. However, if I understand correctly, the difference is that here the number of trials is fixed. When $n$ is large, the value matches the expected value of a geometric distribution, but when $n$ is small (for instance $n = 20$), the value is lower.
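One way to see why the fixed $n$ matters is the following sketch. It assumes i.i.d. trials, so that, conditional on $K = k$ successes ($K$ being the value of $X$), the success positions form a uniformly random $k$-subset of $\{1, \dots, n\}$. The expected smallest and largest elements of such a subset are $(n+1)/(k+1)$ and $k(n+1)/(k+1)$, hence

$$
E\!\left[\frac{X_{k}-X_{1}}{k-1}\;\middle|\;K=k\right]=\frac{1}{k-1}\left(\frac{k(n+1)}{k+1}-\frac{n+1}{k+1}\right)=\frac{n+1}{k+1}.
$$

Since $K$ concentrates around $np$ for large $n$, this is close to $(n+1)/(np+1) \approx 1/p$. For $n = 20$ and $k = 4$ it gives $21/5 = 4.2$ rather than $5$.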
To make things more concrete, I built a simple simulation in Python (the example used is a sequence of biased coin tosses).
If $n$ is set to a large number, then $E(X_{i} - X_{i-1}) \approx 1/p$, which is the expected value of the geometric distribution. However, if $n$ is set to a small number, such as 20, then $E(X_{i} - X_{i-1})$ is lower. For instance, with $p = 0.2$ and $n = 20$, I find an average value of around 4 (instead of 5). Yet the average number of successes is still $np$ (4 in my example).
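The exact expected per-run interval can be computed by weighting the conditional mean gap $(n+1)/(k+1)$ (derived from the uniform placement of $k$ successes, an assumption of i.i.d. trials) by the binomial probabilities. Below is a minimal sketch, assuming $0 < p < 1$. The helper names are hypothetical, and runs with fewer than two successes count as 0, as in the simulation, unless `given_two=True`. A brute-force enumeration for small $n$ serves as an independent check.

```python
from itertools import product
from math import exp, lgamma, log, log1p


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Bin(n, p), computed in log space to avoid overflow."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log1p(-p))


def exact_mean_interval(n, p, given_two=False):
    """E[per-run average gap], using E[gap | K = k] = (n + 1) / (k + 1).

    Runs with fewer than two successes contribute 0 (simulation convention);
    with given_two=True the result is conditioned on K >= 2 instead.
    """
    assert 0 < p < 1
    ks = range(2, n + 1)
    weights = [binom_pmf(k, n, p) for k in ks]
    total = sum(w * (n + 1) / (k + 1) for k, w in zip(ks, weights))
    return total / sum(weights) if given_two else total


def brute_force_mean_interval(n, p):
    """Independent check for small n: enumerate all 2**n toss sequences."""
    total = 0.0
    for tosses in product((0, 1), repeat=n):
        idx = [i for i, t in enumerate(tosses) if t]
        k = len(idx)
        if k >= 2:
            # probability of this sequence times its average gap
            total += p**k * (1 - p) ** (n - k) * (idx[-1] - idx[0]) / (k - 1)
    return total


print(exact_mean_interval(8, 0.3), brute_force_mean_interval(8, 0.3))
for n in (20, 200, 2000):
    print(n, exact_mean_interval(n, 0.2), exact_mean_interval(n, 0.2, given_two=True))
```

Working in log space keeps `binom_pmf` usable for large $n$, where `comb(n, k) * p**k` would overflow a float.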
I rewrote my simulation to incorporate Jeremy's answer, and I also added Jeremy's computation of the expected average interval.
```python
import numpy as np


def experiment(n_tosses, p):
    """Run n_tosses Bernoulli(p) trials and summarize the gaps between successes."""
    success_indices = [i for i in range(n_tosses) if np.random.binomial(1, p) == 1]

    if len(success_indices) > 1:
        # Gaps between consecutive successes, e.g. [2, 7, 9] -> [5, 2].
        diffs = np.diff(success_indices)
        return {"interval": np.mean(diffs),       # average gap in this run
                "coverage": int(diffs.sum()),     # last index - first index
                "successes": len(success_indices)}
    elif len(success_indices) == 1:
        # A single success has no gap; keep the original conventions.
        return {"interval": 0, "coverage": 1, "successes": 1}
    else:
        return {"interval": 0, "coverage": 0, "successes": 0}
```
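A vectorized sketch of the same simulation, assuming NumPy's `Generator` API (`np.random.default_rng`). `simulate_mean_intervals` is a hypothetical helper name. It reproduces only the `excl_zeros=True` behaviour (runs with fewer than two successes are dropped rather than counted as 0) and uses the telescoping identity mean(gaps) = (last − first) / (successes − 1) instead of building the gap list.

```python
import numpy as np


def simulate_mean_intervals(n_tosses, p, n_exp=1000, rng=None):
    """Per-experiment average gap for experiments with at least two successes."""
    rng = np.random.default_rng() if rng is None else rng
    tosses = rng.random((n_exp, n_tosses)) < p            # True = success, one row per run
    counts = tosses.sum(axis=1)                           # successes per run
    first = tosses.argmax(axis=1)                         # first True (0 if none; dropped below)
    last = n_tosses - 1 - tosses[:, ::-1].argmax(axis=1)  # last True
    keep = counts >= 2
    return (last[keep] - first[keep]) / (counts[keep] - 1)


intervals = simulate_mean_intervals(20, 0.2, n_exp=100_000,
                                    rng=np.random.default_rng(0))
print(intervals.mean())
```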
```python
def conduct_experiments(n_tosses, p, excl_zeros=False, n_exp=1000):
    """Repeat experiment() n_exp times and print the average statistics.

    With excl_zeros=True, runs with fewer than two successes (interval 0)
    are left out of all three averages.
    """
    experiments = [experiment(n_tosses, p) for _ in range(n_exp)]
    successes, intervals, coverages = [], [], []
    for e in experiments:
        if not excl_zeros or e["successes"] > 1:
            successes.append(e["successes"])
            intervals.append(e["interval"])
            coverages.append(e["coverage"])
    print("interval", np.mean(intervals))
    print("successes", np.mean(successes))
    print("coverage", np.mean(coverages))
    return coverages
```
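With `n_exp=1000`, the printed averages carry Monte Carlo noise, so it helps to report a standard error alongside them before comparing, say, 4 against 5. A minimal sketch with a hypothetical helper `mean_with_standard_error`, which could be applied to the `intervals` list built inside `conduct_experiments`. The interval mean ± 1.96·SE is a normal-approximation assumption.

```python
import numpy as np


def mean_with_standard_error(values):
    """Sample mean and its standard error (sample std / sqrt(count))."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    se = values.std(ddof=1) / np.sqrt(values.size)
    return mean, se


mean, se = mean_with_standard_error([3.0, 4.0, 5.0, 6.0])
print(f"{mean:.3f} +/- {1.96 * se:.3f}")  # approx. 95% normal interval
```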
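```python
from math import comb  # exact integer binomial coefficient (Python 3.8+)


def expected_interval(n_tosses, p):
    """Jeremy's exact expected per-run interval (runs with < 2 successes count as 0)."""
    n = n_tosses
    sum_successes = 0
    for i in range(2, n + 1):  # i = number of successes
        probs = p**i * (1 - p) ** (n - i)
        # k = span (last - first): (n - k) placements of the span, times
        # C(k, i-1) = C(k-1, i-2) * k / (i-1), i.e. the number of ways to place
        # the i-2 inner successes times the average gap k / (i-1).
        sum_poss = 0
        for k in range(i - 1, n):
            sum_poss += (n - k) * comb(k, i - 1)
        sum_successes += probs * sum_poss
    return sum_successes
```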
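The inner sum in `expected_interval` has a closed form. Applying the hockey-stick identity twice gives $\sum_{m=i-1}^{n-1}(n-m)\binom{m}{i-1}=\binom{n+1}{i+1}$, so the result equals $\sum_{i=2}^{n}\binom{n+1}{i+1}p^{i}(1-p)^{n-i}=\sum_{i=2}^{n}P(X=i)\,\frac{n+1}{i+1}$, the binomial-weighted conditional mean gap. The sketch below checks this exactly with `fractions.Fraction`. `inner_sum` and `closed_form` are hypothetical helper names.

```python
from fractions import Fraction
from math import comb


def inner_sum(n, i):
    """Jeremy's inner sum: sum over m of (n - m) * C(m, i - 1)."""
    return sum((n - m) * comb(m, i - 1) for m in range(i - 1, n))


def closed_form(n, p):
    """sum_{i=2}^{n} C(n+1, i+1) p^i (1-p)^(n-i), i.e. E[(n+1)/(X+1); X >= 2]."""
    return sum(comb(n + 1, i + 1) * p**i * (1 - p) ** (n - i) for i in range(2, n + 1))


# The inner sum collapses to a single binomial coefficient (hockey-stick identity).
for n in range(2, 30):
    for i in range(2, n + 1):
        assert inner_sum(n, i) == comb(n + 1, i + 1)

print(float(closed_form(20, Fraction(1, 5))))
```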