Discover to which batch a coin belongs

Question

The following question is taken from a book, in a chapter on probabilty:

You have two batches of unbalanced coins. One has coins which turn up head with probability $p_1$, and the other has coins that turn up head with probability $p_2$. All you know about $p1$ and $p2$ is that $|p_1-p_2| \ge 0.01$. A coin was accidentaly dropped from one of the batches.

Design an experiment that will determine which batch the coin belongs to, with error probability of no more than $10^{-12}$. You should use a reasonable number of coin tosses.

As far as I can tell, any experiment will essentially boil down to something like this:

Take a coin from each batch and toss it $n$ times. Let $X$ and $Y$ be the sample means of the tosses of the coins from the first and second batch, respectively. Then toss the coin in question $n$ times, and let $Z$ be the sample mean of the tosses of that coin. Then find $n$ for which: $$ \mathbb{P}[\,|X-Z|<|Y-Z|\,] \le10^{-12} $$ when the coin in question is from the second batch. I tried to play around with the inquality $|X-Z|<|Y-Z|$ to get something to which I may apply the Chernoff bound. But the best I got required $n\approx 3\cdot10^6$, which does not seem reasonable at all. Any help will be appreciated.

The standard deviation for one toss can be up to $\frac12$. You need $0.01$ to be about $10$ standard deviations, or $\frac12n^{-1/2}\approx0.001$, so something of the order of $10^6$ doesn't sound all that unreasonable. I'm not sure it's efficient to toss a coin of either batch; it might be better to only toss a coin from one batch and the coin to be determined. — joriki, Jun 18 '16 at 10:09
$3\times10^6$ looks right about where it should be. That's a very small margin of difference between the two coins and a very high degree of confidence required. — it's a hire car baby, Jun 23 '16 at 18:17
Designing your experiment need not just be about setting the number of tosses required before you stop. A stopping rule can also depend on how far ahead one coin starts to get. This will be more efficient since if one streaks ahead you can stop sooner. — it's a hire car baby, Jun 23 '16 at 18:20
I think the efficient method experiment will use $\chi^2$ distribution. — it's a hire car baby, Jun 23 '16 at 18:22
You have nothing to gain by tossing coins from the batches since the unknown coin has the exact same properties as every coin in one of the batches, so just toss that one. — it's a hire car baby, Jun 23 '16 at 18:38
actually my bad... you don't know which batch is the more likely to come up heads. — it's a hire car baby, Jun 23 '16 at 19:18

Dmytro Taranovsky · Answer 1 · 2022-06-27T20:19:02.580

I think the problem used "reasonable" to mean "somewhat reasonably close to optimal". I will give a more complete answer than what is probably expected in the problem, with an asymptotically optimal answer (up to a multiplicative error that approaches 1 in the limit) under different assumptions.

The optimal expected total number of coin tosses is approximately $8 (\mathrm{erf}^{-1}(1-2ε))^2 p(1-p) / δ^2$ where $p = (p_1+p_2)/2$, $δ = |p_1-p_2| ≈ 0$, and $ε ≈ 0$ ($ε=10^{-12}$ in the problem); $ε$ is the permitted error probability. The formula is valid regardless of whether $p_1-p_2$ is known. The worst case permitted by the problem has $δ = δ_\min = 0.01$ and $p≈0.5$, leading to about 500000 coin tosses.

The multiplicative error in the above formula approaches 1 as $δ→0$ and $\frac{\log \log (1/δ)}{\log (1/ε)}→0$. This implies $ε→0$; also, $ε,δ_\min→0$ would suffice if the above formula used $δ_\min^2$ in place of $δ^2$. Note that $δ→0$ is the large numbers assumption. Also, $\lim_\limits{ε→0} \frac {(\mathrm{erf}^{-1}(1-2ε))^2} {\log(1/ε)} = 1$, but at $ε=10^{-12}$, $\log(1/ε)$ is about 10% larger, and it is unclear which one is more accurate above; $(\mathrm{erf}^{-1}(1-2ε))^2$ was chosen to align with the static case. Also, while per the statement of the problem, the experiment setup works with error $≤ε$ for every choice of $p_1$ and $p_2$, the above remains valid (as opposed to suboptimal) if we were given reasonable/typical priors on $(p_1,p_2)$ instead.

If the experiment setup is static (number of tosses of each type is set in advance), the required number of coin tosses is approximately 4 times higher; furthermore, since the number is not given, we would have to assume the worst case scenario, for a total of about 2000000 tosses. (Our static strategy works if we are given only the bounds on $δ$ and $≈p$, with the number of tosses based on the worst case scenario.)

It may also be interesting to work out what happens if

$δ$ is not too small (small numbers correction),
$\min (p_1,p_2,1-p_1,1-p_2)≈0$, and/or
$ε$ is not very close to 0; already at $ε=10^{-12}$ we are at just 7 stdev.

Static Strategy

Assume that we do $N$ tossses, of which $Nx$ are the unknown coin, $Ny$ batch 1, and $Nz$ batch 2. The unique optimal choice is $x = 1/2$ and $y = z = 1/4$. Any $x$ different from 1/2 is substantially suboptimal.

If $x$ is 1/2 and $δ$ is known, then every choice of $y$ is approximately optimal, but if $y≠1/4$:

$z = 0$ is a little inefficient because if batch 1 is correct, the error is two-sided (instead of one-sided).
If $δ$ is different from expected, the error is substantially suboptimal. However, if $δ$ is bigger than expected, the error can still be $≤ε$.

To understand the optimality (at known $δ$), given measured $p_1$ and $p_2$, and assuming actual $p_2>p_1$, the likelihood function for $p$ is approximately $\mathcal{N}(μ=p_1 + δ/2 + (p_2-p_1-δ) \frac {z} {y+z}, σ^2 = \frac {p(1-p)} {N(y+z)})$, while the measurement of the unknown coin has $σ^2 = \frac {p(1-p)} {Nx}$. The estimated $p$ (i.e. the μ above) forms the decision boundary (for classifying the unknown coin), leading to $σ^2 = \frac {p(1-p)} {N(y+z)} + \frac {p(1-p)} {Nx}$ for the difference, with a one-sided error of $δ/2$ leading to wrong decision. If $y≠z$, there is also a second decision boundary, and if $\min(y,z)$ is too small (or $ε$ is very big), the error in determining whether $p_2>p_1$ is nonnegligible, but when all is computed, the effect on the uncertainty is only to make the error partially two-sided.

Thus, the optimal error is one-sided Gaussian error at $\frac {δ/2} {\sqrt{4p(1-p)/N}}$ standard deviations. If the error is $ε$, $N = 32 (\mathrm{erf}^{-1}(1-2ε))^2 p(1-p) / δ^2$.

Dynamic Strategy; known $δ$

As is common at $ε≈0$, choosing a stopping point dynamically reduces the expected number of tosses in about 4 times. Intuitively, we can wait until the statistics is reasonably "typical" (this is inefficient if $ε$ is too large), so the required error is twice as large: The error has to be one side to another side rather than one side to the middle.

Here, one side corresponds to typical statistics if the unknown coin is in batch 1, and the other — batch 2. In the static case, errors happen primarily if the unknown coin appears in the middle between the two choices, but with a dynamic stopping point, we can wait this out.

Similarly to the static strategy (above), the optimal $x$ is 1/2, while the choice of $y$ and $z$ is flexible.

Dynamic Strategy; unknown $δ$

If the sequence of coin tosses is not adaptive (with only the stopping point chosen dynamically), then we pay a price because our measurement of $δ$ is imprecise. Assuming batch 1 and batch 2 are indistinguishable before the experiment, the optimal strategy minimizes $x^{-1} + \max(y^{-1},z^{-1})$, so $x = \sqrt 2 - 1$, and $y = z = 1 - \sqrt 2 / 2$. The expected number of tosses will be (in the limit) $\sqrt 2 / 2 + 3/4 ≈ 1.4571$ times the optimal value, as the coin tosses for the known coin from the unknown coin's batch are essentially wasted. The strategy is different and the penalty is less if we have an informative prior on whether the coin comes from batch 1 or batch 2.

However, using an adaptive strategy, we can reach the optimal value by:

about half the time, tossing the unknown coin
about half the time, tossing the batch that appears different from the unknown coin
a small fraction of the time, tossing the batch that appears the same as the unknown coin.

At $ε≈0$, this is possible by choosing the batch that appears more different from the unknown coin except when the number of tosses for the other batch is much smaller. The stopping point for the experiment can be based on the confidences on the equality of the means (using unknown vs batch1 test and unknown vs batch2 test). The data also determine whether $p_2 > p_1$ with a similar accuracy.

One caveat is that with $δ$ unknown, there is an equivalent of $O(\log (1/δ))$ independent possible stopping points, but we can keep the error $O(ε)$ by requiring confidence $1-O(ε / \log (1/δ))$ to stop. We do not know $δ$, but we can use confidence $1-O(ε / \log^{1.001}(1/δ_\mathrm{guess}))$ instead where $δ_\mathrm{guess}$ is the current guess; 1.001 can be any constant >1. If we also have $δ_\min$ (in the problem, $δ_\min = 0.01$), we can replace $\log^{1.001}(1/δ_\mathrm{guess})$ with $\min(\log^{1.001}(1/δ_\mathrm{guess}), \log(1/δ_\min), \log^{1.001}(δ_\mathrm{guess}/δ_\min + 1))$. As long as confidence $1-ε^{1+o(1)}$ suffices, the formula at the top of the answer remains asymptotically valid; otherwise a correction is to use the required confidence in place of $1-ε$.

score 1 · Answer 2 · answered Jan 26 '19 at 21:33

Let's identify head with $1$, tail with $0$.

At every step $i=1,...,n$, toss a coin from the first batch $X_i$, a coin from the second batch $Y_i$, and from the the lost coin $Z_i$. Set \[ S_n = \frac 1 n \sum_{i=1}^n (X_i-Z_i), \qquad T_n=\frac 1 n \sum_{i=1}^n (Y_i-Z_i). \] $E[S_n]=0$ and $|E[T_n]|\ge 0.01$ if the coin belongs to the first batch, $E[T_n]=0$ and $|E[S_n]|\ge 0.01$ if it belongs to the second batch. By Hoeffding inequality (https://en.wikipedia.org/wiki/Hoeffding%27s_inequality#General_case_of_bounded_random_variables), both $S_n$ and $T_n$ are sums of random variables in $[-1,1]$ hence \[ P(|S_n - E[S_n]| > t n^{-1/2}) \le 2 e^{- t^2 / 2},\qquad P(|T_n - E[T_n]| > t n^{-1/2}) \le 2 e^{- t^2 / 2}. \] Choose $t$ such that $10^{-12} = 4e^{-t^2/2}$, then set $n$ so that $t n^{-1/2} = 0.005$, then you can assert that the coin belongs to the first batch if $|S_n|\le 0.005$, the second batch otherwise. You are correct with probability at least $1-10^{-12}$.

This gives $t=\sqrt{24\ln(10) + 2 \ln(4)}$ and $\sqrt{n} = t / 0.005 \approx 1523$, $n\approx 2.3 \times 10^6$. Same answer as yours!

score 1 · Answer 3 · answered Jan 27 '19 at 15:39

What you obtained is roughly optimal, up to a factor 10.

Let's say that you know $p_1=0.5$ and $p_2=0.5 + 0.01$. Then you draw $n$ times the coin with unknown mean, and you just want test whether its mean is $p_1$ or $p_2$.

Someone comes up with a solution, an estimator $\phi=\phi(X_1,...,X_n)$ that depends on the observed coin tosses $X_1,...,X_n$, and you want \[ \max\Big[P_0(\phi = 1),P_1(\phi=0)\Big]\le \times 10^{-12} \] where $P_0$ is the product measure of $n$ Bernoulli($p_0$) and $P_1$ the product measure of $n$ Bernoulli($p_1$). Using information theoretic inequalities such as Le Cam's or Fano's inequalities, we have \[ \max\Big[P_0(\phi = 1),P_1(\phi=0)\Big] \ge (1/4)\exp(-KL(P_0,P_1)) \] see for instance Theorem 2.2(iii) of the book "Introduction to nonparametric estimation" by Tsybakov. Here, $KL(P_0,P_1)$ denotes the Kullback-Leibler divergence between the two product measures, equal to \[n \Big(p_1\ln(p_1/p_2) + p_2 \ln(p_2/p_1) \Big)= n (p_2-p_1) \ln(p_2/p_1) = n\times 0.000198. \] Putting the pieces together, this shows that necessarily \[ n \ge [12 \ln(10) -\ln(4)]/0.000198 \approx 132549. \] Your answer is optimal up to a factor 10.

Discover to which batch a coin belongs

3 Answers3