I think the problem used "reasonable" to mean "somewhat reasonably close to optimal". I will give a more complete answer than what is probably expected in the problem, with an asymptotically optimal answer (up to a multiplicative error that approaches 1 in the limit) under different assumptions.
The optimal expected total number of coin tosses is approximately $8 (\mathrm{erf}^{-1}(1-2ε))^2 p(1-p) / δ^2$ where $p = (p_1+p_2)/2$, $δ = |p_1-p_2| ≈ 0$, and $ε ≈ 0$ ($ε=10^{-12}$ in the problem); $ε$ is the permitted error probability. The formula is valid regardless of whether $p_1-p_2$ is known. The worst case permitted by the problem has $δ = δ_\min = 0.01$ and $p≈0.5$, leading to about 500000 coin tosses.
The multiplicative error in the above formula approaches 1 as $δ→0$ and $\frac{\log \log (1/δ)}{\log (1/ε)}→0$. This implies $ε→0$; also, $ε,δ_\min→0$ would suffice if the above formula used $δ_\min^2$ in place of $δ^2$. Note that $δ→0$ is the large numbers assumption. Also, $\lim_\limits{ε→0} \frac {(\mathrm{erf}^{-1}(1-2ε))^2} {\log(1/ε)} = 1$, but at $ε=10^{-12}$, $\log(1/ε)$ is about 10% larger, and it is unclear which one is more accurate above; $(\mathrm{erf}^{-1}(1-2ε))^2$ was chosen to align with the static case. Also, while per the statement of the problem, the experiment setup works with error $≤ε$ for every choice of $p_1$ and $p_2$, the above remains valid (as opposed to suboptimal) if we were given reasonable/typical priors on $(p_1,p_2)$ instead.
If the experiment setup is static (number of tosses of each type is set in advance), the required number of coin tosses is approximately 4 times higher; furthermore, since the number is not given, we would have to assume the worst case scenario, for a total of about 2000000 tosses. (Our static strategy works if we are given only the bounds on $δ$ and $≈p$, with the number of tosses based on the worst case scenario.)
It may also be interesting to work out what happens if
- $δ$ is not too small (small numbers correction),
- $\min (p_1,p_2,1-p_1,1-p_2)≈0$, and/or
- $ε$ is not very close to 0; already at $ε=10^{-12}$ we are at just 7 stdev.
Static Strategy
Assume that we do $N$ tossses, of which $Nx$ are the unknown coin, $Ny$ batch 1, and $Nz$ batch 2. The unique optimal choice is $x = 1/2$ and $y = z = 1/4$. Any $x$ different from 1/2 is substantially suboptimal.
If $x$ is 1/2 and $δ$ is known, then every choice of $y$ is approximately optimal, but if $y≠1/4$:
- $z = 0$ is a little inefficient because if batch 1 is correct, the error is two-sided (instead of one-sided).
- If $δ$ is different from expected, the error is substantially suboptimal. However, if $δ$ is bigger than expected, the error can still be $≤ε$.
To understand the optimality (at known $δ$), given measured $p_1$ and $p_2$, and assuming actual $p_2>p_1$, the likelihood function for $p$ is approximately $\mathcal{N}(μ=p_1 + δ/2 + (p_2-p_1-δ) \frac {z} {y+z}, σ^2 = \frac {p(1-p)} {N(y+z)})$, while the measurement of the unknown coin has $σ^2 = \frac {p(1-p)} {Nx}$. The estimated $p$ (i.e. the μ above) forms the decision boundary (for classifying the unknown coin), leading to $σ^2 = \frac {p(1-p)} {N(y+z)} + \frac {p(1-p)} {Nx}$ for the difference, with a one-sided error of $δ/2$ leading to wrong decision. If $y≠z$, there is also a second decision boundary, and if $\min(y,z)$ is too small (or $ε$ is very big), the error in determining whether $p_2>p_1$ is nonnegligible, but when all is computed, the effect on the uncertainty is only to make the error partially two-sided.
Thus, the optimal error is one-sided Gaussian error at $\frac {δ/2} {\sqrt{4p(1-p)/N}}$ standard deviations. If the error is $ε$, $N = 32 (\mathrm{erf}^{-1}(1-2ε))^2 p(1-p) / δ^2$.
Dynamic Strategy; known $δ$
As is common at $ε≈0$, choosing a stopping point dynamically reduces the expected number of tosses in about 4 times. Intuitively, we can wait until the statistics is reasonably "typical" (this is inefficient if $ε$ is too large), so the required error is twice as large: The error has to be one side to another side rather than one side to the middle.
Here, one side corresponds to typical statistics if the unknown coin is in batch 1, and the other — batch 2. In the static case, errors happen primarily if the unknown coin appears in the middle between the two choices, but with a dynamic stopping point, we can wait this out.
Similarly to the static strategy (above), the optimal $x$ is 1/2, while the choice of $y$ and $z$ is flexible.
Dynamic Strategy; unknown $δ$
If the sequence of coin tosses is not adaptive (with only the stopping point chosen dynamically), then we pay a price because our measurement of $δ$ is imprecise. Assuming batch 1 and batch 2 are indistinguishable before the experiment, the optimal strategy minimizes $x^{-1} + \max(y^{-1},z^{-1})$, so $x = \sqrt 2 - 1$, and $y = z = 1 - \sqrt 2 / 2$. The expected number of tosses will be (in the limit) $\sqrt 2 / 2 + 3/4 ≈ 1.4571$ times the optimal value, as the coin tosses for the known coin from the unknown coin's batch are essentially wasted. The strategy is different and the penalty is less if we have an informative prior on whether the coin comes from batch 1 or batch 2.
However, using an adaptive strategy, we can reach the optimal value by:
- about half the time, tossing the unknown coin
- about half the time, tossing the batch that appears different from the unknown coin
- a small fraction of the time, tossing the batch that appears the same as the unknown coin.
At $ε≈0$, this is possible by choosing the batch that appears more different from the unknown coin except when the number of tosses for the other batch is much smaller. The stopping point for the experiment can be based on the confidences on the equality of the means (using unknown vs batch1 test and unknown vs batch2 test). The data also determine whether $p_2 > p_1$ with a similar accuracy.
One caveat is that with $δ$ unknown, there is an equivalent of $O(\log (1/δ))$ independent possible stopping points, but we can keep the error $O(ε)$ by requiring confidence $1-O(ε / \log (1/δ))$ to stop. We do not know $δ$, but we can use confidence $1-O(ε / \log^{1.001}(1/δ_\mathrm{guess}))$ instead where $δ_\mathrm{guess}$ is the current guess; 1.001 can be any constant >1. If we also have $δ_\min$ (in the problem, $δ_\min = 0.01$), we can replace $\log^{1.001}(1/δ_\mathrm{guess})$ with $\min(\log^{1.001}(1/δ_\mathrm{guess}), \log(1/δ_\min), \log^{1.001}(δ_\mathrm{guess}/δ_\min + 1))$. As long as confidence $1-ε^{1+o(1)}$ suffices, the formula at the top of the answer remains asymptotically valid; otherwise a correction is to use the required confidence in place of $1-ε$.