This comes down to hypothesis testing.
If we make the hypothesis that Alice is not cheating, then she has a 50% chance of correctly guessing the correct value with each bit guessed.
Using this information we can look at the using a normal distribution to see how likely it is that Alice's submission is the result of cheating.
Given $n$ bits of data, and $m$ incorrect guesses then the $p$-value (probability of $m$ or fewer incorrect guesses from $n$ guesses) is:
$$p = \sum_{i=0}^m {^n \text{C} _i \times \left ( \frac{1}{2} \right ) ^n}$$
Now you need to interpret $p$-value, in order to determine whether the results are so unlikely that the hypothesis (not cheating) appears unlikely and thus tampering appears evident.
Typical (though arbitrary) values for $p$-value interpretation are:
- $p > 0.1$ - Little evidence against
- $0.10 \ge p \gt 0.05$ - Weak evidence against
- $0.05 \ge p \gt 0.01$ - Moderate evidence against
- $0.01 \ge p \gt 0.001$ - Strong evidence against
- $0.001 \ge p$ - Very strong evidence against
So for example if $p < 0.01$ you should be looking closely at Alice, if $p < 0.001$ then its a pretty safe bet that Alice is cheating, as the number of incorrect guess are so low as to be highly unlikely.
A final caveat: This is a statistical likelihood of cheating rather then outright proof, using this method, someone who was extraordinarily lucky would be assumed to be cheating. Or course as $n$ becomes larger the chances of being so lucky become smaller.