Greater/lesser search with one false answer allowed

Question

It is well known that you can determine the values of $n\geq 2$ bits using $k$ yes/no questions about the bits (for example, "is $x_1 \oplus x_3 = 1$?), even if one (but not more) of the answers obtained may be a lie, if and only if $$ 2^n k \leq 2^k $$ This is just the observation that you can always construct a Hamming code using $k-n$ extra bits, which corrects single-bit errors, if $k$ is of that size.

(This does not work if $n=1$ since you require $3$ bits rather than $2$ to achieve single error correction.)

For example, you can determine a 4-bit number using seven questions, even allowing for one possible lie, by asking for the values of each bit, and then asking for the values of $x_0 \oplus x_1 \oplus x_2$, then $x_0 \oplus x_1 \oplus x_3$, then $x_0 \oplus x_2 \oplus x_3$. Note that these questions can be fixed beforehand; later questions need not depend on answers to the earlier ones.

But what if the nature of the yes/no questions allowed is restricted?

In particular, can you determine (with certainty) a specific number $x \in [0,15]$ using no more that seven questions of the form "Is $x > q$", coping with up to one lie? It is allowable to tailor later questions based on the answers to earlier questions.

As a generalization, what are the conditions on $N$ and $k$ that suffice to ensure that you can determine a specific $x \in [0,N)$ using up to $k$ questions of the form "Is $x > q$", coping with up to one lie? My real question to be posed is:

What is the largest value of $N$ such that a specific number $x \in [0,N)$ can with certainty be determined in $7$ "Is $x > q$" questions, coping with at most one lie?

Stanislaus Ulam did some work on this in the 1940's and that led to the work that led to Hamming codes, but I don't remember whether he actually presented a condition for general $N$.

For some examples, $$k(N=2) = 3 \\ k(N=3) = 4 \\ k(N=4) = 5 \\ k(N=5) = 6 \\ k(N=6) = 6 \\ k(N=7) = 7 \\ k(N=8) = 7 \\ $$

As for the case of $N = 15$, I strongly suspect that the "information theory value" of seven questions does not suffice when the ordering restriction is imposed. But I'd be interested to see a proof.

Can you explain why you claim that you only need 3 queries to distinguish 3 possibilities? Or do you mean the half-open interval $[0,N)$? I'm very sure 3 possibilities needs 4 questions (and 4 is enough). — user21820, May 30 '16 at 15:32
Yes, you need 4 questions to resolve 3 possibilities. Thus $k(N=3) = 4$ which is what I said in the question. We are in violent agreement here. — Mark Fischler, Jun 03 '16 at 00:19
Erm your question has the closed interval $[0,N]$, which includes $N+1$ numbers. That is why I mentioned the half-open interval in my comment. — user21820, Jun 03 '16 at 00:35
Wait a minute: I no longer think you can resolve $N=3$ even in $4$ questions. wlog, start with $>1?$ and assume the answer is Y. Now either $>1$? again, Y again, and it takes 3 more questions, or $2?$ with an answer of N. At this point you by symmetry might as well repeat $>1?$ and get Y again. It takes 2 more questions (both $>2?$) to resolve, for a total of $5$ questions. — Mark Fischler, Jun 03 '16 at 00:51
Hmm I think you are right. I don't remember how I came to the conclusion that 4 is enough, but certainly the first 2 questions must be different, and the answerer can make the apparent answer be the middle number, and answer the third question consistently, after which we are essentially back in the case of same first 2 questions, which does need 5 questions. And you should fix your question to say $[0,N)$. — user21820, Jun 03 '16 at 01:47

score 1 · Accepted Answer · edited Apr 13 '17 at 12:20

1

I have proven an optimal strategy and written a short program to compute for small $N$; see my answer to my generalized question.

This completely answers your question rigorously. Specifically, $7$ queries is enough to distinguish up to at most $12$ possibilities, while $8$ queries is enough to distinguish up to at most $22$ possibilities.

edited Apr 13 '17 at 12:20

Community

1

answered Jun 03 '16 at 16:28

user21820

57,693
9
98
256

Right you are! And by the way, 9 queries is enough for at most 40 possibilities, 10 for 70, 11 for 130, 12 for 244 possibilities. I wonder what the asymptotic behavior is? – Mark Fischler Jul 01 '16 at 20:08
@MarkFischler: Well according to my conjecture it is asymptotically the same as the lower bound. But I couldn't even prove the weaker version that $k \in \log_2(m) + O(\log_2(\log_2(m)))$. Weakening seems to not help the induction, while strengthening seems to not add up, so I don't know what to do. – user21820 Jul 02 '16 at 02:13

score 0 · Answer 2 · edited Jun 03 '16 at 16:30

This answer specifically addresses the question "Are 7 questions enough for $N=15$?". I am assuming that this relates to the 15 possible numbers from 0 to 14 inclusive.

Let us suppose that 7 questions are sufficient. Then, following the method described in my first answer:

First round

We start with 8 sub-games, one for each possible "false" answer (plus one for the absence of a "false" answer). Each has 15 possibilities, making 120 in all.

So we ask "Is $x<7$?". A "Yes" answer reduces the 1st sub-game to 7 possibilities (8 to 14) and all the others to 8 each (0 to 7). Thus there are 63 possibilities for Yes and 57 for No. ("Is $x<8$?" would have made the split 57:63).

Thus we end up with at worst 63 possibilities.

Second round

The 1st sub-game now covers a range of possibilities distinct from the remaining ones. The remaining sub-games all have the same range as each other.

Case 1: 1st game has 0-6, other games have 7-14.

In this case we start with 63 possibilities.

An answer "Yes" to "Is $x<0$?" will leave just 6 possibilities, all of them in the 2nd game, in which this answer is a lie.

"Is $x<1$? Yes!" will add one possibility from the 1st game (where the answer is truthful), making 7 in all. "Is $x<2$? Yes!" will add one more, and so on, until when we have "Is $x<7$? Yes!" we have a total of 13 possibilities, 7 of them from the 1st game and 6 from the 2nd.

Moving up to "Is $x<8? Yes!" will remove $x=7$ as a possibility from the 2nd sub-game but add it to the 3rd to 8th, a total of 6 sub-games. Thus the total number of possibilities increases by 5, from 13 to 18.

Each successive increase will add 5 more to the number of possibilities. We want to get as close as possible to $63/2=31\frac12$. The closest point is 33, if we ask "Is $x<11$?". This splits the remaining possibilities 33:30.

Case 2: 1st game has 7-14, other games have 0-6.

In this case we start with 57 possibilities.

An answer "Yes" to "Is $x<0$?" will leave just 7 possibilities, all of them in the 2nd game, in which this answer is a lie.

An answer "Yes" to "Is $x<1$?" will remove 1 possibility ($x=0$) from the 2nd game but add $x=0$ to the 3rd to 8th games, thus adding 6 possibilities to make an overall net increase of 5: a total of 12 possibilities.

By the same logic as before, if we ask "Is $x<4$?", we will split the original 57 possibilities 27:30, and this is the most balanced split available.

Third round

The worst possible outcome coming into the 3rd round is a "Yes" answer from case 1 of the 2nd round, which yields 33 possibilities. Since we have only five Yes/No questions to go, and they can only cover $2^5=32$ possibilities between them, we will not be able to distinguish between every remaining possibility.

So 7 questions are not enough.

Repeating the analysis, but assuming 8 questions and therefore using 9 sub-games, gives a larger margin of error: 135 possibilities to be distinguished by 8 questions. The worst-case number of possibilities by following the most even split is:

After Round 1: 71.
After Round 2: 38.
After Round 3: 21.
After Round 4: 12.
After Round 5: 7.
After Round 6: 4.
After Round 7: 2.
After Round 8: 1.

So 8 questions appear to suffice. The problem is that we have not proven that the most even split is the worst case, although it is likely to be so.

This is wrong too, not only for the same reasons as I have now twice described to you. You also implicitly assume that the most even split is the best split, without proving it. In mathematics you are obliged to make clear what you have assumed and what you have not, otherwise it is downright misleading to other people. — user21820, Jun 03 '16 at 01:02
I've minimally edited your answer to make it reasonably correct; I hope you don't mind. After all, your idea can be made rigorous for the lower bound, though it does not seem to produce an upper bound. — user21820, Jun 03 '16 at 16:32

score -1 · Answer 3 · edited Apr 13 '17 at 12:20

There has been some discussion of this in a newer question, where you will find a detailed exposition and a certain amount of disagreement; but I will summarise the information-theoretic aspects here.

The key is to note the correct information content of the problem. Let us suppose that $Q$ questions are to be asked. I will cause irritation and confusion by supposing that the values to be guessed are in the half-open interval $[0,N)$ rather than the closed one you specified. This means that there are exactly $N$ possibilities to be guessed, rather than $N+1$, and it makes the algebra easier to read. The other change in notation is that I will refer to the questions as being of the form "$x<k$?".

The answerer has to have two pieces of information to do his job. One is the number $n$ to be guessed: $N$ possibilities. The other is the number $q$ of the question which he is to answer with a lie. $q$ does lie in the closed interval $[0,Q]$, because we can take $q=Q$ to mean "the answerer never lies". Thus the total number of possibilities between which the answerer has to choose is $(Q+1)N$.
The questioner can't reach his conclusion without deducing all the information that the answerer needed to produce his answers: thus, $(Q+1)N$ possibilities. Another way of looking at it is that when the questioner has guessed the number $n$, he also knows the number $q$.

Since each question can distinguish between two possibilities only, $Q$ questions are needed to distinguish between $2^Q$ possibilities. Thus, comparing the possibilities distinguishable by $Q$ questions with the possibilities that need to be distinguished, we have $$2^Q\ge(Q+1)N$$

I shall present an algorithm which, I claim, is within one or two units of achieving a successful guess with the smallest integer $Q$ that satisfies the above equation.

The strategy

The questioner's strategy is to play $Q+1$ games simultaneously: one in which the answerer lies at question $0$, one in which he lies at question $1$, and so on up to question $Q-1$; plus one in which the answerer doesn't lie at all (which it is not illogical to describe at "lies at question $Q$", since question $Q$ is never asked).

At each stage, the questioner asks one and only one question, "is $x<k$?", and applies the answer to all $Q+1$ simultaneous games. Thus the answerer only hears $Q$ questions, as required.

At the start, each game has all $N$ numbers as possibilities. Each answer to a question reduces that number. It should be noted that there is no question of lying in these sub-games: each one receives, at each stage, a true answer. All the questioner has to remember is that the answer to question $q$ should be inverted ("Yes" becomes "No" and "No" becomes "Yes") for game $q$, to allow for the fact that game $q$ is the one in which answer $q$ is a lie.

As time goes on, the possible numbers resulting from each of the $Q+1$ games become fewer and fewer. For some games they all disappear, and eventually they disappear for all games except one, and in that game only one number is left. At that point the number is known, and the identity of the game which still contains that number gives the number of the question that was answered falsely (with, as before, game number $Q$ denoting "there were no false answers").

The strategy at step $q$ is therefore:

Define $N_q$ as the number of possible numbers, totalled across all games (thus, the number of possible $\langle n,q\rangle$ pairs). Clearly $N_0=(Q+1)N$.
Ask a question "$x<k$?" such that a Yes answer eliminates precisely half of the possible numbers, totalled across all games. Then a No answer will also eliminate half the possible numbers. We can say that $N_<=N_\ge=\frac12N_q$.
Given the answer, eliminate all numbers in game $q$ which do correspond to the answer, because in this round the answer to game $q$ is a lie so that "$x<k$" means that $x$ is greater than or equal to $k$. In all other games, eliminate all numbers which do not correspond to the answer, because in this round the answer to all games other than $q$ is true.
You now have either $N_{q+1}=N_<$ or $N_{q+1}=N_\ge$: that is to say, $N_{q+1}=\frac12 N_q<$ if $k$ was chosen perfectly.

The possibility of choosing $k$

At any stage, games can be of three kinds:

Games in which the lie has already been told.
Games in which the lie is being told in the current round.
Games in which the lie will be told in the future (or will never be told).

The ranges of numbers covered by class (1) are all disjoint (since each game comes from a different sequence of Yes/No after allowing for lies). Classes (2) and (3) all contain identical ranges to each other, but that range is disjoint from any of the class (1) ones.

The effect of this on the value of $N_<$ for various potential values of $k$ is as follows:

Wherever $k$ is in class (1), $dN_</dk=+1$. That is, each increase of 1 in "$x<k?$" yields an increase of 1 in the number of numbers that remain possible if the answer is Yes.
Wherever $k$ is in class (2) or (3), $dN_</dk=Q-q-2$. That is, each increase of 1 in "$x<k?$" yields an increase of $Q-q-2$ in the number of numbers that remain possible if the answer is Yes. This is because there are $Q-q-1$ games in class (3) and they all share the same range of possibilities and are all affected by answers in the same way, and there is $1$ game in class (2) which shares the same range, but for which the number of possible numbers decreases by 1 unit for each 1-unit increase in $k$. (The reverse direction is because a "Yes" to "$x<k?$" means, for class (2), "No").
If $k$ is not in any of these classes, $dN_</dk=0$.

Leaving aside occasional flat spots, the graph of $N_<$ versus $k$ is therefore monotone. Where the slope is 1, it is easy to find a $k$ such that $N_<=\frac12N_q$. This would yield an explicit algorithm which halves the number of possibilities at each stage and therefore reaches a conclusion in the minimum theoretical time.

The only time where this breaks down is when the slope is not 1 at the point where the horizontal line $N_<=\frac12 N_q$ intersects the graph. In that region $N_<$ jumps by $Q-q-2$ units for each one-unit step in $k$. This means that the attainable value of $N_<$ closest to $\frac12 N_q$ could be $\frac12(Q-q-2)$ units away from it. This, in turn, means that rather than having the ideal $$N_{q+1}=\frac12 N_q$$ we may have $$N_{q+1}=\frac12 N_q+\frac12(Q-q-2)$$

This slightly slows the progress of the game, and it makes explicit analysis difficult; but since the largest value of $(Q-q-2)$ will be of the order of $\log_2{N}$, it will be seen that it does not have an overwhelming effect on progress.

The algorithmic complexity of finding the right value of $k$ is essentially logarithmic. It is a question of sorting the available ranges into order; a binary search to find the range which brackets the desired value of $N_<$; and simple arithmetic to find the desired value in the range.

I already told you why your reasoning is wrong and you insist on pushing your invalid mathematics. I appreciate your interest in this problem and the time you spent writing code, but your mathematics is simply wrong. I'll try one last time to explain, but if you don't listen I can't help you. You claim there are $(Q+1)N$ possibilities that the questioner who asks $Q$ questions needs to distinguish. That is utter nonsense. Given $N$, the exact questions which were lied to are completely determined, and hence there are exactly $N$ possibilities, not more. — user21820, Jun 03 '16 at 00:39
To make it very concrete just for your sake, since you don't seem to understand basic logic, let's only consider the case of $N = 2$ and $Q = 1$. You claim that there are $4$ distinct situations to be distinguished. List them. You can't!!! There are only $2$ distinct situations. — user21820, Jun 03 '16 at 00:41
And as I stated in a comment on your other post, if the questioner's first 4 queries happen to be $<x$ twice and $<x+1$ twice, the answerer is forced to lie to one of those queries if he ever wants to lie, and so not only are the possibilities restricted, if he doesn't lie then your value of $q$ is totally undetermined by the questioner even though he knows $x$!!! — user21820, Jun 03 '16 at 00:59

Greater/lesser search with one false answer allowed

3 Answers3