Take a truly random binary string of say $64$ bits , all the ones and zeros are considered IID variables. Now perform the Fisher-Yates shuffle on this string to rearrange the bits. Are the bits still considered to be IID variables after shuffling?
-
1Do you think they are? If so, how have you tried to prove it? If not, what are your thoughts on showing that they are not? – John Douma Oct 21 '18 at 05:14
-
This could have a good question had you showed what you've tried for proving or disproving it. I could post a proof if you add your thoughts on this problem so far. – Ѕᴀᴀᴅ Oct 24 '18 at 11:57
-
@AlexFrancisco: I am just a computer science hobbyist , I don't have the knowledge to begin to prove or disprove it. Actually I'm not looking for a proof, a yes or no with an explanation will suffice. – William Hird Oct 24 '18 at 19:18
1 Answers
$\DeclareMathOperator{\be}{Be}\DeclareMathOperator{\bi}{Bi}$
At first glance I would have said they are, because Fisher-Yates algorithm doesn't care for the values, only their positional index. Of course, this isn't a proof, just intuition.
But then I thought: if I know the original string, then the individual bits of the shuffled one can't be independent of each other. In the same way, if you draw something from a hat without replacement, you're bounded by what's left in the hat.
So I did the math. Let's consider the individual $n$ (64) bits, their values is represented by a random variable $X_i\sim\be(p)$[Bernoulli], where $p$ is the probability of $X_i=1$, in this case $p=0.5$.
The total number of 1s in the string, $S=\sum_0^n X_i$, is also a random variable $S\sim\bi(n,p)$[Binomial]. In particular we have $$ P(S=k)=\binom{n}{k}p^k(1-p)^{n-k}\qquad 0\le k\le n $$
Now let's consider the individual bits of the shuffled string and let's call their values $Y_i$. If we knew the value of $S$, we could easily calculate the probability of $Y_i=1$. That's because Fisher-Yates yields an unbiased shuffle, every sequence is equally likely, so we just divide the number of favorable cases by the number of possible cases $$ P\bigl(Y_i=1\mid S=k\bigr)=\frac{\binom{n-1}{k-1}}{\binom{n}{k}}=\frac{k}{n} $$
Knowing the conditional probability $P(Y_i=1\mid S=k)$ we can find \begin{align} P(Y_i=1)&=\sum_{k=0}^n P\bigl(Y_i=1\mid S=k\bigr) P(S=k)=\sum_{k=0}^n\frac{k}{n}\cdot P(S=k)=\sum_{k=1}^n\frac{k}{n}\cdot P(S=k)\\ &=\sum_{k=1}^n\frac{n}{k}\binom{n}{k}p^k(1-p)^{n-k}=\sum_{k=1}^n\frac{\binom{n-1}{k-1}}{\binom{n}{k}}\binom{n}{k}p^k(1-p)^{n-k}\\ &=\sum_{k=1}^n\binom{n-1}{k-1}p^k(1-p)^{n-k} \end{align}
Applying index substitution $k=j+1$ \begin{align} P(Y_i=1)&=\sum_{j=0}^{n-1}\binom{n-1}{j}p^{j+1}(1-p)^{n-j-1}=p\sum_{j=0}^{n-1}\binom{n-1}{j}p^j(1-p)^{n-1-j}\\ &=p\sum_{j=0}^{m}\binom{m}{j}p^j(1-p)^{m-j}=p\bigl(p+(1-p)\bigr)^m=p \end{align}
Therefore $Y_i\sim\be(p)$. Not only bits in the shuffled sequence are IID random variables, but they have the same distribution of the original ones.
- 1,124
-
Interesting answer. Having read this I thought the answer would be no, they can't be IID random variables because the Fisher-Yates shuffle is a deterministic process so your bit string is the result of a deterministic process and therefore cant be truly random and therefore cant be IID. Does that make sense ? – William Hird Oct 25 '18 at 09:18
-
1There are two things with which I do not agree. First, "Fisher-Yates shuffle is a deterministic process", is it though? Run it with the same input twice and you'll likely end up with two different results, so no. Second, consider a deterministic process such as a hash function. Give it a random input, you'll get a random output. – francescop21 Oct 25 '18 at 10:54
-
Fisher-Yates runs on a seeded pseudorandom generator, that seed has only so much entropy so this "engine" can only produce so many IID permutations with that limited entropy, right? – William Hird Oct 25 '18 at 11:11
-
That surely introduces some sort of bias, but it doesn't mean that it is deterministic. Of course Fisher-Yates being unbiased depends on the ability to generate random numbers, but this have nothing to do with the theoretical aspect. – francescop21 Oct 25 '18 at 11:18
-
OK, I'm not knowledgeable enough to disagree with you here, so please accept the bounty :-) – William Hird Oct 25 '18 at 11:45
-
@WilliamHird what I'm trying to say is that real-world implementations are an approximation of the theory. For instance, is 1 divided by 2 equal to 0? Of course not, but if you consider the 4-bit representation 0001 and divide that by 2 (i.e. shift all bits by one to the right), you get 0000. But, again, this doesn't mean that $1/2=0$. [Overflow error is another example of this kind.] In our case FY shuffle gives you IID bits, provided you can generate random numbers well enough. Otherwise, you just get an approximation. – francescop21 Oct 25 '18 at 15:10
-
I'm not sure what you are saying is true from an information theoretic point of view. But never mind, we are being thrown out by the Stack Exchange Stasi :-) – William Hird Oct 25 '18 at 19:22