Permutation Algorithms

Question

Take a truly random binary string of say $64$ bits , all the ones and zeros are considered IID variables. Now perform the Fisher-Yates shuffle on this string to rearrange the bits. Are the bits still considered to be IID variables after shuffling?

Do you think they are? If so, how have you tried to prove it? If not, what are your thoughts on showing that they are not? — John Douma, Oct 21 '18 at 05:14
This could have a good question had you showed what you've tried for proving or disproving it. I could post a proof if you add your thoughts on this problem so far. — Ѕᴀᴀᴅ, Oct 24 '18 at 11:57
@AlexFrancisco: I am just a computer science hobbyist , I don't have the knowledge to begin to prove or disprove it. Actually I'm not looking for a proof, a yes or no with an explanation will suffice. — William Hird, Oct 24 '18 at 19:18

score 0 · Accepted Answer · answered Oct 25 '18 at 09:01

$\DeclareMathOperator{\be}{Be}\DeclareMathOperator{\bi}{Bi}$

At first glance I would have said they are, because Fisher-Yates algorithm doesn't care for the values, only their positional index. Of course, this isn't a proof, just intuition.

But then I thought: if I know the original string, then the individual bits of the shuffled one can't be independent of each other. In the same way, if you draw something from a hat without replacement, you're bounded by what's left in the hat.

So I did the math. Let's consider the individual $n$ (64) bits, their values is represented by a random variable $X_i\sim\be(p)$^[Bernoulli], where $p$ is the probability of $X_i=1$, in this case $p=0.5$.

The total number of 1s in the string, $S=\sum_0^n X_i$, is also a random variable $S\sim\bi(n,p)$^[Binomial]. In particular we have $$ P(S=k)=\binom{n}{k}p^k(1-p)^{n-k}\qquad 0\le k\le n $$

Now let's consider the individual bits of the shuffled string and let's call their values $Y_i$. If we knew the value of $S$, we could easily calculate the probability of $Y_i=1$. That's because Fisher-Yates yields an unbiased shuffle, every sequence is equally likely, so we just divide the number of favorable cases by the number of possible cases $$ P\bigl(Y_i=1\mid S=k\bigr)=\frac{\binom{n-1}{k-1}}{\binom{n}{k}}=\frac{k}{n} $$

Knowing the conditional probability $P(Y_i=1\mid S=k)$ we can find \begin{align} P(Y_i=1)&=\sum_{k=0}^n P\bigl(Y_i=1\mid S=k\bigr) P(S=k)=\sum_{k=0}^n\frac{k}{n}\cdot P(S=k)=\sum_{k=1}^n\frac{k}{n}\cdot P(S=k)\\ &=\sum_{k=1}^n\frac{n}{k}\binom{n}{k}p^k(1-p)^{n-k}=\sum_{k=1}^n\frac{\binom{n-1}{k-1}}{\binom{n}{k}}\binom{n}{k}p^k(1-p)^{n-k}\\ &=\sum_{k=1}^n\binom{n-1}{k-1}p^k(1-p)^{n-k} \end{align}

Applying index substitution $k=j+1$ \begin{align} P(Y_i=1)&=\sum_{j=0}^{n-1}\binom{n-1}{j}p^{j+1}(1-p)^{n-j-1}=p\sum_{j=0}^{n-1}\binom{n-1}{j}p^j(1-p)^{n-1-j}\\ &=p\sum_{j=0}^{m}\binom{m}{j}p^j(1-p)^{m-j}=p\bigl(p+(1-p)\bigr)^m=p \end{align}

Therefore $Y_i\sim\be(p)$. Not only bits in the shuffled sequence are IID random variables, but they have the same distribution of the original ones.

Interesting answer. Having read this I thought the answer would be no, they can't be IID random variables because the Fisher-Yates shuffle is a deterministic process so your bit string is the result of a deterministic process and therefore cant be truly random and therefore cant be IID. Does that make sense ? — William Hird, Oct 25 '18 at 09:18
There are two things with which I do not agree. First, "Fisher-Yates shuffle is a deterministic process", is it though? Run it with the same input twice and you'll likely end up with two different results, so no. Second, consider a deterministic process such as a hash function. Give it a random input, you'll get a random output. — francescop21, Oct 25 '18 at 10:54
Fisher-Yates runs on a seeded pseudorandom generator, that seed has only so much entropy so this "engine" can only produce so many IID permutations with that limited entropy, right? — William Hird, Oct 25 '18 at 11:11
That surely introduces some sort of bias, but it doesn't mean that it is deterministic. Of course Fisher-Yates being unbiased depends on the ability to generate random numbers, but this have nothing to do with the theoretical aspect. — francescop21, Oct 25 '18 at 11:18
OK, I'm not knowledgeable enough to disagree with you here, so please accept the bounty :-) — William Hird, Oct 25 '18 at 11:45
@WilliamHird what I'm trying to say is that real-world implementations are an approximation of the theory. For instance, is 1 divided by 2 equal to 0? Of course not, but if you consider the 4-bit representation 0001 and divide that by 2 (i.e. shift all bits by one to the right), you get 0000. But, again, this doesn't mean that $1/2=0$. [Overflow error is another example of this kind.] In our case FY shuffle gives you IID bits, provided you can generate random numbers well enough. Otherwise, you just get an approximation. — francescop21, Oct 25 '18 at 15:10
I'm not sure what you are saying is true from an information theoretic point of view. But never mind, we are being thrown out by the Stack Exchange Stasi :-) — William Hird, Oct 25 '18 at 19:22

Permutation Algorithms

1 Answers1