I am trying to analyze analytically what would happen if I build a random simulator as follow:
I begin with $ N $ lists of length 1, at each iteration, I will pick a random (non-empty) list, reduce its length by 1. And then, with probably $ p $, I create a new list of length 1, otherwise, I pick a random (non-empty) list and increase its length by 1. Every iteration keeps the total lengths of the lists constant, and I am interested in the distribution of list lengths.
A simple Monte Carlos simulation yield some interesting findings. With $ N = 100 $ and $ p = 0.5 $, we have, approximately $ P(L = 1) \approx 2 P(L = 2) $ and $ P(L = 2) \approx 2 P(L = 3) $, and so on. As it is just a simulation, the number is not exact of course. I run the simulator 1,000,000 times to be sure we reach stationary.
I really wonder there could be some time-reversible Markov Chain that could explain this, but my skill there is really rusty and some help would be greatly appreciated.
Attached below is the code I used to perform the Monte Carlos simulation, just for reproducibility.
from random import *
Size = 100
D = {}
N = [1] * Size
L = Size
R = Size
p = 50
for i in range(0, Size + 1):
D[i] = 0
for i in range(0, 1000000):
toremove = randint(0, R - 1)
N[toremove] = N[toremove] - 1
if N[toremove] == 0:
N[toremove] = N[R - 1]
N[R - 1] = 0
R = R - 1
tocreate = randint(0, 100)
if tocreate < p:
N[R] = N[R] + 1
R = R + 1
else:
if R == 1:
toappend = 0
else:
toappend = randint(0, R - 1)
N[toappend] = N[toappend] + 1
for n in N:
D[n] = D[n] + 1
print(D)
EDIT:
To make the problem more tractable for a manual calculation. I reduced the list size to 3. In this case, we have only 3 meaningful states. We have 3 lists of size 1, 1 list of size 2 together with 1 size of size 1, and 1 list of size 3.
The transition matrix for these states is as follow (the probability values can be reasoned by a case by case analysis).
$\left(\begin{array}{ccc} p & 1-p & 0 \\ \frac{p}{2} & \frac{1}{2} & \frac{1-p}{2} \\ 0 & p & 1-p \end{array}\right)$
For $ p = \frac{1}{2} $, the stationary probability can be found as the eigen vector of $ T' $ corresponding to eigenvalue 1 to be
$\left(\begin{array}{c} \frac{1}{4} \\ \frac{1}{2} \\ \frac{1}{4} \end{array}\right) $
That explains the list length distribution because the expected number of lists of length 1 will be $ N \times (\frac{1}{4} \times 3 + \frac{1}{2} \times 1) = \frac{5N}{4} $. The number of list of length 2 would be $ N \times \frac{N}{2} = $ and the number of lists of length 1 would be $ N \times \frac{1}{4} = \frac{N}{4} $, which is roughly the doubling that we are seeing. In fact $ \frac{5N}{4} $ matches better to the experiment data than $ N $.
Attached is the octave code for computing the Markov chain stationary state probability as well as the expected list lengths. The 10000 power is used as an easy way to approximate. With this code, now I can easily tune $ p $ and figure $ p $ can be used effectively to change list length distributions.
p = 0.5
T = [p, (1-p), 0;p/2,1/2,(1-p)/2;0,p,(1-p)];
([1,0,0] * T^10000) * [3,0,0;1,1,0;0,0,1]
All the above shown Markov chain could be used to analyze the situation. However, it requires a lot of effort to analyze the states in the case by case manner and it certainly won't scale for big $ N $. I am planning to run this simulator with $ N = 1000000 $ or above.


