Random list lengths

Question

I am trying to analyze analytically what would happen if I build a random simulator as follow:

I begin with $ N $ lists of length 1, at each iteration, I will pick a random (non-empty) list, reduce its length by 1. And then, with probably $ p $, I create a new list of length 1, otherwise, I pick a random (non-empty) list and increase its length by 1. Every iteration keeps the total lengths of the lists constant, and I am interested in the distribution of list lengths.

A simple Monte Carlos simulation yield some interesting findings. With $ N = 100 $ and $ p = 0.5 $, we have, approximately $ P(L = 1) \approx 2 P(L = 2) $ and $ P(L = 2) \approx 2 P(L = 3) $, and so on. As it is just a simulation, the number is not exact of course. I run the simulator 1,000,000 times to be sure we reach stationary.

I really wonder there could be some time-reversible Markov Chain that could explain this, but my skill there is really rusty and some help would be greatly appreciated.

Attached below is the code I used to perform the Monte Carlos simulation, just for reproducibility.

from random import *
Size = 100
D = {}
N = [1] * Size
L = Size
R = Size
p = 50
for i in range(0, Size + 1):
    D[i] = 0
for i in range(0, 1000000):
    toremove = randint(0, R - 1)
    N[toremove] = N[toremove] - 1
    if N[toremove] == 0:
        N[toremove] = N[R - 1]
        N[R - 1] = 0
        R = R - 1
    tocreate = randint(0, 100)
    if tocreate < p: 
        N[R] = N[R] + 1
        R = R + 1
    else:
        if R == 1:
            toappend = 0
        else:
            toappend = randint(0, R - 1)
        N[toappend] = N[toappend] + 1
    for n in N:
        D[n] = D[n] + 1
print(D)

EDIT:

To make the problem more tractable for a manual calculation. I reduced the list size to 3. In this case, we have only 3 meaningful states. We have 3 lists of size 1, 1 list of size 2 together with 1 size of size 1, and 1 list of size 3.

The transition matrix for these states is as follow (the probability values can be reasoned by a case by case analysis).

$\left(\begin{array}{ccc} p & 1-p & 0 \\ \frac{p}{2} & \frac{1}{2} & \frac{1-p}{2} \\ 0 & p & 1-p \end{array}\right)$

For $ p = \frac{1}{2} $, the stationary probability can be found as the eigen vector of $ T' $ corresponding to eigenvalue 1 to be

$\left(\begin{array}{c} \frac{1}{4} \\ \frac{1}{2} \\ \frac{1}{4} \end{array}\right) $

That explains the list length distribution because the expected number of lists of length 1 will be $ N \times (\frac{1}{4} \times 3 + \frac{1}{2} \times 1) = \frac{5N}{4} $. The number of list of length 2 would be $ N \times \frac{N}{2} = $ and the number of lists of length 1 would be $ N \times \frac{1}{4} = \frac{N}{4} $, which is roughly the doubling that we are seeing. In fact $ \frac{5N}{4} $ matches better to the experiment data than $ N $.

Attached is the octave code for computing the Markov chain stationary state probability as well as the expected list lengths. The 10000 power is used as an easy way to approximate. With this code, now I can easily tune $ p $ and figure $ p $ can be used effectively to change list length distributions.

p = 0.5
T = [p, (1-p), 0;p/2,1/2,(1-p)/2;0,p,(1-p)];
([1,0,0] * T^10000) * [3,0,0;1,1,0;0,0,1]

All the above shown Markov chain could be used to analyze the situation. However, it requires a lot of effort to analyze the states in the case by case manner and it certainly won't scale for big $ N $. I am planning to run this simulator with $ N = 1000000 $ or above.

@Semiclassical, there was a typo, the eigenvector I was showing was normalized, but it wasn't typed correctly, it should be [0.25,0.5,0.25]' instead of [0.5,0.25,0.5]', sorry about that mistake, the question is now corrected. — Andrew Au, Feb 04 '21 at 19:57

Semiclassical · Accepted Answer · 2021-02-05T00:55:10.593

I decided to write out the details assuming $4$ initial lists of length $1$. The five partitions are

$$4,3+1,2+2,2+1+1,1+1+1+1.$$ The transition probabilities are listed, with all others being zero.

\begin{align} &P(4\to 4)=p\\ &P(4\to 3+1)=1-p\\ &P(3+1\to 4)=\frac{1}{4}(1-p)\\ &P(3+1\to 3+1)=\frac12\\ &P(3+1\to 2+2)=\frac14(1-p)\\ &P(3+1\to 2+1+1)=\frac12p\\ &P(2+2\to 3+1)=\frac12(1-p)\\ &P(2+2\to 2+2)=\frac12(1-p)\\ &P(2+2\to 2+1+1)=p\\ &P(2+1+1\to 3+1)=\frac13(1-p)\\ &P(2+1+1\to 2+2)=\frac13(1-p)\\ &P(2+1+1\to 2+1+1)=\frac13(1+p)\\ &P(2+1+1\to 1+1+1+1)=\frac13 p\\ &P(1+1+1+1\to 2+1+1)=1-p\\ &P(1+1+1+1\to 1+1+1+1)=p \end{align}

Hence the transition matrix is

$$ \begin{pmatrix} p & 1-p & 0 & 0 & 0 \\ \frac14(1-p) & \frac12 & \frac14(1-p) & \frac12 p & 0\\ 0 & \frac12(1-p) & \frac12 (1-p) & p & 0 \\ 0 & \frac13(1-p) & \frac13 (1-p) & \frac13(1+p) & \frac13 p\\ 0 & 0 & 0 & 1-p & p \end{pmatrix} $$

Specializing to $p=1/2$, the left-eigenvector corresponding to eigenvalue $1$ is $\frac{1}{15}(1,4,2,6,2)$.

$$4,3+1,2+2,2+1+1,1+1+1+1.$$

Hence the expected number of lists of lengths $1$, $2$, $3$, $4$ are respectively \begin{align} N\cdot 1\cdot \frac{4}{15}+N\cdot 2\cdot \frac{6}{15}+N\cdot 4\cdot \frac{2}{15}&=\frac{24}{15}N,\\ N\cdot 2\cdot \frac{2}{15}+N\cdot 1\cdot \frac{6}{15}&=\frac{10}{15}N,\\ N\cdot 1\cdot \frac{4}{15}&=\frac{4}{15}N,\\ N\cdot 1\cdot \frac{1}{15}&=\frac{1}{15}N. \end{align} These gives ratios of $2.4,2.5,4$ which are roughly in agreement with the conjectured 2-fold ratio for $N\to \infty$.

In comments I noted a possible way to visualize the states and transitions for various list sizes. To display this I'll need to introduce a few concepts. The first is that of using a Young diagram to visualize integer partitions: For each term of the partition, we draw a row of that many boxes. We then join these stacks horizontally, yielding a quantity of boxes equal to the integer being partitioned. For instance, the partition $(5,4,1)$ of $10$ would be represented by the Young diagram (Wikipedia source)

One convenient feature of a Young diagram is that it's easy to visualize adding or removing boxes. For instance, if we remove $1$ from $3$ to get the partition $(5,3,1)$ of $9$, and then add it to $1$ to get the partition $(5,3,2)$ of $10$, the sequence of Young diagrams is

What this idea in turn leads to is the concept of Young's lattice: We partially order the integer partitions according to whether one Young diagram can be obtained by removing some number of boxes. This is presented graphically as a Hasse diagram:

Note that said diagram is organized so that each level consists of those partitions for a given integer $(N=1,2,3,4,\ldots)$. By removing one box and adding another, we move down one level and then back up to our original level.

With all that built up, how do we use this to visualize the transitions in the problem of interest? If we take $N=4$, then the possible states are the diagrams in the highest level shown in the above diagram. Taking one of these Young diagrams as our state, we randomly choose one of the rows and remove the last box. This moves us down one level, with probability depending on multiplicity of identical rows (e.g., there are two ways to remove one box from the partition $(2,2)$ to reach $(2,1)$.) At this point, we choose with probability $p$ whether to place that box at the bottom as a new row. If not, then we choose one of the remaining rows at random and place it there. In either case, we end up moving back to the original highest level of the lattice.

If we consider various ways of moving in this lattice, we can work out the transition probabilities to go from one diagram in a given row to another. (It's still tedious but at least it's organized.) The real hope, though, would be to use this presentation to find a more systematic way of determining the probabilities. It does give one point of visual intuition, however: If $p=1/2$, then the most probable move is always to add one row. This amounts to picking the right-most upward move after moving down. As such, we expect our random walk on the Young lattice to pile up on the right. That, in turn, means we expect to see partitions with many short rows more often than a few long ones. This is at least qualitatively in agreement with the proposed $1/2^L$ scaling. Hopefully more quantitative insights are possible as well.

In principle, one could do a similar analysis starting with 5 lists of length 1, yielding the partitions $$5,4+1,3+2,3+1+1,2+2+1,2+1+1+1,1+1+1+1+1.$$ But handling that much casework by hand is more tedious than I'm willing to deal with. — Semiclassical, Feb 04 '21 at 07:04
Thank you for your attention and the detailed work for N = 4.
This is exactly what I did for the case N = 3, (although I didn't spell it out as clearly). We knew that the Markov chain approach works, but it requires an enormous amount of work if we wanted to compute for N = 1000000.

I was hoping for a shortcut where we can argue using the time-reversibility of the Markov chain using detailed balance. It may not be possible, I don't know.

https://en.wikipedia.org/wiki/Detailed_balance#Reversible_Markov_chains — Andrew Au, Feb 04 '21 at 20:05
@AndrewAu I do see one way of making it at least simpler to derive transition probabiilties, and that's to start with the Hasse diagram of Young's lattice. In that case, the possible states are one level of the lattice. For each transition, we 1) pick one of the adjacent lower partitions and move there, and 2) pick a random higher partition and move up to it. What makes it a bit weird is that the latter step doesn't have uniform probability, due to the $p$ vs $1-p$ probabilities. — Semiclassical, Feb 04 '21 at 20:24
I am unaware of the Hasse diagram or Young's lattice. Feel free to modify my problem as you see fits, it would be great to see those in action.
The reason I designed the $ p $ is to have a tunable parameter so that I can control the lengths of the lists. My idea is that as $ p $ increases, I should get longer lists. — Andrew Au, Feb 04 '21 at 20:40
Young's lattice is always a bit of a hassle to actually compute with. It seems more natural to me to use weak compositions of $N$ with at most $N$ parts (i.e. lists of length $N$ non-negative integers whose sum is $N$). Andrew's process could then be rephrased as: (1) randomly pick a non-zero entry to decrease (i.e. uniformly pick a downwards covering relation in the composition lattice), then (2) pick either one of the empty rows with fixed probability $p$ (at least one must exist by pigeonhole) or with probability $1-p$ uniformly randomly chose from the non-empty rows and pick... — Joshua P. Swanson, Feb 05 '21 at 03:38
...one of those. Note that (1) and (2) can be encoded in rectangular matrices and the transition matrix can be written as their product. This may give some vague theoretical hope, since such products arise in combinatorial Laplacians, e.g. see the Matrix-Tree Theorem and the Cauchy--Binet Formula. As for figuring out the stationary distribution, based on the matrices you two posted, it looks like the entries could be rational functions of $p$. You might be able to guess an explicit form and prove it. Computing, say, the expected length could then reduce to some explicit combinatorial identity. — Joshua P. Swanson, Feb 05 '21 at 03:38
@JoshuaP.Swanson I don't grok all of that, but I think I get the upshot: While the initial presentation in terms of partitions may seem simpler, it's just as natural to preserve the order of the lists and therefore work with the lattice of such compositions instead. In that case, the number of states (based on OEIS A001700) is simply $\binom{2N-1}{N}$ (so $N=3$ would have 10, $N=4$ would have 35, etc.) That makes me think it'd be interesting to reexamine the $N=3$ case using these states to see if the transition probabilities have more structure. — Semiclassical, Feb 05 '21 at 06:11
I don't mean to over-emphasize the partition vs. composition difference. The problem is symmetric with respect to rearranging compositions, so really it'll just change how some multiplicities are tracked. But yes, the growth rate of the dimensions is nicer with compositions, maybe it's more reasonable to try recursive approaches, etc. Something to play around with, anyway. Looking towards an ansatz, the $N=3$ matrix has stationary distribution $(p^2, 2 (1 - p) p, (1 - p)^2)$ (sums to $1$) and the $N=5$ matrix has $(1/2 (1 - p)^2, 2 (1 - p)^2, (1 - p)^2, 3 (1 - p) p, p^2)$ (doesn't sum to $1$). — Joshua P. Swanson, Feb 05 '21 at 10:38

Random list lengths

1 Answers1