4

I asked this question a week ago but it ended up being closed for not being specific enough. I am going to try and define everything as mathematically as possible in this one, so please let me know if there is anything ambiguous.

How can I randomly distribute W litres of water into n empty buckets without them overflowing (i.e. without putting more water into the bucket then it can hold)

There are $n$ buckets of water with the $i^{th}$ bucket containing $w_i$ litres of water.

1) $W = w_1 + w_2 + ... + w_n =$ Total amount of water

2) $w_i$ $\epsilon$ $[0, x]$ for $i = 1, 2, ..., n$ where $x$ $\epsilon$ $\Re$

Given a natural number $n$ and real number value for $x$ and $W$, what process can I use to assign a random value to each $w_i$, such that (1) and (2) remain true?

To clarify what I mean by random, I am thinking of a function which returns a value in the interval $[0, 1]$ similar to the .NextDouble() method in the Random class (And yes, I know this is pseudo-random and not truly random).

Example:

For $n$ = 3, $W$ = 1 and $x$ = 1 one possible solution is:

$w_1$ = 0.22 $w_2$ = 0.43 $w_3$ = 0.35

Since $w_1 + w_2 + w_3 = 0.22 + 0.43 + 0.35 = 1 = W$ and $w_1$, $w_2$ and $w_3$ are in the interval $[0,1]$

  • 1
    Wait so $w_i$ already has between $[0,x]$ liters of water before we add the extra W? Or it can only hold between $[0,x]$ liters of water. – Tony Hellmuth May 30 '18 at 03:34
  • The buckets are empty to begin with and each one can only hold between $[0,x]$ litres of water. – interrupt0 May 30 '18 at 03:41
  • But then as long as $W/n \le x$ we can always fill the buckets with probability 1? – Tony Hellmuth May 30 '18 at 03:47
  • Have you ever seen the problem of breaking a stick into three pieces, and finding the probability that a triangle can be formed from the pieces? This seems like that. Perhaps you could just assign amount of water randomly, then throw away all data sets such that there exists at least one $i$ for which $w_i>x$? Then work within the confines of your 'sifted' set? I'm spitballing. – The Count May 30 '18 at 03:48
  • Also, this is such a basic idea, yet I am astonished how hard it is to formulate in a rigorous way. – The Count May 30 '18 at 03:49
  • I'm not really sure how the stick problem relates to the question. Could you explain it a bit more? – interrupt0 May 30 '18 at 04:09
  • As for assigning the water randomly, I don't think it would satisfy (1) but even if it did, I would potentially have to do this for a very large number of buckets so I don't think that would be the best idea. It is weird how conceptually simple the problem is yet it is so hard to answer. :/ – interrupt0 May 30 '18 at 04:13
  • 1
    When people say they want a "process" to assign "random" values, they can mean few different things. By "process", do you just mean an algorithm, or do you specifically want to compute $w_1$, then $w_2$, and so on in sequential order in a single pass? By "random", do you mean only that there should be a different solution every time even if some possible solutions are never generated, or that there should be a nonzero probability to generate every possible solution, or even more strongly that every possible solution should have an equal (i.e. uniform) probability of being generated? –  May 30 '18 at 04:49
  • For example, here's a solution for the most lax requirements: Randomly pick an empty bucket. Fill it with as much water as you can. Repeat until you run out of water. –  May 30 '18 at 05:26
  • Sorry @Rahul that was a bit vague. I am looking for an algorithm (preferably as efficient as possible) that can generate every possible solution with a uniform probability. – interrupt0 May 30 '18 at 21:34

4 Answers4

1

As clarified in the comments, you want to uniformly sample the set of all points $\vec w = (w_1,\dots,w_n)$ such that $$w_1+\dots+w_n = W,\\ 0\le w_i\le x\ \text{for all $i=1,\dots,n$.}$$

Here are two possibilities.

  1. You can use rejection sampling: Uniformly generate a point satisfying $w_1+\dots+w_n=W$, $0\le w_i\ \forall i$; if it violates the capacity constraint $w_i\le x$ for any $i$, reject it and try again.

    The first step can be done easily by generating $n-1$ uniform random numbers between $0$ and $W$, sorting them so they are $s_1,\dots,s_{n-1}$ in increasing order, and setting $w_1=s_1-0$, $w_2=s_2-s_1$, . . . , $w_n = W-s_{n-1}$. This is the stick-breaking model alluded to by The Count, since it is equivalent to breaking a stick of length $W$ into $n$ pieces by making cuts at locations $s_1,\dots,s_{n-1}$.

    Rejection sampling is guaranteed to give you a uniform distribution, but the downside is that you may have to try many many times before obtaining a valid sample, especially if $x\ll W$.

  2. Tim Seguine on MathOverflow recommends hit-and-run sampling: Start at an arbitrary point $\vec w$ inside your set; pick a direction $\vec d$ uniformly at random; find the range of values $t$ for which $\vec w+t\vec d$ lies in the set; pick a $t$ uniformly at random from this range; replace $\vec w$ with the new point $\vec w+t\vec d$ and repeat. (This is essentially Seguine's description with the variables renamed.)

    For the initialization, the choice $\vec w=(W/n,\dots,W/n)$ is as good as any. To pick a direction $\vec d$ uniformly, a simple way is to choose $n$ numbers from a standard normal distribution, subtract the mean, form them into a vector and normalize it. To find the range of values for $t$, compute all the critical values $t_{i,1} = -w_i/d_i$ and $t_{i,2} = (x-w_i)/d_i$ where $\vec w+t\vec d$ crosses a constraint boundary; since we already know $\vec w$ is in the set, $t=0$ must be valid, so $t$ must lie between the largest negative value and the smallest positive value.

    Hit-and-run sampling converges to a uniform distribution as you repeat the process, but if you stop it after a finite number of iterations, the distribution may only be approximately uniform. Nevertheless, I did some experiments for low $n$ and the distribution appeared to approach uniformity quite rapidly. My totally uninformed conjecture is that $n$ iterations is probably "good enough", at least for your problem.

Both methods are easy to implement, so you should give them both a try. Thanks for asking this question; I wouldn't have learned about hit-and-run sampling otherwise.

  • No problem. Thank you for your answer. I will definitely look in to hit-and-run and rejection sampling although I will probably have to cut some corners for efficiency's sake. – interrupt0 Jun 01 '18 at 04:05
1

The exact method

The formula $W = w_1 + w_2 + ... + w_n$ describes an $(n-1)$-dimensional hyperplane in $\mathbb R^n.$ The intersection of this hyperplane with the $n$-dimensional hypercube $[0,x]^n$ is a convex $(n-1)$-dimensional polytope embedded in $\mathbb R^n.$ Find that polytope and sample uniformly over it. In fact you can take the image of that polytope in $\mathbb R^{n-1}$ when you remove the $n$th coordinate, sample the first $n-1$ coordinates uniformly over that polytope, and solve for the $n$th coordinate.

Admittedly, this is just changing the problem to the problem of describing and sampling the polytope for a given $W$ and $x,$ which also can be a complicated problem.

Rejection sampling

As suggested already in another answer, you can use uniform sampling over the polytope $W = w_1 + w_2 + ... + w_n \cap [0,\infty)^n$ (that is, where all the $w_i$ are non-negative).

If $W \leq x$ this is a complete solution.

If $x < W \leq \frac12 nx,$ sample uniformly over $W = w_1 + w_2 + ... + w_n \cap [0,\infty)^n$ but reject the outcome if there is any value of $i$ for which $w_i > x$ in that outcome.

If $W > \frac12 nx,$ change the problem to one of randomly distributing $nx - W$ "empty space" among $n$ buckets, each of which can hold no more than $x$ "empty space". Then fill the rest of each bucket with water.

This avoids the extreme rejection rates that could otherwise occur when $W$ is close to $nx.$ In fact, for $(n-1)x \leq W \leq nx$ this gives a solution with no rejection.


Either of these methods can be adapted to the more general problem in which each bucket's capacity is individually specified, that is, where the constraint is $0 \leq w_i \leq x_i.$

David K
  • 98,388
  • I accepted Rahul's answer because I found it easier to understand but nonetheless thank you for taking the time to make such a detailed response. – interrupt0 Jun 01 '18 at 04:09
  • @Darqk I think you chose the best answer. My answer is just adding some details. – David K Jun 01 '18 at 19:58
0

I think what you are asking is can I empty W, given x and n. That is I start with $W$ and pour into each $w_i$ for $i \in [1,n]$ until it's capacity of between $[0,x]$ is full. Then if after filling $w_n$ we have water left over, we cannot do it. So we want: $$P(\sum_{i=1}^{n}w_i- W=0|w_1=x) =P(\frac Wn\le x|w_1=x)= {1, \ W\le nx \brace 0, \ W>nx}$$

You could make it more interesting by having $w_i \in [0,x_i]$. That way at least you assign a "random value" to $w_i$ so each bucket is different.

Sorry if I misinterpreted your question.

Tony Hellmuth
  • 1,157
  • 7
  • 19
  • Just to clarify Tony, I am not trying to find the probability that I will be able to fill the buckets. What I am looking for is an algorithm that I can use to determine the amount of water to put into each bucket such that (1) and (2) remain true. Thanks for the answer though. – interrupt0 May 30 '18 at 04:01
  • Since each bucket has capacity x, then is it not $x-\frac Wn$? Ah I see what you mean. – Tony Hellmuth May 30 '18 at 04:03
0

EDIT: I had a different answer, but Darq pointed out that it wasn't correct.

Discretize the problem, say all values are integers. (If they're rational to begin with, you can multiply by a common denominator.) Repeatedly do the following: take one unit of water and randomly add it to any non-full bucket.

  • The problem with this is that it doesn't take into account that the weights have to sum up to $W$. For example, if n = 3, x = 2, W = 5, and using the random weights w'_1 = 1.2, w'_2 = 0.2 and w'_3 = 0.6 then the normalized weights would be w_1 = 0.6, w_2 = 0.1 and w_3 = 0.3. This currently sums to 1 and we need it to sum to W. If we multiple the weights by W, then the individual weights may not satisfy condition (2). If we do that for this example we get w_1 = 3 which is definitely outside of the interval [0, 2] – interrupt0 May 30 '18 at 21:15