0

I'm looking for a convenient way to generate $\text{Beta}(2,2)$ random variables, using independent $\text{Uniform}(0,1)$ random variables and elementary functions. I'd prefer to avoid rejection or iterative methods.

The PDF is $6 x (1-x)$. The CDF is $x^2(3-2x)$.

I know of one method that seems a bit messy: $X_i\overset{iid}{\sim}\text{Uniform}(0,1)$ $\implies Y_i\equiv-\log(X_i)\overset{iid}{\sim}\text{Exp}(1)$ $\implies \frac{Y_1+Y_2}{Y_1+Y_2+Y_3+Y_4}\sim\text{Beta}(2,2)$. This is a bit inefficient, requiring four uniform samples per beta sample, and it uses logarithms.

The CDF doesn't have a convenient inverse, so an inverse CDF transform won't work either.

EDIT: The median of three $\text{Uniform}(0,1)$ random variables will also follow a $\text{Beta}(2,2)$.

Can anyone think of an elegant method to generate these?

  • 1
    Any particular reason to avoid rejection methods? The PDF can be bounded inside a rectangle of area $3/2$, so the number of samples required for the naive rejection method is geometric(2/3), and therefore has mean 3/2. That's not bad. – Ian Jun 24 '16 at 03:41
  • Well, rejection methods certainly work, but you're forced to throw out draws. I'd rather avoid that waste if possible. – Benjamin Roycraft Jun 24 '16 at 03:46
  • You are probably prematurely optimizing by being concerned with such things. I just wrote up a script that generates a million such beta variables in about 0.09 seconds in MATLAB. Very, very rarely it will take much longer than that (because the geometric variable for the number of rejections will wind up being large), but it'll pretty consistently take about that long. – Ian Jun 24 '16 at 04:18
  • The script in question, by the way: function y=beta_nocache(n) y=zeros(n,1); k=1; while k<=n u1=rand; u2=rand; if (3/2)u2<=6u1*(1-u1) y(k)=u1; k=k+1; end end end – Ian Jun 24 '16 at 04:22
  • My first version cached the uniform variables and surprisingly turned out to be more than 10 times slower. (Also surprisingly: a version that uses a two-vector u instead of separate u1 and u2 is also more than 10 times slower.) – Ian Jun 24 '16 at 04:23

3 Answers3

1

The CDF does have a manageable inverse. Consider:

$$ x^2(3-2x)=u, \;x\in[0,1] $$

That is, you want to know where the CDF crosses the level $u$. We simply need a mapping that tells us what $x$ corresponds to a given $u\in[0,1]$, with the condition that the $x$ we found is in $[0,1]$. This is the same as finding the roots of $x^2(3-2x)-u$ for some fixed $u\in[0,1]$. That's a cubic in $x$ and has at most three roots. I found them using WolframAlpha:

http://www.wolframalpha.com/input/?i=Solve%5Bx%5E2(3-2x)-u,x%5D

It turns out that only one of those gives the result we are after with $x\in[0,1]$: $$ x = -\frac{1}{4}(1+i \sqrt{3}) \sqrt[3]{2 \sqrt{u^2-u}-2 u+1}-\frac{1-i \sqrt{3}}{4 \sqrt[3]{2 \sqrt{u^2-u}-2 u+1}}+\frac{1}{2} $$

I know that looks like it will be a complex number, but it works out to a real number on the interval of interest:

http://www.wolframalpha.com/input/?i=Plot%5B-1%2F4+(1%2Bi+sqrt(3))+(2+sqrt(u%5E2-u)-2+u%2B1)%5E(1%2F3)-(1-i+sqrt(3))%2F(4+(2+sqrt(u%5E2-u)-2+u%2B1)%5E(1%2F3))%2B1%2F2,%7Bx,0,1%7D%5D

I'm guessing there is some simplification to write it without any imaginary units, but the above gets the job done.

So, generate a $u\sim U(0,1)$, plug it into the inverse CDF given above, and that will be $\sim \text{Beta}(2,2)$.

rajb245
  • 4,755
  • 15
  • 34
  • Note that the second link doesn't really work. – Ian Jun 24 '16 at 03:43
  • This doesn't really count as elegant. – Benjamin Roycraft Jun 24 '16 at 03:43
  • @BenjaminRoycraft What would? You've said rejection and iterative methods are off the table, that really only leaves "direct" methods, which would mean something like this or a similar transformation from a different distribution. – Ian Jun 24 '16 at 03:44
  • @Ian A transformation method would count, but the one you gave is pretty messy. The advantage is that it only needs one uniform draw. I gave one in the OP which is much cleaner, but requires four draws. – Benjamin Roycraft Jun 24 '16 at 03:50
  • @Ian I click the link and it works for me. If you can be more specific about what you see maybe I can address the problem. To the OP, notions such as elegance and cleanliness in mathematics are hard to pin down, but I don't disagree with your assessment. I only claimed it was "manageable", which to me means that it can be implemented in a few lines of code, and looks about as messy as you'd expect for solving a cubic. I'd also add it's convenient, uses the uniform distribution, and only elementary operations. – rajb245 Jun 24 '16 at 04:00
  • @rajb245 It loads but the grouping is wrong in my browser. – Ian Jun 24 '16 at 04:01
1

A more elegant computation can be performed to obtain the required inverse of the $\operatorname{Beta}(2,2)$ CDF: if $U \sim \operatorname{Uniform}(0,1)$, then $$X = \frac{1}{2} + \sin \left( \frac{1}{3} \tan^{-1} \frac{U-1/2}{\sqrt{U(1-U)}} \right) \sim \operatorname{Beta}(2,2).$$ Is this computationally efficient? No. Using Mathematica, the command

Total[Take[#, 2]]/Total[#] & /@ Partition[-Log /@ RandomReal[{0, 1}, 4*10^6], 4]

generates $10^6$ realizations of $X$ using $4 \times 10^6$ realizations of $U$ according to the method described in your post. On my machine, this takes about $0.7$ seconds. The command

1/2 + Sin[ArcTan[(-1/2 + #1)/Sqrt[(1 - #1) #1]]/3] & /@ RandomReal[{0, 1}, 10^6]

by contrast, takes about $6.2$ seconds, nearly an order of magnitude slower. This is because it is very much faster to generate extra uniform random variates than to perform trigonometric functions on a single variate. But perhaps a different implementation would reduce this gap in computing time.

heropup
  • 135,869
  • That is a much nicer form of the inverse CDF. Can you derive that directly from the solution of the cubic equation? – rajb245 Jun 24 '16 at 14:25
  • One problem is that higher entropy methods do not parallelize as well as lower entropy methods. So in high speed applications, a lower entropy application which is slow because of "local" computations of standard functions can be made fast by throwing more processors at the problem. Higher entropy methods are eventually bounded by the rate at which the system can acquire entropy. – Ian Jun 24 '16 at 14:34
  • @rajb245 The formula given is derived from the trigonometric solution of the cubic. – heropup Jun 24 '16 at 14:35
1

Here's elegant solution to the problem, drawn from the idea that the median of three $\text{Uniform}(0,1)$ random variables follows a $\text{Beta}(2,2)$ distribution. We can use this without generating three uniform random variables directly.

First off, a formula exists for the joint density of two order statistics within a random sample, as well as the marginal density of each individual statistic. I can't think of a good reference right now, so I'll just show the specific formulas.

For a sample of size $3$ from a $\text{Uniform}(0,1)$, the joint density of the median, $X_{(2)}$, and the maximum, $X_{(3)}$, is $F_{X_{(2)},X_{(3)}}(x,y)=6x$ for $y\in[0,1]$ and $x\in[0,y]$. Likewise, the marginal density of the maximum is $F_{X_{(3)}}(y)=3y^2$ for $y\in[0,1]$.

From this, we can calculate the conditional density of the median, given the maximum. We get $F_{X_{(2)}|X_{(3)}}(x;y)=\frac{2x}{y^2}$ for $x\in[0,y]$. The conditional CDF is $\frac{x^2}{y^2}$ for $x\in[0,y]$. The inverse CDF is $y\sqrt{x}$ for $x\in[0,1]$.

Now, we can generate a random variable with the density $3y^2$ by $Y_1\equiv \sqrt[3]{U_1}$, where $U_1\sim\text{Uniform}(0,1)$. Then, we can generate a second random variable following the conditional density $\frac{2x}{y^2}$ by $Y_2\equiv y\sqrt{U_2}$, where $U_2\sim\text{Uniform}(0,1)$.

We see then that $\sqrt[3]{U_1}\cdot\sqrt{U_2}\sim\text{Beta}(2,2)$.

  • It looks nice, but unless entropy is scarce it will be considerably slower than the median implementation. – Ian Jun 24 '16 at 17:51