6

Let $X_i \sim \operatorname{Beta}(\alpha_i, \beta_i)$ be independent beta-distributed random variables for $i = 1, \ldots, k$. What can we say about $$X =\max(X_1, \ldots, X_k)?$$ In particular, can we estimate $\alpha$ and $\beta$ so that $X$ is approximately distributed like $\operatorname{Beta}(\alpha, \beta)$? We may assume that $\sum_i \alpha_i + \sum_i \beta_i$ is large if it helps.

We can reduce the question to the case $k =2$, since $$\max(X_1, \ldots, X_k) = \max(\max(\max(X_1, X_2),X_3, \ldots))),$$ although some accuracy might be lost in making successive approximations. Note also that we have $$P(\max(X_1, X_2) \leq z) = P(X_1 \leq z, X_2 \leq z) = P(X_1 \leq z) P(X_2 \leq z),$$ giving the cumulative distribution function for $X$.

Using Sage I was able to take the case $\alpha_1 = 10,\beta_1 = 15,\alpha_2 = 13,\beta_2 = 12$ and approximate the density function of $X$ pretty well with $\alpha = 16.796, \beta = 14.830$. See image.

enter image description here

Context:

This would be useful for the bandit problem or Monte-Carlo tree search. Suppose you are playing $k$ games $Y_i$, and $Y_i$ is either a win, with probability $p_i$, or a loss. Then the game $Y$ which consists of a choice of one of the games $Y_i$ can be modeled by a Bernoulli random variable with parameter $p = \max(p_1, \ldots, p_k)$, since the best strategy is to always choose the game $Y_i$ that has the highest win rate. If we only have limited information about each $Y_i$ (some samples of each, for example), we can put a prior $p_i \sim \operatorname{Beta}(\alpha_i, \beta_i)$ on each parameter $p_i$ and try to infer information about $p$ from this.

Jair Taylor
  • 16,852
  • How did you obtain the beta distribution parameters for approximating the distribution of the maximum? Using Mathematica and using the method of moments I get $\alpha=16.8681$ and $\beta=14.671$ (although there's not much difference between the two approximations). – JimB Oct 22 '18 at 03:01
  • @JimB: I used Sage's find_fit method after computing the density for 100 points. The parameters I get vary a bit depending on where I put the points. – Jair Taylor Oct 22 '18 at 03:20
  • If that is a "least squares" approach I don't know why one would want to use such a technique to estimate the parameters of a probability distribution as you don't have a regression situation. – JimB Oct 22 '18 at 04:35
  • Are the $X_i$'s independent? – StubbornAtom Oct 22 '18 at 04:39
  • @StubbornAtom Yes, they are independent. Will edit. – Jair Taylor Oct 22 '18 at 05:15
  • @JimB Hmm, I don't really see why not. We can compute the density function of $X$ exactly, and we're trying to approximate it with a curve from a given family. There are lots of ways to do this - we don't have to infer from sample statistics (although that would also work.) – Jair Taylor Oct 22 '18 at 05:31
  • There are lots of reasons not to want to use least squares for fitting a probability distribution: (1) Maybe it's the tail region that is more important for one's objective, (2) the fit depends on the arbitrary set of grid points, (3) There are more standard methods (method of moments, for example). And I'm sure there are other reasons. However, I can see using least squares to get starting values for a more standard iterative procedure. – JimB Oct 22 '18 at 05:58

1 Answers1

3

One can certainly find a beta distribution with the same mean and variance as $X$ but whether that is a good enough approximation depends on what you need.

If you only want the probability density function of $X$, then that is

$$\sum _{i=1}^n \left(\frac{x^{a_i-1} (1-x)^{b_i-1} \prod _{j \neq i} \frac{B_x(a_j,b_j)}{B(a_j,b_j)}}{B(a_i,b_i)}\right)$$

where $B(a_i,b_i)$ is the beta function and $B_x (a_i,b_i)$ is the incomplete beta function.

I'm not sure there's a nice compact form for the mean and variance but for specific parameters one can calculate the mean and variance which can be matched to a beta distribution. Here's some Mathematica code to do so:

n = 3;
parms = {a[1] -> 1, b[1] -> 6, a[2] -> 4, b[2] -> 7, a[3] -> 4, b[3] -> 5}; 
pdf[x_] := 
  Sum[(x^(a[i] - 1) (1 - x)^(b[i] - 1)/Beta[a[i], b[i]]) Product[Beta[x, a[j], b[j]]/Beta[a[j], b[j]],
   {j, Delete[Range[n], i]}] /. parms, {i, n}];

mean = Integrate[x pdf[x], {x, 0, 1}]; variance = Integrate[x^2 pdf[x], {x, 0, 1}] - mean^2; sol = N[Solve[{mean == a/(a + b), variance == a b/((a + b)^2 (a + b + 1))}, {a, b}][[1]]] (* {a -> 6.80319, b -> 6.85957} *)

Plot[{pdf[x], PDF[BetaDistribution[a, b] /. sol, x]}, {x, 0, 1}, PlotLegends -> {"Actual", "Beta approximation"}]

True density and beta distribution approximation

Addition:

To echo @AhmedFasih 's comment below, even the mean for $k=2$ is not simple:

$$\frac{\Gamma \left(\alpha _1+\alpha _2+1\right) \Gamma \left(\alpha _1+\beta _1\right) \Gamma \left(\alpha _2+\beta _2\right)* \, _3F_2\left(\alpha _1,\alpha _1+\alpha _2+1,1-\beta _1;\alpha _1+1,\alpha _1+\alpha _2+\beta _2+1;1\right)}{\Gamma \left(\alpha _1+1\right) \Gamma \left(\alpha _2\right) \Gamma \left(\beta _1\right) \Gamma \left(\alpha _1+\alpha _2+\beta _2+1\right)}+$$

$$\frac{\Gamma \left(\alpha _1+\alpha _2+1\right) \Gamma \left(\alpha _1+\beta _1\right) \Gamma \left(\alpha _2+\beta _2\right)*\, _3F_2\left(\alpha _2,\alpha _1+\alpha _2+1,1-\beta _2;\alpha _2+1,\alpha _1+\alpha _2+\beta _1+1;1\right)}{\Gamma \left(\alpha _1\right) \Gamma \left(\alpha _2+1\right) \Gamma \left(\beta _2\right) \Gamma \left(\alpha _1+\alpha _2+\beta _1+1\right)}$$

JimB
  • 1,861
  • Hmm, but this very computationally intensive. I was hoping for something fast, like a simple formula. For the purposes of Monte-Carlo tree search it would need to be computed a very large number of times. – Jair Taylor Oct 22 '18 at 06:05
  • Are you sure that is the PDF of the maximum? My simulations don't agree with it – Itay Dec 12 '22 at 08:55
  • @Itay Posting your simulation code and results would be a good answer. – JimB Dec 15 '22 at 03:19
  • 2
    @Itay one source of confusion that I suspect is your math library's incomplete Beta function being regularized vs not. Scipy's betainc is regularized so its return values are divided by the Beta already—I didn't have to divide by the Beta—whereas JimB's expression uses the non-regularized version. Here's my Python code showing the excellent match between Jim's expression and Monte Carlo https://gist.github.com/fasiha/fe0a79d294f222d060fa6aeff1424db2 – Ahmed Fasih Feb 05 '23 at 17:28
  • @AhmedFasih Thanks for that! (Challenging anyone's results is not a bad thing.) – JimB Feb 05 '23 at 18:05
  • I deleted my comment above https://math.stackexchange.com/a/3136814 is likely useful for the case when you have two or three Betas, but for the arbitrary case for $k$ random variables, it seems quite difficult to compute the moments of the density you provide – Ahmed Fasih Feb 06 '23 at 00:38
  • For others who wonder how JimB got the density, it follows from (example) the distribution $F_X(x) = \mathbb{P}[X \leq x] = \mathbb{P}[X_1\leq x,...,X_n\leq x] = \prod_i \mathbb{P}[X_i\leq x] = \prod_i F_{X_i}(x)$ and then applying the product rule and a tiny bit of simplification to get the density $f_X(x) = \sum_i \left( f_{X_i}(x) \prod_{j\neq i} F_{X_j}(x) \right)$ – Ahmed Fasih Feb 06 '23 at 04:26