Intuitive approximation and Order Statistics

Question

Given $X_1, X_2, \dots, X_n$ independent and identically distributed random variables with probability density and cumulative distribution functions $f$ and $F$ respectively, the probability density function of the maximum follows, for small $\epsilon$, from $$ f_{max}(x)\,\epsilon \approx P ( X_j \in [x, x+ \epsilon] \text{ for some } j \,\land\, X_k \leq x \,\,\forall k \neq j ) \approx n f(x) F(x)^{n-1} \epsilon $$ so that $f_{max}(x) = n f(x) F(x)^{n-1}$. The mean of the maxima is hence given by $$ \int_{-\infty}^{\infty} nx f(x) F(x)^{n-1} \mathrm{d}x $$

Now, to the point: my intuition was that a decent approximation of the mean of the maxima, $ \bar{x}_{max}$ (of its value, that is, not of its distribution), could be obtained by solving the equation $$f(\bar{x}_{max}) = \frac{1}{n}$$ An alternative, possibly more meaningful as commented by Jean Marie, is to consider the cumulative distribution instead, and impose the condition $$ F(\bar{x}_{max}) = 1- \frac{1}{n} $$ whose solution should provide an estimate for the mean of the maxima. The rest of the post will focus on this possibility.

To clarify the line of thought behind my "guess": when drawing $n$ samples, one might be tempted to assume that even events whose probability is of order $\frac{1}{n}$ are plausible. The farther from the mean, the less likely an event is (under reasonable conditions); hence I would expect the maximum to be related to the lowest probability value that can reasonably be expected.

So, the question is: how good is the approximation of saying that the mean of the random variable (the maximum over each set of $n$ extractions) can be estimated by a value $\bar{x}_{max}$ such that a fraction $1-\frac{1}{n}$ of the samples are less than it?

I did perform some numerical checks for the exponential distribution, supporting the idea the approximation could be reasonable.
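A numerical check of this kind can be sketched as follows. This is a minimal Monte Carlo illustration, not the original check: the function names, the default rate `lam=1.0`, the number of trials and the seed are all assumptions made for the example. It prints the simulated mean of the maxima next to the solution of $F(\bar{x}_{max}) = 1 - \frac{1}{n}$, which for the exponential distribution is $\frac{\ln n}{\lambda}$, so that the two can be compared.

```python
import math
import random


def mean_of_maxima(n, lam=1.0, trials=2000, seed=0):
    """Monte Carlo estimate of E[max(X_1, ..., X_n)] for X_i ~ Exp(lam)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(trials):
        # draw n independent exponential samples and keep the largest
        total += max(rng.expovariate(lam) for _ in range(n))
    return total / trials


def quantile_guess(n, lam=1.0):
    """Solution of F(x) = 1 - 1/n for the cdf F(x) = 1 - exp(-lam * x)."""
    return math.log(n) / lam


for n in (10, 100, 1000):
    print(n, round(mean_of_maxima(n), 3), round(quantile_guess(n), 3))
```

The number of trials is kept small so that the script runs in a few seconds; increasing `trials` reduces the statistical noise of the simulated mean.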

I would like to investigate this approximation analytically: confirm that it converges asymptotically to the mean for $n \to \infty$, and ideally get a measure of the error involved for finite $n$.

In order to do so, I would nevertheless need to be able to compute the mean of the maxima $$ \int_{-\infty}^{\infty} n x f(x) F(x)^{n-1} \mathrm{d}x $$ which I am unable to do even for the simple case of the exponential distribution, where the expression above specializes to $$ \int_{0}^{\infty} n x \lambda e^{- \lambda x} (1 - e^{- \lambda x })^{n-1} \mathrm{d}x $$

As I am interested mainly in the case where $n$ is large, I thought I could make an attempt using Laplace’s method.

Using $$xf(x) =[e^ {\ln \big(xf(x)\big )}] ^{\frac{n-1}{n-1}}$$ I tried to rewrite the above (up to the prefactor $n$, and where $x f(x) > 0$) in a form suitable for Laplace's method $$ \int_{-\infty}^{\infty} e^{(n-1) [\frac{1}{n-1}\ln (x f(x)) + \ln F(x)]} \mathrm{d}x $$ But I have not achieved much, as I would need to find the stationary points of the function $$ \frac{1}{n-1}\ln (x f(x)) + \ln\big(F(x) \big) $$

Any comment on the intuitive approach to estimating the mean of the maxima is welcome. Any hints or suggestions on ways to characterise the error involved would be very much appreciated as well.

Thanks in advance.

$\mathbf{EDIT \,\, Following \,Claude \,Leibovici's \, answer}$

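In an answer below, Claude Leibovici has calculated a closed form for the integral in the case of an exponential distribution. Including the prefactor $n$ of the mean, his result gives $$x_{max} = \frac{H_n}{\lambda} $$ where $H_n$ is the $n$-th harmonic number. This supports the conjecture presented in the post, at least for the exponential distribution. Indeed, the conjecture states that an (asymptotic) approximation of the mean of the maxima, $\bar{x}_{max}$, can be obtained from the equation $$ F(\bar{x}_{max}) = 1 - \frac{1}{n}$$ which reads, specifically for the exponential distribution, $$ 1 - e^{-\lambda \bar{x}_{max} } = 1 - \frac{1}{n}$$ that is, $\bar{x}_{max} = \frac{\ln n}{\lambda}$. Inserting the closed form value courtesy of Claude Leibovici, and using $H_n \sim \gamma + \ln n$, one finds $$ 1 - e^{-\lambda \frac{H_n}{\lambda}} \sim 1 - e^{-(\gamma + \ln n)} = 1 - \frac{e^{-\gamma}}{n}$$ which has the desired form $1 - \mathcal{O} ( \frac{1}{n})$, although with the constant $e^{-\gamma} \approx 0.56$ in place of $1$. Equivalently, the estimate $\frac{\ln n}{\lambda}$ differs from the exact mean by roughly $\frac{\gamma}{\lambda}$, so the relative error vanishes as $n \to \infty$ while the absolute error does not. This is encouraging.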
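A minimal numerical sketch of the asymptotics above: it evaluates $n\,\big(1 - F(H_n/\lambda)\big)$, which would be exactly $1$ if $F(H_n/\lambda)$ were equal to $1 - \frac{1}{n}$, and prints it next to $e^{-\gamma}$. The function names and the hard-coded value of the Euler-Mascheroni constant are assumptions of this example.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (hard-coded)


def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def scaled_tail(n, lam=1.0):
    """n * (1 - F(H_n / lam)) for the exponential cdf F(x) = 1 - exp(-lam * x)."""
    mean_max = harmonic(n) / lam  # exact mean of the maxima
    return n * math.exp(-lam * mean_max)


for n in (10, 100, 1000, 10000):
    print(n, scaled_tail(n), math.exp(-EULER_GAMMA))
```

Since $H_n - \ln n \to \gamma$, the first printed column should approach the second as $n$ grows; the rate $\lambda$ cancels out of this quantity.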

The question of whether this holds in general for an arbitrary cdf $F$ is still open.
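As one further data point (an illustrative choice that was not part of the original discussion), take the uniform distribution on $[0,1]$, for which $f(x) = 1$ and $F(x) = x$ on that interval. Both the exact mean of the maxima and the estimate can then be written down directly: $$ \int_{0}^{1} n x \, x^{n-1} \mathrm{d}x = \frac{n}{n+1}, \qquad F(\bar{x}_{max}) = 1 - \frac{1}{n} \;\Rightarrow\; \bar{x}_{max} = 1 - \frac{1}{n} $$ so that the two differ by $$ \frac{n}{n+1} - \left(1 - \frac{1}{n}\right) = \frac{1}{n(n+1)} $$ which vanishes as $n \to \infty$. This says nothing about distributions with unbounded support, but it is consistent with the guess in a second simple case.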

Your postulate : $f(\bar{x}_{max}) = \frac{1}{n}$ looks to me counter intuitive (maybe I am wrong) because it deals with a value $\frac{1}{n}$ taken by the pdf $f$. But values taken by a pdf are seldom used because they are not very significant. I wouldn't say the same for $F$, the cdf. — Jean Marie, Sep 08 '17 at 11:47
@Jean Marie, I certainly see your point, following which it might be better to use the cdf, in which case it all translates to $F(\bar{x} _{max})=1- \frac{1}{n}$ I will edit to reflect such possibility. — An aedonist, Sep 08 '17 at 12:08
@Lee David Chung Lin, thanks for your comments. Indeed by $x_{max}$ I denote the mean of the maxima, as mentioned in the post. But $\mu x_{max} \neq 1 - \frac{1}{n}$, cannot be: the idea is to approximate the mean of the maxima by taking a value $\bar{x}_{max}$, such that the cdf calculated with it as argument, returns a probability $1 - \frac{1}{n}$. Sorry if I did not get your point. — An aedonist, Sep 08 '17 at 12:44
To clarify: I have a distribution, and extract $n$ samples out of it. The maximum is a random variable which can be characterised as I describe at the very beginning. Now the question is: how good is the approximation of saying that the mean of said random variable (the maximum over each set of $n$ extractions) can be estimated by a value $\bar{x}_{max}$ such that a fraction $1-\frac{1}{n}$ of the samples are less than it? — An aedonist, Sep 08 '17 at 12:48
Okay I made a wrong interpretation about your notations. I'm gonna remove my previous comments. Your clarification here is clear, and I suggest you add those words (that follow "Now the question is") to your post to explain the symbols, since you're using them in a very specific way (usually people use capitals like $X$ and $\bar{X}$ for random variables and lowercases for values, here you're using lowercases for both ... and certainly one would not just add a bar to a random variable to refer to a value within its range). — Lee David Chung Lin, Sep 08 '17 at 13:02
Done, I will certainly keep your comments present for future posts, thanks — An aedonist, Sep 08 '17 at 13:07
$\int_{0}^{\infty} x \lambda e^{- \lambda x} (1 - e^{- \lambda x })^{n-1} \mathrm{d}x$ is very interesting. I was able to compute it exactly for any $n$. Is it of any interest ? — Claude Leibovici, Sep 10 '17 at 09:14
@Claude Leibovici, it would certainly be of great interest: it would help me to check the conjecture for the exponential distribution at least. I would love to see your computations, thanks — An aedonist, Sep 11 '17 at 06:52

score 2 · Accepted Answer · answered Sep 11 '17 at 07:52

Considering $$I_n=\int_{0}^{\infty} x \lambda e^{- \lambda x} (1 - e^{- \lambda x })^{n-1} \,dx=\frac 1 \lambda\int_{0}^{\infty} y\,e^{-y}(1-e^{-y})^{n-1}\,dy$$ The first thing to notice is that the antiderivative has a "closed" form expression $$\int y\,e^{-y}(1-e^{-y})^{n-1}\,dy=\frac{\left(1-e^{-y}\right)^n }{n^2 \left(1-e^y\right)^{n}} \, _2F_1\left(-n,-n;1-n;e^y\right)+\frac{y \left(1-e^{-y}\right)^n}{n}$$ from which the definite integrals can be "easily" computed.

The end result is surprisingly simple $$\color{red}{I_n=\frac{H_n}{\lambda n}}$$ provided $\Re(n)>-1$. Since the mean of the maxima carries an extra factor $n$, it equals $n I_n = \frac{H_n}{\lambda}$.
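For positive integer $n$, one way to reach this value without the hypergeometric function (a sketch under that assumption, and not necessarily the route taken above) is the substitution $u = 1-e^{-y}$, so that $du = e^{-y}\,dy$ and $y = -\ln(1-u)$, followed by the series $-\ln(1-u)=\sum_{k\ge 1} \frac{u^k}{k}$: $$\lambda I_n=\int_0^1 u^{n-1}\,\big[-\ln(1-u)\big]\,du=\sum_{k\ge 1}\frac 1k\int_0^1 u^{n+k-1}\,du=\sum_{k\ge 1}\frac{1}{k\,(n+k)}=\frac 1n\sum_{k\ge 1}\left(\frac 1k-\frac 1{n+k}\right)=\frac{H_n}{n}$$ The exchange of sum and integral is harmless here because every term is non-negative, and the last sum telescopes to $1+\frac 12+\dots+\frac 1n$.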

For large values of $n$, $$I_n=\frac 1\lambda \left(\frac{\gamma +\log \left({n}\right)}{n}+\frac{1}{2 n^2}-\frac{1}{12 n^3}+O\left(\frac{1}{n^4}\right) \right)$$
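The closed form and the large-$n$ expansion can be cross-checked numerically. The sketch below is only an illustration: the function names, the composite Simpson rule, the truncation of the integral at `upper=60.0` and the step count are choices made for this example, not part of the computation above.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (hard-coded)


def integrand(y, n):
    """y * exp(-y) * (1 - exp(-y))**(n - 1), the rescaled integrand of I_n."""
    return y * math.exp(-y) * (1.0 - math.exp(-y)) ** (n - 1)


def simpson(func, a, b, steps):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def i_n_numeric(n, lam=1.0, upper=60.0, steps=60000):
    """I_n by quadrature; the tail beyond `upper` is negligible (~ e^-60)."""
    return simpson(lambda y: integrand(y, n), 0.0, upper, steps) / lam


def i_n_closed(n, lam=1.0):
    """Closed form H_n / (lam * n)."""
    return sum(1.0 / k for k in range(1, n + 1)) / (lam * n)


def i_n_series(n, lam=1.0):
    """Large-n expansion, truncated after the 1/n^3 term."""
    return ((EULER_GAMMA + math.log(n)) / n
            + 1.0 / (2 * n**2) - 1.0 / (12 * n**3)) / lam


for n in (1, 5, 50):
    print(n, i_n_numeric(n), i_n_closed(n), i_n_series(n))
```

The first two columns should agree to quadrature accuracy for every $n$, while the third is only expected to be close for large $n$.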


Claude Leibovici


+1, Really interesting, I got to the antiderivative and did not proceed further to that surprising result of yours. Let me check if this supports the conjecture in the post, thanks a lot – An aedonist Sep 11 '17 at 07:57
@Anaedonist. Please, let me know. – Claude Leibovici Sep 11 '17 at 07:58
could you please give us a hint on how to compute that definite integral with the hypergeometric function? I made some attempts but have not got it fully right yet, thanks a lot – An aedonist Sep 15 '17 at 16:48
@Anaedonist. What exactly do you want to compute ? – Claude Leibovici Sep 16 '17 at 05:09
thanks for getting back to me, but I sorted it out, thanks again – An aedonist Sep 22 '17 at 17:32
