Let $X_1, \ldots, X_N$ be $N$ hidden i.i.d. random variables with a standard distribution, say uniform $\mathcal{U}(0, 1)$ or Gaussian $\mathcal{N}(0, 1)$ (probably the easiest case). I observe $N$ corresponding 'noisy' variables $Z_n = X_n + \mathcal{N}(0, \sigma^2)$. I know how to derive the distribution of $X^* = \max(X_1, \ldots, X_N)$ (and correspondingly of $Z^*$ if the $X$'s are Gaussian). What I would like to know is how to compute the distribution (or at least the expectation) of $X_{\mathrm{argmax}_n(Z_n)}$, i.e. the hidden variable whose noisy observation is the largest.

Intuitively, if $\sigma^2$ is small, the observed variables will closely follow the hidden ones and the distribution will be close to that of $X^*$, while if it is large, the observations will be dominated by the noise and the distribution will be the original one of the $X$'s.

cdubout
  • Here's another very recent and very related question: http://stats.stackexchange.com/questions/10369/distribution-of-unmixed-parts-based-on-order-of-the-mix – cardinal May 13 '11 at 12:37
  • To me, it is not very clear which distribution you want to compute. The most sensible interpretation would lead to the question linked by cardinal above. – leonbloy May 13 '11 at 13:58
  • Yes I think that the question is the same (if the $X$'s are Gaussians) except that I am only interested in the 1st largest value, i.e. the max. – cdubout May 13 '11 at 14:20
  • I would like to say that the answers to the question linked above answer my question, but I have some trouble with them. I do not understand how the last answer derived the expression for the conditional distribution of $\max(X)$ given $\max(Z)$, and I cannot make those results agree with those of my simulation... – cdubout May 13 '11 at 15:49
  • If I understood correctly, $P(\max(X) = x) = \int_{-\infty}^{+\infty} P(\max(X) = x | \max(Z) = z) P(\max(Z) = z) dz$, with $P(\max(X) = x | \max(Z) = z) = \phi \left( \frac{x - \frac{\sigma^2}{1+\sigma^2} z}{\sqrt{\sigma^2 \left( 1 - \frac{\sigma^2}{1+\sigma^2} \right) }} \right)$, and $P(\max(Z) = z) = N \frac{1}{\sigma_z} \phi \left( \frac{z}{\sigma_z} \right) \Phi \left( \frac{z}{\sigma_z} \right)^{N-1}$ where $\sigma_z = \sqrt{1 + \sigma^2}$, right? – cdubout May 13 '11 at 16:06
  • I would be cautious regarding the answers posted at the linked question. I provided the link simply to connect the two questions since they are related. Also, you should reference the answer using the poster's name, since the ordering can change based on various factors (including randomly). – cardinal May 13 '11 at 17:01
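
The intuition in the question (small $\sigma$ recovers $\max_n X_n$, large $\sigma$ recovers a plain $X_n$) can be checked numerically. Below is a minimal Monte Carlo sketch, assuming Gaussian $X_n \sim \mathcal{N}(0,1)$ and independent Gaussian noise; the function name `simulate_selected_mean` and its parameters are hypothetical, chosen only for this illustration.

```python
import numpy as np


def simulate_selected_mean(n_vars, sigma, n_trials=200_000, seed=0):
    """Monte Carlo estimates of E[X_{argmax_n Z_n}] and E[max_n X_n].

    Assumes X_n ~ N(0, 1) i.i.d. and Z_n = X_n + noise, with independent
    noise ~ N(0, sigma^2). Returns (selected_mean, max_mean).
    """
    rng = np.random.default_rng(seed)
    # Hidden variables: one row per trial, one column per variable.
    x = rng.standard_normal((n_trials, n_vars))
    # Noisy observations of the hidden variables.
    z = x + sigma * rng.standard_normal((n_trials, n_vars))
    # Index of the largest observation in each trial.
    idx = np.argmax(z, axis=1)
    # Hidden value sitting behind the largest observation.
    selected = x[np.arange(n_trials), idx]
    return selected.mean(), x.max(axis=1).mean()


if __name__ == "__main__":
    for sigma in (0.01, 1.0, 100.0):
        sel, mx = simulate_selected_mean(n_vars=10, sigma=sigma)
        print(f"sigma={sigma:>6}: E[X_argmax Z] ~ {sel:.4f}, E[max X] ~ {mx:.4f}")
```

With a very small `sigma` the two estimates should nearly coincide, and with a very large `sigma` the first one should drift toward $E[X_n] = 0$.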

1 Answer

Let's say $Z_n = X_n + Y_n$ with $X_n \sim N(0,1)$, $Y_n \sim N(0,\sigma^2)$, and all $X_n$ and $Y_n$ independent. Let $W = X_{{\rm argmax}_n Z_n} = \sum_n X_n \prod_{j \ne n} I_{Z_n > Z_j}$. Thus $E[W] = N E[X_1 \prod_{j > 1} I_{Z_1 > Z_j}]$. Now $Z_n \sim N(0,1+\sigma^2)$. Moreover $X_1$ and $I_{Z_1 > Z_j}$ are conditionally independent given $Z_1$, so $E[W] = N E[E[X_1 | Z_1] \prod_{j > 1} E[I_{Z_1 > Z_j}|Z_1]] = N E[Z_1 \Phi(Z_1/\sqrt{1+\sigma^2})^{N-1}]$. This is $N \sqrt{1+\sigma^2} E[Z \Phi(Z)^{N-1}]$ where $Z \sim N(0,1)$.

Robert Israel
  • OK, thank you very much, I think I more or less understood. In particular, I did not know about the law of total expectation that you used in line 4. Shouldn't line 5 read $\frac{1}{\sqrt{1+\sigma^2}}$ rather than $\sqrt{1+\sigma^2}$? And could I simply write the result as $E[W] = \sigma_z^{-1} E[\max_n X_n]$, with $\sigma_z = \sqrt{1+\sigma^2}$? – cdubout May 13 '11 at 23:25
  • @cdubout: Please don't add a substantial amount of new material to someone else's answer; you can write your own answer with that material, indicating it is a supplement if you want. – Arturo Magidin May 15 '11 at 22:11
  • $E[X_1|Z_1] = \frac{Z_1}{1+\sigma^2}$ and not simply $Z_1$ as written above! This is because, conditioned on $Z_1$, $X_1$ and $Y_1$ are no longer independent. One can see $(X_1, Y_1)$ as a multivariate normal vector with a diagonal covariance, and $(X_1, Z_1)$ as an affine transformation of $(X_1, Y_1)$, and then use the formula for the conditional distribution of a multivariate normal to reach that result. – cdubout May 16 '11 at 08:57
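
For reference, here is how the chain of equalities reads once the conditional mean from the last comment is substituted into the answer's argument. This is only a sketch under the answer's assumptions ($X_n \sim N(0,1)$, $Y_n \sim N(0,\sigma^2)$, all independent), with $\sigma_z = \sqrt{1+\sigma^2}$ as in the comments and $Z = Z_1/\sigma_z \sim N(0,1)$:

$$
\begin{aligned}
E[W] &= N \, E\Big[ E[X_1 | Z_1] \prod_{j > 1} E[I_{Z_1 > Z_j} | Z_1] \Big] \\
&= N \, E\Big[ \frac{Z_1}{1+\sigma^2} \, \Phi\Big(\frac{Z_1}{\sigma_z}\Big)^{N-1} \Big] \\
&= \frac{N}{\sigma_z} \, E\big[ Z \, \Phi(Z)^{N-1} \big] \\
&= \frac{1}{\sigma_z} \, E\big[\max_n X_n\big].
\end{aligned}
$$

The last step uses the fact that the maximum of $N$ i.i.d. standard normals has density $N \phi(x) \Phi(x)^{N-1}$, so $E[\max_n X_n] = N E[Z \Phi(Z)^{N-1}]$. The result matches the form proposed in the comments, and it agrees with the intuition in the question: as $\sigma \to 0$ it tends to $E[\max_n X_n]$, and as $\sigma \to \infty$ it tends to $E[X_n] = 0$.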