In Euclidean space $\mathbb{R}^M$, I want to compute the pdf of the Euclidean distances $d^M(\mathbf{z}_i) = \|\mathbf{z}_i - \mathbf{z}_j\|^M = r_i^M$, $i \neq j$. What will the pdf $f(r)$ be?

Let there be two vectors $\mathbf{x} = \{x_i\}_{i=1}^N$ and $\mathbf{y} = \{y_i\}_{i=1}^N$, so that $\mathbf{Z} = [\mathbf{x}, \mathbf{y}]$ represents a set of $M = 2$ dimensional points $\mathbf{z}_i = (x_i, y_i)$. The pdf $f_Z(z)$ is a mixture of Gaussians. Can somebody please show how to compute the pdf $f(r)$ of the Euclidean distance $r = \|\mathbf{z}_i - \mathbf{z}_j\| = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$ when the points come from a mixture of Gaussians, and more generally in $M$-dimensional space?

1 Answer

This is not a mixture of Gaussians. A mixture of Gaussians has a cumulative distribution function that is a convex combination of the cumulative distribution functions of Gaussian random variables having different parameters, and similarly for the pdfs.

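If the coordinates are independent Gaussians with a common standard deviation, the distance squared (scaled by the variance of a coordinate difference) is a Chi-Squared random variable with degrees of freedom equal to the dimension. Look it up.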

The distance in 2 dimensions comes out to a Rayleigh distribution. Look it up and you ought to see how to calculate its pdf. In 3 dimensions, it's called a Maxwell distribution, and more generally, in N dimensions it could be called an N-dimensional Maxwell distribution, even if that terminology is not that standard (the usual name is the chi distribution with N degrees of freedom).
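A quick numerical check of the Rayleigh claim, as a minimal sketch rather than a definitive implementation. It assumes two independent points in 2 dimensions whose coordinates are all i.i.d. Gaussian with a common standard deviation `sigma`; under that assumption each coordinate difference has standard deviation `sigma * sqrt(2)`, which is the Rayleigh scale used below. The function names and the value of `sigma` are illustrative and not from the answer.

```python
import math
import random


def rayleigh_cdf(r, scale):
    """CDF of a Rayleigh distribution with the given scale parameter."""
    return 1.0 - math.exp(-r * r / (2.0 * scale * scale))


def simulate_pair_distances(n_pairs, sigma, seed=0):
    """Distances between pairs of independent 2-D points.

    Assumption: every coordinate is i.i.d. N(0, sigma^2).
    """
    rng = random.Random(seed)
    distances = []
    for _ in range(n_pairs):
        dx = rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma)
        dy = rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma)
        distances.append(math.hypot(dx, dy))
    return distances


def empirical_cdf(samples, r):
    """Fraction of samples less than or equal to r."""
    return sum(1 for s in samples if s <= r) / len(samples)


sigma = 1.0                      # illustrative value
scale = sigma * math.sqrt(2.0)   # std dev of each coordinate difference
distances = simulate_pair_distances(50_000, sigma)

# Print r, the simulated CDF, and the Rayleigh CDF side by side.
for r in (0.5, 1.0, 2.0, 3.0):
    print(r, round(empirical_cdf(distances, r), 3), round(rayleigh_cdf(r, scale), 3))
```

The two printed columns should agree to within Monte Carlo error if the Gaussian assumption holds.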

If you are skilled in integration, or perhaps with the help of a Computer Algebra system such as MAPLE, you can write out the integrand for a multiple integral, which has a messy Jacobian (as the number of dimensions increases) and messy limits, to calculate the cumulative distribution function in closed form (counting erf as closed form). However, you don't need to worry about the Jacobian and all the integral limits, because they all boil down to a constant positive multiplicative factor when you integrate across all those dimensions. This leaves you with a one-dimensional integral (with an $r^{N-1}$ times exponential type term, for N = number of dimensions) from 0 to k, say, which MAPLE can solve (to within a constant multiplicative factor of being correct if you don't worry about the Jacobian and integrating over other dimensions). The limit as k goes to infinity is easy to compute, and you know that it must be one; therefore, you can determine the constant multiplicative factor by which you need to adjust your solution. Depending on whether the dimension is odd or even, you'll get a combination of erf and/or exp and a polynomial in k for the cumulative distribution. Differentiate to your heart's content to get the pdf.
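A sketch of the one-dimensional integral described above, assuming (as an illustration, not something stated in the answer) that each coordinate of the difference $\mathbf{z}_i - \mathbf{z}_j$ is an independent zero-mean Gaussian with standard deviation $s$. The symbol $s$ is introduced here; for two independent points whose coordinates each have standard deviation $\sigma$, $s = \sigma\sqrt{2}$. Substituting $t = r^2/(2s^2)$ gives the normalizing constant:

$$
f(r) \propto r^{N-1} e^{-r^2/(2s^2)}, \qquad \int_0^\infty r^{N-1} e^{-r^2/(2s^2)}\,dr = 2^{N/2-1}\, s^N\, \Gamma(N/2),
$$

so that

$$
f(r) = \frac{r^{N-1}\, e^{-r^2/(2s^2)}}{2^{N/2-1}\, s^N\, \Gamma(N/2)}, \qquad r \ge 0 .
$$

For $N = 2$ this reduces to the Rayleigh pdf $f(r) = (r/s^2)\, e^{-r^2/(2s^2)}$, with cumulative distribution $F(k) = 1 - e^{-k^2/(2s^2)}$, and for $N = 3$ to the Maxwell pdf $f(r) = \sqrt{2/\pi}\,(r^2/s^3)\, e^{-r^2/(2s^2)}$.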

With these hints, I leave it to you to work out the details.

  • Thank you for your reply. I found a document, http://www.cs.tut.fi/~moltchan/pubs/distances2011.pdf, titled "Distance distribution in Random Network". Eq. (26) is the pdf, which is a Generalized Gamma distribution if the samples are drawn from a Poisson Point Process. In my case, the samples are drawn from a Mixture of Gaussians. But you mention that it will be a Maxwell distribution. Is that irrespective of whether the samples are drawn from a Poisson Point Process, a Mixture of Gaussians, or any other distribution? It would be of immense help if you could show some initial steps of how to work out the pdf. – Ria George May 25 '15 at 00:32
  • I am now unclear what you mean. You need everything in your problem statement spelled out explicitly. I thought your x and y are each Gaussian (not a mixture of Gaussians). My answer was predicated on x and y both being Gaussian. – Mark L. Stone May 25 '15 at 03:37
  • If you have an intractable (or beyond your abilities) mess, and you are just interested in getting a practical answer, as opposed to a spiffy formula to show a prof. or someone else, you can employ stochastic (Monte Carlo) simulation to estimate whatever you're interested in. You would generate a random sample of values for x and y and, for each simulation replication, carry out the calculations on them just as though they were not random; that produces the value for that replication. Do this for a large number of replications, and you have the empirical cumulative distribution function. – Mark L. Stone May 25 '15 at 03:43
  • x and y are not each Gaussian. I did mention that Z is a mixture of Gaussians. Anyway, thank you for the other pointers, even though the answer is not what I had asked for. – Ria George May 25 '15 at 03:49
  • Tell me what the (joint) distribution of x and y is, that's the input "data". If x and y are independent, that's fine, just say so. Spell out explicitly the distribution of x and y. Don't just say they are mixtures of Gaussians. If you are not extremely explicit, I believe there will be misunderstandings or misassumptions by someone. – Mark L. Stone May 25 '15 at 03:49
  • I do not know the joint distribution of x and y. If Z = [x y] and f(z) is a mixture of Gaussians, can we then say the joint distribution is a mixture of Gaussians? My knowledge of probability is very limited, so please pardon the silly questions. – Ria George May 25 '15 at 03:51
  • Tell me explicitly what the distribution of x is. Tell me explicitly what the distribution of y is. Tell me if x and y are probabilistically independent. With your limited knowledge of probability, you may not even be correct about mixture of Gaussians. If a mixture of Gaussians, need to know whether means are the same for the various Gaussians being mixed. – Mark L. Stone May 25 '15 at 03:53
  • Originally, I had a univariate time series model $u$ with a Gaussian distribution. Now, $u$ is delay embedded (Takens' phase space delay embedding technique) into an embedding dimension of $M = 2$ to yield $Z$. The literature mentions that the distribution of the delay embedded space $Z$ can be assumed to be a Mixture of Gaussians. So, after phase space embedding, $Z$ is a 2 dimensional time series with $x, y$ as its variables. This is all the information that I have, and I cannot say what distribution $x$ or $y$ has. Can you throw some light on this, please? Thank you. – Ria George May 25 '15 at 03:58
  • Most definitely the pdf of r does depend on the distribution of x and y. My original answer was predicated on x and y being Gaussian with a common standard deviation, and k is the number of standard deviations to go out. The same result applies if you go out "ellipsoidally" even if the standard deviations for x and y are not the same. – Mark L. Stone May 25 '15 at 04:00
  • Should I re-post this as a new Question including all the discussion which we had as inputs to the Question? – Ria George May 25 '15 at 04:02
  • Can you "answer" your own question, and put the info I asked for in there? – Mark L. Stone May 25 '15 at 04:25
  • Sorry, I missed your most recent post. Well, I don't know what your paper is saying, and apparently neither do you. Who knows what they mean? Even if you are willing to do stochastic simulation to get an empirical distribution function, you still have to know how to generate the "primitive" (first level) random variables which form the foundation for the remaining calculations. What are you trying to do with the answer? Do you know the cumulative distribution function of r to have a (nice) closed form (or be expressible in terms of standard distributions)? – Mark L. Stone May 25 '15 at 04:46
  • No, I do not know the cumulative distribution function. What is the procedure for finding out what distribution the time series would follow in phase space? I know of curve fitting to get the distribution, and I got a mixture of Gaussians for the distribution of $Z$, from which the distances $r$ will be calculated. Therefore, how do I find the pdf $f(r)$ of $r = \|z_i - z_j\|$, where the pdf of $Z$ is a mixture of Gaussians? This is how I had posted my Question. – Ria George May 25 '15 at 05:42
  • If it's really a mixture of Gaussians, you should be able to follow my original approach, but you will have more terms due to the additive terms in the pdf for mixed Gaussian. It will be some mess. But you need to have the exact explicit parameters of the mixed Gaussian to do this. And if you really just want to know the answer for your benefit, then just simulate - it's easier and less error prone. But you need the distribution for your "starting point" variables, upon which you do the calculations. Perhaps contact the authors of the paper you're using? – Mark L. Stone May 25 '15 at 05:58
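
The Monte Carlo route suggested in the comments can be sketched as follows. This is a minimal illustration under loudly stated assumptions, not the method of the paper under discussion: the two-component mixture in `COMPONENTS` (weights, means, standard deviations) is entirely hypothetical, each component is taken to be isotropic, and the two points of each pair are drawn independently, which may not hold for points taken from a delay-embedded time series.

```python
import bisect
import math
import random

# Hypothetical 2-component mixture in 2-D (NOT from the thread):
# (weight, mean vector, common standard deviation per axis).
COMPONENTS = [
    (0.4, (0.0, 0.0), 1.0),
    (0.6, (3.0, 1.0), 0.5),
]


def sample_point(rng):
    """Draw one point: pick a component by weight, then sample an isotropic Gaussian."""
    weights = [w for w, _, _ in COMPONENTS]
    _, mean, sigma = rng.choices(COMPONENTS, weights=weights, k=1)[0]
    return tuple(rng.gauss(m, sigma) for m in mean)


def simulate_distances(n_pairs, seed=0):
    """Sorted distances between independently drawn pairs of mixture points."""
    rng = random.Random(seed)
    return sorted(
        math.dist(sample_point(rng), sample_point(rng)) for _ in range(n_pairs)
    )


def empirical_cdf(sorted_samples, r):
    """Fraction of samples less than or equal to r."""
    return bisect.bisect_right(sorted_samples, r) / len(sorted_samples)


def histogram_pdf(sorted_samples, n_bins=30):
    """Histogram estimate of the pdf as (bin centre, density) pairs."""
    width = sorted_samples[-1] / n_bins
    counts = [0] * n_bins
    for s in sorted_samples:
        counts[min(int(s / width), n_bins - 1)] += 1
    n = len(sorted_samples)
    return [((i + 0.5) * width, c / (n * width)) for i, c in enumerate(counts)]


distances = simulate_distances(20_000)
for r in (1.0, 2.0, 4.0):
    print(f"F({r}) ~ {empirical_cdf(distances, r):.3f}")
```

Replacing `COMPONENTS` with the fitted mixture parameters gives an empirical estimate of $f(r)$ without any closed-form integration; `histogram_pdf(distances)` returns the binned density estimate.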