
In Pattern Recognition and Machine Learning, Section 1.6, the author derives the distribution that maximises the differential entropy:

$$H(\textbf{x})=-\int p(\textbf{x}) \ln (p(\textbf{x})) d\textbf{x}$$

To do so, the author imposes three constraints:

$$\int_{-\infty}^{\infty} p(x) dx = 1$$ $$\int_{-\infty}^{\infty} xp(x) dx = \mu$$ $$\int_{-\infty}^{\infty} (x-\mu)^2p(x) dx = \sigma^2$$

This results in the Lagrangian functional:

$$F(p)=-\int_{-\infty}^{\infty} p(x) \ln(p(x)) dx + \lambda_1(\int_{-\infty}^{\infty} p(x) dx - 1) + \lambda_2 (\int_{-\infty}^{\infty} x p(x) dx - \mu) + \lambda_3(\int_{-\infty}^{\infty} (x-\mu)^2 p(x) dx - \sigma^2)$$

Taking the derivative of this functional using the calculus of variations and setting it equal to zero gives:

$$p(x)=\exp(-1+\lambda_1+\lambda_2 x + \lambda_3 (x-\mu)^2)$$
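As a sketch of that variational step (the notation $\delta F/\delta p(x)$ for the functional derivative is an assumption, since it is not named above): every term of $F$ is an integral whose integrand depends only on $x$ and $p(x)$, so the derivative can be taken pointwise in the integrand, using $\frac{d}{dp}\left(p\ln p\right)=\ln p+1$. This gives

$$\frac{\delta F}{\delta p(x)}=-\ln(p(x))-1+\lambda_1+\lambda_2 x+\lambda_3 (x-\mu)^2=0,$$

and solving for $p(x)$ yields the exponential form.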

The author states that you can find the Lagrange multipliers by back substitution of this result into the three constraint equations, leading to the conclusion that $p(x)$ is a normal density.

I'm wondering how to derive this last step, specifically how to find the Lagrange multipliers. If we substitute back into the constraints we get three integral equations with three unknowns. How would I go about solving these equations?

Sebastiano

1 Answer


Assume that $\mu=0$ and $\sigma=1$, and let $z:=\sqrt{\pi}e^{-1+\lambda_1}e^{-\lambda_2^2/(4\lambda_3)}$. Then, assuming that $\lambda_3<0$ (otherwise the integrals diverge), the three constraints become $$ I_1:=e^{-1+\lambda_1}\int_{-\infty}^{\infty} e^{\lambda_2x+\lambda_3x^2}\,dx=\frac{z}{(-\lambda_3)^{1/2}}=1, $$ $$ I_2:=e^{-1+\lambda_1}\int_{-\infty}^{\infty} xe^{\lambda_2x+\lambda_3x^2}\,dx=\frac{z\lambda_2}{2(-\lambda_3)^{3/2}}=0, \quad\text{and} $$ $$ I_3:=e^{-1+\lambda_1}\int_{-\infty}^{\infty} x^2e^{\lambda_2x+\lambda_3x^2}\,dx=\frac{z\lambda_2^2}{4(-\lambda_3)^{5/2}}+\frac{z}{2(-\lambda_3)^{3/2}}=1. $$ The first equation gives $z=(-\lambda_3)^{1/2}$. Substituting this into the other two, we get $$ \frac{\lambda_2}{-2\lambda_3}=0\quad\text{and}\quad \frac{\lambda_2^2}{4\lambda_3^2}+\frac{1}{-2\lambda_3}=1, $$ so that $\lambda_2=0$ and $\lambda_3=-1/2$. Finally, using $z=(-\lambda_3)^{1/2}$ again with these values, $\sqrt{\pi}e^{-1+\lambda_1}=1/\sqrt{2}$, and we get $\lambda_1=1-\ln \sqrt{2\pi}$.

Therefore, $$ p(x)=e^{-\ln \sqrt{2\pi}-x^2/2}=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}. $$
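As a quick sanity check on these multipliers, the three constraint integrals can be evaluated numerically. The sketch below is not part of the derivation. It uses plain Python with a trapezoidal rule, and the helper names `p` and `moment`, the truncation to $[-12, 12]$, and the grid size are assumptions chosen for illustration.

```python
import math


def p(x, lam1, lam2, lam3):
    """Stationary density exp(-1 + lam1 + lam2*x + lam3*x^2), for mu = 0."""
    return math.exp(-1.0 + lam1 + lam2 * x + lam3 * x * x)


def moment(k, lam1, lam2, lam3, half_width=12.0, n=48_000):
    """Trapezoidal estimate of the integral of x^k * p(x) on [-half_width, half_width].

    The infinite range is truncated (an assumption): for lam3 = -1/2 the
    tails beyond |x| = 12 are far below double precision.
    """
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * x**k * p(x, lam1, lam2, lam3)
    return total * h


# Multipliers obtained above for mu = 0, sigma = 1.
lam1 = 1.0 - math.log(math.sqrt(2.0 * math.pi))
lam2 = 0.0
lam3 = -0.5

I1 = moment(0, lam1, lam2, lam3)  # normalisation constraint, should be close to 1
I2 = moment(1, lam1, lam2, lam3)  # mean constraint, should be close to 0
I3 = moment(2, lam1, lam2, lam3)  # variance constraint, should be close to 1
print(I1, I2, I3)
```

Changing `lam3` to a non-negative value makes the integrand grow with $|x|$, which is the numerical counterpart of the requirement $\lambda_3<0$.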


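For the general case, consider $y=(x-\mu)/\sigma$. If $q$ denotes the density of $y$, then $p(x)=q\big((x-\mu)/\sigma\big)/\sigma$, and notice that $$ -\int p(x)\ln(p(x))\,dx=-\int q(y)\ln(q(y))\, dy+\ln\sigma. $$ The two entropies differ only by the constant $\ln\sigma$, so maximising over $p$ with mean $\mu$ and variance $\sigma^2$ is equivalent to maximising over $q$ with mean $0$ and variance $1$. This gives $p(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^2/(2\sigma^2)}$.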


Evaluation of $I_1$, $I_2$, and $I_3$:

First, recall that for $c>0$, $$ \int_{-\infty}^\infty e^{-cx^2}\,dx=\sqrt{\frac{\pi}{c}}, $$ and notice that $$ bx-cx^2=-c\left(\frac{b}{2c}-x\right)^2+\frac{b^2}{4c}. $$ Thus, letting $\lambda_1=a$, $\lambda_2=b$, and $\lambda_3=-c$, $$ I_1=e^{-1+a}e^{b^2/(4c)}\int_{-\infty}^\infty e^{-c(b/(2c)-x)^2}\,dx=e^{-1+a}e^{b^2/(4c)}\times \sqrt{\frac{\pi}{c}}. $$ As for the second integral, notice that by symmetry about $x=b/(2c)$, $$ \int_{-\infty}^\infty \left(x-\frac{b}{2c}\right)e^{-c(b/(2c)-x)^2}\,dx=0, $$ and so $I_2=I_1b/(2c)$. Finally, differentiating under the integral sign, $$ \frac{d}{dc}\int_{-\infty}^\infty e^{-c(b/(2c)-x)^2}\,dx =\int_{-\infty}^\infty \left(\frac{b^2}{4c^2}-x^2\right)e^{-c(b/(2c)-x)^2}\,dx, $$ where the left-hand side equals $\frac{d}{dc}\sqrt{\pi/c}$. Therefore, $$ I_3=I_1\frac{b^2}{4c^2}-e^{-1+a}e^{b^2/(4c)}\times\frac{d}{dc}\sqrt{\frac{\pi}{c}}. $$
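The closed forms for $I_1$, $I_2$, and $I_3$ can also be compared against direct numerical integration for arbitrary multipliers. The following is only a sketch: the test values of `a`, `b`, `c`, the helper `trapezoid`, and the truncation to $[-15, 15]$ are assumptions made for illustration, not part of the calculation above.

```python
import math


def trapezoid(f, lo=-15.0, hi=15.0, n=60_000):
    """Trapezoidal rule for f on [lo, hi]; the infinite range is truncated (assumption)."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


# Arbitrary test values with c > 0, where lambda_1 = a, lambda_2 = b, lambda_3 = -c.
a, b, c = 0.3, 0.7, 1.3
prefactor = math.exp(-1.0 + a)


def integrand(k):
    """Integrand of I_{k+1}: e^{-1+a} * x^k * exp(b*x - c*x^2)."""
    return lambda x: prefactor * x**k * math.exp(b * x - c * x * x)


# Closed forms, written in terms of z = sqrt(pi) * e^{-1+a} * e^{b^2/(4c)}.
z = math.sqrt(math.pi) * prefactor * math.exp(b * b / (4.0 * c))
closed = [
    z / c**0.5,                                      # I_1
    z * b / (2.0 * c**1.5),                          # I_2
    z * b * b / (4.0 * c**2.5) + z / (2.0 * c**1.5), # I_3
]
numeric = [trapezoid(integrand(k)) for k in range(3)]

for name, exact, approx in zip(("I1", "I2", "I3"), closed, numeric):
    print(name, exact, approx)
```

Agreement for generic `a`, `b`, `c` supports the formulas before the constraints are imposed, which is where the comparison is most informative.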

  • @tail_recursion I added the limits of integration for clarity. –  Oct 17 '20 at 10:42
  • Would be useful if you could add some more detail on how to do the integrals. I'm getting limits involving the imaginary error function erfi, where the argument is going to $\pm \infty$ so the limits don't exist. – tail_recursion Oct 17 '20 at 11:11
  • @tail_recursion https://math.stackexchange.com/questions/628681/how-to-compute-moments-of-log-normal-distribution –  Oct 17 '20 at 11:13
  • Could you expand on that a little bit? – tail_recursion Oct 17 '20 at 13:25
  • I'm still not sure what you did there. I'm not clear how you got from the second step to the last step. I was however able to do the first integral using a formula given here; https://en.wikipedia.org/wiki/Gaussian_integral – tail_recursion Oct 18 '20 at 05:37
  • I was able to figure out the other integrals using a formula at the bottom of this page; https://mathworld.wolfram.com/GaussianIntegral.html – tail_recursion Oct 18 '20 at 07:40
  • @tail_recursion OK. I added some calculations. –  Oct 18 '20 at 10:20