Information inequality

Question

Information inequality: If $\theta_0$ is identified $[\theta \neq \theta_0 \implies f(z, \theta) \neq f(z, \theta_0)]$ and $E[|\ln f(z, \theta)|] < \infty$ for all $\theta$, then $L(\theta) = E[\ln f(z,\theta)]$ has a unique maximum at $\theta_0$.

Proof: By the strict version of Jensen's inequality, for any nonconstant, positive random variable $Y$ we have $-\ln E[Y] < E[-\ln Y]$. Taking $Y = f(z,\theta)/f(z,\theta_0)$ with $\theta \neq \theta_0$ gives

$$ L(\theta_0) - L(\theta) = E\{ -\ln [f(z,\theta)/f(z,\theta_0)] \} > -\ln E\{ f(z,\theta)/f(z,\theta_0) \} = 0. $$
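
For reference, the final equality in the display comes from the fact that each $f(\cdot,\theta)$ is a density and therefore integrates to one. A short worked step, assuming the expectation is taken under $f(z,\theta_0)$ and that the densities share a common support (an assumption the excerpt does not state):

$$ E\left[\frac{f(z,\theta)}{f(z,\theta_0)}\right] = \int \frac{f(z,\theta)}{f(z,\theta_0)}\, f(z,\theta_0)\,dz = \int f(z,\theta)\,dz = 1, \qquad\text{so}\qquad -\ln E\left[\frac{f(z,\theta)}{f(z,\theta_0)}\right] = -\ln 1 = 0. $$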

Why does this imply that $L$ has a unique maximum at $\theta_0$?

score 0 · Answer 1 · answered May 16 '17 at 23:24

Consider the non-strict version of Jensen's inequality (i.e., drop the requirement that the random variable be nonconstant):

Suppose some $\theta_1$ were also a maximum. Then $L(\theta_1)=L(\theta_0)$, so

$$ 0 = L(\theta_0)-L(\theta_1)= E[-\ln[f(z,\theta_1)/f(z,\theta_0)]] \geq -\ln E[f(z,\theta_1)/f(z,\theta_0)] = 0. $$

Because $-\ln$ is strictly convex, equality holds if and only if $f(z,\theta_1)/f(z,\theta_0)$ is constant almost everywhere. The last equality in the display shows that its expectation is $1$, so the constant is $1$ and $f(z,\theta_1)=f(z,\theta_0)$ almost everywhere. But identification says $\theta \neq \theta_0 \implies f(z,\theta)\neq f(z,\theta_0)$, so $\theta_1=\theta_0$ and the maximum is unique.
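
As a numerical illustration of the inequality (not part of the proof), here is a minimal sketch that assumes a specific model the thread never mentions: $f(z,\theta)$ is the $N(\theta, 1)$ density and the true parameter is `THETA_0 = 1.0`. Both choices, and the helper names, are hypothetical. For this model the gap $L(\theta_0) - L(\theta)$ equals $(\theta-\theta_0)^2/2$, so the Monte Carlo estimate can be compared with an exact value.

```python
import math
import random


def log_density(z, theta):
    """Log of the N(theta, 1) density evaluated at z (assumed model)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (z - theta) ** 2


def expected_log_likelihood(theta, sample):
    """Monte Carlo estimate of L(theta) = E[ln f(z, theta)], with z drawn under theta_0."""
    return sum(log_density(z, theta) for z in sample) / len(sample)


THETA_0 = 1.0  # hypothetical true parameter, chosen only for illustration
rng = random.Random(0)  # fixed seed so the run is reproducible
sample = [rng.gauss(THETA_0, 1.0) for _ in range(50_000)]

# Evaluate the estimated L(theta) on a grid that contains THETA_0 exactly.
grid = [THETA_0 + 0.25 * k for k in range(-8, 9)]
values = {theta: expected_log_likelihood(theta, sample) for theta in grid}
best_theta = max(values, key=values.get)

for theta in grid:
    gap = values[THETA_0] - values[theta]
    exact_gap = 0.5 * (theta - THETA_0) ** 2  # closed form for this normal model
    print(f"theta={theta:5.2f}  estimated gap={gap:7.4f}  exact gap={exact_gap:7.4f}")
print("grid maximiser:", best_theta)
```

The estimated gap should be positive at every grid point other than `THETA_0` and close to the closed-form value, which is what the information inequality predicts for an identified model.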

answered May 16 '17 at 23:24

adfriedman

How do we know that this is a maximum? – alto de aitana May 17 '17 at 08:00
We are told that $\theta_0$ is a maximum, so we took an arbitrary other maximum $\theta_1$ and showed that it had to be equal to $\theta_0$. In other words $\theta_0$ is the unique maximum.
To fill in some extra detail: the non-strict form of Jensen's inequality says that, because $-\ln(x)$ is strictly convex, equality occurs if and only if $f(z,\theta_1)/f(z,\theta_0)$ is constant a.e. As both are maxima, $L(\theta_0)=L(\theta_1)$, so $-\ln$ of that constant must be zero. Then $-\ln[f(z,\theta_1)/f(z,\theta_0)]=0$ implies $\frac{f(z,\theta_1)}{f(z,\theta_0)} = 1$, hence the two densities are equal almost everywhere.
– adfriedman May 17 '17 at 16:22
I have one more question, about the proof of consistency of the maximum likelihood estimator. I found this lecture link, where $\theta_0$ is shown to be the maximum of the function $L$, and the next step uses the uniform strong law of large numbers (page 22 in the link) – alto de aitana May 17 '17 at 19:55
In the Wikipedia link I have $$\sup_\theta \left\| \frac{1}{n}\sum_{i=1}^n f(X_i,\theta)-E(f(X_i,\theta)) \right\| \rightarrow 0.$$ Why can we write, as in the lecture, $$ \sup_\theta \frac{1}{n}\sum_{i=1}^n f(X_i,\theta) \rightarrow \sup_\theta E(f(X_i,\theta))? $$ Why can we pass to the limit inside the supremum? – alto de aitana May 17 '17 at 19:59
This should be posted as another question. – adfriedman Jun 22 '17 at 19:42
