Formula for conditional expectation in a binomial sample

Question

Let's take $N$ i.i.d. stochastic variables $X_i$, where $X_i \sim Bin(n,p)$.

Taking inspiration from here, we should have the following facts:

$Var(X_i)=np(1-p)$.
The sample statistic $M(X_1,...,X_N)=\frac{\sum_i X_i}{N}$ is a sufficient and complete statistic.
The MLE estimator for $np(1-p)$ is $T_{MLE}=nM(1-M)$.

Combining these points, $T_{MLE}$ is also the UMVUE by the Lehmann-Scheffé theorem.

Now we have also the following fact:

The (corrected) sample variance $S^2=\frac{1}{N-1}\sum_i{(X_i-M)^2}$ is an unbiased estimate of $Var(X_i)$.

From Lehmann-Scheffé we should then have, by consistency:

$$E[S^2\mid M]=nM(1-M)$$

My questions:

Is my reasoning correct, or am I applying some theorem in the wrong way?
If the reasoning is correct, what would be a direct derivation of the final result? Is the formula trivial for some reason I do not see now?

Are you sure the MLE is unbiased for the variance? And I think your sample size is $N$ and not $n$ as in the very first line. – StubbornAtom, Mar 25 '20 at 18:48
You are right, there may be some prefactors, but I hope this does not change the reasoning... let me check this... – Thomas, Mar 25 '20 at 18:51
You are right, I should change the notation like that... At the moment I am getting that, to get something proportional to $npq$, I should have $n^2M-nM^2$, and in that case $E[n^2M-nM^2]=Var(X_i)(n^2-n/N)$. At least for $n=1$ this formula is compatible with the link. But I should also redo the MLE estimate more carefully, even if the reasoning holds in any case, regardless of whether the estimator is suggested by maximum likelihood or not... I think I will try tomorrow to fix things and make a proper question. Thanks a lot for the correction and for checking the reasoning in the meantime. – Thomas, Mar 25 '20 at 19:36

StubbornAtom · Accepted Answer · 2020-03-25T20:50:31.283

Your reasoning is correct, except that the MLE is not the UMVUE of the population variance.

A complete sufficient statistic for $p$ is $T=\sum\limits_{i=1}^N X_i$, which has a $\mathsf{Bin}(nN,p)$ distribution.

Now $E_p[T]=nNp$ and $\operatorname{Var}_p[T]=nNp(1-p)$ for all $p\in(0,1)$.

Again, $$E_p[T^2]=\operatorname{Var}_p[T]+(E_p[T])^2=nNp(1-p)+n^2N^2p^2$$

Or, $$E_p[T^2-T]=nNp^2(nN-1)$$

That is, $$E_p\left[\frac{T(T-1)}{N(nN-1)}\right]=np^2$$

So you have an unbiased estimator of population variance based on $T$ (and hence UMVUE):

$$E_p\left[\frac TN-\frac{T(T-1)}{N(nN-1)}\right]=np-np^2=np(1-p)\quad,\forall\,p\in(0,1)$$
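
As a sanity check, the unbiasedness claim can be verified exactly with rational arithmetic. The following is only a minimal sketch: the function names (`umvue`, `mle`, `expectation`) and the values $n=3$, $N=4$, $p=2/7$ are illustrative choices, not part of the argument. It also evaluates the plug-in MLE $n\hat p(1-\hat p)$ with $\hat p=T/(nN)$, to show that it is biased.

```python
from fractions import Fraction
from math import comb


def umvue(t, n, N):
    """Estimator T/N - T(T-1)/(N(nN-1)) evaluated at T = t (needs n*N > 1)."""
    return Fraction(t, N) - Fraction(t * (t - 1), N * (n * N - 1))


def mle(t, n, N):
    """Plug-in MLE n * p_hat * (1 - p_hat) with p_hat = t / (n*N)."""
    p_hat = Fraction(t, n * N)
    return n * p_hat * (1 - p_hat)


def expectation(estimator, n, N, p):
    """Exact E_p[estimator(T)] for T ~ Bin(n*N, p), with p given as a Fraction."""
    m = n * N
    return sum(
        comb(m, t) * p**t * (1 - p) ** (m - t) * estimator(t, n, N)
        for t in range(m + 1)
    )


# Illustrative values only.
n, N, p = 3, 4, Fraction(2, 7)
target = n * p * (1 - p)  # population variance n p (1 - p)

print(expectation(umvue, n, N, p) == target)  # exact comparison, no rounding
print(expectation(mle, n, N, p) - target)     # bias of the plug-in MLE
```

Because everything is a `Fraction`, the comparison is an exact identity check for the chosen parameters rather than a floating-point approximation.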

With $\overline X=\frac TN$, the sample variance $S^2=\frac1{N-1}\sum\limits_{i=1}^N (X_i-\overline X)^2$ is unbiased for the population variance. So by Lehmann-Scheffé, $E\left[S^2\mid T\right]$ is also the UMVUE of $np(1-p)$.

As UMVUE is unique whenever it exists, you can say

$$E\left[S^2\mid T\right]=\frac TN-\frac{T(T-1)}{N(nN-1)}\tag{*}$$
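
For small $n$ and $N$, the identity $(*)$ can also be checked by brute force, enumerating every sample with a given sum. This is a rough sketch under illustrative assumptions (the names `cond_exp_s2` and `rhs` and the values $n=3$, $N=4$ are made up for the example). It relies on the fact that, given $T=t$, the factor $p^t(1-p)^{nN-t}$ is common to all samples and cancels, so only the binomial coefficients matter.

```python
from fractions import Fraction
from itertools import product
from math import comb


def cond_exp_s2(t, n, N):
    """Exact E[S^2 | T = t], by enumerating all (x_1, ..., x_N) with sum t."""
    num = Fraction(0)
    den = 0
    mean = Fraction(t, N)
    for xs in product(range(n + 1), repeat=N):
        if sum(xs) != t:
            continue
        # Weight proportional to P(X = xs): prod_i C(n, x_i).
        # The p-dependent factor is identical for all samples with sum t.
        w = 1
        for x in xs:
            w *= comb(n, x)
        s2 = sum((x - mean) ** 2 for x in xs) / (N - 1)
        num += w * s2
        den += w
    return num / den


def rhs(t, n, N):
    """Right-hand side of (*): t/N - t(t-1)/(N(nN-1))."""
    return Fraction(t, N) - Fraction(t * (t - 1), N * (n * N - 1))


# Illustrative values only; the enumeration has (n+1)^N terms.
n, N = 3, 4
print(all(cond_exp_s2(t, n, N) == rhs(t, n, N) for t in range(n * N + 1)))
```

The enumeration grows like $(n+1)^N$, so this is only meant as a check for tiny cases, not as a way to compute the conditional expectation in general.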

$(*)$ can be rewritten in terms of $\overline X$, of course.
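
Spelling this out (a quick sketch, substituting $T=N\overline X$ into $(*)$; the algebra is worth double-checking):

\begin{align}
E\left[S^2\mid \overline X\right]&=\overline X-\frac{\overline X\,(N\overline X-1)}{nN-1}
\\&=\frac{N\,\overline X\,(n-\overline X)}{nN-1}
\\&=\frac{nN}{nN-1}\cdot\overline X\left(1-\frac{\overline X}{n}\right)
\end{align}

The factor $\overline X\left(1-\frac{\overline X}{n}\right)$ is the plug-in MLE $n\hat p(1-\hat p)$ with $\hat p=\overline X/n$, so the UMVUE differs from the MLE only by the factor $\frac{nN}{nN-1}$.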

A direct way to obtain $(*)$ would be to proceed using linearity of expectation.

I think it should be something like

\begin{align}
E\left[S^2\mid T=t\right]&=E\left[\frac{1}{N-1}\sum_{i=1}^N\left(X_i-\frac tN\right)^2\mid T=t\right]
\\&=E\left[\frac{1}{N-1}\left(\sum_{i=1}^N X_i^2-\frac{t^2}{N}\right)\mid T=t\right]
\\&=\frac{1}{N-1}\sum_{i=1}^N E\left[X_1^2\mid T=t\right]-\frac{t^2}{N(N-1)}
\end{align}

Now we only have to recall that $X_1$ conditioned on $T$ has a hypergeometric distribution.
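
A sketch of how that last step could be completed, assuming the standard hypergeometric moments: conditionally on $T=t$,

$$P(X_1=x\mid T=t)=\frac{\binom{n}{x}\binom{n(N-1)}{t-x}}{\binom{nN}{t}},$$

so $E[X_1\mid T=t]=\frac tN$ and $\operatorname{Var}[X_1\mid T=t]=\frac{t(N-1)(nN-t)}{N^2(nN-1)}$. By symmetry, $E\left[X_i^2\mid T=t\right]$ is the same for every $i$, hence

\begin{align}
E\left[S^2\mid T=t\right]&=\frac{N}{N-1}\left(\frac{t(N-1)(nN-t)}{N^2(nN-1)}+\frac{t^2}{N^2}\right)-\frac{t^2}{N(N-1)}
\\&=\frac{t(nN-t)}{N(nN-1)}
\\&=\frac tN-\frac{t(t-1)}{N(nN-1)}
\end{align}

which agrees with $(*)$.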

edited Mar 25 '20 at 20:50

answered Mar 25 '20 at 20:36

StubbornAtom


Thanks, I will check the computations and the hypergeometric distribution after conditioning more carefully... but +1 for the moment! :) Anyway, isn't it interesting? The second proof uses only algebraic manipulations, whereas the first one apparently derives from a different type of algebra and general principles... – Thomas Mar 25 '20 at 21:27
I am not an expert in statistics, so maybe I should not be so surprised, I do not know... – Thomas Mar 25 '20 at 21:31
No, you are right. In fact I had asked a similar question here some time back for the Poisson distribution. – StubbornAtom Mar 25 '20 at 21:32
Calculations easily get complicated in this question... For future readers, I just add why the hypergeometric distribution plays a role. First of all, $p(x_1,t)=p(x_1)\,p(\sum_{i\ge 2}x_i=t-x_1)$. The first factor is simple. The second can be derived by recalling that a $Bin(n,p)$ variable is the number of successes in $n$ Bernoulli trials. Therefore $p(\sum_{i\ge 2}x_i=t-x_1)$ is the probability of having $t-x_1$ successes in $(N-1)n$ trials. Analogously one can compute $p(t)$ and the conditional $p(x_1|t)=p(x_1,t)/p(t)$, arriving at the already cited distribution. – Thomas Mar 26 '20 at 22:06
Yes, it is this familiar result: https://math.stackexchange.com/q/575459/321264. – StubbornAtom Mar 27 '20 at 06:29
