showing that $S=\sum^n_{i=1} \ln(X_i)$ is a complete sufficient statistic

Question

We have a random sample $X_1,X_2,\ldots,X_n$ from a probability distribution with density:

$$ f(x)=\theta x^{-(\theta +1)} $$ for $x>1$, and $0$ otherwise.

Now the question is:

Show that $S=\sum^n_{i=1} \ln(X_i)$ is a complete sufficient statistic and use this result to derive the UMVUE for $\frac{1}{\theta}$.

Now I wrote the joint density function like so: $$ \begin{align} f_{\bar{x}}(x_1,\ldots,x_n;\theta) &= \theta^n\prod_{i=1}^n x_i^{-(\theta +1)}I_{(1,\infty)}(x_i)\\ &= \theta^n\prod_{i=1}^n [I_{(1,\infty)}(x_i) ] e^{\sum(-\theta-1)\log(x_i)}\\ &=\theta^n\prod_{i=1}^n I_{(1,\infty)}(x_i)e^{(-\theta-1)\sum \log(x_i)} \end{align} $$ Now I'm doing this blindly; I'm just following my book. What am I doing? What does being a complete sufficient statistic tell me? This is a kink in my brain: I can't do this sort of thing without knowing what it is. Also, how do I go on from here? How do I get the UMVUE?

Didn't know $\ln$ was a tag, are you going to shout at me, Michael? :( — WiseStrawberry, Jun 02 '13 at 19:59
And \log, and \sin and \cos and \min and \max and \det ...... — Michael Hardy, Jun 02 '13 at 20:00
These not only prevent italicization but also provide proper spacing in expressions like $x\ln y$ (without the backslash that looks like $x ln y$ no matter how many blank spaces you put in the code) and also affect the positioning of subscripts in things like $\displaystyle\max_{x\in\mathcal{X}}$. — Michael Hardy, Jun 02 '13 at 20:02
@WiseStrawberry Hey! I saw your post and have a simple question. Why did you multiply the function by $I_{(1,\infty)}$? What's the point of it, and is it necessary? Thanks in advance! — Smarties, Oct 23 '18 at 18:19
Hey, because $x$ is defined on $(1,\infty)$ (it's given that $x>1$), we need to state that in our function. Or else we get a zero in a product...
You're asking this on a 5 year old question haha, but I'm more than happy to help :)

It is, however, not required afaik; it's just good practice for yourself as well. The range needs to be stated somewhere in your function; you can also write "... where $x>1$". — WiseStrawberry, Oct 24 '18 at 09:07

score 1 · Accepted Answer · answered Jun 02 '13 at 20:20

You can write $\prod_i I_{(1,\infty)}(x_i)$ as $I_{(1,\infty)} (\min\{ x_1,\ldots,x_n\})$, since every $x_i$ exceeds $1$ exactly when the smallest one does. That matters in contexts where instead of $(1,\infty)$ you have $(\kappa,\infty)$ and $\kappa$ itself is to be estimated. It means that the minimum observation is itself one component of a sufficient tuple.

One thing to be careful about is that your last displayed expression is really $$ \theta^n\left(\prod_i I_{(1,\infty)}(x_i) \right)\left(e^{(-\theta-1)\sum_i\log x_i}\right) $$ (where, of course $\log$ is the same thing as $\ln$; I mention this since you used both notations in your posted question).

Fisher's factorization criterion tells you that that sum of logarithms is indeed sufficient.

What that means is: The conditional probability distribution of $X_1,\ldots,X_n$ given the value of $\sum_{i=1}^n\log X_i$ does not depend on $\theta$.
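To make the roles of the two factors explicit, here is one way to write the factorization. The labels $g$ and $h$ are my own and do not appear in the question: $$ f(x_1,\ldots,x_n;\theta) = \underbrace{\theta^n e^{-(\theta+1)s}}_{g(s;\,\theta)} \cdot \underbrace{\prod_{i=1}^n I_{(1,\infty)}(x_i)}_{h(x_1,\ldots,x_n)}, \qquad s=\sum_{i=1}^n \log x_i. $$ The factor $g$ depends on the data only through $s$, and the factor $h$ does not involve $\theta$ at all.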

Next, what does "complete" mean? A complete statistic is one that admits no unbiased estimator of $0$ except the trivial one. That means there is no function $g$, other than one that equals $0$ with probability $1$, such that $$ E\left(g\left(\sum_{i=1}^n \log X_i \right)\right) $$ remains equal to $0$ as $\theta$ changes (where of course one must not allow $g$ to vary as $\theta$ changes).
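A sketch of one way to verify completeness here, assuming you are allowed to use the uniqueness of the Laplace transform: since $P(\log X_i > y) = P(X_i > e^y) = e^{-\theta y}$ for $y>0$, each $\log X_i$ is exponentially distributed with rate $\theta$, so $S=\sum_{i=1}^n \log X_i$ has a gamma distribution with density $\theta^n s^{n-1}e^{-\theta s}/(n-1)!$ on $s>0$. Then $$ E\big(g(S)\big) = \frac{\theta^n}{(n-1)!}\int_0^\infty g(s)\, s^{n-1} e^{-\theta s}\,ds = 0 \quad\text{for all } \theta>0 $$ says that the Laplace transform of $g(s)s^{n-1}$ vanishes identically, which forces $g(s)=0$ for almost every $s>0$. Your book may instead appeal to a general theorem about exponential families, which gives the same conclusion.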

The Lehmann–Scheffé theorem now says you can get the UMVUE by starting with any crude unbiased estimator of $1/\theta$ (call this estimator $T=T(X_1,\ldots,X_n)$) and finding $E(T\mid \sum_{i=1}^n \log X_i)$. Because of sufficiency this will be a function of the data that does not depend on $\theta$, hence a statistic. Because of completeness, it will be the UMVUE.

It may be a good idea to seek your crude unbiased estimator of $1/\theta$ among functions of $X_1$ alone rather than all of the $X$s, simply because it's easier to find.
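For instance, $\log X_1$ is one candidate worth trying. A quick check with the substitution $y=\log x$ (so $x=e^y$ and $dx=e^y\,dy$): $$ E(\log X_1) = \int_1^\infty (\log x)\,\theta x^{-(\theta+1)}\,dx = \int_0^\infty y\,\theta e^{-\theta y}\,dy = \frac{1}{\theta}. $$ So $\log X_1$ is unbiased for $1/\theta$, although it ignores most of the sample.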

Then you have to find the conditional expectation, which may take some work.
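If you take $\log X_1$ as the crude estimator (its expectation is $1/\theta$), a symmetry argument suggests the answer: the $X_i$ are exchangeable, so $E(\log X_i\mid S)$ is the same for every $i$, and these $n$ conditional expectations add up to $S$, giving $E(\log X_1\mid S)=S/n$. The following is only a numerical sanity check of that candidate, not a proof. It is a minimal simulation sketch using the Python standard library; the function names `sample_pareto` and `simulate_estimator` and the parameter `reps` are hypothetical names chosen for illustration.

```python
import math
import random
import statistics


def sample_pareto(theta, rng):
    """Draw one X with density theta * x**-(theta + 1) on x > 1 (inverse-CDF method)."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so the power below stays finite
    return u ** (-1.0 / theta)  # inverts F(x) = 1 - x**(-theta)


def simulate_estimator(theta, n, reps, seed=0):
    """Return the Monte Carlo mean and sample variance of S/n, with S = sum(log X_i)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        s = sum(math.log(sample_pareto(theta, rng)) for _ in range(n))
        estimates.append(s / n)
    return statistics.fmean(estimates), statistics.variance(estimates)


if __name__ == "__main__":
    theta, n = 2.0, 10
    mean, var = simulate_estimator(theta, n, reps=20000)
    print(f"mean of S/n     = {mean:.4f} (target 1/theta = {1 / theta:.4f})")
    print(f"variance of S/n = {var:.5f} (theory 1/(n*theta^2) = {1 / (n * theta**2):.5f})")
```

The simulated mean should land close to $1/\theta$, and the variance close to $1/(n\theta^2)$, which is what you get from $S$ being a sum of $n$ independent exponential variables with rate $\theta$.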

I think finding that conditional expectation may be less work than I at first thought. Here's a big fat hint: Find $\mathbb E(\log X_1)$. ${{}\qquad{}}$ — Michael Hardy, Jun 02 '13 at 20:32
When talking about the Lehmann–Scheffé theorem you're taking $T=T(X_1,\ldots,X_n)$, but it's $1/\theta$, so how does that work? It's not even a function of the data. — WiseStrawberry, Jun 02 '13 at 20:51
So in my work I haven't yet shown that I have a complete sufficient statistic? — WiseStrawberry, Jun 02 '13 at 21:04
I'd say you've done enough to do that if you cite Fisher's factorization criterion. And explain which factors play which roles, i.e. which one depends on the data only through the sufficient statistic and which does not depend on $\theta$. Also, being clear about the point I said you should be careful about wouldn't hurt. — Michael Hardy, Jun 03 '13 at 03:29
I don't understand your second comment. I suggested that you find $\mathbb E(\log X_1)$. For that, you should get $1/\theta$. So there you have an unbiased estimator of $1/\theta$. You need to find the conditional expected value of that estimator given your sufficient statistic. — Michael Hardy, Jun 03 '13 at 03:30
