[Image: the first problem, not reproduced]

This sample follows a Normal Distribution with Mean $= 280 / 20 = 14$, and Variance $= (3977.57 / 20) - 14^2 = 2.88$. To find the unbiased variance, we can multiply it by $20/19$ to get $3.03$.
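As a quick check of the arithmetic above, here is a minimal sketch. It assumes that $280$ and $3977.57$ are $\sum x$ and $\sum x^2$ for the $n = 20$ observations, which is how the figures are used in the post; the variable names are my own.

```python
# Summary statistics as used in the question (assumed to be sum of x and sum of x^2).
n = 20
sum_x = 280.0
sum_x2 = 3977.57

mean = sum_x / n
var_biased = sum_x2 / n - mean ** 2        # divides by n
var_unbiased = var_biased * n / (n - 1)    # rescales so the divisor is n - 1

print(round(mean, 2), round(var_biased, 2), round(var_unbiased, 2))
```

Both variance figures estimate the spread of a single observation. Neither one is the variance of the sample mean.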

However, in the following question:

[Two images: the second problem, not reproduced]

I used $\displaystyle\int\frac{3x^3 + 2x^2}{10} \, dx$ and $\displaystyle\int \frac{3x^4 + 2x^3}{10}\,dx - \text{mean}^2$ to get the mean and variance of the distribution that the sample follows. The answer gives the variance as $\displaystyle\int \frac{3x^4 + 2x^3}{10}\, dx - \text{mean}^2$ divided by $n$.

I understand that this is the definition given by the Central Limit Theorem. What I don't understand is why we don't divide by $n$ in the first example if we do in this case. Is it because we only divide by $n$ when using the central limit theorem, and the first example had only 20 samples taken, meaning that the central limit theorem could not be used? In this way, does it depend on the size of $n$?

Any help will be greatly appreciated, thanks in advance.

3 Answers

2

If $X_i \sim \operatorname{Normal}(\mu, \sigma^2)$ for each $i = 1, 2, \ldots, n$, then define $$S_n = \sum_{i=1}^n X_i, \quad T_n = \sum_{i=1}^n X_i^2.$$ We then have, by the linearity of expectation $$\operatorname{E}[S_n] = \sum_{i=1}^n \operatorname{E}[X_i] = \sum_{i=1}^n \mu = n\mu, \\ \operatorname{E}[T_n] = \sum_{i=1}^n \operatorname{E}[X_i^2] = \sum_{i=1}^n (\operatorname{Var}[X_i] + \operatorname{E}[X_i]^2) = \sum_{i=1}^n (\sigma^2 + \mu^2) = n(\sigma^2 + \mu^2).$$ Therefore, $$\frac{\operatorname{E}[T_n]}{n} - \left(\frac{\operatorname{E}[S_n]}{n}\right)^2 = (\sigma^2 + \mu^2) - \mu^2 = \sigma^2.$$ Indeed, this calculation does not depend on the distribution of $X_i$ at all: nowhere have I relied on the fact that the observations were normally distributed, only that they are identically distributed with mean $\mu$ and variance $\sigma^2$. I didn't even use the CLT.

Now, in regard to the second question, what we want to do is use the CLT in the large-sample case. So if we calculate the $k^{\rm th}$ raw moment, we have $$\begin{align*} \operatorname{E}[X^k] &= \int_{x=1}^2 x^k \frac{3x^2 + 2x}{10} \, dx = \frac{1}{10} \int_{x=1}^2 \left(3x^{k+2} + 2x^{k+1}\right) \, dx = \frac{1}{10}\left[\frac{3x^{k+3}}{k+3} + \frac{2x^{k+2}}{k+2} \right]_{x=1}^2 \\ &= \frac{1}{10} \left(\frac{3(2^{k+3} - 1)}{k+3} + \frac{2(2^{k+2} - 1)}{k+2} \right). \end{align*}$$ For $k = 1$, this is $$\operatorname{E}[X] = \frac{1}{10} \left(\frac{3(15)}{4} + \frac{2(7)}{3}\right) = \frac{191}{120}.$$ For $k = 2$, this is $$\operatorname{E}[X^2] = \frac{1}{10} \left( \frac{3(31)}{5} + \frac{2(15)}{4} \right) = \frac{261}{100}.$$ Therefore, applying the CLT, the sampling distribution of the sample mean $\bar X$ is approximately normal with mean $\mu$ and variance $\sigma^2/n$, where $\mu = 191/120$ and $$\sigma^2 = \frac{261}{100} - \left(\frac{191}{120}\right)^2 = \frac{1103}{14400}.$$ It follows that $$\Pr[\bar X > 1.6] = \Pr\left[\frac{\bar X - \mu}{\sigma/\sqrt{n}} > \frac{1.6 - 191/120}{\frac{\sqrt{1103/14400}}{150}} \right] \approx \Pr\left[Z > \frac{150}{\sqrt{1103}}\right],$$ where $Z$ is standard normal. The step where we use the approximation is when we claim that $(\bar X - \mu)/(\sigma/\sqrt{n})$ is approximately standard normal via the CLT.
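For anyone who wants to double-check the fractions, here is a small sketch using exact rational arithmetic. It assumes the density $(3x^2 + 2x)/10$ on $[1, 2]$ as in the integral above; `raw_moment` and `var_of_sample_mean` are names I made up for illustration, and the sample size `n` is left as a parameter.

```python
from fractions import Fraction

def raw_moment(k):
    """E[X^k] for the density (3x^2 + 2x)/10 on [1, 2], from the antiderivative."""
    return Fraction(1, 10) * (
        Fraction(3 * (2 ** (k + 3) - 1), k + 3)
        + Fraction(2 * (2 ** (k + 2) - 1), k + 2)
    )

mu = raw_moment(1)                  # mean of one observation
sigma2 = raw_moment(2) - mu ** 2    # variance of one observation

def var_of_sample_mean(n):
    """Variance of the mean of n independent observations (n is an assumption)."""
    return sigma2 / n

print(raw_moment(0), mu, sigma2)
```

The zeroth moment comes out as $1$, which confirms the density integrates to one over $[1, 2]$. Dividing by $n$ only enters in the last function, where the quantity of interest becomes $\bar X$ rather than a single $X$.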

heropup
2

OK, let's try a more concrete approach. Suppose $$ X_1 = \begin{cases} 0 & \text{with probability }1/2, \\ 2 & \text{with probability } 1/2. \end{cases} $$ Then $\mu=\operatorname{E}(X_1) = 1$ and $\sigma^2 = \operatorname{var}(X_1) = 1.$ If $X_1, X_2, X_3$ are independent with this distribution, then $$ X_1 + X_2 + X_3 = \begin{cases} 0 & \text{with probability } 1/8, \\ 2 & \text{with probability } 3/8, \\ 4 & \text{with probability } 3/8, \\ 6 & \text{with probability } 1/8. \end{cases} $$ Then $\operatorname{E}( X_1+X_2+X_3) = 3\mu,$ so $\operatorname{E}\left( \dfrac{X_1+X_2+X_3} 3 \right) = \mu,$ and $\operatorname{var}( X_1+X_2+X_3) = 3\sigma^2,$ so $\operatorname{var}\left( \dfrac{X_1 + X_2 + X_3} 3 \right) = \dfrac{3\sigma^2} {3^2} = \dfrac{\sigma^2} 3.$
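The same example can be checked by brute-force enumeration. This is only a sketch: it assumes three independent copies of the two-point variable, and the helper `moments` is my own name.

```python
from fractions import Fraction
from itertools import product

values = [0, 2]          # support of each X_i
p = Fraction(1, 2)       # probability of each value

def moments(outcomes):
    """Mean and variance from a list of (value, probability) pairs."""
    mean = sum(v * q for v, q in outcomes)
    var = sum(q * (v - mean) ** 2 for v, q in outcomes)
    return mean, var

single = [(v, p) for v in values]
# All 8 equally likely outcomes of (X1, X2, X3), reduced to their sum.
sums = [(Fraction(sum(t)), p ** 3) for t in product(values, repeat=3)]
averages = [(s / 3, q) for s, q in sums]

mu, sigma2 = moments(single)
sum_mean, sum_var = moments(sums)
avg_mean, avg_var = moments(averages)
print(mu, sigma2, sum_mean, sum_var, avg_mean, avg_var)
```

The variance of the sum is three times $\sigma^2$, while the variance of the average is $\sigma^2$ divided by three, with no limit theorem involved.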

0

In the first problem one is using the sample variance as an estimate of the variance of the population from which the sample was taken.

If the observations $X_1,\ldots,X_n$ in a sample are independent and identically distributed with expected value $\mu$ and variance $\sigma^2,$ then the sample mean $\overline X = (X_1+\cdots+X_n)/n$ is a random variable whose expected value is $\mu$ and whose variance is $\sigma^2/n.$ So the sample mean seldom gets as far from $\mu$ as the individual observations typically do. Thus $$ \frac{X_1-\mu} \sigma \quad \text{and} \quad \frac{\overline X - \mu}{\sigma/\sqrt{n}} $$ both have expected value $0$ and variance $1$. All of that can be established without knowing anything about the central limit theorem.
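A rough simulation may make the distinction clearer. It is only a sketch under assumed values: the population parameters `mu` and `sigma`, the seed, and the number of trials are all hypothetical choices of mine, not taken from either problem.

```python
import random

random.seed(0)                 # fixed seed so the run is repeatable
n, trials = 20, 20000
mu, sigma = 14.0, 1.7          # hypothetical population mean and standard deviation

means, sample_vars = [], []
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(xs) / n
    means.append(m)
    # Unbiased sample variance of this one sample (divisor n - 1).
    sample_vars.append(sum((x - m) ** 2 for x in xs) / (n - 1))

# Average of the sample variances: estimates sigma^2, the spread of one observation.
avg_sample_var = sum(sample_vars) / trials

# Variance of the sample means across trials: estimates sigma^2 / n.
grand_mean = sum(means) / trials
var_of_means = sum((m - grand_mean) ** 2 for m in means) / trials

print(avg_sample_var, sigma ** 2)
print(var_of_means, sigma ** 2 / n)
```

The first printed pair should be close to each other, and so should the second. The sample variance in the first problem targets $\sigma^2$, so it is not divided by $n$. The division by $n$ appears only when the question is about how much $\overline X$ itself varies.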

  • Thank you for your answer. I now understand how the central limit theorem makes sense, so thank you for that, but I still do not understand why the sample variance of the first question wasn't divided by n, as it seems it should given that $\overline X = (X_1+\cdots+X_n)/n$ gives variance to be divided by n... – StopReadingThisUsername May 06 '17 at 02:42