I am trying to understand an approach that was discussed in my pattern recognition class today of using uniform random variables, which can be sampled using some random generator, to obtain samples that are approximately $N(0,1)$. Unfortunately there were no notes given so I wish to reproduce what my professor said from my understanding.
For a random variable $X \sim U(0, 1)$ we have $\mu_{X} = 0.5$ and $\sigma_{X}^2 = \frac{1}{12}$. Now consider the random variable $$Y = \left(\sum_{i=1}^{12} X_i\right) - 6,$$ where the $X_i$ are i.i.d. $U(0, 1)$. Then $Y$ has $\mu_{Y} = 0$ and $\sigma_{Y}^2 = 1$.
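To check my arithmetic, this is how I get those two moments, assuming the sum has exactly twelve independent $U(0, 1)$ terms:

$$\mathbb{E}[Y] = \sum_{i=1}^{12} \mathbb{E}[X_i] - 6 = 12 \cdot \frac{1}{2} - 6 = 0, \qquad Var[Y] = \sum_{i=1}^{12} Var[X_i] = 12 \cdot \frac{1}{12} = 1.$$

The variance step uses independence of the $X_i$; subtracting the constant $6$ shifts the mean but leaves the variance unchanged.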
Now I wish to use the Central Limit Theorem (CLT). According to the Lindeberg-Lévy version of the CLT, if $\{X_1, \ldots, X_n\}$ is a sequence of i.i.d. random variables with $\mathbb{E}[X_i] = \mu$ and $Var[X_i] = \sigma^2 < \infty$, then as $n$ approaches infinity, the random variable $\sqrt{n}(\bar{X} - \mu)$ converges in distribution to $N(0, \sigma^2)$.
Then, as far as I understand it, by taking $n$ i.i.d. samples $Y_1, \ldots, Y_n$ of $Y$ and averaging them into $\bar{Y}$, we would have, for sufficiently large $n$, that $\sqrt{n}\,\bar{Y}$ is approximately $N(0, 1)$.
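Spelling out the step I am relying on, assuming $Y_1, \ldots, Y_n$ are i.i.d. copies of $Y$ with mean $0$ and variance $1$:

$$\mathbb{E}[\sqrt{n}\,\bar{Y}] = 0, \qquad Var[\sqrt{n}\,\bar{Y}] = n \cdot \frac{Var[Y]}{n} = 1, \qquad \sqrt{n}\,\bar{Y} = \frac{1}{\sqrt{n}}\left(\sum_{k=1}^{12n} X_k - 6n\right).$$

So the scaling keeps the first two moments at $0$ and $1$ for every $n$, and as far as I can tell the CLT is only needed for the shape of the distribution. The last identity says that $\sqrt{n}\,\bar{Y}$ is just a standardized sum of $12n$ uniforms.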
Is this algorithm correct?
I also see in the Wikipedia entry here (the computational methods section) that they do something similar: "Generate $12$ uniform $U(0,1)$ deviates, add them all up, and subtract $6$ – the resulting random variable will have approximately standard normal distribution". However, they do not further average these values and multiply by $\sqrt{n}$, so how does the CLT apply here?
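To see how the plain sum of twelve uniforms behaves without any further averaging, I put together the small NumPy sketch below. The helper name `sum_of_twelve_uniforms`, the sample size, and the seed are my own choices for illustration. It compares the empirical mean, variance, and tail mass $P(|Y| > 3)$ against the exact standard normal tail, which I compute with `math.erfc`.

```python
import math
import numpy as np

def sum_of_twelve_uniforms(size, rng):
    """Draw `size` samples of Y = (X_1 + ... + X_12) - 6 with X_i ~ U(0, 1)."""
    return rng.random((size, 12)).sum(axis=1) - 6.0

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility
y = sum_of_twelve_uniforms(200_000, rng)

# Empirical tail mass versus the exact standard normal value P(|Z| > 3).
empirical_tail = np.mean(np.abs(y) > 3.0)
normal_tail = math.erfc(3.0 / math.sqrt(2.0))

print(f"mean={y.mean():.4f}, var={y.var():.4f}")
print(f"P(|Y| > 3): empirical={empirical_tail:.5f}, normal={normal_tail:.5f}")
print(f"largest |Y| observed: {np.abs(y).max():.3f}")
```

One thing this makes visible is that $Y$ is bounded: since each $X_i$ lies in $[0, 1)$, $Y$ can never leave $[-6, 6]$, whereas a true normal variable has unbounded support.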
I'm also trying to work out why this would be inefficient, since these seem to be pretty simple computations. In Python:
import random
import math
import numpy as np

def get_y_bar(num_samples=50):
    # Average num_samples draws of Y, then rescale by sqrt(num_samples).
    y_bar = 0
    for i in range(num_samples):
        y_bar += sample_y()
    y_bar /= num_samples
    y_bar *= math.sqrt(num_samples)
    return y_bar

def sample_y():
    # Sum of 12 U(0, 1) draws, shifted by 6 so that Y has mean 0 and variance 1.
    y = 0
    for i in range(12):
        y += random.random()
    y -= 6
    return y

def sample_normal(N=100000):
    # Draw N values of sqrt(n) * Y_bar and report their sample mean and std.
    sample_list = []
    for i in range(N):
        sample_list.append(get_y_bar())
    sample_list = np.array(sample_list)
    sample_mean = np.mean(sample_list)
    sample_std = np.std(sample_list)
    print(f"The sample mean is {sample_mean}, and the sample std is {sample_std}")

sample_normal()
With output: The sample mean is $-0.0015446492547001867$, and the sample std is $0.9989513839711084$. Both are within about $0.002$ of the theoretical values $0$ and $1$.
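For comparison, here is a vectorized NumPy sketch of the same experiment. The function name `sample_scaled_means` and the `chunk` and `seed` parameters are my own choices for illustration, not anything from the lecture. It relies on the fact that the sum of $n$ copies of $Y$ is a sum of $12n$ uniforms minus $6n$, and it works in chunks so that the full array of uniforms never has to sit in memory at once.

```python
import numpy as np

def sample_scaled_means(num_draws=100_000, num_samples=50, chunk=2_000, seed=0):
    """Return num_draws values of sqrt(n) * Y_bar, with n = num_samples."""
    rng = np.random.default_rng(seed)
    per_draw = 12 * num_samples  # uniforms consumed by one output value
    out = np.empty(num_draws)
    for start in range(0, num_draws, chunk):
        stop = min(start + chunk, num_draws)
        u = rng.random((stop - start, per_draw))
        # Sum of n copies of Y equals (sum of 12n uniforms) - 6n.
        total = u.sum(axis=1) - 6.0 * num_samples
        # sqrt(n) * Y_bar = total / sqrt(n).
        out[start:stop] = total / np.sqrt(num_samples)
    return out

values = sample_scaled_means()
print(f"mean={values.mean():.4f}, std={values.std():.4f}")
```

With the defaults, every output value consumes $12 \cdot 50 = 600$ uniform draws. My guess is that this cost per sample is at least part of the inefficiency, though I am not sure that is what was meant.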