Verifying a simple method to use $U(0,1)$ random generators with the CLT to sample from $N(0, 1)$

Question

I am trying to understand an approach that was discussed in my pattern recognition class today of using uniform random variables, which can be sampled using some random generator, to obtain samples that are approximately $N(0,1)$. Unfortunately there were no notes given so I wish to reproduce what my professor said from my understanding.

Knowing that for a random variable $X \sim U(0, 1)$ we have $\mu_{X} = 0.5$ and $\sigma_{X}^2 = \frac{1}{12}$, consider the random variable $$Y = \left(\sum_{i=1}^{12} X_i\right) - 6,$$ where the $X_i$ are i.i.d. $U(0,1)$. Then $Y$ has $\mu_{Y}= 0, \sigma_{Y}^2 = 1$.
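
As a quick check of those two moments, assuming the $X_i$ are independent and the sum has exactly $12$ terms, linearity of expectation and additivity of variance for independent variables give:

$$\mathbb{E}[Y] = \sum_{i=1}^{12}\mathbb{E}[X_i] - 6 = 12\cdot\tfrac{1}{2} - 6 = 0, \qquad Var[Y] = \sum_{i=1}^{12} Var[X_i] = 12\cdot\tfrac{1}{12} = 1.$$

The constant $-6$ shifts the mean but does not affect the variance.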

Now I wish to use the Central Limit Theorem (CLT). According to the Lindeberg-Levy version of the CLT, if $\{X_1, \ldots, X_n \}$ is a sequence of i.i.d. random variables with $\mathbb{E}[X_i] = \mu$ and $Var[X_i] = \sigma^2 < \infty$, then as $n$ approaches infinity, the random variable $\sqrt{n}(\bar{X} - \mu)$ converges in distribution to $N(0, \sigma^2)$.

Then, as far as I understand it, by taking $n$ independent samples of $Y$ we would have, for sufficiently large $n$, that $\sqrt{n} \, \bar{Y}$ is approximately $N(0, 1)$.

Is this algorithm correct?

I also see in the Wikipedia entry here (the computational methods section) that they do something similar: "Generate $12$ uniform $U(0,1)$ deviates, add them all up, and subtract $6$ – the resulting random variable will have approximately standard normal distribution". However, they do not further average these values and multiply by $\sqrt{n}$, so how does the CLT apply here?

I'm also trying to think about why this would be inefficient, since these seem to be pretty simple computations. In Python:

import math
import random

import numpy as np


def sample_y():
    # Sum of 12 U(0,1) draws minus 6, so that Y has mean 0 and variance 1.
    y = 0
    for _ in range(12):
        y += random.random()
    y -= 6
    return y


def get_y_bar(num_samples=50):
    # sqrt(n) times the mean of n independent copies of Y.
    y_bar = 0
    for _ in range(num_samples):
        y_bar += sample_y()
    y_bar /= num_samples
    y_bar *= math.sqrt(num_samples)
    return y_bar


def sample_normal(N=100000):
    # Draw N values of sqrt(n) * Y-bar and report their sample mean and std.
    sample_list = np.array([get_y_bar() for _ in range(N)])
    sample_mean = np.mean(sample_list)
    sample_std = np.std(sample_list)
    print(f"The sample mean is {sample_mean}, and the sample std is {sample_std}")


sample_normal()

With output: The sample mean is $-0.0015446492547001867$, and the sample std is $0.9989513839711084$, which agrees with the theoretical values of $0$ and $1$ to about $2$ decimal places.
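
A vectorized variant of this check may make the comparison with the Wikipedia recipe easier to see. The sketch below is only illustrative: the function names `sum12`, `scaled_mean` and `excess_kurtosis` are made up here, the seed and sample sizes are arbitrary, and it assumes NumPy's `default_rng` generator. It draws both the plain sum of $12$ uniforms minus $6$ and the averaged version $\sqrt{n}\,\bar{Y}$, and reports mean, standard deviation and excess kurtosis (which is $0$ for an exact normal).

```python
import numpy as np


def sum12(rng, size):
    """Sum of 12 U(0,1) draws minus 6, one value per row."""
    return rng.random((size, 12)).sum(axis=1) - 6.0


def scaled_mean(rng, size, n=50):
    """sqrt(n) times the mean of n independent sum-of-12 values."""
    y = rng.random((size, n, 12)).sum(axis=2) - 6.0
    return np.sqrt(n) * y.mean(axis=1)


def excess_kurtosis(x):
    """Sample excess kurtosis: fourth standardized moment minus 3."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0


rng = np.random.default_rng(0)  # arbitrary seed, chosen only for repeatability
a = sum12(rng, 200_000)         # the Wikipedia recipe
b = scaled_mean(rng, 10_000)    # the averaged version from the question

for name, s in (("sum of 12 - 6", a), ("sqrt(n) * Y-bar", b)):
    print(f"{name}: mean={s.mean():.4f}, std={s.std():.4f}, "
          f"excess kurtosis={excess_kurtosis(s):.3f}")
```

Both versions have mean $0$ and variance $1$ in theory. The shape differs slightly: the sum of $12$ uniforms has excess kurtosis $-1.2/12 = -0.1$ and can never leave $[-6, 6]$, while averaging $n = 50$ such sums shrinks the excess kurtosis to $-0.1/50 = -0.002$ at the cost of $600$ uniform draws per output value.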

Yes, it works because the $Y_i$ are independent if the $X_i$ are independent, and so you can use the CLT on them – Tortar, Jan 27 '22 at 01:27
It is an approximation (it has the correct mean and variance, but the distribution does not quite have the right shape). But it uses many uniform random variables to generate one approximately normal random variable, so it may be less efficient than other methods, for example those which generate two normally distributed random variables from two uniform random variables – Henry, Jan 27 '22 at 01:38
@Henry ok I see, Box Muller only uses two, wow! I think my main point of confusion now is what the difference is between my approach and the approach mentioned in Wikipedia: how are they using the CLT? – IntegrateThis, Jan 27 '22 at 01:40
They are not using the CLT, which is why they are not approximations – Henry, Jan 27 '22 at 01:43

Golden_Ratio · Accepted Answer · 2022-01-27T02:00:34.987

The classical CLT says that if $W_i$ are iid, each with mean $\mu$ and finite variance $\sigma^2>0$, then

$$ \frac{(\frac{1}{N}\sum_{i=1}^N W_i)-\mu}{\sigma/\sqrt N}\overset{d}{\rightarrow }N(0,1).$$

The wiki article you link to says that if $X_i\overset{\text{iid}}{\sim} U(0,1),$ then

$$Y:=-6+\sum_{i=1}^{12}X_i$$ is approximately standard normal, which follows from the CLT since you can write

$$Y=\frac{(\frac{1}{N}\sum_{i=1}^N X_i)-\mu}{\sigma/\sqrt N},$$

where $\mu=1/2$, $\sigma^2=1/12$, and $N=12$. Note that this is not exactly the same as $\sqrt n \bar Y$, which is what you wrote, although $\sqrt n \bar Y$ should also be approximately standard normal by another application of the CLT after generating iid $Y_j$, $j=1,\ldots,n$.
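
To spell out the algebra behind that identity: with $N=12$ and $\sigma = 1/\sqrt{12}$, the denominator is $\sigma/\sqrt N = 1/12$, so

$$\frac{\left(\frac{1}{12}\sum_{i=1}^{12} X_i\right)-\frac{1}{2}}{1/12} = 12\left(\frac{1}{12}\sum_{i=1}^{12} X_i - \frac{1}{2}\right) = \sum_{i=1}^{12} X_i - 6 = Y.$$

The same bookkeeping applies to $\sqrt n \bar Y$. Writing each $Y_j$ as a sum of $12$ fresh uniforms, and relabelling the $12n$ uniforms involved as $X_1,\ldots,X_{12n}$,

$$\sqrt n\, \bar Y = \frac{1}{\sqrt n}\sum_{j=1}^{n} Y_j = \frac{\left(\sum_{i=1}^{12n} X_i\right) - 6n}{\sqrt n},$$

which is the standardized sum in the CLT with $N = 12n$ uniforms (mean $6n$, variance $12n\cdot\frac{1}{12} = n$). So both recipes are the same CLT approximation, just with a different number of uniform draws per output.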

However, you can obtain an exact standard normal using the Box-Muller transform, which is also mentioned in the Wikipedia article you linked.
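
For comparison, here is a minimal sketch of the basic (trigonometric) form of the Box-Muller transform using only the standard library. The function name `box_muller`, the seed and the sample size are illustrative choices rather than anything from the linked article, and "exact" here is meant up to floating-point arithmetic and the quality of the underlying uniform generator.

```python
import math
import random


def box_muller(rng=random):
    """Return two independent N(0,1) draws built from two U(0,1) draws."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)


random.seed(0)  # arbitrary seed, chosen only for repeatability
samples = [z for _ in range(50_000) for z in box_muller()]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((z - mean) ** 2 for z in samples) / len(samples))
print(f"mean={mean:.4f}, std={std:.4f}")
```

Each call consumes two uniforms and returns two normal values, compared with $12$ uniforms per approximate value for the sum-of-$12$ recipe.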

edited Jan 27 '22 at 02:00

answered Jan 27 '22 at 01:52

Thanks for the answer, very clear. Small other question, what does the $d$ above the convergence arrow mean? Pointwise convergence? – IntegrateThis Jan 27 '22 at 02:10
@IntegrateThis It means convergence in distribution, which is how CLT is formally stated: https://en.wikipedia.org/wiki/Convergence_of_random_variables#Convergence_in_distribution – Golden_Ratio Jan 27 '22 at 02:11
