
I am trying to understand an approach that was discussed in my pattern recognition class today of using uniform random variables, which can be sampled using some random generator, to obtain samples that are approximately $N(0,1)$. Unfortunately there were no notes given so I wish to reproduce what my professor said from my understanding.

Knowing that for a random variable $X \sim U(0, 1)$ we have $\mu_{X} = 0.5, \sigma_{X}^2 = \frac{1}{12}$, consider the random variable $$Y = \left(\sum_{i=1}^{12} X_i\right) - 6 $$ where the $X_i$ are i.i.d. $U(0,1)$. Then $Y$ has $\mu_{Y}= 0, \sigma_{Y}^2 = 1$.
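Spelling out the moment calculation, assuming the twelve $X_i$ are independent (which is what lets the variances add):

$$\mathbb{E}[Y] = \sum_{i=1}^{12}\mathbb{E}[X_i] - 6 = 12\cdot\tfrac{1}{2} - 6 = 0, \qquad \operatorname{Var}(Y) = \sum_{i=1}^{12}\operatorname{Var}(X_i) = 12\cdot\tfrac{1}{12} = 1.$$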

Now I wish to use the Central Limit Theorem (CLT). According to the Lindeberg-Levy version of the CLT, if $\{X_1, \ldots, X_n \}$ is a sequence of i.i.d. random variables with $\mathbb{E}[X_i] = \mu$ and $Var[X_i] = \sigma^2 < \infty$, then as $n$ approaches infinity, the random variable $\sqrt{n}(\bar{X} - \mu)$ converges in distribution to $N(0, \sigma^2)$.

Then, as far as I understand it, by taking $n$ independent samples $Y_1, \ldots, Y_n$ of $Y$ we would have, for sufficiently large $n$, that $\sqrt{n}\,\bar{Y}$ is approximately $N(0, 1)$ distributed.

Is this algorithm correct?

I also see in the Wikipedia entry here (the computational methods section) that they do something similar: "Generate $12$ uniform $U(0,1)$ deviates, add them all up, and subtract $6$ – the resulting random variable will have approximately standard normal distribution". But they do not go on to average these values and multiply by $\sqrt{n}$, so how does the CLT apply there?

I'm also trying to think about why this would be inefficient, since these seem to be pretty simple computations. In Python:

    import random
    import math
    import numpy as np

    def get_y_bar(num_samples=50):
        y_bar = 0
        for i in range(num_samples):
            y_bar += sample_y()
        y_bar /= num_samples
        y_bar *= math.sqrt(num_samples)
        return y_bar

    def sample_y():
        y = 0
        for i in range(12):
            y += random.random()
        y -= 6
        return y

    def sample_normal(N=100000):
        sample_list = []
        for i in range(N):
            sample_list.append(get_y_bar())
        sample_list = np.array(sample_list)
        sample_mean = np.mean(sample_list)
        sample_std = np.std(sample_list)
        print(f"The sample mean is {sample_mean}, and the sample std is {sample_std}")

    sample_normal()

With output: The sample mean is $-0.0015446492547001867$, and the sample std is $0.9989513839711084$, which is pretty close: both are within about $0.002$ of the theoretical values $0$ and $1$.

1 Answer


Classical CLT says if $W_i$ are iid, each with mean $\mu$ and variance $\sigma^2>0$, then

$$ \frac{(\frac{1}{N}\sum_{i=1}^N W_i)-\mu}{\sigma/\sqrt N}\overset{d}{\rightarrow }N(0,1).$$

The wiki article you link to says that if $X_i\overset{\text{iid}}{\sim} U(0,1),$ then

$$Y:=-6+\sum_{i=1}^{12}X_i$$ is approximately standard normal, which follows by CLT since you can write

$$Y=\frac{(\frac{1}{N}\sum_{i=1}^N X_i)-\mu}{\sigma/\sqrt N},$$

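where $\mu=1/2$, $\sigma^2=1/12$ and $N=12$. Indeed, with these values $\sigma/\sqrt N = 1/12$, so the right-hand side equals $12\left(\frac{1}{12}\sum_{i=1}^{12} X_i - \frac{1}{2}\right) = \sum_{i=1}^{12} X_i - 6$. Note this is not exactly the same as $\sqrt n \bar Y$, which is what you wrote, although $\sqrt n \bar Y$ should also be approximately standard normal by another application of the CLT after generating iid $Y_j$, $j=1,\ldots,n$.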
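To get a rough feel for how close the two approximations are, here is a minimal simulation sketch comparing $Y$ with $\sqrt n \bar Y$ for $n=50$. It assumes NumPy's `default_rng` generator; the helper names, seed, and sample sizes are illustrative choices, not anything from the question. It uses the sample excess kurtosis as a crude measure of non-normality: this is $0$ for a normal distribution, $-6/5$ for $U(0,1)$, and for a sum of $m$ iid terms it shrinks by a factor $1/m$, so the theoretical values are $-0.1$ for $Y$ (12 uniforms) and $-0.002$ for $\sqrt{50}\,\bar Y$ (600 uniforms).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

def excess_kurtosis(x):
    """Sample excess kurtosis: fourth standardized moment minus 3."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

def wikipedia_y(size):
    """Y = (sum of 12 uniforms) - 6, one value per row."""
    return rng.random((size, 12)).sum(axis=1) - 6.0

def scaled_mean_of_y(size, n, chunk=10_000):
    """sqrt(n) * Y_bar, using sqrt(n) * Y_bar = (sum of 12*n uniforms - 6*n) / sqrt(n)."""
    out = np.empty(size)
    for start in range(0, size, chunk):  # chunked to keep memory use modest
        stop = min(start + chunk, size)
        u = rng.random((stop - start, 12 * n))
        out[start:stop] = (u.sum(axis=1) - 6.0 * n) / np.sqrt(n)
    return out

y = wikipedia_y(200_000)
s = scaled_mean_of_y(100_000, n=50)
k_y = excess_kurtosis(y)
k_s = excess_kurtosis(s)
print(f"excess kurtosis of Y:            {k_y:+.3f}")
print(f"excess kurtosis of sqrt(n)*Ybar: {k_s:+.3f}")
print(f"largest |Y| observed:            {np.abs(y).max():.3f}")
```

Both variables have mean $0$ and variance $1$ exactly; the difference is in the tails. In particular $Y$ can never leave $[-6, 6]$, whereas a true standard normal has unbounded support.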

However, you can obtain an exact standard normal using the Box-Muller transform, which is also mentioned in the wiki link you provide.
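For reference, a minimal sketch of the basic (trigonometric) form of the Box-Muller transform, using only Python's standard library. The function name `box_muller` and the seed are illustrative assumptions. "Exact" here means exact given ideal $U(0,1)$ inputs and exact arithmetic; in practice it is limited by the uniform generator and floating point.

```python
import math
import random

def box_muller(rng=random):
    """Map two independent U(0, 1) draws to two independent N(0, 1) draws."""
    # random() returns values in [0, 1), so 1 - random() lies in (0, 1]
    # and log() is never evaluated at zero.
    u1 = 1.0 - rng.random()
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))   # radius
    theta = 2.0 * math.pi * u2           # angle, uniform on [0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)

random.seed(0)  # arbitrary seed, only for reproducibility
samples = [z for _ in range(50_000) for z in box_muller()]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((z - mean) ** 2 for z in samples) / len(samples))
print(f"mean = {mean:.4f}, std = {std:.4f}")
```

Each call consumes two uniform draws and returns two normal samples, compared with twelve uniforms per sample for the sum-of-uniforms recipe.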

Golden_Ratio
  • Thanks for the answer, very clear. Small other question, what does the $d$ above the convergence arrow mean? Pointwise convergence? – IntegrateThis Jan 27 '22 at 02:10
  • @IntegrateThis It means convergence in distribution, which is how CLT is formally stated: https://en.wikipedia.org/wiki/Convergence_of_random_variables#Convergence_in_distribution – Golden_Ratio Jan 27 '22 at 02:11