This is a question I have been thinking of:

Suppose I have a Normal Distribution with a specific mean (e.g. "a") and standard deviation (e.g. "b"). If I draw "n" random numbers from this distribution and take the mean of these "n" numbers, how close will this mean be to "a", on average?

For example, using the R programming language, I tried to run this simulation:

set.seed(123)
n = 100; a = 5; b = 5  # sample size, true mean, true standard deviation
results = list()

for (i in 1:1000) {
  sample_i = rnorm(n, a, b)               # n draws from Normal(a, b)
  mean_i = mean(sample_i)                 # sample mean
  difference_i = abs(a - mean_i)          # absolute error of the sample mean
  results[[i]] = data.frame(i, difference_i)
}

final = do.call(rbind.data.frame, results)
plot(density(final$difference_i), main = "Spread of Errors : n = 100, a = 5, b = 5")

[Figure: density plot titled "Spread of Errors : n = 100, a = 5, b = 5"]

I can now show this for n = 1000:

n = 1000; a = 5; b = 5  # sample size, true mean, true standard deviation
results = list()

for (i in 1:1000) {
  sample_i = rnorm(n, a, b)               # n draws from Normal(a, b)
  mean_i = mean(sample_i)                 # sample mean
  difference_i = abs(a - mean_i)          # absolute error of the sample mean
  results[[i]] = data.frame(i, difference_i)
}

final = do.call(rbind.data.frame, results)
plot(density(final$difference_i), main = "Spread of Errors : n = 1000, a = 5, b = 5")

[Figure: density plot titled "Spread of Errors : n = 1000, a = 5, b = 5"]

My Question: In general, given a specific probability distribution, is there a mathematical formula that shows how far, on average, the mean of a sample of size "n" will deviate from the true mean of that distribution?

Thanks!

EDIT - NOTE:

As a concrete example :

  • Consider 1000 random draws from a Normal Distribution with Mean=a and Standard_Deviation = b : On average, what will be the expected difference between the mean of these 1000 random draws and the true mean (i.e. "a")?

  • Consider 1000 random draws from a Poisson Distribution with the Rate_Parameter = "lambda" : On average, what will be the expected difference between the Rate Parameter calculated from these 1000 random draws and the true Rate Parameter (i.e. "lambda")?

  • In general, for "n" random draws from some general probability distribution - how will the mean calculated from these "n" random draws differ from the true mean of this distribution (on average)? Is there a mathematical formula that can be used to describe this relationship? (e.g. via Central Limit Theorem)

stats_noob
  • Mathematically, you are asking about the distribution of $|\bar{X}_n - a|$. This is the absolute value of a Normal$(0, \sigma = b/\sqrt{n})$ variable. https://en.m.wikipedia.org/wiki/Folded_normal_distribution – Mason Nov 23 '22 at 04:01
  • @ Mason : thank you for your answer! This is the first time I have heard of a "Folded Normal Distribution"! I will start reading the link you posted. Would you like to elaborate on this a bit more in an answer (e.g. how can the properties of the Folded Normal Distribution be used to determine the relationship between the sample size and the expected deviation from the true mean? and is this true for all distributions?) Thank you so much! – stats_noob Nov 23 '22 at 05:11
  • Related: https://math.stackexchange.com/questions/1850653/mean-absolute-deviation-of-normal-distribution – user51547 Nov 23 '22 at 05:48
  • user51547 your related post does not account for the distribution of a sample mean. so unfortunately it is not applicable – RyRy the Fly Guy Nov 23 '22 at 06:14
  • As Mason pointed out, when the samples are drawn i.i.d. from Normal(a, $\sigma=b$), the sample mean is Normal(a, $\sigma=b/\sqrt{n}$), so it is directly applicable. – user51547 Nov 23 '22 at 07:07
  • user51547 downvoting someone who challenges you is kind of petty, don't you think? – RyRy the Fly Guy Nov 23 '22 at 18:54
  • If you find the answer below satisfactory, then please close your inquiry by clicking the green check mark. Thank you! – RyRy the Fly Guy Nov 26 '22 at 00:01

1 Answer

Given a random variable $X$ with expectation $\mu$ and standard deviation $\sigma$, the typical deviation of the sample mean $\bar X = \frac{1}{N} \sum_{i=1}^N X_i$ from the "true" population mean $E[\bar X] = E[X] = \mu$ is measured by the standard deviation of the sample mean (its root-mean-square deviation from $\mu$), which for independent draws is

$$\sigma_{\bar X} = \sqrt{Var[\bar X]} = \sqrt{E[(\bar X - \mu)^2]} = \frac{\sigma}{\sqrt N}$$

This is because the standard deviation of the sample mean is the square root of a probability-weighted average of the squared deviations between the sample mean and the true mean. The above equation holds regardless of the functional form of the probability distribution of $X$, provided the draws are independent and $\sigma$ is finite. It does not rely on the central limit theorem; what the central limit theorem adds is that the sample mean $\bar X$ is approximately normally distributed, and increasingly so for large $N$.
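For readers who want the intermediate steps, the $\sigma/\sqrt N$ formula can be derived in a few lines. This is a sketch under the assumption that the draws $X_1, \dots, X_N$ are independent and share the same finite variance $\sigma^2$:

$$Var[\bar X] = Var\left[\frac{1}{N}\sum_{i=1}^N X_i\right] = \frac{1}{N^2}\sum_{i=1}^N Var[X_i] = \frac{N\sigma^2}{N^2} = \frac{\sigma^2}{N}$$

Taking the square root gives $\sigma_{\bar X} = \sigma/\sqrt N$. The second equality is where independence is used, since it lets the variance of the sum split into a sum of variances.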

Answer to Edit

As a concrete example :

  • Consider 1000 random draws from a Normal Distribution with Mean=a and Standard_Deviation = b : On average, what will be the expected difference between the mean of these 1000 random draws and the true mean (i.e. "a")?

$$ \frac{b}{\sqrt{1000}} = \frac{b}{10 \sqrt{10}}$$
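A quick simulation can check this value. The sketch below is in R, like the question's code, and assumes a = 5, b = 5 (the question's values) and 10000 replications (an arbitrary choice). It compares two readings of "average difference": the root-mean-square deviation, which is the $b/\sqrt{n}$ above, and the mean absolute deviation, which for a normally distributed sample mean is $b\sqrt{2/(\pi n)}$ (the mean of the folded normal distribution mentioned in the comments).

```r
set.seed(123)
n <- 1000; a <- 5; b <- 5   # sample size, true mean, true sd (values from the question)
reps <- 10000                # number of simulated samples (arbitrary choice)

# One sample mean per replication, then its signed error around the true mean
sample_means <- replicate(reps, mean(rnorm(n, mean = a, sd = b)))
errors <- sample_means - a

rms_error <- sqrt(mean(errors^2))   # root-mean-square deviation
mad_error <- mean(abs(errors))      # mean absolute deviation

# Simulated values next to the corresponding formulas
print(c(rms_simulated = rms_error, rms_formula = b / sqrt(n)))
print(c(mad_simulated = mad_error, mad_formula = b * sqrt(2 / (pi * n))))
```

The two measures differ only by the constant factor $\sqrt{2/\pi} \approx 0.8$, so both shrink at the same $1/\sqrt{n}$ rate.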

  • Consider 1000 random draws from a Poisson Distribution with the Rate_Parameter = "lambda" : On average, what will be the expected difference between the Rate Parameter calculated from these 1000 random draws and the true Rate Parameter (i.e. "lambda")?

Note that the rate parameter $\lambda$ is the expectation of the Poisson distribution and hence the "true" population mean, so this question is no different from the first one above. To solve it, recall that the mean and variance of the Poisson distribution are equal. In other words, $E[X] = a = Var[X] = b^2 = \lambda$, which implies that the standard deviation of the Poisson distribution is $b = \sqrt \lambda$. The formula $\sigma/\sqrt N$ holds whatever the underlying distribution (and, for large samples, the Central Limit Theorem tells us the sample mean is approximately normally distributed even though the underlying distribution is Poisson), so the standard deviation of the sample mean is given by

$$ \frac{\sqrt \lambda}{\sqrt{1000}} = \frac{\sqrt \lambda}{10\sqrt{10}} $$

which is no different from above. We are still dividing the standard deviation by the square root of the sample size.
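The Poisson case can be checked the same way. This is a minimal sketch in R; the value lambda = 4 and the 10000 replications are illustrative assumptions, not values from the question. The sample mean is used as the estimate of the rate parameter.

```r
set.seed(123)
n <- 1000; lambda <- 4   # lambda = 4 is an illustrative value
reps <- 10000            # number of simulated samples (arbitrary choice)

# Estimate the rate parameter by the sample mean, once per replication
lambda_hat <- replicate(reps, mean(rpois(n, lambda)))

sd_simulated <- sd(lambda_hat)       # spread of the estimates across replications
sd_formula <- sqrt(lambda / n)       # sqrt(lambda) / sqrt(n)

print(c(simulated = sd_simulated, formula = sd_formula))
```

With these settings the simulated spread should land close to $\sqrt{\lambda}/\sqrt{n}$, up to simulation noise.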

  • In general, for "n" random draws from some general probability distribution - how will the mean calculated from these "n" random draws differ from the true mean of this distribution (on average)? Is there a mathematical formula that can be used to describe this relationship? (e.g. via Central Limit Theorem)

This is the equation given above, with $b$ the standard deviation of the distribution and $n$ the sample size:

$$ \frac{b}{\sqrt{n}}$$

RyRy the Fly Guy
  • @ RyRy: thank you so much for your answer! I had also considered the Central Limit Theorem and how this might be applicable to the question I am asking. As an example - consider 1000 random draws from a normal distribution with mean=a and standard_deviation = b : on average, what will be the expected difference between the mean of these 1000 random draws and the true mean (i.e. "a")? Thank you so much! – stats_noob Nov 23 '22 at 05:14
  • no problem! i'm glad i could help. if you are satisfied with the answer, then please close the question by clicking the green check mark. – RyRy the Fly Guy Nov 23 '22 at 05:17
  • @ RyRy: thank you for your reply! I made a small update to my question - could the analysis that you provided in your answer be used to address these updates? thank you so much! – stats_noob Nov 23 '22 at 05:18
  • yes, the analysis i provided is directly applicable. i updated my response to explain this. – RyRy the Fly Guy Nov 23 '22 at 05:41