I am a bit confused by the different definitions of the standard error that I find in statistics. For simplicity I will only refer to the standard error of the mean. In one place it is defined as the standard deviation of sample means, and in other places as $SE = \frac{\sigma}{n}$ where $n$ is the sample size. This already creates some ambiguity for me: what is $n$ in that formula? If we have multiple samples with different sample sizes, which size do we use?

I understand the algebra behind the derivation of the formula, but I am confused about what would happen in practice if there are multiple samples (maybe even of varying sizes).

As an example, let's say we have a population of $\{1,2,3,4,5,6,7\}$ where the true mean is $\mu = 4$ and the true standard deviation is $\sigma = 2$. Now, let's say we take three samples as follows:

$A = \{1,2,3,4\}, B = \{1,5,6,7\}, C = \{2,4,6,7\}$. What would the calculation of the SE look like? According to the formula above it would just be $SE = \frac{2}{4}$, but according to the sources that say it is the standard deviation of the sample means, we would have to take the mean of the means of these samples and find the standard deviation of the means around it. This gives a different result, and it doesn't even begin to describe what would happen if, for example, sample $C$ had a different sample size.

The only way I can reconcile the above in my mind is that the SE is defined only for one sample of size $n$ and not for multiple samples, as some sources claim. Or perhaps the definitions do coincide, but only when all possible samples of an identical size have been taken. Any help in clearing up my rather elementary confusion is appreciated.
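The "all possible samples" idea can be checked by brute force on the small population above. The following is only a minimal sketch under stated assumptions, not something taken from the sources quoted: the names (`sd_of_means`, `fpc`, and so on) are made up for illustration, the formula used is $\frac{\sigma}{\sqrt{n}}$, and the standard deviation of the means divides by the number of samples. It enumerates every sample of size $4$ both with and without replacement; in the latter case the comparison uses the standard finite population correction $\sqrt{\frac{N-n}{N-1}}$.

```python
from itertools import combinations, product
from math import sqrt
from statistics import pstdev

population = [1, 2, 3, 4, 5, 6, 7]
N = len(population)
n = 4  # common sample size

sigma = pstdev(population)      # population standard deviation
formula_se = sigma / sqrt(n)    # sigma / sqrt(n)

def sd_of_means(samples):
    """Population-style SD (divide by the count) of the sample means."""
    return pstdev([sum(s) / len(s) for s in samples])

# All 7**4 ordered samples drawn with replacement.
sd_with_replacement = sd_of_means(product(population, repeat=n))

# All C(7, 4) = 35 samples drawn without replacement.
sd_without_replacement = sd_of_means(combinations(population, n))
fpc = sqrt((N - n) / (N - 1))   # finite population correction

# Only the three samples A, B, C from the question.
sd_three_samples = sd_of_means([(1, 2, 3, 4), (1, 5, 6, 7), (2, 4, 6, 7)])

print(formula_se, sd_with_replacement)
print(formula_se * fpc, sd_without_replacement)
print(sd_three_samples)
```

Under these assumptions the first two printed pairs should agree up to floating-point rounding, while the value from only three samples is just a rough estimate of the same quantity.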

Esoog
  • "... but according to those sources ..." Which sources? But you are right that the definition of SE usually refers to one sample with a size of $n$. – callculus42 Jun 27 '22 at 15:34
  • You are more likely to see the standard error of the mean as $SE = \frac{\sigma}{\sqrt{n}}$ or $SE = \frac{s}{\sqrt{n}}$ (note the $\sqrt{n}$) based on a single sample sized $n$. The first is the standard deviation of the distribution of the sample mean if you know the population standard deviation. The second is an estimator of that using the sample standard deviation from your actual sample. – Henry Jun 27 '22 at 15:34
  • @callculus42 Here is a quote from page 43 of the book "Discovering Statistics Using R": "The standard deviation of sample means is known as the standard error of the mean. Therefore, the standard error could be calculated by taking the difference between each sample mean and the overall mean, squaring these differences, adding them up and then dividing by the number of samples. Finally the square root of this value would need to be taken to get the standard deviation of sample means, the standard error." And I have read it explained this way elsewhere too. – Esoog Jun 27 '22 at 15:46
  • @Esoog You might do that in a simulation to illustrate that this is the standard deviation of the distribution of the sample mean. But in reality, instead of taking $k$ samples of size $n$, you would combine them into a single sample of size $kn$, and so get a smaller standard error for the single sample mean, namely $\frac{s}{\sqrt{kn}}$ – Henry Jun 27 '22 at 16:08
  • @Henry But this is the part that confuses me. If you don't combine the samples into one big sample as you suggest and instead keep them separate, aren't the definitions conflicting, since the numerical value of the SE differs between them? Not to mention that the formula involving a single sample size is not well defined if the samples have different sizes. – Esoog Jun 27 '22 at 16:15
  • @Esoog If you have a population normally distributed with mean $2022$ and variance $9$ so standard deviation $3$, and take a sample sized $100$, then the distribution of the mean of that sample has an expectation of $2022$ and standard deviation of $0.3$. A single sample sized $100$ would have a mean close to $2022$ and a standard deviation close to $3$ (though not exactly, since this is a random sample) and you have $\frac{3}{\sqrt{100}}=0.3$. If you took a large number of samples each sized $100$ then their means would also have an average of about $2022$ and standard deviation about $0.3$ – Henry Jun 27 '22 at 16:26
  • If you had samples of different sizes then, since the standard error of the mean is proportional to $\frac{1}{\sqrt{n}}$, they would have different standard errors of the mean. – Henry Jun 27 '22 at 16:28
  • @Henry Thank you. I think my main concern is reconciling the two approaches, which give different numerical results: people view it both as the standard deviation of the means of multiple samples and as the quotient of the standard deviation of a single sample divided by the sample size. – Esoog Jun 27 '22 at 16:56
  • To repeat, it is the standard deviation divided by the square root of the sample size. And the numbers are not substantially different (except for random sampling fluctuations) – Henry Jun 27 '22 at 17:00
  • @Henry Indeed the square root, I misspoke. As for the numbers being not substantially different, this may give intuition but fails as a rigorous definition in my mind. Either something is identically the same or it isn't. I think I should keep the single sample perspective as a definition and the other as more intuition as to what it relates to rather than a strict definition. – Esoog Jun 27 '22 at 17:03
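
The normal example from the comments (mean $2022$, standard deviation $3$) can be tried out in a small simulation. This is only a sketch under assumptions that are not in the thread: the seed, the number of repeated samples `k`, the extra sample size $25$, and all function names are arbitrary illustrative choices. For each sample size it compares $\frac{\sigma}{\sqrt{n}}$, the estimate $\frac{s}{\sqrt{n}}$ from one sample, and the standard deviation of the means of many samples.

```python
import random
from math import sqrt
from statistics import stdev

MU, SIGMA = 2022, 3
rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)

def draw_sample(n):
    """One random sample of size n from Normal(MU, SIGMA)."""
    return [rng.gauss(MU, SIGMA) for _ in range(n)]

def se_from_one_sample(n):
    """Estimate s / sqrt(n) from a single sample."""
    return stdev(draw_sample(n)) / sqrt(n)

def sd_of_many_means(n, k=2000):
    """SD of the means of k independent samples, each of size n."""
    return stdev([sum(draw_sample(n)) / n for _ in range(k)])

results = {
    n: (SIGMA / sqrt(n), se_from_one_sample(n), sd_of_many_means(n))
    for n in (25, 100)
}
for n, (theory, one_sample, many_means) in results.items():
    print(n, theory, one_sample, many_means)
```

The three numbers in each row should be close to one another but not identical, and they change with $n$, which is the sense in which samples of different sizes have different standard errors.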
