
I'm interested in the relation between the variance of the mean of a set of real numbers, $A$, and the sum of the variances of the means of the subsets in any arbitrary partition of $A$ into a smaller number of sets.

Let's call the variance of the mean of the original set

$$ V_A = \frac{ \frac{1}{n} \sum^n_{i=1} (A_i - \bar{A})^2 }{n} $$

and the weighted sum of the variances of the means of some arbitrary partitioning of the set

$$ V_P = \left( \frac{n}{n_x}\right)^2 \frac{ \frac{1}{n_x} \sum^{n_x}_{i=1} (A_i - \bar{A_x})^2 } {n_x} + \left( \frac{n}{n_y}\right)^2 \frac{ \frac{1}{n_y} \sum^{n_y}_{i=1} (A_i - \bar{A_y})^2 } {n_y} + ... + \left( \frac{n}{n_k}\right)^2 \frac{ \frac{1}{n_k} \sum^{n_k}_{i=1} (A_i - \bar{A_k})^2 } {n_k} $$

where $n = n_x + n_y + ... + n_k$. (In case this notation isn't clear, I provide a short demonstration below in R.)

Obviously, $V_P$ is minimized when the partition consists of $n$ singleton subsets: every within-subset deviation is then zero, so $V_P = 0$. Thus, $V_P$ can be far smaller than $V_A$. By contrast, I haven't found many cases where $V_A < V_P$, and I'm wondering whether it's possible to bound the difference between $V_A$ and $V_P$ when $V_A < V_P$.

Here's a simple example.

    t <- 1

    A <- rep(c(0, 20), t)
    B <- rep(c(11, 9), t)
    C <- rep(c(11, 9), t)

    n <- length(A)

    # Calculate variances (not sample variances)
    var_A <- var(A) * (n / (n - 1))
    var_B <- var(B) * (n / (n - 1))
    var_C <- var(C) * (n / (n - 1))
    var_ABC <- var(c(A, B, C)) * (3 * n / (3 * n - 1))

    # Variance of the mean of the combined set
    var_ABC / (3 * n)

    # Weighted sum of the variances of the means of the three subsets
    (1 / n^3) * (var_A / n) + (1 / n^3) * (var_B / n) + (1 / n^3) * (var_C / n)

In this case, for $t = 1$, $V_A < V_P$ but for $t > 1$, $V_P < V_A$.
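The claim about $t$ can be checked by wrapping the example's arithmetic in a function and running it for several values. This is only a sketch: the function name `compare_for_t` and the argument name `reps` (standing in for `t`) are hypothetical, and the scaling factors and the `1 / n^3` weights are copied from the example as written, without checking whether they match the formulas for $V_A$ and $V_P$.

```r
# Hypothetical helper: repeats the example's calculation for a given
# repetition count (the example's `t`, renamed `reps` here).
compare_for_t <- function(reps) {
  A <- rep(c(0, 20), reps)
  B <- rep(c(11, 9), reps)
  C <- rep(c(11, 9), reps)
  n <- length(A)

  # Same scaling factors as in the example (copied, not re-derived).
  var_A <- var(A) * (n / (n - 1))
  var_B <- var(B) * (n / (n - 1))
  var_C <- var(C) * (n / (n - 1))
  var_ABC <- var(c(A, B, C)) * (3 * n / (3 * n - 1))

  # The two quantities compared in the example.
  V_A <- var_ABC / (3 * n)
  V_P <- (1 / n^3) * (var_A / n + var_B / n + var_C / n)

  c(t = reps, V_A = V_A, V_P = V_P)
}

# One row per value of t, with columns t, V_A, V_P.
results <- t(sapply(1:4, compare_for_t))
print(results)
```

With these inputs the first row comes out at roughly $V_A = 8.16$ and $V_P = 25.5$, and the ordering flips for the larger values of $t$ tried here.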

I suppose this is not a novel question, but I haven't found an answer, so I would be grateful if someone could point me in the right direction. If there is a bound, how do we establish it? If not, an example showing that no bound exists would help.
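One standard identity that may be a useful starting point is the within/between decomposition of variance. The following is a sketch, assuming variances with divisor $n_j$ (not $n_j - 1$), as in the definitions above. The symbols $\sigma^2$ and $\sigma_j^2$, and the index $j$ running over the subsets of the partition, are introduced here for illustration and are not part of the notation above.

$$ n \sigma^2 = \sum_{j} n_j \sigma_j^2 + \sum_{j} n_j (\bar{A}_j - \bar{A})^2 $$

Here $\sigma^2 = \frac{1}{n} \sum^n_{i=1} (A_i - \bar{A})^2$, and $\sigma_j^2$ is the same quantity computed within subset $j$ around its own mean $\bar{A}_j$. Dividing both sides by $n^2$ gives

$$ V_A = \frac{\sigma^2}{n} = \sum_{j} \left( \frac{n_j}{n} \right)^2 \frac{\sigma_j^2}{n_j} + \frac{1}{n^2} \sum_{j} n_j (\bar{A}_j - \bar{A})^2 $$

So if the subsets were weighted by $(n_j / n)^2$, the weighted sum could not exceed $V_A$, with equality exactly when all subset means coincide. Note that this weighting is an assumption: the displayed $V_P$ uses $(n / n_j)^2$ instead, so the inequality does not carry over to it directly.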

num_39
  • If $A$ is fixed then so is its mean (i.e. with variance $0$) – Henry May 23 '23 at 08:10
  • I don't follow this. Are you saying that for a fixed set, say $A = \{1, 7, 5, 2, 19\}$, I can't calculate $V_A$ as defined above? – num_39 May 23 '23 at 08:37
  • The mean of your $A$ is $\mu_A=\frac{1}{5} \sum\limits^5_{i=1} A_i =6.8$ and its variance is $\sigma^2_A=\frac{1}{5} \sum\limits^5_{i=1} (A_i - \mu_A)^2=41.76$: the mean of $A$ does not have a variance. You could talk about the variance of the mean of a sample from $A$ – Henry May 23 '23 at 09:04
  • Okay. I see that 41.76 / 5 may not have much meaning. In reality, I am interested in a sample from A but was trying to simplify the question by avoiding (n-1) in the denominator. – num_39 May 23 '23 at 09:13
  • If you take a random sample of size $n$ from $A$ with replacement then the sample mean has expectation $\mu_A$ and variance $\frac{\sigma^2_A}{n}$. You may get duplicates in the sample. Here $n$ does not need to be the same as the size of $A$, though it can be if you want. – Henry May 23 '23 at 09:20
  • Yes, this is clear. I'm interested in comparing this variance of the sample mean for a particular sample to the variance you get when you take that particular sample and divide it into an arbitrary number of subsets and then take the variance of the mean of each subset and add them together. – num_39 May 23 '23 at 09:27
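The statement in the comments about sampling with replacement can be checked exactly for the five-element set used there, by enumerating every possible ordered sample of size $n = 5$ instead of simulating. This is a minimal sketch; the variable names are chosen for illustration, and the variance uses divisor $n$ as in the comments.

```r
A <- c(1, 7, 5, 2, 19)
n <- length(A)

mu_A <- mean(A)
sigma2_A <- mean((A - mu_A)^2)  # divisor n, not n - 1

# All n^n equally likely ordered samples of size n drawn with replacement.
samples <- expand.grid(rep(list(A), n))
sample_means <- rowMeans(samples)

# Exact expectation and variance of the sample mean over all samples.
mean_of_means <- mean(sample_means)
var_of_means <- mean((sample_means - mean_of_means)^2)

print(c(mu_A = mu_A, mean_of_means = mean_of_means))
print(c(sigma2_over_n = sigma2_A / n, var_of_means = var_of_means))
```

Enumeration is feasible here only because $5^5 = 3125$ samples is small; for larger sets a simulation with `sample(A, n, replace = TRUE)` would be the usual substitute.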
