This continues my previous question, "Expected Value for Sum of Unfair Dice equals Expected Value for Sum of Fair Dice?"
Introduction:
Suppose there are two dice, $D_1$ and $D_2$. Both $D_1$ and $D_2$ have 6 sides (i.e. $i = 1,2,3,4,5,6$).
If $D_1 = i$, there is a $0.5$ probability that $D_2 = i$, and the remaining $0.5$ is spread evenly over the other five numbers (i.e. the second die shows the same outcome as the first die with probability $0.5$, and each other outcome with probability $0.5/5 = 0.1$).
Suppose both dice are rolled $n$ times, giving $(x_1,y_1)$, $(x_2, y_2)$, ... $(x_n,y_n)$.
The objective is to estimate the expected value of the sum of both dice, $Z = D_1 + D_2$; that is, we are interested in $E(Z)$.
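To make the setup concrete, the dependence described above can be simulated. This is only a minimal Python sketch: it assumes $D_1$ is a fair die (the distribution of $D_1$ is not stated above), and the names `roll_pair` and `simulate_rolls` are hypothetical helpers introduced for illustration.

```python
import random

def roll_pair(rng, p_same=0.5, sides=6):
    """Roll D1 (ASSUMED fair here), then roll D2 conditional on D1."""
    x = rng.randint(1, sides)
    if rng.random() < p_same:
        y = x  # D2 repeats D1 with probability p_same
    else:
        # otherwise D2 is uniform over the remaining (sides - 1) faces
        y = rng.choice([j for j in range(1, sides + 1) if j != x])
    return x, y

def simulate_rolls(n, seed=0):
    """Return n pairs (x_k, y_k) from a seeded generator."""
    rng = random.Random(seed)
    return [roll_pair(rng) for _ in range(n)]

rolls = simulate_rolls(10_000)          # what Persons 1 and 3 receive
sums = [x + y for x, y in rolls]        # what Person 2 receives
print(rolls[:5], sums[:5])
```

The list `rolls` plays the role of $(x_1,y_1), \ldots, (x_n,y_n)$ and `sums` plays the role of $z_1, \ldots, z_n$.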
My Problem: Suppose there are 3 people that are trying to estimate $E(Z)$ based on these $n$ rolls
- Person 1 is provided the individual dice rolls $(x_1,y_1)$, $(x_2, y_2)$, ... $(x_n,y_n)$ and is told that the dice rolls are independent
- Person 2 is provided only with the sums of the dice rolls $z_1, z_2, ... z_n$ and is told that the dice rolls are independent
- Person 3 is provided the individual dice rolls $(x_1,y_1)$, $(x_2, y_2)$, ... $(x_n,y_n)$ and is told that every outcome of Die 2 depends on the outcome of Die 1
Part 1: Expected Values (i.e. Mean Estimator)
- Person 1: To calculate the expected value of the dice sum for Person 1, we can use the formula $E(x+y) = E(x) + E(y)$, where $n_{1i}$ and $n_{2i}$ are the number of times the first and second die showed $i$:
$$E(Z) = \hat{Z} = \hat{E}(D_1) + \hat{E}(D_2) = \left( \sum_{i=1}^{6} i \cdot \frac{n_{1i}}{n} \right) + \left( \sum_{i=1}^{6} i \cdot \frac{n_{2i}}{n} \right)$$
- Person 2: To calculate the expected value of the dice sum for Person 2, we can directly calculate the expected value of $z$, where $n_k$ is the number of rolls whose sum was $k$: $$E(Z) = \hat{Z} = \sum_{k=2}^{12} k \cdot \frac{n_k}{n}$$
- Person 3: To calculate the expected value of the dice sum for Person 3, we assume that every outcome of Die 2 potentially depends on the outcome of Die 1, which results in a large sum of conditional probabilities (in reality, many of these will likely be $0$). Here $n_{2ij}$ is the number of rolls in which the first die showed $i$ and the second die showed $j$:
$$ E(Z) = \hat{Z} = \sum_{i=1}^{6} \sum_{j=1}^{6} (i+j) \cdot P(Y=j|X=i) \cdot P(X=i) = \sum_{i=1}^{6} \sum_{j=1}^{6} (i+j) \cdot \frac{n_{2ij}}{n_{1i}} \cdot \frac{n_{1i}}{n}$$
Apparently, because of the linearity of expectation, which holds even when the random variables are dependent (Proof of linearity for expectation given random variables are dependent), the expected values for all 3 people are equal.
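The three mean estimators from Part 1 can be compared numerically. The sketch below uses a small, made-up sample of rolls (purely illustrative data, not from any real experiment) and builds the counts $n_{1i}$, $n_{2j}$, $n_k$ and $n_{2ij}$ with `collections.Counter`.

```python
from collections import Counter

# Hypothetical sample of (x, y) rolls, invented purely for illustration.
rolls = [(1, 1), (2, 5), (3, 3), (6, 6), (4, 2), (5, 5), (2, 2), (6, 1)]
n = len(rolls)

n1 = Counter(x for x, _ in rolls)       # n_{1i}: times die 1 showed i
n2 = Counter(y for _, y in rolls)       # n_{2j}: times die 2 showed j
nz = Counter(x + y for x, y in rolls)   # n_k:    times the sum was k
n12 = Counter(rolls)                    # n_{2ij}: times the pair (i, j) occurred

# Person 1: sum of the two marginal means
mean_p1 = (sum(i * c for i, c in n1.items()) / n
           + sum(j * c for j, c in n2.items()) / n)

# Person 2: mean of the observed sums
mean_p2 = sum(k * c for k, c in nz.items()) / n

# Person 3: (i + j) * P(Y=j | X=i) * P(X=i) with relative frequencies
mean_p3 = sum((i + j) * (c / n1[i]) * (n1[i] / n)
              for (i, j), c in n12.items())

print(mean_p1, mean_p2, mean_p3)
```

On any sample the three printed values should agree up to floating-point rounding, since the factor $n_{1i}$ cancels in Person 3's formula and each roll contributes $x_k + y_k$ exactly once in all three sums.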
Part 2: Variances of the Mean Estimators
For all people, we use the formulae:
- $Var(Z) = E(Z^2) - [E(Z)]^2$
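- $Var(\hat{Z}) = Var(E(Z)) = \frac{Var(Z)}{n}$, where $E(Z)$ here denotes the mean estimator $\hat{Z}$ from Part 1 (computed from $n$ rolls), not the true expected value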
- $Cov(X,Y) = E(XY) - E(X)E(Y)$
Thus,
- For Person 1:
$$\text{Var}(Z) = \left(\sum_{i=1}^{6} i^2 \cdot \frac{n_{1i}}{n}\right) - \left(\left(\sum_{i=1}^{6} i \cdot \frac{n_{1i}}{n}\right)^2\right) + \left(\sum_{j=1}^{6} j^2 \cdot \frac{n_{2j}}{n}\right) - \left(\left(\sum_{j=1}^{6} j \cdot \frac{n_{2j}}{n}\right)^2\right)$$
$$ \text{Var}(\hat{Z}) = \text{Var}(E(Z)) = \frac{1}{n} \left[ \left(\sum_{i=1}^{6} i^2 \cdot \frac{n_{1i}}{n}\right) - \left(\left(\sum_{i=1}^{6} i \cdot \frac{n_{1i}}{n}\right)^2\right) + \left(\sum_{j=1}^{6} j^2 \cdot \frac{n_{2j}}{n}\right) - \left(\left(\sum_{j=1}^{6} j \cdot \frac{n_{2j}}{n}\right)^2\right) \right]$$
- For Person 2:
$$\text{Var}(Z) = \left(\sum_{k=2}^{12} k^2 \cdot \frac{n_k}{n}\right) - \left(\left(\sum_{k=2}^{12} k \cdot \frac{n_k}{n}\right)^2\right)$$
$$\text{Var}(\hat{Z}) = \text{Var}(E(Z)) = \frac{1}{n} \left[ \left(\sum_{k=2}^{12} k^2 \cdot \frac{n_k}{n}\right) - \left(\left(\sum_{k=2}^{12} k \cdot \frac{n_k}{n}\right)^2\right) \right]$$
- For Person 3:
$$Var(X) = E(X^2) - [E(X)]^2 = \left(\sum_{i=1}^{6} i^2 \cdot \frac{n_{1i}}{n}\right) - \left(\sum_{i=1}^{6} i \cdot \frac{n_{1i}}{n}\right)^2$$
$$Var(Y) = E(Y^2) - [E(Y)]^2 = \left(\sum_{i=1}^{6} i^2 \cdot \frac{n_{2i}}{n}\right) - \left(\sum_{i=1}^{6} i \cdot \frac{n_{2i}}{n}\right)^2$$
$$Cov(X,Y) = E(XY) - E(X)E(Y) = \left(\sum_{i=1}^{6} \sum_{j=1}^{6} i \cdot j \cdot \frac{n_{2ij}}{n_{1i}} \cdot \frac{n_{1i}}{n}\right) - \left(\sum_{i=1}^{6} i \cdot \frac{n_{1i}}{n}\right) \cdot \left(\sum_{i=1}^{6} i \cdot \frac{n_{2i}}{n}\right)$$
$$Var(Z) = Var(X+Y)= Var(X) + Var(Y) + 2Cov(X,Y) = \left(\sum_{i=1}^{6} i^2 \cdot \frac{n_{1i}}{n}\right) - \left(\sum_{i=1}^{6} i \cdot \frac{n_{1i}}{n}\right)^2 + \left(\sum_{i=1}^{6} i^2 \cdot \frac{n_{2i}}{n}\right) - \left(\sum_{i=1}^{6} i \cdot \frac{n_{2i}}{n}\right)^2 + 2\left(\sum_{i=1}^{6} \sum_{j=1}^{6} i \cdot j \cdot \frac{n_{2ij}}{n_{1i}} \cdot \frac{n_{1i}}{n}\right) - 2\left(\sum_{i=1}^{6} i \cdot \frac{n_{1i}}{n}\right) \cdot \left(\sum_{i=1}^{6} i \cdot \frac{n_{2i}}{n}\right)$$
$$\text{Var}(\hat{Z}) = \text{Var}(E(Z)) = \frac{1}{n} \left[ \left(\sum_{i=1}^{6} i^2 \cdot \frac{n_{1i}}{n}\right) - \left(\sum_{i=1}^{6} i \cdot \frac{n_{1i}}{n}\right)^2 + \left(\sum_{i=1}^{6} i^2 \cdot \frac{n_{2i}}{n}\right) - \left(\sum_{i=1}^{6} i \cdot \frac{n_{2i}}{n}\right)^2 + 2\left(\sum_{i=1}^{6} \sum_{j=1}^{6} i \cdot j \cdot \frac{n_{2ij}}{n_{1i}} \cdot \frac{n_{1i}}{n}\right) - 2\left(\sum_{i=1}^{6} i \cdot \frac{n_{1i}}{n}\right) \cdot \left(\sum_{i=1}^{6} i \cdot \frac{n_{2i}}{n}\right) \right]$$
Based on the above analysis, it seems that the variance for Person 3 will be larger than the variances for Person 1 and Person 2 (since neither Person 1 nor Person 2 takes covariances into consideration). If so, this suggests that the variance estimate for Person 3 will be closer to the actual variance, while Person 1 and Person 2 will underestimate the actual variance.
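One way to sanity-check the three variance formulas from Part 2 is to evaluate them on the same data and compare the results. The sketch below does this on a small, made-up sample (illustrative data only); `plug_in_mean` and `plug_in_var` are hypothetical helper names that implement $E(\cdot)$ and $E(V^2) - [E(V)]^2$ with relative frequencies.

```python
from collections import Counter

# Hypothetical sample of (x, y) rolls, invented purely for illustration.
rolls = [(1, 1), (2, 5), (3, 3), (6, 6), (4, 2), (5, 5), (2, 2), (6, 1)]
n = len(rolls)
xs = [x for x, _ in rolls]
ys = [y for _, y in rolls]
zs = [x + y for x, y in rolls]

def plug_in_mean(values):
    """Sum of value * (count / n), i.e. the relative-frequency mean."""
    counts = Counter(values)
    return sum(v * c for v, c in counts.items()) / len(values)

def plug_in_var(values):
    """E(V^2) - [E(V)]^2 with relative frequencies as probabilities."""
    return plug_in_mean([v * v for v in values]) - plug_in_mean(values) ** 2

var_x = plug_in_var(xs)
var_y = plug_in_var(ys)
# Cov(X, Y) = E(XY) - E(X)E(Y); E(XY) is the mean of the products x_k * y_k
cov_xy = plug_in_mean([x * y for x, y in rolls]) - plug_in_mean(xs) * plug_in_mean(ys)

var_z_p1 = var_x + var_y                # Person 1: covariance term omitted
var_z_p2 = plug_in_var(zs)              # Person 2: computed from the sums
var_z_p3 = var_x + var_y + 2 * cov_xy   # Person 3: covariance term included

for label, v in [("Person 1", var_z_p1), ("Person 2", var_z_p2), ("Person 3", var_z_p3)]:
    print(label, "Var(Z) =", v, " Var(Z_hat) =", v / n)
```

Replacing `rolls` with a larger simulated sample shows how the three estimates behave as $n$ grows.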
My Question: Have I correctly calculated the variances for all 3 people?
Thanks!
- Note: Additional Variance Identity
Case 1: $$Var(X+Y) = E(X^2) - [E(X)]^2 + E(Y^2) - [E(Y)]^2 + 2Cov(X,Y)$$
Case 2:
$$\begin{align*} \text{Var}(X+Y) &= E((X+Y)^2) - (E(X+Y))^2 \\ &= E(X^2 + Y^2 + 2XY) - [ (E(X))^2 + (E(Y))^2 + 2E(X)E(Y) ] \\ &= E(X^2) - (E(X))^2 + E(Y^2) - (E(Y))^2 + 2E(XY) - 2E(X)E(Y) \end{align*}$$
- In Case 1, we assume $2Cov(X,Y)= 0$
- In Case 2, we assume $2Cov(X,Y) = 2E(XY) - 2E(X)E(Y) = 0$
- Thus, after making these substitutions in Case 1 and Case 2, the variances are the same
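The expansion in Case 2 can also be checked exactly, without sampling, on the joint distribution from the Introduction. The sketch below uses exact fractions and assumes $D_1$ is a fair die (an assumption, since the marginal distribution of $D_1$ is not stated above); the helper `E` is a hypothetical name for the expectation under that joint distribution.

```python
from fractions import Fraction

sides = range(1, 7)

# Joint pmf P(X=i, Y=j) = P(X=i) * P(Y=j | X=i), with P(X=i) = 1/6 ASSUMED,
# P(Y=i | X=i) = 1/2 and P(Y=j | X=i) = 1/10 for j != i (from the Introduction).
p = {(i, j): Fraction(1, 6) * (Fraction(1, 2) if i == j else Fraction(1, 10))
     for i in sides for j in sides}

def E(f):
    """Exact expectation of f(X, Y) under the joint pmf p."""
    return sum(f(i, j) * pr for (i, j), pr in p.items())

ex, ey = E(lambda i, j: i), E(lambda i, j: j)
var_x = E(lambda i, j: i * i) - ex ** 2
var_y = E(lambda i, j: j * j) - ey ** 2
cov = E(lambda i, j: i * j) - ex * ey

# Case 2: Var(X+Y) computed directly as E((X+Y)^2) - (E(X+Y))^2
var_z_direct = E(lambda i, j: (i + j) ** 2) - E(lambda i, j: i + j) ** 2

print(var_x, var_y, cov, var_z_direct)  # prints: 35/12 35/12 7/6 49/6
print(var_z_direct == var_x + var_y + 2 * cov)  # prints: True
```

Under the fair-$D_1$ assumption, the direct computation and the form $Var(X) + Var(Y) + 2Cov(X,Y)$ give the same exact value, and the covariance for this model is $7/6$ rather than $0$.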