I have a population that can be subdivided into sub-populations (e.g., a sack full of Apples, Pears, and Oranges). Every item has a binary attribute (e.g., ripe: true/false). I want to find out whether one of the sub-populations (e.g., Apples) differs significantly from the rest (i.e., whether Apples are significantly more often ripe). Since this is a comparison of categorical values, I figured that a chi-squared test is appropriate. However, I do not know how to proceed.
My $H_0$ would be: Apples are not more frequently ripe than the other fruits.
I could have one degree of freedom, since I could compare
(a) $Apple \land Ripe$ vs. $\neg Apple \land Ripe$
Alternatively, I could have two degrees of freedom, since there are three kinds of fruit (Apples, Pears, Oranges):
(b) $Apple \land Ripe$ vs. $Pear \land Ripe$ vs. $Orange \land Ripe$
Assume the following table of observed counts:
| / | Apple | Pear | Orange |
|---|---|---|---|
| Ripe | 70 | 50 | 50 |
| $\neg$ Ripe | 30 | 50 | 50 |
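Overall, 170 of the 300 fruits are ripe (≈ 56.7%), so under $H_0$ the expected number of ripe fruits in a group of 100 is $100 \cdot \frac{170}{300} = 56.\overline{6}$ (and $113.\overline{3}$ for the 200 non-Apples).
(a) $\frac{(70 - 56.\overline{6})^2}{56.\overline{6}} + \frac{(100 - 113.\overline{3})^2}{113.\overline{3}} = \frac{80}{17} \approx 4.7059$
(b) $\frac{(70 - 56.\overline{6})^2}{56.\overline{6}} + \frac{(50 - 56.\overline{6})^2}{56.\overline{6}} + \frac{(50 - 56.\overline{6})^2}{56.\overline{6}} = \frac{80}{17} \approx 4.7059$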
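The arithmetic in (a) and (b) can be reproduced with exact fractions, which shows that both sums equal $80/17$ when the expected count is taken as $100 \cdot 170/300 = 170/3$ rather than the rounded 56. This is a minimal sketch; the dictionary layout and the helper names `expected_ripe` and `term` are illustrative choices, not part of the question. Like (a) and (b), it sums only the Ripe-row cells.

```python
from fractions import Fraction

# Observed counts from the table: (ripe, not ripe) per fruit.
observed = {"Apple": (70, 30), "Pear": (50, 50), "Orange": (50, 50)}

total = sum(r + n for r, n in observed.values())   # 300 fruits
total_ripe = sum(r for r, _ in observed.values())  # 170 ripe
p_ripe = Fraction(total_ripe, total)               # 17/30 under H0


def expected_ripe(group_size):
    """Expected number of ripe fruits in a group of this size under H0."""
    return p_ripe * group_size


def term(obs, exp):
    """One Pearson-style term (O - E)^2 / E, kept as an exact Fraction."""
    return (obs - exp) ** 2 / exp


# (a) Apples vs. all other fruits, Ripe row only.
apple_ripe, apple_not = observed["Apple"]
rest_ripe = total_ripe - apple_ripe                # 100
rest_size = total - (apple_ripe + apple_not)       # 200
stat_a = (term(apple_ripe, expected_ripe(apple_ripe + apple_not))
          + term(rest_ripe, expected_ripe(rest_size)))

# (b) Each fruit separately, Ripe row only.
stat_b = sum(term(r, expected_ripe(r + n)) for r, n in observed.values())

# Both print 80/17 followed by its float value (about 4.7059).
print(stat_a, float(stat_a))
print(stat_b, float(stat_b))
```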
Both formulations yield the same statistic. However, the critical values at α = 0.05 from the χ² table differ (3.841 for 1 degree of freedom vs. 5.991 for 2 degrees of freedom), so the conclusions differ: assuming one degree of freedom we can reject $H_0$, assuming two degrees of freedom we cannot.
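The critical values quoted here, and the corresponding p-values, can be checked without a statistics library, because for 1 and 2 degrees of freedom the χ² upper tail has closed forms: $P(\chi^2_1 \ge x) = \operatorname{erfc}(\sqrt{x/2})$ and $P(\chi^2_2 \ge x) = e^{-x/2}$. This is a minimal sketch assuming only the Python standard library; `chi2_sf` and `chi2_critical` are hypothetical helper names, and the bisection bounds are arbitrary.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable.

    Only df = 1 and df = 2 are handled, via their closed forms.
    """
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only covers df = 1 and df = 2")


def chi2_critical(alpha, df, lo=0.0, hi=100.0, iters=200):
    """Critical value c with P(X >= c) = alpha, found by bisection.

    chi2_sf is decreasing in x, so if the tail at mid is still larger
    than alpha the critical value lies to the right of mid.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


stat = 80 / 17  # the value obtained for both (a) and (b)
for df in (1, 2):
    crit = chi2_critical(0.05, df)
    p = chi2_sf(stat, df)
    print(f"df={df}: critical={crit:.3f}, p={p:.4f}, reject H0: {stat > crit}")
```

With the statistic 80/17 ≈ 4.706, the p-value is roughly 0.03 for df = 1 and roughly 0.095 for df = 2, so $H_0$ is rejected at α = 0.05 only with one degree of freedom.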
How can I formulate/calculate the chi-squared test for my problem?
This question on Cross Validated suggests that (b) is correct; however, the OP there does not specify the hypothesis being tested. Furthermore, if I wanted to test the hypothesis that Oranges are not more frequently ripe than the other fruits, I would write down the same equation, which makes me doubt the proposed $(\text{columns}-1)(\text{rows}-1)$ rule.
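For reference, the textbook Pearson χ² statistic for an r × c contingency table sums $(O-E)^2/E$ over every cell, with $E_{ij} = (\text{row}_i\ \text{total}) \cdot (\text{column}_j\ \text{total}) / N$, and is compared against $(r-1)(c-1)$ degrees of freedom; the sums in (a) and (b) include only the Ripe row. The sketch below, an illustration only, applies that formula to the 2 × 3 table and to the 2 × 2 table obtained by pooling Pears and Oranges. `pearson_chi2` is a hypothetical helper, not a library function.

```python
from fractions import Fraction


def pearson_chi2(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    Expected count for cell (i, j) = row_total[i] * col_total[j] / grand_total.
    The sum runs over every cell, including the "not ripe" row.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = Fraction(0)
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = Fraction(row_totals[i] * col_totals[j], grand)
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Rows: Ripe, not Ripe.
three_fruits = [[70, 50, 50],   # columns: Apple, Pear, Orange
                [30, 50, 50]]
apple_vs_rest = [[70, 100],     # columns: Apple, Pear + Orange pooled
                 [30, 100]]

for name, t in [("2x3", three_fruits), ("2x2", apple_vs_rest)]:
    stat, df = pearson_chi2(t)
    print(name, "stat =", float(stat), "df =", df)
```

With these particular counts both tables give the same statistic, 2400/221 ≈ 10.86, because Pears and Oranges have identical counts, so pooling them changes nothing; only the degrees of freedom differ (2 vs. 1).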
Thank you for reminding me to be aware of multiple-hypothesis testing.
– Sim Feb 08 '22 at 10:14