goodness-of-fit test for categorical data with small expected frequencies

Question

I have a categorical data set with 8 categories and I want to test my data against a theoretical "ideal" distribution. Ordinarily a chi square test would do this just fine. However, some of my theoretical probabilities are small, so under my theoretical distribution 3 of my 8 categories have expected frequencies smaller than 5. I've been told that such small expected frequencies mean that chi square is not appropriate. So my question is: when some categories have small expected frequencies, what kind of goodness-of-fit test is appropriate? I've been googling around and getting conflicting answers.

BruceET · Accepted Answer · 2021-04-06T07:23:49.340

You need to think about your objectives as well as the appropriateness of a chi-squared test.

Suppose you roll a die 18 times to get an idea whether it is fair. Suppose it is biased, with probabilities $(2,2,3,3,4,4)/18$ for the respective faces. Then you have two issues to deal with. (a) Under the null hypothesis of fairness, the expected count for each face is only $18/6 = 3,$ which is below the customary minimum of five. (b) Even if the test runs OK, it does not have much power to reject the null hypothesis that all faces are equally likely.

(a)

set.seed(405)
pr = c(2,2,3,3,4,4)/18      # true (biased) face probabilities
TAB = tabulate(sample(1:6, 18, replace = TRUE, prob = pr), nbins = 6)
TAB
[1] 2 1 3 4 6 2
chisq.test(TAB)             # null: all six faces equally likely
    Chi-squared test for given probabilities


data:  TAB
X-squared = 5.3333, df = 5, p-value = 0.3766
Warning message:
In chisq.test(TAB) : Chi-squared approximation may be incorrect

By default, chisq.test tests against equal probabilities unless p is specified. The expected count in each of the six cells is then $18/6 = 3.$ That is below 5, which triggers the warning message.
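As a cross-check on the numbers above, here is a minimal Python sketch. It uses the standard library only and is not R's implementation. It recomputes the expected counts, Pearson's statistic $X^2 = \sum (O-E)^2/E$, and the chi-squared P-value on $6 - 1 = 5$ degrees of freedom. The helper names `chi2_sf` and `chisq_gof` are hypothetical. `chi2_sf` evaluates the chi-squared upper tail using the recurrence for the regularized upper incomplete gamma function, which is exact for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df.

    Uses Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1),
    with y = x / 2, stepping s up to df / 2.
    """
    y = x / 2.0
    if df % 2 == 0:
        q, s = math.exp(-y), 1.0              # Q(1, y) = exp(-y)
    else:
        q, s = math.erfc(math.sqrt(y)), 0.5   # Q(1/2, y) = erfc(sqrt(y))
    while s < df / 2.0:
        q += y ** s * math.exp(-y) / math.gamma(s + 1.0)
        s += 1.0
    return q

def chisq_gof(observed, probs=None):
    """Pearson goodness-of-fit: expected counts, statistic, df, asymptotic P-value."""
    n, k = sum(observed), len(observed)
    if probs is None:                          # default: all cells equally likely
        probs = [1.0 / k] * k
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = k - 1
    return expected, stat, df, chi2_sf(stat, df)

tab = [2, 1, 3, 4, 6, 2]                       # counts from the 18 rolls above
expected, stat, df, p = chisq_gof(tab)
print(expected)                                # each cell expects 18/6 = 3
print(round(stat, 4), df, round(p, 4))
print(any(e < 5 for e in expected))            # the condition behind R's warning
```

The last line reproduces the condition under which R issues its warning: at least one expected count below 5.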

There is a 'cure' for moderately low cell counts. The implementation of chisq.test in R allows you to simulate a reasonably accurate P-value, which finds no significant difference from fairness.

chisq.test(TAB, simulate.p.value = TRUE)   # B = 2000 replicates by default
Chi-squared test for given probabilities 
with simulated p-value 
(based on 2000 replicates)


data:  TAB
X-squared = 5.3333, df = NA, p-value = 0.4138
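The simulated P-value above rests on a simple idea. It is sketched here in Python for illustration only and is not a port of R's code:

- draw B multinomial samples of the same size under the null probabilities;
- recompute the statistic for each sample;
- report the proportion of statistics at least as large as the observed value.

The sketch uses the $(1 + \text{hits})/(B + 1)$ convention, which counts the observed sample as one of the replicates. No degrees of freedom are involved, which is why the output shows df = NA. The name `mc_pvalue` and the seed are arbitrary choices. Because the random numbers differ from R's, expect a value near 0.4138, not equal to it.

```python
import random

def pearson_stat(counts, expected):
    """Pearson chi-squared statistic sum((O - E)^2 / E)."""
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))

def mc_pvalue(observed, probs, B=2000, seed=None):
    """Monte Carlo P-value for a chi-squared goodness-of-fit test."""
    rng = random.Random(seed)
    n, k = sum(observed), len(observed)
    expected = [n * p for p in probs]
    stat_obs = pearson_stat(observed, expected)
    hits = 0
    for _ in range(B):
        # one multinomial sample of size n drawn under the null hypothesis
        counts = [0] * k
        for cell in rng.choices(range(k), weights=probs, k=n):
            counts[cell] += 1
        # small tolerance so floating-point ties with the observed value count as hits
        if pearson_stat(counts, expected) >= stat_obs - 1e-9:
            hits += 1
    return stat_obs, (1 + hits) / (B + 1)

tab = [2, 1, 3, 4, 6, 2]
stat, p_sim = mc_pvalue(tab, [1 / 6] * 6, B=2000, seed=405)
print(round(stat, 4), round(p_sim, 4))
```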

(b) However, there is no cure for the low power of the test with only 18 rolls of the die. With so few rolls, the probability of detecting that the die is biased is disappointingly small.

Even with 60 rolls of the die, the power of the chi-squared test is only about 31%, as illustrated by the simulation below. At 60 rolls the expected count is 10 per face, so low cell counts are very rare.

set.seed(2021)
pr = c(2,2,3,3,4,4)/18
m = 10^5;  pv = numeric(m)
for(i in 1:m) {
  TAB = tabulate(sample(1:6, 60, replace = TRUE, prob = pr), nbins = 6)
  pv[i] = chisq.test(TAB)$p.value
  }
mean(pv <= .05)
[1] 0.31181

Note: With 600 rolls of the die, you would almost surely detect the bias. The power is above 99% (simulation not shown).
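Here is a Python sketch of the same kind of power simulation. It makes the same assumptions as the R loop: the biased probabilities pr, and a test at the 5% level against equal probabilities. Instead of computing each P-value, it compares the statistic with 11.0705, the 95th percentile of the chi-squared distribution with 5 degrees of freedom. This gives the same reject/accept decision. The number of replicates (4000 rather than 10^5) and the seed are arbitrary choices to keep the run short. Expect the estimates to vary by roughly a percentage point around the figures quoted above.

```python
import random

PR = [2/18, 2/18, 3/18, 3/18, 4/18, 4/18]   # biased die from the example
CRIT_5DF = 11.0705                           # chi-squared 95th percentile, df = 5

def power(n_rolls, m=4000, seed=2021):
    """Estimated probability that the chi-squared test (equal-probability null,
    5% level) rejects when the rolls actually follow PR."""
    rng = random.Random(seed)
    e = n_rolls / 6                          # expected count per face under H0
    rejections = 0
    for _ in range(m):
        counts = [0] * 6
        for face in rng.choices(range(6), weights=PR, k=n_rolls):
            counts[face] += 1
        stat = sum((c - e) ** 2 / e for c in counts)
        if stat >= CRIT_5DF:                 # same decision as P-value <= 0.05
            rejections += 1
    return rejections / m

results = {n: power(n) for n in (18, 60, 600)}
for n, pw in results.items():
    print(f"{n:4d} rolls: estimated power {pw:.3f}")
```

Comparing the statistic with the critical value is equivalent to the R loop's check pv <= .05, and it avoids computing a P-value on every replicate.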

Addendum: Using data from your Comment, with the simulated option to accommodate small counts.

x = c(16, 15, 6, 9, 7, 6, 4, 5)                   # observed counts (n = 68)
a = c(19.2, 13.3, 8.8, 8.2, 6.7, 4.7, 3.9, 3.2)   # "ideal" distribution
chisq.test(x, p = a/sum(a), simulate.p.value = TRUE)
    Chi-squared test for given probabilities 
    with simulated p-value (based on 2000 replicates)

data:  x
X-squared = 3.1077, df = NA, p-value = 0.8786

Here the given probabilities are obtained by rescaling your "ideal" distribution to sum to $1.$ The null hypothesis that data x fit your ideal distribution is not rejected. (Of course, as always, with very much more data you might reject. But if agreement seems reasonable to you based on what you know about the subject matter, then this outcome is likely OK.)

If you run chisq.test without simulation, you get P-value 0.8749. That is not much different, but it does generate a warning message because of the low expected counts.

X-squared = 3.1077, df = 7, p-value = 0.8749
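To check the addendum outside R, here is a Python sketch (standard library only). It does the following:

- rescales the "ideal" distribution to probabilities, as a/sum(a) does;
- lists the expected counts for n = 68;
- computes the chi-squared P-value on 8 - 1 = 7 degrees of freedom;
- computes a Monte Carlo P-value.

The helper names are hypothetical. The simulated value depends on the seed, so it will be near R's 0.8786, not equal to it.

```python
import math
import random

def chi2_sf(x, df):
    """P(X >= x) for chi-squared with integer df (incomplete-gamma recurrence)."""
    y = x / 2.0
    q, s = (math.exp(-y), 1.0) if df % 2 == 0 else (math.erfc(math.sqrt(y)), 0.5)
    while s < df / 2.0:
        q += y ** s * math.exp(-y) / math.gamma(s + 1.0)
        s += 1.0
    return q

def pearson_stat(counts, expected):
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))

x = [16, 15, 6, 9, 7, 6, 4, 5]                       # observed counts, n = 68
a = [19.2, 13.3, 8.8, 8.2, 6.7, 4.7, 3.9, 3.2]       # "ideal" distribution
total = sum(a)
probs = [ai / total for ai in a]                     # rescale to sum to 1, like a/sum(a)
n, k = sum(x), len(x)
expected = [n * p for p in probs]
stat = pearson_stat(x, expected)
p_asym = chi2_sf(stat, k - 1)                        # asymptotic P-value, df = 7

# Monte Carlo P-value: B multinomial samples of size n under the null
rng, B, hits = random.Random(1), 2000, 0
for _ in range(B):
    counts = [0] * k
    for j in rng.choices(range(k), weights=probs, k=n):
        counts[j] += 1
    if pearson_stat(counts, expected) >= stat - 1e-9:
        hits += 1
p_sim = (1 + hits) / (B + 1)

print([round(e, 1) for e in expected])               # three cells fall below 5
print(round(stat, 4), round(p_asym, 4), round(p_sim, 4))
```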

In a formal report, you might give the P-value as 0.8749, with a note saying that simulation confirms it is nearly correct. Alternatively, show simulated results from R (as earlier), mentioning R's simulation capability in a note.

[The 'rule' that triggers the warning message is that no expected count may be smaller than 5. Some consider this rule too 'fussy'. They say it's OK to have a 'few' expected counts as low as 3, provided 'most' exceed 5. Of course, all such rules are arbitrary. Because we ran the test both with and without simulation, we know the P-value from the chi-squared distribution isn't far off.]

Thank you for the answer. My sample size is 68, close to your value of 60 which you warn about. Given my (unavoidable) small sample size, is there an appropriate test to use? I read online about "exact" tests like Fisher or multinomial that could be used when there are low expected frequencies, but I couldn't quite tell if those were appropriate in my context. — David, Apr 05 '21 at 17:43
Depending on how much your 8 categories might differ, 60 could be enough. If you could say more about your 'ideal distribution' and an alternative distribution that you would want to detect, then it would be easier to say what sample size is needed. — BruceET, Apr 05 '21 at 18:57
My ideal distribution is something like [19.2, 13.3, 8.8, 8.2, 6.7, 4.7, 3.9, 3.2] and an observed distribution is something like [16, 15, 6, 9, 7, 6, 4, 5]. Chi square seems like a natural way to test the fit of the second to the first, but I've been told that this is problematic because of the three expected frequencies that are less than 5. — David, Apr 06 '21 at 02:20
Just now saw your Comment, and posted an addendum to my Answer using that information. — BruceET, Apr 06 '21 at 06:57
