
When comparing the proportions of two independent samples of sizes $n_A$ and $n_B$ (with random variables $X_A^i$ and $X_B^i$ following Bernoulli laws $B(p_A)$ and $B(p_B)$) using a test statistic, with the null hypothesis $H_0$ that $p_A=p_B$, I have seen two distinct formulas in the literature for the behaviour of the test statistic under $H_0$.

Some use the first, others the second:

  • $T=\frac{\bar{X}_A-\bar{X}_B}{\sqrt{p_A(1-p_A)/n_A + p_B(1-p_B)/n_B}}\rightarrow N(0,1)$

  • $T=\frac{\bar{X}_A-\bar{X}_B}{\sqrt{p(1-p)(1/n_A + 1/n_B)}}\rightarrow N(0,1)$, with $p=\frac{n_A\bar{X}_A+n_B\bar{X}_B}{n_A+n_B}$

  1. Is one of them correct? Is one of them wrong?

  2. Now, consider the second formula: $T=\frac{\bar{X}_A-\bar{X}_B}{\sqrt{p(1-p)(1/n_A + 1/n_B)}}\rightarrow N(0,1)$.

In this context, is it true to state that, "in the general case", $\bar{X}_A-\bar{X}_B\rightarrow N(p_A-p_B,p_A(1-p_A)/n_A + p_B(1-p_B)/n_B)$?

(This looks incompatible to me with the formula for $T$, as far as the variance part is concerned, but maybe there is a trick?)

Any comments?


Yes, there are at least two versions. As you indicate, the two main ones involve:

(a) Using the null hypothesis, thus assuming proportions are equal, and pooling all of the data in both groups to get a single standard error for the test statistic.

(b) Using two separate estimates of variance, one from each sample, to get the standard error.

One advantage of method (b) is that test results and confidence intervals agree. (That is, the CI for the difference in proportions includes $0$ exactly when $H_0$ is not rejected.)
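To make methods (a) and (b) concrete, here is a minimal base-R sketch that computes both $z$ statistics by hand for the example counts used later in this answer (55 successes of 150 in Gp 1, 71 of 200 in Gp 2). The variable names are illustrative choices, not part of any package; the confidence interval is the plain Wald interval built from the unpooled standard error of (b).

```r
# Example counts from this answer: 55 successes of 150 (Gp 1), 71 of 200 (Gp 2)
x <- c(55, 71)
n <- c(150, 200)

p_hat <- x / n                  # separate sample proportions
d     <- p_hat[1] - p_hat[2]    # observed difference in proportions

# (a) Pooled: a single proportion estimated from both groups (assumes H0)
p_pool  <- sum(x) / sum(n)
se_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))
z_pool  <- d / se_pool

# (b) Unpooled: each group contributes its own variance estimate
se_unpool <- sqrt(sum(p_hat * (1 - p_hat) / n))
z_unpool  <- d / se_unpool

# Wald 95% CI for p1 - p2, built from the unpooled standard error
ci <- d + c(-1, 1) * qnorm(0.975) * se_unpool

# Two-sided normal-approximation P-values for each version
p_val <- 2 * pnorm(-abs(c(pooled = z_pool, unpooled = z_unpool)))

print(round(c(z_pool = z_pool, z_unpool = z_unpool), 4))
print(round(ci, 8))
print(round(p_val, 4))
```

For these data `z_pool^2` reproduces the X-squared value that prop.test reports in the example below (0.050637), `ci` reproduces its confidence interval, and the two $z$ values differ only in the fourth decimal place.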

Other differences center on various methods of continuity correction. IMHO, continuity correction almost always leads to too 'conservative' a test (too reluctant to reject $H_0,$ and thus with reduced power), unless sample sizes are very small (say, below 100). Also, various software implementations have slightly different rounding conventions. (In computing by hand, premature intermediate rounding can result in surprisingly large errors.)
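A small base-R sketch of the effect of a continuity correction on the example data: it computes the Pearson statistic for the $2\times 2$ table with and without Yates' correction. The correction step is written to mirror what chisq.test is assumed to do for a $2\times 2$ table (shrink each $|O-E|$ by up to 0.5 before squaring); the corrected statistic is smaller, so its P-value is larger, i.e. the test is more conservative.

```r
# Example 2 x 2 table from this answer: successes and failures in each group
succ <- c(55, 71)
tot  <- c(150, 200)
obs  <- rbind(succ = succ, fail = tot - succ)

# Expected counts under H0 (equal proportions): row total * column total / N
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic without continuity correction
chi_plain <- sum((obs - expected)^2 / expected)

# Yates' correction: reduce each |O - E| by up to 0.5 before squaring
yates     <- min(0.5, abs(obs - expected))
chi_yates <- sum((abs(obs - expected) - yates)^2 / expected)

# Upper-tail chi-squared P-values with 1 degree of freedom
p_plain <- pchisq(chi_plain, df = 1, lower.tail = FALSE)
p_yates <- pchisq(chi_yates, df = 1, lower.tail = FALSE)
print(c(p_plain = p_plain, p_yates = p_yates))
```

Here the conclusion does not change (both P-values are far above 0.05), but with borderline data the corrected version will reject less often.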

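Another, equivalent method is a chi-squared test on a $2\times 2$ contingency table with rows Yes and No and columns Gp1 and Gp2. (Without continuity correction, its statistic is the square of the pooled $z$ statistic of (a).)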
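A quick numerical check of that equivalence, sketched in base R with the example counts used below: the Pearson statistic is computed from observed and expected counts, and compared with the square of the pooled two-proportion $z$ statistic. Variable names are illustrative.

```r
# Example counts: successes and totals per group
succ <- c(55, 71)
tot  <- c(150, 200)
obs  <- rbind(Yes = succ, No = tot - succ)   # rows Yes/No
colnames(obs) <- c("Gp1", "Gp2")             # columns Gp1/Gp2

# Pearson chi-squared statistic (no continuity correction)
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
chi_sq   <- sum((obs - expected)^2 / expected)

# Pooled two-proportion z statistic, method (a)
p_pool <- sum(succ) / sum(tot)
z_pool <- (succ[1] / tot[1] - succ[2] / tot[2]) /
  sqrt(p_pool * (1 - p_pool) * (1 / tot[1] + 1 / tot[2]))

# For a 2 x 2 table the two statistics coincide: chi_sq equals z_pool^2
print(all.equal(chi_sq, z_pool^2))
```

Because of this identity, the two-sided P-value of the uncorrected chi-squared test is the same as that of the pooled $z$ test.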

I have found the procedure prop.test in R (which, for two groups, gives the same statistic as a chi-squared test) to be a convenient and reliable method (using the parameter correct=FALSE, which can be abbreviated cor=F, to suppress the continuity correction). If you are given the option to estimate variances separately (not to pool), do so. Also, if you are given the option to omit Yates' correction, do so for samples of more than 100.

As an example, suppose Gp 1 has 55 Successes in 150, and Gp 2 has 71 Successes in 200. Then 'prop.test' in R gives the following output.

prop.test(c(55, 71), c(150, 200), correct = FALSE)

    2-sample test for equality of proportions without continuity correction

data:  c(55, 71) out of c(150, 200)
X-squared = 0.050637, df = 1, p-value = 0.822
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.09004438  0.11337772
sample estimates:
   prop 1    prop 2
0.3666667 0.3550000

Using the $2\times 2$ table with chisq.test in R gives the following output.

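# Successes and group totals for the two groups
succ = c(55, 71);  tot = c(150, 200)
fail = tot - succ              # failures per group
TBL = rbind(succ, fail);  TBL  # 2 x 2 table: rows succ/fail, columns groups
     [,1] [,2]
succ   55   71
fail   95  129

chisq.test(TBL, correct = FALSE)

    Pearson's Chi-squared test

data:  TBL
X-squared = 0.050637, df = 1, p-value = 0.822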

Notes:

  • The P-value is the same for both procedures (as is the statistic, X-squared = 0.050637). The null hypothesis that the proportions are equal is not rejected at the 5% level.

  • The CI in 'prop.test' includes $0.$

  • prop.test permits one-sided tests (via the parameter alt="gr" or alt="less"), while the chi-squared test is inherently two-sided.

  • If counts are too small, computer software may warn that the test statistic does not have the expected distribution, so that the displayed P-value may not be trustworthy. In that case, one can use Fisher's Exact Test (fisher.test in R) instead of chisq.test. Another alternative is to use simulation (parameter sim=T in chisq.test) to obtain a more accurate P-value.
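The alternatives mentioned in these notes can be sketched in base R as follows, assuming the same example data. The seed and the number of simulated tables (B = 10000) are arbitrary choices made here, not values from the answer.

```r
# The example table (rows: successes/failures; columns: groups)
TBL <- rbind(succ = c(55, 71), fail = c(95, 129))

# One-sided alternative (Gp 1 proportion greater), without continuity correction
one_sided <- prop.test(c(55, 71), c(150, 200), alternative = "greater",
                       correct = FALSE)$p.value

# Fisher's exact test: no large-sample approximation needed
p_fisher <- fisher.test(TBL)$p.value

# Simulated (Monte Carlo) P-value for the chi-squared test; B = number of tables
set.seed(2022)
p_sim <- chisq.test(TBL, simulate.p.value = TRUE, B = 10000)$p.value

print(round(c(one_sided = one_sided, fisher = p_fisher, simulated = p_sim), 3))
```

Because the observed difference is positive here, the one-sided P-value is half the two-sided value 0.822; the Fisher and simulated P-values are likewise far above 0.05.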

BruceET
  • Thank you. Do you know the answer to question (2)? – Mathieu Krisztian Jan 05 '22 at 20:58
  • Your (2) seems to refer to the pooled variances approach [my (a)]. For the null hypothesis, one assumes population proportions $p_1 = p_2 = p$, and the P-value is computed under the assumption that $H_0$ is true. So it's OK to assume the test statistic is approximately normal for sufficiently large $n_1$ and $n_2.$ Difficulties may arise in making confidence intervals and in 'power and sample size' computations. // In practice, the difference between the formulas is seldom large enough to change whether one rejects $H_0.$ // If one method were clearly superior, I guess the other would have fallen out of use. – BruceET Jan 05 '22 at 21:12
  • Sorry, for (2) I mean: is it mathematically feasible to have simultaneously

    $T=\frac{\bar{X}_A-\bar{X}_B}{\sqrt{p(1-p)(1/n_A + 1/n_B)}}\rightarrow N(0,1)$ with $p=\frac{n_A\bar{X}_A+n_B\bar{X}_B}{n_A+n_B}$ and $\bar{X}_A-\bar{X}_B\rightarrow N(p_A-p_B,p_A(1-p_A)/n_A + p_B(1-p_B)/n_B)$?

    – Mathieu Krisztian Jan 05 '22 at 21:17
  • I have already mentioned that the two methods are not exactly the same theoretically. However, for practical purposes, important differences are rare. // If you're looking for slight differences between actual data and the assumptions of tests, you will find them almost everywhere. [In the real world there may be no such thing as an exactly normal population. The CLT is true, but $\infty$ is elusive.] Here the main issue is usually whether sample sizes are large enough to assume null distributions are close enough to normal (or chi-squared) to get useful answers. – BruceET Jan 05 '22 at 21:36
  • Apologies: you have not understood my question (2). My question (2) means: if we use, for the standardized (centred and scaled) statistic, the pooled "p from a mixture of pA and pB", is it mathematically possible that the distribution of the non-standardized difference $\bar{X}_A-\bar{X}_B$ does not use this pooled p? – Mathieu Krisztian Jan 05 '22 at 21:48
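Regarding question (2), a short check may help. Under $H_0$ ($p_A=p_B=p$) the general variance reduces to the pooled form, $p_A(1-p_A)/n_A + p_B(1-p_B)/n_B = p(1-p)(1/n_A+1/n_B)$, and the pooled estimate converges to $p$, so the two statements are consistent there. When $p_A \ne p_B$, $\bar{X}_A-\bar{X}_B$ still has the general variance, but the pooled $T$ is no longer centred at $0$. The R simulation below is only a sketch under illustrative values chosen here ($n_A=150$, $n_B=200$, $p=0.36$ under $H_0$, and $p_A=0.45$, $p_B=0.30$ otherwise); the helpers sim_T and var_general are hypothetical names, not from the thread.

```r
set.seed(123)              # arbitrary seed for reproducibility
n_A  <- 150; n_B <- 200    # illustrative sample sizes
reps <- 20000              # number of simulated experiments

# Hypothetical helper: simulate sample proportions and the pooled statistic T
sim_T <- function(p_A, p_B) {
  xbar_A <- rbinom(reps, n_A, p_A) / n_A
  xbar_B <- rbinom(reps, n_B, p_B) / n_B
  p_pool <- (n_A * xbar_A + n_B * xbar_B) / (n_A + n_B)
  list(diff   = xbar_A - xbar_B,
       T_pool = (xbar_A - xbar_B) /
         sqrt(p_pool * (1 - p_pool) * (1 / n_A + 1 / n_B)))
}

# General variance of Xbar_A - Xbar_B (valid whether or not H0 holds)
var_general <- function(p_A, p_B) p_A * (1 - p_A) / n_A + p_B * (1 - p_B) / n_B

# Under H0 (p_A = p_B = 0.36): variance matches and T is close to N(0, 1)
h0 <- sim_T(0.36, 0.36)
print(c(var_sim = var(h0$diff), var_general = var_general(0.36, 0.36)))
print(c(mean_T = mean(h0$T_pool), sd_T = sd(h0$T_pool)))

# Under H1 (p_A = 0.45, p_B = 0.30): variance still matches, T drifts away from 0
h1 <- sim_T(0.45, 0.30)
print(c(var_sim = var(h1$diff), var_general = var_general(0.45, 0.30)))
print(c(mean_T = mean(h1$T_pool), sd_T = sd(h1$T_pool)))
```

In other words, the pooled $T \rightarrow N(0,1)$ is a statement about the null distribution only, while the normal approximation for $\bar{X}_A-\bar{X}_B$ with the unpooled variance holds in general.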