While calculating vaccine effectiveness against hospitalization from publicly available data, I came across a strange (mathematical) problem that I do not know how to interpret. The problem occurs in real-world data, though for this question I hand-crafted the data to illustrate it more clearly.

I calculate the vaccine effectiveness against hospitalization by comparing the share of hospitalized people among the vaccinated and the non-vaccinated who tested positive. Here are my data:

[Image: table of positive and hospitalized counts for vaccinated and non-vaccinated people, by age range]

Note that while the effectiveness of the vaccine for the whole population is negative, it is positive for each individual age range. Mathematically this is clear: young people are often positive but rarely hospitalized, while old people are rarely positive (because more of them are vaccinated) but are hospitalized more often.
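To make the mechanism concrete, here is a minimal Python sketch. The counts are made up purely for illustration (they are not the numbers from my table), the names `effectiveness` and `strata` are my own, and the formula is effectiveness = 1 - (hospitalization rate of vaccinated) / (hospitalization rate of non-vaccinated), with both rates taken among the positive.

```python
def effectiveness(hosp_vacc, pos_vacc, hosp_unvacc, pos_unvacc):
    """1 - (hospitalization rate of vaccinated) / (rate of non-vaccinated), among positives."""
    return 1 - (hosp_vacc / pos_vacc) / (hosp_unvacc / pos_unvacc)

# Hypothetical counts per age range:
# (hospitalized vaccinated, positive vaccinated,
#  hospitalized non-vaccinated, positive non-vaccinated)
strata = {
    "young": (10, 1000, 180, 9000),   # many positives, few hospitalized
    "old": (400, 4000, 200, 1000),    # mostly vaccinated, often hospitalized
}

# Effectiveness inside each age range.
per_stratum = {name: effectiveness(*counts) for name, counts in strata.items()}

# Effectiveness for the whole population: add the counts column by column first.
totals = tuple(sum(column) for column in zip(*strata.values()))
overall = effectiveness(*totals)

for name, ve in per_stratum.items():
    print(f"{name}: {ve:.0%}")
print(f"overall: {overall:.0%}")
```

With these made-up counts, each age range gives 50% while the pooled value is negative, because the vaccinated positives are mostly old people and the non-vaccinated positives are mostly young people.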

Question 1: How should I interpret the fact that the effectiveness is negative for the whole population but positive for each of its parts?

When I categorize by something other than age range, I can get completely different effectiveness for the parts:

[Image: table with the same totals, categorized by colour instead of age range]

Note that the total numbers are the same as before; only the distribution of the hospitalized among the positive is different. My impression is that by carefully choosing the categories, I can get any result I want.
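The following Python sketch shows what I mean. All counts are hypothetical (not the numbers from my tables), and `by_age`, `by_colour`, and `column_totals` are names I made up. Both partitions have identical column totals and identical numbers of positives per group; only the hospitalized are distributed differently.

```python
def effectiveness(hosp_vacc, pos_vacc, hosp_unvacc, pos_unvacc):
    """1 - (hospitalization rate of vaccinated) / (rate of non-vaccinated), among positives."""
    return 1 - (hosp_vacc / pos_vacc) / (hosp_unvacc / pos_unvacc)

def column_totals(groups):
    """Sum each count column over all groups of a partition."""
    return tuple(sum(column) for column in zip(*groups.values()))

# Hypothetical counts:
# (hospitalized vaccinated, positive vaccinated,
#  hospitalized non-vaccinated, positive non-vaccinated)
by_age = {
    "young": (10, 1000, 180, 9000),
    "old": (400, 4000, 200, 1000),
}
# Same positives per group and same totals, hospitalized redistributed.
by_colour = {
    "red": (82, 1000, 342, 9000),
    "blue": (328, 4000, 38, 1000),
}

# Both partitions describe the same overall population.
assert column_totals(by_age) == column_totals(by_colour)

ve_by_age = {name: effectiveness(*c) for name, c in by_age.items()}
ve_by_colour = {name: effectiveness(*c) for name, c in by_colour.items()}
overall = effectiveness(*column_totals(by_age))

print("by age:", {k: f"{v:.0%}" for k, v in ve_by_age.items()})
print("by colour:", {k: f"{v:.0%}" for k, v in ve_by_colour.items()})
print("overall:", f"{overall:.0%}")
```

In this sketch the effectiveness is positive in both age groups, negative in both colour groups, and negative overall, even though the totals never change.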

Question 2: I am sure this phenomenon is well known and studied in statistics. Can you point me to the right topic I can look at?

Question 3: As shown above, the calculated vaccine effectiveness depends substantially on the chosen division into categories, and different divisions can give completely different results. Why is categorization by age considered better (and correct) than categorization by, e.g., colour (as in my example)? IMO, it's just wishful thinking. For instance, how can we be sure that if we subdivide the age ranges more finely (either by individual years or by other criteria, e.g. type of vaccine, factory where it was made, region where the patients live, etc.), the effectiveness won't be negative again?

tomerr
xarx
  • https://www.wikiwand.com/en/Simpson%27s_paradox – Taladris Oct 31 '21 at 10:24
  • How does that data show that the effectiveness of the vaccine for the whole population is negative? (That -36% comes from where? What's the definition of "effectiveness" here?) – David C. Ullrich Oct 31 '21 at 10:27
  • effectiveness = 1 - (hosp. vacc. per 100000)/(hosp. non-vacc. per 100000). – xarx Oct 31 '21 at 10:39
  • @Taladris Thanks for the excellent link. I edited my question and added Question 3. – xarx Oct 31 '21 at 18:51
  • Let $p\%\;(0<p<100)$ of the population be vaccinated, $z\%\;(0<z<100)$ of the COVID-positive subjects be vaccinated, and $h\%\;(0\leq h<100)$ of the COVID-hospitalised subjects be vaccinated. I explained in this answer that $$\text{vaccine effectiveness against hospitalisation}\;=\;1-\left(\frac{h}{100-h}\right)\div\left(\frac{p}{100-p}\right).$$ On the other hand, – ryang Feb 12 '22 at 16:07
  • your supplied data and tables indicate that your formula for "vaccine effectiveness against hospitalisation" is actually $$1-\left(\frac{h}{100-h}\right)\div\left(\frac{z}{100-z}\right),$$ which gives an incorrectly smaller value for any vaccine that's effective against disease contraction. (A negative value given by your formula means that if COVID-positive, a vaccinated person is more likely than a non-vaccinated person to become hospitalised.) P.S. I'm merely leaving a comment as my correction is orthogonal to your Question, whose substance pertains to Simpson's paradox. – ryang Feb 12 '22 at 16:10
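
For reference, a short derivation of why the formula based on hospitalization rates among the positive has the $z$-based form. The count symbols are assumptions introduced here, not notation from the thread: $H_v, H_u$ are the hospitalised vaccinated and non-vaccinated, and $P_v, P_u$ are the positive vaccinated and non-vaccinated, all taken to be nonzero. Since $h\%$ of the hospitalised and $z\%$ of the positive are vaccinated, $$\frac{h}{100-h}=\frac{H_v}{H_u},\qquad \frac{z}{100-z}=\frac{P_v}{P_u},$$ and therefore $$1-\frac{H_v/P_v}{H_u/P_u}\;=\;1-\frac{H_v/H_u}{P_v/P_u}\;=\;1-\left(\frac{h}{100-h}\right)\div\left(\frac{z}{100-z}\right).$$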
