
I apologize for the title. I am not even sure how to phrase this question per se. I feel like this should be easy and yet I am questioning my thinking. Here's the scenario:

I have two groups. Group A has 50 members. Group B has 400 members. What I want to know is what calculation would adjust Group A so that it has the same "influence" on its other metrics as if it also had 400 members. So, if Group A has 50 followers and a mean of 5 likes over its last 1,000 tweets, what number should I multiply that mean of 5 likes by?

Example:

Group A: 50 followers, mean of 5 likes, total tweets 1,000
Group B: 400 followers, mean of 20 likes, total tweets 1,000

Since Group B has 8 times as many followers, it seems obvious that it will get more likes (as it is reaching more people). Is it as simple as saying that Group A's 5 likes need to be multiplied by 8 to put Group A on the same footing as Group B?
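Written out as a short Python sketch, the adjustment I have in mind looks like this. It assumes likes grow in direct proportion to followers, which is exactly the assumption I am unsure about; the variable names are only illustrative.

```python
# Figures from the example above.
followers_a, mean_likes_a = 50, 5
followers_b, mean_likes_b = 400, 20

# Proposed adjustment: scale Group A's mean by the ratio of follower counts.
# ASSUMPTION: likes are directly proportional to followers.
scale = followers_b / followers_a
scaled_likes_a = mean_likes_a * scale

# Equivalent view: compare likes per follower instead of raw likes.
rate_a = mean_likes_a / followers_a
rate_b = mean_likes_b / followers_b

print(f"scale factor:             {scale:.0f}")
print(f"Group A scaled mean:      {scaled_likes_a:.0f} likes")
print(f"likes per follower (A/B): {rate_a:.2f} / {rate_b:.2f}")
```

Scaling the mean by 8 and comparing likes per follower are the same calculation seen from two sides.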

Thanks for your assistance!

JimmyK4542
  • 54,331
  • Sounds right if you assume that the like-per-follower rate is constant for different numbers of followers, holding tweet counts constant. – Amaan M Jun 18 '21 at 20:53
  • Good question, but this kind of speculation can be very misleading. Saying 5 likes among 50 followers carries very little information; saying 20 likes among 400 followers carries much more. // Saying you have 5 likes among 50 is not significantly different from saying you have 20 among 400, because the statement about the group of 50 is so weak. That is, you cannot say that the proportion $\hat p_1=0.10$ (for $n_1=50$) is significantly different from $\hat p_2=0.05$ (for $n_2=400$). // So no, it isn't 'basic math', it's elementary statistics. // Statistical tests in 'Ans' format, if interested. – BruceET Jun 20 '21 at 00:39
  • I agree with the comment of @BruceET. However, as a rough estimate, it seems like simply multiplying by $8$ is the best that you can do. That is, it depends on whether (in general), you believe that there is a linear relationship between size of the group, and the given metric. For example, consider radius of a circle $R$, against the two metrics, perimeter $= 2\pi R$, and area $= \pi R^2.$ There is a linear relationship between perimeter and radius, but a non-linear relationship between area and radius. – user2661923 Jun 20 '21 at 00:44
  • The point is that there is not a linear relationship between proportions involving successes and probabilities as sample sizes change. // If you toss a fair coin 10 times you have more than probability 0.37 of getting 4 or fewer Heads; but in 100 tosses the probability of getting 40 or fewer is less than 0.03. // If the "best you can do" is horribly wrong, then best not to do anything. – BruceET Jun 20 '21 at 01:00
  • @BruceET: Thank you very much for all the information! I am, by no means, a statistician and I had a feeling that multiplying by 8 to "even out" the two samples seemed questionable. That being said, is there a statistical "term" for this? In other words, what type of question would this be considered in statistics? I would like to read more on the subject and get more familiar with it. Lastly, as I am a Python person, do you have any familiarity with how one would code this in Python as opposed to R? Thanks again!!! – bossjimmark Jun 20 '21 at 19:02
  • I think there are too many possible issues preventing a cogent general rule of the kind you are looking for. I think you'd have to compute probabilities for each situation of interest. // Specific computations paralleling those in my first Comment should be about as easy in Python as in R, but you should get specific help from Python documentation or frequent Python users. – BruceET Jun 20 '21 at 19:44
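The coin-toss figures quoted in the comments can be checked with a short Python sketch. This is only an illustration using the standard library; `binom_cdf` is a hypothetical helper name, not something from the thread.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Same proportion of Heads (40% or less), two different sample sizes.
p_10 = binom_cdf(4, 10, 0.5)     # 4 or fewer Heads in 10 tosses
p_100 = binom_cdf(40, 100, 0.5)  # 40 or fewer Heads in 100 tosses

print(f"P(X <= 4  | n = 10)  = {p_10:.4f}")
print(f"P(X <= 40 | n = 100) = {p_100:.4f}")
```

The same observed proportion is far less surprising in the small sample than in the large one, which is why counts from groups of different sizes cannot simply be rescaled and compared.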

1 Answer


Comment continued with results from test procedures in R:

prop.test(c(5,20),c(50,400), cor=F)
    2-sample test for equality of proportions 
    without continuity correction

    data:  c(5, 20) out of c(50, 400)
    X-squared = 2.1176, df = 1, p-value = 0.1456
    alternative hypothesis: two.sided
    95 percent confidence interval:
     -0.03585336  0.13585336
    sample estimates:
    prop 1 prop 2
      0.10   0.05

    Warning message:
    In prop.test(c(5, 20), c(50, 400), cor = F) :
      Chi-squared approximation may be incorrect

Notice the warning message, triggered by the small counts in the first group. Because the P-value is so far above 5%, rejecting $H_0$ seems out of reach. But we don't need to speculate because Fisher's exact test gives a reliable P-value that is even larger.

fisher.test(TBL)$p.val
[1] 0.1931484
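For Python users, here is a rough sketch of both tests using only the standard library. It is an illustration under stated assumptions, not a translation of the R internals: the 2×2 table is assumed to be likes versus non-likes in each group (5/45 and 20/380), and the helper `hypergeom_pmf` is a hypothetical name. The table passed to `fisher.test` is not shown here, so the Fisher P-value from this sketch need not reproduce the figure quoted above.

```python
from math import comb, erfc, sqrt

# ASSUMED 2x2 table: rows = groups, columns = (likes, non-likes).
likes = (5, 20)
sizes = (50, 400)
a, c = likes
b, d = sizes[0] - a, sizes[1] - c
n = a + b + c + d

# Pearson chi-square for a 2x2 table, no continuity correction
# (the statistic prop.test reports when the correction is switched off).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
p_chi2 = erfc(sqrt(chi2 / 2))

def hypergeom_pmf(x):
    """P(x likes fall in the first group), with all margins held fixed."""
    return comb(a + c, x) * comb(b + d, (a + b) - x) / comb(n, a + b)

# Two-sided Fisher exact test: add up every table that is no more
# probable than the observed one.
lo = max(0, (a + b) - (b + d))
hi = min(a + b, a + c)
p_obs = hypergeom_pmf(a)
p_fisher = sum(
    hypergeom_pmf(x)
    for x in range(lo, hi + 1)
    if hypergeom_pmf(x) <= p_obs * (1 + 1e-7)
)

print(f"X-squared = {chi2:.4f}, p-value = {p_chi2:.4f}")
print(f"Fisher exact two-sided p-value = {p_fisher:.4f}")
```

If SciPy is available, `scipy.stats.chi2_contingency` (with `correction=False`) and `scipy.stats.fisher_exact` perform the same two tests on a 2×2 table.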
BruceET
  • 51,500