Is it even possible to compare two samples with the data I have?

Question

I'm pretty new to statistics, and I recently learned how to check whether two samples are equal. Now I want to learn how to find out which of the samples is greater. I have some friends in college who are good at statistics, and they said it's impossible to find that out with the data I have.
For example, I have two drugs ($x_1$, $x_2$), and I have to find out which of them is more effective. Here is the data:

Sample mean: $\bar x_1=5.9$, $\bar x_2=4.1$
Sample standard deviation: $\sigma_1=4.1$, $\sigma_2=3.7$
Sample size: $n_1=42$, $n_2=47$

And I want to test: $$H_0:\mu_1-\mu_2>0$$ $$H_a:\mu_1-\mu_2<0$$

It says that it's used to test that two populations have equal means. Is it possible to find out which mean is bigger using this method? — Ruslan Tolstsikau, Jan 31 '21 at 22:55
yes, of course. It doesn't matter whether you test $\mu_1 - \mu_2 > 0$, $\mu_1 - \mu_2 \geq 0$, or $\mu_1 - \mu_2 = 0$ in your $H_0$. This is a very basic result from testing theory — lmaosome, Jan 31 '21 at 22:58
So as far as I understand I have to calculate my t value using the Welch's t-test, then I have to make a one-tailed test? — Ruslan Tolstsikau, Jan 31 '21 at 23:07

BruceET · Accepted Answer · 2021-02-01T03:01:35.300

I have some quibbles with your notation:

(a) The sample standard deviations should be denoted $S_1, S_2.$ Notations $\sigma_1$ and $\sigma_2$ are for population standard deviations.

(b) It is pointless to test $H_0: \mu_1 - \mu_2 \ge 0$ against $H_1: \mu_1 - \mu_2 < 0,$ because you say $\bar X_1 = 5.9 > \bar X_2 = 4.1.$ So you have no evidence that $\mu_1 - \mu_2 < 0$ might be true.

I will test $H_0: \mu_1 - \mu_2 \le 0$ against $H_1: \mu_1 - \mu_2 > 0.$ You know that the first sample mean is greater than the second. The question is whether it is enough greater that the difference would be called 'statistically significant'.

(c) Also, notice that the null hypothesis $H_0$ must always contain an equal sign (maybe as $=,$ $\le,$ or $\ge$).

Finally, you have given summary data rather than the actual observations. So I have no way to check whether the data are approximately normally distributed, as is required for the Welch two-sample t test to be valid. So I will simply assume your data are nearly normal.

I am using a Welch two-sample t test instead of a 'pooled' two-sample t test because the two sample standard deviations you provide are different. Thus the population standard deviations may also be different, which would violate an assumption of the 'pooled' test.

Here is printout from a recent release of Minitab statistical software:

Two-Sample T-Test and CI
Sample   N  Mean  StDev  SE Mean
1       42  5.90   4.10     0.63
2       47  4.10   3.70     0.54
Difference = μ (1) - μ (2)
Estimate for difference:  1.800
95% lower bound for difference:  0.417
T-Test of difference = 0 (vs >): 
  T-Value = 2.16  P-Value = 0.017  DF = 83
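
If Minitab is not at hand, the T-value, DF, and P-value in the printout can be checked from the summary statistics alone. The following is a minimal Python sketch, not part of the Minitab output. The function names `welch_from_summary` and `t_upper_tail` are made up for illustration, and the P-value is obtained by a simple numerical integration of the t density (Simpson's rule) so that only the standard library is needed; a statistics package would normally be used instead.

```python
import math


def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch t statistic, degrees of freedom, and standard error of the
    difference, computed from sample means, SDs, and sizes."""
    v1 = s1 ** 2 / n1  # squared standard error of the first mean
    v2 = s2 ** 2 / n2  # squared standard error of the second mean
    se = math.sqrt(v1 + v2)
    t = (m1 - m2) / se
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom, for t >= 0.
    Integrates the density over [0, t] with Simpson's rule (steps must be
    even) and subtracts the result from 0.5."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


# Summary data from the question
t, df, se = welch_from_summary(5.9, 4.1, 42, 4.1, 3.7, 47)
p = t_upper_tail(t, df)  # one-sided P-value for H1: mu_1 - mu_2 > 0
print(f"SE = {se:.3f}, t = {t:.2f}, df = {df:.1f}, p = {p:.3f}")
```

The values should agree with the printout up to rounding; note that the DF formula gives a non-integer value, while the printout shows a whole number.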

Because the P-value $0.017$ is smaller than $0.05 = 5\%,$ we reject the null hypothesis and conclude that $\bar X_1 = 5.9$ is significantly larger than $\bar X_2 = 4.1$ at the 5% level of significance. (You could also declare significance at the 2% level.)

"Proving" that one drug is better than another in a practical situation requires evidence from thousands of randomly selected subjects and significance levels much smaller than 5%.

If required to show computations by hand using a calculator, you can look for the Welch t statistic in a statistics text (or online). Also notice that there is a special formula for the degrees of freedom DF in a Welch t test; it uses sample sizes and sample standard deviations.
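
For reference, the standard Welch formulas, written in the notation used in this answer and evaluated with the summary data from the question, are:

$$T = \frac{\bar X_1 - \bar X_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{5.9 - 4.1}{\sqrt{\dfrac{4.1^2}{42} + \dfrac{3.7^2}{47}}} \approx \frac{1.8}{0.8316} \approx 2.16$$

$$\nu = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}} \approx \frac{0.6915^2}{\dfrac{0.4002^2}{41} + \dfrac{0.2913^2}{46}} \approx 83.1$$

Here $\nu$ is the degrees of freedom DF. The intermediate values are rounded to four places, so a hand computation may differ slightly in the last digit.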

That was the answer I was looking for. Thank you for taking your time to write this all! — Ruslan Tolstsikau, Feb 02 '21 at 03:04
