
This is my first time posting here. I am a chemical engineer working on quantifying the strength of my materials. I synthesise beads, press them with a piston, and record the load at which they break. For each sample I repeat the test 20 times and note the load for each run. I then calculate the mean and the standard deviation for each sample.

My question is: what statistical measure should I use to analyse my data properly? I run the same test on two different instruments, and I need to compare the data between the two machines. For each sample on each instrument, I need to quantify the following:

  1. Repeatability
  2. Accuracy

My aim is to use this statistical measure to screen the samples. Until now I have been using the S.D./mean percentage: if it came out above 20 percent, I would discard that sample on the grounds that there is too much variation in the strength of individual beads. Is there any other measure that would be more effective for my task?

Sorry if my question sounds too vague; I don't have much working knowledge of statistics beyond the basic high-school level.

wulf
  • 11

1 Answer


From the first instrument you have crushing pressures $X_1, X_2, \dots, X_{20}.$ Similarly, from the second instrument you have $Y_1, Y_2, \dots, Y_{20}.$

Perhaps you assume that the $X_i$ are a random sample from a normal distribution $\mathsf{Norm}(\mu_x, \sigma_x),$ where the population mean $\mu_x$ is estimated by $\bar X = \frac 1 n \sum_{i=1}^n X_i$ and the population standard deviation $\sigma_x$ is estimated by $S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2},$ where $n = 20.$ Similarly for the observations $Y_i.$
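As a minimal sketch of these two estimates, the R code below computes $\bar X$ and $S_x$ directly from the formulas and checks them against the built-in `mean` and `sd`. The data are fictitious; the seed and the normal parameters are assumptions chosen only for illustration.

```r
# Fictitious breaking loads for one instrument (illustrative values only)
set.seed(1)
x <- rnorm(20, mean = 500, sd = 50)

n     <- length(x)                          # n = 20 runs
x_bar <- sum(x) / n                         # sample mean, estimates mu_x
s_x   <- sqrt(sum((x - x_bar)^2) / (n - 1)) # sample SD, estimates sigma_x

# The built-in functions use the same definitions
all.equal(x_bar, mean(x))
all.equal(s_x, sd(x))
```

Note that `sd` divides by $n - 1,$ not $n,$ which matches the formula for $S_x.$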

Formally, you want to test whether the null hypothesis $H_0: \mu_x = \mu_y$ is true or whether the alternative $H_a: \mu_x \ne\mu_y$ is true.

Welch two-sample t test: I assume you have no way to know whether $\sigma_x = \sigma_y.$ If so, you need to do a Welch two-sample t test, which does not assume that the population variances (or standard deviations) are equal.

Formulas for finding the test statistic $T$ and for deciding whether to retain $H_0$ or to believe the alternative hypothesis $H_a$ are given in many elementary statistics textbooks and online forums. Also, software programs are available to perform the Welch t test. One of these is R statistical software.
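For reference, the usual textbook form of the Welch statistic and its approximate degrees of freedom (the Welch-Satterthwaite formula) is shown here. The symbols $n_x,$ $n_y$ for the two sample sizes and $\nu$ for the degrees of freedom are notation introduced for this sketch; $S_y$ is defined like $S_x.$

$$T = \frac{\bar X - \bar Y}{\sqrt{\dfrac{S_x^2}{n_x} + \dfrac{S_y^2}{n_y}}}, \qquad \nu \approx \frac{\left(\dfrac{S_x^2}{n_x} + \dfrac{S_y^2}{n_y}\right)^2}{\dfrac{(S_x^2/n_x)^2}{n_x - 1} + \dfrac{(S_y^2/n_y)^2}{n_y - 1}}.$$

Under $H_0,$ $T$ has approximately Student's t distribution with $\nu$ degrees of freedom, and $H_0$ is rejected at the 5% level when the two-sided P-value is below $0.05.$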

Suppose you have the (fictitious) data sampled and summarized below in R.

set.seed(2022)  # for reproducibility
x = rnorm(20, 500, 50)
y = rnorm(20, 540, 52)

summary(x); length(x); sd(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  355.0   452.1   493.4   486.1   519.8   551.0
[1] 20         # sample size of x
[1] 50.01483   # sample standard deviation of x

summary(y); length(y); sd(y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  458.9   526.2   547.8   547.1   576.2   603.0
[1] 20
[1] 38.80676

hdr = "Stripchart for Samples 'x' (bottom) and 'y'"
stripchart(list(x,y), ylim=c(.5,2.5), pch="|", main=hdr)

[Figure: stripchart of samples 'x' (bottom) and 'y']

The sample mean $\bar X = 486.1$ is smaller than the sample mean $\bar Y = 547.1.$ And the stripchart shows x-values generally smaller than y-values. The question is whether this difference between instruments is statistically significant at the 5% level. (That is, when there is really no difference, about 5 in 100 such tests would mistakenly indicate one.)

The P-value of the Welch test is nearly $0.$ There is almost no chance of such a large difference in sample means (taking variability and sample size into account) if the two instruments were really giving equivalent results.

t.test(x,y)

    Welch Two Sample t-test

data:  x and y
t = -4.3092, df = 35.791, p-value = 0.0001222
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -89.71150 -32.28332
sample estimates:
mean of x mean of y
 486.0894  547.0868
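To show where the numbers reported by `t.test` come from, the sketch below recomputes the Welch statistic, the approximate degrees of freedom, and the two-sided P-value by hand and compares them with the `t.test` result. It regenerates the same fictitious data with the same seed; the helper names `vx`, `vy`, `t_stat`, `df_w`, and `p_val` are introduced only for this illustration.

```r
# Same fictitious data as in the example
set.seed(2022)
x <- rnorm(20, 500, 50)
y <- rnorm(20, 540, 52)

# Estimated variances of the two sample means
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch test statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_w   <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided P-value from Student's t distribution
p_val <- 2 * pt(-abs(t_stat), df_w)

# Compare with the built-in Welch test
res <- t.test(x, y)
all.equal(unname(res$statistic), t_stat)
all.equal(unname(res$parameter), df_w)
all.equal(res$p.value, p_val)
```

The degrees of freedom are generally not an integer, which is why the output of `t.test` reports a fractional value for `df`.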

Notes: (1) Of course the decision to reject $H_0$ above depends on the fictitious data I used for my example. Your real data might lead to a different conclusion.

(2) If previous experience has shown that the two devices have the same variability in results, you might do a pooled two-sample t test instead.

(3) If previous experience or a look at the current data casts doubt on whether the devices give normally distributed results, then you might do a nonparametric Wilcoxon rank sum test. (But you'd have to look at the specific requirements for doing that test.)

(4) There are some links to other Q&A's in the margin of this page under the heading 'Related'. I did not find direct answers to your question there. However, you may learn something by looking at them. A couple of the answers are mine.

These links discuss tests of normality (maybe useful), a one-sample t test (not useful), and a Welch t test (different setting, but maybe useful). This additional link shows that the Mann-Whitney test is equivalent to the Wilcoxon rank sum test, and has a link to the latter.
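The alternatives in Notes (2) and (3) can be sketched in R as follows, again with the same fictitious data. This is only an illustration of the calls involved. Using the Shapiro-Wilk test as the normality check is an assumption of this sketch (one of several possible checks), not something the data require.

```r
# Same fictitious data as in the example
set.seed(2022)
x <- rnorm(20, 500, 50)
y <- rnorm(20, 540, 52)

# Note (2): pooled two-sample t test, which assumes equal population variances
pooled <- t.test(x, y, var.equal = TRUE)
pooled$parameter   # degrees of freedom: 20 + 20 - 2 = 38

# Note (3): one possible normality check for each sample (Shapiro-Wilk)
sw_x <- shapiro.test(x)
sw_y <- shapiro.test(y)

# Note (3): nonparametric Wilcoxon rank sum (Mann-Whitney) test
wrs <- wilcox.test(x, y)
wrs$p.value
```

With `var.equal = TRUE` the degrees of freedom are always $n_x + n_y - 2,$ whereas the default Welch test estimates them from the data.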

BruceET
  • 51,500