From the first instrument you have crushing
pressures $X_1, X_2, \dots, X_{20}.$
Similarly, from the second instrument you
have $Y_1, Y_2, \dots, Y_{20}.$
Perhaps you assume that the $X_i$ are
a random sample from a normal distribution
$\mathsf{Norm}(\mu_x, \sigma_x),$ where
the population mean $\mu_x$ is estimated
by $\bar X = \frac 1 n \sum_{i=1}^n X_i$
and the population standard deviation $\sigma_x$
is estimated by
$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2},$ where $n = 20.$ The same applies to the observations $Y_i.$
Formally, you want to test whether the null hypothesis
$H_0: \mu_x = \mu_y$ is true or whether the alternative
$H_a: \mu_x \ne\mu_y$ is true.
Welch two-sample t test: I assume you have no way to know whether $\sigma_x = \sigma_y.$ If so, you need to do a Welch two-sample t test,
which does not assume population variances (or standard deviations) are equal.
Formulas for finding the test statistic $T$ and for
deciding whether to reject $H_0$ in favor of the alternative hypothesis
$H_a$ are given in many elementary statistics textbooks
and online forums. Also, software programs are available
to perform the Welch t test. One of these is R statistical
software.
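
For reference, here is a sketch of the usual textbook form of those formulas, in the notation above. The symbols $n_x$ and $n_y$ (the two sample sizes, both $20$ here) and $\nu$ (the approximate degrees of freedom) are introduced only for this illustration:

$$T = \frac{\bar X - \bar Y}{\sqrt{\frac{S_x^2}{n_x} + \frac{S_y^2}{n_y}}}, \qquad
\nu = \frac{\left(\frac{S_x^2}{n_x} + \frac{S_y^2}{n_y}\right)^2}{\frac{(S_x^2/n_x)^2}{n_x-1} + \frac{(S_y^2/n_y)^2}{n_y-1}}.$$

Under $H_0,$ $T$ has approximately Student's t distribution with $\nu$ degrees of freedom. You reject $H_0$ at the 5% level if $|T|$ exceeds the 97.5th percentile of that distribution or, equivalently, if the two-sided P-value is below $0.05.$
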
Suppose you have the (fictitious) data sampled and summarized below in R.

    set.seed(2022)  # for reproducibility
    x = rnorm(20, 500, 50)
    y = rnorm(20, 540, 52)
    summary(x); length(x); sd(x)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      355.0   452.1   493.4   486.1   519.8   551.0 
    [1] 20        # sample size of x
    [1] 50.01483  # sample standard deviation of x
    summary(y); length(y); sd(y)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      458.9   526.2   547.8   547.1   576.2   603.0 
    [1] 20
    [1] 38.80676

    hdr = "Stripchart for Samples 'x' (bottom) and 'y' (top)"
    stripchart(list(x,y), ylim=c(.5,2.5), pch="|", main=hdr)

The sample mean $\bar X = 486.1$ is smaller than
the sample mean $\bar Y = 547.1.$ And the stripchart
shows x-values generally smaller than y-values.
The question is whether this difference between
instruments is statistically significant at the 5% level.
(That is, if there were really no difference, about 5 in 100
such tests would mistakenly indicate one.)
The P-value of the Welch test is nearly $0,$ far below $0.05,$
so $H_0$ is rejected at the 5% level. There is almost
no chance of such a large difference in sample means
(taking variability and sample size into account)
if the two instruments were really giving equivalent
results.

    t.test(x,y)

            Welch Two Sample t-test

    data:  x and y
    t = -4.3092, df = 35.791, p-value = 0.0001222
    alternative hypothesis: 
      true difference in means is not equal to 0
    95 percent confidence interval:
     -89.71150 -32.28332
    sample estimates:
    mean of x mean of y 
     486.0894  547.0868 

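
To see where the numbers in that output come from, here is a minimal sketch that computes the Welch statistic, its approximate degrees of freedom, and the two-sided P-value directly from the same fictitious data. The variable names `se`, `t.stat`, `df.w`, and `p.val` are my own choices for illustration. The results should agree with the `t`, `df`, and `p-value` reported by `t.test`.

```r
set.seed(2022)   # same seed and fictitious data as in the example
x = rnorm(20, 500, 50)
y = rnorm(20, 540, 52)

# Estimated variance of each sample mean (variances are NOT pooled)
vx = var(x)/length(x)
vy = var(y)/length(y)

# Standard error of the difference in sample means
se = sqrt(vx + vy)

# Welch test statistic
t.stat = (mean(x) - mean(y))/se

# Welch-Satterthwaite approximate degrees of freedom
df.w = (vx + vy)^2 / (vx^2/(length(x)-1) + vy^2/(length(y)-1))

# Two-sided P-value from Student's t distribution
p.val = 2*pt(-abs(t.stat), df.w)

c(t = t.stat, df = df.w, p.value = p.val)
```

Because the degrees of freedom are estimated from the data, they are usually not an integer, which is why the output shows `df = 35.791` rather than $38.$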
Notes: (1) Of course the decision to reject $H_0$ above
depends on the fictitious data I used for my example.
Your real data might lead to a different conclusion.
(2) If previous experience has shown that the
two devices have the same variability in results, you
might do a pooled two-sample t test instead.
(3) If previous experience or a look at the current
data casts doubt on whether the devices give normally
distributed results, then you might do a nonparametric
Wilcoxon rank sum test. (But you'd have to look at the
specific requirements for doing that test.)
(4) There are some links to other Q&A's in the margin
of this page under the heading 'Related'. I did not find
direct answers to your question there. However, you
may learn something by looking at them. A couple of
the answers are mine.
These links discuss tests of normality (maybe useful), a one-sample t test (not useful), and a Welch t test (different setting, but maybe useful). This additional link shows that the Mann-Whitney test is equivalent to the Wilcoxon rank sum test, and has a link to the latter.
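
As a rough sketch of how the alternatives in Notes (2) and (3) could be tried in R on the same fictitious data: the pooled test uses the `var.equal = TRUE` argument of `t.test`, the Shapiro-Wilk test is one of several possible normality checks, and `wilcox.test` with two samples performs the Wilcoxon rank sum test. Which of these is appropriate for your real data depends on the assumptions you can justify, and the object name `pooled` is just for illustration.

```r
set.seed(2022)   # same seed and fictitious data as in the example
x = rnorm(20, 500, 50)
y = rnorm(20, 540, 52)

# Note (2): pooled two-sample t test, which assumes equal population variances
pooled = t.test(x, y, var.equal = TRUE)
pooled$parameter   # degrees of freedom are n + n - 2 for the pooled test
pooled$p.value

# Note (3): Shapiro-Wilk normality test for each sample
# (a small P-value casts doubt on normality)
shapiro.test(x)$p.value
shapiro.test(y)$p.value

# Note (3): two-sided Wilcoxon rank sum (Mann-Whitney) test
wilcox.test(x, y)$p.value
```

With only 20 observations per sample, a normality test has limited power, so a non-significant result there is not strong evidence that the data are normal.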