Could a version of probability theory be made rigorous with only calculus?

Question

I am wondering whether one could build, or whether someone has already built, a version of probability theory that is rigorous on its own, without real analysis and measure theory.

The motivation for this question is that almost every introductory text in mathematical statistics I've seen relies only on calculus as a prerequisite. Could one rely only on calculus and multivariate calculus and still do this rigorously? In introductory texts random variables are described as random functions, but they are characterized by their pmf or pdf. Would this be enough?

So if we restrict ourselves to what are called discrete or continuous random variables (or vectors) in introductory statistics texts, are calculus and multivariate calculus enough for the theory to stand on its own, or will we need measure theory?

If this is impossible to do without measure theory, does this mean that most introductory texts in statistics are wrong, since they base everything on Riemann integrals?

UPDATE: From the discussion in the comments we see that, by the definition of a probability space, the probability $P$ must be a measure and the events must form a sigma-algebra. Hence we need these two concepts from measure theory. So I should have asked the question like this instead: Will the theory of continuous random variables in introductory texts work out if we only allow Riemann integration and not Lebesgue integration? Is the integration theory of introductory calculus texts, which deals only with Riemann integration, enough to make the theory rigorous?

I suspect that one would be limited to obtaining a probability theory of Riemann-integrable domains (in essence, a function with integral over reals=1 that is strictly positive can be used to define a probability measure over riemann integrable domains). — Justin Benfield, Aug 11 '17 at 18:53
What is a probability space without a measure? For dealing with discrete random variables combinatorial manipulations are enough, but there are severe issues in defining continuous random variables without a flexible and general concept of measure. — Jack D'Aurizio, Aug 11 '17 at 19:24
@JackD'Aurizio I agree. From the definition of probability given in elementary statistics books, $P$ must be a measure, and even though it is not stated, the events should form a sigma-algebra in order to satisfy the axioms of probability. So these things are likely needed. But from there, do we need more, or is Riemann integration enough (assuming, of course, that the pdfs are Riemann-integrable)? — user119615, Aug 11 '17 at 22:44
You mention single variable and multivariable calculus. Do you also include the calculus of variations as part of your "calculus" definition? — Larry B., Aug 30 '17 at 23:33
I suspect that most people who study statistics are more interested in the practical aspects of statistics than in mathematical rigor. — David K, Aug 30 '17 at 23:43
In pure mathematics, there isn't a formal distinction between "calculus" and "real analysis". This question is therefore unclear. — MathematicsStudent1122, Aug 30 '17 at 23:45
@LarryB. I don't see the relevance of calculus of variations here, since I am talking about this in regards to introductory texts in statistics. — user119615, Aug 31 '17 at 04:59
@DavidK I agree; however, I want to know whether the books could be read rigorously if one wanted to. — user119615, Aug 31 '17 at 05:00
@MathematicsStudent1122 There is no formal definition, but I suspect that in most universities what is taught in the earlier courses in calculus and what is taught later in real analysis are somewhat similar. I have made the question a little clearer now: is Riemann integration enough, or does one also need the theory of Lebesgue integration? — user119615, Aug 31 '17 at 05:03
Similar to https://stats.stackexchange.com/questions/199280/why-do-we-need-sigma-algebras-to-define-probability-spaces — Dap, Aug 31 '17 at 06:16
@Justin Benfield I agree with Jack D'Aurizio. The tool for finding probability and probability magnitudes should be deterministically measurable... somewhat like a machine that produces grease need not itself be all greasy. Gaussian Bell curve comes out of a deterministic differential relation $ \dfrac{dy}{dx}= - xy /\sigma^2$. — Narasimham, Aug 31 '17 at 07:11
Pushing a little further in the direction of "do we really need Lebesgue integration," do we even need Riemann integration? There's a question that asks whether the Cauchy integral is enough. — David K, Aug 31 '17 at 11:38
One place where many introductory texts in probability and statistics tend to go "trust me" that requires more sophistication than a typical calculus student would know to justify is differentiating under the integral sign (e.g. in showing that if $M_X(t)$ is the moment generating function of $X$, then $M'(0)=E(X)$ ). — Kevin P. Costello, Aug 31 '17 at 20:47
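The step mentioned in the last comment can be written out as a short sketch. It assumes (these assumptions are not stated in the comment) that $X$ has a density $f$ and that $M_X(t)$ is finite for $|t|<\delta$ with some $\delta>0$:

$$M_X(t)=\int_{-\infty}^{\infty} e^{tx}f(x)\,dx, \qquad M_X'(t)=\frac{d}{dt}\int_{-\infty}^{\infty} e^{tx}f(x)\,dx=\int_{-\infty}^{\infty} x\,e^{tx}f(x)\,dx, \qquad M_X'(0)=\int_{-\infty}^{\infty} x f(x)\,dx=E(X).$$

The second equality, which moves $d/dt$ inside the integral, is the one that needs justification. One standard route is dominated convergence: fix $0<\delta'<\delta$; then for $|t|\le\delta'/2$ the integrand $|x\,e^{tx}|f(x)$ is bounded by a constant multiple of $(e^{\delta' x}+e^{-\delta' x})f(x)$, which is integrable because $M_X(\pm\delta')$ is finite.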

goblin GONE · Answer 1 · 2017-08-31T11:58:09.280

Measure theory is nice because it gives us a one-size fits all approach to probability theory. For example, if $D$ is a probability distribution on the real line, then the expectation of $D$ is, by definition, $$\mathbf{E}(D) = \int_D (x \in \mathbb{R} \mapsto x),$$ where the right hand side means the Lebesgue integral of the function $\mathbb{R} \rightarrow \mathbb{R}$ given by $x \mapsto x$ with respect to measure $D$. This is normally denoted $\int_{x \in \mathbb{R}}xD(dx),$ which looks a bit strange to my eyes. Anyway...

We recover the various formulae that are actually used to perform computations of the expectation as special cases of the above formula. For example, if $D$ is the discrete distribution corresponding to a probability mass function $p$, then the above formula simplifies to $$\mathbf{E}(D) = \sum_{x \in \mathbb{R}} xp(x).$$ On the other hand, if $D$ is the continuous distribution corresponding to a probability density function $f$, then it simplifies to: $$\mathbf{E}(D) = \int_{x \in \mathbb{R}} xf(x).$$

Of course, $D$ might have a discrete and a continuous part, in which case we have: $$\mathbf{E}(D) = \sum_{x \in \mathbb{R}} xp(x)+\int_{x \in \mathbb{R}} xf(x).$$

If that's all you care about, then in principle you can just write the above formula as a definition and be done with it. However, in my opinion, it's much more satisfying to derive the above formulae. In particular, writing $H_n$ for the $n$-dimensional Hausdorff measure on the real line, it turns out that the discrete case is $D = p \cdot H_0$ and the continuous case is $D = f \cdot H_1$. The mixed case is $D = p\cdot H_0 + f \cdot H_1$. Therefore, we can derive the above formula as follows:

$$\mathbf{E}(p \cdot H_0+f\cdot H_1) = \int_{p \cdot H_0+f\cdot H_1} x = \int_{p \cdot H_0} x + \int_{f \cdot H_1} x$$

$$= \int_{H_0} xp(x) + \int_{H_1} xf(x) = \sum_{x \in \mathbb{R}} xp(x)+\int_{x \in \mathbb{R}} xf(x)$$

Once you've seen this kind of reasoning, it becomes obvious how to generalize; just include summands involving $H_n$ for $0 < n < 1$. So the measure theoretic approach is really much more general, and imo much more satisfying.
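As a numerical illustration of the mixed-case formula, here is a minimal Python sketch. The function names, the dictionary format for the atoms, and the particular distribution (mass $1/2$ at $x=2$ plus density $1/2$ on $[0,1]$) are all assumptions made for the example. The continuous part is approximated by a plain midpoint Riemann sum.

```python
def midpoint_riemann(g, a, b, n=100_000):
    """Midpoint Riemann sum of g over [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def mixed_expectation(atoms, density, a, b, n=100_000):
    """E(D) for D = p*H_0 + f*H_1: a sum over atoms plus a Riemann integral.

    atoms   -- dict {x: p(x)} for the discrete part (hypothetical input format)
    density -- the function f, assumed to vanish outside [a, b]
    """
    discrete_part = sum(x * p for x, p in atoms.items())
    continuous_part = midpoint_riemann(lambda x: x * density(x), a, b, n)
    return discrete_part + continuous_part


# Illustrative choice: mass 1/2 at x = 2, plus density 1/2 on [0, 1].
atoms = {2.0: 0.5}
density = lambda x: 0.5 if 0.0 <= x <= 1.0 else 0.0

total_mass = sum(atoms.values()) + midpoint_riemann(density, 0.0, 1.0)
expectation = mixed_expectation(atoms, density, 0.0, 1.0)
# Exact value: 2 * 1/2 + (integral of x/2 over [0, 1]) = 1 + 1/4 = 1.25
```

Nothing beyond a finite sum and a Riemann integral is used here, which is the sense in which the formula can simply be taken as a definition in this special case.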

And that's just on the real line. Imagine you're working in real $3$-space. The measure $H_0$ lets you describe the probabilistic analogue of "point charges", and the measure $H_3$ lets you describe the probabilistic analogue of "charge densities." Okay, but what if you want your outcomes to be randomly distributed along a wire coiling through space? In that context, $H_1$ comes to the rescue. So even if you're not interested in those weird distributions arising from $H_n$ via non-integral values of $n$, the added generality is still pretty useful.
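To make the wire example concrete, here is a small Python sketch under stated assumptions: the wire is one turn of a helix (my choice, not from the text), the distribution is uniform with respect to arc length, and integrals against $H_1$ on a smooth curve $\gamma$ are computed as $\int g(\gamma(t))\,|\gamma'(t)|\,dt$. All names are hypothetical.

```python
import math


def expectation_along_curve(g, curve, speed, t0, t1, n=10_000):
    """E[g] under the normalized arc-length (H_1) measure on a curve.

    curve(t) -> (x, y, z) parametrizes the wire for t in [t0, t1];
    speed(t) -> |curve'(t)|, so that dH_1 = speed(t) dt along the curve.
    Both integrals are approximated by midpoint Riemann sums.
    """
    h = (t1 - t0) / n
    mids = [t0 + (k + 0.5) * h for k in range(n)]
    length = h * sum(speed(t) for t in mids)
    weighted = h * sum(g(curve(t)) * speed(t) for t in mids)
    return weighted / length


# Illustrative wire: one turn of a helix with pitch parameter c.
c = 0.5
helix = lambda t: (math.cos(t), math.sin(t), c * t)
helix_speed = lambda t: math.sqrt(1.0 + c * c)  # constant for a helix

mean_z = expectation_along_curve(lambda p: p[2], helix, helix_speed, 0.0, 2 * math.pi)
mean_x = expectation_along_curve(lambda p: p[0], helix, helix_speed, 0.0, 2 * math.pi)
# By symmetry one expects mean_z = c * pi and mean_x = 0.
```

For a smooth wire like this one, the computation reduces to ordinary one-variable integrals; the Hausdorff measure is what tells you which integrals to write down.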

By the way, you can't get too far in calculus without the Dirac delta function, which can be elegantly thought of as a measure, or better yet, a distribution. Indeed, distributions were invented because at some point while doing calculus, you realize that functions aren't general enough to do what you need them to do. This eventually leads to the theory of $k$-currents, which, among other things, provides an elegant reinterpretation of the fundamental theorem of calculus. In short, you'll end up doing measure theory and distribution theory one way or another.

H. H. Rugh · Answer 2 · 2017-09-02T07:40:22.823

It may be convenient to distinguish between probability theory and (practical) statistical analysis. In probability theory I don't think it is reasonable to try to avoid measure theory. The whole subject just waited for the emergence of measure theory to get a real foundation. The central limit theorem (CLT) somehow pops out most easily when using Lebesgue integration and Fourier transforms. This being said, as far as I know, Gauss did "prove" the central limit theorem using "reasonable" approximations and standard Riemann integration.

Now, coming to practical statistics, (IMHO) this mostly has to do with discrete observations where some more or less justifiable application of the CLT gives rise to e.g. chi-square tests, maximum likelihood estimators, etc., and this usually involves only algebra and differential calculus. I am probably doing an over-reduction here, but even a course on statistics I once took was mostly about putting problems into a framework where orthogonal projections in Hilbert spaces gave the relevant answers. Some statistical examples to illustrate the point:

Example 1: Throw a die $N=18000$ times and observe the number $X_1$ of 1's, $X_2$ of 2's, etc. The hypothesis to be tested is equi-distribution. The observations are random variables and, appealing to the CLT, each is roughly normally distributed with mean $Np=3000$ ($p=1/6$) and standard deviation $\sigma=\sqrt{Np(1-p)}=50$. So after normalizing, $$Z_1=\frac{X_1-Np}{\sigma}, ... ,Z_6 = \frac{X_6-Np}{\sigma} $$ should each be normally distributed ${\cal N}(0,1)$. The chi-square value $S= (1-p)\sum_{i=1}^6 Z_i^2 = \sum_{i=1}^6 (X_i-Np)^2/(Np)$ is then $\chi^2(5)$ distributed, i.e. has the same distribution as the square sum of 5 standard normal variables. There is a catch here, because although there are 6 variables they are linearly related by $\sum_i Z_i=0$, and the vector $\sqrt{1-p}\,(Z_1,...,Z_6)$ is in fact an orthogonal projection of 6 standard gaussian variables onto the 5-dimensional hyperplane $\sum_i z_i=0$ (the factor $1-p$ accounts for the negative correlations between the $Z_i$). This needs some finite dimensional Hilbert space arguments but no Lebesgue integration theory, which has been swept under the carpet by the CLT. Calculating the distribution function for a $\chi^2(5)$ random variable is done by Riemann integration, and tables of the distribution function are then used to do tests at various confidence levels. Given the tables, the student doesn't even have to worry about how to calculate the distribution function, only to understand how the number 5 came about.
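A simulation of Example 1 takes only a few lines of Python. This is a sketch under assumptions of my own: the function names and the seed are arbitrary. The statistic is the Pearson form $\sum_i (X_i-Np)^2/(Np)$, which equals $(1-p)\sum_i Z_i^2$ in the notation above, and the cutoff $11.07$ is the usual tabulated 95% value for $\chi^2(5)$.

```python
import random


def throw_die(n_throws, rng):
    """Return counts [X_1, ..., X_6] from n_throws throws of a fair die."""
    counts = [0] * 6
    for _ in range(n_throws):
        counts[rng.randrange(6)] += 1
    return counts


def pearson_statistic(counts):
    """Pearson statistic sum_i (X_i - N p)^2 / (N p) with p = 1/len(counts)."""
    n_throws = sum(counts)
    expected = n_throws / len(counts)
    return sum((x - expected) ** 2 / expected for x in counts)


rng = random.Random(2017)  # fixed seed, chosen arbitrarily for reproducibility
counts = throw_die(18000, rng)
statistic = pearson_statistic(counts)

# Compare with the tabulated 95% value for chi-square with 5 degrees of
# freedom; equi-distribution is rejected at the 5% level if it is exceeded.
reject = statistic > 11.07
```

As the answer says, the user of the test only needs the table value and the number of degrees of freedom; no integration appears in the code at all.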

Example 2: Given $N$ observations $(x_i,y_i)$, $i=1,...,N$, find out if they are on a line $y=ax+b$. Mostly, one assumes the $x_i$ to be exact values but each $y_i$ to be a normal random variable with some variance $\sigma_i^2>0$ (for non-normal distributions, the problem is usually an unsolvable mess). A square distance is constructed $$S=\sum_i \frac{(y_i-a x_i-b)^2}{\sigma_i^2}$$ and estimators for $a$ and $b$ are found by minimizing $S$, e.g. by setting the derivatives w.r.t. $a$ and $b$ to zero. Alternatively (assuming for simplicity $\sigma_i$ independent of $i$) one can model this in a Hilbert space of dimension $N$: Write $X=(x_1,...,x_N)$ and $e=(1,...,1)$. You then find the orthogonal projection of $Y=(y_1,...,y_N)$ onto the 2-dimensional plane spanned by $X$ and $e$. Then again the minimal value of $S$ is $\chi^2(N-2)$ distributed (the $-2$ because of the 2-dimensional projection). You may again do various statistical tests by projecting to e.g. the subspace spanned by $X$ (to test if $b=0$), etc.
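The "set the derivatives to zero" step of Example 2 can be sketched in Python as follows. The function name and the sample data are hypothetical. Setting $\partial S/\partial a=\partial S/\partial b=0$ gives two linear equations in $a$ and $b$, which the code solves in closed form.

```python
def weighted_line_fit(xs, ys, sigmas):
    """Minimize S(a, b) = sum_i (y_i - a x_i - b)^2 / sigma_i^2.

    Setting dS/da = dS/db = 0 gives the linear system
        a * sxx + b * sx = sxy
        a * sx  + b * sw = sy
    with weights w_i = 1 / sigma_i^2. Returns (a, b, S_min).
    """
    ws = [1.0 / s ** 2 for s in sigmas]
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx  # nonzero unless all x_i coincide
    a = (sw * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    s_min = sum(w * (y - a * x - b) ** 2 for w, x, y in zip(ws, xs, ys))
    return a, b, s_min


# Made-up data scattered around y = 2x + 1, with equal sigma_i = 0.2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
a_hat, b_hat, s_min = weighted_line_fit(xs, ys, [0.2] * 4)
# s_min would then be compared with a chi-square table for N - 2 = 2
# degrees of freedom.
```

Only algebra and differentiation are needed to obtain the estimators; the probabilistic content sits entirely in the claim about the distribution of the minimal $S$.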

There are many more examples of this type, e.g. looking for correlations/independence of two types of observations, again leading to chi-square tests under a CLT assumption. I am not a specialist in statistics, but when teaching how to use it in practice I doubt that measure theory is essential.

score 1 · Answer 3 · answered Sep 03 '17 at 02:05

This might be of interest:

The Riemann integral is the definite integral normally encountered in calculus texts and used by physicists and engineers. Other types of integrals exist (e.g., the Lebesgue integral), but are unlikely to be encountered outside the confines of advanced mathematics texts. In fact, according to Jeffreys and Jeffreys (1988, p. 29), "it appears that cases where these methods [i.e., generalizations of the Riemann integral] are applicable and Riemann's [definition of the integral] is not are too rare in physics to repay the extra difficulty."

See: Weisstein, Eric W. "Riemann Integral." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/RiemannIntegral.html

See also Reference request for complete probability text without measure theory

Looking at those references, you can keep busy absorbing a great deal of material, and I don't see how it wouldn't 'work out' if you like the subject.

Depending on the trajectory of your interests and career, you can decide when the time is right, if ever, to study measure theory. I would think that if you are not planning to get deeply involved with functional analysis, it might be enough to just remember that

Infinitely large sample spaces
In an elementary approach to probability, any subset of the sample space is usually called an event. However, this gives rise to problems when the sample space is continuous, so that a more precise definition of an event is necessary. Under this definition only measurable subsets of the sample space, constituting a σ-algebra over the sample space itself, are considered events.

Finally, this might be helpful

Any elementary-level text or notes on (i) measure theory? (ii) Lebesgue integration?

score 1 · Answer 4 · answered Sep 05 '17 at 16:44

Elementary probability theory can be formulated in terms of the Kurzweil-Henstock-Stieltjes integral, as treated by Muldowney:

http://eu.wiley.com/WileyCDA/WileyTitle/productCd-111816640X.html

Then there is also Edward Nelson's approach in Radically Elementary Probability Theory:

https://web.math.princeton.edu/~nelson/books/rept.pdf
