Measure theory is nice because it gives us a one-size-fits-all approach to probability theory. For example, if $D$ is a probability distribution on the real line, then the expectation of $D$ is, by definition, $$\mathbf{E}(D) = \int_D (x \in \mathbb{R} \mapsto x),$$ where the right hand side means the Lebesgue integral of the function $\mathbb{R} \rightarrow \mathbb{R}$ given by $x \mapsto x$ with respect to the measure $D$. This is normally denoted $\int_{x \in \mathbb{R}}xD(dx),$ which looks a bit strange to my eyes. Anyway...
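To make "Lebesgue integral with respect to $D$" a bit more tangible: the Lebesgue integral slices the *range* of the integrand instead of its domain. Here's a minimal sketch, assuming $D$ lives on $[0, \infty)$ and is given by its cumulative distribution function $F(x) = D((-\infty, x])$; the names `lebesgue_mean` and `exp_cdf`, the step `h`, and the cutoff `upper` are just illustrative choices. It approximates $x \mapsto x$ from below by a simple function and adds up "value times measure of the slab", using $D((a,b]) = F(b) - F(a)$.

```python
import math
from typing import Callable


def lebesgue_mean(cdf: Callable[[float], float], h: float = 1e-3, upper: float = 50.0) -> float:
    """Lower Lebesgue sum for E(D) = ∫ x D(dx), assuming D is supported on [0, ∞).

    The simple function used takes the value k*h on the slab (k*h, (k+1)*h],
    which is below the identity there, and D((a, b]) = cdf(b) - cdf(a).
    Mass above `upper` is ignored (an assumption: the tail must be negligible).
    """
    total = 0.0
    for k in range(int(upper / h)):
        a, b = k * h, (k + 1) * h
        total += a * (cdf(b) - cdf(a))  # value on the slab times D(slab)
    return total


def exp_cdf(x: float) -> float:
    """CDF of the exponential distribution with rate 1 (true mean 1)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


print(lebesgue_mean(exp_cdf))  # slightly below 1
```

For the exponential distribution the lower sum works out to $h/(e^h - 1)$, which tends to the true mean $1$ as $h \to 0$. Because it only needs $F$, the same routine also handles distributions with atoms (jumps in $F$).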
The formulae that are actually used to compute expectations are recovered as special cases of the above formula. For example, if $D$ is the discrete distribution corresponding to a probability mass function $p$, then the above formula simplifies to $$\mathbf{E}(D) = \sum_{x \in \mathbb{R}} xp(x),$$ where only the countably many $x$ with $p(x) \neq 0$ contribute. On the other hand, if $D$ is the continuous distribution corresponding to a probability density function $f$, then it simplifies to $$\mathbf{E}(D) = \int_{x \in \mathbb{R}} xf(x).$$
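Concretely, the two special cases look like this in code. This is a minimal sketch, assuming the mass function is given as a finite dict and the density is integrated numerically with the midpoint rule on a finite interval; the helper names and the examples (a fair die, an exponential with rate 2) are illustrative.

```python
import math
from typing import Callable, Dict


def mean_discrete(pmf: Dict[float, float]) -> float:
    """E(D) = sum of x * p(x) over the (finite, here) support of p."""
    return sum(x * px for x, px in pmf.items())


def mean_continuous(pdf: Callable[[float], float], lo: float, hi: float, n: int = 100_000) -> float:
    """E(D) = ∫ x f(x) dx, approximated by the midpoint rule on [lo, hi].

    Assumes the density is negligible outside [lo, hi].
    """
    h = (hi - lo) / n
    return h * sum((lo + (i + 0.5) * h) * pdf(lo + (i + 0.5) * h) for i in range(n))


die = {float(k): 1 / 6 for k in range(1, 7)}
print(mean_discrete(die))  # about 3.5

rate = 2.0
print(mean_continuous(lambda x: rate * math.exp(-rate * x), 0.0, 40.0))  # about 1/rate = 0.5
```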
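Of course, $D$ might have both a discrete and a continuous part (so that $p$ and $f$ each carry only part of the total mass, with $\sum_{x \in \mathbb{R}} p(x) + \int_{x \in \mathbb{R}} f(x) = 1$), in which case we have: $$\mathbf{E}(D) = \sum_{x \in \mathbb{R}} xp(x)+\int_{x \in \mathbb{R}} xf(x).$$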
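As a sanity check on the mixed formula, you can sample from such a distribution and compare the sample mean with the sum-plus-integral. A minimal sketch, assuming an illustrative example: an atom of mass $1/2$ at $x = 1$, plus a uniform density $f = 1/8$ on $[0, 4]$ (carrying the other $1/2$). The formula gives $\tfrac12 \cdot 1 + \int_0^4 \tfrac{x}{8} = \tfrac12 + 1 = 1.5$.

```python
import random

ATOM, ATOM_MASS = 1.0, 0.5          # discrete part: p(1) = 0.5
LO, HI, CONT_MASS = 0.0, 4.0, 0.5   # continuous part: f = CONT_MASS / (HI - LO) on [LO, HI]
assert abs(ATOM_MASS + CONT_MASS - 1.0) < 1e-12  # total mass must be 1


def exact_mean() -> float:
    discrete = ATOM * ATOM_MASS                  # sum of x p(x)
    continuous = CONT_MASS * (LO + HI) / 2       # ∫ x f(x) for a uniform f of mass CONT_MASS
    return discrete + continuous


def sample(rng: random.Random) -> float:
    # With probability ATOM_MASS return the atom, otherwise draw from the (normalized) density.
    if rng.random() < ATOM_MASS:
        return ATOM
    return rng.uniform(LO, HI)


rng = random.Random(0)
n = 200_000
mc = sum(sample(rng) for _ in range(n)) / n
print(exact_mean(), mc)  # 1.5 and a Monte Carlo estimate close to it
```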
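If that's all you care about, then in principle you can just take the above formulae as definitions and be done with it. However, in my opinion, it's much more satisfying to derive them. Write $H_n$ for the $n$-dimensional Hausdorff measure on the real line (so $H_0$ is counting measure and $H_1$ is Lebesgue measure), and for a nonnegative function $g$ and a measure $\mu$, write $g \cdot \mu$ for the measure with density $g$ with respect to $\mu$, i.e. $(g \cdot \mu)(A) = \int_{\mu} g 1_A$. It turns out that the discrete case is $D = p \cdot H_0$ and the continuous case is $D = f \cdot H_1$, while the mixed case is $D = p\cdot H_0 + f \cdot H_1$. Since the integral is additive in the measure and $\int_{g \cdot \mu} h = \int_{\mu} hg$, we can derive the above formula as follows: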
$$\mathbf{E}(p \cdot H_0+f\cdot H_1) = \int_{p \cdot H_0+f\cdot H_1} x = \int_{p \cdot H_0} x + \int_{f \cdot H_1} x$$
$$= \int_{H_0} xp(x) + \int_{H_1} xf(x) = \sum_{x \in \mathbb{R}} xp(x)+\int_{x \in \mathbb{R}} xf(x)$$
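The derivation above is really just two rules: integration is additive in the measure, and $\int_{g \cdot \mu} h = \int_\mu hg$. Here's a toy sketch that encodes exactly those rules, treating a measure as "something you can integrate against". The classes are hypothetical, and to make them computable I assume $H_0$ is restricted to a finite set of points and $H_1$ to a finite interval (integrated by the midpoint rule). The example has an atom of mass $0.3$ at $x = 2$ plus the density $0.7e^{-x}$ on $[0, \infty)$, so the expectation should be $0.3 \cdot 2 + 0.7 \cdot 1 = 1.3$.

```python
import math
from typing import Callable, Iterable

Fn = Callable[[float], float]


class Measure:
    """Hypothetical minimal interface: integrate(g) returns ∫ g dμ."""

    def integrate(self, g: Fn) -> float:
        raise NotImplementedError

    def __add__(self, other: "Measure") -> "Measure":
        return SumMeasure(self, other)


class SumMeasure(Measure):
    def __init__(self, mu: Measure, nu: Measure):
        self.mu, self.nu = mu, nu

    def integrate(self, g: Fn) -> float:
        # ∫ g d(μ + ν) = ∫ g dμ + ∫ g dν
        return self.mu.integrate(g) + self.nu.integrate(g)


class WithDensity(Measure):
    """The measure g·μ, i.e. density g with respect to μ."""

    def __init__(self, density: Fn, base: Measure):
        self.density, self.base = density, base

    def integrate(self, g: Fn) -> float:
        # ∫ g d(p·μ) = ∫ g p dμ
        return self.base.integrate(lambda x: g(x) * self.density(x))


class Counting(Measure):
    """H_0 restricted to a finite set of points: ∫ g dH_0 = Σ g(x)."""

    def __init__(self, points: Iterable[float]):
        self.points = list(points)

    def integrate(self, g: Fn) -> float:
        return sum(g(x) for x in self.points)


class Lebesgue(Measure):
    """H_1 restricted to [lo, hi], integrated with the midpoint rule."""

    def __init__(self, lo: float, hi: float, n: int = 100_000):
        self.lo, self.hi, self.n = lo, hi, n

    def integrate(self, g: Fn) -> float:
        h = (self.hi - self.lo) / self.n
        return h * sum(g(self.lo + (i + 0.5) * h) for i in range(self.n))


def expectation(D: Measure) -> float:
    return D.integrate(lambda x: x)


p = lambda x: 0.3 if x == 2.0 else 0.0
f = lambda x: 0.7 * math.exp(-x) if x >= 0 else 0.0
D = WithDensity(p, Counting([2.0])) + WithDensity(f, Lebesgue(0.0, 40.0))
print(expectation(D))  # about 1.3
```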
Once you've seen this kind of reasoning, it becomes obvious how to generalize: just add summands involving $H_n$ for $0 < n < 1$. So the measure-theoretic approach is much more general and, imo, much more satisfying.
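For a concrete example of such a distribution: the Cantor distribution is a normalized restriction of $H_n$ to the Cantor set, with $n = \log 2 / \log 3 \approx 0.63$. It has no atoms and no density, so neither the sum nor the integral formula applies on its own, but its expectation still makes perfect sense. A minimal sketch, assuming we sample by choosing ternary digits uniformly from $\{0, 2\}$ and truncating after 40 digits (an approximation with error below $3^{-40}$); linearity of expectation gives the exact mean $\sum_k 3^{-k} = 1/2$.

```python
import math
import random

DIM = math.log(2) / math.log(3)  # Hausdorff dimension of the Cantor set


def sample_cantor(rng: random.Random, digits: int = 40) -> float:
    # Each ternary digit is 0 or 2 with probability 1/2 (one random bit per digit).
    bits = rng.getrandbits(digits)
    return sum(2 * ((bits >> (k - 1)) & 1) / 3**k for k in range(1, digits + 1))


def exact_mean(digits: int = 40) -> float:
    # E(digit_k) = 1, so E(X) = sum over k of 1 / 3^k, which tends to 1/2.
    return sum(1 / 3**k for k in range(1, digits + 1))


rng = random.Random(1)
n = 20_000
estimate = sum(sample_cantor(rng) for _ in range(n)) / n
print(DIM, exact_mean(), estimate)  # estimate should be close to 1/2
```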
And that's just on the real line. Imagine you're working in real $3$-space. The measure $H_0$ lets you describe the probabilistic analogue of "point charges", and the measure $H_3$ lets you describe the probabilistic analogue of "charge densities." Okay, but what if you want your outcomes to be randomly distributed along a wire coiling through space? In this context, $H_1$ (restricted to the wire) comes to the rescue. So even if you're not interested in the weird distributions arising from $H_n$ with non-integral $n$, the added generality is still pretty useful.
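For the wire, integrating against $H_1$ restricted to a smooth, injective curve $\gamma : [t_0, t_1] \to \mathbb{R}^3$ amounts to $\int g \, dH_1 = \int_{t_0}^{t_1} g(\gamma(t))\,|\gamma'(t)|\,dt$, i.e. you weight by the speed. Here's a minimal sketch computing the expected position of a point distributed uniformly along the wire (density constant with respect to $H_1$); the helix, its pitch `c`, and the helper name are illustrative assumptions. For the helix $(\cos t, \sin t, ct)$ with $t \in [0, 4\pi]$, the answer should be $(0, 0, 2\pi c)$.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def mean_on_curve(gamma: Callable[[float], Vec3], speed: Callable[[float], float],
                  t0: float, t1: float, n: int = 10_000) -> Vec3:
    """Expected position for the uniform distribution w.r.t. H_1 on gamma([t0, t1]).

    Uses ∫ g dH_1 = ∫ g(gamma(t)) |gamma'(t)| dt (gamma assumed smooth and injective),
    approximated with the midpoint rule, then divides by the total length.
    """
    h = (t1 - t0) / n
    length = 0.0
    acc = [0.0, 0.0, 0.0]
    for i in range(n):
        t = t0 + (i + 0.5) * h
        w = speed(t) * h  # length element |gamma'(t)| dt
        point = gamma(t)
        length += w
        for j in range(3):
            acc[j] += point[j] * w
    return (acc[0] / length, acc[1] / length, acc[2] / length)


c = 0.5  # pitch parameter of the helix
helix = lambda t: (math.cos(t), math.sin(t), c * t)
helix_speed = lambda t: math.sqrt(1 + c * c)  # |gamma'(t)| is constant for a helix
print(mean_on_curve(helix, helix_speed, 0.0, 4 * math.pi))  # about (0, 0, pi)
```

For a curve whose speed varies, the `speed` weighting is what makes this uniform along the wire rather than uniform in the parameter $t$.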
By the way, you can't get too far in calculus without the Dirac delta function, which can be elegantly thought of as a measure, or better yet, a distribution. Indeed, distributions were invented because at some point while doing calculus, you realize that functions aren't general enough to do what you need them to do. This eventually leads to the theory of $k$-currents, which, among other things, provides an elegant reinterpretation of the fundamental theorem of calculus. In short, you'll end up doing measure theory and distribution theory one way or another.
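Here's that reinterpretation in the simplest setting, the real line (standard definitions, lightly simplified). View the interval $[a,b]$ as the $1$-current $T$ that integrates $1$-forms over it, $T(\omega) = \int_a^b \omega$, and define the boundary of a current by duality with the exterior derivative: $\partial T(\varphi) = T(d\varphi)$ for smooth functions ($0$-forms) $\varphi$. Then $$\partial T(\varphi) = T(d\varphi) = \int_a^b \varphi'(x)\,dx,$$ while $$(\delta_b - \delta_a)(\varphi) = \varphi(b) - \varphi(a).$$ So the fundamental theorem of calculus says exactly that $\partial T = \delta_b - \delta_a$: the boundary of an interval is a signed combination of Dirac deltas.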