Some have stated that kurtosis describes the "movement of probability mass from the 'shoulders' of a distribution into its center and tails", where "center" is defined as the range $\mu \pm \sigma$. I tried to find a proof of this but was unsuccessful. Can anyone point me to a proof of whether (i) larger kurtosis implies greater probability in the $\mu \pm \sigma$ range, or (ii) greater probability in the $\mu \pm \sigma$ range implies larger kurtosis?
1 Answer
Can anyone point me to a proof of whether (i) larger kurtosis implies greater probability in the μ±σ range, or (ii) greater probability in the μ±σ range implies larger kurtosis?
These are easily disproved by simple counterexamples. Let $P^*$ denote the "probability in the μ±σ range", i.e. $P^* =P[\mu-\sigma\le X\le\mu+\sigma]$, and let $\kappa$ denote the kurtosis.
Consider the following three symmetric distributions, each having $\mu=0$ and $P^* = P\left[X\in \{-1,1\}\right]$ (computations done in Sage): $$\begin{align}A:\quad P&=[.25,.25,.25,.25] \text{ on } X=[-2,-1,1,2]\implies\ P^*= 0.5;\ \kappa=1.36\\ B:\quad P&=[.30,.20,.20,.30] \text{ on } X=[-9,-1,1,9]\implies\ P^*= 0.4;\ \kappa=1.64\\ C:\quad P&=[.20,.30,.30,.20] \text{ on } X=[-2,-1,1,2]\implies\ P^*= 0.6;\ \kappa=1.45. \end{align}$$
Comparing A and B shows that it's neither the case that "(i) larger kurtosis implies greater probability in the μ±σ range", nor that "(ii) greater probability in the μ±σ range implies larger kurtosis". On the other hand, comparing A and C shows that it's sometimes the case that one increases or decreases when the other does likewise.
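The figures for A, B and C can be re-checked without Sage. The following is a minimal sketch in plain Python; the helper name `summarize` and the dictionary layout are illustrative choices, not part of the original computation. It uses the non-excess kurtosis $E[Z^4]$, which is the convention the quoted values follow.

```python
import math

def summarize(values, probs):
    """Return (P*, kurtosis) for a discrete distribution.

    P* is P[mu - sigma <= X <= mu + sigma]; kurtosis is E[Z^4] (not excess).
    """
    mu = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    sigma = math.sqrt(var)
    m4 = sum(p * (x - mu) ** 4 for x, p in zip(values, probs))
    p_star = sum(p for x, p in zip(values, probs) if abs(x - mu) <= sigma)
    return p_star, m4 / var ** 2

# The three distributions from the answer: (support, probabilities).
distributions = {
    "A": ([-2, -1, 1, 2], [0.25, 0.25, 0.25, 0.25]),
    "B": ([-9, -1, 1, 9], [0.30, 0.20, 0.20, 0.30]),
    "C": ([-2, -1, 1, 2], [0.20, 0.30, 0.30, 0.20]),
}

results = {name: summarize(x, p) for name, (x, p) in distributions.items()}
for name, (p_star, kappa) in results.items():
    print(f"{name}: P* = {p_star:.2f}, kappa = {kappa:.2f}")
```

Going from A to B, $\kappa$ rises while $P^*$ falls; going from A to C, both rise.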
Your question seems to be referring to an interpretation that the Wikipedia article attributes to J.J.A. Moors [1986], The Meaning of Kurtosis: Darlington Reexamined. I haven't read that paper, but virtually the same interpretation appeared almost forty years earlier in L. Guttman [1948], An Inequality for Kurtosis:
In this paper is developed a general inequality which describes the piling up of frequency around these two points for the case where the fourth moment exceeds the square of the variance. In a sense, it is shown how "U-shaped" a distribution must be according to its second and fourth moments.
[...]
In general, the smaller $\kappa-1$ is, the greater the probability that $\frac{X-\mu}{\sigma}$ is in a small interval around $+1$ or $-1$.
where I've substituted now-common symbols for Guttman's.
That might seem to suggest a general inverse relation between kurtosis and concentration near the two "shoulder points"; however, the statement is actually proved to hold only when the kurtosis ($\kappa$) is already sufficiently near to (but not equal to) its minimum value of $1$. The proof goes as follows.
If $X$ is a random variable with finite fourth moment and nonzero variance, say with mean $\mu$ and variance $\sigma^2$, then the kurtosis ($\kappa$) of the distribution can be expressed in terms of the standardized random variable $Z=\frac{X-\mu}{\sigma}$ as follows: $$\kappa=E[Z^4] = V[Z^2] + (E[Z^2])^2 = V[Z^2] + 1,\tag{1} $$ where we've used the fact that $Z$ must have mean $0$ and variance $1$ by virtue of the standardization; i.e., $$E[Z^2]=V[Z]+(E[Z])^2=V\left[\frac{X-\mu}{\sigma}\right]+\left(E\left[\frac{X-\mu}{\sigma}\right]\right)^2 = 1 + 0^2.$$
Note that $1\le \kappa\lt \infty,$ since $V[Z^2]\ge 0.$
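As a quick illustration of (1), take distribution A above ($X$ uniform on $\{-2,-1,1,2\}$, so $\mu=0$ and $\sigma^2=2.5$). This worked example is an added check, not part of Guttman's argument: $$Z^2=\frac{X^2}{2.5}\in\{0.4,\ 1.6\}\ \text{ each with probability } \tfrac12,\qquad E[Z^2]=1,$$ $$V[Z^2]=\tfrac12(0.4-1)^2+\tfrac12(1.6-1)^2=0.36,\qquad \kappa=V[Z^2]+1=1.36,$$ which agrees with the value computed directly from the fourth moment.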
By Chebyshev's inequality, for any $\epsilon>0$, $$P\left(\left| Z^2 - 1\right| \le \epsilon\right)\ge 1- \frac{V[Z^2]}{\epsilon^2}=1-\frac{\kappa-1}{\epsilon^2}\tag{2} $$ so if $\kappa\to 1$, then for all $\epsilon>0$, $P\left(\left| Z^2 - 1\right| \le \epsilon\right)\to 1$.
Now, for any $0<\epsilon<1$, we have $$\begin{align}P\left(\left| Z^2 - 1\right| \le \epsilon\right)&=P\left(\sqrt{1-\epsilon}\le |Z|\le \sqrt{1+\epsilon}\,\right)\\ &=P\left(\{-\sqrt{1+\epsilon}\le Z\le -\sqrt{1-\epsilon}\,\} \cup \{\sqrt{1-\epsilon}\le Z\le \sqrt{1+\epsilon}\,\}\right)\\ &=P\left(\underbrace{\{\mu-\sigma\sqrt{1+\epsilon} \le X\le \mu-\sigma\sqrt{1-\epsilon}\,\}}_{\text{an }\epsilon\text{-neighborhood of }\mu-\sigma} \cup \underbrace{\{\mu+\sigma\sqrt{1-\epsilon} \le X\le \mu+\sigma\sqrt{1+\epsilon}\,\}}_{\text{an }\epsilon\text{-neighborhood of }\mu+\sigma} \right).\end{align}$$
Therefore, when $\kappa-1<\epsilon^2<1$ (i.e. when the kurtosis is already sufficiently small), the lower bound in (2) is positive and increases toward $1$ as $\kappa$ decreases. In that regime, as $\kappa$ decreases, the distribution of $Z$ increasingly concentrates in small intervals around $\pm1,$ and the distribution of $X$ increasingly concentrates in small intervals around $\mu\pm\sigma$ (implying also that when $\kappa=1$, $Z\stackrel{\text{a.s.}}{=}\pm1$ and $X\stackrel{\text{a.s.}}{=}\mu\pm\sigma$).
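To see the bound (2) at work numerically, here is a small sketch using a hypothetical one-parameter family chosen only for illustration: $X$ on $\{-2,-1,1,2\}$ with $P(|X|=2)=p$, so that $\kappa\to1$ as $p\to0$. The function name `shoulder_mass` and the choice $\epsilon=0.5$ are assumptions of this example.

```python
def shoulder_mass(p, eps):
    """For X on {-2,-1,1,2} with P(|X|=2)=p, return (kappa, bound, actual).

    bound  is the Chebyshev lower bound 1 - (kappa - 1)/eps^2 from (2);
    actual is the exact probability P(|Z^2 - 1| <= eps).
    """
    var = 4 * p + (1 - p)      # E[X^2]; the mean is 0 by symmetry
    m4 = 16 * p + (1 - p)      # E[X^4]
    kappa = m4 / var ** 2
    bound = 1 - (kappa - 1) / eps ** 2
    # Z^2 equals 1/var with probability 1-p and 4/var with probability p.
    atoms = ((1 / var, 1 - p), (4 / var, p))
    actual = sum(prob for z2, prob in atoms if abs(z2 - 1) <= eps)
    return kappa, bound, actual

eps = 0.5
rows = [shoulder_mass(p, eps) for p in (0.1, 0.01, 0.001)]
for kappa, bound, actual in rows:
    print(f"kappa = {kappa:.4f}  bound = {bound:.4f}  actual = {actual:.4f}")
```

For $p=0.1$ the bound is negative and therefore says nothing; it only becomes informative once $\kappa-1<\epsilon^2$, which is the restriction noted above.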
NB: A "very large" kurtosis (in the sense of $\kappa\gg 1$) is driven by the occurrence of $|Z|>1$ (i.e., $|X-\mu|>\sigma$, which is to say, "$X$ outside the central $\mu\pm\sigma$ range"):
If $\kappa\gg 1 $, then (because $z^4\le 1$ when $|z|\le 1$ and $z^4\gt 1$ when $|z|\gt 1$) $$\begin{align}\kappa\ \ &=\ \ E[Z^4]\ \ =\ \ \underbrace{E[Z^4\,\big|\, |Z|>1]}_{\gg 1}\cdot \underbrace{P[|Z|>1]}_{<1}\ \ +\ \ \underbrace{E[Z^4\,\big|\, |Z|\le 1]}_{\le 1}\cdot \underbrace{P[|Z|\le 1]}_{<1}\\ \ \ &\approx\ \ E[Z^4\,\big|\, |Z|>1]\cdot P[|Z|>1]. \end{align}$$ Since the second product is at most $1$, the only way to have $\kappa\gg 1$ is to have $E[Z^4\,\big|\, |Z|>1]\cdot P[|Z|>1]\gg 1$, which in turn requires $E[Z^4\,\big|\, |Z|>1]\gg 1$. Conversely, $\kappa\gg 1$ follows whenever that product is large; a large conditional expectation alone is not enough if $P[|Z|>1]$ is very small.
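The split of $\kappa$ into a tail part and a center part can be illustrated on a made-up discrete example (the distribution and the helper name `split_kurtosis` below are hypothetical, chosen only to give a large kurtosis). Each part is computed as $E[Z^4\,\mathbf 1\{\cdot\}]$, which equals the corresponding conditional expectation times the probability of the event.

```python
import math

def split_kurtosis(values, probs):
    """Return (tail, center): contributions to E[Z^4] from |Z|>1 and |Z|<=1."""
    mu = sum(p * x for x, p in zip(values, probs))
    sigma = math.sqrt(sum(p * (x - mu) ** 2 for x, p in zip(values, probs)))
    tail = center = 0.0
    for x, p in zip(values, probs):
        z = (x - mu) / sigma
        if abs(z) > 1:
            tail += p * z ** 4      # part of E[Z^4 ; |Z| > 1]
        else:
            center += p * z ** 4    # part of E[Z^4 ; |Z| <= 1]
    return tail, center

# Hypothetical heavy-tailed example: rare outcomes at +/-10, the rest at +/-1.
values = [-10, -1, 1, 10]
probs = [0.005, 0.495, 0.495, 0.005]

tail, center = split_kurtosis(values, probs)
kappa = tail + center
print(f"kappa = {kappa:.2f}, tail share = {tail / kappa:.1%}, center part = {center:.2f}")
```

Here $99\%$ of the probability lies within $\mu\pm\sigma$, yet that region contributes only about $0.25$ to a kurtosis of roughly $25.5$.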
-
Thanks, I got the connection for small kurtosis. I was just wondering if there was a connection for large kurtosis. On the Wikipedia page, a definition is given that suggests larger kurtosis implies greater movement of mass into the center and tails. – BigBendRegion Nov 25 '17 at 13:35
-
@PeterWestfall - Your question as posted merely concerns one kurtosis value being larger than another (i.e. in a comparative sense), rather than being "large" in an absolute sense (e.g. "much larger than $1$", say). How the kurtosis changes in the comparative sense may well depend on whether it's initially small or large in an absolute sense, as my answer suggests. – r.e.s. Nov 27 '17 at 03:24
-
Actually, I was just looking for a mathematical statement of fact. Larger kurtosis implies what about probability concentration in the center? Some sort of one-sided Chebychev inequality would help, but the inequalities all go in the wrong direction. The answer to my question is, apparently, "in general, large kurtosis implies nothing about probability concentration near the peak." Thanks for helping to confirm that. – BigBendRegion Dec 07 '17 at 00:47
-
Also, it is more interesting to know how differences in kurtosis inform differences in distributions than how differences in distributions inform differences in kurtosis. For example, if the kurtosis in sample A is 2 more than the kurtosis in sample B, does that convey any useful information about distribution A versus B? Sadly, most of the literature has it backwards - they show that if distributions A and B have such and such qualities, then one kurtosis is higher than the other. But people reverse the logic and say higher kurtosis supports the qualities of A vs B. It's faulty logic. – BigBendRegion Dec 21 '17 at 23:43
-
Also, r.e.s., your addendum about expectations involving indicator functions seems to be a repeat of theorems I proved in my TAS article. I hope those theorems are indeed mine to claim. Do you have a cite that shows otherwise? – BigBendRegion Jan 10 '18 at 01:40
-
@PeterWestfall - My apologies -- I wrote that part after reading somewhere online the gist of such an argument, not knowing that it may have been your own! – r.e.s. Jan 10 '18 at 07:13