
Most sources (such as 1, 2, 3, and some books, such as Arihant) define statistical dispersion as follows: "The extent to which numerical data is spread about the average value is called the dispersion of the values."

But the measures of dispersion include the range, mean absolute deviation, standard deviation, and variance.

Question: How can the range be a measure of dispersion if dispersion is defined as the extent to which numerical data is spread about the average value? The range says nothing about the spread of the data around the average value; it only describes the overall spread of the data.

Is the definition wrong, or is my interpretation wrong?

  • The notion of "spread" is informal. The range is one reasonable way to quantify the spread, in the sense that a larger range indicates that the data is more "spread out". That's all there is to it. – Karl Jan 09 '24 at 17:42
  • Is the definition of dispersion of values correct? – Altair25 Jan 09 '24 at 17:47
  • Sure, but that's not a formal mathematical definition, it's just a linguistic definition. Is there a precise difference between "spread about the average" and "overall spread"? These are just informal spatial ideas. – Karl Jan 09 '24 at 17:53

1 Answer


The range does measure the spread of the data about an average. However, that average is not the mean, or even the median or mode. Instead, we look at the midrange, which is the arithmetic mean of the minimum and maximum values in the data or distribution. In other words, we define:

  • $x_{\min} = \min(x_i)$

  • $x_{\max} = \max(x_i)$

  • $\tilde{x}_{MR} = \frac{1}{2}(x_{\min} + x_{\max})$

Then the range of $x$ is exactly twice the distance from the midrange to either the maximum or the minimum of the data.
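
As a small worked check, take the data set $\{1, 2, 4, 9, 14\}$ (chosen here purely for illustration; it is not from the question):

$$x_{\min} = 1, \qquad x_{\max} = 14, \qquad \tilde{x}_{MR} = \tfrac{1}{2}(1 + 14) = 7.5,$$

$$x_{\max} - x_{\min} = 13 = 2\,(14 - 7.5) = 2\,(7.5 - 1).$$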

This doesn't come out of nowhere, either. We can define measures of spread about an arbitrary point $c$ using various norms. For example, the standard deviation about $c$ is given (up to a factor of $1/\sqrt{n}$, where $n$ is the number of data points) by $\|\mathbf{x} - \mathbf{c}\|_2 = \sqrt{\sum_{i} (x_i - c)^2}$, and the value of $c$ that minimises this spread is the mean. Similarly, the average absolute deviation is given (up to a factor of $1/n$) by $\|\mathbf{x} - \mathbf{c}\|_1 = \sum_{i} |x_i - c|$, and setting $c$ to the median of $x$ minimises this value. If we instead look at the maximum deviation, i.e. $\|\mathbf{x} - \mathbf{c}\|_\infty = \max_i |x_i - c|$, then which value of $c$ minimises it? Of course, it's the midrange, and the minimum value attained is exactly half the range.
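
These three minimisation claims can be sanity-checked numerically. The following is only a rough sketch: the data set, the brute-force grid search, and the helper names (`l2`, `l1`, `linf`, `argmin_on_grid`) are all assumptions made for illustration. It uses the unnormalised norms, since dividing by a constant does not change which $c$ minimises them.

```python
import statistics

# Illustrative data (an assumption for this sketch); an odd number of
# points keeps the median, and hence the 1-norm minimiser, unique.
data = [1, 2, 4, 9, 14]

def l2(xs, c):
    """2-norm of the deviations of xs about c."""
    return sum((x - c) ** 2 for x in xs) ** 0.5

def l1(xs, c):
    """1-norm of the deviations of xs about c."""
    return sum(abs(x - c) for x in xs)

def linf(xs, c):
    """Infinity-norm (largest absolute deviation) of xs about c."""
    return max(abs(x - c) for x in xs)

def argmin_on_grid(f, xs, lo=0.0, hi=15.0, steps=1500):
    """Brute-force search for the c in [lo, hi] that minimises f(xs, c)."""
    grid = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return min(grid, key=lambda c: f(xs, c))

mean = statistics.mean(data)
median = statistics.median(data)
midrange = (min(data) + max(data)) / 2
data_range = max(data) - min(data)

best_l2 = argmin_on_grid(l2, data)
best_l1 = argmin_on_grid(l1, data)
best_linf = argmin_on_grid(linf, data)

print("2-norm minimiser:", best_l2, "mean:", mean)
print("1-norm minimiser:", best_l1, "median:", median)
print("inf-norm minimiser:", best_linf, "midrange:", midrange)
print("max deviation at midrange:", linf(data, midrange), "half range:", data_range / 2)
```

A grid search is used rather than a closed-form argument so that the check does not assume the result it is meant to illustrate; it only confirms the claims for this one data set, not in general.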

ConMan