
I have a set of n numbers that I need to review to come up with the closest average. The set may or may not have a high standard deviation. Below is an example.

Set of numbers:

  • $0.6618

  • $0.6509

  • $0.6835

  • $0.9561

  • $15.4250 (should not be averaged, out of bounds)

  • $15.4400 (should not be averaged, out of bounds)

  • $4.7500 (should not be averaged, out of bounds)

  • $0.5948

  • $0.6485

  • $0.6856

A simple average of these numbers is $4.0496; however, my needs require me to remove the values that are way out of bounds.
Ideally, my average would be around $0.6973.
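
For reference, both figures can be reproduced with a short Python sketch. The variable names are only illustrative, and the out-of-bounds values are flagged by hand here, exactly as in the list above; nothing in this snippet detects them automatically.

```python
from statistics import mean

# Values from the list above. The flag marks the entries labelled
# "should not be averaged, out of bounds" (flagged manually, not detected).
prices = [
    (0.6618, False),
    (0.6509, False),
    (0.6835, False),
    (0.9561, False),
    (15.4250, True),
    (15.4400, True),
    (4.7500, True),
    (0.5948, False),
    (0.6485, False),
    (0.6856, False),
]

# Average over all ten values.
simple_average = mean(p for p, _ in prices)

# Average over the seven values that are not flagged.
in_bounds_average = mean(p for p, flagged in prices if not flagged)

print(f"simple average:    ${simple_average:.4f}")
print(f"in-bounds average: ${in_bounds_average:.4f}")
```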

mez

1 Answer


One solution is to use the median instead, which is resistant to outliers. For your data, the median is $0.6846.
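
As a rough illustration of this suggestion, the sketch below compares the mean and the median on the numbers from the question, using Python's standard `statistics` module. The variable names are made up for the example.

```python
from statistics import mean, median

# The ten values from the question, including the three large ones.
prices = [0.6618, 0.6509, 0.6835, 0.9561, 15.4250,
          15.4400, 4.7500, 0.5948, 0.6485, 0.6856]

avg = mean(prices)

# With an even number of values, median() averages the two middle
# values of the sorted data (0.6835 and 0.6856 here).
med = median(prices)

print(f"mean:   ${avg:.4f}")
print(f"median: ${med:.5f}")
```

The three large values pull the mean up to about $4.05, while the median stays near the $0.6973 the question is hoping for, without any value having to be marked as out of bounds first.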

  • And the theory behind it is assuming the error distribution to be normal, which is symmetric. – mez Jan 25 '13 at 12:50
  • @mezhang: I did not know that. Do you have a reference? It's usually the arithmetic mean that turns up when you assume normally distributed errors. –  Jan 25 '13 at 12:53
  • Normal distribution is a common model for the error of physical measurements. http://en.wikipedia.org/wiki/Normal_distribution – mez Jan 25 '13 at 12:56
  • @mezhang: That's not an explanation for "the theory behind [the median] is assuming the error distribution to be normal". If you assume the error distribution to be normal, you get the mean, not the median, as the maximum-likelihood estimate. I don't know of a theory that starts from normally distributed errors and arrives at the median, which is what your original comment seems to be implying. –  Jan 25 '13 at 13:05
  • Try searching for a central limit theorem for the median. I have not found a good reference yet; this book only mentions that it is more complicated than the central limit theorem for the mean. here – mez Jan 25 '13 at 13:19
  • @mezhang, the median is (with a technical caveat) just an order statistic that looks at the rank order of the samples, completely ignoring the "metric" aspect of the distribution. In particular, the difference between mean and median in highly skewed distributions (i.e., not Gaussian) tends to be large, which is why medians are often used in this context. – alancalvitti Jan 25 '13 at 13:37
  • @alancalvitti Thank you. I don't understand most of what you wrote but one day I will. – mez Jan 25 '13 at 13:39
  • @mezhang, check this: http://davidmlane.com/hyperstat/A92403.html – alancalvitti Jan 25 '13 at 13:42
  • i like hamburgers. – Michael J. Lee Jan 26 '13 at 12:04