
When detecting outliers with the IQR method, we compute a range and treat any point outside that range as an outlier (see below). Is it 'mathematically' acceptable to change the 1.5 to a 2 to get fewer outliers for a particular dataset? Or does this break a conventional theory?

Additionally, does the data need to follow a normal distribution to use this method?

  • IQR: Q3 – Q1
  • Upper bound: Q3 + (1.5 * IQR)
  • Lower bound: Q1 – (1.5 * IQR)
  • Outlier = outside of range [Lower, Upper]
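
The rule above can be sketched in Python with the multiplier exposed as a parameter, to show what changing 1.5 to 2 actually does. This is a minimal illustration, not a standard implementation: the function names (`iqr_fences`, `flag_outliers`, `normal_flag_rate`), the sample data, and the choice of the "inclusive" quartile method are all assumptions made for the example. Different quartile conventions give slightly different fences.

```python
from statistics import NormalDist, quantiles


def iqr_fences(data, k=1.5):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR).

    k is the multiplier under discussion (1.5 by convention).
    Quartiles use the 'inclusive' method, which is an assumption here.
    """
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(data, k=1.5):
    """Return the points that fall outside the fences for multiplier k."""
    lower, upper = iqr_fences(data, k)
    return [x for x in data if x < lower or x > upper]


def normal_flag_rate(k=1.5):
    """Share of points flagged if the data were exactly normal.

    For a standard normal, Q3 = -Q1, so IQR = 2*Q3 and the upper
    fence sits at Q3 + k*2*Q3; the lower fence is symmetric.
    """
    z = NormalDist()
    q3 = z.inv_cdf(0.75)
    upper = q3 + k * 2 * q3
    return 2 * (1 - z.cdf(upper))


# Made-up sample, for illustration only.
sample = [2, 3, 3, 4, 4, 5, 5, 6, 7, 12, 30]

print(iqr_fences(sample))          # → (-1.0, 11.0)
print(flag_outliers(sample, 1.5))  # → [12, 30]
print(flag_outliers(sample, 2))    # → [30]
print(normal_flag_rate(1.5), normal_flag_rate(2))
```

Nothing in the arithmetic depends on k being 1.5, and the fences can be computed for any distribution. What the multiplier changes is how often points get flagged: under an exactly normal distribution, k = 1.5 flags roughly 0.7% of points, and k = 2 flags far fewer.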
maximus
  • See https://math.stackexchange.com/questions/966331/why-john-tukey-set-1-5-iqr-to-detect-outliers-instead-of-1-or-2 – Jukka Kohonen Sep 29 '21 at 17:39
  • Thanks for sharing @JukkaKohonen! If the data is right skewed, can I adjust the 1.5 to 2? (as in, it's not a perfectly centered bell curve). – maximus Sep 29 '21 at 17:42
  • The top answer in the linked question does say that 'goldilocks' would choose 1.5 but would it 'break' any mathematical theory if I make it 2? – maximus Sep 29 '21 at 17:53
  • There is no "outlier/not outlier" threshold. It is all a question of extremity. Some points are more extreme than others, in terms of distance from the main body of the data; such points have a greater degree of "outlierness," if you will. A z-score, or similar statistic using medians and IQRs, will suffice as a degree of "outlierness." In multiple dimensions, Mahalanobis distance and its variants serve the same purpose. There really is not much point to trying to create a "Yes/No" condition for outliers. – BigBendRegion Sep 30 '21 at 12:51
  • Thanks @BigBendRegion! I don't seem to have an 'upvote' button, but this is good! – maximus Oct 01 '21 at 19:27
  • Ok, I made it a bona fide answer! – BigBendRegion Oct 01 '21 at 20:52

1 Answer


There is no "outlier/not outlier" threshold. It is all a question of extremity. Some points are more extreme than others, in terms of distance from the main body of the data; such points have a greater degree of "outlierness," if you will. A z-score, or similar statistic using medians and IQRs, will suffice as a degree of "outlierness." In multiple dimensions, Mahalanobis distance and its variants serve the same purpose. There really is not much point to trying to create a "Yes/No" condition for outliers.
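
As one possible reading of "a similar statistic using medians and IQRs", here is a minimal sketch of a continuous outlierness score: the distance from the median in units of the IQR. The function name `outlierness`, the sample data, and the 1.349 rescaling are assumptions for illustration. The rescaling uses the fact that the IQR of a normal distribution is about 1.349 standard deviations, so the score is roughly comparable to a z-score when the bulk of the data is close to normal.

```python
from statistics import median, quantiles


def outlierness(data):
    """Return a robust z-like score for each point, in input order.

    score = (x - median) / (IQR / 1.349)
    The 1.349 factor is an illustrative choice, not part of the answer.
    """
    q1, _q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the score is undefined for this data")
    center = median(data)
    scale = iqr / 1.349
    return [(x - center) / scale for x in data]


# Made-up sample, for illustration only.
sample = [2, 3, 3, 4, 4, 5, 5, 6, 7, 12, 30]

scores = outlierness(sample)
# Rank points from most to least extreme instead of labelling them yes/no.
ranked = sorted(zip(sample, scores), key=lambda pair: abs(pair[1]), reverse=True)
for value, score in ranked[:3]:
    print(value, round(score, 2))
```

Ranking by absolute score keeps the "degree of extremity" view: 30 is far more extreme than 12, which a yes/no threshold would hide.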