
I have a set of numbers.

I want to run standard deviation on those numbers.

Before running this process, I first want to clean/eliminate numbers that are not close to most values.

For example, assume that I have the numbers: 1, 2, 1, 3, 1, 4, 4, 2, 7, 3002, 3, 3, 35000, 1, 2, 2, 9, 9, 8, 8, 9, 698511.

Now, I want to remove the numbers 3002, 35000 and 698511 and compute the standard deviation of the rest.

How can it be done?

Thanks.

Nir
  • What's the problem? Just take them off the list. – lulu Sep 13 '17 at 12:38
  • Do some research on outliers: http://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm (not the Malcolm Gladwell book with that title). – Ethan Bolker Sep 13 '17 at 12:43
  • If you are looking for a reliable procedure that works in all contexts, you are out of luck. Sometimes the situation you are studying lets you reject data (maybe those three points are physically impossible). If so, great! You can just reject all the physically impossible data. Maybe you have a model with some assumptions. Otherwise, if you have no information at all about the data, well, who's to say what's an outlier and what isn't? – lulu Sep 13 '17 at 12:45
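
As the comments point out, there is no universal rule for what counts as an outlier. Still, one common heuristic can be sketched on the numbers from the question: drop every value that lies more than k scaled median absolute deviations (MAD) from the median, then compute the standard deviation of what is left. The function name `split_outliers`, the cutoff `k = 3`, and the choice of the median/MAD rule itself are assumptions made for illustration; none of them comes from the question or the comments.

```python
import statistics

def split_outliers(values, k=3.0):
    """Split values into (kept, removed) using a median/MAD rule.

    Illustrative heuristic only: the cutoff k is an assumption, not a standard
    that fits every data set.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        # More than half the values equal the median; the rule is not usable,
        # so nothing is removed.
        return list(values), []
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    limit = k * 1.4826 * mad
    kept = [x for x in values if abs(x - med) <= limit]
    removed = [x for x in values if abs(x - med) > limit]
    return kept, removed

data = [1, 2, 1, 3, 1, 4, 4, 2, 7, 3002, 3, 3, 35000,
        1, 2, 2, 9, 9, 8, 8, 9, 698511]

kept, removed = split_outliers(data)
print(removed)                  # [3002, 35000, 698511]
print(statistics.stdev(kept))   # sample standard deviation of the remaining 19 values
```

The median and MAD are used here instead of the mean and standard deviation because the three huge values would inflate the mean and standard deviation so much that a "mean plus or minus a few standard deviations" rule could fail to flag them. Use `statistics.pstdev` instead of `statistics.stdev` if the population standard deviation is wanted.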
