
I have a question: how do I get the standard deviation of a given histogram? I've searched the internet but found nothing. Take, for example, the histogram in the image below.

[Image: histogram of the lengths of metal rods, in mm]

Thanks in advance!

user157308
  • I don't understand: what are the categories on the x-axis? There doesn't seem to be one label per box. The 23, for example, is on the left edge of one box with the 24 on the right, so which one is the category? Or is it something else entirely? – Adam Hughes Jul 06 '14 at 01:47
  • Millimeter (SI unit symbol mm). – user157308 Jul 06 '14 at 01:50
  • I'm not asking about the units; I'm asking about the categories. A histogram is a graphical representation of data: the horizontal axis contains categories, while the vertical axis measures the number of observations in each category. What's the category for, say, the first box? Is it 23? 24? Something else? – Adam Hughes Jul 06 '14 at 01:51
  • OK, sorry. The numbers on the horizontal axis are lengths of metal rods. The first box is 23.0 to 23.9. – user157308 Jul 06 '14 at 01:57
  • My answer below uses only the left values. I'll explain the difficulties at the end of it, since the explanation of the categories changed while I was typing the answer.

2 Answers


First we read the histogram off as data (top row: rod length in mm, using the left value of each bar; bottom row: number of rods) to get a better feel for things:

\begin{pmatrix} 23 & 24 & 25 & 26 & 27 & 28 & 29 & 30 & 31 \\ 3 & 7 & 13 & 18 & 23 & 17 & 8 & 6 & 5 \end{pmatrix}

The definition of standard deviation is the square root of the variance, defined as

$$\sigma^2={1\over N}\sum_{i=1}^N (x_i-\bar{x})^2$$

with $\bar{x}$ the mean of the data, $x_i$ the $i$-th data point, and $N$ the number of data points, which is

$$3+7+13+18+23+17+8+6+5=100$$
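As a minimal sketch, the definition above translates directly into code for a plain list of observations. The function names `variance` and `std_dev` are just illustrative, and the input list is a small made-up example, not data from the question.

```python
import math

def variance(data):
    """Population variance: (1/N) * sum of (x_i - mean)^2 over all N observations."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def std_dev(data):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(data))

# Made-up example data, only to exercise the functions
print(variance([2, 4, 4, 4, 5, 5, 7, 9]))  # → 4.0
print(std_dev([2, 4, 4, 4, 5, 5, 7, 9]))   # → 2.0
```

This version needs one entry per observation, which is why the histogram has to be read off as data first.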

Now

$$\bar{x}={1\over 100}(23\cdot 3+24\cdot 7 +\ldots + 31\cdot 5)=26.94$$

which you can verify yourself. Each term is a rod length multiplied by the number of rods of that length; we could have written the sum out the long way as

$$\underbrace{23+23+23}_{\text{3 times}}+\underbrace{24+\cdots+24}_{\text{7 times}}+\cdots+\underbrace{31+\cdots+31}_{\text{5 times}}$$

but we save some time using multiplication.
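A small sketch of the two ways of computing the mean, using the (left value, count) pairs from the table above: writing every rod out individually gives the same result as multiplying each length by its count.

```python
# Histogram from the table: (rod length in mm, number of rods), left values of the bars
hist = [(23, 3), (24, 7), (25, 13), (26, 18), (27, 23),
        (28, 17), (29, 8), (30, 6), (31, 5)]

# The long way: list every rod individually, then average
expanded = [length for length, count in hist for _ in range(count)]
mean_long = sum(expanded) / len(expanded)

# The short way: multiply each length by its count instead of repeating it
n = sum(count for _, count in hist)
mean_short = sum(length * count for length, count in hist) / n

print(len(expanded), n)       # → 100 100
print(mean_long, mean_short)  # → 26.94 26.94
```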

From there, the variance is also easier to calculate if we use multiplication in the sum:

$$\sigma^2={1\over 100}\bigg(3(23-26.94)^2+7(24-26.94)^2+\ldots + 5(31-26.94)^2\bigg)=3.6364$$

Taking square roots, we get $\sigma=1.9069$ to four decimal places.
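As a cross-check, here is a sketch that expands the histogram into one entry per rod (again assuming the left value of each bar) and hands it to Python's standard `statistics` module. `pvariance` and `pstdev` divide by $N$, matching the definition used here; `statistics.variance` and `statistics.stdev` would divide by $N-1$ instead (sample statistics) and give slightly larger values.

```python
import statistics

# (rod length in mm, number of rods), using the left value of each bar
hist = [(23, 3), (24, 7), (25, 13), (26, 18), (27, 23),
        (28, 17), (29, 8), (30, 6), (31, 5)]

# Expand the histogram back into one entry per rod
rods = [length for length, count in hist for _ in range(count)]

# Population statistics (divide by N)
print(round(statistics.pvariance(rods), 4))  # → 3.6364
print(round(statistics.pstdev(rods), 4))     # → 1.9069
```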

Edit: Since the categories turn out to be ranges (the first box covers 23.0 to 23.9) rather than just the left values, this is not entirely accurate. In fact, when the categories are ranges, the standard deviation cannot be determined exactly: if more of the rods in the first box are 23.0 long than 23.9, and so on, the value changes. Ranges alone aren't enough to pin down statistics like the standard deviation.
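To illustrate this point, the sketch below computes the standard deviation under a few hypothetical placements of the rods within their boxes. It assumes each box spans [left value, left value + 0.9] mm, based on "23.0 to 23.9"; the placements themselves are made up purely to show that the answer depends on them.

```python
import statistics

# (left value of the box in mm, number of rods)
hist = [(23, 3), (24, 7), (25, 13), (26, 18), (27, 23),
        (28, 17), (29, 8), (30, 6), (31, 5)]
WIDTH = 0.9  # assumed box width, taken from "23.0 to 23.9"

def stdev_with_offset(offset_for):
    """Population SD if every rod in the box with left value `edge` has length
    edge + offset_for(edge). The offsets are hypothetical placements."""
    data = [edge + offset_for(edge) for edge, count in hist for _ in range(count)]
    return statistics.pstdev(data)

left = stdev_with_offset(lambda edge: 0.0)          # the answer's assumption
middle = stdev_with_offset(lambda edge: WIDTH / 2)  # every rod at the box midpoint
# Hypothetical: boxes up to 26 at their left edge, the rest at their right edge
spread = stdev_with_offset(lambda edge: 0.0 if edge <= 26 else WIDTH)
# Hypothetical: the opposite placement, pulling the two groups together
squeezed = stdev_with_offset(lambda edge: WIDTH if edge <= 26 else 0.0)

for name, sd in [("left", left), ("middle", middle),
                 ("spread", spread), ("squeezed", squeezed)]:
    print(f"{name:8s} {sd:.4f}")
```

Moving every rod by the same amount (left edge versus midpoint) leaves the standard deviation unchanged, but placements that differ from box to box can push it noticeably up or down.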

Adam Hughes

If you'd like Adam Hughes' answer as Python code, you can find it below. The test case gives:

{'n': 9, 'sum': 100, 'prod': 2694, 'sqsum': 363.64, 'mean': 26.94, 'variance': 3.6364, 'stdv': 1.9069347130932406}

Test case:

def test_Stats():
    # https://math.stackexchange.com/questions/857566/how-to-get-the-standard-deviation-of-a-given-histogram-image
    values=[(23,3),(24,7),(25,13),(26,18),(27,23),(28,17),(29,8),(30,6),(31,5)]
    stats=Stats(values)
    print(vars(stats))

Code

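import math

class Stats:
    """Calculate histogram statistics, see https://math.stackexchange.com/questions/857566/how-to-get-the-standard-deviation-of-a-given-histogram-image"""
    def __init__(self, histindexed):
        # histindexed is a list of (value, count) pairs, one per histogram bar
        self.n = len(histindexed)  # number of bars, not number of observations
        self.sum = 0    # total number of observations N
        self.prod = 0   # sum of value * count
        self.sqsum = 0  # sum of count * (value - mean)^2
        for x, y in histindexed:
            self.sum += y
            self.prod += x * y
        self.mean = self.prod / self.sum
        for x, y in histindexed:
            dx = x - self.mean
            self.sqsum += y * dx * dx
        # σ² (population variance, dividing by N)
        self.variance = self.sqsum / self.sum
        self.stdv = math.sqrt(self.variance)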