
QUESTION: Solution for finding the mean


I ran into this problem while watching a video on evaluating the mean: https://www.youtube.com/watch?v=vMrc6dP8pCo

According to the video, the lecturer said that we can take the average of each measurement interval. So according to him:

we will get $$2.5 \times 15 +8.5\times 35+ ...$$ instead of $$1 \times 15 +6\times 35+ ...$$

Can we evaluate the mean and median precisely from the histogram?

  • Yes, sort of. How precisely you can determine the mean and median from the histogram usually depends on how precise the histogram is (i.e. the "width" of the bins). An exact solution? Not always, no. – Newb Jul 26 '14 at 22:10
  • If a question asks me to compare the mean and median from the histogram, how do I do it? –  Jul 26 '14 at 22:18
  • @ComplexGuy According to the video the average would be $\frac{3\cdot 15+8 \cdot 35+13\cdot15+18\cdot 12+23\cdot 10+28\cdot 5+33 \cdot 3}{95}$ – callculus42 Jul 26 '14 at 22:39
  • @callculus42 Right. But isn't it a contradiction that we get different means? –  Jul 26 '14 at 22:45
  • @ComplexGuy If we both calculate them differently, then there is no contradiction. – callculus42 Jul 26 '14 at 22:53
  • "According to the video, the lecturer said that, we can take the average of the measurement intervals. so according to him: we will get 2.5×15+" -- If the interval starts from 1 and goes up to 5, the center of the interval is not 2.5, but 3. You cannot calculate the mean and median 'precisely' in general, only approximately. You can estimate both under reasonable assumptions (the usual assumptions are not always suitable, so in practice use your head rather than blindly applying a rule), and you can get upper and lower bounds for the original sample mean and sample median. – Glen_b Jul 27 '14 at 02:21
  • But can we say which one is bigger, the mean or the median? –  Jul 27 '14 at 07:24
  • @ComplexGuy In general it depends on the distribution of the data. Example 1: $1,1,1,1,1,1,5$; $\overline x\ (\textrm{mean})=11/7$, $\tilde x\ (\textrm{median})=1$, so $\overline x > \tilde x$. Example 2: $1,5,5,5,5,5,5$; $\overline x\ (\textrm{mean})=31/7$, $\tilde x\ (\textrm{median})=5$, so $\overline x < \tilde x$. – callculus42 Jul 27 '14 at 14:38
  • What will it be for the data given in the question? –  Jul 27 '14 at 20:14
  • @ComplexGuy The median is the value $x_{\frac{n+1}{2}}=x_{\frac{95+1}{2}}=x_{48}$ of the (sorted) data. So it lies in the interval 6 to 10. Here you take the midpoint of that interval: 8 – callculus42 Jul 28 '14 at 13:29
  • @ComplexGuy If you want to be more precise, then you can use the formula given here: http://math.stackexchange.com/questions/876362/statistics-finding-the-median/876396#876396 – callculus42 Jul 28 '14 at 13:44

2 Answers


You can get both the mean and the median from the histogram. The way to calculate the mean is the one illustrated in the video and already shown in one of the comments. For each histogram bar, we start by multiplying the central x-value by the corresponding bar height. Each of these products corresponds to the sum of all values falling within that bar. Summing all products gives us the total sum of all values, and dividing it by the number of observations yields the mean.
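As a rough illustration of this midpoint calculation, here is a minimal Python sketch. The midpoints and frequencies are the ones quoted in the comments on the question and are assumed to match the histogram. The function name `grouped_mean` is made up for this example.

```python
# Bin midpoints and frequencies as quoted in the comments
# (assumed to match the histogram in the question).
midpoints = [3, 8, 13, 18, 23, 28, 33]
frequencies = [15, 35, 15, 12, 10, 5, 3]


def grouped_mean(midpoints, frequencies):
    """Estimate the mean of binned data by treating every value as its bin midpoint."""
    total = sum(m * f for m, f in zip(midpoints, frequencies))
    return total / sum(frequencies)


estimate = grouped_mean(midpoints, frequencies)
print(round(estimate, 3))  # 1205 / 95, about 12.684
```

The result is only as good as the assumption that the values in each bar average out to the bar's central value.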

On the other hand, to calculate the median from a histogram you have to apply the following classical formula:

$$\displaystyle L_m + \left [ \frac { \frac{N}{2} - F_{m-1} }{f_m} \right ] \cdot c$$

where $L_m$ is the lower limit of the median bar, $N$ is the total number of observations, $F_{m-1}$ is the cumulative frequency of the bar preceding the median bar (i.e. the total number of observations in all bars below the median bar), $f_m$ is the frequency of the median bar, and $c$ is the median bar width. This formula essentially arises from a linear interpolation, which assumes that the data are uniformly distributed within the median class. To understand it, note that the fraction $\displaystyle\frac {N/2 - F_{m-1}}{f_m}$ is the proportion of observations in the median bar that lie below the median. Under the assumption that observations are uniformly distributed within the median bar, multiplying this proportion by the median bar width $c$ gives the distance from the lower limit of the bar to the median. Adding this result to $L_m$ finally provides the median.
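The interpolation can be sketched in a few lines of Python. The frequencies are the ones quoted in the comments. The bin edges 0.5, 5.5, ..., 35.5 are an assumption (class boundaries for bars labelled 1 to 5, 6 to 10, and so on), and choosing different boundaries shifts the result. `grouped_median` is a hypothetical helper name.

```python
def grouped_median(edges, frequencies):
    """Interpolated median L_m + ((N/2 - F_{m-1}) / f_m) * c for binned data.

    edges has one more entry than frequencies; bar i spans edges[i]..edges[i + 1].
    """
    half = sum(frequencies) / 2  # N / 2
    cumulative = 0               # F_{m-1}: observations in all bars below the current one
    for i, f in enumerate(frequencies):
        if f > 0 and cumulative + f >= half:
            lower = edges[i]                  # L_m
            width = edges[i + 1] - edges[i]   # c
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one observation")


# Assumed class boundaries; frequencies as quoted in the comments.
edges = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5, 35.5]
frequencies = [15, 35, 15, 12, 10, 5, 3]

median = grouped_median(edges, frequencies)
print(round(median, 3))  # 5.5 + (47.5 - 15) / 35 * 5, about 10.143
```

With these assumed boundaries the median bar is the second one, since the cumulative frequency first reaches $N/2 = 47.5$ there.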

Anatoly
  • You mean $f_m$ is the freq of the median bar, right? – TCSGrad Jun 09 '15 at 21:50
  • Yes, you're right – Anatoly Jun 11 '15 at 19:45
  • The "median bar" is the middle histogram bar, right? What if there is an even number of histogram bars? You could average the two nearest, but what would f, Lm, etc. be? – speedplane Aug 18 '15 at 18:55
  • The median bar is not the middle one, but the one containing the median observation. If $n$ is the total number of observations, you have to calculate $k=n/2$ (if $n$ is even) or $k=(n+1)/2$ (if $n$ is odd). The median observation is the $k^{th}$ observation starting from the left or the right of the histogram. – Anatoly Aug 18 '15 at 23:25
  • This answer is not correct. It is not possible to determine the mean or the median given the histogram in the OP. The best we can do is bound the mean and the median; the best lower bound uses the left bound of each bin, and the best upper bound uses the right bound of each bin. – symplectomorphic Aug 01 '17 at 00:14

You cannot compute the sample mean of all of the data without knowledge of all of the data. The histogram, as given above, does not give all of the data. The histogram is just a crude picture. Any calculation from a histogram that allows more than one single value in each column will be at best an ESTIMATE of the sample mean.

However, that might be OK for a lot of purposes...
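To get a feel for how far such an estimate can be from the true sample mean, one can bracket the mean by placing every observation at the left edge of its bar and then at the right edge. Below is a minimal Python sketch, assuming the bars cover the values 1 to 5, 6 to 10, ..., 31 to 35 (inferred from the comments, not read off the image) with the frequencies quoted there. `mean_bounds` is a hypothetical helper name.

```python
def mean_bounds(lower_edges, upper_edges, frequencies):
    """Return (lowest, highest) sample mean consistent with the binned counts."""
    n = sum(frequencies)
    low = sum(a * f for a, f in zip(lower_edges, frequencies)) / n
    high = sum(b * f for b, f in zip(upper_edges, frequencies)) / n
    return low, high


# Assumed bar limits and the frequencies quoted in the comments.
lower_edges = [1, 6, 11, 16, 21, 26, 31]
upper_edges = [5, 10, 15, 20, 25, 30, 35]
frequencies = [15, 35, 15, 12, 10, 5, 3]

low, high = mean_bounds(lower_edges, upper_edges, frequencies)
print(round(low, 3), round(high, 3))  # 1015 / 95 and 1395 / 95, about 10.684 and 14.684
```

Under these assumptions any data set that produces this histogram has a sample mean between the two bounds.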