Zero-width histogram bin?

Question

I want to plot a histogram of some timing data. The timing data, represented by a continuous variable t, is binned as follows:

t=0
0<t<=1
1<t<=2
2<t<=3
3<t<=4

I have frequency data for each bin. To plot this as a histogram, I understand that I ought to use frequency density; that is, the frequency divided by the bin width. But my first bin has zero width! How can one cope with this?
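To make the problem concrete, here is a minimal sketch of the frequency-density calculation. The counts are made-up placeholder values, not my real data, and `frequency_density` is just an illustrative helper name. The point is only that the first bin has no well-defined density.

```python
# Bins as (lower, upper) pairs; the first one is the single point t = 0.
bins = [(0.0, 0.0), (0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]

# Hypothetical frequencies, invented purely for illustration.
counts = [12, 30, 25, 20, 13]


def frequency_density(count, lower, upper):
    """Return count / bin width, or None when the bin has zero width."""
    width = upper - lower
    if width == 0:
        return None  # density is undefined for a point mass
    return count / width


densities = [frequency_density(c, lo, hi) for c, (lo, hi) in zip(counts, bins)]
```

Every unit-width bin simply gets its count back as the density, while the `t = 0` bin yields `None`, which is exactly the case I don't know how to draw.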

@K.Miller $t$ represents some sort of delay. I'm particularly interested in showing explicitly the situations where there is no delay whatsoever. — John Wickerson, Oct 29 '16 at 14:08
You could add a bin $[-1,0]$ with the understanding that since $t \geq 0$, it corresponds to observations at $t = 0$. — K. Miller, Oct 29 '16 at 14:25
@K.Miller Mm I thought of that too and it's quite tempting. Still feels like a bit of a hack though! — John Wickerson, Oct 29 '16 at 15:34
I find it curious that you can detect a delay of exactly zero. How do you know when it occurs? What does it even mean? — David K, Oct 29 '16 at 16:52
@DavidK It's certainly a good question! It's because the data is obtained analytically rather than experimentally. — John Wickerson, Oct 30 '16 at 21:02
If the data are all integers then set the bins to $(n-\frac12,n+\frac12]$ for integers $n$. If the other data really are spread randomly within each of the intervals $(0,1], (1,2], \ldots$, it sounds like a model of a mixed probability distribution (or something that works like such a distribution), in which case maybe a cumulative distribution function would be a better representation. Or hack the histogram as already suggested; histograms aren't really designed to do mixed distributions. — David K, Oct 30 '16 at 21:16
@DavidK Thanks. The data is indeed real-valued within those intervals. I will look into a cumulative version. Feel free to upgrade your comment to an answer that I can accept. — John Wickerson, Oct 30 '16 at 21:32

score 1 · Accepted Answer · answered Oct 30 '16 at 23:55

For data that are analytically derived, where some positive percentage of the data occur at a single exact value and others may be found throughout some interval(s) on the real line, a cumulative distribution function (CDF) is one way to clearly graph the data.

If this actually is a probability distribution of a random variable $X$, the CDF is given by $F(t) = P(X \leq t)$. For the situation described in the question, where only values $t \geq 0$ can occur, you would have $F(t) = 0$ for all $t < 0$ and $F(0) = P_0$, where $P_0$ is the fraction of data that fall at exactly $t = 0$. For $t > 0$, $F(t)$ increases wherever the probability density at $t$ is positive and stays constant everywhere else.

This also works for data that are not random but that act like a probability distribution: in this example, a certain percentage at one exact value, a certain percentage distributed in the interval $(0,1]$, a certain percentage in the interval $(1,2]$, and so forth. If all you have available (or all you want to determine) is the frequencies for each of these bins and for the value $t=0$, you could interpolate a straight line segment from $(0,P_0)$ to $(1,P_0 + P_1)$, where $P_1$ is the fraction of data falling in the interval $(0,1]$, and continue in the same way across the remaining bins.
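As a rough sketch of that interpolation, assuming unit-width bins with knots at $t = 0, 1, 2, 3, 4$ as in the question: the fractions below are invented for illustration, and `cdf` is a hypothetical helper name, not part of any library.

```python
from itertools import accumulate

# Hypothetical fractions: P_0 (exactly t = 0), then (0,1], (1,2], (2,3], (3,4].
fractions = [0.12, 0.30, 0.25, 0.20, 0.13]

# Running totals give the CDF at the knots t = 0, 1, 2, 3, 4.
cdf_at_knots = list(accumulate(fractions))


def cdf(t):
    """Piecewise-linear CDF with a jump of size P_0 at t = 0."""
    if t < 0:
        return 0.0  # no data below zero
    last_knot = len(cdf_at_knots) - 1
    if t >= last_knot:
        return cdf_at_knots[-1]  # all data accounted for
    k = int(t)  # index of the knot at or just below t (t >= 0 here)
    left, right = cdf_at_knots[k], cdf_at_knots[k + 1]
    return left + (t - k) * (right - left)  # straight line between knots
```

Plotting `cdf` over a range such as $[-1, 5]$ shows the mass at $t = 0$ as a vertical step from $0$ to $P_0$, which is the feature a histogram cannot display. The straight segments are only an assumption that the data are spread evenly within each bin.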
