
This idea popped into my head when I was reading this post on the normal distribution and the y-axis.

My question is this (taking advantage of a nearby computer): a PDF takes one value as input and returns another, and this returned value is a probability. So, if we were using R, we'd do something like dnorm(0) and get 0.3989423. Fair enough.
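A minimal Python sketch of the same computation, assuming Python 3.8+ for the standard library's `statistics.NormalDist`, which plays the role of R's `dnorm` here. The helper `normal_pdf` is a hypothetical hand-written version of the normal density formula, included only to show where the number comes from: it is the closed-form expression $e^{-x^2/2}/\sqrt{2\pi}$ evaluated at $x = 0$.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Hand-written normal density, for comparison with the library value."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


std_normal = NormalDist(mu=0.0, sigma=1.0)
value = std_normal.pdf(0.0)  # rough analogue of R's dnorm(0)

print(round(value, 7))                       # 0.3989423
print(abs(value - normal_pdf(0.0)) < 1e-12)  # True: both equal 1/sqrt(2*pi)
```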

However, the above post mentioned (all credit due to @Arkamis):
"By the fundamental theorem of calculus, the PDF is then the derivative of the CDF; that is, the PDF is the derivative of a function that returns a probability. So what is that intuitively? Honestly... it's not really anything. The "units" of the vertical axis in the PDF plot don't lead to anything intuitive; they are meaningful, but only in a derived, mathematical sense."

So, is the y-axis of a PDF returning a probability, or instead is it a mostly unintuitive construct?

  • It's exactly what it says: probability density. I find it very intuitive. You could start with the discrete version: the PDF of a die with respect to a discrete measure has value 1/6 on each of 1, 2, 3, 4, 5, 6. – user2345215 Oct 31 '14 at 18:37

6 Answers

7

As the author of that snippet, perhaps I should expand on this comment.

Suppose you have a function $F(x)$ and its derivative $f(x) = F'(x)$. What does $y_k = f(x_k)$ tell you? It tells you the slope of $F(x)$ at the point $x_k$, and nothing else about $F(x)$. With some assumptions on $F$ such as continuity, we can extend this meaning to interpret some local behavior of $F$, and we can extract some additional approximate details about $F(x)$ from the quantity $y_k$.

Likewise, the probability density function of a continuous distribution, evaluated at a point in its support, gives you nothing but the density of the distribution at that point. With some additional knowledge of the underlying distribution function, we can expand this point value to extract some additional approximations and/or qualitative data about the distribution.

The PDF encodes the shape of the distribution, which is absolutely meaningful when you can compute $f(x)$ over some subset of the support of the distribution. But a single arbitrary value of the PDF usually gives you nothing important. It's only when we leverage the properties of a PDF in some way, typically through the Fundamental Theorem of Calculus, that we really get interesting data. Of course, plotting the PDF over the domain can be highly useful indeed!
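To make the Fundamental Theorem of Calculus point concrete, here is a small Python sketch, assuming Python 3.8+ for `statistics.NormalDist`; `integrate_pdf` is a hypothetical helper that applies the trapezoidal rule. Integrating the standard normal PDF over $[-1, 1]$ recovers the probability $F(1) - F(-1)$, roughly 0.6827, even though no single value of $f$ is that probability.

```python
from statistics import NormalDist


def integrate_pdf(dist: NormalDist, a: float, b: float, n: int = 10_000) -> float:
    """Trapezoidal-rule approximation of the integral of dist.pdf over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, n):
        total += dist.pdf(a + i * h)
    return total * h


dist = NormalDist(mu=0.0, sigma=1.0)
area = integrate_pdf(dist, -1.0, 1.0)      # area under f between -1 and 1
exact = dist.cdf(1.0) - dist.cdf(-1.0)     # F(1) - F(-1)
print(f"integral of f: {area:.6f}   F(1) - F(-1): {exact:.6f}")
```

The trapezoidal rule is just one convenient choice; any numerical quadrature would show the same agreement.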

Emily
  • The impetus of this comment, which I have echoed many times throughout this site, is to defeat the notion that the PDF of a continuous distribution returns probabilities, or that a single value from one PDF can be meaningfully compared in a vacuum to some other PDF evaluated at the same point. There are useful exceptions: assuming two distributions are normal, finding the difference in the locations of their respective maxima yields a statistically relevant metric. But saying something like "$f_1(x_k) > f_2(x_k)$ therefore..." is a quick train to nonsenseville. – Emily Oct 31 '14 at 19:03
  • I would add that changing (countably) infinitely many values of the PDF has absolutely no effect on the distribution. That's part of the reason why it doesn't make sense to focus on a single value. – user2345215 Oct 31 '14 at 19:23
  • @user2345215 That's a very good point. – Emily Oct 31 '14 at 19:23
4

If you think about it carefully, there is a meaning. I think the meaning is that the y-axis units are not probabilities alone but probability per 1 unit of x.

Example: if I have a uniform distribution from 0 to 10, its PDF value is 1/10 everywhere on the support. This says that the probability of any interval of length 1 within the support (say from 0 to 1) is 1/10.

If my uniform distribution is from 0 to .5, then the PDF will have a value of 2, because the support is only .5 units long and the total probability (the area under the PDF) must still be 1.
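The uniform example can be checked with two hypothetical Python helpers, `uniform_pdf` and `uniform_prob`, written here only for illustration: the density is the constant $1/(b-a)$ on $[a, b]$, and the probability of a subinterval is that height times the length of the part of the subinterval that lies inside $[a, b]$.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of the uniform distribution on [a, b]: constant 1/(b - a) on the support."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_prob(lo: float, hi: float, a: float, b: float) -> float:
    """P(lo <= X <= hi): density height times the length of the overlap with [a, b]."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)


print(uniform_pdf(3.0, 0, 10))       # 0.1 (probability per unit of x)
print(uniform_prob(0, 1, 0, 10))     # 0.1 (a probability)
print(uniform_pdf(0.25, 0, 0.5))     # 2.0 (a density above 1)
print(uniform_prob(0, 0.5, 0, 0.5))  # 1.0 (total probability)
```

The height 2 on $[0, 0.5]$ is not a probability; multiplied by the support's length 0.5 it gives a total probability of exactly 1.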

This is the same analogy as with time, speed, and distance, where distance is analogous to probability. Time is on the x-axis, and speed (analogous to the values of the PDF) is on the y-axis, measured in distance per unit time. The question is then: what is the total distance traveled? The answer is the area under the speed curve, just as a probability is the area under the PDF.

Zues
2

The units are those of $1/\sigma$, where $\sigma$ is the standard deviation. This can be seen from the density $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$: apart from $1/\sigma$, its factors are the unitless $1/\sqrt{2\pi}$ and the unitless value of the exponential function.

If men's heights and temperatures at noon on the fourth of July are normally distributed, the units would be different in those two cases.

The values of probability density functions are not probabilities. If they were, none of them could exceed $1$, but we commonly see values greater than $1$. For example, the normal density with standard deviation $1/100$ reaches $100/\sqrt{2\pi} \approx 39.9$ at its mean.
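A quick numerical check of the standard-deviation-$1/100$ case, sketched in Python with the standard library's `statistics.NormalDist` (an assumption; any normal-density routine would do). The peak height is about 39.89, far above 1, while the probability mass within five standard deviations of the mean is still just under 1.

```python
from math import pi, sqrt
from statistics import NormalDist

narrow = NormalDist(mu=0.0, sigma=0.01)

# Height of the density at the mean, from the library and from 1 / (sigma * sqrt(2*pi)).
peak = narrow.pdf(0.0)
formula = 1.0 / (0.01 * sqrt(2 * pi))
print(f"peak density: {peak:.2f} (formula gives {formula:.2f})")

# Probability of lying within 5 standard deviations: a genuine probability, so at most 1.
mass = narrow.cdf(0.05) - narrow.cdf(-0.05)
print(f"P(-0.05 < X < 0.05) = {mass:.7f}")
```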

If a normally distributed random variable $X$ is in miles, then the values of the density are in "per mile", i.e. $1/\text{mile}$. You add a certain amount of probability per mile added.
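To illustrate the "per mile" reading, the sketch below expresses one hypothetical normally distributed distance (mean 10 miles, SD 2 miles, values chosen only for illustration) in both miles and kilometres, using Python's `statistics.NormalDist`. The density values differ by the conversion factor because they carry units of 1/mile versus 1/km, while the probability of the same physical interval is unchanged.

```python
from statistics import NormalDist

KM_PER_MILE = 1.609344  # international mile

# Hypothetical distance: normal with mean 10 miles and SD 2 miles.
miles = NormalDist(mu=10.0, sigma=2.0)
# The same quantity in kilometres: mean and SD both scale by KM_PER_MILE.
km = NormalDist(mu=10.0 * KM_PER_MILE, sigma=2.0 * KM_PER_MILE)

# Density values depend on the unit: probability per mile vs probability per km.
d_miles = miles.pdf(10.0)
d_km = km.pdf(10.0 * KM_PER_MILE)
print(d_miles / d_km)  # approximately KM_PER_MILE

# Probabilities of the same physical interval (8 to 12 miles) agree.
p_miles = miles.cdf(12.0) - miles.cdf(8.0)
p_km = km.cdf(12.0 * KM_PER_MILE) - km.cdf(8.0 * KM_PER_MILE)
print(abs(p_miles - p_km) < 1e-12)
```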

1

The y units are the inverse of the x variable's units. For example, if you have a density function describing the distribution of people by weight, weight is on the x-axis. When you take the area under the curve for a section of the population, let's say 140-160 lbs, you get the probability that a person's weight lies between 140 and 160 lbs. Since area is height × width, probability = height × pounds, and since probability is unitless, the height must have units of 1/lb. You could also say the y-axis is probability per pound, with the understanding that we have to scan a range of pounds: there is zero probability at any single weight. You could also use %/lb as the y unit, since some prefer to think in terms of percentages rather than a unitless probability.

1

The y-axis represents probability density, whereas probability is represented by area. The y-axis can be thought of as probability/dx rather than probability. If you treat the y-axis as probability, then as you refine your range and make it smaller and smaller, the probability in each range goes to zero and the curve goes away. To avoid this paradox, the y-axis shows probability density.
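A small Python sketch of this point, assuming `statistics.NormalDist` from Python 3.8+ (the point $x = 0.5$ and the widths are arbitrary choices): the probability of an interval centred at $x$ shrinks towards 0 as the interval narrows, but probability divided by width settles on the density $f(x)$.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)
x = 0.5

for width in (1.0, 0.1, 0.01, 0.001):
    # Probability of an interval centred at x: goes to 0 with the width.
    prob = dist.cdf(x + width / 2) - dist.cdf(x - width / 2)
    # Probability per unit of x: converges to the density f(x).
    print(f"width={width:<6} P={prob:.6f} P/width={prob / width:.6f}")

print(f"f(x) = {dist.pdf(x):.6f}")
```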

0

I was taught that because PDFs deal with continuous data, the relative frequencies (which can be thought of as probabilities) depend on the chosen bin width. As the bin width gets smaller and the number of bins increases, the bins start to fit together and take on the shape of a smooth curve. This in turn leads, in a sense, to the y-axis values being an amount of relative frequency (probability) per small bin width, i.e. per small change in x or per (small) unit of x. Thus the y-axis unit is probability/Δx. When we want to know the probability of our desired range of x (the area under the curve, or AUC), it becomes, in a sense, (probability/Δx) × (the number of Δx's in the desired range), which gives the cumulative probability over that range of x. Sorry if this is not scientific enough! It's just how it was presented to me.
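The bin-width picture can be simulated in Python. This is a sketch under stated assumptions: 200,000 standard normal draws from `random.gauss` with a fixed seed, 80 bins on $[-4, 4]$, and a hypothetical helper `density_histogram`. Dividing each bin's relative frequency by the bin width gives heights that track the normal PDF closely.

```python
import random
from statistics import NormalDist


def density_histogram(data, lo, hi, n_bins):
    """Return (bin_centres, heights), where height = relative frequency / bin width."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        if lo <= v < hi:
            # Clamp guards against float rounding pushing a value just below hi past the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    n = len(data)
    centres = [lo + (i + 0.5) * width for i in range(n_bins)]
    heights = [c / n / width for c in counts]
    return centres, heights


random.seed(0)  # fixed seed so runs are reproducible
samples = [random.gauss(0.0, 1.0) for _ in range(200_000)]
centres, heights = density_histogram(samples, -4.0, 4.0, 80)

dist = NormalDist(mu=0.0, sigma=1.0)
worst = max(abs(h - dist.pdf(c)) for c, h in zip(centres, heights))
print(f"largest gap between histogram height and PDF: {worst:.4f}")
```

Dropping the division by `width` would give plain relative frequencies, which shrink towards 0 as the bins get narrower; dividing by the width is what turns them into probability per unit of x.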

RobWP