I'm trying to approximate a standard normal distribution with a triangular distribution. Which parameters of the triangular distribution (min, max and mode) are most suitable? Thank you.
Jack D'Aurizio's answer uses $L^2$ distance with respect to Lebesgue measure. Perhaps for some purposes that is what is needed, but your question doesn't give enough information to tell whether that is true of your purposes. Alexander Dunlap's answer gives a triangular distribution that has the same expected value and the same variance as your normal distribution. Statisticians often do that. Another possibility would be $L^2$ with respect to the normal distribution itself, or with respect to the triangular distribution. Yet another criterion is minimum total variation distance$,{}\ldots$ – Michael Hardy Jul 26 '15 at 20:40
$\ldots,{}$which minimizes the largest difference in probabilities assigned by the two distributions to the same subset of the line. – Michael Hardy Jul 26 '15 at 20:41
3 Answers
The best approximation in the $L^2$ sense is given by the value of $\alpha\in\mathbb{R}^+$ for which:
$$ \frac{d}{d\alpha}\left(\frac{1}{2\pi}\int_{|x|\geq \alpha}e^{-x^2}\,dx + \int_{-\alpha}^{\alpha}\left(\frac{1-|x/\alpha|}{\alpha}-\frac{e^{-x^2/2}}{\sqrt{2\pi}}\right)^2\,dx\right)=0,$$ i.e. by minimizing the $L^2$ norm of the difference between the pdf of a standard normal distribution, $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, and a distribution supported on $[-\alpha,\alpha]$ having pdf $\frac{1}{\alpha}\left(1-\left|\frac{x}{\alpha}\right|\right).$
Numerically, it is $\color{red}{\alpha\approx 2.297}$. [Figure: the two densities plotted together.]
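As a rough numerical check of that value, here is a minimal Python sketch that minimizes the squared $L^2$ distance directly. It is not the answerer's computation. The closed form in `l2_objective` is an assumption of this sketch, obtained by expanding the square and dropping the term that does not depend on $\alpha$. The helper names and the golden-section search are illustrative choices.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def l2_objective(alpha):
    """Squared L2 distance between the triangular pdf on [-alpha, alpha]
    and the standard normal pdf, minus the alpha-independent constant
    (the integral of the squared normal pdf)."""
    # Integral of the squared triangular pdf.
    tri_sq = 2.0 / (3.0 * alpha)
    # Integral of (triangular pdf) * (normal pdf).
    cross = (2.0 / alpha) * (Phi(alpha) - 0.5) \
        - (2.0 / alpha ** 2) * (phi(0.0) - phi(alpha))
    return tri_sq - 2.0 * cross

def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

# Search bracket [0.5, 6.0] is an assumption; the objective has one minimum there.
alpha_star = golden_section_min(l2_objective, 0.5, 6.0)
print(f"alpha = {alpha_star:.3f}")
```

Under these assumptions the search should land close to the $\alpha\approx 2.297$ quoted above, which corresponds to min $\approx -2.297$, max $\approx 2.297$ and mode $=0$ in the question's parametrization.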
@Ian: what I wrote is correct. I minimized the $L^2$ norm of the difference between the pdf of a standard normal distribution, $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, and a distribution supported on $[-\alpha,\alpha]$ having pdf $\frac{1}{\alpha}\left(1-\left|\frac{x}{\alpha}\right|\right).$ – Jack D'Aurizio Jul 26 '15 at 20:41
Oh, I see, you did the appropriate squaring in the first term. It might have been a little bit clearer to write it that way instead. – Ian Jul 26 '15 at 20:43
It depends on the sense in which you want your triangular distribution to "approximate" the normal distribution. The normal distribution is symmetric about $0$ and unimodal, so you probably want your triangular distribution to be symmetric about $0$ and unimodal as well. In order for your triangular distribution to be a probability distribution, the area under the triangle should be $1$. If your triangle has height $h$ and base $b$, this means that $bh/2=1$, so $b=2/h$. Thus you have one more parameter to fix to best approximate a normal distribution.
A reasonable next thing to request would be that your triangular distribution and the standard normal distribution have the same variance; i.e. that your triangular distribution has variance $1$. The pdf of your distribution is given by
$$f(x)=\left(1-\frac{2}{b}|x|\right)h=(1-h|x|)h \quad\text{for } |x|\le \frac{1}{h}=\frac{b}{2}, \text{ and } f(x)=0 \text{ otherwise}.$$
Thus the variance of your triangular distribution with height $h$ and base $2/h$ will be
$$\text{Var} = \int_{-1/h}^{1/h}x^2\cdot (1-h|x|)h \, dx =2\int_0^{1/h}x^2\cdot \left(1-hx\right)h \, dx;$$
setting this equal to $1$, all you have to do is solve for $h$.
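Carrying the integral through (a worked step added here, using only the constraint $\text{Var}=1$ requested above):

$$\text{Var}=2h\int_0^{1/h}\left(x^2-hx^3\right)dx=2h\left[\frac{x^3}{3}-\frac{hx^4}{4}\right]_0^{1/h}=2h\cdot\frac{1}{12h^3}=\frac{1}{6h^2}.$$

Requiring $\frac{1}{6h^2}=1$ gives $h=\frac{1}{\sqrt{6}}\approx 0.408$ and $b=\frac{2}{h}=2\sqrt{6}\approx 4.899$. In terms of the parameters in the question, that is min $=-\sqrt{6}\approx -2.449$, max $=\sqrt{6}\approx 2.449$ and mode $=0$.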
It would be clearer if you labeled your integration variable (I assume the integrals are $dx$?). Also, you changed the $h$ back to $\frac{2}{b}$ in the last step, which I don't think is productive. All that said, the equation at the end is apparently quite simple: $1=\frac{1}{6h^2}$ means $h=\sqrt{1/6} \approx 0.408$. This is fairly close to Jack D'Aurizio's solution. – Ian Jul 26 '15 at 20:19
(It is close in that Jack D'Aurizio's $\alpha$ is $1/h$ in the notation above, and $1/2.297 \approx 0.435$.) – Ian Jul 26 '15 at 20:24
From your Triangle Distribution you have a (min), a (max), and a (mode).
In a Normal Distribution you have a (mean) and a (standard deviation), abbreviated as (sd). In the Normal Distribution the (mean) is equal to the (mode) because of the symmetry of the model.
To move from a Triangle model to a Normal model we need to estimate the (mean) and the (standard deviation) from the Triangle model's parameters.
For the (mean):
It is reasonable to use the (mode) of the Triangle Distribution as the (mean) of the Normal Distribution, because the two are equal under the Normal model. NOTE: if the Triangle is not symmetric, be aware that you may be breaking the symmetry assumption of the Normal model.
A second reasonable estimate for the (mean) is the (midpoint) of the (min) and the (max), where (midpoint)=((min)+(max))/2.
The (mean) of the Normal Distribution represents the central tendency. A (midpoint) and a (mode) are also measures of central tendency, so they are candidates that are readily available from the Triangle Distribution.
A third method is to use the (mean) of the Triangle Distribution to estimate the (mean) of the Normal Distribution. The (mean) of a Triangle Distribution is (mean_tri)=((min)+(max)+(mode))/3.
method 1 (central tendency):
(mean)=(mode)
or
method 2 (central tendency):
(midpoint)=((min)+(max))/2
(mean)=(midpoint)
or
method 3 (match moments):
(mean_tri)=((min)+(max)+(mode))/3
(mean)=(mean_tri)
For the (standard deviation):
A Triangle Distribution gives us a (min) and a (max). We can take the difference to find a range value, (range)=(max)-(min). A function of the range can provide an estimate of the standard deviation of a Normal Distribution, f(range)=(sd). It is up to the user to choose a reasonable function at this point. Example: a range of four standard deviations (mean plus or minus two) covers about 95% of a Normal Distribution's data and gives an estimated standard deviation of (sd)=(range)/4, while a range of six standard deviations (mean plus or minus three) covers about 99.7% and gives an estimate of (sd)=(range)/6.
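The coverage figures behind this range rule can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the example range $[-3, 3]$ are hypothetical, and the rule is read here as treating the range as mean plus or minus $k$ standard deviations, so that (sd)=(range)/(2k).

```python
import math

def normal_coverage(k):
    """Probability that a normal variable falls within k sd of its mean."""
    return math.erf(k / math.sqrt(2.0))

def sd_from_range(lo, hi, k):
    """Range rule: treat [lo, hi] as mean +/- k standard deviations."""
    return (hi - lo) / (2.0 * k)

# Hypothetical example: a triangle with min = -3 and max = 3.
for k in (1, 2, 3):
    sd = sd_from_range(-3.0, 3.0, k)
    print(f"range/{2 * k}: sd = {sd:.2f}, coverage = {normal_coverage(k):.4f}")
```

The coverages come out at roughly 68.3%, 95.4% and 99.7% for (range)/2, (range)/4 and (range)/6 respectively.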
Another method is to take the (standard deviation) of the Triangle Distribution and use it to estimate the (standard deviation) of the Normal Distribution. The (sd) of a Triangle Distribution is (sd_tri)=sqrt(((min)^2+(max)^2+(mode)^2-(min)(max)-(min)(mode)-(max)(mode))/18).
method 1 (range coverage):
try setting
(range)=(max)-(min)
and using:
(sd)=(range)/2 (about 68% normal coverage) (fattest of the three models)
or
(sd)=(range)/4 (about 95% normal coverage) (middle width of the three models)
or
(sd)=(range)/6 (about 99.7% normal coverage) (skinniest of the three models)
method 2 (match central moments):
(sd_tri)=sqrt(((min)^2+(max)^2+(mode)^2-(min)(max)-(min)(mode)-(max)(mode))/18)
(sd)=(sd_tri)
When estimating your (mean) and (standard deviation), you may want to consider how much confidence you have in your Triangle's coverage of the population and whether the original distribution was skewed. Did the (min) and (max) result from a very small sample or a very large sample? If the distribution is skewed, do you want to focus on the mode, or do you want to go for more coverage?
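The moment-matching route (method 3 for the mean, method 2 for the standard deviation) can be sketched in Python as follows. The function names are hypothetical and the formulas are the standard triangular mean and variance quoted above. The other methods would simply swap in the mode, the midpoint, or a range-based sd.

```python
import math

def triangle_mean(lo, hi, mode):
    """Mean of a triangular distribution with the given min, max and mode."""
    return (lo + hi + mode) / 3.0

def triangle_sd(lo, hi, mode):
    """Standard deviation of a triangular distribution."""
    var = (lo**2 + hi**2 + mode**2 - lo * hi - lo * mode - hi * mode) / 18.0
    return math.sqrt(var)

def normal_from_triangle(lo, hi, mode):
    """Normal (mean, sd) whose first two moments match the triangle's."""
    return triangle_mean(lo, hi, mode), triangle_sd(lo, hi, mode)

# Symmetric triangle on [-sqrt(6), sqrt(6)] with mode 0.
mean, sd = normal_from_triangle(-math.sqrt(6.0), math.sqrt(6.0), 0.0)
print(f"mean = {mean:.3f}, sd = {sd:.3f}")

# Hypothetical skewed triangle: min 0, max 10, mode 2.
skew_mean, skew_sd = normal_from_triangle(0.0, 10.0, 2.0)
print(f"mean = {skew_mean:.3f}, sd = {skew_sd:.3f}")
```

The symmetric example maps to mean 0 and sd 1, so the triangle on $[-\sqrt{6},\sqrt{6}]$ with mode 0 is the one whose moments match the standard normal. In the skewed example the matched mean (4) sits well away from the mode (2), which is the trade-off the paragraph above asks you to weigh.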
