7

I'm working on a problem where I am trying to generate a random number from a Pareto distribution.

  1. Using some measured data, I have been able to fit a Pareto distribution to this data set with shape/scale values of $4/6820$ using the R library fitdistrplus.

  2. Now I want to, using the above scale and shape values to generate random numbers from this distribution.

  3. Doing some searching here (http://www.ntrand.com/pareto-distribution/) and here (http://www.randomservices.org/random/special/Pareto.html), it mentions that the inverse function or quantile function can be used to generate a random number i.e.

$$x=\frac{b}{(1−U)^{1/a}} \in U(0,1)$$

Using two methods (some C code and Excel formula) I've generated around $1000$ numbers using my shape & scale parameters. However when I import the data back into R and formally validate whether the randomly generated numbers are from a Pareto distribution an Anderson Darling GoF tests tells me that the Pareto distribution is not a good fit, additionally the fitdist library tells me my shape/scale parameters are totally different than the original parameters. (e.g. shape $= 8260157$, scale $= 10834967$)

Any ideas what I am doing wrong here?

Regards Jonathan

M47145
  • 4,106

3 Answers3

2

Either something is wrong with your numerical generation or with your maximum likelihood analysis

If the shape is $a$ and the scale is $b$, I would agree with you that given a uniform random variable $U$ on $(0,1)$ you would have $$X=\dfrac{b}{(1-U)^{1/a}}$$ as having a Pareto distribution.

Given a set of values drawn from a Pareto distribution, I would have thought that the maximum likelihood estimate of the scale would be $$\hat{b} = \min(x_i)$$ and of the shape would be $$\hat{a} = \dfrac{1}{\overline{\log(x_i)} - \log\left(\hat{b}\right)}$$

The following R code suggests this seems to work, since

shape <- 4
scale <- 6820 
cases <- 1000
set.seed(2016)
u <- runif(cases)
x <- scale / (1-u)^(1/shape) 
mlescale <- min(x)
mleshape <- 1 /( mean(log(x)) - log(mlescale) )

gives

> mleshape 
[1] 3.968604
> mlescale 
[1] 6824.48

which are close to to original values

Henry
  • 157,058
  • Thans for your commends. I took a look at this in R just now: – Jonathan Dunne May 09 '16 at 10:17
  • If i generate a pareto distribution from the r library – Jonathan Dunne May 09 '16 at 10:18
  • code x4 <- rpareto(1000, 4.14104, 6820.53374) I get an the following output in fitdistrplus as follows:
    shape 3.688942
    scale 5915.610140 which is not totally unreasonable. If i plug those values into an AD GoF test i get : data: x4 and ppareto AD = 0.32997, p-value = 0.914 which is an excellent fit
    – Jonathan Dunne May 09 '16 at 10:22
  • Next if i use the above shape and scale calculations you mentioned previously as follows: mlescale <- min(x4) mlescale mleshape <- 1 /( mean(log(x4)) - log(mlescale)) mleshape I get the following values returned: mlescale : 0.0599367 and mleshape: 0.1024889 if i plug that back into the AD GoF test i get the following: data: x4 and ppareto AD = 341.46, p-value = 6e-07 – Jonathan Dunne May 09 '16 at 10:25
  • If you got mlescale: $0.0599367$ then that suggests a minimum value of $0.0599367$ in your simulated data. But that is impossible if your original scale was $6820$ or $6820.53374$ or $5915.610140$ as all generated values must be at least as big as the scale parameter. – Henry May 09 '16 at 10:44
  • Hi Henry so just to confirm when i use the code above in R it does indeed generate mleshape and scales in the right ball park shape = 3.968604 and scale = 6824.48 however when i check the recently generated data with fitdistrplus i get some really strange values for the shape and scale i.e. estimate Std. Error shape 10739506 NA scale 97767067374 NA and of course AD GoF tells me this distribution is not suitable as a pareto distribution. So i'm scratching my head as to why this would be the case? – Jonathan Dunne May 09 '16 at 11:26
  • So what is the minimum of your generated data that produces such strange scale estimates? You might do better using the known maximum likelihood formulae than a fragile black-box package – Henry May 09 '16 at 13:14
2

If $$U\in [0,1)$$ is uniformly distributed random variable, then $$E(\lambda)=-\ln(1-U)/\lambda$$ is exponentially distributed random variable, and then $$P(x_m,\alpha)=x_m\cdot\exp(E(\alpha))$$ is Pareto distributed random variable with scale $x_m$ and shape $\alpha$. $$L(x_m,\alpha)=P(x_m,\alpha)-x_m$$ generates Lomax distributed random variable.

Srba
  • 21
1

Ok i believe i figured out the answer, it looks like the formulae that i mentioned in my original post was not correct. I figured out that the numbers being generated where out by a factor the the scale i used i.e. 6820.53374

Apologies I am using my c code to show the correct computations (See below):

double pareto_arrival(void) { /* R computations estimate that Pareto is a reasonable fit with / / shape = 4.14104, scale = 6820.53374 paramaters / / https://cran.r-project.org/web/packages/actuar/actuar.pdf */

float a;
float b;
float inv_fun_denom;
float par_arr;

a =  4.14104;
b = 6820.53374;

inv_fun_denom = pow(1-drand48(), 1/a);
par_arr = (b/inv_fun_denom)-b; //adding the -b did the trick

return par_arr;

}

Page 64 of this pdf helped put me on the right track https://cran.r-project.org/web/packages/actuar/actuar.pdf

  • I think your confusion stems from the fact that the PDF of the pareto distribution in the actuar package is different from the PDF of the pareto distribution in the page that you looked at. The PDF in actuar is a * b^a / (x+b)^(a+1), while the PDF in the webpage is a * b^a / x^(a+1). So your quantile function you used to generate random number would have been different. – C.J. Jackson Mar 19 '18 at 21:11