2

There is nothing original about this question. It was asked here.

I am just curious about an answer that is beyond my mathematical level.

In one of the simulations appearing in the comments to the OP, the number $e$ is approximated through a formula like this:

$$\Large2 + \mathrm E\left(\frac{1}{\left(\lceil \frac{1}{X\, \sim \,U(0,1)}\rceil-2\right)!} \right)$$

Hopefully this is acceptable mathematical notation for the code (2 + mean(1/factorial(ceiling(1/runif(1e5))-2))).

The question is, what mathematical concept, geometrical approximation, or distribution approximation underpins this formula?

  • @Rahul The formula and the code are consistent on my screen (interpreting runif as a function that generates uniform(0,1) random numbers). – Ian Feb 05 '16 at 16:23
  • @Ian Oops, I was looking at an old version of the question. –  Feb 05 '16 at 16:25
  • Can the person who voted to close the question please explain why it warrants closing $iff$ the reason for such recommendation is unrelated to my initial mishaps getting the latex legible? Just trying to get a sense for future reference. – Antoni Parellada Feb 05 '16 at 16:39
  • I didn't vote to close, but given that your question is "where does this formula come from?", I can see why the formula not being clear would be a problem. – Ian Feb 05 '16 at 16:44
  • @Ian No question. I am teaching myself mathematics, and wanted to show the respect of at least attempting to translate the line of code into latex. In the process of deciding how to resolve the ceiling part and the random uniform denominator I forgot part of it, and self-corrected when I realized it. Thank you for your answer, I'm re-re-reading it, and plan on accepting. – Antoni Parellada Feb 05 '16 at 16:51

1 Answers1

3

Note that the ceiling changes when its argument passes through an integer. This happens when the uniform variable passes through $1/n$ for some $n$. The ceiling will be $n$ when the uniform variable is between $1/n$ and $1/(n-1)$. The length of this interval is $\frac{1}{n-1}-\frac{1}{n}=\frac{1}{n(n-1)}$, and the probability of a uniform random variable on (0,1) lying in some subinterval of (0,1) is the length of that interval.

So the overall random variable inside the expectation there is equal to $\frac{1}{(n-2)!}$ with probability $\frac{1}{n(n-1)}$, for $n=2,3,\dots$. So the formula for the expectation is

$$\sum_{n=2}^\infty \frac{1}{n(n-1)(n-2)!}=\sum_{n=2}^\infty \frac{1}{n!}.$$

This is the famous Maclaurin series for $e$, except missing the first two terms, which add up to $2$.

Incidentally, this is a pretty terrible way to approximate $e$; a few runs on my machine with 100,000 simulations each gave an accuracy of about $10^{-3}$. Given how Monte Carlo's accuracy scales (the error is essentially proportional to $n^{-1/2}$), this means you should expect to need on the order of $10^{16}$ simulations to get an accuracy of $10^{-9}$. Yet only 13 terms of the sum itself are required for this accuracy.

Ian
  • 101,645
  • Can I ask you a couple of clarifications? I get the overall idea, but I don't get the statement in the fifth sentence re: overal r.v. inside the expectation being $1/(n-2)!$. – Antoni Parellada Feb 05 '16 at 18:40
  • 1
    @AntoniParellada The ceiling of the reciprocal of the uniform variable is equal to $n$ when the reciprocal itself is between $n-1$ and $n$. This occurs when the uniform variable is between $\frac{1}{n}$ and $\frac{1}{n-1}$. The length of the interval $(1/n,1/(n-1))$ is $\frac{1}{n(n-1)}$, so this occurs with probability $\frac{1}{n(n-1)}$. When the ceiling is equal to $n$, the overall quantity in the expectation is $\frac{1}{(n-2)!}$. So the overall quantity is equal to $\frac{1}{(n-2)!}$ with probability $\frac{1}{n(n-1)}$. – Ian Feb 05 '16 at 18:42
  • +1 Re the final paragraph: You almost got the point, but it's the opposite of what you state. We can generalize this method and simulate, say, $\sum_{i=n+1}^\infty 1/i!$ and add $1+1+1/2+\cdots+1/n!$ to it to estimate $e$. As $n$ increases this becomes extremely efficient, to the point where no iterations of the simulation are needed. Thus, there is a sequence of simulations that range from this "pretty terrible way" to methods requiring no iterations at all. At what point, exactly, do we stop calling this "simulation" and start calling it pure "calculation"? :-) – whuber Feb 07 '16 at 14:12
  • @whuber The questions are: 1. how much of the work is spent on calculating the "head" vs. the "tail"? Once most of the work is in the "head", it's more deterministic calculation than Monte Carlo. 2. How simply can you write down an expression for the tail part? I don't see an immediately obvious way to generalize this to a larger "head" without an actual CDF lookup table. – Ian Feb 07 '16 at 15:16
  • Re (1) This seems like an ambiguous criterion. If we make sure to generate random variates sufficiently slowly, then (according to it) the calculation will be "Monte Carlo". How about if I time the effort to compute the head and then take twice as much time to generate a Monte-Carlo estimate of the tail--that would seem to qualify as a "Monte Carlo" calculation by this criterion. (2) The tail CDF has a simple closed form expression. Regardless, there are many ways to generate random variables from it. – whuber Feb 07 '16 at 16:35
  • @whuber 1. Yes, you have to be a bit more precise about the difficulty of "a unit of computation" in each of the phases. This is harder because things like the CDF lookup for a random variable is not necessarily independent of the value of the randomization parameter. Still, once you agree on those constants, you can make the comparison. 2. What is this simple closed form? – Ian Feb 08 '16 at 00:02