6

I tried testing random integers for compliance with Benford's law, which they are apparently supposed to do. However, when I try doing this with Python,

map(lambda x:str(x)[0], [random.randint(0, 10000) for a in range(100000)]).count('1')

I get approximately equal frequencies for all leading digits. Why is this the case? Might it have something to do with how the pseudorandom number generator, the Mersenne twister, works?

Hypercube
  • 576
  • 1
  • 5
  • 14
  • 3
    Benford's law certainly does not assume that one chooses randomly the integers in $[1,10^n]$. – Did Oct 21 '12 at 08:00
  • Therefore, this is the distribution expected if the mantissae of the logarithms of the numbers (but not the numbers themselves) are uniformly and randomly distributed. (Wikipedia) – martini Oct 21 '12 at 08:01

5 Answers5

8

The linked paper is titled unfortunately, at least as regards the current conception of the word "random." The whole point of Benford's law is precisely that it doesn't hold when integers are drawn uniformly from a range that, like yours, ends at a power of $10$: a well-designed pseudorandom number generator should give numbers with asymptotically exactly a $\frac{1}{10}$ chance of each leading digit $0,1,...,9$ in decimal notation.

Benford's law applies not to properly random sources of data, but to data produced by certain real-life random processes. One simple characterization of data that obey Benford's law is as those produced by processes that exhibit more-or-less exponential growth, such as populations. Such processes produce uniformly distributed logarithms of data points, but this uniform distribution gets splayed out into the characteristic Benford's law upon exponentiation.

Kevin Carlson
  • 52,457
  • 4
  • 59
  • 113
  • I have a cryptographic (?) random number generator that i thought was obeying Benford's law because I did see a leading 1 more often than anything else. But checking with multiple sets of 100k, I see 1 @ ~52% and all other digits ~6%. Any ideas what kind of distribution would do that or if that's expected? – mafafu Aug 30 '16 at 18:54
  • 2
    I think I have answered my own question. I was generating unsigned 64bit random integers, so my upper bound base10 is ~1.8x10^19. So ~44% of the numbers in this space are going to be greater than 1x10^19 and thus start with 1. That with the other numbers with fewer digits that start with 1 skewed my results. The quickest way to verify this was to switch to uint32 to see the distribution pile around 1,2,3 and drop off from there. If I cap the random numbers at some value of repeating 9's, I see the equivalent distribution you mentioned. This makes me feel more comfortable with my CRNG. – mafafu Aug 30 '16 at 19:31
  • Yep, that sounds right. – Kevin Carlson Aug 30 '16 at 19:55
4

Benford's law applies only to distributions that are scale-invariant and thus applies approximately to many real-life data sources, especially when we measure with arbitrary units: If the leading-digit distribution of a sample is essentially the same whether we measure in inches or centimeters, this is only possible if the logarithm is equidistributed (or approximately so over a range wide enough to cover several orders of magnitudes).

2

Thank you for the answers, but I think I found a more explicit source of the error by reading the paper more carefully.

Benford's law does apply to random integers, but only as the upper and lower bounds go to infinity. The limit $\lim_{N\to \infty} P_N^1(1)$ (the proportion of integers from 1 to $N$ which have a leading digit of 1, as $N$ goes to infinity) diverges, and equals 1/9 at every $10^n, n\geq 2$, which explains my result. If I set the upper bound on the random integers to 12000, for example, I get different results.

Hypercube
  • 576
  • 1
  • 5
  • 14
  • 1
    Sure, and my answer mentioned that property of the range you were picking from, but you don't get Benford's law just by picking your finite bound carefully and continuing to draw uniformly. – Kevin Carlson Oct 21 '12 at 09:11
  • What do you mean? Wouldn't it approach Bentford's law? – Hypercube Oct 21 '12 at 09:15
  • As you said yourself, the sequence $P^1_N$ diverges. In particular it oscillates forever, with the $\lim\sup$ too high for Benford and the $\lim\inf$ too low. That's why the author had to spend the next couple of pages constructing a sequence of limits of partial averages $P^m_N$ to finally derive the law. – Kevin Carlson Oct 21 '12 at 10:05
  • Yes, I meant as the lower bound goes to infinity. Thank you! – Hypercube Oct 21 '12 at 16:48
2

The simplest solution to approach the logarithmical distribution using mersenne twister is by using two dimensions of randomness. First the bandwidth for the amount of digits, say 15, secondly the bandwidth, say 10-100000. With more than 10,000 iterations the results shows an almost exact benford distribution.

Hope this helps.

Regards,

Dennes Spek

0

Sorry for the necro add, but I needed an answer to the title of the post and found all the answers unsatisfactory for what I wanted (generating random Benford numbers) and I figured I should post an answer if anyone has the same issue as me.

First up, (this is mostly to address the post of the question, not the title) Benford's law doesn't apply to randomly selected numbers between two values. If you select a random number between 1 and X (where X is really large) then the odds of the number starting with 1 is the same as it starting with 8. What Benford's law is saying, you're more likely to naturally run into the number 13 than the number 83, that it's more likely for '13' to naturally describe something than '83'.

So, anyway, if you're interested in generating Benford-applicable numbers, what you want is:

  1. Generate a random number between 0.000 and 1.000
  2. Calculate 10 to the power of that number
  3. This value is the coefficient for a value in scientific notation

(you can verify that these steps will produce numbers Benford numbers, because random values between 0.000 and 0.301 will produce values between 1.0 and 2.0 - aka, the ~30.1% Benford predicts; this calculation is basically a reverse of the process Benford uses to estimate the start-digit-odds with Log10(X).)

From here, you can decide what exponentiation rules you want, depending on the scale of numbers you want (realistically, a pure Benford number generator is very rarely going to output anything larger than 3 or 4 digits.) If you want the most realistic simulation, you can simply make each exponential value 1/10th as likely as the one before it (this would be a pure Benford prediction method.) Or if you want a wide range of scope with more common larger numbers, simply choose an exponent at random (the starting digit would be Benford compliant, but not the scale of integer.)

For the pure option:

  1. Generate a second random number between 0.000 and 1.000
  2. Calculate 0-(Log10(1-RandomValue))
  3. Round this value down to the nearest integer

... for generated values between 0.0 and 0.899999..., this value will be 0. For values between 0.9 and 0.899999999...., this value will be 1. Etc - basically, it's a mathematical way of figuring out how many 9's are in the number before a non-9, aka, making each exponent value 1/10th as likely as the one before it.

The final answer is FirstValue x 10^SecondValue, with the whole thing rounded to an integer.

Kevin
  • 131