4

I know there are many methods to scale a number from range $[A,B]$ to a range $[C,D]$, and I've searched over and over the web. I've seen this math.SE thread.

I need to scale a big number (signed 32-bit integer) between $-2147483648$ and $2147483647$, to a smaller range (from 0 to millions).

However, the original number is generated by a random number generator that I cannot control in any way, so I must to be sure that the output formula does not alter its randomness (in a way that I can demonstrate - academic papers are well accepted).

I need the correct way to scale this number taking in consideration that:

  • the output range must have as maximum value a power of two (e.g., $4194304$)
  • the formula can be demonstrated
  • the formula does not alter the randomness of the source number

Anyone may be of help?

  • Hi there, I hope you don't mind if I've improved your post slightly. Assuming you're the same person who asked me this question via email, I'm sure you'll get excellent answers here, better than anything I'd come up with. – Zev Chonoles Apr 19 '15 at 17:52
  • @ZevChonoles Hi. Thanks for the editing. Only after my e-mail I thinked about the fact that privately the community did not benefit of your answer. So i preferred to ask here to share the answer with others. Thanks for your help. – majornelson Apr 19 '15 at 18:09

3 Answers3

2

As other answers have noted, this is impossible; in particular, you start with a discrete distribution, where each outcome has a probability of $2^{-32}$. Now, if you take some function $f$ taking in such random outcomes and outputting an integer in the desired range, then it's clear that the probability of receiving some $n$ as the output equals $2^{-32}$ times the number of outcomes $x$ such that $f(x)=n$ - in particular, the probability is always a multiple of $2^{-32}$. Unfortunately, numbers like $\frac{1}3$ are not multiples of $2^{-32}$, so we can't get a uniform distribution on three numbers (be we can get a uniform distribution on $2^n$ outcomes for $n\leq 32$ just by truncating the number).

However, here's a standard and simple procedure to generate a random integer in $[1,N]$ given the capacity to generate a random one in $[1,M]$ for $M\geq N$. Your problem is easily under this class.

Suppose that $M=aN + b$ where $a$ is an integer and $b$ is positive and less than $N$ - that is, $b$ is the remainder of $M$ divided by $N$ and $a$ is the integer part of the quotient. Generate an integer $x$ in $[1,M]$. If it is less than or equal to $aN$, return the remainder of $x_0$ divided by $N$. Otherwise, generate another integer in $[1,M]$ and test if it is lower than $aN$. Repeat indefinitely.

That is, to generate a random number, you throw out certain results and try again. For instance, if we wanted to generate a random integer in $[1,2]$ given the ability to do so in $[1,3]$, we simply keep finding random numbers in $[1,3]$, until one of them is either $1$ or $2$. This is clearly a uniform distribution. In the very worst case, the above method throws out about $\frac{1}2$ of the results generated. However, if you are generating numbers in $[1,2^{32}]$ and seeking a random number between $1$ and a million, you will accept almost every (e.g. about 99.97%) random number you get, so there's no great loss in efficiency.

Milo Brandt
  • 60,888
0

It sounds like the input number is a sequence of 32 bits (interpreted as a signed integer). If the desired output is also a random sequence of $n$ bits, for some $n < 32$, then an easy way is to truncate the input sequence to the first $n$ bits. It's pretty clear that if the input is uniformly distributed, then the output will also be uniformly distributed.

But if the output is in a range of $N$ possibilities, where $N$ is not a power of 2, then it cannot be uniformly distributed among those possibilities. So you won't get your output to be completely random in that case. We can show this with a simple counting argument.

ADDED: A counting argument also proves the uniform distribution mentioned above. For example, suppose the input is uniformly distributed from $-2^{31}$ to $2^{31}-1$ and the desired output is from $0$ to $2^{22}-1$. We can truncate by throwing away the most significant bits with the function $y = (x + 2^{31}) \mod 2^{22}$. The function maps exactly ${2^{32}}/{2^{22}} = 2^{10}$ input values to each output value. We assumed that the probability of getting any particular input value is $1 / 2^{32}$, so the probability of getting any particular output value is $2^{10} / 2^{32} = 1 / 2^{22}$, which means that any output value is equally likely.

Hew Wolff
  • 4,074
  • Hi Hew, thanks for your answer. I can not simply truncate the number, because the input is signed and the output must be unsigned. – majornelson Apr 19 '15 at 18:18
  • But if it's a random sequence of bits, the interpretation of it as a signed or unsigned integer is up to you. For example, if you have a signed integer between -127 and 128, you can easily convert it to an unsigned integer between 0 and 255; they're both simply 8 bits. – Hew Wolff Apr 19 '15 at 18:22
  • Can you give an example of an output range that you're interested in? – Hew Wolff Apr 19 '15 at 19:29
  • From 0 to 4194304 (it's preferable to stay under 10 millions) – majornelson Apr 19 '15 at 19:31
  • So, $0$ to $2^{22}$. The size of that output range is not a power of $2$, so this is not possible (I added a note in my answer about this). On the other hand, if you want $0$ to $2^{22} - 1$, that's easy. – Hew Wolff Apr 19 '15 at 22:49
  • Yes, I managed it starting from 0 so it's good 0 to (2^22-1), since I need to mantain the number randomness. I think that the maximum will be 2^22 if we start from counting from 1 excluding 0. Right? – majornelson Apr 19 '15 at 23:07
  • Yes, if you set the output range to be from $0$ to $2^{22} - 1$ or from $1$ to $2^{22}$, the range size will be $2^{22}$ and it's easy to solve. For example, if the input is from $-2^{31}$ to $2^{31} - 1$ and the output is from $0$ to $2^{22} - 1$, $y = (x + 2^{31}) \mod 2^{22}$ works. – Hew Wolff Apr 20 '15 at 17:56
  • thanks a lot for your answer! Last thing: how that formula can be demonstrated? – majornelson Apr 20 '15 at 20:33
  • I added some details, see if that helps. – Hew Wolff Apr 21 '15 at 03:10
0

This works in special cases:

Let $[a,b]$ be an interval and consider a uniform distribution on the intervals in this interval. Let $k$ be the number of integers in $[a,b]$.

Let $[c,d]$ be another interval and suppose that the number of integers in this interval is $l$.

BIG ASSUMPTION: Suppose that $l\mid k$.

Then, any $\frac{k}{l}=m$ to $1$ map will give you a uniform distribution on $[c,d]$.

For example, if you have the number $n$, you could take $n\pmod l$ and then add $c$.

If you don't have the BIG ASSUMPTION, then you may be able to weight the map described here, (and not have it exactly $m$ to $1$, but that is more challenging and situation specific).

Michael Burr
  • 32,867
  • Thanks Michael, let's find out if I understood correctly. I my number is 1472398015, and my new range is 0-4194304, my new number will be: ((OutputRangeMax/(InputRangeMax2))(n + InputRangeMin) and so =(4194304/4294967294) * (1472398015 + 2147483648) = 3535040,69. How this can be demonstrated? – majornelson Apr 19 '15 at 18:49
  • This doesn't satisfy the big assumption: the original interval has $4294967296$ integers while the new range has $4194305$ integers and $4194305\nmid 4294967296$. Therefore, the construction won't work as stated. – Michael Burr Apr 19 '15 at 18:53
  • It's mandatory (for me) to scale it to a lower range, so the number of integers in the new range will be always less than the old range. – majornelson Apr 19 '15 at 19:05
  • It's ok to be less than, the symbol means divides. – Michael Burr Apr 19 '15 at 19:05
  • It's sure by my fault if I can fully understand your answer. I apologize. The example does not respect the big assumption, so you suggested to "weight the map described here". Perhaps "here" had to be a link? – majornelson Apr 19 '15 at 19:20