0

My goal is to generate around 1500 distinct random numbers, from range 8000.

I receive (blockchain ChainLink VRF - connected to random oracle, but that's not important):

  • variable array of uint256[] randomValues: unsigned values <0, 2^256-1>
  • i convert this field right away to numbers from my range, using ( randomValues[i] MOD 8000 ) + 1.
  • Since I want to create distinct numbers, I guess I should be calling generating function until I have enough of them.

My question is if you know about more efficient way how to generate array of distinct random numbers from array of numbers like [ 106182753482634914995248705155707514089551099076291415270692202660364749508656, 45628802998066180854454684078101263717187335597784096973096012883499998820379, .. ]?

wtdmn
  • 43
  • Btw this is how that uint256 array is generated in first place: expandedValues = new uint256[](n); for (uint256 i = 0; i < n; i++) { expandedValues[i] = (uint256(keccak256(abi.encode((randomNumber) + 1, i))) % 8000) + 1; } – wtdmn Jan 26 '22 at 14:33

1 Answers1

0

Ok, this is more practical than theoretical, but maybe someone will find it useful. I decided to go another way on this one (using python3 random module).

  1. I shuffle generated list of numbers 1..10000 using Mersenne twister (MT)
  2. python3 random.shuffle() is using Fisher-Yates shuffle; as a randomness initiator for random I am using large integer, I use it to seed MT
  3. for 1500 list items MT needs seed of size = log2(1500!) bits, note that for python3 seed longer than 20.000 bits is cut
  4. after shuffle, first 1500 list items are kept
  5. I have 1500 items from range

As @r.e.s. mentioned in comments, MT has 219937−1 possible states, that's why single run has to be held under 2080 items - to be able to reach every random permutation.

Also random.sample() seems to be an option.

wtdmn
  • 43
  • What is your source for saying there's a 20000-bit limit on the random seed length? – r.e.s. Feb 04 '22 at 06:20
  • @r.e.s. there is nice explanation for it here: link, also discussion on 20000b (2500B) here: link. Didn't go deep but basically it seems to be same for 3.4 - 3.8. – wtdmn Feb 04 '22 at 10:01
  • I think the answer at that first link is wrong, and I've posted a comment there explaining why. The discussion at the second link is interesting, but seems to concern only the default seeding, not the user-input seed. As described in the Python docs, if the input is an int -- which is limited in size only by available memory -- then all of its bits get used directly. – r.e.s. Feb 04 '22 at 18:48
  • 1
    (Of course, that arbitrarily large seeds will have all of their bits used to seed the MT doesn't change the fact that the MT has "only" $2^{19937}-1$ possible states -- not enough to simulate a uniform distribution on a set of more than about $2080!$ possibilities.) – r.e.s. Feb 04 '22 at 19:33
  • @r.e.s. about 2080 limit I know, that thing you said about no-limit I will go through. Guess I will have to dig deeper. Updating answer. Good job. – wtdmn Feb 04 '22 at 19:42
  • Since you're using Python, why not just use random.sample(range(8000), 1500)? The docs for that say the sample is returned in the order selected, so the number of possible such ordered samples is $8000!/6500!=2^{19232^-},$ which is well under the number of possible MT states ($2^{19937}-1$). – r.e.s. Feb 04 '22 at 20:52
  • When I was writing this, I wasn't looking into source code much and I wanted F-Y which I knew was the random.shuffle. I guess this is also possibility. – wtdmn Feb 04 '22 at 20:57