I have asked questions about Number of unique integer in randomly generated arrays. Suppose we have $10^6$ random generated numbers, we should have about $~6*10^5$ unique numbers. I wrote a custom sort algorithm based on this assumption.
Pseudocode :
P := 0
RecursiveSort(Array A):
If length of A <= 1:
return A
Find maximum and minimum of A.
Create Array B of length A.
P := P + length A // For evaluating purpose
For each number I in A:
I := (I - min) / (max - min) // Rescaling I based on max and min of A
K := I multiply by the length of A // 0<=K<=Length of A
K := round(K) // Make K an index in Array B
Put I in B[K]. // Each B[K] may hold several I
Create Array Result of length 0.
For each array L in B:
For each number I in RecursiveSort(Array L):
Put I in Result
Return Result
After several runs, I noticed that P / length Awas approximately 1.73 for uniform distribution and 2.19 for normal distribution. Why is that so?
For uniform distribution, after each iterations the size of Array A reduce by $e$ based on the assumption, shouldn't P / length A be approximately 1.58 ($ 1 + e^{-1} + e^{-2} + e^{-3} + .. + e^{-n}$) ?
Here is actual python code:
import numpy as np
import random
A = []
lim = 1000000
#Generating normal distribution array.
# a_normal = np.random.normal(lim * 1000 / 2, lim * 1000 / 6, lim)
# a_normal = np.random.normal(lim * 1000 / 6, lim * 1000 / 4, lim)
# a_normal = np.random.normal(0, lim , lim)
# A = a_normal.tolist()
for _ in range(lim):
A.append(random.randrange(lim * 100, lim * 1000 - 1))
P = 0
def custom_sort(A):
if (len(A)) <= 1:
return A
mi = min(A)
ma = max(A)
global P
P += len(A)
if ma == mi: # if all numbers are the same
return A
B = [[] for _ in range(len(a))]
for I in A:
K = int((i - mi) / (ma - mi) * (len(a) - 1))
B[K].append(I)
result = []
for L in B:
for I in custom_sort(L):
result.append(I)
return result
sort(A)
print(P / len(A))