Non-comparison based sorting algorithm based on "Number of unique integer in random generated arrays"

Question

I have asked questions about Number of unique integer in randomly generated arrays. Suppose we have $10^6$ random generated numbers, we should have about $~6*10^5$ unique numbers. I wrote a custom sort algorithm based on this assumption.

Pseudocode :

P := 0
RecursiveSort(Array A):
  If length of A <= 1:
    return A
Find maximum and minimum of A.
  Create Array B of length A.
P := P + length A                       // For evaluating purpose
For each number I in A:
    I := (I - min) / (max - min)          // Rescaling I based on max and min of A
    K := I multiply by the length of A        // 0<=K<=Length of A
    K := round(K)                         // Make K an index in Array B
    Put I in B[K].                        // Each B[K] may hold several I
Create Array Result of length 0.
  For each array L in B:
    For each number I in RecursiveSort(Array L):
       Put I in Result
Return Result

After several runs, I noticed that P / length Awas approximately 1.73 for uniform distribution and 2.19 for normal distribution. Why is that so?

For uniform distribution, after each iterations the size of Array A reduce by $e$ based on the assumption, shouldn't P / length A be approximately 1.58 ($ 1 + e^{-1} + e^{-2} + e^{-3} + .. + e^{-n}$) ?

Here is actual python code:

import numpy as np
import random
A = []
lim = 1000000
#Generating normal distribution array.
# a_normal = np.random.normal(lim * 1000 / 2, lim * 1000 / 6, lim) 
# a_normal = np.random.normal(lim * 1000 / 6, lim * 1000 / 4, lim)
# a_normal = np.random.normal(0, lim , lim)
# A = a_normal.tolist()
for _ in range(lim):
  A.append(random.randrange(lim * 100, lim * 1000 - 1))
P = 0
def custom_sort(A):
  if (len(A)) <= 1:
    return A
mi = min(A)
  ma = max(A)
global P
  P += len(A)
if ma == mi: # if all numbers are the same
    return A
B = [[] for _ in range(len(a))]
  for I in A:
    K = int((i - mi) / (ma - mi) * (len(a) - 1))
    B[K].append(I)
result = []
  for L in B:
    for I in custom_sort(L):
      result.append(I)
  return result
sort(A)
print(P / len(A))

Your pseudo-code assigns $P=0$. Then you add the length of the input array to it once. Surely $P/length(A) = 1$ always. (Edit: Ok, seems like you have a unknown loop condition here. Checking if A is sorted or something i assume.) — Mikael Öhman, Aug 06 '21 at 17:10
I've edited it to make it more clear. P is not inside the Sort function. — Era, Aug 06 '21 at 17:13
@MikaelÖhman The pseudocode does not match the Python code. But I cannot see what either are actually simulating — Henry, Aug 06 '21 at 17:18
It's like a weird radix sort that does weird stuff for reasons I can't understand. — Asinomás, Aug 06 '21 at 17:24

Karl · Accepted Answer · 2021-08-06T17:46:45.087

1

This is a bucket sort with $k$ (the number of buckets) equal to the input length. It runs in linear time if the input data is uniformly distributed, since all the buckets are small after the first pass.

You can optimize a bucket sort for any input distribution by using that distribution's CDF as the bucketing function; if the input matches the expected distribution, then the CDF values will be uniformly distributed between 0 and 1.

So e.g. if you expect input data to be normally distributed, you could start by computing the mean and variance of the data and then bucketing using a normal CDF with those parameters.

edited Aug 06 '21 at 17:46

answered Aug 06 '21 at 17:40

Karl

11,446

1

I guess you're also asking about your theoretical estimate of the runtime constant factor for your recursive algorithm. Maybe someone else can answer that part. – Karl Aug 06 '21 at 17:51
I am curious about that. After setting array size to be $10^6$, and generated range from $(0, 10^6)$ to $(0, 10^{16})$, the sort functions took the same time (2 seconds) and return the same P / length A ~ (1.73). – Era Aug 06 '21 at 17:58
2

Hmm, if the input is perfectly uniform then the runtime should be $T(n)=n+kT(n/k)$, so $T(n)=n\log_kn$. It's like a Quicksort with $k-1$ pivots (instead of 1) at each stage. – Karl Aug 06 '21 at 18:06

Non-comparison based sorting algorithm based on "Number of unique integer in random generated arrays"

1 Answers1