1

Please correct me if I got it wrong.

I am trying to learn MinHash algorithm by implementing one. The hash function needs better to be min-wise independent instead of pairwise independent. I'm testing both to understand the different effects on the algorithm.

I read that pairwise independence or strong universality means that no pairs of different numbers in input set to be hashed into the same output value.

If I do output = (i + 1) % n then no pair of different numbers will be hashed into same value, yet I see a strong correlation of input and output. The output is not random.

Can I say this algorithm has strong universality? If yes, what is the property called, when a hash algorithm that can greatly randomize the input?

(Reviewer: please check comments!)

dz902
  • 113
  • Sorry for the typo, I've edited the expression. i for input. – dz902 Dec 30 '21 at 10:09
  • Why is this question kept closed for lacking of context? I've edited the question to include background (learn MinHash by implementation), motivation (compare hash efficiency), my understanding of the definition (pairwise independent = no same pair hashed into same bucket), my strategy (output = (i + 1) % n). What else context on the earth do I need to provide? – dz902 Jan 05 '22 at 06:55
  • I voted to reopen because I was awake and saw the changes. Please be patient. – Sean Roberson Jan 05 '22 at 07:13

0 Answers0