Colliding pairs - hash functions

Question

Let $h:D\to [m]$ be a hash function that maps some object from the set $D$ to a natural number $k$ ($1 \leq k \leq m$) with probability $1/m$ for every $k\in [m].$ A falsely colliding pair are two elements $d_1, d_2 \in D$ with $d_1 \neq d_2$ but $h(d_1) = h(d_2).$ The probability for a collision between $d_1$ and $d_2$ is $1/m$ because we have $m$ possible positions where they could collide, for each of which there is a $1/m^2$ chance that both elements get mapped to this position. Is my argumentation correct?

$1/m$. $d_1$ has 1 probability to be any value and $d_2$ has $1/m$ probability to hit. — kelalaka, Apr 25 '21 at 15:18
No. A collision is a collision, the position is not important. — kelalaka, Apr 25 '21 at 15:40
@kelalaka I don't really see that yet. We have $m$ possible values for the hash values, for each of the $m$ integers we have a probability of $1/m^2$ that $d_1$ and $d_2$ "hit" this number, so where is the mistake? — Hilberto1, Apr 25 '21 at 15:47
The first element can be anywhere, then the second element has 1/m change to hit the same position. There is a great article on wikipedia about this. This is classical birthday problem(paradox). — kelalaka, Apr 25 '21 at 16:25
@kelalaka the source is great. I think a simpler explanation would just be that the probability of $d_2$ being at another position is $\frac{m-1}{m}$ and then it becomes clear. Thanks! — Hilberto1, Apr 25 '21 at 16:31
Yes, that's it. Note that Birthday attack is everywhere in Cryptography. — kelalaka, Apr 25 '21 at 17:53
@aschepler the premise is wrong. A collision doesn't require to occur at a specific position. Two input values $a$ and $b$ have a collision if $h(a) = h(b)$. This is the definition of collision. What described is the probability of hitting position $x$ both by $a$ and $b$. It has a probability $1/m^2$ and from this, if we free the position, we have $m$ than back to the collision case! — kelalaka, Apr 26 '21 at 00:55
@kelalaka The argument as I read it is: $P(\mathrm{collision}) = \sum_{x \in \mathbb{Z}_m} P(\mathrm{collision~at~}x) = m(1/m^2) = 1/m$ — aschepler, Apr 26 '21 at 13:21
@aschepler that is one of the way to read, and that is unnecessarily complex. — kelalaka, Apr 26 '21 at 13:45
@kelalaka There are other ways to read the key sentence in this question? Complex or not, it already has a correct explanation. — aschepler, Apr 26 '21 at 13:51
@aschepler Ok, I've updated the answer. If the function is not uniform and we have the details, this is the way to calculate it. Otherwise it is more complex than the naive way. — kelalaka, Apr 26 '21 at 14:04

kelalaka · Accepted Answer · 2021-04-26T14:02:06.843

0

This is the initial step of the classical birthday paradox and in cryptography it is called the birthday attack.

Consider a uniform hash function with output space size $m$. Then the probability of collision of two distinct input values to have the same hash value, $a,b\in D, a\neq b$ and $h(a) = h(b)$ is $1/m$.

Consider that $h(a)$ can be any value equal likely and $h(b)$ has $1/m$ probability of hitting the same value, i.e. $h(a) = h(b)$, and $(m-1)/m$ probability of having different value, i.e. $h(a) \neq h(b)$.

In the case that you are calculating the probability of the collision by combining for each position as aschepler stated.

$$P(\mathrm{collision}) = \sum_{x \in \mathbb{Z}_m} P(\mathrm{collision~at~}x) = m(1/m^2) = 1/m$$

This is another way to achieve this. For uniform random functions, this is more complex than the standard way, however, if the function is not uniform this is the way to go.

Note: Usually, in cryptography, the hash functions are considered $$H:\{0,1\}^*\to \{0,1\}^n$$ where ${ }^*$ is the Kleene star indication the strings of 0 and 1 in arbitrary sizes, including the empty string.

Then the output space contains $2^n$ elements.

edited Apr 26 '21 at 14:02

answered Apr 25 '21 at 18:04

kelalaka

1,637

So my explanation was indeed correct. – Hilberto1 Apr 26 '21 at 15:29
Yes, if we read this way, maybe my first reading was incorrect. What is false collision? – kelalaka Apr 26 '21 at 15:32
A false collision is a pair of distinct objects of the domain of $h$ who get mapped to the same hash value – Hilberto1 Apr 26 '21 at 15:47
Then what is true collision? What you describe is the collision. – kelalaka Apr 26 '21 at 15:49
Ok that's true, I don't know why I used the word "false"... – Hilberto1 Apr 26 '21 at 15:52
so my calculation just refers to a "collision" – Hilberto1 Apr 26 '21 at 15:52
Yes, That is why I linked in the comments. – kelalaka Apr 26 '21 at 16:27

Colliding pairs - hash functions

1 Answers1