2

Let $h:D\to [m]$ be a hash function that maps some object from the set $D$ to a natural number $k$ ($1 \leq k \leq m$) with probability $1/m$ for every $k\in [m].$ A falsely colliding pair are two elements $d_1, d_2 \in D$ with $d_1 \neq d_2$ but $h(d_1) = h(d_2).$ The probability for a collision between $d_1$ and $d_2$ is $1/m$ because we have $m$ possible positions where they could collide, for each of which there is a $1/m^2$ chance that both elements get mapped to this position. Is my argumentation correct?

kelalaka
  • 1,637
Hilberto1
  • 1,046
  • $1/m$. $d_1$ has 1 probability to be any value and $d_2$ has $1/m$ probability to hit. – kelalaka Apr 25 '21 at 15:18
  • @kelalaka isn't that the same argumentation as mine? – Hilberto1 Apr 25 '21 at 15:35
  • No. A collision is a collision, the position is not important. – kelalaka Apr 25 '21 at 15:40
  • @kelalaka I don't really see that yet. We have $m$ possible values for the hash values, for each of the $m$ integers we have a probability of $1/m^2$ that $d_1$ and $d_2$ "hit" this number, so where is the mistake? – Hilberto1 Apr 25 '21 at 15:47
  • The first element can be anywhere, then the second element has 1/m change to hit the same position. There is a great article on wikipedia about this. This is classical birthday problem(paradox). – kelalaka Apr 25 '21 at 16:25
  • @kelalaka the source is great. I think a simpler explanation would just be that the probability of $d_2$ being at another position is $\frac{m-1}{m}$ and then it becomes clear. Thanks! – Hilberto1 Apr 25 '21 at 16:31
  • Yes, that's it. Note that Birthday attack is everywhere in Cryptography. – kelalaka Apr 25 '21 at 17:53
  • This argument is also correct, though. – aschepler Apr 25 '21 at 18:11
  • @aschepler the premise is wrong. A collision doesn't require to occur at a specific position. Two input values $a$ and $b$ have a collision if $h(a) = h(b)$. This is the definition of collision. What described is the probability of hitting position $x$ both by $a$ and $b$. It has a probability $1/m^2$ and from this, if we free the position, we have $m$ than back to the collision case! – kelalaka Apr 26 '21 at 00:55
  • @kelalaka The argument as I read it is: $P(\mathrm{collision}) = \sum_{x \in \mathbb{Z}_m} P(\mathrm{collision~at~}x) = m(1/m^2) = 1/m$ – aschepler Apr 26 '21 at 13:21
  • @aschepler that is one of the way to read, and that is unnecessarily complex. – kelalaka Apr 26 '21 at 13:45
  • @kelalaka There are other ways to read the key sentence in this question? Complex or not, it already has a correct explanation. – aschepler Apr 26 '21 at 13:51
  • @aschepler Ok, I've updated the answer. If the function is not uniform and we have the details, this is the way to calculate it. Otherwise it is more complex than the naive way. – kelalaka Apr 26 '21 at 14:04

1 Answers1

0

This is the initial step of the classical birthday paradox and in cryptography it is called the birthday attack.

Consider a uniform hash function with output space size $m$. Then the probability of collision of two distinct input values to have the same hash value, $a,b\in D, a\neq b$ and $h(a) = h(b)$ is $1/m$.

Consider that $h(a)$ can be any value equal likely and $h(b)$ has $1/m$ probability of hitting the same value, i.e. $h(a) = h(b)$, and $(m-1)/m$ probability of having different value, i.e. $h(a) \neq h(b)$.


In the case that you are calculating the probability of the collision by combining for each position as aschepler stated.

$$P(\mathrm{collision}) = \sum_{x \in \mathbb{Z}_m} P(\mathrm{collision~at~}x) = m(1/m^2) = 1/m$$

This is another way to achieve this. For uniform random functions, this is more complex than the standard way, however, if the function is not uniform this is the way to go.


Note: Usually, in cryptography, the hash functions are considered $$H:\{0,1\}^*\to \{0,1\}^n$$ where ${ }^*$ is the Kleene star indication the strings of 0 and 1 in arbitrary sizes, including the empty string.

Then the output space contains $2^n$ elements.

kelalaka
  • 1,637