0

This is the question.

Suppose a column has $m$ $1$’s and therefore $n − m$ $0$’s, and we randomly choose $k$ rows to consider when computing the minhash. Prove that the probability of getting “don’t know” as the minhash value for this column is at most $\left(\frac{n-k}n\right)^m$.

This is the solution.

The number of columns with $m$ $1$’s out of $n$ is $nCm$. The number of these columns that have no $1$ in one of the $k$ selected rows is $(n-k)Cm$. The probability of no $1$ in the chosen $k$ rows is therefore the latter divided by the former. If we expand the binomial coefficients in terms of factorials, we get $\frac{(n-k)!m!(n-m)!}{m!(n-k-m)!n!}$. The $m!$’s cancel, and when we reorganize we can write this expression as $\left(\frac{n-k}n\right)\left(\frac{n-k-1}{n-1}\right)\ldots\left(\frac{n-k-m+1}{n-m+1}\right)$. Each of the $m$ factors is at most $n−kn$. Thus, their product is at most $(n−kn)m$.

In the solution, I have a question. Why is $(n-k)Cm$ not $\frac{(n-k)!}{m!(n-k-m)!}$ but $\frac{(n-k)!m!(n-m)!}{m!(n-k-m)!n!}$?

Sorry for poor readability. I don't know how to put fractions. Please tell me how to put it, and I will fix it.

Brian M. Scott
  • 616,228
SeHoon
  • 1

1 Answers1

1

The fraction

$$\frac{(n-k)!m!(n-m)!}{m!(n-k-m)!n!}$$

isn’t $\binom{n-k}m$: it’s

$$\frac{\binom{n-k}m}{\binom{n}m}\,,$$

the probability of no $1$ in the chosen $k$ rows.

Brian M. Scott
  • 616,228