0

The entropy of a password of a fixed length $n$ and $c$ possible characters is calculated by $n*log_2(c)= log_2(c^n)$, see also here: https://ritcyberselfdefense.wordpress.com/2011/09/24/how-to-calculate-password-entropy/

Assuming that I have a kind of "blackbox" that has unknown requirements on a password, especially like: Min occurences of a certain group of characters, different possible characters for different character positions, etc. I can give this blackbox a password that I generated and ask if it is an accepted one or not.

Is it possible to estimate the entropy of the "room" of available passwords by generating an amount of random passwords (lets say 1 million) and see how many have been accepted? What I do know of the room of available passwords is which characters may be used, so I only have to try passwords with these characters, not with the whole UTF-8 table. From this ratio $r$ of tried and accepted password, I can calculate the entropy with a formular like this:

$e_{estimated} = log_2(r*c^n)$

From my tests this formular works quote well for an estimation of the entropy. I am looking for another formular that can tell me how many passwords I have to try to make an assumtion like this:

With a probability of $p$, $|e_{actual} - e_{estimated}| < epsilon$

Is something like this even possible?

1 Answers1

1

Yes, this is entirely possible. As long as your test passwords are randomly distributed in the space, you will get the measure you are seeking. The hard part comes if the requirements on passwords allow most or very few of them to pass. You will detect that, but will not get a good measure.

As an example, let us assume a character set of $26$ upper case letters, $26$ lower case letters, $10$ numbers and $10$ symbols, for a total of $62$ characters, and passwords $12$ characters long. If the requirement is just one symbol or number, there are $62^{12}\approx 3.23 \cdot 10^{21}$ character strings, of which $62^{12}-52^{12}\approx 2.84 \cdot 10^{21}$ are valid passwords. That is about $90\%$ valid, and your sampling will work well. If we allowed all passwords, your sampling would show that but you might worry that since you only sampled a tiny fraction of the space there are some invalid ones out there.

To estimate the error, you can use the binomial distribution. If you do $N$ samples and the fraction of valid passwords is $p$, the standard deviation of the number of valid ones found is $\sqrt{Np(1-p)}$ This is valid when $N$ is large and $p, 1-p$ are not too close to zero. Since $\sqrt {p(1-p}$ has a maximum of $1/2$, you can say the standard deviation of the fraction of valid passwords is less than $2/\sqrt N$

Ross Millikan
  • 374,822
  • Thank you very much! Could you elaborate not too close to zero? Thanks – mario.schlipf Oct 08 '15 at 14:05
  • 1
    If every password you try is valid, you will estimate $p=1$ and get $0$ for the standard deviation, so will declare with certainty that all passwords are valid. Clearly you can't be sure of this-there might be just one that you didn't try that is valid. There are approximations made in converting the binomial to normal, which fail for small numbers, but what is acceptable depends on the error you can tolerate. If you have less than $10$ passes or failures, I would review the derivation to check the approximations. Several tens should be OK for most purposes. – Ross Millikan Oct 08 '15 at 15:20
  • 1
    Intuitively (but we are using the normal distribution to assess the approximation that got us there) if you have $n$ failures and mostly successes, the standard deviation in the number of failures is $\sqrt n$. We have taken $1-p \approx 1$ to get this. If $n=10, you are only $3\sigma$ from $0$, which is where I got the rough number. – Ross Millikan Oct 08 '15 at 15:23