3

What is the computational complexity of this task? The goal is to compute the number $x=\min(\Bbb N\setminus A)$, where $A$ is the input list and the complexity parameter is $n=|A|$ (which is finite).

This is a common problem when generating a unique identifier: you want a number that is not equal to any other in the "pool" of other identifiers. A simpler algorithm just takes $x=\max(A)+1$, and assuming that $A$ is represented as a list of length $n$, calculating this is $O(n)$. But this leaves "gaps" in the set (assuming we are adding these new numbers to the pool $A:=A\cup\{x\}$ repeatedly), and this can be undesirable. A "gap-filling" algorithm uses $x=\min(\Bbb N\setminus A)$ instead, but this is a more complex operation.

A lower bound on the complexity is of course $O(n)$, since one must at least observe the entire input. An upper bound is $O(n\log n)$, since we can sort the list $A$ into order, then just walk down the list until we find an $x$ such that $A(x)\ne x$.

  • Forgive me for conflating lists and sets in the description of $A$. For the purposes of computing the complexity, $A$ is a list, or function $A:n\to\Bbb N$, and the set $A$ is really ${\rm ran}(A)$. – Mario Carneiro Aug 10 '13 at 14:40

2 Answers2

2

If $n=0$, trivially $f(A)=1$. Otherwise, let $m=\frac n2$. In one pass, split $A$ into $A_1=\{\,a\in A\mid a\le m\,\}$ and $A_2=\{\,a-m\mid a\in A, a>m\,\}$. If $|A_1|<m$ return $f(A_1)$, otherwise return $m+f(A_2)$. For the time needed, we obtain $T(n)\le n+T(\frac n2)$, hence $T(n)=O(n)$. If $A$ is given as a linked list and we don't mind if the order of the list is affected we can undo the splitting and subtracting before returning and need only $O(\log n)$ addtional memory for the recursion (as opposed to $O(n)$ memory with dkuper's solution).

  • You can avoid the subtraction as well if you generalize the problem to $f(k,A)=\min((\Bbb N+k)\setminus A)$, where $A\subseteq\Bbb N+k$. Then let $m=k+\frac n2$, and $A_1={a\in A\mid a\le m}$ and $A_2={a\in A\mid a>m}$, and then $f(k,A)=f(k,A_1)$ if $|A_1|<m-k$ and $f(k,A)=f(m,A_2)$ otherwise. – Mario Carneiro Aug 10 '13 at 15:58
1

The problem is linear, here is a proposed algorithm.

Create a table $t$ of booleans of size $n$ initialized to $0$. Then go through $A$. Every time you see $x\in [0,n-1]$, you put a $1$ in $t(x)$. If $x$ is bigger than $n$ you just ignore it.

In the end you just need to go through $t$ and stop as soon as there is a $0$. So you do at most $2n$ steps.

Denis
  • 6,945
  • 1
  • 21
  • 22