Time complexity of taking duplicates out of a set

Question

Imagine an algorithm, that takes a set as input, and returns that set, but without duplicates, as output.

The algorithm is bruteforce, it goes through each element in the input set, and one-by-one compares it to all elements in the output set. If the current element doesn't have a dupe in the output, it will be placed there.

Say, we have $\{5, 7, 2, 6, 2, 8, 4, 5\}$ as input (the elements could be anything), and the algorithm already scanned through $5$, $7$ and $2$.

It moves on to 6:
$6$ is compared to the first element of the output set, $5$. They aren't the same, so the algorithm moves on.
$6$ is then compared to $7$, and $2$.
There wasn't another $6$ found in the output set, so $6$ is added there ($\{5, 7, 2, 6\}$).

Next, the algorithm moves on to $2$, comparing it to $5$ , then $7$. But when $2$ is compared to the next element, $2$, they're the same, so the $2$ being compared to all elements in the output array is dupe, and is dropped. The algorithm then moves on to the next element...

The main question is: What is the algorithm's time complexity?

I have a JavaScript function that does this, if anyone wants it. — danik0011, Jul 14 '23 at 17:46
A question about removing duplicates? Now that's definitely on topic for MSE. ;-) — Franklin Pezzuti Dyer, Jul 14 '23 at 17:50
In the worst case (when all elements are different) there are $\frac{n(n-1)}{2}$ comparisons for a list of size $n$, which is $O(n^2)$. — Daniel P, Jul 14 '23 at 17:52
@FranklinPezzutiDyer About computational complexity. Yes, relevant to discrete mathematics. — NDB, Jul 14 '23 at 17:55
The algorithm you presented has a complexity of $O(n^2)$. We can actually achieve a complexity of $O(n)$ time using a hash table. We traverse the input set and while doing so store the elements of the set in a hash table. If an element is present in the table we do not insert into the table. Afterwards we traverse the hash table to make a set without duplicates. — prcssngnr, Jul 14 '23 at 18:08

score 1 · Answer 1 · answered Jul 14 '23 at 17:59

1

That is of order $n^2$.

I would sort the set. The time would then be order $n \log n$.

If the elements were bounded relative to the number of them, it could be done in order $n$ by using an array with size of the maximum possible value.

answered Jul 14 '23 at 17:59

marty cohen

107,799

prcssngnr · Answer 2 · 2023-07-14T18:33:38.083

1

The algorithm you described checks all elements in the output set and if the element is not present in the output set you put it in the set. If the input set has $n$ elements this algorithm has a time complexity of $O(n^2)$.

In fact close to linear time complexity can be achieved using a hash table. We traverse the input set and while doing so store the elements of the set in a hash table. If an element is present in the table we do not insert into the table. Afterwards we traverse the hash table to construct a set without duplicates. This gives us an average time complexity of $O(n)$, assuming a reasonable implementation of a hash table.

edited Jul 14 '23 at 18:33

answered Jul 14 '23 at 18:16

prcssngnr

1,191

I don't know how to talk about nontrivial hash tables without talking about hash collisions. Once you consider that, I think you get a worst-case time complexity of $O(n^2)$ even though in practice, with an intelligently-designed hash function, it's close to if not actually $O(n)$. – Brian Moehring Jul 14 '23 at 18:23
A good point. If I were to be more precise the average time complexity is $O(n)$ if a reasonable implementation of a hash table is used. – prcssngnr Jul 14 '23 at 18:32

Time complexity of taking duplicates out of a set

2 Answers2