# How expensive are the union and intersection of two unordered_set in C++?

If you are using a modern C++ (C++11 or better), you have access to set data structures (unordered_set) which have the characteristics of a hash set. The standard does not provide us with built-in functions to compute the union and the intersection of such sets, but we can make our own. For example, the union of two sets A and B can be computed as follow:

```out.insert(A.begin(), A.end());
out.insert(B.begin(), B.end());
```

where out is an initially empty set. Because insertion in a set has expected constant-time performance, the computational complexity of this operation is O(size(A) + size(B)) which is optimal.

If you are a computer scientist who does not care about real-world performance, your work is done and you are happy.

But what if you want to build fast software for the real world? How fast are these C++ sets?

I decided to populate two sets with one million integers each, and compute how how many cycles it takes to compute the intersection and the union, and then I divide by 2,000,000 to get the time spent per input element.

 intersection (unordered_set) 100 cycles/element union (unordered_set) 250 cycles/element

How good or bad is this? Well, we can also take these integers and put them in sorted arrays. Then we can invoke the set_intersection and set_union methods that STL offers.

 set_intersection (std::vector) 3 cycles/element set_union (std::vector) 5 cycles/element

That’s an order of magnitude better!

So while convenient, C++’s unordered_set can also suffer from a significant performance overhead.

What about std::set which has the performance characteristics of a tree? Let us use code as follows where out is an initially empty set.

```std::set_union(A.begin(), A.end(), B.begin(), B.end(),
std::inserter(out,out.begin()));
```
 set_intersection (std::set) 150 cycles/element set_union (std::set) 750 cycles/element

As we can see, results are considerably worse.

The lesson is that a simple data structure like that of std::vector can serve us well.

### Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

## 14 thoughts on “How expensive are the union and intersection of two unordered_set in C++?”

1. bsf says:

It would have been cool to compare std::map and std::set, as these are ordered by design but have the pointer chasing issues.

1. bsf says:

Or are you not including the time spent to insert() in your measurements? That’s a little confusing.

If I have to add the elements to a set in order to sort them (by your proposed algorithm) then we should be including that time.

Anyways, if you are including insert time then it doesn’t compare a plain prepopulated set, and if you aren’t including insert time then the numbers are not totally correct.

I think

1. Or are you not including the time spent to insert() in your measurements? That’s a little confusing.

I think I describe the benchmark accurately, and I provide the full source code that you can review. There should be no confusion.

The test is as follow: take two objects of some class (std::vector, std::set, std::unordered_set) and produce a new one of the same class that represents either the union or the intersection.

So, we do not include the time necessary to create the original two sets, but we measure the time required to create either the intersection or the union.

Obviously, you can question whether it is “fair” but that becomes subjective.

2. I have updated my code and the blog post to include results with std::set.

2. Dan says:

Have you tried a flat_set implementation? I’d imagine it would avoid some of the data locality cost.

3. alanwj says:

You need to tell us which implementation of C++ you used for any benchmarks to be meaningful.

It would also be interesting to run your benchmarks on several implementations and compare results.

1. You need to tell us which implementation of C++ you used for any benchmarks to be meaningful.
It would also be interesting to run your benchmarks on several implementations and compare results.

I do a lot better than that… I provide my source code so that you can get your own numbers quickly.

4. Mathias Gaunard says:

That comparison is pretty meaningless.
Of course set_intersection is faster. The costly operation here is sorting.

But still, an unordered_set is a different thing than a sorted array, it is optimized for lookup, not merging.

1. The costly operation here is sorting.

There is no sorting done in the case of unordered_set.

But still, an unordered_set is a different thing than a sorted array, it is optimized for lookup, not merging.

I agree but I am not sure it is a priori obvious to the programmer.

2. Sorting integers is actually pretty cheap. I got something like 4-10 cycles/element, depending on the contents of the array.

The lesson is that random memory accesses are expensive, and it’s hard to avoid them in a data structure that supports updates.

Mutable data structures are optimized for latency. If you’re more interested in throughput, batch processing with sorted arrays can easily be an order of magnitude faster.

5. Mutable data structures are optimized for latency. If you’re more interested in throughput, batch processing with sorted arrays can easily be an order of magnitude faster.

Yes.

6. Sami says:

Comment avez vous fait pour calculer le nombre de cycles? Il y’a une option dans le compilateur qui permet ça?
Merci