If you are using a modern C++ (C++11 or better), you have access to set data structures (`unordered_set`) which have the characteristics of a hash set. The standard does not provide us with built-in functions to compute the union and the intersection of such sets, but we can make our own. For example, the union of two sets A and B can be computed as follow:

out.insert(A.begin(), A.end()); out.insert(B.begin(), B.end());

where `out` is an initially empty set. Because insertion in a set has expected constant-time performance, the computational complexity of this operation is *O*(size(A) + size(B)) which is optimal.

If you are a computer scientist who does not care about real-world performance, your work is done and you are happy.

But what if you want to build fast software for the real world? How fast are these C++ sets?

I decided to populate two sets with one million integers each, and compute how how many cycles it takes to compute the intersection and the union, and then I divide by 2,000,000 to get the time spent per input element.

intersection (unordered_set) |
100 cycles/element |

union (unordered_set) |
250 cycles/element |

How good or bad is this? Well, we can also take these integers and put them in sorted arrays. Then we can invoke the `set_intersection` and `set_union` methods that STL offers.

set_intersection (std::vector) |
3 cycles/element |

set_union (std::vector) |
5 cycles/element |

That’s an order of magnitude better!

So while convenient, C++’s `unordered_set` can also suffer from a significant performance overhead.

What about `std::set` which has the performance characteristics of a tree? Let us use code as follows where `out` is an initially empty set.

std::set_union(A.begin(), A.end(), B.begin(), B.end(), std::inserter(out,out.begin()));

set_intersection (std::set) |
150 cycles/element |

set_union (std::set) |
750 cycles/element |

As we can see, results are considerably worse.

The lesson is that a simple data structure like that of `std::vector` can serve us well.

It would have been cool to compare std::map and std::set, as these are ordered by design but have the pointer chasing issues.

Or are you not including the time spent to insert() in your measurements? That’s a little confusing.

If I have to add the elements to a set in order to sort them (by your proposed algorithm) then we should be including that time.

Anyways, if you are including insert time then it doesn’t compare a plain prepopulated set, and if you aren’t including insert time then the numbers are not totally correct.

I think

Or are you not including the time spent to insert() in your measurements? That’s a little confusing.I think I describe the benchmark accurately, and I provide the full source code that you can review. There should be no confusion.

The test is as follow: take two objects of some class (

std::vector,std::set,std::unordered_set) and produce a new one of the same class that represents either the union or the intersection.So, we do not include the time necessary to create the original two sets, but we measure the time required to create either the intersection or the union.

Obviously, you can question whether it is “fair” but that becomes subjective.

I have updated my code and the blog post to include results with

std::set.Have you tried a flat_set implementation? I’d imagine it would avoid some of the data locality cost.

I have not tested Boost’s

flat_set.You need to tell us which implementation of C++ you used for any benchmarks to be meaningful.

It would also be interesting to run your benchmarks on several implementations and compare results.

You need to tell us which implementation of C++ you used for any benchmarks to be meaningful.It would also be interesting to run your benchmarks on several implementations and compare results.

I do a lot better than that… I provide my source code so that you can get your own numbers quickly.

That comparison is pretty meaningless.

Of course set_intersection is faster. The costly operation here is sorting.

But still, an unordered_set is a different thing than a sorted array, it is optimized for lookup, not merging.

The costly operation here is sorting.There is no sorting done in the case of

unordered_set.But still, an unordered_set is a different thing than a sorted array, it is optimized for lookup, not merging.I agree but I am not sure it is a priori obvious to the programmer.

Sorting integers is actually pretty cheap. I got something like 4-10 cycles/element, depending on the contents of the array.

The lesson is that random memory accesses are expensive, and it’s hard to avoid them in a data structure that supports updates.

Mutable data structures are optimized for latency. If you’re more interested in throughput, batch processing with sorted arrays can easily be an order of magnitude faster.

Mutable data structures are optimized for latency. If you’re more interested in throughput, batch processing with sorted arrays can easily be an order of magnitude faster.Yes.

Comment avez vous fait pour calculer le nombre de cycles? Il y’a une option dans le compilateur qui permet ça?

Merci

Je vous invite à consulter mon code source.