Suppose that I give you a stream of values and you want to maintain the top-*k* smallest or largest values. You could accumulate all these values, sort them all and keep the smallest *k* values.

You can do somewhat better by using a binary heap to construct a priority queue. And you can also use the QuickSelect algorithm. I compared naive implementations of both the approach based on a binary heap and on QuickSelect in a previous blog post.

Let us review the code of these techniques. I use JavaScript because it is supported everywhere.

We can sort and slice…

var a = new Array(); for (var i = 0; i < streamsize; i++) { a.push(rand(i)); } a.sort(function(a, b) { return a - b }); var answer = a.slice(0, k);

We can dump everything naively into a priority queue (backed by a binary heap)…

var reverseddefaultcomparator = function(a, b) { return a > b; };

var b = new FastPriorityQueue(reverseddefaultcomparator); for (var i = 0; i < k; i++) { b.add(rand(i)); } for (i = k; i < streamsize; i++) { b.add(rand(i)); b.poll(); } var answer = b.array.slice(0, k).sort(function(a, b) { return a - b });

Each time we call `poll` in the priority queue, we remove the largest element. Thus we maintain a list of *k* smallest elements.

I now want to consider less naive implementations of these ideas.

At all time, we have access to the largest element in the priority queue through the `peek` function. Long-time reader Kendall Willets pointed out to me that we can make good use of the `peek` function as follows.

var b = new FastPriorityQueue(reverseddefaultcomparator); for (var i = 0; i < k; i++) { b.add(rand(i)); } for (i = k; i < streamsize; i++) { var x = rand(i); if (x < b.peek()) { b.add(x); b.poll(); } } var answer = b.array.slice(0, k).sort(function(a, b) { return a - b });

This has the potential of reducing substantially the amount of work that the priority queue has to do.

What about the QuickSelect algorithm?

Michel Lemay pointed out to me what the Google Guava library does, which is to use a small buffer (of size *k*) and to run QuickSelect on this small buffer instead of the whole stream of data.

var a = new Array(); for (var i = 0; i < 2 * k; i++) { a.push(rand(i)); } QuickSelect(a, k, defaultcomparator); var eaten = 2 * k; while (eaten < streamsize) { for (var i = 0; i < k; i++) { a[k + i] = rand(i + eaten); } QuickSelect(a, k, defaultcomparator); eaten += k; } var answer = a.slice(0, k).sort(function(a, b) { return a - b });

David Eppstein described to me something slightly more complex where we use our knowledge of the pivot in the QuickSelect algorithm to only merge potential candidates.

var a = new Array(); var i = 0; for (; i < 2 * k; i++) { a.push(rand(i)); } QuickSelect(a, k, defaultcomparator); var pivotvalue = a[k]; var locationinbuffer = k; for (; i < streamsize; i += 1) { var newvalue = rand(i); if (newvalue < pivotvalue) { a[locationinbuffer++] = newvalue; } if (locationinbuffer == 2 * k) { QuickSelect(a, k, defaultcomparator); locationinbuffer = k; } } if (locationinbuffer != k) { QuickSelect(a, k, defaultcomparator); } var answer = a.slice(0, k).sort(function(a, b) { return a - b });

It is not exactly what David described, but I believe that it captures the essence of his idea.

So how do these techniques fare?

I have implemented a benchmark that you can run yourself, here are my raw results:

$ node test.js Platform: linux 4.4.0-38-generic x64 Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz Node version 4.5.0, v8 version 4.5.103.37 starting dynamic queue/enqueue benchmark FastPriorityQueue x 3,095 ops/sec ±0.15% (97 runs sampled) FastPriorityQueue-KWillets x 18,875 ops/sec ±0.59% (93 runs sampled) sort x 490 ops/sec ±0.37% (94 runs sampled) QuickSelect x 8,576 ops/sec ±0.31% (97 runs sampled) QuickSelect-naiveGoogleGuava x 6,003 ops/sec ±0.20% (98 runs sampled) QuickSelect-naiveDavidEppstein x 7,317 ops/sec ±0.22% (98 runs sampled)

So in my particular test, the approach proposed by Kendall Willets, using a priority queue with pruning based on the peek value, is a net winner. Your results might vary, but I think that the Willets approach is attractive. It is simple, it uses little memory and it should provide consistent performance.

**Exercise for the reader**: I am using a random input in my benchmark. As pointed out by David Eppstein, this means that we can bound the probability that the *i*th value is in the top-*k* by min(1, *k*/*i*), so that the Willets check only fails O(*k* log *n*) times. How well would the different techniques do for nearly sorted data?

**Update**: As pointed out in the comments, we can further improve the Willets approach by merging the add and poll functions into a single function. The code is similar…

var b = new FastPriorityQueue(reverseddefaultcomparator); for (var i = 0; i < k; i++) { b.add(rand(i)); } for (i = k; i < streamsize; i++) { var x = rand(i); if (x < b.peek()) { b.replaceTop(x); } } var answer = b.array.slice(0, k).sort(function(a, b) { return a - b });

So what is the performance this time?

$ node test.js Platform: linux 4.4.0-38-generic x64 Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz Node version 4.5.0, v8 version 4.5.103.37 starting dynamic queue/enqueue benchmark FastPriorityQueue x 3,137 ops/sec ±0.13% (97 runs sampled) FastPriorityQueue-KWillets x 19,027 ops/sec ±0.37% (94 runs sampled) FastPriorityQueue-KWillets-replaceTop x 22,200 ops/sec ±0.46% (95 runs sampled) sort x 479 ops/sec ±0.42% (92 runs sampled) QuickSelect x 8,589 ops/sec ±0.31% (98 runs sampled) QuickSelect-naiveGoogleGuava x 5,661 ops/sec ±0.22% (97 runs sampled) QuickSelect-naiveDavidEppstein x 7,320 ops/sec ±0.25% (98 runs sampled)

So we improve the performance once more!

**Update 2**: You can combine a binary heap and QuickSelect using introselect.

One thing I just noticed is that add/poll can be replaced with swapping the root and percolating down, so that two separate heap adjustments aren’t needed.

//b.add(x);

//b.poll();

b.array[0]=x;

b._percolateDown(0);

This gives another 50% increase in performance.

Unfortunately I’m not seeing that increase on the most recent version of the test, but it still seems like the right way to do it.

And, this heap operation is known as a “replace top” in Postgres; they quote Knuth and Heapsort as the source. It’s required for heapify, but I agree with Yonik that it should be exposed as a method.

Looking at their code now I see that they also do the peek() check. It’s fairly standard.

The Quickselect method seems like it should be faster, since it uses the previous k-th value as the pivot on each new run (I checked), and that should yield the same pruning as the peek() check during the first pass through the data buffer. The overhead is probably in the array storage before the Quickselect call, and the fact that half the array is already pivoted.

Yes, it improves the performance, I have updated the blog post.

https://queue.acm.org/detail.cfm?id=1814327

What about b-heap?

Years before I had performed a benchmark between b-heap and binary heap according to this repository

https://github.com/valyala/gheap, however it had not been proven to be faster than standard std::priority_queue, perhaps it’s because of the implementation details which I did not dive into.

I ran some rather comprehensive tests. Turns out for moderate-size queues, it’s quite hard to beat the binary heap. Perhaps, some years ago caching was a problem, but now it seems to be less of a problem. My tests are available online:

https://github.com/searchivarius/BlogCode/tree/master/2016/bench_prio

Lucene also uses peek to see if a new entry is competitive, but a further important optimization that we make is that we only do a single rebalance of the binary heap.

Rather than add() followed by poll() (each of which will reestablish heap variants), we update the root (the smallest element that peek saw) and fix up the heap once. This cuts the time for a new insert by a factor of 2.

Unfortunately, we don’t see this capability in standard heap or priority queue implementations and thus maintain our own.

I have updated the blog post with a merged add-poll call and the performance is improved.

It cuts by a factor of 2 only for small queues. Somewhat unsurprisingly, as you use larger queues, the processing time seems to be dominated by cache misses.

Have you tried to use a bigger random number margin? It’s going to change priority queue performance because right now, after the first 5000 elements, priority queue values are all between 0 and 5 and this makes it harder to find a smaller value than 5.

Nice series of posts on priority queues. Why does sort has the largest throughput, do I miss something?

Sort has the worst results…

Hello Daniel,

I believe that to be fair, you should always go through the comparator function and should write !reverseddefaultcomparator(x, b.peek()) instead of (x < b.peek()).

Also, I implemented a priority queue that remembers a max size and to which you always add(). It's on par with FastPriorityQueue-KWillets-replaceTop, maybe a bit faster.

After having read the Wikipedia article (https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap) that states Floyd's faster than Williams', I tried to implement add2() that keeps the heap in a "degraded" state:

– array[0] is always the min element

– I heapify the array once this.size reaches this.maxsize

To my surprise, FastPriorityQueueGP-add2 is much slower than FastPriorityQueueGP-add.

$ node ./test.js

Platform: darwin 16.6.0 x64

Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz

Node version 8.1.3, v8 version 5.8.283.41

starting dynamic queue/enqueue benchmark

FastPriorityQueue-KWillets-replaceTop x 6,522 ops/sec ±2.40% (87 runs sampled)

FastPriorityQueueGP-add x 7,234 ops/sec ±0.77% (91 runs sampled)

FastPriorityQueueGP-add2 x 4,435 ops/sec ±1.64% (86 runs sampled)

sort x 240 ops/sec ±1.11% (83 runs sampled)

https://gist.github.com/gpakosz/655b087fc4135580c5025870316c4c6e