The number of comparisons needed to sort a shuffled array: qsort versus std::sort

Given an array of N numbers of type double, the standard way to sort it in C is to invoke the qsort function

qsort(array, N, sizeof(double), compare);

where compare is a function which returns an integer less than, equal to, or greater than zero if the first argument is less than, equal to, or greater than the second. Because it is a C function, it takes in void pointers which we must convert back to actual values. The following is a reasonable implementation of a comparison function:

int compare(const void *a, const void *b) {
  double x = *(double *)a;
  double y = *(double *)b;
  counter++;
  if (x < y) {
    return -1;
  }
  if (x == y) {
    return 0;
  }
  return 1;
}

Though the function appears to have branches, optimizing compilers can generate binary code without any jumps in this case.

Though the name suggests that qsort might be implemented using the textbook algorithm Quicksort, the actual implementation depends on the standard library.

The standard approach in C++ is similar. The code might look as follows:

    std::sort(array, array + N, compare);

Again, we have a compare function. In C++, the compare function can refer directly to the type, since the std::sort is actually a template function, and not a mere function. It makes that the C++ compiler effectively generates a function for each comparison function you provide. It trades an increase in binary size for a potential increase in performance. A reasonable implementation of the comparison function is as follows:

bool compare(const double x, const double y) {
  return x < y;
}

The signature of the C++ comparison function is different: we return a Boolean value as opposed to a three-class integer value.

An interesting question is how many comparisons each function makes. Typically, the comparisons are inexpensive and a bad predictor of the performance, but you can imagine cases where comparing your values could be expensive.

The exact number of comparisons depends on the underlying implementation provided by your system. As inputs, I use random arrays.

I choose to count the average number of times that the comparison function is called. Experimentally, I find that the C++ function makes many more calls to the comparison function than the C function (qsort). The C library that comes with GCC (glibc) uses k – 1.2645 comparisons per element on average to sort arrays of size 2k, matching the theoretical average case performance of merge sort.

LLVM 13 (Apple):

N calls to the comparison function (qsort) per input value calls to the comparison function (std::sort) per input value
210 10.04 12.26
211 11.13 13.41
212 12.21 14.54

GCC 9:

N calls to the comparison function (qsort) per input value calls to the comparison function (std::sort) per input value
210 8.74 11.98
211 9.74 13.22
212 10.74 14.40

My code is available.

Further reading.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

2 thoughts on “The number of comparisons needed to sort a shuffled array: qsort versus std::sort”

  1. For whatever it’s worth, I get the same numbers you get from GCC 9 when using LLVM 15 and 14 on x86-64 Linux, with either GNU libstdc++ or LLVM libc++.

  2. That’s very interesting. The example code uses pseudorandom doubles for input, so the input is in random order and duplicates are vanishingly unlikely.

    The two obvious suspects are:

    1) The std::sort template code is inline expanded with specialized compare and swap operations. In return for this optimization opportunity, the code might have to use a more concise algorithm than the out-of-line qsort(). In particular, median-of-medians pivot choosing code can be voluminous and might be omitted from a template implementation.

    2) There is a semantic difference: std::sort takes a two-way “<" predicate, while qsort() takes a three-way “” comparison operator. There might be some extra comparisons in std::sort to detect duplicate values, which can cause asymmetric pivot problems in quicksort.

Leave a Reply

Your email address will not be published.

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

You may subscribe to this blog by email.