How fast can you construct a small list of strings in C for Python?

Python is probably the most popular programming language in the world right now. Python is easy to extend using C code.

You may want to return from Python a small data structure. When crossing from C to Python, there is an overhead. Thus, if performance is a concern, you do not want to return lots of small values such as hundreds of individual small strings.

However, you may want to return a list of strings such as

['zero elephant', 'one elephant is having fun', 'two elephants are having fun',
'three elephants are meeting with the president', 'four elephants',
'five elephants in an hexagon', 'six elephants are playing the saxophone',
'seven elephants are visiting the school', 'eight elephants are at Church',
'nine elephants are having a party']

I am going to assume that these strings are in an array called ‘numbers’ in my C code. Of course, if the strings are known at compile time, I could just precompute the array and return it. However, I want to generate the content dynamically.

A reasonable C function which will return an list is as follows:

PyObject* all_strings(PyObject* self) {
  size_t len = sizeof(numbers) / sizeof(numbers[0]);
  PyObject* pyList = PyList_New(len);
  for (size_t i = 0; i < len; ++i) {
    PyObject* pyString = PyUnicode_FromString(numbers[i]);
    PyList_SET_ITEM(pyList, i, pyString);
  }
  return pyList;
}

It looks a bit complicated, but it is not. It is just an application of the Python C functions. The key element is the PyUnicode_FromString function which automatically takes a pointer to an UTF-8 string, and converts it into a Python string. In our case, all our strings are ASCII, so the function does more work than needed. In this instance, I know ahead of time the size of the list (hence PyList_New(len)), but I could also have constructed an empty list (PyList_New(0)) and appended strings to it as needed.

Can we do better? We might. Python introduced lower-level string construction functions. You start by constructing a string that cannot be resized, and you must tell it whether the string content fits in one byte (ASCII or Latin1), or whether it requires more storage per character (UTF-16 without surrogate pairs) or whether you need the full Unicode range. In my case, it is easy: my strings are pure ASCII. In the general case, you would need to ascertain the string type (a functionality we plan to add to the simdutf library).

The code becomes slightly more complicated, but barely so…

PyObject* all_strings_new(PyObject* self) {
  size_t len = sizeof(numbers) / sizeof(numbers[0]);
  PyObject* pyList = PyList_New(len);
  for (size_t i = 0; i < len; ++i) {
    const char * str = numbers[i];
    size_t len = strlen(str);
    PyObject* py_string = PyUnicode_New(len, 127);
    Py_UCS1* data = PyUnicode_1BYTE_DATA(py_string);
    memcpy(data, str, len);
    PyList_SET_ITEM(pyList, i, py_string);
  }
  return pyList;
}

I wrote a small benchmark. You need Python and a development environment to build it. I ran the benchmark on my Apple M2 laptop, using LLVM 15 and Python 3.12. Your results will vary.

In my case, I get a very significant boost (2x) from the lower-level approach.

basic 270 ns/structure
lower-level 140 ns/structure

My data structure contains 295 bytes of strings, so the lower-level approach generates strings at over 2 GB/s, whereas the conventional approach has half the speed.

Daniel Lemire, "How fast can you construct a small list of strings in C for Python?," in Daniel Lemire's blog, May 9, 2024.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

9 thoughts on “How fast can you construct a small list of strings in C for Python?”

  1. Couple of typos:

    1. The title “How fast can construct a small list of strings in C for Python?” should be corrected as “How fast can you construct a small list of strings in C for Python?”

    2. The conclusion “I my case,” should be corrected as “In my case,”

      1. Couple more:

        1. “I could just precomputed” may be changed to either “I could have just precomputed” or “I could just precompute”

        2. “It is just a application” should be changed to “It is just an application”

        3. “it easy:” may be changed to either “it’s easy:” or “it is easy:”

  2. Interestingly, Python already optimizes the case where the input string is ASCII-only, using essentially the same logic you wrote:

    https://github.com/python/cpython/blob/fb0cf7d1408c904e40142a74cd7a53eb52a8e568/Objects/unicodeobject.c#L4761-L4772

    They call ascii_decode() instead of memcpy() directly, but the only extra work ascii_decode() does, is calculating the length of the ASCII-only prefix:

    https://github.com/python/cpython/blob/fb0cf7d1408c904e40142a74cd7a53eb52a8e568/Objects/unicodeobject.c#L4715-L4734

    Their implementation processes sizeof(size_t) bytes at a time, so it’s not grossly inefficient, but it seems plausible you could improve it somewhat.

    (The above assumes relatively long strings. You mention you tested with “295 bytes of strings” at which point the performance may well be dominated by constant factors or by differences in compiler optimizations.)

Leave a Reply

Your email address will not be published.

You may subscribe to this blog by email.