Writing out large arrays in Go: binary.Write is inefficient for large arrays

Programmers often need to write data structures to disk or to networks. The data structure then needs to be interpreted as a sequence of bytes. For integer values, most computer systems adopt “little endian” encoding, whereby an 8-byte integer is written out least significant byte first. In the Go programming language, you can write an array of integers to a buffer as follows:

var data []uint64
var buf *bytes.Buffer = new(bytes.Buffer)

...

err := binary.Write(buf, binary.LittleEndian, data)

Until recently, I assumed that the binary.Write function did not allocate memory. Unfortunately, it does. The function converts the input array into a new, temporary byte array.

Instead, you can create a small buffer just big enough to hold your 8-byte integer and write that small buffer repeatedly:

var item = make([]byte, 8)
for _, x := range data {
    binary.LittleEndian.PutUint64(item, x)
    buf.Write(item)
}

Sadly, this might have poor performance on disks or networks where each write/read has a high overhead. To avoid this problem, you can use Go’s buffered writer and write the integers one by one. Internally, Go will allocate a small buffer.

writer := bufio.NewWriter(buf)
var item = make([]byte, 8)
for _, x := range data {
	binary.LittleEndian.PutUint64(item, x)
	writer.Write(item)
}
writer.Flush()

I wrote a small benchmark that writes an array of 100M integers to memory.

function               memory usage    time
binary.Write           1.5 GB          1.2 s
one-by-one             0               0.87 s
buffered one-by-one    4 kB            1.2 s

(Timings will vary depending on your hardware and testing procedure. I used Go 1.16.)

The buffered one-by-one approach is not beneficial with respect to speed in this instance, but it would be more helpful in other cases. In my benchmark, the simple one-by-one approach is fastest and uses the least memory. For small inputs, binary.Write would be faster. The ideal function might have a fast path for small arrays and more careful handling of larger inputs.
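
For readers who want to reproduce the comparison, here is a minimal benchmark sketch. It is my own reconstruction, not the exact benchmark behind the table above; the package name, slice size and function names are arbitrary choices. You would run it with go test -bench=. -benchmem to also see allocations.

package writebench

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"testing"
)

// 100 million integers, as in the table above (about 800 MB of input data).
var data = make([]uint64, 100_000_000)

func BenchmarkBinaryWrite(b *testing.B) {
	for n := 0; n < b.N; n++ {
		buf := new(bytes.Buffer)
		// binary.Write allocates a temporary byte slice internally.
		if err := binary.Write(buf, binary.LittleEndian, data); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkOneByOne(b *testing.B) {
	item := make([]byte, 8)
	for n := 0; n < b.N; n++ {
		buf := new(bytes.Buffer)
		for _, x := range data {
			binary.LittleEndian.PutUint64(item, x)
			buf.Write(item)
		}
	}
}

func BenchmarkBufferedOneByOne(b *testing.B) {
	item := make([]byte, 8)
	for n := 0; n < b.N; n++ {
		buf := new(bytes.Buffer)
		w := bufio.NewWriter(buf)
		for _, x := range data {
			binary.LittleEndian.PutUint64(item, x)
			w.Write(item)
		}
		w.Flush()
	}
}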

Enforcement by software

At my university, one of our internal software systems allows a professor to submit a revision to a course. The professor might change the content or the objectives of the course.

In a university, professors have extensive freedom regarding course content. As long as you reasonably meet the course objectives, you can do whatever you like. You can pick the textbook you prefer, write your own and so forth.

However, the staff that built our course revision system decided to make it so that every single change should go through all layers of approvals. So if I want to change the title of an assignment, according to this tool, I need the department to approve.


When I first encountered this new tool, I immediately started to work around it. And because I am department chair, I brought my colleagues along for the ride. So we ‘pretend’ to get approval, submitting fake documents. The good thing in a bureaucracy is that most people are too bored to check up on the fine print.

Surprisingly, it appears that no other department has been routing around the damage that is this new tool. I should point out that I am not doing anything illegal or against the rules. I am a good soldier. I just route around the software system. But it makes people uneasy.

And there lies the scary point. People are easily manipulated by computing.

People seem to think that if the software requires some document, then surely the rules require the document in question. That is, human beings believe that the software must be an accurate embodiment of the law.

In some sense, software does the policing. It enforces the rules. But like the actual police, software can go far beyond the law… and most people won’t notice.

An actual policeman can be intimidating. However, a policeman is a human being. If they ask something that does not make sense, you are likely to question them. You are also maybe more likely to think that a policeman could be mistaken. Software is like a deaf policeman. And people want software to be correct.

Suppose you ran a university and you wanted all professors to include a section on religion in all their courses. You could not easily achieve such a result by the means of law. Changing the university regulations to add such a requirement would be difficult at a secular institution. However, if you simply make it that all professors must fill out a section on religion when registering a course, then professors would probably do it without question.

Of course, you can achieve the same result with bureaucracy. You just change the forms and the rules. But it takes much effort. Changing software is comparatively easier. There is no need to document the change very much. There is no need to train the staff.

I think that there is great danger in some of the recent ‘digital ID’ initiatives that various governments are pushing. Suppose, for example, that your driver’s license is on your mobile phone. It seems reasonable, at first, for the government to be able to activate it and deactivate it remotely. You no longer need to go to a government office to get your new driver’s license. However, it now makes it possible for a civil servant to decide that you cannot drive your car on Tuesdays. They do not need a new law, they do not need your consent, they can just switch a flag inside a server.

You may assume then that people would complain loudly, and they may. However, they are much less likely to complain than if it is a policeman that comes to their door on Tuesdays to take away their driver’s license. As human beings, we have a bias toward accepting software enforcement without question.

It can be used for good. For example, the right software can probably help you lose weight. However, software can enable arbitrary enforcement. For crazy people like myself, it will fail. Sadly, not everyone is as crazy as I am.

The Canadian Common CV and the captured academy

Most Canadian academics have to write their resumes using a government online tool called the Common CV. When it was first introduced, it was described as a time-saving tool: instead of writing your resume multiple times for different grant agencies, you would write it just once and be done with it. In practice, it turned into something of a nightmare for many of us. You have to adapt it each time for each grant application, sometimes in convoluted ways, using a clunky interface.

What the Common CV does do is provide much power to middle-managers who can insert various bureaucratic requirements. You have to use their tool, and they can tailor it administratively without your consent. It is part of an ongoing technocratic invasion.

How did Canadian academics react? Did they revolt? Not at all. In fact, they are embracing it. When I recently had to formally submit my resume as part of a routine internal review process, they asked for my Common CV. That is, instead of fighting against the techno-bureaucratic process, they extend its application to every aspect of their lives including internal functions. And it is not that everyone enjoys it: in private, many people despise the Common CV.

So why won’t they dissent?

One reason might be that they are demoralized. Why resist the Common CV when every government agency providing funding to professors requires it?

If so, they are confused. We dissent as an appeal to the intelligence of a future day. A dissent today is a message to the future people who will have the power to correct our current mistakes. These messages from today are potential tools in the future. “The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.” (Shaw)

The lack of dissent is hardly new of course. Only a minority of academics questioned the Vietnam war (Schreiber, 1973), and much of the resistance came when it became safe to speak out. The scientists described by Freeman Dyson in The Scientist as Rebel have always been a fringe.

Chomsky lamented this point:

IT IS THE RESPONSIBILITY of intellectuals to speak the truth and to expose lies. This, at least, may seem enough of a truism to pass over without comment. Not so, however. For the modern intellectual, it is not at all obvious. Thus we have Martin Heidegger writing, in a pro-Hitler declaration of 1933, that “truth is the revelation of that which makes a people certain, clear, and strong in its action and knowledge”; it is only this kind of “truth” that one has a responsibility to speak. Americans tend to be more forthright. When Arthur Schlesinger was asked by The New York Times in November, 1965, to explain the contradiction between his published account of the Bay of Pigs incident and the story he had given the press at the time of the attack, he simply remarked that he had lied; and a few days later, he went on to compliment the Times for also having suppressed information on the planned invasion, in “the national interest,” as this term was defined by the group of arrogant and deluded men of whom Schlesinger gives such a flattering portrait in his recent account of the Kennedy Administration. It is of no particular interest that one man is quite happy to lie in behalf of a cause which he knows to be unjust; but it is significant that such events provoke so little response in the intellectual community—for example, no one has said that there is something strange in the offer of a major chair in the humanities to a historian who feels it to be his duty to persuade the world that an American-sponsored invasion of a nearby country is nothing of the sort. And what of the incredible sequence of lies on the part of our government and its spokesmen concerning such matters as negotiations in Vietnam? The facts are known to all who care to know. The press, foreign and domestic, has presented documentation to refute each falsehood as it appears. But the power of the government’s propaganda apparatus is such that the citizen who does not undertake a research project on the subject can hardly hope to confront government pronouncements with fact.

How many digits in a product?

We often represent integers with digits. E.g., the integer 1234 has 4 digits. By extension, we use ‘binary’ digits, called bits, within computers. Thus the integer  7 requires three bits: 0b111.

If I have two integers that use 3 digits, say, how many digits will their product have?

Mathematically, we might count the number of digits of an integer using the formula ceil(log(x+1)) where the log is taken in the base you are interested in. In base 10, the integers with three digits go from 100 to 999, or from 10^2 to 10^3 - 1, inclusively. For example, to compute the number of digits in base 10, you might use the following Python expression: ceil(log10(x+1)). More generally, an integer has d digits in base b if it is between b^(d-1) and b^d - 1, inclusively. By convention, the integer 0 has no digit in this model.

The product between an integer having d1 digits and an integer having d2 digits is between b^(d1+d2-2) and b^(d1+d2) - b^d1 - b^d2 + 1 (inclusively). Thus the product has either d1+d2-1 digits or d1+d2 digits.

To illustrate, let us consider the product between two integers having three digits. In base 10, the smallest product is 100 times 100 or 10,000, so it requires 5 digits. The largest product is 999 times 999 or 998,001 so 6 digits.

Thus if you multiply a 32-bit number with another 32-bit number, you get a number that has at most 64 binary digits. The maximum value will be 2^64 - 2^33 + 1.

It seems slightly counter-intuitive that the product of two 32-bit numbers does not span the full range of 64-bit numbers: it cannot exceed 2^64 - 2^33 + 1. A related observation is that a given product may be obtained from several distinct pairs of 32-bit numbers. For example, the product 4 can be achieved by multiplying 1 with 4 or by multiplying 2 with 2. Furthermore, many 64-bit values cannot be produced from two 32-bit values: e.g., any prime number larger than or equal to 2^32 and smaller than 2^64.
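
To make this concrete, here is a small Go program of my own (not from the original post). It counts decimal digits with a loop, which avoids floating-point edge cases near exact powers of ten, and checks the product bounds discussed above:

package main

import "fmt"

// digits10 counts the decimal digits of x; by the convention above, 0 has no digit.
func digits10(x uint64) int {
	d := 0
	for x > 0 {
		d++
		x /= 10
	}
	return d
}

func main() {
	// Products of two 3-digit numbers have either 5 or 6 digits.
	fmt.Println(digits10(100 * 100)) // 5 (smallest product is 10,000)
	fmt.Println(digits10(999 * 999)) // 6 (largest product is 998,001)

	// The product of two 32-bit values never exceeds 2^64 - 2^33 + 1.
	const max32 = uint64(1<<32 - 1)
	fmt.Println(max32 * max32) // 18446744065119617025, which is 2^64 - 2^33 + 1
}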

Further reading: Computing the number of digits of an integer even faster

The end of the monopolistic web?

Except maybe in totalitarian states, you cannot have a single publisher. Most large cities had multiple independent newspapers.

In recent years, we saw a surge of concentration in newspaper and television ownership. However, this was accompanied by a surge of online journalism. The total number of publishers increased, if nothing else.

You can more easily have a single carrier/distributor than a monopolistic publisher. For example, the same delivery service provides me my newspaper as well as a range of competing newspapers. The delivery man does not much care for the content of my newspaper. A few concentrated Internet providers support diverse competing services.

The current giants (Facebook, Twitter and Google) were built initially as neutral distributors. Google was meant to give you access to all of the web’s information. If the search engine is neutral, there is no reason to have more than one. If Twitter welcomes everyone, then there is no reason to have competing services. Newspapers have fact-checking services, but newspaper delivery services do not.

Of course, countries like Russia and China often had competing services, but most of the rest of the world fell back on American-based large corporations for their web infrastructure. Even the Taliban uses Twitter.

It has now become clear that Google search results are geared toward favouring some of their own services. Today, we find much demand for services like Facebook and Twitter to more closely vet their content. Effectively, they are becoming publishers. They are no longer neutral. It is undeniable that they now see their role as arbiters of content. They have fact-checking services and they censor individuals.

If my mental model is correct, then we will see the emergence of strong competitors. I do not predict the immediate downfall of Facebook and Twitter. However, much of their high valuation was due to them being considered neutral carriers. The difference in value between a monopoly and a normal player can be significant. People who know more about online marketing than I do also tell me that online advertisement might be overrated. And advertisement on a platform that is no longer universal is less valuable: the pie is shared. Furthermore, I would predict that startups that were dead on arrival ten years ago might be appealing businesses today. Thus, at the margin, it makes it more appealing for a young person to go work for a small web startup.

I should stress that this is merely a model. I do not claim to be right. I am also not providing investment or job advice.

Further reading: Stop spending so much time being trolled by billionaire corporations!

SWAR explained: parsing eight digits

It is common to want to parse long strings of digits into integer values. Because it is a common task, we want to optimize it as much as possible.

In the blog post Quickly parsing eight digits, I presented a very quick way to parse eight ASCII characters representing an integer (e.g., 12345678) into the corresponding binary value. I want to come back to it and explain it a bit more, to show that it is not magic. This works in most programming languages, but I will stick with C for this blog post.

To recap, the long way is a simple loop:

uint32_t parse_eight_digits(const unsigned char *chars) {
  uint32_t x = chars[0] - '0';
  for (size_t j = 1; j < 8; j++)
    x = x * 10 + (chars[j] - '0');
  return x;
}

We use the fact that in ASCII, the digits 0, 1, … are in consecutive order in terms of byte values. The character ‘0’ is 0x30 (or 48 in decimal), the character ‘1’ is 0x31 (49 in decimal) and so forth. At each step in the loop, we multiply the running value by 10 and add the value of the next digit.

It assumes that all characters are in the valid range (from ‘0’ to ‘9’): other code should check that it is the case.

An optimizing compiler will probably unroll the loop and produce code that might look like this in assembly:

        movzx   eax, byte ptr [rdi]
        lea     eax, [rax + 4*rax]
        movzx   ecx, byte ptr [rdi + 1]
        lea     eax, [rcx + 2*rax]
        lea     eax, [rax + 4*rax]
        movzx   ecx, byte ptr [rdi + 2]
        lea     eax, [rcx + 2*rax]
        lea     eax, [rax + 4*rax]
        movzx   ecx, byte ptr [rdi + 3]
        lea     eax, [rcx + 2*rax]
        lea     eax, [rax + 4*rax]
        movzx   ecx, byte ptr [rdi + 4]
        lea     eax, [rcx + 2*rax]
        lea     eax, [rax + 4*rax]
        movzx   ecx, byte ptr [rdi + 5]
        lea     eax, [rcx + 2*rax]
        lea     eax, [rax + 4*rax]
        movzx   ecx, byte ptr [rdi + 6]
        lea     eax, [rcx + 2*rax]
        lea     eax, [rax + 4*rax]
        movzx   ecx, byte ptr [rdi + 7]
        lea     eax, [rcx + 2*rax]
        add     eax, -533333328

Notice how there are many loads, and a whole lot of operations.

We can substantially shorten the resulting code, down to something that looks like the following:

        imul    rax, qword ptr [rdi], 2561
        movabs  rcx, -1302123111085379632
        add     rcx, rax
        shr     rcx, 8
        movabs  rax, 71777214294589695
        and     rax, rcx
        imul    rax, rax, 6553601
        shr     rax, 16
        movabs  rcx, 281470681808895
        and     rcx, rax
        movabs  rax, 42949672960001
        imul    rax, rcx
        shr     rax, 32

How do we do it? We use a technique called SWAR, which stands for SIMD within a register. The intuition behind it is that modern computers have 64-bit registers. Processing eight consecutive bytes as eight distinct words, as in the native code above, is inefficient given how wide our registers are.

The first step is to load all eight characters into a 64-bit register. In C, you might do it in this manner:

uint64_t val;
memcpy(&val, chars, 8);

It may look expensive, but most compilers will translate this memcpy call into a single 64-bit load when compiling with optimizations turned on.

Most computers today (including x64 and mainstream ARM systems) store values in little-endian order. This means that the first byte of the string becomes the least significant byte of the register, and so forth.

Then we want to subtract the character ‘0‘ (or 0x30 in hexadecimal) from every byte. We can do it for all eight bytes with a single operation:

val = val - 0x3030303030303030;

So if you had the string ‘12345678‘, you will now have the value 0x0807060504030201.

Next we are going to do a kind of pyramidal computation. We add pairs of successive bytes, then pairs of successive 16-bit values and then pairs of successive 32-bit values.

It goes something like this. Suppose that you have the sequence of digit values b1, b2, b3, b4, b5, b6, b7, b8. You want to do…

  • add pairs of bytes: 10*b1+b2, 10*b3+b4, 10*b5+b6, 10*b7+b8
  • combine first and third sum: 1000000*(10*b1+b2) + 100*(10*b5+b6)
  • combine second and fourth sum: 10*b7+b8 + 10000*(10*b3+b4)

I will only explain the first step (pairs of bytes) as the other two steps are similar. Consider the least significant two bytes, which hold the value 256*b2 + b1. We multiply the whole word by 10 and add the word shifted right by 8 bits: the least significant byte then contains 10*b1 + b2. The same pair of operations computes all four sums at once…

val = (val * 10) + (val >> 8);

The next two steps are similar: we compute two partial products, and the final answer is their sum shifted right by 32 bits (as in the complete code below).

val1 = ((val & 0x000000FF000000FF) 
          * (100 + (1000000ULL << 32)));

val2 = (((val >> 16) & 0x000000FF000000FF) 
          * (1 + (10000ULL << 32)));

And the overall code looks as follows…

uint32_t  parse_eight_digits_unrolled(uint64_t val) {
  const uint64_t mask = 0x000000FF000000FF;
  const uint64_t mul1 = 0x000F424000000064; // 100 + (1000000ULL << 32)
  const uint64_t mul2 = 0x0000271000000001; // 1 + (10000ULL << 32)
  val -= 0x3030303030303030;
  val = (val * 10) + (val >> 8); // val = (val * 2561) >> 8;
  val = (((val & mask) * mul1) + (((val >> 16) & mask) * mul2)) >> 32;
  return val;
}
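
For completeness, here is my own port of the same routine to Go; it is not from the original post. It assumes a little-endian interpretation of the eight input bytes, which binary.LittleEndian provides regardless of the host architecture:

package main

import (
	"encoding/binary"
	"fmt"
)

// parseEightDigits applies the same SWAR steps as the C function above
// to exactly eight ASCII digits.
func parseEightDigits(b []byte) uint32 {
	val := binary.LittleEndian.Uint64(b) // one 64-bit load
	val -= 0x3030303030303030           // subtract '0' from every byte
	val = (val * 10) + (val >> 8)        // combine pairs of digits
	const mask = 0x000000FF000000FF
	val = ((val&mask)*(100+(1000000<<32)) +
		((val>>16)&mask)*(1+(10000<<32))) >> 32
	return uint32(val)
}

func main() {
	fmt.Println(parseEightDigits([]byte("12345678"))) // prints 12345678
}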

Appendix: You can do much the same in C# starting with a byte pointer (byte* chars):

ulong val = Unsafe.ReadUnaligned<ulong>(chars);
const ulong mask = 0x000000FF000000FF;
const ulong mul1 = 0x000F424000000064; 
// 100 + (1000000ULL << 32)
const ulong mul2 = 0x0000271000000001; 
// 1 + (10000ULL << 32)
val -= 0x3030303030303030;
val = (val * 10) + (val >> 8); // val = (val * 2561) >> 8;
val = (((val & mask) * mul1) + (((val >> 16) & mask) * mul2)) >> 32;

What is the ‘range’ of a number type?

In programming, we often represent numbers using types that have specific ranges. For example, 64-bit signed integer types can represent all integers between -9223372036854775808 and 9223372036854775807, inclusively. All integers inside this range are valid, all integers outside are “out of range”. It is simple.

What about floating-point numbers? The nuance with floating-point numbers is that they cannot represent all numbers within a continuous range. For example, the real number 1/3 cannot be represented using binary floating-point numbers. So the convention is that given a textual representation, say “1.1e100”, we seek the closest approximation.

Still, are there ranges of numbers that you should not represent using floating-point numbers? That is, are there numbers that you should reject?

It seems that there are two different interpretations:

  • My own interpretation is that floating-point types can represent all numbers from -infinity to infinity, inclusively. It means that ‘infinity’ or 1e9999 are indeed “in range”. For 64-bit IEEE floating-point numbers, this means that numbers smaller than 4.94e-324 but greater than 0 can be represented as 0, and that numbers greater than 1.8e308 should be infinity. To recap, all numbers are always in range.
  • For 64-bit numbers, another interpretation is that only numbers in the ranges 4.94e-324 to 1.8e308 and -1.8e308 to -4.94e-324, together with exactly 0, are valid. Numbers that are too small (less than 4.94e-324 but greater than 0) or numbers that are larger than 1.8e308 are “out of range”. Common implementations of the strtod function or of the C++ equivalent follow this convention.

This matters because the C++ specification for the from_chars functions states that

If the parsed value is not in the range representable by the type of value, value is unmodified and the member ec of the return value is equal to errc::result_out_of_range.

I am not sure programmers have a common understanding of this specification.
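
As an aside, here is a small Go probe of my own (not from the specification) showing how one library behaves. Go’s strconv.ParseFloat returns an infinity together with a range error for huge values, which lands closer to the second interpretation:

package main

import (
	"fmt"
	"strconv"
)

func main() {
	// Far beyond the largest finite float64: the result is +Inf and the
	// returned error wraps strconv.ErrRange.
	v, err := strconv.ParseFloat("1e9999", 64)
	fmt.Println(v, err)

	// Far below the smallest positive float64: the result is (close to)
	// zero; whether a range error is reported depends on the interpretation
	// the library adopts.
	w, err := strconv.ParseFloat("1e-9999", 64)
	fmt.Println(w, err)
}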

How programmers make sure that their software is correct

Our most important goal in writing software is that it be correct. The software must do what the programmer wants it to do. It must meet the needs of the user.

In the business world, double-entry bookkeeping is the idea that transactions are recorded in at least two accounts (debit and credit). One of the advantages of double-entry bookkeeping, compared to a more naive approach, is that it allows for some degree of auditing and error finding. If we compare accounting and software programming, we could say that double-entry accounting and its subsequent auditing is equivalent to software testing.

For an accountant, converting a naive accounting system into a double-entry system is a difficult task in general. In many cases, one would have to reconstruct it from scratch. In the same manner, it can be difficult to add tests to a large application that has been developed entirely without testing. And that is why testing should be first on your mind when building serious software.

A hurried or novice programmer can quickly write a routine, compile and run it and be satisfied with the result. A cautious or experienced programmer will know not to assume that the routine is correct.

Common software errors can cause problems ranging from a program that abruptly terminates to database corruption. The consequences can be costly: a software bug caused the explosion of an Ariane 5 rocket in 1996 (Dowson, 1997). The error was caused by the conversion of a floating point number to a signed integer represented with 16 bits. Only small integer values could be represented. Since the value could not be represented, an error was detected and the program stopped because such an error was unexpected. The irony is that the function that triggered the error was not required: it had simply been integrated as a subsystem from an earlier model of the Ariane rocket. In 1996 U.S. dollars, the estimated cost of this error is almost $400 million.

The importance of producing correct software has long been understood. The best scientists and engineers have been trying to do this for decades.

There are several common strategies. For example, if we need to do a complex scientific calculation, then we can ask several independent teams to produce an answer. If all the teams arrive at the same answer, we can then conclude that it is correct. Such redundancy is often used to prevent hardware-related faults (Yeh, 1996). Unfortunately, it is not practical to write multiple versions of your software in general.

Many of the early programmers had advanced mathematical training. They hoped that we could prove that a program is correct. By putting aside the hardware failures, we could then be certain that we would not encounter any errors. And indeed, today we have sophisticated software that allows us to sometimes prove that a program is correct.

Let us consider an example of formal verification to illustrate our point. We can use the z3 library from Python (De Moura and Bjørner, 2008). If you are not a Python user, don’t worry: you don’t have to be to follow the example. We can install the necessary library with the command pip install z3-solver or the equivalent. Suppose we want to be sure that the inequality ( 1 + y ) / 2 < y holds for all 32-bit integers. We can use the following script:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) / 2 >= y )
if(s.check() == z3.sat):
    model = s.model()
    print(model)

In this example we construct a 32-bit word (BitVec) to represent our example. By default, the z3 library interprets the values that can be represented by such a variable as ranging from -2147483648 to 2147483647 (from -2^31 to 2^31 - 1 inclusive). We enter the inequality opposite to the one we wish to show (( 1 + y ) / 2 >= y). If z3 does not find a counterexample, then we will know that the inequality ( 1 + y ) / 2 < y holds.

When running the script, Python displays the integer value 2863038463, which indicates that z3 has found a counterexample. The z3 library always reports the value as a non-negative integer and it is up to us to interpret it correctly. The number 2147483648 becomes -2147483648, the number 2147483649 becomes -2147483647 and so on. This representation is often called two’s complement. Thus, the number 2863038463 is in fact interpreted as a negative number. Its exact value is not important: what matters is that our inequality (( 1 + y ) / 2 < y) is incorrect when the variable is negative. We can check this by giving the variable the value -1: we then get 0 < -1, which is false. When the variable takes the value 0, the inequality is also false (0<0). We can also check that the inequality is false when the variable takes the value 1. So let us add as a condition that the variable is greater than 1 (s.add( y > 1 )):

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) / 2 >= y )
s.add( y > 1 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

Since the latter script does not display anything on the screen when it is executed, we can conclude that the inequality is satisfied as long as the variable is greater than 1.

Since we have shown that the inequality ( 1 + y ) / 2 < y holds when the variable is greater than 1, perhaps the inequality ( 1 + y ) < 2 * y holds too? Let’s try it:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) >= 2 * y )
s.add( y > 1 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

This script will display 1412098654: its double, 2824197308, wraps around and is interpreted by z3 as a negative 32-bit value, which makes the inequality hold. To avoid this problem, let’s add a new condition so that the double of the variable can still be interpreted as a positive value:


import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) >= 2 * y )
s.add( y > 1 )
s.add( y < 2147483647//2 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

This time the result is verified. As you can see, such a formal approach requires a lot of work, even in relatively simple cases. It may have been possible to be more optimistic in the early days of computer science, but by the 1970s, computer scientists like Dijkstra were expressing doubts:

we see automatic program verifiers verifying toy programs and one observes the honest expectation that with faster machines with lots of concurrent processing, the life-size problems will come within reach as well. But, honest as these expectations may be, are they justified? I sometimes wonder… (Dijkstra, 1975)

It is impractical to apply such a mathematical method on a large scale. Errors can take many forms, and not all of these errors can be concisely presented in mathematical form. Even when it is possible, even when we can accurately represent the problem in a mathematical form, there is no reason to believe that a tool like z3 will always be able to find a solution: when problems become difficult, computational times can become very long. An empirical approach is more appropriate in general.

Over time, programmers have come to understand the need to test their software. It is not always necessary to test everything: a prototype or an example can often be provided without further validation. However, any software designed in a professional context and having to fulfill an important function should be at least partially tested. Testing allows us to reduce the probability that we will have to face a disastrous situation.

There are generally two main categories of tests.

  • There are unit tests. These are designed to test a particular component of a software program. For example, a unit test can be performed on a single function. Most often, unit tests are automated: the programmer can execute them by pressing a button or typing a command. Unit tests often avoid the acquisition of valuable resources, such as creating large files on a disk or making network connections. Unit testing does not usually involve reconfiguring the operating system.
  • Integration tests aim to validate a complete application. They often require access to networks and access to sometimes large amounts of data. Integration tests sometimes require manual intervention and specific knowledge of the application. Integration testing may involve reconfiguring the operating system and installing software. They can also be automated, at least in part. Most often, integration tests are based on unit tests that serve as a foundation.

Unit tests are often part of a continuous integration process (Kaiser et al., 1989). Continuous integration often automatically performs specific tasks including unit testing, backups, applying cryptographic signatures, and so on. Continuous integration can be done at regular intervals, or whenever a change is made to the code.

Unit tests are used to structure and guide software development. Tests can be written before the code itself, in which case we speak of test-driven development. Often, tests are written after developing the functions. Tests can be written by programmers other than those who developed the functions. It is sometimes easier for independent developers to provide tests that are capable of uncovering errors because they do not share the same assumptions.

It is possible to integrate tests into functions or an application. For example, an application may run a few tests when it starts. In such a case, the tests will be part of the distributed code. However, it is more common not to publish unit tests. They are a component reserved for programmers and they do not affect the functioning of the application. In particular, they do not pose a security risk and they do not affect the performance of the application.

Experienced programmers often consider tests to be as important as the original code. It is therefore not uncommon to spend half of one’s time on writing tests. The net effect is to substantially reduce the initial speed of writing computer code. Yet this apparent loss of time often saves time in the long run: setting up tests is an investment. Software that is not well tested is often more difficult to update. The presence of tests allows us to make changes or extensions with less uncertainty.

Tests should be readable, simple and they should run quickly. They often use little memory.

Unfortunately, it is difficult to define exactly how good tests are. There are several statistical measures. For example, we can count the lines of code that execute during tests. We then talk about test coverage. A coverage of 100% implies that all lines of code are concerned by the tests. In practice, this coverage measure can be a poor indication of test quality.

Consider this example:

package main

import (
    "testing"
)


func Average(x, y uint16) uint16 {
   return (x + y)/2
}

func TestAverage(t *testing.T) {
    if Average(2,4) != 3 {
        t.Error(Average(2,4))
    }
}

In the Go language, we can run tests with the command go test. We have an Average function with a corresponding test. In our example, the test will run successfully. The coverage is 100%.

Unfortunately, the Average function may not be as correct as we would expect. If we pass the integers 40000 and 40000 as parameters, we would expect the average value of 40000 to be returned. But the integer 40000 added to the integer 40000 cannot be represented with a 16-bit integer (uint16): the result will instead be (40000+40000)%65536 = 14464. So the function will return 7232, which may be surprising. The following test will fail:


func TestAverage(t *testing.T) {
    if Average(40000,40000) != 40000 {
        t.Error(Average(40000,40000))
    }
}

When possible and fast, we can try to test the code more exhaustively, as in this example where we check all possible pairs of values:

package main

import (
    "testing"
)


func Average(x, y uint16) uint16 {
   if y > x {
     return (y - x)/2 + x
   } else {
     return (x - y)/2 + y
   }
}

func TestAverage(t *testing.T) {
  for x := 0; x < 65536; x++ {
    for y := 0; y < 65536; y++ {
      m := int(Average(uint16(x),uint16(y)))
      if x < y {
        if m < x || m > y {
          t.Error("error ", x, " ", y)
        }           
      } else {
        if m < y || m > x {
          t.Error("error ", x, " ", y)
        }  
      }
    }
  } 
}

In practice, it is rare that we can do exhaustive tests. We can instead use pseudo-random tests. For example, we can generate pseudo-random numbers and use them as parameters. In the case of random tests, it is important to keep them deterministic: each time the test runs, the same values are tested. This can be achieved by providing a fixed seed to the random number generator as in this example:

package main

import (
    "testing"   
        "math/rand"
)


func Average(x, y uint16) uint16 {
   if y > x {
     return (y - x)/2 + x
   } else {
     return (x - y)/2 + y
   }
}

func TestAverage(t *testing.T) {
  rand.Seed(1234)
  for test := 0; test < 1000; test++ {
    x := rand.Intn(65536)
    y := rand.Intn(65536)
    m := int(Average(uint16(x),uint16(y)))
    if x < y {
      if m < x || m > y {
        t.Error("error ", x, " ", y)
      }           
    } else {
      if m < y || m > x {
        t.Error("error ", x, " ", y)
      }  
    }
  } 
}

Tests based on random exploration are part of a strategy often called fuzzing (Miller et al., 1990).

We can further distinguish between positive and negative tests. Positive tests aim at verifying that a function or component behaves in an agreed-upon way. Thus, the first test of our Average function was a positive test. Negative tests verify that the software behaves correctly even in unexpected situations. We can produce negative tests by providing our functions with random data (fuzzing). Our second example can be considered a negative test if the programmer expected small integer values.

Tests should fail when the code is modified incorrectly (Budd et al., 1978). On this basis, we can also develop more sophisticated measures of test quality by making random changes to the code and ensuring that such changes often cause tests to fail.

Some programmers choose to generate tests automatically from the code. In such a case, a component is tested and the result is captured. For example, in our example of calculating the average, we could have captured the fact that Average(40000,40000) has the value 7232. If a subsequent change occurs that changes the result of the operation, the test will fail. Such an approach saves time since the tests are generated automatically. We can quickly and effortlessly achieve 100% code coverage. On the other hand, such tests can be misleading. In particular, it is possible to capture incorrect behaviour. Furthermore, the objective when writing tests is not so much their number as their quality. The presence of several tests that do not contribute to validate the essential functions of our software can even become harmful. Irrelevant tests can waste programmers’ time in subsequent revisions.

Finally, we review the benefits of testing: tests help us organize our work, they are a measure of quality, they help us document the code, they avoid regression, they help debugging and they can produce more efficient code.

Organization

Designing sophisticated software can take weeks or months of work. Most often, the work will be broken down into separate units. It can be difficult, until you have the final product, to judge the outcome. Writing tests as we develop the software helps to organize the work. For example, a given component can be considered complete when it is written and tested. Without the test writing process, it is more difficult to estimate the progress of a project since an untested component may still be far from being completed.

Quality

Tests are also used to show the care that the programmer has put into his work. They also make it possible to quickly evaluate the care taken with the various functions and components of a software program: the presence of carefully composed tests can be an indication that the corresponding code is reliable. The absence of tests for certain functions can serve as a warning.

Some programming languages are quite strict and have a compilation phase that validates the code. Other programming languages (Python, JavaScript) leave more freedom to the programmer. Some programmers consider that tests can help to overcome the limitations of less strict programming languages by imposing on the programmer a rigour that the language does not require.

Documentation

Software programming should generally be accompanied by clear and complete documentation. In practice, the documentation is often partial, imprecise, erroneous or even non-existent. Tests are therefore often the only technical specification available. Reading tests allows programmers to adjust their expectations of software components and functions. Unlike documentation, tests are usually up-to-date, if they are run regularly, and they are accurate to the extent that they are written in a programming language. Tests can therefore provide good examples of how the code is used.
Even if we want to write high-quality documentation, tests can also play an important role. To illustrate computer code, examples are often used. Each example can be turned into a test. So we can make sure that the examples included in the documentation are reliable. When the code changes, and the examples need to be modified, a procedure to test our examples will remind us to update our documentation. In this way, we avoid the frustrating experience of readers of our documentation finding examples that are no longer functional.
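
The Go language supports this idea directly with example functions: go test compiles them and compares their printed output to the “Output:” comment, so documentation examples stay correct. Here is a minimal sketch of my own, reusing the Average function from above (it also requires importing the fmt package):

func ExampleAverage() {
    fmt.Println(Average(2, 4))
    // Output: 3
}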

Regression

Programmers regularly fix flaws in their software. It often happens that the same problem occurs again. The same problem may come back for various reasons: sometimes the original problem has not been completely fixed. Sometimes another change elsewhere in the code causes the error to return. Sometimes the addition of a new feature or a software optimization causes a bug to return, or a new bug to be added. When software acquires a new flaw, it is called a regression. To prevent such regressions, it is important to accompany every bug fix or new feature with a corresponding test. In this way, we can quickly become aware of regressions by running the tests. Ideally, the regression can be identified while the code is being modified, so the regression is avoided entirely. In order to convert a bug into a simple and effective test, it is useful to reduce it to its simplest form. For example, in our previous example with Average(40000,40000), we can add the detected error as an additional test:

package main

import (
    "testing
)


func Average(x, y uint16) uint16 {
   if y > x {
     return (y - x)/2 + x
   } else {
     return (x - y)/2 + y
   }
}

func TestAverage(t *testing.T) {
   if Average(2,4) != 3 {
     t.Error("error 1")
   } 
   if Average(40000,40000) != 40000 {
     t.Error("error 2")
   }           
}

Bug fixing

In practice, the presence of an extensive test suite makes it possible to identify and correct bugs more quickly. This is because testing reduces the extent of errors and provides the programmer with several guarantees. To some extent, the time spent writing tests saves time when errors are found while reducing the number of errors.
Furthermore, an effective strategy to identify and correct a bug involves writing new tests. It can be more efficient in the long run than other debugging strategies such as stepping through the code. Indeed, after your debugging session is completed, you are left with new unit tests in addition to a corrected bug.

Performance

The primary function of tests is to verify that functions and components produce the expected results. However, programmers are increasingly using tests to measure the performance of components. For example, the execution speed of a function, the size of the executable or the memory usage can be measured. It is then possible to detect a loss of performance following a modification of the code. You can compare the performance of your code against a reference code and check for differences using statistical tests.
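
In Go, such performance tests can be written as benchmark functions alongside the unit tests and run with go test -bench=. Here is a minimal sketch of my own for the Average function defined above:

func BenchmarkAverage(b *testing.B) {
    var sink uint16
    for n := 0; n < b.N; n++ {
        sink = Average(uint16(n), uint16(n)/2)
    }
    _ = sink // prevent the compiler from optimizing the loop away
}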

Conclusion

All computer systems have flaws. Hardware can fail at any time. And even when the hardware is reliable, it is almost impossible for a programmer to predict all the conditions under which the software will be used. No matter who you are, and no matter how hard you work, your software will not be perfect. Nevertheless, you should at least try to write code that is generally correct: it most often meets the expectations of users.
It is possible to write correct code without writing tests. Nevertheless, the benefits of a test suite are tangible in difficult or large-scale projects. Many experienced programmers will refuse to use a software component that has been built without tests.
The habit of writing tests probably makes you a better programmer. Psychologically, you are more aware of your human limitations if you write tests. When you interact with other programmers and with users, you may be better able to take their feedback into account if you have a test suite.

Suggested reading

  • James Whittaker, Jason Arbon, Jeff Carollo, How Google Tests Software, Addison-Wesley Professional; 1st edition (March 23 2012)
  • Lisa Crispin, Janet Gregory, Agile Testing: A Practical Guide for Testers and Agile Teams, Addison-Wesley Professional; 1st edition (Dec 30 2008)

Credit

The following Twitter users contributed ideas: @AntoineGrodin, @dfaranha, @Chuckula1, @EddyEkofo, @interstar, @Danlark1, @blattnerma, @ThuggyPinch, @ecopatz, @rsms, @pdimov2, @edefazio, @punkeel, @metheoryt, @LoCtrl, @richardstartin, @metala, @franck_guillaud, @__Achille__, @a_n__o_n, @atorstling, @tapoueh, @JFSmigielski, @DinisCruz, @jsonvmiller, @nickblack, @ChrisNahr, @ennveearr1, @_vkaku, @kasparthommen, @mathjock, @feO2x, @pshufb, @KishoreBytes, @kspinka, @klinovp, @jukujala, @JaumeTeixi

Science and Technology links (December 19th 2021)

  1. Becoming a physician increases the use of antidepressants, opioids, anxiolytics, and sedatives, especially for female physicians.
  2. When trying to reproduce results in cancer research, independent researchers find that the benefits are typically grossly exaggerated.
  3. A large planet has been found orbiting two suns. It is orbiting from very far away.
  4. NASA sent a probe into the atmosphere of the sun.
  5. As you age, you accumulate old (senescent) cells. These cells are dysfunctional and in sufficient quantity might cause harm to your body such as chronic inflammation. Researchers found a plant-derived compound (procyanidin C1 or PCC1) which can not only limit the harm due to senescent cells but also eliminate some of them. It is so beneficial that treated mice live longer.

Science and Technology links (December 4th 2021)

  1. It used to be that all the exciting new processors came from Intel and AMD, and they were meant for your PC. The mobile revolution changed that: it led to the production of fantastic processors that use little energy. That technology is now moving back into laptops and servers. The leading supplier of servers is probably Amazon: if you want to run servers for your business, your first choice is often Amazon and its cloud computing infrastructure. And Amazon now makes its own processors. Initially, it made low-cost chips that were secondary to the powerful Intel and AMD processors. Amazon has now released its latest processor, Graviton 3. It is a much wider and deeper chip:

    The N1 core used in the Graviton2 chip had an instruction fetch unit that was 4 to 8 instructions wide and 4 wide instruction decoder that fed into an 8 wide issue unit that included two SIMD units, two load/store units, three arithmetic units, and a branch unit. With the Perseus N2 core used in the Graviton3, there is an 8 wide fetch unit that feeds into a 5 wide to 8 wide decode unit, which in turn feeds into a 15 wide issue unit, which is basically twice as wide as that on the N1 core used in the Graviton2. The vector engines have twice as much width (and support for BFloat16 mixed precision operations) and the load/store, arithmetic, and branch units are all doubled up, too.

    It feels like we are witnessing a revolution in processors. Not only are corporations like Apple and Amazon making their own chips… they appear to match Intel in raw power.

  2. Despite generous Danish social policies, family influence on important child outcomes in Denmark is about as strong as it is in the United States.
  3. Clearing senescent (old) cells appears to also clear out diabetes (in mice).
  4. South Korea is building a self-sustaining floating city.
  5. A supplement based on alpha-ketoglutarate appears to have strong rejuvenation effects (in human beings).
  6. Metformin is a commonly used drug which is believed to have some anti-aging properties. However some research suggested that metformin could blunt the beneficial effects of exercise. A recent clinical trial indicates that these fears are unwarranted.
  7. Biological aging (senescence) could have evolved when animals first colonized the land, from the sea.
  8. Your favorite colour reveals nothing about your personality.

Can you safely parse a double when you need a float?

In C as well as many other programming languages, we have 32-bit and 64-bit floating-point numbers. They are often referred to as float and double. Most systems today follow the IEEE 754 standard, which means that you can get consistent results across programming languages and operating systems. Hence, it does not matter very much if you implement your software in C++ under Linux while someone else implements it in C# under Windows: if you both have recent systems, you can expect identical numerical outcomes when doing basic arithmetic and square-root operations.

When you are reading these numbers from a string, there are distinct functions. In C, you have strtof and strtod. One parses a string to a float and the other function parses it to a double.

At a glance, it seems redundant. Why not just parse your string to a double value and cast it back to a float, if needed?

Of course, that would be slightly more expensive. But, importantly, it can also give incorrect results, in the sense that it is not equivalent to parsing directly to a float. In other words, these functions are not equivalent:

float parse1(const char * c) {
    char * end;
    return strtod(c, &end);
}

float parse2(const char * c) {
    char * end;
    return strtof(c, &end);
}

It is intuitive that if I first parse the number as a float and then cast it back to a double, I will have lost information in the process. Indeed, if I start with the string “3.14159265358979323846264338327950” and parse it as a float (32-bit), I get 3.1415927410125732421875. If I parse it as a double (64-bit), I get the more accurate result 3.141592653589793115997963468544185161590576171875. The difference is not so small, about 9e-08.

In the other direction, first parsing to a double and then casting down to a float, I can also lose information, although only a little bit, due to the double rounding effect. To illustrate, suppose that I have the number 1.48 and that I round it in one go to the nearest integer: I get 1. If I round it first to a single decimal (1.5) and then to the nearest integer, I might get 2 under the usual rounding conventions (round half up, or round half to even). Rounding twice is lossy and not equivalent to a single rounding operation. Importantly, you lose a bit of precision in the sense that you may not get back the closest value.

With floating-point numbers, I get this effect with the string “0.004221370676532388” (for example). You probably cannot tell unless you are a machine, but parsing directly to a float is 2e-7 % more accurate.
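
To see the effect for yourself, here is a short Go sketch of my own; the example string comes from the discussion above. It compares parsing directly to a 32-bit float against parsing to a 64-bit float and then narrowing:

package main

import (
	"fmt"
	"strconv"
)

func main() {
	s := "0.004221370676532388"

	// Parse directly to the nearest 32-bit float.
	direct, _ := strconv.ParseFloat(s, 32)

	// Parse to the nearest 64-bit float, then narrow to 32 bits (two roundings).
	wide, _ := strconv.ParseFloat(s, 64)
	narrowed := float32(wide)

	// Expected to print false for this input: double rounding changes the result.
	fmt.Println(float32(direct) == narrowed)
}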

In most applications, such a small loss of accuracy is not relevant. However, if you ever find yourself having to compare results with another program, you may get inconsistent results. It can make debugging more difficult.

Further reading: Floating-Point Determinism (part of a series)

Science and Technology links (November 28th 2021)

  1. Government-funded research is getting more political and less diverse:

    The frequency of documents containing highly politicized terms has been increasing consistently over the last three decades. The most politicized field is Education & Human Resources. The least are Mathematical & Physical Sciences and Computer & Information Science & Engineering, although even they are significantly more politicized than any field was in 1990. At the same time, abstracts have been becoming more similar to each other over time. Taken together, the results imply that there has been a politicization of scientific funding in the US in recent years and a decrease in the diversity of ideas supported.

  2. Parabiosis is the process of joining the blood vessels of two animals. Zhang et al. proceeded with parabiosis between young and old mice, followed by a detachment period. The old mice lived longer than control mice and they appear to have been rejuvenated by the parabiosis.
  3. A single injection to enable paralyzed mice to walk again.
  4. The Arctic has been warming for much longer than we thought.
  5. Dogs fed only once daily are healthier.
  6. India’s fertility rate is now below the replacement rate.

Are tenured professors more likely to speak freely?

University professors often have robust job security after a time: they receive tenure. It means that they usually do not have to worry about applying for a new job after a few years.

Tenure is not available in all countries. Countries like Australia reassess positions every few years.

So why does it exist where it does?

One of the justifications for tenure is that professors who have tenure can speak more freely. Thus, in theory, they can be critical of government or corporate policies.

Do they? What would “speaking freely” entail?

What about denouncing a colleague who commits blatant fraud? On this front, the evidence is not great. Diederik Stapel published well over 100 research papers in prestigious journals. He was fired when it was determined that he was making up all of his research data. It took outsiders (students) to report him. Harvard professor Marc Hauser published over 200 papers in the best journals, making up data as he went. It took naive students to report the fraud. We find too many examples of overt fraud in science, and rarely do we find that the close colleagues, the ones who should be the first to spot the problems, report them. Brian Wansink was another famous professor who published countless papers based on fraudulent practices. It took media pressure as well as an investigation led by a non-academic book publisher to take him down. I could go on. Professors rarely denounce other professors.

What about teaching controversial courses or engaging in disliked research? Ceci et al. found little evidence:

The findings from the present survey suggest that tenure itself does not result in faculty members routinely teaching courses that their senior colleagues disfavor, nor in their conducting research that their senior colleagues dislike.

Whenever professors tell me that they feel free to hold controversial ideas thanks to their tenure… I ask about the positions that they took recently that would endanger their job security if not for tenure. I might also ask about the positions that they might take that would endanger their chances of getting a research grant.

I should make it clear that being an advocate for transgender rights or climate change is not controversial in 2021. I am looking for examples where a lone professor goes against the majority and would otherwise lose their job.

If not themselves, then I might ask about other professors that did so. And we will find some of them. In Canada, we have Jordan Peterson at the University of Toronto, for example. Yet I do not consider Peterson particularly controversial. In fact, at the core, Peterson is merely a politically conservative professor. We used to  have a healthy share of politically conservative professors. It is only in the academy that it is highly troublesome to be politically conservative. A truly controversial professor was Denis Rancourt who thought important to question the foundation of the academy. Sadly Rancourt was fired (tenure did not protect him). I might throw in with Rancourt people like Bret Weinstein, Kathleen Stock and so forth.

So while tenure might protect professors  if they want to hold controversial ideas… professors are trained to seek prestige: they avoid serious controversy when they can. Holding controversial views puts one’s reputation in danger. The overwhelming majority will never publicly hold controversial views. They are happy to go along with whatever is the policy of the day, whatever is fashionable.

It does not follow that tenure is useless. It appears that tenured professors are often more productive. Indeed, it is possible that if you do not have to worry about where your next job will be, you can concentrate more on your teaching and your research.

More content: Gad Saad and Pat Kambhampati are controversial tenured professors in Montreal. They are not the average professor.

Converting integers to fixed-digit representations quickly

It is tricky to convert integers into strings because the number of characters can vary according to the amplitude of the integer. The integer ‘1’ requires a single character whereas the integer ‘100’ requires three characters. So a solution might possibly need a hard-to-predict branch.

Let us simplify the problem.

Imagine that you want to serialize integers to fixed-digit strings. Thus you may want to convert 16-digit integers (smaller than 10000000000000000) to exactly 16 digits, including leading zeros if needed. In this manner, it is easy to write code that contains only trivial branches.

The simplest approach could be a character-by-character routine where I use the fact that the character ‘0’ in ASCII is just 0x30 (in hexadecimal):

void to_string_backlinear(uint64_t x, char *out) {
    for(int z = 0; z < 16; z++) {
        out[15-z] = (x % 10) + 0x30;
        x /= 10;
    }
}

It is somewhat strange to write the characters backward, starting from the least significant digit. You can try to go forward, but it is a bit trickier. Here is one ugly approach that is probably not efficient:

void to_string_linear(uint64_t x, char *out) {
  out[0] = x / 1000000000000000 + 0x30;
  x %= 1000000000000000;
  out[1] = x / 100000000000000 + 0x30;
  x %= 100000000000000;
  out[2] = x / 10000000000000 + 0x30;
  x %= 10000000000000;
  out[3] = x / 1000000000000 + 0x30;
  x %= 1000000000000;
  out[4] = x / 100000000000 + 0x30;
  x %= 100000000000;
  out[5] = x / 10000000000 + 0x30;
  x %= 10000000000;
  out[6] = x / 1000000000 + 0x30;
  x %= 1000000000;
  out[7] = x / 100000000 + 0x30;
  x %= 100000000;
  out[8] = x / 10000000 + 0x30;
  x %= 10000000;
  out[9] = x / 1000000 + 0x30;
  x %= 1000000;
  out[10] = x / 100000 + 0x30;
  x %= 100000;
  out[11] = x / 10000 + 0x30;
  x %= 10000;
  out[12] = x / 1000 + 0x30;
  x %= 1000;
  out[13] = x / 100 + 0x30;
  x %= 100;
  out[14] = x / 10 + 0x30;
  x %= 10;
  out[15] = x + 0x30;
}

Instead we could try to do it in a tree-like manner to reduce data dependencies during the computation and hope for more instruction-level parallelism. We first divide the integer by 100000000 to compute the first and last 8 digits separately, and so forth. It should drastically decrease data dependencies:

void to_string_tree(uint64_t x, char *out) {
  uint64_t top = x / 100000000;
  uint64_t bottom = x % 100000000;      
  uint64_t toptop = top / 10000;
  uint64_t topbottom = top % 10000;
  uint64_t bottomtop = bottom / 10000;
  uint64_t bottombottom = bottom % 10000;
  uint64_t toptoptop = toptop / 100;
  uint64_t toptopbottom = toptop % 100;
  uint64_t topbottomtop = topbottom / 100;
  uint64_t topbottombottom = topbottom % 100;
  uint64_t bottomtoptop = bottomtop / 100;
  uint64_t bottomtopbottom = bottomtop % 100;
  uint64_t bottombottomtop = bottombottom / 100;
  uint64_t bottombottombottom = bottombottom % 100;
  //
  out[0] = toptoptop / 10 + 0x30;
  out[1] = toptoptop % 10 + 0x30;
  out[2] = toptopbottom / 10 + 0x30;
  out[3] = toptopbottom % 10 + 0x30;
  out[4] = topbottomtop / 10 + 0x30;
  out[5] = topbottomtop % 10 + 0x30;
  out[6] = topbottombottom / 10 + 0x30;
  out[7] = topbottombottom % 10 + 0x30;
  out[8] = bottomtoptop / 10 + 0x30;
  out[9] = bottomtoptop % 10 + 0x30;
  out[10] = bottomtopbottom / 10 + 0x30;
  out[11] = bottomtopbottom % 10 + 0x30;
  out[12] = bottombottomtop / 10 + 0x30;
  out[13] = bottombottomtop % 10 + 0x30;
  out[14] = bottombottombottom / 10 + 0x30;
  out[15] = bottombottombottom % 10 + 0x30;
}

We could also try to accelerate the computation with table lookups. We want to keep the tables small. We can handle the tail end of the tree-based technique by looking up the conversion of integers smaller than 100: the integer 12 becomes the 2-character string ‘12’ and so forth (my code could be nicer):

void to_string_tree_table(uint64_t x, char *out) {
  static const char table[200] = {
      0x30, 0x30, 0x30, 0x31, 0x30, 0x32, 0x30, 0x33, 0x30, 0x34, 0x30, 0x35,
      0x30, 0x36, 0x30, 0x37, 0x30, 0x38, 0x30, 0x39, 0x31, 0x30, 0x31, 0x31,
      0x31, 0x32, 0x31, 0x33, 0x31, 0x34, 0x31, 0x35, 0x31, 0x36, 0x31, 0x37,
      0x31, 0x38, 0x31, 0x39, 0x32, 0x30, 0x32, 0x31, 0x32, 0x32, 0x32, 0x33,
      0x32, 0x34, 0x32, 0x35, 0x32, 0x36, 0x32, 0x37, 0x32, 0x38, 0x32, 0x39,
      0x33, 0x30, 0x33, 0x31, 0x33, 0x32, 0x33, 0x33, 0x33, 0x34, 0x33, 0x35,
      0x33, 0x36, 0x33, 0x37, 0x33, 0x38, 0x33, 0x39, 0x34, 0x30, 0x34, 0x31,
      0x34, 0x32, 0x34, 0x33, 0x34, 0x34, 0x34, 0x35, 0x34, 0x36, 0x34, 0x37,
      0x34, 0x38, 0x34, 0x39, 0x35, 0x30, 0x35, 0x31, 0x35, 0x32, 0x35, 0x33,
      0x35, 0x34, 0x35, 0x35, 0x35, 0x36, 0x35, 0x37, 0x35, 0x38, 0x35, 0x39,
      0x36, 0x30, 0x36, 0x31, 0x36, 0x32, 0x36, 0x33, 0x36, 0x34, 0x36, 0x35,
      0x36, 0x36, 0x36, 0x37, 0x36, 0x38, 0x36, 0x39, 0x37, 0x30, 0x37, 0x31,
      0x37, 0x32, 0x37, 0x33, 0x37, 0x34, 0x37, 0x35, 0x37, 0x36, 0x37, 0x37,
      0x37, 0x38, 0x37, 0x39, 0x38, 0x30, 0x38, 0x31, 0x38, 0x32, 0x38, 0x33,
      0x38, 0x34, 0x38, 0x35, 0x38, 0x36, 0x38, 0x37, 0x38, 0x38, 0x38, 0x39,
      0x39, 0x30, 0x39, 0x31, 0x39, 0x32, 0x39, 0x33, 0x39, 0x34, 0x39, 0x35,
      0x39, 0x36, 0x39, 0x37, 0x39, 0x38, 0x39, 0x39,
  };
  uint64_t top = x / 100000000;
  uint64_t bottom = x % 100000000;
  uint64_t toptop = top / 10000;
  uint64_t topbottom = top % 10000;
  uint64_t bottomtop = bottom / 10000;
  uint64_t bottombottom = bottom % 10000;
  uint64_t toptoptop = toptop / 100;
  uint64_t toptopbottom = toptop % 100;
  uint64_t topbottomtop = topbottom / 100;
  uint64_t topbottombottom = topbottom % 100;
  uint64_t bottomtoptop = bottomtop / 100;
  uint64_t bottomtopbottom = bottomtop % 100;
  uint64_t bottombottomtop = bottombottom / 100;
  uint64_t bottombottombottom = bottombottom % 100;
  //
  memcpy(out, &table[2 * toptoptop], 2);
  memcpy(out + 2, &table[2 * toptopbottom], 2);
  memcpy(out + 4, &table[2 * topbottomtop], 2);
  memcpy(out + 6, &table[2 * topbottombottom], 2);
  memcpy(out + 8, &table[2 * bottomtoptop], 2);
  memcpy(out + 10, &table[2 * bottomtopbottom], 2);
  memcpy(out + 12, &table[2 * bottombottomtop], 2);
  memcpy(out + 14, &table[2 * bottombottombottom], 2);
}

You can extend this trick if you are willing to include a 40kB table in your code:

void to_string_tree_bigtable(uint64_t x, char *out) {
  #include "bigtable.h"

  uint64_t top = x / 100000000;
  uint64_t bottom = x % 100000000;
  //
  uint64_t toptop = top / 10000;
  uint64_t topbottom = top % 10000;
  uint64_t bottomtop = bottom / 10000;
  uint64_t bottombottom = bottom % 10000;

  memcpy(out, &bigtable[4 * toptop], 4);
  memcpy(out + 4, &bigtable[4 * topbottom], 4);
  memcpy(out + 8, &bigtable[4 * bottomtop], 4);
  memcpy(out + 12, &bigtable[4 * bottombottom], 4);
}
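
The header bigtable.h is not reproduced here. Presumably (an assumption on my part, based on how the function indexes it), it defines a 40,000-character array named bigtable mapping each integer from 0 to 9999 to its four ASCII digits. A small throwaway program along these lines could generate such a header:

#include <cstdio>

// Sketch of a generator for bigtable.h: for each integer from 0 to 9999,
// emit its four ASCII digits (0x30 is the character '0').
int main() {
  printf("static const char bigtable[40000] = {\n");
  for (int i = 0; i < 10000; i++) {
    printf("  0x%02x, 0x%02x, 0x%02x, 0x%02x,\n", 0x30 + i / 1000,
           0x30 + (i / 100) % 10, 0x30 + (i / 10) % 10, 0x30 + i % 10);
  }
  printf("};\n");
  return 0;
}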

An intermediate solution with a 3-character table would only require a 3kB table. I also considered Muła’s SIMD-based approach, though I refer you to his article for details. Effectively, Muła uses fancy Intel-specific instructions to get the job done.

If you cannot use SIMD instructions, you can use something similar called SWAR (SIMD within a register). Effectively, you pack several integer values inside a wide integer (64 bits) and operate on all of them at once to save instructions. Luckily, Khuong has a solution for us:

// credit: Paul Khuong
uint64_t encode_ten_thousands(uint64_t hi, uint64_t lo) {
  uint64_t merged = hi | (lo << 32);
  uint64_t top = ((merged * 10486ULL) >> 20) & ((0x7FULL << 32) | 0x7FULL);
  uint64_t bot = merged - 100ULL * top;
  uint64_t hundreds;
  uint64_t tens;
  hundreds = (bot << 16) + top;
  tens = (hundreds * 103ULL) >> 10;
  tens &= (0xFULL << 48) | (0xFULL << 32) | (0xFULL << 16) | 0xFULL;
  tens += (hundreds - 10ULL * tens) << 8;

  return tens;
}

void to_string_khuong(uint64_t x, char *out) {
  uint64_t top = x / 100000000;
  uint64_t bottom = x % 100000000;
  uint64_t first =
      0x3030303030303030 + encode_ten_thousands(top / 10000, top % 10000);
  memcpy(out, &first, sizeof(first));
  uint64_t second =
      0x3030303030303030 + encode_ten_thousands(bottom / 10000, bottom % 10000);
  memcpy(out + 8, &second, sizeof(second));
}

I refer you to Khuong’s blog post for a description.

I wrote a small benchmark in C++ which measures the time per integer. Remember that every call to my functions produces 16 digits, exactly.
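
The exact benchmark is not reproduced here; a minimal sketch of such a timing loop (my own simplification, using the to_string_tree function above and a cheap pseudo-random generator) might look as follows:

#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

// Sketch of a timing loop: convert many pseudo-random integers with one of
// the functions above (here to_string_tree) and report the time per integer.
int main() {
  std::vector<uint64_t> data(1000000);
  uint64_t seed = 1234567;
  for (auto &x : data) { // cheap linear congruential generator
    seed = seed * 6364136223846793005ULL + 1442695040888963407ULL;
    x = seed % 10000000000000000ULL; // keep at most 16 digits
  }
  char out[17] = {};
  auto start = std::chrono::high_resolution_clock::now();
  for (uint64_t x : data) {
    to_string_tree(x, out);
  }
  auto stop = std::chrono::high_resolution_clock::now();
  double ns = std::chrono::duration<double, std::nano>(stop - start).count();
  printf("%f ns per integer (last output: %.16s)\n", ns / data.size(), out);
  return 0;
}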

function Apple M1, LLVM 12 AMD Zen 2, GCC 10
linear 14 ns 25 ns
backward linear 7.7 ns 18 ns
tree 6.9 ns 15 ns
Khuong 3.3 ns 8.0 ns
small table 3.7 ns 7.1 ns
SIMD not available 4.8 ns
big table 1.5 ns 2.9 ns

On both processors, the crazy big-table (40kB) approach is about 2 times faster than the version with a small table. Though a big table can be justified in some instances, my feeling is that only in niche applications would such a big table be acceptable for such a narrow task. Even the smaller 3kB table seems like overkill given the good results we get with a small table.

The SIMD approach has a rather minimal gain compared to the version with a small table (merely 25%).

At a glance, the small table wins on practical grounds. It is small, simple and portable.


Science and Technology links (November 13th 2021)

  1. Pacific rougheye rockfish can live hundreds of years while other rockfish barely live past ten years.
  2. Female condors can reproduce without males. The phenomenon is known as parthenogenesis and it occurs in birds such as chickens. It does not happen in mammals naturally as far as we know.
  3. Chimpanzees can thrive in virtual reality.
  4. People have a good opinion of their own environmental impact and they are biased against the behaviour of others. In other words, we are environmental hypocrites.
  5. A majority of minimum wage workers are age 15 to 24 in Canada.
  6. Though we tend to view American businesses as uniquely innovative, Nordic countries in Europe may be just as innovative, albeit with a small population. This could be explained in part by the fact that Nordic countries have advantageous tax regimes for businesses.
  7. Smartphone vendors like to scan our content for illegal pictures. Starting from databases of known pictures, they seek to detect pictures that you own that might be visually similar. Unfortunately, it appears that the current breed of algorithms is easily fooled.
  8. Eating several eggs a day might improve your cardiovascular health.
  9. Some mice can regenerate their kidneys without scars.
  10. About half of all patents have to do with software. They are filed from a few major tech centres, unlike other patents which have more varied origins.

Checking simple equations or inequalities with z3

When programming, you sometimes need to make sure that a given formula is correct. Of course, you can rely on your mastery of high-school mathematics, but human beings, in general, are terrible at formal mathematics.

Thankfully, you can outsource simple problems to a software library. If you are a Python user, you can install z3 relatively easily with pip: pip install z3-solver. You are then good to go!

Suppose that you want to be sure that ( 1 + y ) / 2 < y for all 32-bit integers y. You can check it with the following Python script:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) / 2 >= y )
if(s.check() == z3.sat):
    model = s.model()
    print(model)

We construct a “bit vector” spanning 32 bits to represent our integer. The z3 library considers such a number by default as a signed integer going from -2147483648 to 2147483647.

We check the reverse inequality, because we are looking for a counterexample. When no counterexample can be found, we know that our inequality holds.

Running the above script, Python prints back the integer 2863038463. How can this be given that we said that z3 assumed that we were using numbers from -2147483648 to 2147483647?  The z3 library always prints out a positive integer, and it is our job to reinterpret it: the number 2147483647 is fine, the number 2147483648 becomes -2147483648, the number 2147483649 becomes -2147483647 and so forth. This representation which “wraps around” is called two’s complement. So 2863038463 is actually interpreted as a negative value. The exact value is not too important: what matters is that our inequality (( 1 + y ) / 2 < y) is false when y is negative. It is clear enough: setting the variable to -1, I get 0 < -1.  For zero, the inequality is also false. Let me add as a condition that the variable is positive:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) / 2 >= y )
s.add( y > 0 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

This time, I get 1 as a possible solution. Damn it! Let us exclude it as well:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) / 2 >= y )
s.add( y > 1 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

Given this last script, nothing is printed out, which proves that my inequality is valid as long as the variable is greater than 1.

Given that we have shown that ( 1 + y ) / 2 < y, then maybe ( 1 + y ) < 2 * y is true as well?

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) >= 2 * y )
s.add( y > 1 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

This script returns 1412098654. Twice that value is 2824197308 which is interpreted as a negative number using two’s complement. To avoid the overflow, we can put another condition on the variable:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) >= 2 * y )
s.add( y > 1 )
s.add( y < 2147483647 // 2 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

This time the result holds!

As you can see, it is quite a lot of work, and it is unlikely that anyone would be able to fully test out a non-trivial program in this manner.

Stop spending so much time being trolled by billionaire corporations!

When I was a kid, my parents would turn on the television set, and we would get to watch whatever the state television decided we would watch. It was a push model. Some experts pick the content you need and they deliver it to you. You have little say in the matter. There was one newspaper in my town. We had two or three TV channels. Traditional schools also operate on a push model: the teacher decides whatever you get to learn.

So I got to watch hours and hours of incredibly boring TV shows because there was nothing good on. I was very interested in computers and science, but there was almost nothing relevant in the major news sources. When personal computers became popular, I quickly learned more about them than any journalist had.

In the early days of the Internet, people wrote on posting boards. Some started blogs. To this day, I get much of my online news through an RSS aggregator which collects information from various blogs and news sites. An RSS aggregator simply picks up all of the news items from various sites and lays them out sequentially. You do not like a news source? You unsubscribe. Mailing lists work similarly: you get emails whenever someone has new content.

This model has been described as “pull” oriented. You pick your sources. You sidestep the experts. For someone like myself, it was incredibly liberating. As the pull model grew, many people feared that old-school journalism would die. It also challenged conventional education.

Within this emerging framework, Silicon Valley invented Twitter, Facebook and other networks. At first they worked much like posting boards and blogs. You launched Twitter and you got the tweets of the people you followed. You did not like someone’s tweets? You just unfollowed them. Google even co-opted the RSS reader by creating a fantastic tool called Google Reader, which put you in control.

However, mostly, the industry moved in a different direction. They took control of what you see and read. Brilliant engineers are hard at work making sure that you remain glued to your screen. So they find content that you may like and push it to you. Google closed Google Reader, decimating the RSS reader community. Whereas you could once count on the Google search engine delivering the documents containing the keywords you searched for, you are increasingly facing a curated list of links.

We are back at a push model which is not unlike how things were when I was a kid. The likes of Twitter, Facebook and Google feel like they get to decide what I see.

Richard Startin described a similar feeling in a couple of tweets.

As the user, you are no longer in control. The corporation is wrestling back full control of what you get to watch and read.

TikTok is one such tool where you just open the application and watch whatever they want you to watch. You become some kind of automaton.

Of course, deciding what people watch and read is valuable. It is a great way to make money. But it also becomes a politically valuable power. And so, it now seems unavoidable that multiple countries in the world will regulate these sites to make sure that you watch the right “things”.

Maybe Richard Startin wants to read what programmers have to say, but what if some corporation or some government feels that he needs to be made aware of some important bit of information? What then?

Thankfully some tools are still leaving you in control:

    1. Blogs are still out there. I had 50,000 visitors last month. You can still use RSS readers. You can subscribe to blogs like mine by email.
    2. I have recently discovered the fantastic substack community.
    3. Telegram is pretty decent as a secure news aggregator. My blog has a Telegram channel. Nobody needs to know what you are reading.
    4. Twitter has a hidden feature (Twitter lists) which lets you subscribe to specific individuals and only see content from those individuals.
    5. DuckDuckGo is a fantastic search engine which mostly gives me what I am looking for instead of what it thinks I should find.
    6. Do not underestimate books. Contrary to what you may have heard, you can still order paper books. The great thing about a paper book is that nobody needs to know what you are reading and when. If you like books and programming, you can grab Performance Analysis and Tuning on Modern CPUs for example. I have many recommendations on the sidebar of my blog.
    7. There are fantastic podcasts out there. Spotify has some great stuff, but you can find many others on other platforms. If you like programming, you might want to check corecursive. Joe Rogan is also fantastic. There are many others.

Being in control takes work. It also requires you to sometimes pay real money. But do you really want your attention to be sold and manipulated?

I am not advocating that anyone leave Twitter, Facebook or TikTok, but we should all diversify our information sources. Be conscious that Twitter, Facebook, Google and others are in the business of manipulating your attention for their own interest and the interests of their clients.

Be smarter!

Related: The Social Dilemma.

Science and Technology (October 31st 2021)

  1. Though exoskeletons are exciting and they allow some of us to carry on with physical activities despite handicaps, they appear to require quite a bit of brain power. In effect, though they may help you move, they require a lot of mental effort, which can be distracting.
  2. It is difficult to make people smarter by changing their environment. Socioeconomic background effects on children’s cognitive development and student achievement are likely to be spurious according to new research.
  3. There is a U-shaped association between cardiovascular health and physical activity. A moderate amount of physical activity is really good for you, but you can do too much.
  4. In richer countries where women are treated as equals to men, there are greater differences between boys’ and girls’ aspirations. For example, boys are more likely to be attracted to science compared to girls in a rich country.
  5. Scientists spend ever more time in school for relatively few desirable positions. These scientists are then ever more likely to pursue post-doctoral positions. These longer studies are a significant net loss of their lifetime income.

In C, how do you know if the dynamic allocation succeeded?

In the C programming language, we allocate memory dynamically (on the heap) using the malloc function. You pass malloc a size parameter corresponding to the number of bytes you need. The function returns either a pointer to the allocated memory or the NULL pointer if the memory could not be allocated.

Or so you may think. Let us write a program that allocates 1 terabyte of memory and then tries to write to this newly allocated memory:

#include <stdio.h>
#include <stdlib.h>

int main() {
  size_t large = 1099511627776;
  char *buffer = (char *)malloc(large);
  if (buffer == NULL) {
    printf("error!\n");
    return EXIT_FAILURE;
  }
  printf("Memory allocated\n");
  for (size_t i = 0; i < large; i += 4096) {
    buffer[i] = 0;
  }
  free(buffer);
  return EXIT_SUCCESS;
}

After compiling and running this program, you would expect to either get the message “error!”, in which case the program terminates immediately… or else you might expect to see “Memory allocated” (if 1 terabyte of memory is available), in which case the program should terminate successfully.

Under both macOS/clang and Linux/GCC, I find that the program prints “Memory allocated” and then crashes.

What is happening?

The malloc call does allocate memory, but it will almost surely allocate “virtual memory”. At the margin, it could be that no physical memory is allocated at all. The system merely sets aside the address space for the memory allocation. It is when you try to use the memory that the physical allocation happens.

It may then fail.

It means that checking whether the memory allocation was a success is probably much less useful than you might have imagined when calling malloc.

It also leads to confusion when people ask how much memory a program uses. It is wrong to add up the calls to malloc because you get the virtual memory usage. If you use a system-level tool that reports the memory usage of your processes, you should look at the real memory usage. Here is the memory usage of a couple of processes on my laptop:

process memory (virtual) memory (real)
qemu 3.94 GB 32 MB
safari 3.7 GB 180 MB

Dynamic memory is allocated in pages (e.g., 4 kB) and the real memory usage is often much smaller than the virtual memory. Doing “malloc(x)” is not the same as taking x bytes of physical memory.
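
If you want to check this on your own system, here is a sketch of my own (Linux-specific, not from the original post) that reads the virtual and resident sizes of the current process from /proc/self/statm, which reports both in pages:

#include <cstdio>
#include <unistd.h>

// Reads /proc/self/statm (Linux-specific): the first field is the virtual
// size and the second the resident (real) size, both counted in pages.
int main() {
  long virtual_pages = 0, resident_pages = 0;
  FILE *f = fopen("/proc/self/statm", "r");
  if (f == NULL) { return 1; }
  if (fscanf(f, "%ld %ld", &virtual_pages, &resident_pages) != 2) {
    fclose(f);
    return 1;
  }
  fclose(f);
  long page_size = sysconf(_SC_PAGESIZE);
  printf("virtual: %ld bytes, resident: %ld bytes\n",
         virtual_pages * page_size, resident_pages * page_size);
  return 0;
}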

In general, it is therefore difficult to know whether the allocation succeeded when relying on malloc. You may only find out when you write to and read from the newly allocated memory.

In C++, is empty() faster than comparing the size with zero?

Most C++ programmers rely on “STL” for their data structures. The most popular data structure is probably vector, which is just a dynamic array. The set and the map are other useful ones.

The STL data structures follow a minimalist design. You have relatively few methods. All of them allow you to compute the size of the data structure, that is, how many elements it contains, via the size() method. In recent C++ (since C++11), the size() method must have constant-time complexity for all containers. To put it in clearer terms, the people implementing the containers can never scan the content to find out the number of elements.

These containers also have another method called empty() which simply returns true if the container is… well… empty. Obviously, an equivalent strategy would be to compare the size with zero: mystruct.size() == 0.
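
In code, the two equivalent checks might look as follows (a trivial illustration with a vector):

#include <vector>

// Two equivalent ways to test whether a vector is empty.
bool is_empty_via_empty(const std::vector<int> &v) { return v.empty(); }
bool is_empty_via_size(const std::vector<int> &v) { return v.size() == 0; }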

Determining whether a data structure is empty is conceptually easier than determining its size. Thus, at least in theory, calling empty() could be faster.

Inspecting the assembly output, I find that recent versions of GCC produce nearly identical code for the comparison of the size and the empty call. The exception is the list data structure, where the assembly is slightly different, but not in a manner that should affect performance.

Of course, there are different implementations of C++ and it is possible that other implementations could provide more efficient code when calling empty(). An interesting question is whether effort is needed from the compiler.

Travis Downs wrote a list data structure by hand, but with a size function (count_nodes) that runs in linear time. He then implemented the emptiness check naively on top of it:

struct node {
    struct node *next;
    int payload;
};

int count_nodes(const node* p) {
    int size = 0;
    while (p) {
        p = p->next;
        size++;
    }
    return size;
}

bool is_not_empty(const node* p) {
    return count_nodes(p) > 0;
}

Amazingly, we find that the GCC compiler is able to compile Travis’ is_not_empty C++ function to constant-time code. The compiler inlines the count_nodes function into is_not_empty. Then the compiler figures out that as soon as the loop in count_nodes is entered once, size is going to be greater than zero, so there is no need to keep looping.
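
In effect, the compiled code behaves as if it had been written as follows (a sketch of the resulting logic, not actual compiler output):

bool is_not_empty_optimized(const node* p) {
  // Equivalent logic: a non-null head pointer means at least one node.
  return p != nullptr;
}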

However, the optimisation is not robust. Suppose that I wish instead to return an unsigned type instead of Travis’ int value. The problem with an unsigned type is that I might overflow if the list is very long. With a signed integer, the compiler is allowed to assume that overflows do not happen. It could be difficult for the compiler to tell whether count_nodes() returns 0 or not if the compiler must handle overflows. To handle this potential issue, I can forcefully bound the return value of count_nodes() to be no more than 1000. If I change the code to return a standard size_t type, like so…

#include <cstddef>

struct node {
    struct node *next;
    int payload;
};

size_t count_nodes(const node* p) {
    size_t size = 0;
    while (p) {
        p = p->next;
        size++;
        if(size == 1000) { 
            return 1000; 
        }
    }
    return size;
}

bool is_not_empty(const node* p) {
    return count_nodes(p) > 0;
}

Sadly, GCC is now unable to optimize away the call. Maybe compilers are not yet all-powerful beings?

The lesson is that it is probably wise to get in the habit of calling empty() directly if you care about performance. Though it may not help much with modern STL data structures, in other code it could be different.

Of course, another argument is that the call to empty()  is shorter and cleaner.

Credit: This blog post was motivated by a tweet by Richard Startin.