What is the ‘range’ of a number type?

In programming, we often represent numbers using types that have specific ranges. For example, 64-bit signed integer types can represent all integers between -9223372036854775808 and 9223372036854775807, inclusive. All integers inside this range are valid; all integers outside are “out of range”. It is simple.

What about floating-point numbers? The nuance with floating-point numbers is that they cannot represent all numbers within a continuous range. For example, the real number 1/3 cannot be represented using binary floating-point numbers. So the convention is that given a textual representation, say “1.1e100”, we seek the closest approximation.

Still, are there ranges of numbers that you should not represent using floating-point numbers? That is, are there numbers that you should reject?

It seems that there are two different interpretations:

  • My own interpretation is that floating-point types can represent all numbers from -infinity to infinity, inclusively. It means that ‘infinity’ or 1e9999 are indeed “in range”. For 64-bit IEEE floating-point numbers, this means that numbers smaller than 4.94e-324 but greater than 0 can be represented as 0, and that numbers greater than 1.8e308 should be infinity. To recap, all numbers are always in range.
  • For 64-bit numbers, another interpretation is that only numbers in the ranges 4.94e-324 to 1.8e308 and -1.8e308 to -4.94e-324, together with exactly 0, are valid. Numbers that are too small (less than 4.94e-324 but greater than 0) or numbers that are larger than 1.8e308 are “out of range”. Common implementations of the strtod function or of the C++ equivalent follow this convention.

This matters because the C++ specification for the from_chars functions states that

If the parsed value is not in the range representable by the type of value, value is unmodified and the member ec of the return value is equal to errc::result_out_of_range.

I am not sure programmers have a common understanding of this specification.
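
One way to find out what your own standard library does is to probe it with values just outside the finite range. Here is a minimal sketch; it assumes C++17 and a standard library that implements from_chars for floating-point types (such as recent GCC or MSVC):

#include <charconv>
#include <cstdio>
#include <cstring>
#include <system_error>

// Prints the parsed value and whether the library reported it as out of range.
void probe(const char *s) {
  double value = 0;
  auto result = std::from_chars(s, s + std::strlen(s), value);
  bool out_of_range = (result.ec == std::errc::result_out_of_range);
  std::printf("%-10s -> value = %g, result_out_of_range = %s\n", s, value,
              out_of_range ? "yes" : "no");
}

int main() {
  probe("1.1e100");  // comfortably representable
  probe("1e9999");   // larger than about 1.8e308
  probe("1e-9999");  // nonzero but smaller than about 4.94e-324
}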

How programmers make sure that their software is correct

Our most important goal in writing software is that it be correct. The software must do what the programmer wants it to do. It must meet the needs of the user.

In the business world, double-entry bookkeeping is the idea that transactions are recorded in at least two accounts (debit and credit). One of the advantages of double-entry bookkeeping, compared to a more naive approach, is that it allows for some degree of auditing and error finding. If we compare accounting and software programming, we could say that double-entry bookkeeping and its subsequent auditing are the equivalent of software testing.

For an accountant, converting a naive accounting system into a double-entry system is a difficult task in general. In many cases, one would have to reconstruct it from scratch. In the same manner, it can be difficult to add tests to a large application that has been developed entirely without testing. And that is why testing should be first on your mind when building serious software.

A hurried or novice programmer can quickly write a routine, compile and run it and be satisfied with the result. A cautious or experienced programmer will know not to assume that the routine is correct.

Common software errors can cause problems ranging from a program that abruptly terminates to database corruption. The consequences can be costly: a software bug caused the explosion of an Ariane 5 rocket in 1996 (Dowson, 1997). The error was caused by the conversion of a floating-point number to a signed integer represented with 16 bits: only small integer values could be represented. Since the value could not be represented, an error was detected and the program stopped because such an error was unexpected. The irony is that the function that triggered the error was not required: it had simply been integrated as a subsystem from an earlier model of the Ariane rocket. In 1996 U.S. dollars, the estimated cost of this error was almost $400 million.

The importance of producing correct software has long been understood. The best scientists and engineers have been trying to do this for decades.

There are several common strategies. For example, if we need to do a complex scientific calculation, then we can ask several independent teams to produce an answer. If all the teams arrive at the same answer, we can then conclude that it is correct. Such redundancy is often used to prevent hardware-related faults (Yeh, 1996). Unfortunately, it is not practical to write multiple versions of your software in general.

Many of the early programmers had advanced mathematical training. They hoped that we could prove that a program is correct. Setting aside hardware failures, we could then be certain that we would not encounter any errors. And indeed, today we have sophisticated software that allows us to sometimes prove that a program is correct.

Let us consider an example of formal verification to illustrate our point. We can use the z3 library from Python (De Moura and Bjørner, 2008). If you are not a Python user, don’t worry: you don’t have to be to follow the example. We can install the necessary library with the command pip install z3-solver or the equivalent. Suppose we want to be sure that the inequality ( 1 + y ) / 2 < y holds for all 32-bit integers. We can use the following script:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) / 2 >= y )
if(s.check() == z3.sat):
    model = s.model()
    print(model)

In this example we construct a 32-bit word (BitVec) to represent our example. By default, the z3 library interprets the values that can be represented by such a variable as ranging from -2147483648 to 2147483647 (from \(-2^{31}\) to \(2^{31}-1\) inclusive). We enter the inequality opposite to the one we wish to show (( 1 + y ) / 2 >= y). If z3 does not find a counterexample, then we will know that the inequality ( 1 + y ) / 2 < y holds.

When running the script, Python displays the integer value 2863038463 which indicates that z3 has found a counterexample. The z3 library always reports the value as a non-negative integer and it is up to us to interpret it correctly. The number 2147483648 becomes -2147483648, the number 2147483649 becomes -2147483647 and so on. This representation is often called two’s complement. Thus, the number 2863038463 is in fact interpreted as a negative number. Its exact value is not important: what matters is that our inequality (( 1 + y ) / 2 < y) is incorrect when the variable is negative. We can check this by giving the variable the value -1: we then get 0 < -1, which is false. When the variable takes the value 0, the inequality is also false (0 < 0). We can also check that the inequality is false when the variable takes the value 1. So let us add as a condition that the variable is greater than 1 (s.add( y > 1 )):

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) / 2 >= y )
s.add( y > 1 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

Since the latter script does not display anything on the screen when it is executed, we can conclude that the inequality is satisfied as long as the variable is greater than 1.

Since we have shown that the inequality ( 1 + y ) / 2 < y is true, perhaps the inequality ( 1 + y ) < 2 * y is true too? Let’s try it:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) >= 2 * y )
s.add( y > 1 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

This script will display 1412098654. Twice that value is 2824197308, which is interpreted by z3 as a negative value. To avoid this problem, let us add a new condition so that twice the variable can still be interpreted as a positive value:


import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) >= 2 * y )
s.add( y > 1 )
s.add( y < 2147483647 // 2 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

This time the result is verified. As you can see, such a formal approach requires a lot of work, even in relatively simple cases. It may have been possible to be more optimistic in the early days of computer science, but by the 1970s, computer scientists like Dijkstra were expressing doubts:

we see automatic program verifiers verifying toy programs and one observes the honest expectation that with faster machines with lots of concurrent processing, the life-size problems will come within reach as well. But, honest as these expectations may be, are they justified? I sometimes wonder… (Dijkstra, 1975)

It is impractical to apply such a mathematical method on a large scale. Errors can take many forms, and not all of these errors can be concisely presented in mathematical form. Even when it is possible, even when we can accurately represent the problem in a mathematical form, there is no reason to believe that a tool like z3 will always be able to find a solution: when problems become difficult, computational times can become very long. An empirical approach is more appropriate in general.

Over time, programmers have come to understand the need to test their software. It is not always necessary to test everything: a prototype or an example can often be provided without further validation. However, any software designed in a professional context and having to fulfill an important function should be at least partially tested. Testing allows us to reduce the probability that we will have to face a disastrous situation.

There are generally two main categories of tests.

  • There are unit tests. These are designed to test a particular component of a software program. For example, a unit test can be performed on a single function. Most often, unit tests are automated: the programmer can execute them by pressing a button or typing a command. Unit tests often avoid the acquisition of valuable resources, such as creating large files on a disk or making network connections. Unit testing does not usually involve reconfiguring the operating system.
  • Integration tests aim to validate a complete application. They often require access to networks and access to sometimes large amounts of data. Integration tests sometimes require manual intervention and specific knowledge of the application. Integration testing may involve reconfiguring the operating system and installing software. They can also be automated, at least in part. Most often, integration tests are based on unit tests that serve as a foundation.

Unit tests are often part of a continuous integration process (Kaiser et al., 1989). Continuous integration often automatically performs specific tasks including unit testing, backups, applying cryptographic signatures, and so on. Continuous integration can be done at regular intervals, or whenever a change is made to the code.

Unit tests are used to structure and guide software development. Tests can be written before the code itself, in which case we speak of test-driven development. Often, tests are written after developing the functions. Tests can be written by programmers other than those who developed the functions. It is sometimes easier for independent developers to provide tests that are capable of uncovering errors because they do not share the same assumptions.

It is possible to integrate tests into functions or an application. For example, an application may run a few tests when it starts. In such a case, the tests will be part of the distributed code. However, it is more common not to publish unit tests. They are a component reserved for programmers and they do not affect the functioning of the application. In particular, they do not pose a security risk and they do not affect the performance of the application.

Experienced programmers often consider tests to be as important as the original code. It is therefore not uncommon to spend half of one’s time on writing tests. The net effect is to substantially reduce the initial speed of writing computer code. Yet this apparent loss of time often saves time in the long run: setting up tests is an investment. Software that is not well tested is often more difficult to update. The presence of tests allows us to make changes or extensions with less uncertainty.

Tests should be readable, simple and they should run quickly. They often use little memory.

Unfortunately, it is difficult to define exactly how good tests are. There are several statistical measures. For example, we can count the lines of code that execute during tests. We then talk about test coverage. A coverage of 100% implies that all lines of code are exercised by the tests. In practice, this coverage measure can be a poor indication of test quality.

Consider this example:

package main

import (
    "testing"
)


func Average(x, y uint16) uint16 {
   return (x + y)/2
}

func TestAverage(t *testing.T) {
    if Average(2,4) != 3 {
        t.Error(Average(2,4))
    }
}

In the Go language, we can run tests with the command go test. We have an Average function with a corresponding test. In our example, the test will run successfully. The coverage is 100%.

Unfortunately, the Average function may not be as correct as we would expect. If we pass the integers 40000 and 40000 as parameters, we would expect the average value of 40000 to be returned. But the integer 40000 added to the integer 40000 cannot be represented with a 16-bit integer (uint16): the result will instead be (40000+40000)%65536 = 14464. So the function will return 7232 which may be surprising. The following test will fail:


func TestAverage(t *testing.T) {
    if Average(40000,40000) != 40000 {
        t.Error(Average(40000,40000))
    }
}

When possible and fast, we can try to test the code more exhaustively, as in this example where we check all possible pairs of values:

package main

import (
    "testing"
)


func Average(x, y uint16) uint16 {
   if y > x {
     return (y - x)/2 + x
   } else {
     return (x - y)/2 + y
   }
}

func TestAverage(t *testing.T) {
  for x := 0; x < 65536; x++ {
    for y := 0; y < 65536; y++ {
      m := int(Average(uint16(x),uint16(y)))
      if x < y {
        if m < x || m > y {
          t.Error("error ", x, " ", y)
        }           
      } else {
        if m < y || m > x {
          t.Error("error ", x, " ", y)
        }  
      }
    }
  } 
}

In practice, it is rare that we can do exhaustive tests. We can instead use pseudo-random tests. For example, we can generate pseudo-random numbers and use them as parameters. In the case of random tests, it is important to keep them deterministic: each time the test runs, the same values are tested. This can be achieved by providing a fixed seed to the random number generator as in this example:

package main

import (
    "testing"   
        "math/rand"
)


func Average(x, y uint16) uint16 {
   if y > x {
     return (y - x)/2 + x
   } else {
     return (x - y)/2 + y
   }
}

func TestAverage(t *testing.T) {
  rand.Seed(1234)
  for test := 0; test < 1000; test++ {
    x := rand.Intn(65536)
    y := rand.Intn(65536)
    m := int(Average(uint16(x),uint16(y)))
    if x < y {
      if m < x || m > y {
        t.Error("error ", x, " ", y)
      }           
    } else {
      if m < y || m > x {
        t.Error("error ", x, " ", y)
      }  
    }
  } 
}

Tests based on random exploration are part of a strategy often called fuzzing (Miller et al., 1990).

We generally distinguish two types of tests. Positive tests aim at verifying that a function or component behaves in an agreed way. Thus, the first test of our Average function was a positive test. Negative tests verify that the software behaves correctly even in unexpected situations. We can produce negative tests by providing our functions with random data (fuzzing). Our second example can be considered a negative test if the programmer expected small integer values.

Tests should fail when the behaviour of the code is modified (Budd et al., 1978). On this basis, we can also develop more sophisticated quality measures by making random changes to the code (mutations) and ensuring that such changes often cause tests to fail.

Some programmers choose to generate tests automatically from the code. In such a case, a component is tested and the result is captured. For example, in our example of calculating the average, we could have captured the fact that Average(40000,40000) has the value 7232. If a subsequent change occurs that changes the result of the operation, the test will fail. Such an approach saves time since the tests are generated automatically. We can quickly and effortlessly achieve 100% code coverage. On the other hand, such tests can be misleading. In particular, it is possible to capture incorrect behaviour. Furthermore, the objective when writing tests is not so much their number as their quality. The presence of several tests that do not contribute to validate the essential functions of our software can even become harmful. Irrelevant tests can waste programmers’ time in subsequent revisions.

Finally, we review the benefits of testing: tests help us organize our work, they are a measure of quality, they help us document the code, they avoid regression, they help debugging and they can produce more efficient code.

Organization

Designing sophisticated software can take weeks or months of work. Most often, the work will be broken down into separate units. It can be difficult, until you have the final product, to judge the outcome. Writing tests as we develop the software helps to organize the work. For example, a given component can be considered complete when it is written and tested. Without the test writing process, it is more difficult to estimate the progress of a project since an untested component may still be far from being completed.

Quality

Tests are also used to show the care that the programmer has put into his work. They also make it possible to quickly evaluate the care taken with the various functions and components of a software program: the presence of carefully composed tests can be an indication that the corresponding code is reliable. The absence of tests for certain functions can serve as a warning.

Some programming languages are quite strict and have a compilation phase that validates the code. Other programming languages (Python, JavaScript) leave more freedom to the programmer. Some programmers consider that tests can help to overcome the limitations of less strict programming languages by imposing on the programmer a rigour that the language does not require.

Documentation

Software programming should generally be accompanied by clear and complete documentation. In practice, the documentation is often partial, imprecise, erroneous or even non-existent. Tests are therefore often the only technical specification available. Reading tests allows programmers to adjust their expectations of software components and functions. Unlike documentation, tests are usually up-to-date, if they are run regularly, and they are accurate to the extent that they are written in a programming language. Tests can therefore provide good examples of how the code is used.
Even if we want to write high-quality documentation, tests can also play an important role. To illustrate computer code, examples are often used. Each example can be turned into a test. So we can make sure that the examples included in the documentation are reliable. When the code changes, and the examples need to be modified, a procedure to test our examples will remind us to update our documentation. In this way, we avoid the frustrating experience of readers of our documentation finding examples that are no longer functional.

Regression

Programmers regularly fix flaws in their software. It often happens that the same problem occurs again. The same problem may come back for various reasons: sometimes the original problem has not been completely fixed. Sometimes another change elsewhere in the code causes the error to return. Sometimes the addition of a new feature or software optimization causes a bug to return, or a new bug to be added. When software acquires a new flaw in this manner, we call it a regression. To prevent such regressions, it is important to accompany every bug fix or new feature with a corresponding test. In this way, we can quickly become aware of regressions by running the tests. Ideally, the regression is identified while the code is being modified, before it ever reaches the users. In order to convert a bug into a simple and effective test, it is useful to reduce it to its simplest form. For example, in our previous example with Average(40000,40000), we can add the detected error as an additional test:

package main

import (
    "testing
)


func Average(x, y uint16) uint16 {
   if y > x {
     return (y - x)/2 + x
   } else {
     return (x - y)/2 + y
   }
}

func TestAverage(t *testing.T) {
   if Average(2,4) != 3 {
     t.Error("error 1")
   } 
   if Average(40000,40000) != 40000 {
     t.Error("error 2")
   }           
}

Bug fixing

In practice, the presence of an extensive test suite makes it possible to identify and correct bugs more quickly. This is because testing reduces the extent of errors and provides the programmer with several guarantees. To some extent, the time spent writing tests saves time when errors are found while reducing the number of errors.
Furthermore, an effective strategy to identify and correct a bug involves writing new tests. It can be more efficient in the long run than other debugging strategies such as stepping through the code. Indeed, after your debugging session is completed, you are left with new unit tests in addition to a corrected bug.

Performance

The primary function of tests is to verify that functions and components produce the expected results. However, programmers are increasingly using tests to measure the performance of components. For example, the execution speed of a function, the size of the executable or the memory usage can be measured. It is then possible to detect a loss of performance following a modification of the code. You can compare the performance of your code against a reference code and check for differences using statistical tests.

Conclusion

All computer systems have flaws. Hardware can fail at any time. And even when the hardware is reliable, it is almost impossible for a programmer to predict all the conditions under which the software will be used. No matter who you are, and no matter how hard you work, your software will not be perfect. Nevertheless, you should at least try to write code that is generally correct: it most often meets the expectations of users.
It is possible to write correct code without writing tests. Nevertheless, the benefits of a test suite are tangible in difficult or large-scale projects. Many experienced programmers will refuse to use a software component that has been built without tests.
The habit of writing tests probably makes you a better programmer. Psychologically, you are more aware of your human limitations if you write tests. When you interact with other programmers and with users, you may be better able to take their feedback into account if you have a test suite.

Suggested reading

  • James Whittaker, Jason Arbon, Jeff Carollo, How Google Tests Software, Addison-Wesley Professional; 1st edition (March 23 2012)
  • Lisa Crispin, Janet Gregory, Agile Testing: A Practical Guide for Testers and Agile Teams, Addison-Wesley Professional; 1st edition (Dec 30 2008)

Credit

The following Twitter users contributed ideas: @AntoineGrodin, @dfaranha, @Chuckula1, @EddyEkofo, @interstar, @Danlark1, @blattnerma, @ThuggyPinch, @ecopatz, @rsms, @pdimov2, @edefazio, @punkeel, @metheoryt, @LoCtrl, @richardstartin, @metala, @franck_guillaud, @__Achille__, @a_n__o_n, @atorstling, @tapoueh, @JFSmigielski, @DinisCruz, @jsonvmiller, @nickblack, @ChrisNahr, @ennveearr1, @_vkaku, @kasparthommen, @mathjock, @feO2x, @pshufb, @KishoreBytes, @kspinka, @klinovp, @jukujala, @JaumeTeixi

Science and Technology links (December 19th 2021)

  1. Becoming a physician increases the use of antidepressants, opioids, anxiolytics, and sedatives, especially for female physicians.
  2. When trying to reproduce results in cancer research, independent researchers find that the benefits are typically grossly exaggerated.
  3. A large planet has been found orbiting two suns. It is orbiting from very far away.
  4. NASA sent a probe in the atmosphere of the sun.
  5. As you age, you accumulate old (senescent) cells. These cells are dysfunctional and in sufficient quantity might cause harm to your body such as chronic inflammation. Researchers found a plant-derived compound (procyanidin C1 or PCC1) which can not only limit the harm due to senescent cells but also eliminate some of them. It is so beneficial that treated mice live longer.

Science and Technology links (December 4th 2021)

  1. It used to be that all the exciting new processors came from Intel and AMD, and they were meant for your PC. The mobile revolution changed that: it led to the production of fantastic processors that use little energy. That technology is now moving back into laptops and servers. The leading supplier of servers is probably Amazon: if you want to run servers for your business, your first choice is often Amazon and its cloud computing infrastructure. And Amazon now makes its own processors. Initially, they made low-cost chips that were secondary to Intel’s and AMD’s powerful processors. Amazon has now released its latest processor, the Graviton3. It is a much wider and deeper chip:

    The N1 core used in the Graviton2 chip had an instruction fetch unit that was 4 to 8 instructions wide and 4 wide instruction decoder that fed into an 8 wide issue unit that included two SIMD units, two load/store units, three arithmetic units, and a branch unit. With the Perseus N2 core used in the Graviton3, there is an 8 wide fetch unit that feeds into a 5 wide to 8 wide decode unit, which in turn feeds into a 15 wide issue unit, which is basically twice as wide as that on the N1 core used in the Graviton2. The vector engines have twice as much width (and support for BFloat16 mixed precision operations) and the load/store, arithmetic, and branch units are all doubled up, too.

    It feels like we are witnessing a revolution in processors. Not only are corporations like Apple and Amazon making their own chips… they appear to match Intel in raw power.

  2. Despite generous Danish social policies, family influence on important child outcomes in Denmark is about as strong as it is in the United States.
  3. Clearing senescent (old) cells appears to also clear out diabetes (in mice).
  4. South Korea is building a self-sustaining floating city.
  5. A supplement based on alpha-ketoglutarate appears to have strong rejuvenation effects (in human beings).
  6. Metformin is a commonly used drug which is believed to have some anti-aging properties. However some research suggested that metformin could blunt the beneficial effects of exercise. A recent clinical trial indicates that these fears are unwarranted.
  7. Biological aging (senescence) could have evolved when animals first colonized the land, from the sea.
  8. Your favorite colour reveals nothing about your personality.

Can you safely parse a double when you need a float?

In C as well as many other programming languages, we have 32-bit and 64-bit floating-point numbers. They are often referred to as float and double. Most systems today follow the IEEE 754 standard which means that you can get consistent results across programming languages and operating systems. Hence, it does not matter very much if you implement your software in C++ under Linux whereas someone else implements it in C# under Windows: if you both have recent systems, you can expect identical numerical outcomes when doing basic arithmetic and square-root operations.

When you are reading these numbers from a string, there are distinct functions. In C, you have strtof and strtod. One parses a string to a float and the other function parses it to a double.

At a glance, it seems redundant. Why not just parse your string to a double value and cast it back to a float, if needed?

Of course, that would be slightly more expensive. But, importantly, it also gives incorrect results in the sense that it is not equivalent to parsing directly to a float. In other words, these functions are not equivalent:

float parse1(const char * c) {
    char * end;
    return strtod(c, &end);
}

float parse2(const char * c) {
    char * end;
    return strtof(c, &end);
}

It is intuitive that if I first parse the number as a float and then cast it back to a double, I will have lost information in the process. Indeed, if I start with the string “3.14159265358979323846264338327950”, parsed as a float (32-bit), I get 3.1415927410125732421875. If I parse it as a double (64-bit), I get the more accurate result 3.141592653589793115997963468544185161590576171875. The difference is not so small, about 9e-08.

In the other direction, first parsing to a double and then casting back to a float, I can also lose information, although only a little bit, due to the double rounding effect. To illustrate, suppose that I have the number 1.48 and that I round it in one go to the nearest integer: I get 1. If I round it first to a single decimal (1.5) and then to the nearest integer, I might get 2 using the usual rounding conventions (either round up, or round to even). Rounding twice is lossy and not equivalent to a single rounding operation. Importantly, you lose a bit of precision in the sense that you may not get back the closest value.

With floating-point numbers, I get this effect with the string “0.004221370676532388” (for example). You probably cannot tell unless you are a machine, but parsing directly to a float is 2e-7 % more accurate.
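
If you want to check this on your own system, a small test program along the following lines compares the two approaches on that string; it simply reuses the two functions from above and reports whether they agree (the outcome depends on your C library, but on systems exhibiting the double rounding described above, the two values differ):

#include <cstdio>
#include <cstdlib>

// Parse to double first, then narrow to float: two rounding steps.
float parse1(const char *c) {
  char *end;
  return (float)std::strtod(c, &end);
}

// Parse directly to float: a single rounding step.
float parse2(const char *c) {
  char *end;
  return std::strtof(c, &end);
}

int main() {
  const char *s = "0.004221370676532388";
  float a = parse1(s);
  float b = parse2(s);
  std::printf("%.9g vs %.9g: %s\n", a, b, (a == b) ? "same" : "different");
}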

In most applications, such a small loss of accuracy is not relevant. However, if you ever find yourself having to compare results with another program, you may get inconsistent results. It can make debugging more difficult.

Further reading: Floating-Point Determinism (part of a series)

Science and Technology links (November 28th 2021)

  1. Government-funded research is getting more political and less diverse:

    The frequency of documents containing highly politicized terms has been increasing consistently over the last three decades. The most politicized field is Education & Human Resources. The least are Mathematical & Physical Sciences and Computer & Information Science & Engineering, although even they are significantly more politicized than any field was in 1990. At the same time, abstracts have been becoming more similar to each other over time. Taken together, the results imply that there has been a politicization of scientific funding in the US in recent years and a decrease in the diversity of ideas supported.

  2. Parabiosis is the process of surgically joining the blood circulation of two animals. Zhang et al. proceeded with parabiosis between young and old mice, followed by a detachment period. The old mice lived longer than control mice and they appear to have been rejuvenated by the parabiosis.
  3. A single injection to enable paralyzed mice to walk again.
  4. The Arctic has been warming for much longer than we thought.
  5. Dogs fed only once daily are healthier.
  6. India’s fertility rate is now below replacement rate.

Are tenured professors more likely to speak freely?

University professors often have robust job security after a time: they receive tenure. It means that they usually do not have to worry about applying for a new job after a few years.

Tenure is not available in all countries. Countries like Australia reassess positions every few years.

So why does it exist where it does?

One of the justifications for tenure is that professors who have tenure can speak more freely. Thus, in theory, they can be critical of government or corporate policies.

Do they? What would “speaking freely” entail?

What about denouncing a colleague who commits blatant fraud? On this front, the evidence is not great. Diederik Stapel published well over 100 research papers in prestigious journals. He was fired when it was determined that he was making up all of his research data. It took outsiders (students) to report him. Harvard professor Marc Hauser published over 200 papers in the best journals, making up data as he went. It took naive students to report the fraud. We find too many examples of overt fraud in science, and rarely do we find that the close colleagues, the ones who should first spot the problems, report them. Brian Wansink was another famous professor who published countless papers based on fraudulent practices. It took media pressure as well as an investigation led by a non-academic book publisher to take him down. I could go on. Professors rarely denounce other professors.

What about teaching controversial courses or engaging in disliked research? Ceci et al. found little evidence:

The findings from the present survey suggest that tenure itself does not result in faculty members routinely teaching courses that their senior colleagues disfavor, nor in their conducting research that their senior colleagues dislike.

Whenever professors tell me that they feel free to hold controversial ideas thanks to their tenure… I ask about the positions that they took recently that would endanger their job security if not for tenure. I might also ask about the positions that they might take that would endanger their chances of getting a research grant.

I should make it clear that being an advocate for transgender rights or climate change action is not controversial in 2021. I am looking for examples where a lone professor goes against the majority and would otherwise lose their job.

If not themselves, then I might ask about other professors that did so. And we will find some of them. In Canada, we have Jordan Peterson at the University of Toronto, for example. Yet I do not consider Peterson particularly controversial. In fact, at the core, Peterson is merely a politically conservative professor. We used to have a healthy share of politically conservative professors. It is only in the academy that it is highly troublesome to be politically conservative. A truly controversial professor was Denis Rancourt, who thought it important to question the foundation of the academy. Sadly, Rancourt was fired (tenure did not protect him). I might throw in with Rancourt people like Bret Weinstein, Kathleen Stock and so forth.

So while tenure might protect professors  if they want to hold controversial ideas… professors are trained to seek prestige: they avoid serious controversy when they can. Holding controversial views puts one’s reputation in danger. The overwhelming majority will never publicly hold controversial views. They are happy to go along with whatever is the policy of the day, whatever is fashionable.

It does not follow that tenure is useless. It appears that tenured professors are often more productive. Indeed, it is possible that if you do not have to worry about where your next job will be, you can concentrate more on your teaching and your research.

More content: Gad Saad and Pat Kambhampati are controversial tenured professors in Montreal. They are not the average professor.

Converting integers to fixed-digit representations quickly

It is tricky to convert integers into strings because the number of characters can vary with the magnitude of the integer. The integer ‘1’ requires a single character whereas the integer ‘100’ requires three characters. So a general solution may need a hard-to-predict branch.

Let us simplify the problem.

Imagine that you want to serialize integers to fixed-digit strings. Thus you may want to convert 16-digit integers (integers smaller than 10000000000000000) to exactly 16 digits, including leading zeros if needed. In this manner, it is easy to write code that contains only trivial branches.

The simplest approach could be a character-by-character routine where I use the fact that the character ‘0’ in ASCII is just 0x30 (in hexadecimal):

void to_string_backlinear(uint64_t x, char *out) {
    for(int z = 0; z < 16; z++) {
        out[15-z] = (x % 10) + 0x30;
        x /= 10;
    }
}
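
As a quick sanity check, a minimal driver along these lines shows the expected fixed-width output, including the leading zeros (the output buffer needs 16 bytes, plus a terminating zero if you want to print it as a C string):

#include <cstdint>
#include <cstdio>

void to_string_backlinear(uint64_t x, char *out) {
    for(int z = 0; z < 16; z++) {
        out[15-z] = (x % 10) + 0x30;
        x /= 10;
    }
}

int main() {
  char buffer[17];
  buffer[16] = '\0'; // make the result printable as a C string
  to_string_backlinear(1234, buffer);
  std::printf("%s\n", buffer); // prints 0000000000001234
}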

It is somewhat strange to write the characters backward, starting from the least significant digit. You can try to go forward, but it is a bit trickier. Here is one ugly approach that is probably not efficient:

void to_string_linear(uint64_t x, char *out) {
  out[0] = x / 1000000000000000 + 0x30;
  x %= 1000000000000000;
  out[1] = x / 100000000000000 + 0x30;
  x %= 100000000000000;
  out[2] = x / 10000000000000 + 0x30;
  x %= 10000000000000;
  out[3] = x / 1000000000000 + 0x30;
  x %= 1000000000000;
  out[4] = x / 100000000000 + 0x30;
  x %= 100000000000;
  out[5] = x / 10000000000 + 0x30;
  x %= 10000000000;
  out[6] = x / 1000000000 + 0x30;
  x %= 1000000000;
  out[7] = x / 100000000 + 0x30;
  x %= 100000000;
  out[8] = x / 10000000 + 0x30;
  x %= 10000000;
  out[9] = x / 1000000 + 0x30;
  x %= 1000000;
  out[10] = x / 100000 + 0x30;
  x %= 100000;
  out[11] = x / 10000 + 0x30;
  x %= 10000;
  out[12] = x / 1000 + 0x30;
  x %= 1000;
  out[13] = x / 100 + 0x30;
  x %= 100;
  out[14] = x / 10 + 0x30;
  x %= 10;
  out[15] = x + 0x30;
}

Instead we could try to do it in a tree-like manner to reduce data dependency during the computation and hope for more core-level parallelism. We first divide the integer by 100000000 to compute the first and last 8 digits separately, and so forth. It should drastically decrease data dependencies:

void to_string_tree(uint64_t x, char *out) {
  uint64_t top = x / 100000000;
  uint64_t bottom = x % 100000000;      
  uint64_t toptop = top / 10000;
  uint64_t topbottom = top % 10000;
  uint64_t bottomtop = bottom / 10000;
  uint64_t bottombottom = bottom % 10000;
  uint64_t toptoptop = toptop / 100;
  uint64_t toptopbottom = toptop % 100;
  uint64_t topbottomtop = topbottom / 100;
  uint64_t topbottombottom = topbottom % 100;
  uint64_t bottomtoptop = bottomtop / 100;
  uint64_t bottomtopbottom = bottomtop % 100;
  uint64_t bottombottomtop = bottombottom / 100;
  uint64_t bottombottombottom = bottombottom % 100;
  //
  out[0] = toptoptop / 10 + 0x30;
  out[1] = toptoptop % 10 + 0x30;
  out[2] = toptopbottom / 10 + 0x30;
  out[3] = toptopbottom % 10 + 0x30;
  out[4] = topbottomtop / 10 + 0x30;
  out[5] = topbottomtop % 10 + 0x30;
  out[6] = topbottombottom / 10 + 0x30;
  out[7] = topbottombottom % 10 + 0x30;
  out[8] = bottomtoptop / 10 + 0x30;
  out[9] = bottomtoptop % 10 + 0x30;
  out[10] = bottomtopbottom / 10 + 0x30;
  out[11] = bottomtopbottom % 10 + 0x30;
  out[12] = bottombottomtop / 10 + 0x30;
  out[13] = bottombottomtop % 10 + 0x30;
  out[14] = bottombottombottom / 10 + 0x30;
  out[15] = bottombottombottom % 10 + 0x30;
}

We could also try to accelerate the computation with table lookups. We want to keep the tables small. We can process the tail end of the tree-based technique by looking up the conversion of integers smaller than 100: the integer 12 becomes the 2-character string ’12’ and so forth (my code could be nicer):

void to_string_tree_table(uint64_t x, char *out) {
  static const char table[200] = {
      0x30, 0x30, 0x30, 0x31, 0x30, 0x32, 0x30, 0x33, 0x30, 0x34, 0x30, 0x35,
      0x30, 0x36, 0x30, 0x37, 0x30, 0x38, 0x30, 0x39, 0x31, 0x30, 0x31, 0x31,
      0x31, 0x32, 0x31, 0x33, 0x31, 0x34, 0x31, 0x35, 0x31, 0x36, 0x31, 0x37,
      0x31, 0x38, 0x31, 0x39, 0x32, 0x30, 0x32, 0x31, 0x32, 0x32, 0x32, 0x33,
      0x32, 0x34, 0x32, 0x35, 0x32, 0x36, 0x32, 0x37, 0x32, 0x38, 0x32, 0x39,
      0x33, 0x30, 0x33, 0x31, 0x33, 0x32, 0x33, 0x33, 0x33, 0x34, 0x33, 0x35,
      0x33, 0x36, 0x33, 0x37, 0x33, 0x38, 0x33, 0x39, 0x34, 0x30, 0x34, 0x31,
      0x34, 0x32, 0x34, 0x33, 0x34, 0x34, 0x34, 0x35, 0x34, 0x36, 0x34, 0x37,
      0x34, 0x38, 0x34, 0x39, 0x35, 0x30, 0x35, 0x31, 0x35, 0x32, 0x35, 0x33,
      0x35, 0x34, 0x35, 0x35, 0x35, 0x36, 0x35, 0x37, 0x35, 0x38, 0x35, 0x39,
      0x36, 0x30, 0x36, 0x31, 0x36, 0x32, 0x36, 0x33, 0x36, 0x34, 0x36, 0x35,
      0x36, 0x36, 0x36, 0x37, 0x36, 0x38, 0x36, 0x39, 0x37, 0x30, 0x37, 0x31,
      0x37, 0x32, 0x37, 0x33, 0x37, 0x34, 0x37, 0x35, 0x37, 0x36, 0x37, 0x37,
      0x37, 0x38, 0x37, 0x39, 0x38, 0x30, 0x38, 0x31, 0x38, 0x32, 0x38, 0x33,
      0x38, 0x34, 0x38, 0x35, 0x38, 0x36, 0x38, 0x37, 0x38, 0x38, 0x38, 0x39,
      0x39, 0x30, 0x39, 0x31, 0x39, 0x32, 0x39, 0x33, 0x39, 0x34, 0x39, 0x35,
      0x39, 0x36, 0x39, 0x37, 0x39, 0x38, 0x39, 0x39,
  };
  uint64_t top = x / 100000000;
  uint64_t bottom = x % 100000000;
  uint64_t toptop = top / 10000;
  uint64_t topbottom = top % 10000;
  uint64_t bottomtop = bottom / 10000;
  uint64_t bottombottom = bottom % 10000;
  uint64_t toptoptop = toptop / 100;
  uint64_t toptopbottom = toptop % 100;
  uint64_t topbottomtop = topbottom / 100;
  uint64_t topbottombottom = topbottom % 100;
  uint64_t bottomtoptop = bottomtop / 100;
  uint64_t bottomtopbottom = bottomtop % 100;
  uint64_t bottombottomtop = bottombottom / 100;
  uint64_t bottombottombottom = bottombottom % 100;
  //
  memcpy(out, &table[2 * toptoptop], 2);
  memcpy(out + 2, &table[2 * toptopbottom], 2);
  memcpy(out + 4, &table[2 * topbottomtop], 2);
  memcpy(out + 6, &table[2 * topbottombottom], 2);
  memcpy(out + 8, &table[2 * bottomtoptop], 2);
  memcpy(out + 10, &table[2 * bottomtopbottom], 2);
  memcpy(out + 12, &table[2 * bottombottomtop], 2);
  memcpy(out + 14, &table[2 * bottombottombottom], 2);
}

You can extend this trick if you are willing to include a 40kB table in your code:

void to_string_tree_bigtable(uint64_t x, char *out) {
  #include "bigtable.h"

  uint64_t top = x / 100000000;
  uint64_t bottom = x % 100000000;
  //
  uint64_t toptop = top / 10000;
  uint64_t topbottom = top % 10000;
  uint64_t bottomtop = bottom / 10000;
  uint64_t bottombottom = bottom % 10000;

  memcpy(out, &bigtable[4 * toptop], 4);
  memcpy(out + 4, &bigtable[4 * topbottom], 4);
  memcpy(out + 8, &bigtable[4 * bottomtop], 4);
  memcpy(out + 12, &bigtable[4 * bottombottom], 4);
}

An intermediate solution with a 3-character table would only require a 3 kB table. I also consider Muła’s SIMD-based approach, though I refer you to his article for details. Effectively, Muła uses fancy Intel-specific instructions to get the job done.

If you cannot use SIMD instructions, you can use something similar called SWAR. Effectively, you pack several integer values inside a wide integer (64 bits) and you try to somehow save instructions. Luckily, Khuong has a solution for us:

// credit: Paul Khuong
uint64_t encode_ten_thousands(uint64_t hi, uint64_t lo) {
  uint64_t merged = hi | (lo << 32);
  uint64_t top = ((merged * 10486ULL) >> 20) & ((0x7FULL << 32) | 0x7FULL);
  uint64_t bot = merged - 100ULL * top;
  uint64_t hundreds;
  uint64_t tens;
  hundreds = (bot << 16) + top;
  tens = (hundreds * 103ULL) >> 10;
  tens &= (0xFULL << 48) | (0xFULL << 32) | (0xFULL << 16) | 0xFULL;
  tens += (hundreds - 10ULL * tens) << 8;

  return tens;
}

void to_string_khuong(uint64_t x, char *out) {
  uint64_t top = x / 100000000;
  uint64_t bottom = x % 100000000;
  uint64_t first =
      0x3030303030303030 + encode_ten_thousands(top / 10000, top % 10000);
  memcpy(out, &first, sizeof(first));
  uint64_t second =
      0x3030303030303030 + encode_ten_thousands(bottom / 10000, bottom % 10000);
  memcpy(out + 8, &second, sizeof(second));
}

I refer you to Khuong’s blog post for a description.

I wrote a small benchmark in C++ which measures the time per integer. Remember that every call to my functions produces 16 digits, exactly.

function          Apple M1, LLVM 12    AMD Zen 2, GCC 10
linear            14 ns                25 ns
backward linear   7.7 ns               18 ns
tree              6.9 ns               15 ns
Khuong            3.3 ns               8.0 ns
small table       3.7 ns               7.1 ns
SIMD              not available        4.8 ns
big table         1.5 ns               2.9 ns

On both processors, the crazy big-table (40 kB) approach is about twice as fast as the version with a small table. Though a big table can be justified in some instances, my feeling is that only in niche applications would such a big table be acceptable for such a narrow task. Even the smaller 3 kB table seems like overkill given the good results we get with the small table.

The SIMD approach has a rather minimal gain compared to the version with a small table (merely 25%).

At a glance, the small table wins on practical ground. It is small, simple and portable.

 

Science and Technology links (November 13th 2021)

  1. Pacific rougheye rockfish can live hundreds of years while other rockfish barely live past ten years.
  2. Female condors can reproduce without males. The phenomenon is known as parthenogenesis and it occurs in birds such as chickens. It does not happen in mammals naturally as far as we know.
  3. Chimpanzees can thrive in virtual reality.
  4. People have a good opinion of their own environmental impact and they are biased against the behaviour of others. In other words, we are environmental hypocrites.
  5. A majority of minimum wage workers are age 15 to 24 in Canada.
  6. Though we tend to view American businesses as uniquely innovative, Nordic countries in Europe may be just as innovative, albeit with a small population. This could be explained in part by the fact that Nordic countries have advantageous tax regimes for businesses.
  7. Smartphone vendors like to scan our content for illegal pictures. From databases of pictures, they seek to detect pictures that you own that might be visually similar. Unfortunately, it appears that the current breed of algorithms are easily fooled.
  8. Eating several eggs a day might improve your cardiovascular health.
  9. Some mice can regenerate their kidneys without scars.
  10. About half of all patents have to do with software. They are filed from a few major tech centres, unlike other patents which have more varied origins.

Checking simple equations or inequalities with z3

When programming, you sometimes need to make sure that a given formula is correct. Of course, you can rely on your mastery of high-school mathematics, but human beings, in general, are terrible at formal mathematics.

Thankfully, you can outsource simple problems to a software library. If you are a Python user, you can install z3 relatively easily with pip: pip install z3-solver. You are then good to go!

Suppose that you want to be sure that ( 1 + y ) / 2 < y for all 32-bit integers y. You can check it with the following Python script:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) / 2 >= y )
if(s.check() == z3.sat):
    model = s.model()
    print(model)

We construct a “bit vector” spanning 32 bits to represent our integer. The z3 library considers such a number by default as a signed integer going from -2147483648 to 2147483647.

We check the reverse inequality, because we are looking for a counterexample. When no counterexample can be found, we know that our inequality holds.

Running the above script, Python prints back the integer 2863038463. How can this be given that we said that z3 assumed that we were using numbers from -2147483648 to 2147483647? The z3 library always prints out a non-negative integer, and it is our job to reinterpret it: the number 2147483647 is fine, the number 2147483648 becomes -2147483648, the number 2147483649 becomes -2147483647 and so forth. This representation which “wraps around” is called two’s complement. So 2863038463 is actually interpreted as a negative value. The exact value is not too important: what matters is that our inequality (( 1 + y ) / 2 < y) is false when y is negative. It is clear enough: setting the variable to -1, I get 0 < -1. For zero, the inequality is also false. Let me add as a condition that the variable is positive:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) / 2 >= y )
s.add( y > 0 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

This time, I get 1 as a possible solution. Damn it! Let us exclude it as well:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) / 2 >= y )
s.add( y > 1 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

Given this last script, nothing is printed out, which proves that my inequality is valid as long as the variable is greater than 1.

Given that we have shown that ( 1 + y ) / 2 < y, maybe ( 1 + y ) < 2 * y is true as well?

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) >= 2 * y )
s.add( y > 1 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

This script returns 1412098654. Twice that value is 2824197308 which is interpreted as a negative number using two’s complement. To avoid the overflow, we can put another condition on the variable:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) >= 2 * y )
s.add( y > 1 )
s.add( y < 2147483647 // 2 )

if(s.check() == z3.sat):
    model = s.model()
    print(model)

This time the result holds!

As you can see, it is quite a lot of work and it is unlikely that anyone would be able to fully test out a non-trivial program in this manner.

Stop spending so much time being trolled by billionaire corporations!

When I was a kid, my parents would turn on the television set, and we would get to watch whatever the state television decided we would watch. It was a push model. Some experts pick the content you need and they deliver it to you. You have little say in the matter. There was one newspaper in my town. We had two or three TV channels. Traditional schools also operate on a push model: the teacher decides what you get to learn.

So I got to watch hours and hours of incredibly boring TV shows because there was nothing good on. I was very interested in computers and science, but there was almost nothing relevant in the major news sources. When personal computers became popular, I quickly learned more about them than any journalist had.

In the early days of the Internet, people wrote on posting boards. Some started blogs. To this day, I get much of my online news through an RSS aggregator which collects information from various blogs and news sites. An RSS aggregator simply picks up all of the news items from various sites and lays them out sequentially. You do not like a news source? You unsubscribe. Mailing lists work similarly: you get emails whenever someone has new content.

This model has been described as “pull” oriented. You pick your sources. You sidestep the experts. For someone like myself, it was incredibly liberating. As the pull model grew, many people feared that old-school journalism would die. It also challenged conventional education.

Within this emerging framework, Silicon Valley invented Twitter, Facebook and other networks. At first they worked much like posting boards and blogs. You launched Twitter and you got the tweets of the people you followed. You did not like someone’s tweets? You just unfollowed them. Google even co-opted the RSS reader by creating a fantastic tool called Google Reader, which put you in control.

However, mostly, the industry moved in a different direction. They took control of what you see and read. Brilliant engineers are hard at work making sure that you remain glued to your screen. So they find content that you may like and push it to you. Google closed Google Reader, decimating the RSS reader community. Whereas you could once count on the Google search engine delivering the documents containing the keywords you were searching for, you are increasingly facing a curated list of links.

We are back at a push model which is not unlike how things were when I was a kid. The likes of Twitter, Facebook and Google feel like they get to decide what I see.

Richard Startin describes my feeling in a couple of tweets.

As the user, you are no longer in control. The corporation is wrestling back full control of what you get to watch and read.

TikTok is one such tool where you just open the application and watch whatever they want you to watch. You become some kind of automaton.

Of course, deciding what people watch and read is valuable. It is a great way to make money. But it also becomes a politically valuable power. And so, it now seems unavoidable that multiple countries in the world will regulate these sites to make sure that you watch the right “things”.

Maybe Richard Startin wants to read about what programmers have to say, but what if some corporation or some government feels that he needs to be made aware of some important bit of information? What then?

Thankfully some tools are still leaving you in control:

    1. Blogs are still out there. I had 50,000 visitors last month. You can still use RSS readers. You can subscribe to blogs like mine by email.
    2. I have recently discovered the fantastic substack community.
    3. Telegram is pretty decent as a secured news aggregator. My blog has a telegram channel. Nobody needs to know what you are reading.
    4. Twitter has a hidden feature (twitter list) which lets you subscribe to specific individuals and only see content from these individuals.
    5. DuckDuckGo is a fantastic search engine which mostly gives me what I am looking for instead of what it thinks I should find.
    6. Do not underestimate books. Contrary to what you may have heard, you can still order paper books. The great thing about a paper book is that nobody needs to know what you are reading and when. If you like books and programming, you can grab Performance Analysis and Tuning on Modern CPUs for example. I have many recommendations on the sidebar of my blog.
    7. There are fantastic podcasts out there. Spotify has some great stuff, but you can find many others on other platforms. If you like programming, you might want to check corecursive. Joe Rogan is also fantastic. There are many others.

Being in control takes work. It also requires you to sometimes pay real money. But do you really want your attention to be sold and manipulated?

I am not advocating that anyone leave Twitter, Facebook or TikTok, but we should all diversify our information sources. Be conscious that Twitter, Facebook, Google and others are in the business of manipulating your attention for their own interests and the interests of their clients.

Be smarter!

Related: The Social Dilemma.

Science and Technology (October 31st 2021)

  1. Though exoskeletons are exciting and they allow some of us to carry on with physical activities despite handicaps, they appear to require quite a bit of brain power. In effect, though they may help you move, they require a lot of mental effort, which can be distracting.
  2. It is difficult to make people smarter by changing their environment. Socioeconomic background effects on children’s cognitive development and student achievement are likely to be spurious according to new research.
  3. There is a U-shaped association between cardiovascular health and physical activity. A moderate amount of physical activity is really good for you, but you can do too much.
  4. In richer countries where women are treated as equals to men, there are greater differences between boys’ and girls’ aspirations. For example, boys are more likely to be attracted to science compared to girls in a rich country.
  5. Scientists spend ever more time in school for relatively few desirable positions. These scientists are then ever more likely to pursue post-doctoral positions. These longer studies represent a significant net loss in their lifetime income.

In C, how do you know if the dynamic allocation succeeded?

In the C programming language, we allocate memory dynamically (on the heap) using the malloc function. You pass malloc a size parameter corresponding to the number of bytes you need. The function returns either a pointer to the allocated memory or the NULL pointer if the memory could not be allocated.

Or so you may think. Let us write a program that allocates 1 terabyte of memory and then tries to write to this newly allocated memory:

#include <stdio.h>
#include <stdlib.h>

int main() {
  size_t large = 1099511627776;
  char *buffer = (char *)malloc(large);
  if (buffer == NULL) {
    printf("error!\n");
    return EXIT_FAILURE;
  }
  printf("Memory allocated\n");
  for (size_t i = 0; i < large; i += 4096) {
    buffer[i] = 0;
  }
  free(buffer);
  return EXIT_SUCCESS;
}

After compiling and running this program, you would expect to either get the message “error!”, in which case the program terminates immediately… or else you might expect to see “Memory allocated” (if 1 terabyte of memory is available), in which case the program should terminate successfully.

Under both macOS/clang and Linux/GCC, I find that the program prints “Memory allocated” and then crashes.

What is happening?

The malloc call does allocate memory, but it will almost surely allocate “virtual memory”. At the margin, it could be that no physical memory is allocated at all. The system merely sets aside the address space for the memory allocation. It is when you try to use the memory that the physical allocation happens.

It may then fail.

It means that checking whether the memory allocation was a success is probably much less useful than you might have imagined when calling malloc.

It also leads to confusion when people ask how much memory a program uses. It is wrong to add up the calls to malloc because that gives you the virtual memory usage. If you use a system-level tool that reports the memory usage of your processes, you should look at the real (resident) memory usage. Here is the memory usage of a couple of processes on my laptop:

process memory (virtual) memory (real)
qemu 3.94 GB 32 MB
safari 3.7 GB 180 MB

Dynamic memory is allocated in pages (e.g., 4 kB), and the real memory usage is often much smaller than the virtual memory. Doing “malloc(x)” is not the same as taking x bytes of physical memory.
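
To make the distinction concrete, here is a Linux-specific sketch (my own, not part of the original example) that reads the virtual and resident sizes from /proc/self/statm before and after touching the allocated pages:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print the virtual and resident sizes (in pages) reported by Linux.
   This is a sketch: it assumes Linux and does minimal error handling. */
void print_memory_usage(const char *label) {
  FILE *f = fopen("/proc/self/statm", "r");
  long virtual_pages = 0, resident_pages = 0;
  if (f == NULL) { return; }
  if (fscanf(f, "%ld %ld", &virtual_pages, &resident_pages) == 2) {
    printf("%s: virtual = %ld pages, resident = %ld pages\n", label,
           virtual_pages, resident_pages);
  }
  fclose(f);
}

int main() {
  size_t large = 1024 * 1024 * 1024; /* 1 GB */
  char *buffer = (char *)malloc(large);
  if (buffer == NULL) { return EXIT_FAILURE; }
  print_memory_usage("after malloc");   /* virtual size jumps, resident barely moves */
  memset(buffer, 1, large);             /* touching the pages triggers physical allocation */
  print_memory_usage("after touching"); /* resident size should now reflect the gigabyte */
  free(buffer);
  return EXIT_SUCCESS;
}

You should expect the first call to report a much larger virtual size but a nearly unchanged resident size; only after the memset should the resident count grow (pages are typically 4 kB).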

In general, it is therefore difficult to know whether an allocation made with malloc truly succeeded. You may only find out when you write to and read from the newly allocated memory.

In C++, is empty() faster than comparing the size with zero?

Most C++ programmers rely on “STL” for their data structures. The most popular data structure is probably vector, which is just a dynamic array. The set and the map are other useful ones.

The STL data structures follow a minimalist design: you have relatively few methods. All of them allow you to compute the size of the data structure, that is, the number of elements it contains, via the size() method. In recent C++ (since C++11), the size() method must have constant-time complexity for all containers. To put it in clearer terms, the people implementing the containers can never scan the content to find out the number of elements.

These containers also have another method called empty() which simply returns true if the container is… well… empty. Obviously, an equivalent strategy would be to compare the size with zero: mystruct.size() == 0.

Obviously, determining whether a data structure is empty is conceptually easier than determining its size. Thus, at least in theory, calling empty() could be faster.
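
For concreteness, the two spellings look as follows (a trivial illustration of mine, not from the original post):

#include <cstdio>
#include <vector>

int main() {
  std::vector<int> v;
  // Two equivalent ways of asking whether the container holds no elements:
  bool via_empty = v.empty();       // "is there at least one element?"
  bool via_size = (v.size() == 0);  // "count the elements, compare with zero"
  printf("%d %d\n", via_empty, via_size); // prints "1 1" for an empty vector
  return 0;
}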

Inspecting the assembly output, I find that recent versions of GCC produce nearly identical code for the comparison of the size and the empty call. The exception is the list data structure, where the assembly is slightly different, but not in a manner that should affect performance.

Of course, there are different implementations of C++ and it is possible that other implementations could provide more efficient code when calling empty(). An interesting question is whether effort is needed from the compiler.

Travis Downs wrote a list data structure by hand, with a size computation (count_nodes) that runs in linear time. He then implemented the emptiness check (is_not_empty) naively on top of it:

struct node {
    struct node *next;
    int payload;
};

int count_nodes(const node* p) {
    int size = 0;
    while (p) {
        p = p->next;
        size++;
    }
    return size;
}

bool is_not_empty(const node* p) {
    return count_nodes(p) > 0;
}

Amazingly, we find that the GCC compiler is able to compile Travis’ is_not_empty C++ function to constant-time code. The compiler inlines the count_nodes function into is_not_empty. Then the compiler figures out that as soon as you enter the loop once within count_nodes, size is going to be greater than zero, so there is no need to keep looping.

However, the optimization is not robust. Suppose that I wish to return an unsigned type instead of Travis’ int value. The problem with an unsigned type is that it might overflow (wrap around) if the list is very long. With a signed integer, the compiler is allowed to assume that overflows do not happen; if it must account for wraparound, it could have a harder time telling whether count_nodes() returns 0 or not. To sidestep this potential issue, I forcefully bound the return value of count_nodes() at 1000. If I change the code to return a standard size_t type, like so…

#include <cstddef>

struct node {
    struct node *next;
    int payload;
};

size_t count_nodes(const node* p) {
    size_t size = 0;
    while (p) {
        p = p->next;
        size++;
        if(size == 1000) { 
            return 1000; 
        }
    }
    return size;
}

bool is_not_empty(const node* p) {
    return count_nodes(p) > 0;
}

Sadly, GCC is now unable to optimize away the call. Maybe compilers are not yet all-powerful beings?

The lesson is that it is probably wise to get in the habit of calling empty() directly if you care about performance. Though it may not help much with modern STL data structures, in other code it could make a difference.

Of course, another argument is that the call to empty()  is shorter and cleaner.

Credit: This blog post was motivated by a tweet by Richard Startin.

Science and Technology links (October 23rd 2021)

    1. Apple announced new processors for its computers. Here is a table with the transistor count of some recent Apple processors:
      processor release year transistors
      Apple A7 2013 1 billion
      Apple A8 2014 2 billion
      Apple A9 2015 2 billion
      Apple A10 2016 3.2 billion
      Apple A11 2017 4.3 billion
      Apple A12 2018 6.9 billion
      Apple A13 2019 8.5 billion
      Apple A14 2020 11.8 billion
      Apple M1 2020 16 billion
      Apple M1 Max 2021 57 billion

      We find that the number of transistors on commodity processors doubles approximately every two years (a quick check of this arithmetic follows the list). Furthermore, the energy usage of chips is trending downward: the Apple M1 Max reportedly uses less than 100 Watts, which is comparable to processors of the last two decades despite much greater performance.
      Memory capacity and bandwidth are also trending upward: the Apple M1 Max has 64 GB of RAM with a bandwidth of 400 GB/s.

      However, our processors are increasingly parallel in nature: the M1 Max has ten generic processing cores and 32 graphical cores. You are unlikely to be able to make use of the 64 GB of RAM and of all of its bandwidth with a single core.

    2. It seems that older people might be more social and more likely to give to charity. This seems to confirm earlier work.
    3. A popular supplement (nicotinamide riboside) seems to partially rejuvenate the immune system of old mice. (I do not take nicotinamide riboside.)
    4. Winter is coming. Maybe you are planning to hide from the cold? It appears that cold exposure reduces neuroinflammation. Since too much inflammation is commonly a problem when you are not otherwise sick, these results suggest that going outside in the cold might do some good.
    5. Weaker students in the US have poorer and poorer scores in reading and mathematics. The best students are still doing well. The net result is that the gap between the weak students and the best students is getting larger. (These results are unrelated to the recent pandemic.)
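
As a quick check of the doubling claim from the first item (my own arithmetic, based on the table above): going from roughly 1 billion transistors (A7, 2013) to 16 billion (M1, 2020) is a 16x increase over 7 years. Since 16 = 2^4, the implied doubling time is about 7/4 ≈ 1.75 years, which is consistent with “approximately every two years”.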

Converting binary floating-point numbers to integers

You are given a floating-point number, e.g. a double type in Java or C++. You would like to convert it to an integer type… but only if the conversion is exact. In other words, you want to convert the floating-point number to an integer and check if the result is exact.

In C++, you could implement such a function using a few lines of code:

#include <cstdint>

bool to_int64_simple(double x, int64_t *out) {
  int64_t tmp = int64_t(x); // truncates toward zero
  *out = tmp;
  return tmp == x;          // exact if converting back compares equal
}

The code is short in C++, and it compiles down to a few lines of assembly. On x64, you might get the following:

        cvttsd2si rax, xmm0
        pxor    xmm1, xmm1
        mov     edx, 0
        cvtsi2sd        xmm1, rax
        mov     QWORD PTR [rdi], rax
        ucomisd xmm1, xmm0
        setnp   al
        cmovne  eax, edx

Technically, the code might rely on undefined behaviour.
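
Specifically, in C++ the conversion from double to int64_t is undefined when the truncated value cannot be represented in the integer type (e.g., NaN, infinities, or values of magnitude 2^63 or more). One way to stay safe, shown here as a sketch of mine rather than the post’s benchmarked routine, is to reject those inputs before casting:

#include <cstdint>

// Sketch: reject values for which the double -> int64_t conversion would be
// undefined, then perform the same exactness check as before.
bool to_int64_guarded(double x, int64_t *out) {
  // 9223372036854775808.0 is 2^63, which is exactly representable as a double.
  if (!(x >= -9223372036854775808.0 && x < 9223372036854775808.0)) {
    return false; // NaN, infinities and out-of-range values land here
  }
  int64_t tmp = int64_t(x); // now guaranteed to be representable
  *out = tmp;
  return tmp == x;          // exact only if no fractional part was lost
}

The extra range check should cost only a couple of additional instructions.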

You could do it in a different manner. Instead of working with high-level instructions, you could copy your binary floating-point number to a 64-bit word and use your knowledge of the IEEE binary64 standard to extract the mantissa and the exponent.

It is much more code. It also involves pesky branches. I came up with the following routine:

typedef union {
  struct {
    uint64_t fraction : 52;
    uint32_t exp_bias : 11;
    uint32_t sign : 1;
  } fields;
  double value;
} binary64_float;

bool to_int64(double x, int64_t *out) {
  binary64_float number = {.value = x};
  const int shift = number.fields.exp_bias - 1023 - 52;
  if (shift > 0) {
    return false;
  }
  if (shift <= -64) {
    if (x == 0) {
      *out = 0;
      return true;
    }
    return false;
  }
  uint64_t integer = number.fields.fraction | 0x10000000000000;
  // when shift == 0 there are no fractional bits to check (and a shift by 64 would be undefined)
  if (shift != 0 && (integer << (64 + shift))) {
    return false;
  }
  integer >>= -shift;
  *out = (number.fields.sign) ? -integer : integer;
  return true;
}

How does it compare with the simple and naive routine?

I just wrote a simple benchmark where I iterate over many floating-point numbers in sequence and try to do the conversion. I effectively measure the throughput and report the number of nanoseconds per function call. My benchmark does not stress the branch predictor. Unsurprisingly, the naive and simple approach can be faster.

system long version simple version
ARM M1 + LLVM 12 0.95 ns/call 1.1 ns/call
Intel 7700K + LLVM 13 1.6 ns/call 1.2 ns/call
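
If it helps to visualize the setup, a throughput loop along these lines would do the job (a sketch of mine, not the exact benchmark; the real code is linked at the end of this post):

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// The simple conversion routine from earlier in this post.
bool to_int64_simple(double x, int64_t *out) {
  int64_t tmp = int64_t(x);
  *out = tmp;
  return tmp == x;
}

int main() {
  const size_t N = 100000000; // number of conversions
  int64_t out = 0;
  size_t exact = 0;
  auto start = std::chrono::steady_clock::now();
  for (size_t i = 0; i < N; i++) {
    // sequential values: predictable branches, easy on the branch predictor
    if (to_int64_simple(double(i), &out)) { exact++; }
  }
  auto stop = std::chrono::steady_clock::now();
  double ns = std::chrono::duration<double, std::nano>(stop - start).count();
  printf("%zu exact conversions, %.2f ns/call\n", exact, ns / N);
  return 0;
}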

I find it reassuring that the simple approach is so fast. I was concerned at first that it would be burdensome.

It is possible that my long-form routine could be improved. However, it seems unlikely that it could ever be much more efficient than the simple version.

My source code is available.

Science and Technology links (October 16th 2021)

  1. The thymus is an important component of our immune system. As we age, the thymus degenerates and our immune system becomes less fit: emotional and physical distress, malnutrition, and opportunistic bacterial and viral infections damage the thymus. New research suggests that practical thymus regeneration could be closer than ever. The thymus is able to repair itself and the right mix of signals could convince it to stay fit over time.
  2. It appears that women use sexual vocalization during penetration to increase their sexual attractiveness to their current partner by means of boosting their partner’s self-esteem.
  3. People associate creativity with effortless insight and undervalue persistence. If you just sit down and work hard, you can generate many new ideas. If you just wait for ideas to come, you will generate far fewer ideas.
  4. 63% of faculty perceive that their university does not care about the content of their publications, only where and how much is published. Such a culture can easily lead to people producing work that passes peer review but fails to contribute meaningfully to science or engineering.
  5. Redheaded women are more sexually active than other women because men are more likely to approach them with sexual intentions.

Calling a dynamically compiled function from Go

Compiled programming languages are typically much faster than interpreted programming languages. Indeed, the compilation step produces “machine code” that is ideally suited for the processor. However, most programming languages today do not allow you to change the code after it has been compiled. It means that if you only find out which function you need after the code has been compiled, you have a problem.

It happens all of the time. For example, you might have a database engine that has to do some expensive processing. The expensive work might be based on a query that is only provided long after the code has been compiled and started. It means that your performance could be limited because the code that runs the query was written without sufficient knowledge of the query.

Let us illustrate the issue and a possible solution in Go (the programming language).

Suppose that your program needs to take the power of some integer. You might write a generic function with a loop, such as the following:

func Power(x int, n int) int {
	product := 1
	for i := 0; i < n; i++ {
		product *= x
	}
	return product
}

However, if the power (n) is a constant, such a function is inefficient. If the constant is known at compile time, you can write an optimized function:

func Square(x int) int {
	return x * x
}

But what if the power is only given to you at runtime?

In Go, we are going to write out the following file (on disk):

package main

func FastFunction(x int) int {
	return x * x
}

The file can be created, at runtime, with a runtime variable ‘n’ indicating the power:

	file, _ := os.Create("fast.go")
	file.WriteString("package main\nfunc FastFunction(x int) int {\n  return x")
	for i := 1; i < n; i++ {
		file.WriteString(" * x")
	}
	file.WriteString("\n}\n")
	file.Close()

Importantly, the variable ‘n’ can be any integer, including an integer provided by the user, at runtime.
We then compile the new file and load the new function as a “plugin”:

	exec.Command("go", "build", "-buildmode=plugin", "-o", "fast.so", "fast.go").Output() 
        plug, _ := plugin.Open("fast.so")
	fastSquare, _ := plug.Lookup("FastFunction")
	loaded, _ := fastSquare.(func(int) int)

And that is all! The ‘loaded’ variable contains a fast and newly compiled function that I can use in the rest of my code.

Of course, writing the file to disk, compiling the result and loading the compiled function is not free. On my laptop, it takes about half a second. Yet once the function is loaded, it appears to be as fast as the native function (see the Square function above).

I expect that Go cannot “inline” the resulting function inside the rest of your code. But that should not be a concern if your function is non-trivial.

Another downside of the approach I described is that it is fragile: it can fail for many reasons, and it is a security risk. Thus you should only do it if it is absolutely necessary. If performance is your motivation, it should offer a massive boost (e.g., 2x the performance) to be worthwhile.

Furthermore, once a plugin is loaded, it cannot be unloaded. If you were to use this method again and again, you might eventually run out of memory.

My source code is available.

Science and Technology links (October 10th 2021)

  1. Evans and Chu suggest, using data and a theoretical model, that as the number of scientists grows, progress may stagnate. Simply put, in a large field with many researchers, a few papers and a few people are able to acquire a decisive advantage over newcomers. Large fields allow more inequality. One can review their model critically: would you not expect large fields to fragment into smaller fields? And haven’t we seen much progress in fields that have exploded in size?
  2. Keeping your iron stores low might be important to slow your aging. Sadly, much of what you eat has been supplemented with iron because a small fraction of the population needs iron supplementation. It is widely believed that you cannot have too much iron supplementation, but, to my knowledge, the long-term effects of iron supplementation have not been carefully assessed.
  3. If you lose weight while having high levels of insulin, you are more likely to lose lean tissues (e.g., muscle) than fat.
  4. A drug similar to viagra helps mice fight obesity.
  5. Age-related hair loss might be due to stem cells escaping the hair follicle bulge. This new work contradicts the commonly held belief that the stem cells die over time. This work may not relate to male-pattern baldness.
  6. People tend to stress the ability of “formal peer review” to set apart the good work from the less significant work. People greatly overestimate the value of formal peer review. They forget that much of the greatest work in science occurred before formal peer review had even been considered. Cortes and Lawrence have assessed the quality of peer review at a (or arguably the) major conference of the day, NeurIPS. They found several years ago that when two teams of referees independently assessed the same work, they only agreed on about 50% of the assessments. They have extended their work with a new finding:

    Further, with seven years passing since the experiment we find that for accepted papers, there is no correlation between quality scores and impact of the paper as measured as a function of citation count.

    The lesson is pretty clear. Formal peer review can identify and reject the “obviously bad work”. But it is not a difficult task. In fact, I am quite certain that a fully automated system could quickly identify work that nobody should ever read. However, if you have work that has been visibly well executed, following the state-of-the-art, citing the required prior work, using the right data for the problem, and so forth… then it becomes really difficult to tell whether it is great work, or simply work that meets minimal standards. It is obvious if you know how formal peer review works. You have between two and five researchers, with various levels of familiarity with the domain of the paper, who read it over. They do not redo the experiments. They do not redo the demonstrations. Sometimes you get lucky and find a referee that has deep knowledge of the domain (often because they have worked on a similar problem) and they can be really critical. Even so, in the context of a conference peer review process, there is only so much a highly knowledgeable referee can do, since they cannot freely interact with the authors.

For software performance, can you always trust inlining?

It is easier for an optimizing compiler to spot and eliminate redundant operations if it can operate over a large block of code. Nevertheless, it is still recommended to stick with small functions. Small functions are easier to read and debug, and you can often rely on the compiler to smartly inline them inside larger functions. Furthermore, if you have a just-in-time compiler (e.g., in C# and Java), it may focus its energy on functions that are called often, so small functions are more likely to get optimized.

Nevertheless, you sometimes want to manually inline your code. Even the smartest compilers get it wrong. Furthermore, some optimizing compilers are simply less aggressive than others.

We have been working on a fast float-parsing library in C# called csFastFloat. Though it is already several times faster than the standard library, the primary author (Verret) wanted to boost the performance further by using SIMD instructions. SIMD instructions are fancy instructions that allow you to process multiple words at once, unlike regular instructions. The C# language allows you to use SIMD instructions to speed up your code.

He encountered a case where using SIMD instructions failed to bring the exciting new performance he was expecting. The easy way out in the end was to “manually inline” the functions. I am going to review a simplified instance because I believe it is worth discussing.

I am not going to review it in detail, but the following C# function allows you to quickly determine if 16 bytes are made of ASCII digits. In ASCII, the character ‘0’ has value 48 whereas the character ‘9’ has value 57. The function builds two “vectors” made of the values 47 and 58 (repeated 16 times) and then it does 16 comparisons at once (with CompareGreaterThan and then CompareLessThan). It concludes with a couple of instructions to check whether all conditions are met to have 16 digits.

unsafe static bool is_made_of_sixteen_digits(byte* chars) {
  Vector128<sbyte> ascii0 = Vector128.Create((sbyte)47);
  Vector128<sbyte> after_ascii9 = Vector128.Create((sbyte)58);
  Vector128<sbyte> raw = Sse41.LoadDquVector128((sbyte*)chars);
  var a = Sse2.CompareGreaterThan(raw, ascii0);
  var b = Sse2.CompareLessThan(raw, after_ascii9);
  var c = Sse2.Subtract(a, b); // this is not optimal   
  return (Sse41.TestZ(c,c));
}

Let us write a function that tries to determine if I have a string that begins with 16 digits, or 32 digits. It is a made up function that I just use for illustration purposes.

// return 2 if 32 digits are found
// return 1 if 16 digits are found
// otherwise return 0
unsafe static int ParseNumberString(byte* p, byte* pend) {
  if ((p + 16 <= pend) && is_made_of_sixteen_digits(p)) {
    if((p + 32 <= pend) && is_made_of_sixteen_digits(p + 16)) {
      return 2;
    }
    return 1;
  }
  return 0;
}

If you just write that out, C# might fail to inline the is_made_of_sixteen_digits function. Thankfully, you can tell C# that you do want it to inline the child function by labelling it with an “AggressiveInlining” attribute.

So far so good, right?

Let us look at the assembly output to see how it gets compiled. Do not worry, I do not expect you to read this gibberish. Just look at the lines that start with an arrow (“->”).

<Program>$.<<Main>$>g__ParseNumberString|0_1(Byte*, Byte*)
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: vzeroupper
    L0006: lea eax, [ecx+0x10]
    L0009: cmp eax, edx
    L000b: ja short L0071
    -> L000d: vmovupd xmm0, [<Program>$.<<Main>$>g__ParseNumberString|0_1(Byte*, Byte*)]
    -> L0015: vmovupd xmm1, [<Program>$.<<Main>$>g__ParseNumberString|0_1(Byte*, Byte*)]
    L001d: vlddqu xmm2, [ecx]
    L0021: vpcmpgtb xmm0, xmm2, xmm0
    L0025: vpcmpgtb xmm1, xmm1, xmm2
    L0029: vpsubb xmm0, xmm0, xmm1
    L002d: vptest xmm0, xmm0
    L0032: jne short L0071
    L0034: lea eax, [ecx+0x20]
    L0037: cmp eax, edx
    L0039: ja short L006a
    -> L003b: vmovupd xmm0, [<Program>$.<<Main>$>g__ParseNumberString|0_1(Byte*, Byte*)]
    -> L0043: vmovupd xmm1, [<Program>$.<<Main>$>g__ParseNumberString|0_1(Byte*, Byte*)]
    L004b: vlddqu xmm2, [ecx+0x10]
    L0050: vpcmpgtb xmm0, xmm2, xmm0
    L0054: vpcmpgtb xmm1, xmm1, xmm2
    L0058: vpsubb xmm0, xmm0, xmm1
    L005c: vptest xmm0, xmm0
    L0061: jne short L006a
    L0063: mov eax, 2
    L0068: pop ebp
    L0069: ret
    L006a: mov eax, 1
    L006f: pop ebp
    L0070: ret
    L0071: xor eax, eax
    L0073: pop ebp
    L0074: ret

What C# does is to create the “vectors” made of the values 47 and 58 twice each (for a total of four times).

There might be a clever way to get C# to stop being inefficient, but you can also just manually inline. That is, you create one function that includes the other function. The result might look as follows:

// return 2 if 32 digits are found
// return 1 if 16 digits are found
// otherwise return 0
unsafe static int ParseNumberStringInline(byte* p, byte* pend) {
  if (p + 16 <= pend) {
    Vector128<sbyte> ascii0 = Vector128.Create((sbyte)47);
    Vector128<sbyte> after_ascii9 = Vector128.Create((sbyte)58);    
    Vector128<sbyte> raw = Sse41.LoadDquVector128((sbyte*)p);
    var a = Sse2.CompareGreaterThan(raw, ascii0);
    var b = Sse2.CompareLessThan(raw, after_ascii9);
    var c = Sse2.Subtract(a, b);
    if((p + 32 <= pend) && Sse41.TestZ(c,c)){
      raw = Sse41.LoadDquVector128((sbyte*)p + 16);
      a = Sse2.CompareGreaterThan(raw, ascii0);
      b = Sse2.CompareLessThan(raw, after_ascii9);
      c = Sse2.Subtract(a, b);
      if(Sse41.TestZ(c,c)) { return 2; }
    }
    return 1;
  }
  return 0;
}

This new code is harder to read and maybe harder to maintain. However, let us look at the compiled output:

<Program>$.<<Main>$>g__ParseNumberStringInline|0_2(Byte*, Byte*)
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: vzeroupper
    L0006: lea eax, [ecx+0x10]
    L0009: cmp eax, edx
    L000b: ja short L0061
    L000d: vmovupd xmm0, [<Program>$.<<Main>$>g__ParseNumberStringInline|0_2(Byte*, Byte*)]
    L0015: vmovupd xmm1, [<Program>$.<<Main>$>g__ParseNumberStringInline|0_2(Byte*, Byte*)]
    L001d: vlddqu xmm2, [ecx]
    L0021: vpcmpgtb xmm3, xmm2, xmm0
    L0025: vpcmpgtb xmm2, xmm1, xmm2
    L0029: vpsubb xmm4, xmm3, xmm2
    L002d: lea eax, [ecx+0x20]
    L0030: cmp eax, edx
    L0032: ja short L005a
    L0034: vptest xmm4, xmm4
    L0039: jne short L005a
    L003b: vlddqu xmm2, [ecx+0x10]
    L0040: vpcmpgtb xmm3, xmm2, xmm0
    L0044: vpcmpgtb xmm2, xmm1, xmm2
    L0048: vpsubb xmm4, xmm3, xmm2
    L004c: vptest xmm4, xmm4
    L0051: jne short L005a
    L0053: mov eax, 2
    L0058: pop ebp
    L0059: ret
    L005a: mov eax, 1
    L005f: pop ebp
    L0060: ret
    L0061: xor eax, eax
    L0063: pop ebp
    L0064: ret

I do not expect you to read this gibberish, but notice how the result is now tighter. It is also going to be slightly faster.

As a rule of thumb, if you look at the assembly output of your code and it is shorter, you will usually get better performance. It is simply the case that executing fewer instructions is often faster.

Though my example is a toy example, you should expect the csFastFloat library to benefit from SIMD instructions in the near future. The preliminary numbers I have seen suggest a nice performance bump. There is a pull request.

My source code is available.