On the state of virtual-reality gaming

For nearly two years, I have been trying a wide range of video games in a virtual-reality setting. Our lab in Montreal has some permanent space dedicated to the HTC Vive, so I was also able to test out games with a wide range of people. I must have tried several dozen different games so far.

Gaming in virtual reality is a disappointment. I am surprised that Sony sold millions of virtual-reality headsets. To my knowledge, there is no big studio betting on virtual reality. The field is mostly left to independent developers making small bets.

To be clear, I am not disappointed at virtual reality per se. However, it seems clear that two years ago, I greatly underestimated how much work we collectively need to do to get “virtual reality right”.

What works? A few games are quite good. I have two favorite games.

One of them is Superhot VR. In Superhot, you are an assassin moving from one minimalist sandbox to another, killing people as best you can (with a knife, your fist, a bottle, a gun, …). It would be quite bland if not for the trick that time flows only as fast as you move. As long as you remain immobile, time stands still. The game is a “port” to virtual reality of a conventional game, but virtual reality makes it shine.

My other favorite game is Beat Saber. As the name suggests, you use (light) sabers to cut coloured boxes coming at you (not unlike a Star Wars Jedi) at the rhythm of some music. It is probably my favorite virtual-reality game so far.

Both of them are so good that they provide an unforgettable experience. However, they are both modest games.

What might we say about virtual-reality gaming?

  1. Both of these games are highly immersive. Once you are in the game, you feel as if you had been teleported elsewhere and you forget where your body is. Yet they are not, in any way, realistic. That is, you are teleported into an artificial world that looks nothing like our everyday world.

    A few years ago, many people assumed that photorealism was required for immersion. That is entirely false.

  2. As a related, but distinct point, neither of these games was particularly expensive to build or technically challenging. I could probably write cheap clones of these games in a few months, and I am not a video-game programmer. That is, of course, a consequence of the fact that there are seemingly no major investments in the field.
  3. These games require “six degrees of freedom” and handheld commands. That is, they work because you can really move in the environment (forward, backward) while looking in all directions, and using your hands freely.

    However, they only require you to move within a small space. This restriction is important since your actual body is still limited to a relatively small physical space.

    Many games allow you to travel vast distances through various tricks such as teleports, or by moving from within a vehicle. For example, you can point to a far location and click a button to appear there. Even though teleports “work” technically, they are disappointing. I almost invariably get frustrated at such games.

    Other games offer only restricted degrees of freedom. Some games only require you to look around, without having to move. I find these games disappointing as well.

  4. My impression is that simply carrying over existing video games is almost always going to be a futile exercise.

What might come around the corner?

  1. Multiplayer virtual-reality gaming might be great. There are games like Rec Room that offer decent experiences already, with a lot of frustration thrown in. However, we will need better hardware with features like eye tracking. It is almost here.
  2. I still haven’t seen any “long-form” game. That is, playing for hours in a deep and involved game is not possible, right now. What is worse: I cannot even imagine what such a game might look like.

Science and Technology links (September 15th, 2018)

  1. I was told repeatedly throughout my life that the normal body temperature was 37.5°C. This estimate is over a hundred years old and flawed. It is off by one degree: a better “normal” is 36.5°C.
  2. According to Malhotra et al., heart disease is a chronic inflammatory condition (meaning that it is related to a dysfunction of your immune system), not something caused by saturated fat clogging the arteries.
  3. Apple has released a watch with government-approved ECG (heart monitoring) capabilities.
  4. Could Alzheimer’s be an infectious disease?
  5. Drinking beer does not lead to weight gains in obese people.
  6. Boys tend to be both the lowest and the highest performers in terms of their reasoning abilities.
  7. Technological progress does not require better understanding, but is maybe more likely the result of the accumulation of many small improvements. That is, technological progress is more about evolution than about science and knowledge gathering.
  8. Higher personal and corporate income taxes negatively affect the quantity, quality, and location of inventive activity.
  9. The latest iPhone processor (the A12) has 6.9 billion transistors.
  10. Many researchers publish at least one paper every five days. They are described as being hyperprolific. Several of them have published hundreds of articles in the same two journals. Some of them work under a system where the more you publish, the more you get paid.

    Einstein published about 300 papers in his (relatively long) life. These people publish as much as Einstein did every five years.

    To be fair, if Einstein were alive today and had access to the Internet and to computers, he might publish 300 papers a year.

Science and Technology links (September 8th, 2018)

  1. Most research articles are not available for free to the public, even when the research was fully funded by the public. To legally access research articles, one typically needs to go through a college library which pays for access (often with public dollars). Major European agencies have thus decided that by 2020, research that they fund should be freely accessible to the public as soon as it is published.

    It sounds very strict, doesn’t it?

    Here is the dirty little secret behind these mandates: they are not enforced. Funding agencies do not check that the work is actually made available. Past compliance with the mandate is simply not a criterion when applying for a new research grant. I have never heard of anyone losing a research grant for failing to abide by an open-access mandate.

    It does not mean, of course, that there is no impact from these mandates. But they need to be viewed more as encouragements than as requirements.

  2. Lysosomes are the components of our cells that are responsible for recycling the trash. In older cells, we believe that they do not work as well. Of particular importance is the health of our stem cells, as our bodies rely on stem cells to repair our tissues. Thankfully, enhancing lysosomal function is sufficient to restore healthy stem cell activity in the aged brain.
  3. To assess the value of medical therapies, we use clinical trials. It is important for researchers to report fully on the results of the trials, to not leave important data points out. Yet it seems that selective reporting is prevalent. In at least 30% of the clinical trials, researchers failed to report what they promised to report before the clinical trial started.
  4. Men with reduced testosterone levels (a common occurrence for older men) would benefit from testosterone therapy. Yet this is uncommon. An article in Nature explains:

    (…) recent study has shown a decline in testosterone prescriptions since media reports of potential increased cardiovascular risk in 2014. The phenomenon of medical hysteria accounts for this reduced prescribing, as numerous subsequent studies provide substantial evidence of reduced cardiovascular risk and other important benefits with testosterone therapy for men with testosterone deficiency.

  5. We tend to think of evolution as a process that is limited to our genes. Yet if you have trouble losing weight, it might have to do with how active your mother and grandmothers were while pregnant… Pregnant mice without access to exercise wheels produce offspring that themselves have larger, fatter offspring:

    Without having to struggle for energy and nutrients, the fat cells in the fetus increase in both size and number, increasing the birth weight of the infant – a factor strongly related to adult obesity and type II diabetes. This is passed on down the line, with future generations becoming fatter and increasingly inactive and unhealthy.

    (This is interesting but speculative.)

  6. Exercise-induced birth of new neurons in the brain (neurogenesis) might improve memory, and we could mimic this effect with drugs. It works in mice, according to an article in Science.
  7. The United States has the lowest life expectancy among developed countries, with reductions in life expectancy over the past two years, and by far the highest medical cost per person (source: Eric Topol).
  8. Though heart disease remains the leading cause of death in the United States, cancer has surpassed it in several states.
  9. Consuming a hypercaloric high protein diet does not result in an increase in body fat when accompanied with resistance training (weight lifting).
  10. Adjusting for other factors, low cholesterol is associated with increased criminal violence.

AVX-512: when and how to use these new instructions

Our processors typically do computations using small data stores called registers. On 64-bit processors, 64-bit registers are frequently used. Most modern processors also have vector instructions and these instructions operate on larger registers (128-bit, 256-bit, or even 512-bit). Intel’s new processors have AVX-512 instructions. These instructions are capable of operating on large 512-bit registers. They have the potential of speeding up some applications because they can “crunch” more data per instruction.

However, some of these instructions use a lot of power and generate a lot of heat. To keep power usage within bounds, Intel reduces the frequency of the cores dynamically. This frequency reduction (throttling) happens in any case when the processor uses too much power or becomes too hot. However, there are also deterministic frequency reductions based specifically on which instructions you use and on how many cores are active (downclocking). Indeed, when any 512-bit instruction is used, there is a moderate reduction in speed, and if a core uses the heaviest of these instructions in a sustained way, the core may run much slower. Furthermore, the slowdown is usually worse when more cores use these new instructions. In the worst case, you might be running at half the advertised frequency and thus your whole application could run slower. On this basis, some engineers have recommended that we disable AVX-512 instructions by default on our servers.

So what do we know about the matter?

  1. The term “AVX-512” can describe instructions operating on various register lengths (128-bit, 256-bit and 512-bit). When discussing AVX-512 downclocking, we mean to refer only to the instructions acting on 512-bit registers. Thus you can “safely” benefit from many new AVX-512 instructions and features such as mask registers and new memory addressing modes without ever worrying about AVX-512 downclocking, as long as you operate on shorter 128-bit or 256-bit registers. You should never get any downclocking when working on 128-bit registers.
  2. Downclocking, when it happens, is per core and for a short time after you have used particular instructions (e.g., ~2ms).
  3. There are heavy and light instructions. Heavy instructions are those involving floating point operations or integer multiplications (since these execute on the floating point unit). Light instructions include integer operations other than multiplication, logical operations, data shuffling (such as vpermw and vpermd) and so forth. Heavy instructions are common in deep learning, numerical analysis, high performance computing, and some cryptography (e.g., multiplication-based hashing). Light instructions tend to dominate in text processing, fast compression routines, vectorized implementations of library routines such as memcpy in C or System.arrayCopy in Java, and so forth. (A small code sketch illustrating the distinction follows this list.)
  4. Intel cores can run in one of three modes: license 0 (L0) is the fastest (and is associated with the turbo frequencies “written on the box”), license 1 (L1) is slower and license 2 (L2) is the slowest. To get into license 2, you need sustained use of heavy 512-bit instructions, where “sustained” means approximately one such instruction every cycle. Otherwise, any other 512-bit instructions will move the core to L1.

    The downclocking of a core is determined by its own license (which depends on the type of instructions it runs) and by the total number of active cores on the same CPU socket, irrespective of the licenses of the other cores. That is, to determine the frequency of a core under downclocking, you only need to know its license and the number of cores where code is running. Thus you cannot downclock other cores on the same socket, other than the sibling logical core when hyperthreading is used, merely by running heavy and sustained AVX-512 instructions on one core. If you can isolate your heavy numerical work on a few cores (or just one), then the downclocking is limited to these cores. On Linux, you can control which cores are running your processes using tools such as taskset or numactl.

    You will find tables online like this one, for the Intel Xeon Gold 5120 processor…

    mode      1 active core   9 active cores
    Normal    3.2 GHz         2.7 GHz
    AVX2      3.1 GHz         2.3 GHz
    AVX-512   2.9 GHz         1.6 GHz

    We have chosen to include only two columns. The frequency behavior is the same for 9 to 12 active cores, so the 9-core column is the worst case for the L2 license: beyond 9 active cores, there is no further documented downclocking.

    These tables are somewhat misleading. The row “AVX-512” (the L2 license) really means “sustained use of heavy AVX-512 instructions”. The row “AVX2” (L1 license) includes all other use of AVX-512 instructions and heavy AVX2 instructions. That is, it is wrong to assume that the use of any AVX-512 instruction puts the cores into the frequency indicated by the AVX-512 row.

    These tables do give us some useful information, however:

    • a. These tables indicate that frequency downclocking is not specific to AVX-512. If you have many active cores, you will get downclocking in any case, even if you are not using any AVX-512 instructions.
    • b. If you just use light AVX-512, even if it is across all cores, then the downclocking is modest (15%).
    • c. If you are doing sustained heavy numerical work while many cores are active, then the downclocking becomes significant on these cores (~40%).
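To make the heavy/light distinction from point 3 concrete, here is a minimal sketch of mine (not from any particular library), assuming the AVX-512F intrinsics from immintrin.h: the first loop uses a light 512-bit instruction (integer addition, vpaddd), the second a heavy one (single-precision fused multiply-add). Only sustained use of the heavy kind is expected to push a core into the L2 license; both kinds trigger at least the milder L1 downclocking.

#include <immintrin.h>
#include <stddef.h>

// Light 512-bit work: integer addition (vpaddd), 16 x 32-bit lanes per register.
void add_epi32_light(int *out, const int *a, const int *b, size_t n) {
  for (size_t i = 0; i + 16 <= n; i += 16) {
    __m512i va = _mm512_loadu_si512((const void *)(a + i));
    __m512i vb = _mm512_loadu_si512((const void *)(b + i));
    _mm512_storeu_si512((void *)(out + i), _mm512_add_epi32(va, vb));
  }
}

// Heavy 512-bit work: floating-point fused multiply-add, the kind of
// instruction that, when sustained, triggers the L2 license.
void fma_ps_heavy(float *out, const float *a, const float *b, size_t n) {
  for (size_t i = 0; i + 16 <= n; i += 16) {
    __m512 va = _mm512_loadu_ps(a + i);
    __m512 vb = _mm512_loadu_ps(b + i);
    __m512 vo = _mm512_loadu_ps(out + i);
    _mm512_storeu_ps(out + i, _mm512_fmadd_ps(va, vb, vo));
  }
}

(Both functions ignore any tail elements; they are only meant as an illustration, compiled with, e.g., -mavx512f.)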

Should you be using AVX-512 512-bit instructions? The goal is never to maximize the CPU frequency; if that were the case, people would use a 14-core Xeon Gold processor with a single active core. These AVX-512 instructions do useful work. They are powerful: having registers 8 times larger can allow you to do much more work and greatly reduce the total number of instructions being issued. We typically want to maximize the amount of work done per unit of time. So we need to make engineering decisions. Evidently, a downclocking of 10% does not mean that your application runs 10% slower.

Here are some pointers:

  1. Engineers should probably use tools to monitor the frequency of their cores to ensure they are running in the expected license. Massive downclocking is then easily identified. For example, the perf stat command on Linux can be used to determine the average frequency of any process, and finer-grained details are available using the CORE_POWER.LVL0_TURBO_LICENSE event (and the corresponding events for LVL1 and LVL2).
  2. On machines with few cores (e.g., a standard PC), you may never get the kind of massive downclocking that we can see on a huge chip like the Xeon Gold processor. For example, on an Intel Xeon W-2104 processor, the worst downclocking for a single core is 2.4 GHz compared to 3.2 GHz. A 25% reduction in frequency is maybe not an important risk.
  3. If your code at least partly involves sustained use of heavy numerical instructions, you might consider isolating this work to specific threads (and hence cores), to limit the downclocking to cores that are taking full advantage of AVX-512. If this is not practical or possible, then you should mix this code with other (non-AVX-512) code with care. You need to ensure that the benefits of AVX-512 are substantial (e.g., more than 2x faster on a per cycle basis). If you have AVX-512 code with heavy instructions that runs only 30% faster than non-AVX-512 code on a per-cycle basis, it seems possible that once it is made to run on all cores, you will not come out ahead: using the Xeon Gold table above with many active cores, a 30% per-cycle gain combined with a drop from 2.3 GHz (L1) to 1.6 GHz (L2) nets 1.3 × 1.6/2.3 ≈ 0.9, that is, roughly 10% slower overall.

    For example, the openssl project used heavy AVX-512 instructions to bring down the cost of a particular hashing algorithm (poly1305) from 0.51 cycles per byte (when using 256-bit AVX instructions) to 0.35 cycles per byte, a 30% gain on a per-cycle basis. They have since disabled this optimization.

  4. The bar for light AVX-512 is lower. Even if the work is spread over all cores, you may only get a 15% frequency reduction on some chips like the Xeon Gold. So you only have to check that AVX-512 gives you a greater than 15% gain for your overall application on a per-cycle basis.
  5. Library providers should probably leave it up to the library user to determine whether AVX-512 is worth it. For example, one may provide compile-time options to enable or disable AVX-512 features, or even offer a runtime choice. Performance sensitive libraries should document the approach they have taken along with the likely speedups from the wider instructions.
  6. A significant problem is compiler-inserted AVX-512 instructions. Even if you are not using any explicit AVX-512 instructions or intrinsics, compilers may decide to use them as a result of loop vectorization, within library functions and other optimizations. Even something as simple as copying a structure may cause AVX-512 instructions to appear in your program. Current compiler behavior here varies greatly, and we can expect it to change in the future. In fact, it has already changed: Intel made more aggressive use of AVX-512 instructions in earlier versions of the icc compiler, but has since removed most such use unless the user asks for it with a special command-line option.

    Based on some not very comprehensive tests of LLVM’s clang (the default compiler on macOS), GNU gcc, Intel’s compiler (icc) and MSVC (part of Microsoft Visual Studio), only clang makes aggressive use of 512-bit instructions for simple constructs today: it used such instructions while copying structures, inlining memcpy, and vectorizing loops (a small example of such a loop appears after this list). The Intel icc compiler and gcc only seem to generate AVX-512 instructions for this test with non-default arguments: -qopt-zmm-usage=high for icc, and -mprefer-vector-width=512 for gcc. In fact, for most code, such as the generated copies, gcc seems to prefer to use 128-bit registers over 256-bit ones. MSVC currently (up to version 2017) doesn’t support compiler-generated use of AVX-512 at all, although it does support use of AVX-512 through the standard intrinsic functions.

    From the compiler’s perspective, deciding to use AVX-512 instructions is difficult: they often provide a reasonable local speedup, but at the possible cost of slowing down the entire core. If such instructions are frequent enough to keep the core running in the L1 license, but not frequent enough to produce enough of a speedup to counteract the slowdown, the program may run slower overall after you recompile to support AVX-512. It is hard to give general recommendations here beyond compiling your program both with and without AVX-512 and benchmarking in a realistic environment to determine which is faster. Because of the large variation in AVX-512 behavior across active core counts and Intel hardware, one should ensure they match these factors as closely as possible when testing performance.
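To illustrate the compiler-inserted case, here is the kind of plain scalar loop (my own example, not one of the tests above) that clang may compile into 512-bit instructions when targeting an AVX-512 processor (e.g., -O2 -march=skylake-avx512), while gcc and icc tend to keep narrower registers unless given -mprefer-vector-width=512 or -qopt-zmm-usage=high respectively:

#include <stddef.h>

// No intrinsics, no explicit vectorization: the compiler alone decides
// whether this loop becomes 128-bit, 256-bit or 512-bit code.
void saxpy(float *y, const float *x, float a, size_t n) {
  for (size_t i = 0; i < n; i++)
    y[i] += a * x[i];
}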

Future work:

  1. It seems that there is a market for a tool that would monitor the workload of a server and identify when and why downclocking occurs.
  2. Operating systems or application frameworks could assign threads to specific cores according to the type of instructions they are using and the anticipated license.

Final words: AVX-512 instructions are powerful. With great power comes great responsibility. It seems unwarranted to disable AVX-512 by default at this time. Instead, the usual engineering evaluations should proceed.

Credit: This post was co-authored with Travis Downs.

Per-core frequency scaling and AVX-512: an experiment

Intel has fancy new instructions (AVX-512) that are powerful, in part for heavy numerical work. When a core uses the heaviest of these new instructions, the core’s frequency comes down to keep the power usage within bounds.

I wanted to test it out so I wrote a little threaded program. It runs four threads which, with luck, each end up on their own core. It helps that my Xeon W-2104 does not have hyperthreading (as far as I know), so only one thread should run per core at any time.

I use X of these cores to do heavy AVX-512 work while the rest do normal floating-point operations. My variable X varies from 0 to 4. I measure the average system frequency.

number of heavy cores   average measured frequency
0                       3.178 GHz
1                       3.073 GHz
2                       2.911 GHz
3                       2.751 GHz
4                       2.491 GHz

I could figure out the per-core frequency, but I do not have a good tool handy to do the work right now, and I don’t want to add more code to my short experiment.

Let us do some math instead.

My benchmark is not perfect. For example, the heavy threads might finish earlier than the regular threads. Still, let us assume that the heavy cores run at a frequency of 2.491 GHz while the other cores run at a frequency of 3.178 GHz, and let us compute the expected average frequency.

number of heavy cores   expected average frequency
0                       3.178 GHz
1                       3.006 GHz
2                       2.835 GHz
3                       2.663 GHz
4                       2.491 GHz
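For completeness, here is a small sketch (mine, not part of the experiment itself) that reproduces the expected column above, up to rounding, under the stated assumption that the heavy cores run at 2.491 GHz and the remaining cores of the 4-core Xeon W-2104 run at 3.178 GHz:

#include <stdio.h>

int main(void) {
  const double heavy = 2.491, normal = 3.178; // GHz, the two measured extremes
  for (int x = 0; x <= 4; x++)                // x = number of heavy cores
    printf("%d heavy core(s): expected average %.3f GHz\n", x,
           (x * heavy + (4 - x) * normal) / 4);
  return 0;
}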

My model is correct within 3%. So it is not an unreasonable mental model, on this machine, to assume that cores on which you run heavy AVX-512 run slower (at 2.491 GHz) while the rest of cores run at full speed (3.178 GHz).

My code is available, run it under Linux.

Further reading: AVX-512: when and how to use these new instructions

Science and Technology links (September 1st, 2018)

  1. Our PCs and servers run x64 processors, most of them made by Intel and AMD. In my home, all my x64 processors are made by Intel… except the processor of my PlayStation 4… so I was surprised to read that AMD and Intel each hold a market share of 50% in terms of units sold. My estimate is that Intel is at least ten times larger as a company. Intel’s processors are typically more expensive (and faster).

    Update: It turns out that this data was for one retailer. In fact, AMD has under 5% of the server market. However, all signs point to the fact that AMD’s market share is growing fast.

  2. What causes the obesity epidemic? Archer et al. think the role of diet is overblown. Rather, they believe that obesity is caused by reductions in physical activity below the Metabolic Tipping Point. Their argument is based on the fact that various human populations have had various diets (including diets rich in sugar) without triggering widespread obesity. Of course, one would need to demonstrate that physical activity has declined recently. My impression is that many people, being overweight, try to exercise more… without necessarily losing weight.

    Update: Archer clarified his position by email:

    To be precise, the major determinant of the obesity and diabetes epidemics was the loss of matrilineal and maternal metabolic control due to low levels of physical activity (PA) during the pubertal, pre-conception, and prenatal periods. Yet PA only needed to be lower in previous, not current generations.

    The non-genetic evolutionary processes of maternal effects, phenotypic evolution and accommodation (i.e., a form of canalization) allow the recapitulation (inheritance) and/or evolution of obese and metabolically compromised phenotypes without the original environmental context (i.e., low physical activity). In other words, after a few generations of offspring being born less metabolically robust, each successive generation would need to eat less and move more than the previous generation to remain at the same level of adiposity.

  3. A disproportionate number of Thai Buddhist monks are overweight.
  4. Rapamycin is a common drug given to transplantees. It seems that rapamycin is capable of rejuvenating ovaries, and thus of prolonging fertility in females. It works in mice.
  5. Paul Krugman, a celebrated economist and Nobel-prize recipient, predicts the fall of Bitcoin:

    there might be a potential equilibrium in which Bitcoin (although probably not other cryptocurrencies) remain in use mainly for black market transactions and tax evasion, but that equilibrium, if it exists, would be hard to get to from here: once the dream of a blockchained future dies, the disappointment will probably collapse the whole thing.

    My wife might make peace with the fact that some nice people once granted me a bitcoin (circa 2012), that I quickly discarded without any thought.

  6. Taleb on innovation (in his book Antifragile):

    both governments and universities have done very, very little for innovation and discovery, precisely because, in addition to their blinding rationalism, they look for the complicated (…) rarely for the wheel on the suitcase. Simplicity does not lead to laurels. (…) Even the tax-funded National Institutes of Health found that out of forty-six drugs on the market with significant sales, about three had anything to do with federal funding.

  7. Of the social science studies published in Nature and Science, about a third cannot be reproduced. More critically, in the studies that do replicate, the measured effect is only about half as large as originally reported.
  8. Daily aspirin may not help reduce cardiovascular risks. The study is strong with many participants, but there are possible problems like low adherence and insufficient dosage. My understanding is that aspirin may only help if the dosage is just right (not too high, not too low).
  9. Daily aspirin reduced deaths due to several common cancers. Benefit increased with duration of treatment and was consistent across the different study populations.
  10. Statins are regularly prescribed to people at risk of heart attacks or strokes. It is a billion-dollar industry. Okuyama et al. report that statins stimulate atherosclerosis and heart failure.
  11. Few animals have menopause. Besides human beings, it seems that a few whales also have menopause. That is pretty much all.
  12. Common viruses might put you at risk for Alzheimer’s.
  13. The co-inventor of deep learning, Hinton, writes:

    The data efficiency of deep learning will be greatly augmented in the years ahead, and its potential applications in health care and other fields will increase rapidly.

  14. The CEO of a company backed by Amazon’s Jeff Bezos is quoted by CNBC as saying:

    The time has finally arrived that our knowledge of biology and our sophistication level is sufficient that we can attack some of these fundamental, underlying causes of aging

  15. Dairy products protect you from death.
  16. C++ is a popular programming language, especially among videogame developers. I like C++ well enough when the programmers try to avoid fancy features and unnecessary abstraction. However, I have increasingly felt uneasy about the language as it seems to attract people who believe that complexity is a feature. Scott Meyers, a reputed author of books on C++ writes:

    C++ is a large, intricate language with features that interact in complex and subtle ways, and I no longer trust myself to keep all the relevant facts in mind. As a result, (…) I no longer plan to update my books to incorporate technical corrections

    I am not sure we should take Scott literally. I think he may very well be able to figure out what the C++ standard says, but he might have concluded that the interaction with people who enjoy the complexity a bit too much is too annoying.

    Here is Linus Torvalds on C++:

    (…) the only way to do good, efficient, and system-level and portable C++ ends up to limit yourself to all the things that are basically available in C. And limiting your project to C means that people don’t screw that up, and also means that you get a lot of programmers that do actually understand low-level issues and don’t screw things up with any idiotic “object model” crap.

    If you read Linus carefully, his objection regarding C++ stems from the kind of people who are attracted to the language.

    As I repeatedly write: programming is social.

AVX-512 throttling: heavy instructions are maybe not so dangerous

Recent Intel processors have fancy instructions operating over 512-bit registers. They are reported to cause a frequency throttling of the core where they are run, and possibly of other cores in some cases. Thus, it has been recommended to avoid AVX-512 instructions. I have written a series of blog posts on the topic trying to reproduce the effect. Though I can measure some level of performance degradation if I work hard at it, I simply cannot find the “obvious” (50%) performance degradation that is often advertised. I tested on two distinct processors. I tried single-threaded and multi-threaded code.

There is more to the story than appears at first.

Travis Downs wrote a fancy tool to investigate the issue. Let me reproduce some of his findings in my own words. According to Intel’s documentation, there are two types of AVX-512 instructions: light instructions (e.g., integer additions) and heavy instructions (e.g., multiplications). Heavy instructions reportedly cause a much greater frequency throttle. None of my tests showed that. Travis found that the worst throttling is quite hard to trigger:

Even a stream of 1 FMAD [fused multiply–add] every 4 or even 2 cycles doesn’t set the frequency down lower. The lowest speed is only reached if FMAs [fused multiply–add] come at a rate of more than 1 every 2 cycles.

As far as I can tell, this is absent from Intel’s documentation. If Travis is right, and I have no reason to doubt him, this means that the reported massive frequency throttling (slowest license) that we find everywhere online (including on Intel’s site) requires substantial qualification. Few people will ever achieve the rate of sustained heavy instructions that Travis documents.

For example, if you use AVX-512 for pattern matching (Intel Hyperscan), to code and decode base64, or to compress and uncompress integers, you are probably never going to trigger massive throttling. If you do a lot of cryptography, machine learning or number crunching, the story might be different.

It is important to take into account how much you gain in the first place by going to AVX-512. For example, openssl found that a particular cryptographic routine involving many multiplications ran 30% faster on a per-cycle basis with AVX-512. Once you factor in some throttling, it is easy to see how it could be wasteful. So maybe a sensible approach is to ensure that you make substantial gains when using AVX-512 if it involves many heavy instructions.

Update: The same holds true for AVX (256-bit) instructions. For AVX instructions to lead to any throttling at all, you have to sustain expensive instructions repeatedly, roughly one every 1 or 2 cycles.

Further reading: AVX-512: when and how to use these new instructions

Science and Technology links (August 24th, 2018)

  1. There is water on the surface of the Moon. This is important because if you want to build a long-term base on the Moon, having access to water is a great asset. Water can sustain life, but it can also be used to create fuel (e.g., hydrogen).
  2. Despite paying tens of thousands of dollars in tuition fees, and despite being among the very best students colleges have… a quarter of second-year medical students report that they “almost never” attend class:

    Leaders in medical education have begun to scramble. Some medical schools, like Harvard, have done away with lectures for the most part.

    If you think that it is only medical students, it is not. A 2014 study found that Harvard students (some of the best and brightest) only have a 60% attendance rate. That is, when you measure how often students attend classes in one of the best universities in the world, you find that 40% of the students skip any given class. That is despite the fact that some professors factor class attendance into the grade.

  3. Some people associate crowded cities with poor living conditions. This is an incorrect intuition. Osaka (Japan) has a higher rating than Calgary in Canada. Calgary is a fantastic Canadian town, but it is not exactly crowded. Similarly, Toronto and Tokyo are tied among the best cities in the world, despite having vastly different densities.
  4. John Ioannidis, a celebrated medical researcher, calls for us to stop using observational studies:

    Simply by observing what people eat and trying to link this to disease outcomes is moreover a waste of effort. These studies need to be largely abandoned. We’ve wasted enough resources and caused enough confusion, and now we need to refocus. Funds, resources and effort should be dispensed into fewer, better-designed, randomized trials.

    More formally, he writes:

    Some nutrition scientists and much of the public often consider epidemiologic associations of nutritional factors to represent causal effects that can inform public health policy and guidelines. However, the emerging picture of nutritional epidemiology is difficult to reconcile with good scientific principles.

    Archer et al. are equally critical:

    investigators engendered a fictional discourse on the health effects of dietary sugar, salt, fat and cholesterol when they failed to cite contrary evidence or address decades of research demonstrating the fatal measurement, analytic, and inferential flaws

    We are facing a serious problem: nutritional “science” is failing us. It is non-falsifiable, non-reproducible; it lacks rigour. It is “fake news”.

  5. With much fanfare, an article in the prestigious Lancet claims that “the level of consumption that minimises health loss is zero” using observational studies of alcohol use.

    If you do drink, should you stop?

    First, we need to point out that these conclusions are based on observational studies. Some reputed scientists believe these studies are a waste of effort.

    Second, let us take the study at its word and assume that its conclusions are entirely correct. By the authors’ own numbers, for one extra health problem to occur, 25,000 people need to drink 10g of alcohol a day for a year. Or 1,600 people need to drink 20g of alcohol a day.

    Unless you drink far more than 20g of alcohol a day, you can be certain that alcohol is not going to kill you.

    David Spiegelhalter (a famous statistician) has a more in-depth analysis on his blog. He has other interesting blog posts, including one where he shows that drinking up to the current guidelines is linked to improved cognitive performance.

    Improved cognitive performance sounds good to me!

  6. Google’s Pixel 2 phone monitors music being played and automatically recognizes it. By itself, this is not impressive in 2018. What is more impressive is that this is done entirely using an onboard database residing on the phone.
  7. We can tell how “old” a cell is by how long its telomeres are. Telomeres are the protective ends of chromosomes that grow shorter with each cell division, until the cell cannot divide anymore. Thankfully, we can “rejuvenate” a cell using telomerase. Our cells normally do not use telomerase. Some cancers make regular use of telomerase, as do stem cells. If your cells were to make more active use of telomerase, it can be argued that your cells would remain young. But some people fear that it might also make it more likely that you will die of cancer. In fact, some people believe that cell aging evolved to protect us from cancer. Yet there is not much evidence that this is the case, and rather more evidence that having short telomeres might increase your cancer rates. A recent study concludes once more that telomerase is safe:

    Given the potential cancer risk associated to telomerase expression in the organism, we set to analyze the effects of telomerase gene therapy in a lung cancer mouse model. Our work demonstrates that telomerase gene therapy does not aggravate the incidence, onset and progression of lung cancer in mice. These findings expand on the safety of AAV-mediated telomerase activation as a novel therapeutic strategy for the treatment of diseases associated to short telomeres.

    That is excellent news. We know how to administer telomerase. Telomerase extends the lifespan of mice.

  8. Clinical trials are supposed to be pre-registered. This means that, ahead of time, you must file a report where you explain what you expect to find. Yet only about two-thirds of all clinical trials are pre-registered. Government and industrial trials are more likely to be registered; academics are more likely to skip pre-registration. In effect, academics show less rigour.
  9. Rejuvenation of the brain via cell replacement to reverse age-related damage and functional decline appears to be as valid an approach as it is for most other organs and tissues.

Trying harder to make AVX-512 look bad: my quantified and reproducible results

Intel’s latest processors have fancy instructions that are part of the AVX-512 family. The AVX-512 instructions are useful for numerical work and sophisticated computing (e.g., cryptography, multimedia), but not necessarily useful for mundane tasks.

Intel documents that the use of AVX-512 instructions can lower the frequency of the processor. How big the effect is depends on the processor.

Arguably, some of the best Intel processors are the Xeon Gold processors. They are readily available at cloud vendors like Packet.

I have written a new benchmark that computes, in parallel, some complicated mathematical result. Selectively, I can insert a few AVX-512 instructions in the code. These instructions do not help the computation in any way; I only insert them in an attempt to slow down the processor as much as possible. That is, I am trying to make Intel look bad. It is a follow-up on a similar single-threaded experiment where I reported a barely noticeable effect due to AVX-512.

The more threads I use, the more work the program does. Each thread does the same work. So a 10-thread version does 10 times the work of a 1-thread version. The machine I am using has two 14-core processors. I am told that AVX-512 hurts these processors most when using 9 threads or more, per processor… so I go up to 20 threads. Intel distinguishes between simple AVX-512 instructions and heavy ones (such as multiplications).
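To give a flavour of what “a few AVX-512 instructions in a larger program” means, here is a sketch of the general idea (mine, and not the actual benchmark, whose code and scripts are linked below): mostly scalar floating-point work, with one heavy 512-bit multiplication sprinkled into each iteration purely to trigger the AVX-512 licenses.

#include <immintrin.h>
#include <stddef.h>

double mostly_scalar_with_avx512(const double *buf, size_t n) {
  __m512d acc = _mm512_set1_pd(1.0);
  double sum = 0;
  for (size_t i = 0; i + 8 <= n; i += 8) {
    for (size_t j = 0; j < 8; j++)
      sum += buf[i + j] * 0.5 + 1.0;                    // the "real" scalar work
    acc = _mm512_mul_pd(acc, _mm512_loadu_pd(buf + i)); // sporadic heavy 512-bit op
  }
  double tmp[8];
  _mm512_storeu_pd(tmp, acc);                           // fold the vector work back in
  for (int j = 0; j < 8; j++) sum += tmp[j];
  return sum;
}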

I give all my raw results and scripts, but here are the rounded numbers:

number of threads   no AVX-512   simple AVX-512   heavy AVX-512
1 thread            0.945 s      0.985 s          0.985 s
10 threads          10.8 s       11 s             11 s
20 threads          21 s         25 s             25 s

So at the extreme, doing everything I can to make Intel AVX-512 look bad using a few AVX-512 instructions in a larger program, I get a 15-20% increase in the running time.

Is that bad? If you continuously use sporadic AVX-512 instructions over many cores and they do not accelerate your software by at least 20% on a per-cycle basis, then you are losing out on this processor.

As it turns out, you can get more severely hit by heavy AVX-512, but you have to do it in a sustained way.

Further reading: AVX-512: when and how to use these new instructions

Avoid lexicographical comparisons when testing for string equality?

By default, programmers like to compare their bytes and strings using a lexicographical order. “Lexicographical” is a fancy word for “dictionary order”. That is, you compare the first element of each string and check whether they differ; if they do, you report which string is largest; if not, you repeat with the next elements and so forth.

In C and C++, there is a super fast function for this purpose: memcmp. Derrick Stolee reported to me a performance regression in Git (a well-known tool among programmers). The problem has to do with memcmp.

Let us examine the problematic function in Git:

static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
	return memcmp(sha1, sha2, the_hash_algo->rawsz);
}

This returns a lexicographical comparison between two hash values, returning a negative value when the first is smallest, zero if they are equal, and a positive value otherwise. As it is written, I do not know how to make this faster in general. It seems that we can often assume that the_hash_algo->rawsz will be 20 or 32, but that is not terribly useful.

However, let us look at an instance of how the function is used:

if (!hashcmp(sha1, pdata->objects[pos].idx.oid.hash)) {
	*found = 1;
	return i;
}

Do you see what is happening?

In this particular usage (and in others), we only check whether the two strings of bytes are identical. We do not need a lexicographical comparison.

It can be easier to decide whether two strings of bytes are identical than to compare them lexicographically. A lexicographical comparison critically depends on the order of the bytes, whereas an equality check is order-oblivious. Even if you just have 8 bytes to compare lexicographically on an x64 processor, the compiler needs three instructions because it must reorder the bytes:

bswap   rcx
bswap   rdx
cmp     rcx, rdx

In contrast, a single instruction (cmp) is needed to determine whether the two 8-byte words are identical. The advantage is even larger than that, because x64 processors can compare a register against a memory value, thus potentially saving a load operation.
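To make the point concrete, here is a minimal C sketch (mine, not from Git) of the two questions asked of the same 8 bytes: the lexicographical comparison must byte-swap on a little-endian x64 machine before comparing, while the equality test compares the words directly.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Lexicographical comparison of two 8-byte words: requires byte swaps (bswap)
// so that the first byte of the string becomes the most significant byte.
// __builtin_bswap64 is a GCC/clang builtin; assumes a little-endian machine.
static int lexcmp8(const char *s1, const char *s2) {
  uint64_t w1, w2;
  memcpy(&w1, s1, 8);
  memcpy(&w2, s2, 8);
  w1 = __builtin_bswap64(w1);
  w2 = __builtin_bswap64(w2);
  return (w1 > w2) - (w1 < w2); // -1, 0 or 1
}

// Equality test over the same 8 bytes: a single comparison, no byte swap.
static bool equal8(const char *s1, const char *s2) {
  uint64_t w1, w2;
  memcpy(&w1, s1, 8);
  memcpy(&w2, s2, 8);
  return w1 == w2;
}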

There used to be a standard C function for this purpose (bcmp) but it has been deprecated, and it is probably not highly optimized.

It is possible that your compiler is smart enough to figure out that checking that the returned value of memcmp is zero is equivalent to checking for equality. And your particular compiler might, indeed, be that smart. It is also possible that the overhead of the lexicographical order is irrelevant. But should you risk it?

So let me write something silly, assuming that we have exactly 20 bytes to compare:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Compare exactly 20 bytes for equality: two 8-byte words, then one 4-byte word.
bool memeq20(const char * s1, const char * s2) {
    uint64_t w1, w2;
    memcpy(&w1, s1, sizeof(w1));
    memcpy(&w2, s2, sizeof(w2));
    if(w1 != w2) return false;   // bytes 0-7 differ
    memcpy(&w1, s1 + sizeof(w1), sizeof(w1));
    memcpy(&w2, s2 + sizeof(w2), sizeof(w2));
    if(w1 != w2) return false;   // bytes 8-15 differ
    uint32_t ww1, ww2;
    memcpy(&ww1, s1 + 2 * sizeof(w1), sizeof(ww1));
    memcpy(&ww2, s2 + 2 * sizeof(w1), sizeof(ww2));
    return (ww1 == ww2);         // bytes 16-19
}

That should be safe and portable. I am sure that good hackers can make it faster.

How fast is it already? Quite fast:

memcmp               10.5 cycles
hand-rolled memcmp   12.8 cycles
bcmp                 10.5 cycles
check equal only      5.2 cycles

My version is twice as fast as memcmp. So while I probably couldn’t roll my own super fast memcmp function in 5 minutes, I certainly can beat memcmp with some basic code if I ask a different question instead: are the two strings of bytes identical?

I am using GCC 5.5. Your results will vary quite a bit depending on the compiler. In some settings, it will not be possible to beat memcmp at all, if the compiler is sufficiently smart. Also, there might be branching involved, so the results will depend on the statistics of your data.

Nate Lawson points out another reason to shy away from unnecessary lexicographical comparison: security. He writes:

The most important concern is if this will encourage unsafe designs. I can’t come up with a crypto design that requires ordering of secret data that isn’t also a terrible idea. Sorting your AES keys? Why? Don’t do that. (…) In any scenario that involves the need for ordering of secret data, much larger architectural issues need to be addressed than a comparison function.

My code is available.