Science and Technology links (December 15th, 2017)

  1. Scientists found a human gene which, when inserted into mice, makes their brains grow larger. David Brin has a series of classic sci-fi books where we “uplift” animals so that they become as smart as we are. Coincidence? I think not.
  2. Should we be more willing to accept new medical therapies? Are we too cautious? Some people think so:

    Sexual reproduction, had it been invented rather than evolved, would never have passed any regulatory body’s standards (John Harris)

  3. Apple has released the iMac Pro. It is massively expensive (from $5,000 to well over $10,000). It comes with a mysterious co-processor named T2, rumored to be a powerful ARM processor derived from an iPhone processor. It can encrypt your data without a performance penalty.
  4. Can playing video games make older people smarter? You might think so after reading the article Playing Super Mario 64 increases hippocampal grey matter in older adults.
  5. At a high level, technologists like to point out that technology improves at an exponential rate. A possible mechanism is that the more sophisticated you are, the faster you can improve technology. The exponential curve is a robust result: just look at per capita GDP or the total number of pictures taken per year.

    Many people like to point out that technology does not, strictly speaking, improve at an exponential rate. In practice, we experience plateaus when looking at any given technology.

    Rob Miles tweeted a powerful counterpoint:

    This concern about an ‘explosion’ is absurd. Yes, the process looks exponential, but it’s bounded – every real world exponential is really just the start of a sigmoid, it will have to plateaux. There’s only a finite amount of plutonium in the device (…) Explosives already exist, so nukes aren’t very concerning

    His point, in case you missed it, is that it is easy to rationally dismiss potentially massive disruptions as necessarily “small” in some sense.

  6. Gene therapies are a bit scary. Who wants to get his genetic code played with? Some researchers suggest that we could accomplish a lot simply by activating or turning off genes, using a variation on the technology currently used to modify genes (e.g., CRISPR/Cas9).
  7. Why do Middle Eastern girls crush boys in school?

    A boy doesn’t need to study hard to have a good job. But a girl needs to work hard to get a respectable job.

  8. Google has this cool feature whereby it automatically catalogs celebrities and displays their biographical information upon request. If you type my name in Google, right now, my picture should come up. However, the biographical information is about someone else (I am younger than 62). To make matters worse, my name comes up along with a comedian (Salvail) who was recently part of a sexual scandal. Maybe it is a warning that you should not take everything Google says as the truth? But we knew this already, didn’t we?

    In case you want to dig deeper into the problem… “Daniel Lemire” is also the name of a somewhat famous Canadian comedian. I think we look nothing alike and we have had entirely distinct careers. It should be trivial for machine learning to distinguish us.

If all your attributes are independent two-by-two… are all your attributes independent?

Suppose that you are working on a business problem where you have multiple attributes… maybe you have a table with multiple columns such as “age, gender, income, location, occupation”.

You might be interested in determining whether there are relations between some of these attributes. Maybe the income depends on the gender or the age?

That is fairly easy to do. You can take the gender column and the income column and do some statistics. You can compute Pearson’s correlation or some other measure.

If you have N attributes, you have N (N-1) / 2 distinct pairs of attributes, so there can be many pairs to check, but it is not so bad.

However, what if you have established that there is no significant relationship between any of your attributes when you take them two-by-two? Are you done?

Again, you could check for all possible sets (e.g., {age, gender, income}, {income, location, occupation}). The set of all possible sets is called the power set. It contains 2^N sets. So it grows exponentially with N, which means that for any large value of N, it is not practical to check all such sets.
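To get a feel for this growth, here is a small Python sketch enumerating every subset of a handful of attributes (the attribute names are just for illustration):

```python
from itertools import chain, combinations

def power_set(attributes):
    # Every subset, from the empty set up to the full set of attributes.
    return list(chain.from_iterable(
        combinations(attributes, r) for r in range(len(attributes) + 1)))

attributes = ["age", "gender", "income", "location", "occupation"]
subsets = power_set(attributes)
print(len(subsets))  # 2**5 = 32
```

With 5 attributes you get 32 subsets; with 30 attributes you would get over a billion.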

But maybe you think that because you checked all pairs, you are done.

Maybe not.

Suppose that x and y are two attributes taking independent random integer values, so there is no dependency between x and y. Then introduce z, which is given by z = x + y. Clearly, x and y together determine z. But there is no pairwise correlation between any two of x, y, z.

To be precise, in Java, if you do the following

Random r = new Random();
int[] x = new int[N], y = new int[N], z = new int[N];
for (int k = 0; k < N; k++) {
   x[k] = r.nextInt(); // uniform over all 32-bit integers
   y[k] = r.nextInt();
   z[k] = x[k] + y[k]; // wraps around on overflow
}

then there is no correlation between (y, z) or (x, z) even though x + y = z.

So if you look only at (x, y), (y, z) and (x, z), this tells less than you might think about (x, y, z).
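You can check this numerically. Here is a Python sketch mimicking the Java loop, with wrapping 32-bit addition; the exact correlation values vary from run to run, but they all stay near zero:

```python
import random

N = 100_000
x = [random.getrandbits(32) - 2**31 for _ in range(N)]
y = [random.getrandbits(32) - 2**31 for _ in range(N)]
# Wrapping 32-bit addition, as in Java's int arithmetic:
z = [((a + b + 2**31) % 2**32) - 2**31 for a, b in zip(x, y)]

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

# All three pairwise correlations come out close to zero...
print(pearson(x, y), pearson(x, z), pearson(y, z))
# ...even though z is fully determined by the pair (x, y).
```

The wraparound is what makes z look unrelated to x alone: given any value of x, every value of z remains equally likely.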

Thus, checking relationships pairwise is only the beginning…

No, a supercomputer won’t make your code run faster

I sometimes consult with bright colleagues from other departments who do advanced statistical models or simulations. They are from economics, psychology, and so forth. Quite often, their code is slow. As in “it takes weeks to run”. That’s not good.

Given the glacial pace of academic research, you might think that such long delays are nothing to worry about. However, my colleagues are often rightfully concerned. If it takes weeks for you to get the results back, you can only iterate over your ideas a few times a year. This limits drastically how deeply you can investigate issues.

These poor folks are often sent my way. In an ideal world, they would have a budget so that their code can be redesigned for speed… but most research is not well funded. They are often stuck with whatever they put together.

Too often they hope that I have a powerful machine that can run their code much faster. I do have a few fast machines, but they are often not as helpful as my colleagues expect.

  • Powerful computers tend to be really good at parallelism. Maybe counter-intuitively, these same computers can run non-parallel code slower than your ordinary PC. So dumping your code on a supercomputer can even make things slower!
  • In theory, you would think that software could be “automatically” parallelized so that it can run fast on supercomputers. Sadly, I cannot think of many examples where the software automatically tries to run using all available silicon on your CPU. Programmers still need to tell the code to run in parallel (though, often, it is quite simple). Some software libraries are clever and do this work for you… but if you wrote your code without care for performance, it is likely you did not select these clever libraries.
  • If you just grabbed code off the Internet, and you do not fully understand what is going on… or you don’t know anything about software performance… it is quite possible that a little bit of engineering can make the code run 10, 100 or 1000 times faster. So messing with a supercomputer could be entirely optional. It probably is.

    More than a few times, by changing just a single dependency, or just a single function, I have been able to switch someone’s code from “too slow” to “really fast”.

How should you proceed?

  • I recommend making back-of-the-envelope computations. A processor can do billions of operations a second. How many operations are you doing, roughly? If you are doing a billion simple operations (like a billion multiplications) and it takes minutes, days or weeks, something is wrong and you can do much better.

    If you genuinely require millions of billions of operations, then you might need a supercomputer.

    Estimates are important. A student of mine once complained about running out of memory. I stupidly paid for much more RAM. Yet all I had to do to establish that the machine was not at fault was to compare the student code with a standard example found online. The example was much, much faster than the student’s code running on the same machine, and yet the example did much more work with not much more code. That was enough to establish the problem: I encouraged the student to look at the example code.

  • You often do not need fancy tools to make code run faster. Once you have determined that you could run your algorithm faster, you can often inspect the code and determine at a glance where most of the work is being done. Then you can search for alternative libraries, or just think about different ways to do the work.

    In one project, my colleague’s code was generating many random integers, and this was a bottleneck since random number generation is slow in Python by default, so I just proposed a faster random number generator written in C. (See my blog post Ranged random-number generation is slow in Python… for details.) Most times, I do not need to work so hard, I just need to propose trying a different software library.

    If you do need help finding out the source of the problem, there are nifty tools like line-by-line profilers in Python. There are also profilers in R.
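To make the back-of-the-envelope step concrete, here is a small timing sketch in pure Python. Interpreted Python is far from a billion operations per second, which is itself worth knowing when you estimate:

```python
import time

# A processor core can do on the order of a billion simple operations per
# second. How fast is a plain interpreted loop by comparison?
N = 1_000_000
start = time.time()
acc = 1
for i in range(1, N):
    acc = (acc * i) % 1_000_003  # one multiply and one modulo per iteration
elapsed = time.time() - start

ops_per_second = N / elapsed
print(f"about {ops_per_second:,.0f} loop iterations per second")
# Rough estimate: a job needing a billion such operations should take about
# 1e9 / ops_per_second seconds. If it takes weeks instead, the code (not the
# hardware) is the problem.
```

If such a loop is your bottleneck, moving the work to a compiled library typically buys a factor of 10 to 100 before any supercomputer enters the picture.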

My main insight is that most people do not need supercomputers. Some estimates and common sense are often enough to get code running much faster.

Science and Technology links (December 8th, 2017)

  1. Facebook’s leading artificial-intelligence researcher Yann LeCun wrote:

    In the history of science and technology, the engineering artifacts have almost always preceded the theoretical understanding: the lens and the telescope preceded optics theory, the steam engine preceded thermodynamics, the airplane preceded flight aerodynamics, radio and data communication preceded information theory, the computer preceded computer science.

  2. In total, Sony has sold 2 million units of PSVR, its virtual-reality headset. I find the number impressive. How many people will get one for Christmas? Sadly the game line-up is unimpressive, but maybe others will see it otherwise.
  3. There are faculty positions open only to female applicants in Germany.
  4. A team from Google (Alphabet/DeepMind) has created a computer system (AlphaZero) that can learn games like Go and Chess in a few hours, based only on the rules, and then beat not only human players, but the very best software systems. In effect, they have made existing Chess and Go software obsolete. What is more, AlphaZero plays Chess in a remarkable new way:

    Imagine this: you tell a computer system how the pieces move — nothing more. Then you tell it to learn to play the game. And a day later — yes, just 24 hours — it has figured it out to the level that beats the strongest programs in the world convincingly!
    (…) Modern chess engines are focused on activity, and have special safeguards to avoid blocked positions as they have no understanding of them and often find themselves in a dead end before they realize it. AlphaZero has no such prejudices or issues, and seems to thrive on snuffing out the opponent’s play. It is singularly impressive, and what is astonishing is how it is able to also find tactics that the engines seem blind to. (…) The completely disjointed array of Black’s pieces is striking, and AlphaZero came up with the fantastic 21.Bg5!! After analyzing it and the consequences, there is no question this is the killer move here, (…) I gave it to Houdini 6.02 with 9 million positions per second. It analyzed it for one full hour and was unable to find 21.Bg5!!

    Nahr points out another remarkable fact about AlphaZero:

    What’s even more remarkable than AlphaGo Zero’s playing strength is the puny hardware it runs on: one single PC with four specialized TPUs, no distributed network needed

    AlphaZero has made the work done by many artificial-intelligence engineers obsolete. It is likely that many other games will follow in the near future. This is a remarkable breakthrough. Their paper can be found on arXiv.

  5. Qualcomm announced its latest mobile processor, which will find its way into many smartphones next year:

    The Adreno 630 GPU will be able to push 4K video at 60 FPS, but on top of that, display a split screen for VR of 2K x 2K (each eye) at 120 FPS. The icing on the cake comes from the ability to handle 10-bit color depth as well, for HDR10 UHD content. (…) Another addition to the chip is an increased presence of machine learning hardware (…) with the increased presence of VR and AR (slowly becoming XR for miXed Reality), being able to see the outside world and understand what’s going on, requires a major shift in processing capabilities, which is where machine learning comes in.

    You will be excused if you don’t understand everything about this processor, but if you are buying a high-end phone next year, that is what you are getting inside. You are getting a computer capable of doing advanced in-silicon artificial intelligence, a system supporting high-quality virtual reality and a machine able to display video at a quality that exceeds that of most televisions. These same processors will probably be dirt cheap in a few years. Anyone who thinks that virtual reality is a fad should pay close attention to what is being built in silicon right now.

  6. Our cells are powered by their mitochondria which are like “tiny cells” that live inside our cells. It is possible to improve our mitochondria with small molecules and direct supplementation. In mice, it seems to lead to a healthier brain. It also seems to heal human heart cells.
  7. One PlayStation 4 has more memory than all of the Atari 2600 consoles ever manufactured.
  8. Miltner asks: Who Benefits When Schools Push Coding? Are computers in the classroom more helpful to students or to corporations? She is right, of course, that computers don’t, by themselves, make people smarter… and I am very critical of attempts to turn random young people into “coders”.
  9. Apple published an interesting paper that describes how it can learn from your data while preserving your privacy, and do so economically.
  10. Microsoft is relaunching Windows for ARM processors. Most PCs run on Intel (or AMD) processors. Most mobile devices run on ARM processors. Microsoft wants you to run Windows on a laptop with ARM processors, thus cutting off its dependency on Intel. Though Intel still makes the very best PC processors money can buy, there is a sense that ARM processors will soon catch up and beat Intel, maybe. It is already the case that my iPad Pro with its ARM processor could run circles around many Intel-based laptops. One argument that is made in favor of ARM processors is their relative low power usage. ARM processors are now trying to enter the server market (Qualcomm recently proposed a 24-core processor) by putting together many low-power processors. It seems that we are experiencing a shift in hardware design that is beneficial to ARM processors: we no longer care so much about having one very fast processor… we prefer to have many moderately fast ones. I am not sure what is driving this apparent shift.
  11. A bitcoin is now worth $15,000. An app to buy and sell bitcoins is the most popular iPhone app. I once owned a bitcoin, it was given to me. For a long time, they were very cheap. I did not even keep a record of it, and it is now gone. My wife is very angry at me. I don’t understand why people are willing to pay so much for bitcoins, but then I never tried to be a good capitalist.
  12. Though gene therapy has had few successes in the last decades, we have had four remarkable therapeutic breakthroughs in a little over a month: hemophilia, spinal muscular atrophy, retinal dystrophy…
  13. Though we can measure human intelligence, we don’t know what makes us more or less intelligent at the biological level. Some German scientists think that they have found the answer: it has to do with information flow in the brain.
  14. Metformin is a diabetes drug that is believed to have anti-aging properties. It is somewhat toxic for mice, but by administering it every other week to old mice, scientists got healthier old mice according to an article in Nature.
  15. Consuming lots of sugar puts you at risk for heart disease.
  16. A woman with a transplanted uterus gave birth.
  17. In the US, there are more Netflix subscriptions than cable TV subscriptions.
  18. Amazon released Cloud9, a programming environment that allows you to code and build your code with just a browser. They have also released a service called SageMaker to make it easier to build and deploy machine learning on Amazon’s infrastructure.
  19. Arveson challenges the hypocrisy of American colleges regarding graduate students and tuition fees. Indeed, most colleges waive tuition fees for graduate students. So why pretend to charge it in the first place? She writes:

    The most self-serving reason university administrators continue to charge tuition, though, is to use the fact that they waive payment of it as propaganda.

  20. South Korea is a major power in technology… yet South Korea needs “change to its conformist culture and rigid education system, which stymie creativity” according to the Financial Times.

Simplistic programming is underrated

I was a nerdy kid who liked to spend a lot of time reading. Back then, we did not have the Internet, we could not even imagine it… so I ended up reading the dictionary and my encyclopedia. I had a weird vocabulary. I could easily string together sentences that nobody around me could understand. Adults would reward me with special attention when I used words that they did not know.

That was not good.

I also learned to program relatively young. Information was scarce but I quickly absorbed everything I could. When I learned about structured programming, I rejected anything “more primitive”. Then I learned about object-oriented programming, and it was the end of structured programming. Then I learned about functional programming and thought that I had once more reached another level. I learned about metaprogramming and became a fan. And so forth.

That was not good.

I still write using big words sometimes, but never intentionally. I try to write short and simple sentences.

It took me a long time to figure out that the same is true with programming. If you can write a program using nothing but the simplest syntax, it is a net win.

I should explain. It is absolutely true that if you deploy a larger vocabulary, if you use longer, more pompous sentences, many people will think you are smarter. The same is true with programming. If you can cram metaprogramming, pure functional programming, some assembly and a neural network into one program, many programmers will be impressed by your skills.

However, there are important downsides:

  • Your apparent mental prowess will fail to impress those who find it easy to do the same. I have met my share of college students and professors who excel at dropping the names of a few philosophers, at using words only 1% of the population knows… but, at a certain level, it does nothing for them. People simply roll their eyes and move on… If you have a job interview with Jeff Bezos or Peter Thiel, quoting famous philosophers or using big words might very well backfire.

    Exactly the same argument works for programming. You might impress your peers with your fancy use of closures… but this no longer works so well on people who have known for a few decades what closures are. You simply won’t convince Linus Torvalds that you are hot because you use all of the features of the latest programming languages.

    If you are used to achieving success by appearing smart… you might hit a ceiling, and you won’t even understand what is happening. There is a difference between appearing to be smart, and being smart.

    Really smart people have no need to show off. If you are showing off, you are broadcasting that you aren’t really smart.

  • Big words and fancy programming techniques are asocial. They turn you into a jerk. Even the people who think that you are smart won’t enjoy working with you. We might be impressed by people who use big words, but we don’t want to hang out with them. It is annoying to collaborate with programmers who throw the big guns every little chance they get.
  • Complexity scales poorly. It is much easier to build on your previous work if it is simple. There is a reason we still teach Newton’s three laws. They are powerful because they can be expressed so simply. A simple piece of code that uses few features is easier to reuse.

I like the concept of “simplistic programming”, by which I mean “programming that is so simple that people will criticize you for it”. At first, that sounds strange… can we really get criticized for being “too simple”? Of course we can.

Science and Technology links (December 1st, 2017)

  1. Chollet has a piece on the impossibility of intelligence explosion. He is responding to the theory that smart machines will build even smarter machines, and that soon, human beings will become obsolete. He convincingly tears apart this theory. Human brains might be obsolete one day, but it is not as simple as setting self-improving artificial intelligence in motion. He writes: we are our tools. He means that the model of intelligence as a “brain in a jar” is hopelessly naive. I made much the same points in a blog post published a week before Chollet’s essay: You are your tools. So to acquire more intelligence, we need to build more tools, and this takes time. Einstein would not have done so well on a deserted island. I do part company with Chollet when he writes:

    Yet, modern scientific progress is measurably linear. I wrote about this phenomenon at length in a 2012 essay titled “The Singularity is not coming”. We didn’t make greater progress in physics over the 1950–2000 period than we did over 1900–1950 — we did, arguably, about as well. Mathematics is not advancing significantly faster today than it did in 1920

    What Chollet fails to appreciate, I fear, is the nature of progress itself. If we have chairs in 1750, it is not the case that chairs will get exponentially more comfortable and cheaper over time so that a chair in 2017 should be expected to nourish all your muscles for free. Bacteria that existed long before there were multicellular organisms are still around. If you go see a play today, it is not “exponentially better” than a play organized by Molière. Antibiotics are probably cheaper than they were in 1945, but they are not necessarily even better. Progress is about enabling new things. That’s how the exponential comes about: the number of new things you get grows and grows over time… at an exponential rate. Chollet is a great programmer who has produced software tools that are then used by lots of other engineers to build better tools, and so forth. What is being built is not just “better software”… it is software that solves problems we could not solve before. Just as one way of doing things becomes optimal, new alternatives open up. That’s called open-ended evolution and it is definitely happening.

  2. The sales of virtual-reality headsets have exceeded 1 million in the third quarter of this year. Half of these headsets are PlayStation VR headsets. If you achieve 1 million units per quarter, then you are at four million units per year. I have a bet with Greg Linden that we will soon achieve 10 million units a year. We are almost half-way there, so I think I still stand a chance of winning if there are interesting developments (like great games) in the near future.
  3. CRISPR/Cas9, discovered in 2012, is a technique to edit genes. It is now relatively cheap and it is likely to be used in a few years to correct genetic defects. It works like a search and replace function. I wondered how fast it is:

    it takes as long as six hours for Cas9 to search a bacterium, that is, through four million base pairs. (…) The results show that the price Cas9 pays for its flexibility is time, (…) To find the target faster, more Cas9 molecules searching for the same DNA sequence are needed. (…) Most proteins that search DNA code can recognize one specific sequence merely by sensing the outside of the DNA double helix, Cas9 can search for an arbitrary code, but to determine whether it is in the right place, the molecule has to open the double DNA helix and compare the sequence with the programmed code. The incredible thing is that it can still search the entire genome without using any energy.

    I am no geneticist, but as a computer scientist, the thought that it takes roughly an hour to process a million base pairs has me concerned. Human beings have 3 billion base pairs per cell. Thankfully, the problem can be parallelized: you can use several Cas9 molecules.

  4. It looks like reducing the amount of iron in our brains could be useful in preventing cognitive decline. It is going to be re-tested in the context of Alzheimer’s. Much of our food has been supplemented with iron. Indeed, actual iron extracted from the ground is mixed with our food (e.g., bread, cereal). We tend to accumulate iron because the body has a hard time excreting it. Though we need some iron to be healthy, too much iron in some tissues could be harmful.
  5. In Switzerland, a quarter of all clinical trials are never completed, according to a recent article. The article reviews all factors involved in such a high failure rate. Note that these are not mere failures to get scientific results… they are failures to even complete the scientific work. Some of the factors involved are intriguing. For example, if you want to run clinical trials in multiple cities, then you need ethics approvals in each city. Then there is the set of personal incentives which favor competition between researchers rather than broad collaboration.
  6. As men grow older, their testosterone levels fall and this leads, among other things, to muscle loss and sexual difficulties. Magnan observes that some men maintain high levels of testosterone well into old age. My instinct would be to go see these old men and find out how exactly they differ. Is there a genetic component involved?
  7. Kagan believes that we give pills too eagerly to school-age kids. He believes that labelling kids as having a deficit disorder, when their dopamine levels are perfectly fine, is bad for the kids. They are likely to internalize that there is something wrong with them. An interesting point he makes is that the increased number of prescriptions has a financial origin: insurance is likely to cover the cost, so it is free to the parents and to the school. I don’t know why we expect kids to sit at desks all day while listening to what an adult has to say. It does not sound like fun at all!
  8. According to Papadimitriou, many of us may not be getting enough vitamin D:

    Since 2006, type 1 diabetes in Finland has plateaued and then decreased after the authorities’ decision to fortify dietary milk products with cholecalciferol. The role of vitamin D in innate and adaptive immunity is critical. A statistical error in the estimation of the recommended dietary allowance (RDA) for vitamin D was recently discovered; in a correct analysis of the data used by the Institute of Medicine, it was found that 8895 IU/d was needed for 97.5% of individuals to achieve values ≥50 nmol/L. Another study confirmed that 6201 IU/d was needed to achieve 75 nmol/L and 9122 IU/d was needed to reach 100 nmol/L. The largest meta-analysis ever conducted of studies published between 1966 and 2013 showed that 25-hydroxyvitamin D levels <75 nmol/L may be too low for safety and associated with higher all-cause mortality (…)

  9. The ancient Greeks proved that the only regular polygons that tile the plane are triangles, quadrilaterals and hexagons (as now seen on many a bathroom floor). My wife asked: “how did they prove it?” I don’t know how the Greeks proved it. I lazily looked up the answer online. Here is the gist of it. In a regular tiling, look at a vertex where the polygons meet: the inner angles there must sum to 360 degrees. A regular polygon with n sides has inner angles of (n − 2) × 180/n degrees, so if k polygons meet at a vertex, then k = 2n/(n − 2) must be an integer of at least 3. Hexagons have inner angles of 120 degrees, so exactly three of them can meet at a vertex. Polygons with more sides have inner angles larger than 120 degrees (approaching a flat 180 degrees), so at most two of them can meet at a vertex, and two inner angles of less than 180 degrees each cannot sum to 360 degrees. With such arguments, you can construct a formal proof. But that’s not how the Ancient Greeks did it, surely?
  10. In a clinical trial, it was found that consuming twice the recommended amount of protein improved muscle mass. This suggests that we should eat more protein.
  11. Doctors trust male surgeons more.
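The vertex-angle argument in item 9 is easy to check mechanically; a short Python sketch:

```python
# Which regular n-gons tile the plane? k copies meeting at a vertex need
# k * (n - 2) * 180 / n == 360, i.e. k = 2n/(n - 2) must be an integer >= 3.
tilers = []
for n in range(3, 1000):
    k, remainder = divmod(2 * n, n - 2)
    if remainder == 0 and k >= 3:
        tilers.append(n)
print(tilers)  # [3, 4, 6]
```

For n > 6, k = 2 + 4/(n − 2) falls strictly between 2 and 3, so no larger polygon can qualify.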

Bit hacking versus memoization: a Stream VByte example

In compression techniques like Stream VByte or Google’s varint-GB, we use control bytes to indicate how blocks of data are compressed. Without getting into the details (see the paper), it is important to map these control bytes to the corresponding number of compressed bytes very quickly. The control bytes are made of four 2-bit numbers and we must add these four 2-bit numbers as quickly as possible.

There is a related Stack Overflow question from which I am going to steal an example: given the four 2-bit values 11 10 01 00, we want to compute 3 + 2 + 1 + 0 = 6.

  • How do we solve this problem in our implementation? Using table look-ups. Basically, we precompute each of the 256 possible values and just look them up in a table. This is often called memoization. It works fine and a lot of fast code relies on memoization, but I don’t find it elegant. It makes me sad that so much of the very fastest code ends up relying on memoization.
  • What is the simplest piece of code that would do it without table lookup? I think it might be
     (x & 0b11) + ((x>>2) & 0b11) + ((x>>4) & 0b11) + (x>>6). 
  • Can we get slightly more clever? Yes, aqrit and Kendall Willets came up with a fancier expression involving two multiplications:
    ((0x11011000 * ((x * 0x0401) & 0x00033033)) >> 28).

    The compiler might implement a product like x * 0x0401 as a shift and an addition. Nevertheless, it is not obvious that two multiplications (even with optimizations) are faster than the naive approach, but it is really a nice piece of programming. I expect that most readers will struggle to find out why this expression works, and that’s not necessarily a good thing. (John Regehr points out that this code has undefined behavior as I have written it. One needs to ensure that all computations are done using unsigned values.)

  • In Stream VByte, the control bytes are organized sequentially which means that you can use another fancy approach that processes four bytes at once:
    v = ((v >> 2) & 0x33333333) + (v & 0x33333333);
    v = ((v >> 4) & 0x0F0F0F0F) + (v & 0x0F0F0F0F);
    

    where the variable v represents a 32-bit integer. You could generalize to 64-bit integers for even better speed. It might be slightly puzzling at first, but it is not very difficult to work out what the expression is doing.

    It has the benefit of being likely to be faster than memoization, but at the expense of some added code complexity since we need to process control bytes in batches. There is also some concern that it could suffer from uneven latency, with the first length in a batch of four being delayed if we are not careful.

    We could modify this approach slightly to compute directly the sums of the lengths, which could be put to good use in the actual code… but it is fancy enough as it stands.
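These expressions are easy to get wrong, so here is a Python sketch that cross-checks the naive, aqrit-Willets and batch approaches against a memoized table over all 256 control bytes (the masks emulate the 32-bit unsigned wraparound that C gives you for free):

```python
M32 = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic

# Memoization: precompute the sum of the four 2-bit fields of each byte.
table = [sum((x >> s) & 0b11 for s in (0, 2, 4, 6)) for x in range(256)]

def naive(x):
    return (x & 0b11) + ((x >> 2) & 0b11) + ((x >> 4) & 0b11) + (x >> 6)

def aqrit_willets(x):
    return ((0x11011000 * ((x * 0x0401) & 0x00033033)) & M32) >> 28

def batch(v):
    # Sum the 2-bit fields of four control bytes at once: the result holds
    # one per-byte sum in each byte of the 32-bit word.
    v = ((v >> 2) & 0x33333333) + (v & 0x33333333)
    v = ((v >> 4) & 0x0F0F0F0F) + (v & 0x0F0F0F0F)
    return v & M32

for x in range(256):
    assert naive(x) == table[x] == aqrit_willets(x)

# Four control bytes packed into one 32-bit word:
v = 0xE401FF00  # bytes 0xE4, 0x01, 0xFF, 0x00
sums = batch(v)
print([(sums >> s) & 0xFF for s in (24, 16, 8, 0)])  # [6, 1, 12, 0]
```

None of the intermediate per-nibble sums can exceed 12, so no carries spill between fields; that is why both bit hacks are exact.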

I could imagine quite a few more alternatives, including some that use SIMD instructions, but I have to stop somewhere.

So how fast are these techniques? I threw together a quick benchmark to measure the throughput. I am using a recent (Intel Skylake) processor.

memoization       1.7 cycles/byte
naive             2.6 cycles/byte
aqrit-Willets     3.1 cycles/byte
batch (32-bit)    1.4 cycles/byte

Sadly, the aqrit-Willets approach, despite its elegance, is slower than the naive approach in this benchmark. The batch approach is fastest.

Because the batch approach could be made even faster by using 64-bit words, it would be my best choice right now to replace memoization if speed were my concern. It illustrates how there are potential benefits in a data layout that allows batch processing.

This microbenchmark reinforces the view that memoization is fast, as it does well despite its simplicity. Unfortunately.

Update: On Twitter, Geoff Langdale described a fast vectorized approach using SIMD instructions. An approach similar to what he advocates is described in the paper Faster Population Counts Using AVX2 Instructions.

Science and Technology links (November 24th, 2017)

Women earned majority of doctoral degrees in 2016 for 8th straight year and outnumber men in grad school 135 to 100.

Materialists use Facebook more frequently, because they compare themselves to others, they objectify and instrumentalize others, and they accumulate friends.

The modern office chair, with wheels, was invented by Charles Darwin. Or so says a Wikipedia article.

The famous Harvard professor Clayton Christensen says that half of all colleges are bound for bankruptcy. (I am skeptical.)

We can rather easily multiply the lifespan of worms. We might ask though whether such actions actually slow down aging, or just prevent death. The former is true. Researchers found enhanced organ functionality in older, long-lived mutants.

I have always been fascinated by how poor synthesized speech sounds. For all the progress made so far, Siri still sounds like a machine. DeepMind had demoed better sounding synthesized speech a few months ago, but it was impractical computationally. They have now announced that they have a computationally practical version. Thus you can safely predict that within a few short years, synthesized speech coming out of most of your devices will pass the Turing test: you won’t be able to differentiate the voice coming out of your smartphone from an actual human voice.

Echoing Tyler Cowen, Tim Harford reports on research that found that

Companies still invest heavily in innovation, but the focus is on practical applications rather than basic science, and research is often outsourced to smaller outfits whose intellectual property can easily be bought and sold.

If you think that the solution is more government research grants… please consider that the surest way to be denied a government research grant is to propose a project that has a good chance of failing. Failure must not be an option if you want your research grant to be a success. It is that simple: governments are risk averse. And probably rightly so… when governments take chances, things often end poorly.

The data is in: coffee is healthy.

I keep telling people that scholars routinely cite papers that they have never read. I often get incredulous looks. Well. A made-up article got almost 400 citations.

About half of all men (me included) will suffer from male-pattern baldness. We still don’t know exactly what causes it and we do not have a handy cure for it. We have now explained 38% of the risk using genetic analysis.

Professor Kambhampati is a pacifist who does not support the campaign to stop “killer robots”. I share many of his thoughts.

Stress is good: Mitochondrial stress enhances resilience, protects aging cells and delays risk for disease.

Bees can be left or right handed.

The sex of the mice handlers seems to make a difference in drug experiments. (I am skeptical.)

How often do superior alternatives fail to catch on?

Many of us rely on a Qwerty keyboard, at least when we are typing at a laptop. It is often said that the Qwerty keyboard is inferior to clearly better alternatives like the Dvorak keyboard. However, this appears to be largely a myth backed by dubious science.

There is the similar, often-repeated story of VHS versus Betamax, from the days when people recorded video on tapes. The story goes that Betamax lost to VHS despite being technically superior. But VHS tapes could record whole 2-hour movies whereas Betamax could not: so VHS was indeed superior.

It is often said that birds have lungs far superior to those of mammals. So mammals are failures compared to birds… However, bats (which are mammals) have lungs superior to those of either terrestrial mammals or birds. This suggests that mammals can acquire better lungs when they need them.

I fear that many of the stories about us being stuck with inferior products due to market failures, or about animals being stuck with inferior organs due to evolutionary dead-ends, might actually be weak or false stories.

Credit: This post was inspired by an email exchange with Peter Turney.

You are your tools

I believe that there are no miracle people. When others get the same work done as you do, only much faster, they are almost surely using better tools.

Tools are not always physical objects. In fact, most tools are not physical per se. For example, mathematics is a great tool. Word processors are another tool. Google is also a tool.

Intellectuals have tools to help them be productive. They have books. They have computers. They have software. They also have models, frameworks, and theories.

For example, I studied Physics, so I learned about how physicists think… and it is not how most people think. They have these tricks which turn difficult problems into far easier problems. The main lesson I took away from Physics is that you can often take an impossibly hard problem and simply represent it differently. By doing so, you turn something that would take forever to solve into something that is accessible to smart teenagers.

To illustrate what I have in mind… most people who have studied mathematics seriously, even teenagers, can quickly sum up all numbers in a sequence. For example, what is the sum of the numbers between 1 and 99? That sounds hard? So maybe you can look up a formula online. Maybe. But once you know the “trick”, you can do it in your head, quickly, without effort. There is no miracle involved. To sum up the numbers between 1 and 99, just pair up the numbers. You pair 1 with 99, 2 with 98… and so forth, up to 49 and 51. So you have 49 such pairs, and each pair sums up to 100 (99+1, 98+2,…). So you have 49 times 100, which is 4,900. Then you have to add the remaining number (50), so that the sum is 4,950.

We don’t know yet what intelligence is. It is not something as simple as how many neurons you host in your neocortex… Dolphins have more such neurons than you do. It is probable that, in time, we will see that what defines intelligence is our ability to build upon new tools.

For some reason, the smartest among us have access to better tools. And that’s ultimately why they can run circles around you and me.

They can’t easily transmit their tools; it takes work, but it tends to happen. A few hundred years ago, most people could not read and write. It was widely believed that most people could never learn to read and write. Until fairly recently (i.e., a handful of centuries), the ability to read was regarded as a very rare trait, a sure sign of high intelligence. We now expect even the dumbest kids in high school to read.

Summing up the numbers between 1 and 100 in your head was, no doubt, a great feat once upon a time. Today it is something that all kids in Singapore know how to do.

You should be constantly trying to expand the number of tools at your disposal. It is a particular version of the growth mindset: the belief that you should always seek to better yourself, by acquiring new tools.

You might reasonably ask… “I have whatever tool that I learned to use, and it is good enough for what I do usually. Why would I invest in learning something new if I don’t feel any urgent need to do so?”

My answer is that acquiring new tools is the surest way to get smarter.

Further reading: Stop Using Excel, Finance Chiefs Tell Staffs at the Wall Street Journal.

Relevant quotes:

We have become the tool of our tools. (Henry David Thoreau)

We shape our tools and, thereafter, our tools shape us. (John Culkin, also attributed to Marshall McLuhan)

Our tools are better than we are, and grow better faster than we do. They suffice to crack the atom, to command the tides, but they do not suffice for the oldest task in human history, to live on a piece of land without spoiling it. (Aldo Leopold)