Why do students pay for the research professors do?

Universities require their professors to publish research papers. Yet publishing your research has little to do with most of the teaching that goes on in universities. And with online teaching, we can almost completely separate teaching from research. Yet we are typically happy to dismiss these concerns by pointing out that universities have also a research purpose. But this answer is not entirely satisfying: who gets to decide what universities should do beside provide degrees and teaching?

There was a long student boycott in Quebec in 2012 that attracted worldwide attention. Students asked for free higher education. One of their core arguments for cheap tuition was that much of the university budget goes to support research. Apparently, many students would rather not pay for research. That is, if universities have to do research, then it is up to the government (or to industry) to fund it.

I think that the Quebec students are misguided when it comes to research. So let us ask why universities do research.

How did we get started? The first universities were entirely funded by the students. In fact, they were often run by the students themselves. Yet, even back then, the professors published and communicated their research eagerly. Why?

There would be great savings if we could just get rid of research. It is easy to find people willing to lecture for a fourth of the salary of a professor.

But professors are not lecturers even though they sometimes lecture. Students seek to be close to people who are perceived as leaders in their respective area. They do so because recognition from such a leader is highly valued socially. And to recruit and retain leaders, you need to pay a decent salary.

The principle is general. If you are a well-known porn star, it is likely that there are people who will pay just to get some coaching from you, precisely because you are known as a porn star. So, a computer science professor should try to be known as a computer scientist. Then students who want to become computer scientists will want to have access to this professor.

Publishing is a very natural process if you want to build up your reputation. In fact, many people who write non-fiction books do so because it will attract indirect sources of income such as consulting and training. Professors are not different: they write books and research articles because it increases their social status. In turn, this social status can be monetized.

Thus, if you want to know whether a professor is doing a decent job, you should ask whether people would be willing to pay to interact with him if we did not have universities. A good professor should be able to fund much of his research with consulting and training contracts, should he ever lose his academic position. Hence, his employer gets a fair deal even if it has to allow him to spend a great deal of time on self-directed research.

Students who are merely interested in some cheap teaching can find affordable people willing to tutor them. But that is not what motivates students to attend universities. They seek prestige. This is is why professors have to act as role models. That is why we need to pay them to publish.

My argument is falsifiable. The Quebec government could create new universities that are focused entirely on teaching. They would do no research, but they would have lower tuition fees. If students really do not care for research, they should all flock to these new universities. Of course, teaching-only universities do exist and though they attract their fair share of students, they have never disrupted conventional universities.

A simple trick to get things done even when you are busy

I have been overwhelmed with busy work for the last couple of weeks. Unavoidably, this has meant shorter emails, fewer follow-ups with collaborators and not as much blogging as I would like. I probably missed a few commitments as well.

However, despite all the turmoil of responsibilities, committees and grading, I still get a little bit of core research done every week. I do have a little secret weapon. It will likely only work with one aspect of your life, but it will work well. It is called “pay yourself first”.

Imagine a typical workday. You look at your schedule and it is crazy. At some point, you have half an hour that you can spend how you want. What do you do? Probably you have 50 unanswered emails, a few forms to fill out, some advice to give out, a grant proposal to review… What these tasks have in common is that they don’t require a lot of energy but they consume time. They also tend to be important for someone else. They are tempting because with relatively little energy, you can reduce the length of your to-do list and immediately feel good about yourself. And someone might notice right away that you did them! These tasks are like candies. You feel good and energetic when doing them, but they drain you out eventually.

Instead, look at what you figured was important in your life. For example, working out, writing a novel or a research paper. Do spend time doing this important work first. I do not mean that you should sacrifice the other aspects of your life, only that in your schedule, you should complete some tasks before the others.

The key insight is to recognize that there are different types of work, just like there are different types of food, and that the order in which you do your work matters. If you eat dessert first, you will never be hungry the meat and the vegetables.

Rationally, it does not seem to make sense. Why would it matter when you work on your novel? But consider that busy work has a way to expand and take up all your time. It is because it is so tempting to work on short-term problems, and there is never a shortage of those.

I suspect that this issue has deep roots. Our ancestors were probably most successful when they worried most about day-to-day issues. We are tempted to do the same.

Why I like the new C++

I was reading Vivek Haldar’s post on the new C++ (C++11) and I was reminded that I need to write such a post myself.

C++ is a standardized language. And they came up with a new version of the standard called C++11. Usually, for complex languages like C++, new versions tend to make things worse. The reason is simple: every vendor is eager to see its features standardized. So the whole standardization process becomes highly political as mutually contradictory features must be glued on. But, for once, the design-by-committee process worked really well: C++11 is definitively superior to previous versions of C++.

In one of my recent research projects, we produced a C++ library leveraging several of the new features of C++11. The use of C++11 is still a bit problematic. For example, our library only compiles under GCC 4.7 and better or LLVM 3.2.

So, why did we bother with C++11?

1. The move semantics

In conventional C++, values are either copied or passed by reference. If you are adding large objects (e.g., the data of an image) to a container such as an STL vector, then they are necessarily copied unless you do tricks involving pointers and other magical incantations. Performance-wise, this is absolutely terrible!

The new C++ introduces the move semantics: you can move data to a container (such as an STL vector) without copying it! For example, the following code took over 4s to run using the regular C++, and only 0.6s using C++11: it is a performance boost by a factor of 5, without changing a line of code.

vector<vector<int> > V;
for(int k = 0; k < 100000; ++k) {
    vector<int> x(1000);

2. No hassle initialization

Initializing containers used to be a major pain in C++. It has now become ridiculously easy:

const map<string,int> m = {{"Joe",2},{"Jack",3}};

There are still a few things that I could not get to work, such as initializing static members within the class itself. But, at least, you no longer waste minutes initializing a trivial data structure.

3. Auto

STL uses iterators. And iterators are cool. But old C++ forces you to fully qualify the type each time (e.g., vector::iterator) even when the compiler could deduce the type. It gets annoying and it makes the code hard to parse. C++11 is much better:

vector<int> x = {1,2,3,4};
for(auto i : x)
  cout<< i <<endl;

These lines of code initialize a container and print it out. I never want to go back to the old ways!

4. Constexpr

Suppose that you have a function that can be called safely by the compiler because it has no side effect: most mathematical functions are of this form. Think about a function that computes the square root of a number, or the greatest common diviser of two numbers.

In old-style C++, you often have to hard-code the result of these functions. For example, you cannot write enum{x=sqrt(9)}. Now you can!

For example, let us define a simple constexpr function:

// returns the greater common divisor
constexpr int gcd(int x, int y) {
    return (x % y) == 0 ? y :  gcd(y,x % y);

If gcd was just any function, then it might be called multiple times in the following loop, but thanks to C++11, it will never get called when the program is running (just once by the compiler):

for(int k = 0; k < 100000; ++k) {
    vector<int> x(gcd(1000,100000)); 

(Naturally, a buggy C++ compiler might fail to optimize away the constexpr function call, but you can then file a bug report with your compiler vendor.)


C++11 is not yet supported by all compilers, but if you are serious about C++, you should switch to C++11 as fast as possible.

As usual, my code is available from github.

What I do with my time

I am not a very productive person. I also do not work long hours. However, I sometimes give the impression that I am productive. I have been asked to explain how I do it. Maybe the best answer is to explain what I do on a daily basis.

We are Wednesday. Here is what kept me busy this week:

  • Every night I spend an hour playing video games with my two sons. We finished Xenoblade a few days ago. It took 95 hours to finish the game, so about 95 days. We are playing Oblivion right now.
  • This week I spent an afternoon grading papers.
  • I spent probably an afternoon preparing a new homework question for my database course.
  • I spent almost an entire day reading a Ph.D. thesis and writing up a report.
  • I spent a morning running some benchmarks on an AMD processor. It involved some minor C programming.
  • I am writing this blog post and I wrote another one earlier this week. I often spend 3 hours blogging a week. Sometimes less, sometimes more. I also read comments, here and on social networks and I try to react.
  • I am part of some federal committee offering equipment grants. It took me a few hours to fill out a form this week.
  • I have been asked to be on the tenure review committee for the business school as an external. I have reviewed 4 or 5 files, looked up research papers, read the corresponding c.v.s and written up some notes.
  • I made bread twice this week. I make all the bread our family eats.
  • I spent a few hours working on furniture. I am building my own furniture for our living room as well as a few custom things for the house.
  • I spent a few hours on Google+ arguing with people like Greg Linden.
  • I spent a few hours exchanging emails with various people including graduate students.
  • Because I chair a certificate program, I had to answer a few questions from students. This afternoon, I wrote a long email to help the program coordinators. We are going to build some kind of FAQ for students.
  • I was asked whether my graduate data warehouse course would be offered next term. I explained to the chair of the IT M.Sc. program that it would be offered but that students can’t register right now.
  • Because there is much demand for this upcoming graduate data warehousing course, I prepared a long email with supporting documents to help move this along. I will be offering three graduate courses next term. And yes, I do all the grading myself. I am currently offering two though most of my teaching time is used up by the undergraduate database course that I am offering for the first time this term.
  • I spent a few hours this week arguing with a database student that, well, it is not ok if he is weak in mathematics. He needs to build up his expertise.
  • Someone submitted a problem to me about transcoding UTF-8 to UTF-16. We exchanged a few emails and I proposed a data structure along with a reference paper. This may eventually become a blog post.
  • I spent some time worrying that I am still without news about a paper I submitted 9 months ago to a journal.
  • I sometimes supervise Ph.D. students in the cognitive computer science program. The program is being reviewed right now. I spent my morning on Monday at a meeting with external experts.
  • Tomorrow, I have two administrative meetings occupying the full day.
  • I reviewed a report and some code from a Ph.D. student I co-supervise.
  • I picked up my kids from school yesterday and today. My wife did it Monday and she will do it again tomorrow.
  • I watched a dozen videos on YouTube. I have this amazing (but cheap) TV that can display YouTube videos. I really liked Using Social Media To Cover For Lack Of Original Thought.

I would qualify the current week as busy, but not extraordinarily so. It would be a much more relaxed week if I did not have a full day of meetings tomorrow.

Some things that I have not done:

  • I have not checked that this blog post has good spelling and grammar.
  • Before going to sleep, I watch a TV show or read books on my iPad. These days, I am watching Once upon a time. However, I do not watch broadcast TV.
  • I owe a couple M.Sc. students a meeting. I promised to email them last week, but I have not yet done so.
  • I have delayed a few meetings that were supposed to happen this week.
  • I was planning to do some follow-up work this week on a couple of research projects, but it looks doubtful that I will have any time at all. I constantly worry that I am just keeping busy instead of doing the important work.
  • I am trying to read The Art of Procrastination: A Guide to Effective Dawdling, Lollygagging and Postponing, but I am not making much progress. (This is not a joke.)
  • I used to spent a lot of time following blogs. I do not have much time for this anymore.
  • I am also not attending strip clubs or doing threesomes. I do not have a mistress.
  • Unlike one of my friends, I do not run a farm.
  • I do not travel.
  • I am not active in politics.
  • Other than swearing, I have no religious activity.
  • I do not have a side business. I do not consult.
  • I do not workout. (I compensate by drinking coffee.)
  • I do not clean up my office.
  • I do not shop for clothes.
  • Though I shower every day, I do not spend any time at all trying to look nice. I pick my clothes randomly in the morning. I am sometimes astonished how business-like some of my colleagues look.

Conclusion: I do not know what to conclude, but I am interested in how what I do differs from what other people do.

The learning pill

Shirky predicts that the bulk of higher education is being disrupted the same way the music industry was disrupted by MP3 files.

Should we believe him?

Let us run a thought experiment. Instead of online lectures disrupting higher education, imagine that someone has invented a learning pill. That is, someone can extract the knowledge of experts and embed it in a pill: you take the pill, you are an expert. Your knowledge is perfect in every way right away. Essentially, you could acquire the knowledge of several advanced degrees in seconds by taking a single pill. Let us assume, to make the case even more compelling, that these pills cost nothing to produce and that they are without side-effect (except for the learning, of course).

What would happen?

  • Almost certainly, the pill would be made illegal. The excuse: the pill is surely dangerous and we need to protect the public. A black market would blossom and, eventually, governments would cave in. However, the pill would be highly regulated. There would be a lobby trying to make it scarce.
  • Educators would point out that the pill is just not the same as a quality education. Indeed, they would explain that knowledge is just one component of a good education. People would argue: do you really want to go see a doctor who learned from a pill?

The net result, I believe, is that even though a learning pill would disrupt society, it would do little to disrupt colleges. Degree requirements would not go away: it is unlikely that anyone who took a medecine pill would be allowed to practice medicine. A college degree would remain as a test of character: maybe even more so in the context of the learning pill. Kids would want to signal that they are dedicated enough to learn the good old way.

Politically, the public would take to the streets if a government tried to close down public colleges on account of the learning pill. And Keynesian economists would warn that a collapse of education as an industry would kill aggregate demand and send the economy in turmoil.

We are unlikely to invent a learning pill. However, the Internet is sometimes a close approximation. Do you want a lawyer or a doctor who learned from the Internet? Given a choice between a college graduate in computer science and someone who has a leading reputation on Stack Overflow, which corporation would pass on the college graduate?

Source: Via Seb Paquet.

Fast sets of integers

Maintaining a set of integers is a common problem in programming. It can also be implemented in many different ways.

Maybe the most common implementation uses a hashing (henceforth hashset): it provides optimal expected-time complexity. That is, we expect that it takes a constant time to add or remove an integer (O(1)), and it takes a time proportional to the cardinality of the set to iterate through the elements. For this purpose, Java provides the HashSet class.

An alternative implementation is the bitset: essentially, it is an array of boolean values. It ensures that adding and removing integers takes a constant time. However, iterating through the elements could be less than optimal: if your universe contains N integers, it may take a time proportional to N to enumerate the integers irrespective of the number of integers in the set.

So, suppose you expect to have 1000 integers in the range from 0 to N. Which data structure is best? The hashset or the bitset? Clearly, if N is sufficient large, the hashset will be best. But how large must N be?

I decided to implement a quick test to determine the answer. Instead of using the standard Java BitSet, I decided to write my own bitset (henceforth StaticBitSet) that is faster in my tests. For the hashset, I compared both the standard HashSet and TIntHashSet and found that there was little difference in performance in my tests, so I report just the results with the standard HashSet (from the OpenJDK 7).

The following table reports the speed in millions of elements per second for adding, removing and iterating through 1000 elements in the range from 0 to N.

  N     bitset     hashset  
100,000 77 18
1,000,000 45 19
10,000,000 11 18

These numbers are consistent with the theory. The speed of the hashset data structure is relatively independent from N whereas the performance of the bitset degrades as N increases. However, what might be surprising, is how large N needs to be before the bitset is beaten. The bitset only starts failing you (in this particular test) when the ratio of the size of the universe to the size of the set exceeds 1,000.

The bitset data structure is more generally applicable than you might think.

Source: My Java source code is available, as usual.

Further reading: In to Sorting is fast and useful, I showed that binary search over sorted array of integers could be a competitive way to test whether a value belongs to a set.

Should you follow the experts?

It is silly to say that ignorance is strength. However, the contrary statement is not follow the experts.

The right knowledge is strength. The tricky part is that determining what you should know is just as hard as learning it. So, while the expert might be sitting on the shoulders of giants, these giants might be looking in the wrong direction or maybe they have been sinking in quicksand. The only thing that experts have going for themselves is that they tend to be consistent: if they make mistakes, they are the expected mistakes.


  • The financial crisis of 2008 was not on the radar of the mainstream economists. However, original thinkers like Nouriel Roubini and Nassim Nicholas Taleb did predict it with accurate arguments. They are still being dismissed by mainstream economists as lucky. Indeed, mainstream economists often argue that the events that matter, such as the big crashes and the significant surges, are fundamentally unpredictable. How convenient!
  • The Web was invented by a physicist, not by a librarian. Many librarians resisted the Web.
  • Wikipedia was not invented by a publishing house editing an encyclopedia. In fact, most publishers such as Encyclopædia Britannica ridiculed wikipedia. Instead, it was founded by someone who failed to complete his Ph.D. in finance and went to work for a trading company instead.
  • YouTube was not invented by a media company or a TV station. It was invented by hackers who had worked together at Paypal, where they were trying to disrupt the banking industry.
  • Blogs and Twitter were not invented by journalists. Blogging was invented by a hacker. In fact, journalists ridiculed blogs initially.
  • Amazon.com was invented by a computer scientist, not a bookstore owner. Bookstores were dismissive of Amazon.com initially.
  • The iPhone was invented by a company that made computers and MP3 players, not portable phones.
  • The leading edge in self-driving cars is not at General Motors, but at Google, a search company.

The practical lessons are:

  • Be wary of groupthink. Mainstream experts are most likely to fall prey to groupthink. The worse of it is that introduces systematic errors: every expert will tend to make the same mistake so that the unanimous view might be fatally flawed.
  • Being an expert is overrated when innovation is needed. What if you had planned to invent Amazon.com? You might have decided to go in the bookstore business for a few years to make sure you understood books. Or you might have worked for a publishing house. By the time you had a decade of experience you would probably have not been able to build Amazon.com. Too much of your point of view would be dependent on the old ways.

Is reading memory faster than writing on a PC?

For human beings, reading is much faster than writing. The same is often true for computers: adding a record to a database may take an order of magnitude longer than reading it. But the speed differential can often be explained by the overhead of concurrency. For example, writing to a database may involve some form of locking where the database systems must first acquire the exclusive writing rights over a section of memory.

If you minimize the overhead, by writing straight C/C++ code, it is less obvious that reading and writing to memory should have different speeds. And, indeed, in my tests, the memset instruction in C which initializes a range of bytes with a given value, takes exactly half the running time of memcpy which reads data from one location and writes it to another. Because memset involves just writing to memory, and memcpy does as much reading as it does writing, we can conclude that these tests show symmetry: reading is as fast as writing.

However, these tests (with memset and memcpy) are maybe not representative of how writing and reading is done most commonly. Thus I designed a second batch of tests. In these tests, I have two arrays:

  • A small array (32KB or less) that fits easily in fast L1-L2 CPU cache. It has size M.
  • A larger array (between 1MB to 512MB) that might not fit in CPU cache. It has size M times N so that the small array fits N times in the larger array.

Then I consider two tests that should run at the same speed if reading and writing are symmetrical:

  1. I copy the small array over the large one N times so that all of the larger array is overwritten. In theory, this test reads and writes M times N elements. In practice, we expect the computer to read the small array only once and to keep its data in close proximity with the CPU afterward. Hence, we only need to read the M elements of the small array once and then write N times M elements in the large array. Hence, this test measures mostly writing speed. The simplified source code of this test is as follows:
    for(size_t x = 0; x<N; ++x) {
        bigdata += M;
  2. I copy N segments from the large array to the small array so that at the end, all of the large array has been read. Again, this test reads and writes M times N elements. However, we can expect the processor to delay copying the short array to memory: it might keep it in fast CPU cache instead. So that, in effect, this second tests might be measuring mostly reading speed. The simplified source code is:
    for(size_t x = 0; x<N; ++x) {
        bigdata += M;

Notice how similar the source codes are!

Can you predict which code is faster? I ran these tests on a desktop Intel core i7 processor:

  M     N   test 1 (time)/ test 2 (time)
2048 32 M 3.4
8192 16384 M 1.6

Analysis: The test 1 (writing test) is significantly slower than test 2 (reading test). That is, it is slower to take a small array (located near the CPU) and repeatedly write it to slower memory regions, than doing the reverse. That is, reading is faster than writing.

“But Daniel! That’s only one test and 4 lines of code, it proves nothing!”

Ok. Let us consider another example example. In C/C++, boolean values (bool) are stored using at least one byte. That’s rather wasteful! So it is common to pack the boolean values. For example, you can store 8 boolean values in one byte using code such as this one:

void pack(bool * d, char * compressed, size_t N) {
  for(size_t i = 0; i < N/8; ++i) {
    size_t x = i * 8;
    compressed[i] = d[x] | 
                    d[x+1]<<1 |
                    d[x+2]<<2 |
                    d[x+3]<<3 |
                    d[x+4]<<4 |
                    d[x+5]<<5 |
                    d[x+6]<<6 |
                    d[x+7]<<7 ;

The reverse operation is not difficult:

void unpack(char * compressed, bool * d, size_t N) {
  for(size_t i = 0; i < N/8; ++i) {
    size_t x = i * 8;
    d[x] = compressed[i] & 1;
    d[x+1] = compressed[i] & 2;
    d[x+2] = compressed[i] & 4;
    d[x+3] = compressed[i] & 8;
    d[x+4] = compressed[i] & 16;
    d[x+5] = compressed[i] & 32;
    d[x+6] = compressed[i] & 64;
    d[x+7] = compressed[i] & 128;

These two pieces of code (pack and unpack) are similar. The main difference is that the pack function reads 8 bools (1 byte each) and writes one byte whereas unpack function reads one byte and writes 8 bools. That is, the pack function is a reading test whereas the unpack function is a writing test.

Can you guess which one is faster? In my tests, unpacking is 30% slower than packing. This is consistent with the result of my other analysis: it seems that reading is faster than writing.

Of course, you should only expect to see a relative slowdown if you write much more than you would otherwise read (say by a factor of 8). Also, I expect the results of these tests to vary depending on the compiler and the machine you use. Always run your own tests!

Source code: As usual, you can find my source code online.

Further reading: I know why DRAM is slower to write than to read, but why is the L1 & L2 cache RAM slower to write? via Reverend Eric Ha

Update: Nathanaël Schaeffer points out that the latency of the MOV operation differs depending on whether you are copying data from memory to a register, or from a register to memory. On my test machine, it takes at least 3 cycles to copy data from a register to memory, but possibly only 2 cycles to copy data from memory to a register (Fog, 2012).

Credit: Thanks to L. Boystov, S. H Corey, R. Corderoy, N. Howard and others for discussions leading up to this blog post.

When is a bitmap faster than an integer list?

You can represent a list of distinct integers no larger than N using exactly N bits: if the integer i appears in your list, you set the i th bit to true. Bits for which there is no corresponding integers are set to false. For example, the integers 3, 4, 7 can be represented as 00011001. As another example, the integers 1, 2, 7 can be represented as 01100001.

Bitmaps are great to compute intersections and unions fast. For example, to compute the union between 3, 4, 7 and 1, 2, 7, all I need to do is compute the bitwise OR between 00011001 and 01100001 (=01111001) which a computer can do in one CPU cycle. Similarly, the intersection can be computed as the bitwise AND between 00011001 and 01100001 (=00000001).

Though it does not necessarily make use of fancy SSE instructions on your desktop, bitmaps are nevertheless an example of vectorization. That is, we use the fact that the processor can process several bits with one instruction.

There are some downsides to the bitmap approach: you first have to construct the bitmaps and then you have to extract the set bits. Thankfully, there are fast algorithms to decode bitmaps.

Nevertheless, we cannot expect bitmaps to be always faster. If most bits are set to false, then you are better off working over sets of sorted integers. So where is the threshold?

I decided to use the JavaEWAH library to test it out. This library is used, among other things, by Apache Hive to index queries over Hadoop. JavaEWAH uses compressed bitmaps (see Lemire at al. 2010 for details) instead of the simple bitmaps I just described, but the core idea remains the same. I have also added a simpler sparse bitmap implementation to this test.

I generated random numbers using the ClusterData model proposed by Vo Ngoc Anh and Alistair Moffat. It is a decent model for “real-world data”.

Consider the computation of the intersection between any two random sets of integers. The next figure gives the speed (in millions of integers per second) versus the density measured as the number of integers divided by the range of values. The “naive” scheme refer to an intersection computed over a list of integers (without bitmaps).

I ran the test on a desktop core i7 computer.

Conclusion: Unsurprisingly, the break-even sparsity for JavaEWAH is about 1/32: if you have more than 1000 integers in the range [0,32000) then bitmaps might be faster than working over lists of integers. Of course, better speed is possible with some optimization and your data may differ from my synthetic data, but we have a ballpark estimate. A simpler sparse bitmap implementation can be useful over sparser data though it comes at a cost: the best speed is reduced compared to EWAH.

Source code: As usual, I provide full source code so that you can reproduce my results.

Update: This post was updated on Oct. 26th 2012.

You cannot scale creativity

As a teenager, I was genuinely impressed by communism. The way I saw it, the West could never compete. The USSR offered a centralized and efficient system that could eliminate waste and ensure optimal efficiency. If a scientific problem appeared, the USSR could throw 10, 100 or 1000 scientists at it without having to cajole anyone.

I could not quite understand why the communist countries always appeared to be technologically so backward. Weren’t their coordinated engineers and scientists out-innovating our scientists and engineers?

I was making a reasoning error. I had misunderstood the concept of economy of scale best exemplified by Ford. To me, communism was more or less a massive application of the Fordian approach. It ought to make everything better and cheaper!

The industrial revolution was made possible by economies of scale: it costs far less per car to produce 10,000 cars than to make just one. Bill Gates became the richest man in the world because software offers an optimal economy of scale: it costs the same to produce one copy of Windows or 100 million copies.

Trade and employment can also scale: the transaction costs go down if you sell 10,000 objects a day, or hire 10,000 people a year. Accordingly, people living in cities are typically better off and more productive.

This has lead to the belief that if you regroup more people and you organize them, you get better productivity. I want to stress how different this statement is from the previous observations. We can scale products, services, trade and interaction. Scaling comes from the fact that we need reproduce many copies of the essentially the same object or service. But merely regrouping people only involves scaling in accounting and human ressources: if these are the costs holding you back, you are probably not doing anything important. To get ten people together to have much more than ten times the output is only possible if you are producing an uniform product or service.

Yet, somehow, people conclude that regroup people and getting them to work on a common goal, by itself, will improve productivity. Fred Brooks put a dent in this theory with his Brook’s law:

Adding manpower to a late software project makes it later.

While it is true that almost all my work is collaborative, I consistently found it counterproductive to work in large groups. Of course, as an introvert, this goes against all my instincts. But I also fail to see the productivity gains in practice whereas I do notice the more frequent meetings.

Abramo et al. (2012) looked seriously at this issue and found that you get no more than linear scaling. That is, a group of two researchers will produce twice as much as one researcher. Period. There is no economy of scale when coordinating human brains. Their finding contradicts decades of science policy where we have tried to organize people into larger and better coordinated groups (a concept eerily reminiscent of communism).

We can make an analogy with computers. Your quad-core processor will not run Microsoft Word four times as far. It probably won’t even run it twice as fast. In fact, poorly written software may even run slower when there are more than one core. Coordination is expensive.

The solution is to lessen the need for coordination: have different people work on different things, use smaller teams, and employ fewer managers.

Will I get a job with this degree?

In Quebec, we have had massive student protests. Students were asking for free higher education. It seems that things have quieted down as the new government has vowed to freeze tuition in constant dollar. Though it is never spelled out as such, affordable higher education is viewed as a tool to favor social mobility.

Most of us are in favor of social mobility: it is a good thing if a kid from a poor background can grow to become wealthy.

Sociologists will argue that we have no other social engineering tool to favor social mobility like affordable higher education. That is what justifies massive government involvement in higher education even in countries as right-leaning as the United States. In more left-leaning countries like France, higher education is often entirely free.

Before I pursue this line of thought, there is another unstated assumption: college graduates are smarter and they make better citizens. While the first assumption (social mobility) is founded on solid research, this second one has much weaker support. Spending time in college will improve the mastery of your major, but your overall cognitive skills may not improve. For example, getting a degree in English will not help you if you need to learn software programming later. Arguably, a degree in mathematics might help you become a better programmer, but even that is doubtful.

In any case, it has been historically true that college degrees have translated into better jobs, especially for the poorer half of the population. But, at some point, we have confused the means and the goals. What the young people in Quebec should be asking for are the means to start their careers.

Meanwhile, as a college professor I sometimes have to answer difficult questions such as “if I get this diploma or degree, will I get a job?” That is, many students care very much about the first unstated assumption: attending college helps you get jobs.

Most university-level professors, me included, are uneasy with this function. My job is simply not geared toward providing job training. For example, I try to get our students to program if only because it is such a useful skill to have in industry. Unfortunately, if a student only programs within our classes, he is very unlikely to become a sought-after expert programmer. For most people, it takes 10 years to be great at programming. So, degrees are not a straight path to a career in the software industry. The same is true of most degrees and most industries. Universities are simply not great at job training and they don’t make people all-around smarter.

But this used not to matter. The old industrial-age model was: get a degree or diploma in whatever you care for, get an office job, retire. The purpose of the degree was not to train you but rather to fail those who could not conform to a typical office job. This model still work if you are lucky enough to get a job in government or a big corporation.

Yet conformity is less important in the post-industrial age. And correspondingly, we are slowly moving to a post-industrial career model where a badge like a degree or diploma is only one of the things that are nice to have.

Increasingly, my answer to “Will I get a job with this degree?” is “If that’s all you have, no, you won’t”.

Effectively, I predict that the effectiveness of higher education as a tool for social mobility will weaken. A focus on conformity-based badges like degrees will fail the younger generation. The routine factory and office jobs are being automated and outsourced too fast. I am especially concerned when I hear the student representatives present the degree as a goal in itself. As if we were in the seventies.

Further reading:

How well does peer review work?

Since the second world war, science has relied on what I call traditional peer review. In this form of peer review, researchers send their manuscript to journal. An editor then reviews the manuscript and if it is judged suitable, the manuscript is sent to reviewers who must make a recommendation. If the recommendations are good, the editor might ask the authors to revise their work. The revised manuscript can be sent back to the reviewers. Typically, the authors do not know who the reviewers are.

In computer science and engineering, we often rely also on peer reviewed conferences: they work the same way except that the peer review process is much shorter and it typically only involves one round. That is, the manuscript is either accepted as is or rejected definitively.

Governments attribute research grants according to the same peer review process. However, in this case, a research proposal is reviewed instead and there is typically only one round of review. You can theoretically appeal of the decisions, but by the time you do, the funds have been allocated and the review committees might have been dismissed.

Researchers trust peer review. There is a widely held belief that this process can select the best manuscripts reliably, and that it can detect and reject nonsense.

However, most researchers have never looked at the evidence.

So how well does peer review fare? We should first stress that a large fraction (say 94%) of all rejected manuscripts are simply taken to another journal and accepted there. The editor-in-chief of a major computer science journal once told me: you know Daniel, all papers are eventually accepted, don’t forget that. That is, even if you find that some work is flawed, you can only temporarily sink it. You cannot denounce the work publicly in the current system.

Another way to assess the reliability of peer review is to look at inter-reviewer agreement. We find that as soon as one reviewer feels that the manuscript is acceptable, the inter-reviewer agreement falls to between 44% and 66% (Cicchetti, 1991). That is, consensus at the frontiers of science is as elusive as in other forms of human judgment (Wessely, 1998). Who reviews you is often the determining factor: the degree of disagreement within the population of eligible reviewers is such that whether or not a proposal is funded depends in large proportion of cases upon which reviewers happen to be selected for it (Cole et al., 1981).

How to be happier while annoying your wife

About a year ago, I read Made by Hand: Searching for Meaning in a Throwaway World by Mark Frauenfelder. It is a simple book with a simple message.

How to be happy? Frauenfelder thought that moving to tropical island might be it for him and his family. It failed. Location is not related to your happiness.

But then, in the process of moving, he realized something important. He was happier when he made stuff for himself.

It is not about anti-consumerism. While I do not know Frauenfelder, I expect he has a big screen TV and a nice car.

Rather, it is the idea that if you can afford it, making your own stuff makes your life richer and more meaningful.

This message has changed my life. What have I done with it?

  • I have learned to make thick yogurt. My recipe is simple but it still took me months to get it right. Mostly, I had to learn the hard way that the common recipes cannot be trusted. We never buy commercial yogurt. I make 4l of yogurt a week and I have been doing so for about a year.
  • I have learned to make my own wine, port and beer. When I drink wine, I drink my wine. Of course, I don’t start from grapes, but rather from grape juice. But it is meaningfully my wine since I can screw it up. People love my port wine.
  • We no longer buy commercial bread. I have learned to make several great tasting breads with very little effort. Do not do as I did and try to use a bread machine. It is a waste of time and money. Instead go buy these books:

    If you want to save money, you can find these recipes online or on YouTube (e.g., see the GardenFork.TV recipe). It took me months but now I can reliably produce bread that is superior to what you can find in a store, with little effort.

    I also make my own pizza, entirely from scratch, that has exactly the same great taste as a professional one. The only thing I have not gotten quite right is how to shape the pizza as a nice round pie. I have been making my own pizza for about 10 years, but I only got it right in the last two months or so (thanks to Peter Reinhart’s books). If you are curious, I use a stone and I cook the pizza at 500F.

  • I produce my own herbs, tomatoes and lettuce from my garden. I have canned enough tomatoes to last a year.
  • I build my own radio-control models. I have built two sailboats from scratch. My latest effort was a crawler: built from parts. I have also built a Mario-themed clock for my son. My basement is filled with half-completed robots, arduinos and random other items.

My wife is sometimes annoyed at my projects. I fail all the time. Sometimes I make a mess. I waste money.

Over time, however, I build up expertise. And more importantly, the stuff I make has meaning. My kids eat my pizza, not just any pizza. My kids love my bread, not just any bread.

It is not about saving money. On the contrary, it is about being able to afford to do your own stuff. It makes me happy that I can afford to make my own bread and that I am good at it.

Update: I also roast my own coffee. I picked that up from John Cook.

Fast integer compression: decoding billions of integers per second

Databases and search engines often store arrays of integers. In search engines, we have inverted indexes that map a query term to a list of document identifiers. This list of document identifiers can be seen as a sorted array of integers. In databases, indexes often work similarly: they map a column value to row identifiers. You also get arrays of integers in databases through dictionary coding: you map all column values to an integer in a one-to-one manner.

Our modern processors are good at processing integers. However, you also want to keep much of the data close to the CPU for better speed. Hence, computer scientists have worked on fast integer compression techniques for the last 4 decades. One of the earliest clever techniques is Elias coding. Over the years, many new techniques have been developed: Golomb and Rice coding, Frame-of-Reference and PFOR-Delta, the Simple family, and so on.

The general story is that while people initially used bit-level codes (e.g., gamma codes), simpler byte-level codes like Google’s group varint are more practical. Byte-level codes like what Google uses do not compress as well, and there is less opportunity for fancy information theoretical mathematics. However, they can be much faster.

Yet we noticed that there was no trace in the literature of a sensible integer compression scheme running on desktop processor able to decompress data at a rate of billions of integers per second. The best schemes, such as Stepanov et al.’s varint-G8IU report top speeds of 1.5 billion integers per second.

As your may expect, we eventually found out that it was entirely feasible to decoding billions of integers per second. We designed a new scheme that typically compress better than Stepanov et al.’s varint-G8IU or Zukowski et al.’s PFOR-Delta, sometimes quite a bit better, while being twice as fast over real data residing in RAM (we call it SIMD-BP128). That is, we cleanly exceed a speed of 2 billions integers per second on a regular desktop processor.

We posted our paper online together with our software. Note that our scheme is not patented whereas many other schemes are.

So, how did we do it? Some insights:

  • We find that it is generally faster if we compress and decompress the integers in relatively large blocks (more than 4 integers). A common strategy is bit packing. We found that bit packing could be much faster (about twice as fast) if we used vectorization (e.g., SSE2 instructions). Vectorization is the simple idea that your processor can process several values (say 4 integers) in one operation: for example, you can add 4 integers to 4 other integers with one instruction. Because bit unpacking is the key step in several algorithms, we can effectively double the decompression speed if we double the bit unpacking speed.
  • Most of the integer compression schemes rely on delta coding. That is, instead of storing the integers themselves, we store the difference between the integers. During decompression, we effectively need to compute the prefix sum. If you have taken calculus, think about it this way: we store the derivative and must compute the integral to recover the original function. We found out that this can use up quite a bit of time. And if you have doubled the speed of the rest of the processing, then it becomes a serious bottleneck. So we found that it is essential to also vectorize the computation of the prefix sum.

After all this work, it is my belief that, to be competitive, integer compression schemes need to fully exploit the vectorized instructions available in modern processors. That is, it is not enough to just write clean C or C++, you need to design vectorized algorithms from the ground up. However, it is not a matter of blindly coming up with a vectorized algorithm: getting a good speed and a good compression rate is a challenge.

Credit: This work was done with Leonid Boytsov. We have also benefited from the help of several people. I am grateful to O. Kaser for our early work on this problem.

Software: A fully tested open source implementation is available from github. As caveat, we used C++11 so that a C++11 compiler is required (e.g., GNU GCC 4.7).

Limitations: To be able to compare various alternatives, we have uncoupled some decoding steps so that at least two passes are necessary over the data. In some cases, better speed could be possible if the processing was merged into one pass. We are working on further optimizations.

Further reading: More database compression means more speed? Right?, Effective compression using frame-of-reference and delta coding

Update: The article has been accepted for publication in Software Practice & Experience.

Quick links: The paper, the software.

Will tablets kill PCs?

I have always been a fan of the personal computer. I worked all summer once to buy myself a cloned PC XT. I probably would not be a computer science researcher without the personal computer. It has shaped my life.

We may not remember, but the PC was a somewhat surprising disruption. You could explain the success of some early PCs (e.g., from Commodore) as game machines. But if this is all you had to go on, the success of the IBM PC was puzzling. It was not a very good machine to play games: it had minimal video and sound capabilities. It worked as a word processor, but it was awkward. And yet it soon became your PC, just like you had your car and your phone. It brought freedom. You got software on it to print birthday cards. You wrote your first novel on it. You could run your own business with a PC.

People have been predicting the death of the PC for decades now. For example, several corporations predicted that thin clients would replace the PC in business. That is, businesses would go back to having a few large well-maintained servers and people would use cheap, interchangeable, devices called thin clients to connect to these servers. The thin clients would require no software-related maintenance.

Unfortunately, the PCs kept getting cheaper. So the savings from using thin clients instead of PCs were not worth it. And, unlike a thin client, the PC could keep working when the servers were down. Moreover, the CEO has a PC at home: he does not want to switch to a thin client at the office.

More recently, traditional PCs were almost disrupted by netbooks. Though cheaper and smaller, they were still PCs. And they failed to gain enough traction to displace the PCs.

Something funny was happening in another market however. Phones were getting more powerful. The iPhone is just as powerful than a PC was ten years ago. Importantly, this meant that a whole set of mobile technologies were getting cheap and widely available: tiny cameras, super small CPUs, and so on.

So this made the tablet, as envisioned by the scifi authors for decades, a possibility. Apple was the first one to market it successfully as the iPad. Right now, we can buy a Google Nexus 7 tablet for $200. It has a 4-core 1.3GHz processor, 1GB RAM and many things that most PCs did not have even 5 years ago. Oh! Did I mention that it is $200?

So, I think that this time around, tablets will kill PCs. To be precise, I make the following prediction:

In some quarter of 2015, the unit sales of tablets will be at least twice the unit sales of traditional PCs, in the USA.

Greg Linden calls this prediction incautious because experts predict that, at best, the tablet sales will match PC sales by 2015. Indeed, for my prediction to come true, it is not enough for the tablet market to grow, people must stop renewing their PCs and come to rely on their tablets. If my prediction comes true, the PC industry will have begun a slow march toward irrelevance.

I even put my money where my mouth is: I bet $100 against Greg that I am right. The loser gets to hand over $100 to a charity chosen by the winner.

I should make it clear that I am not silly enough to think that I can meaningfully predict the future. But I also think that the analysts have too much confidence in their own predictions. Consider this beautiful quote for example (source):

Another software technology will come along and kill off the web, just as it killed news, gopher, et al. And that judgment day will arrive very soon — in the next two to three years, not 25 years from now. (George F. Colony, Forrester Research CEO, 2000)

As far as tablets are concerned, tech. people cannot imagine tablets replacing traditional PCs. Yet I see regular folks like my father forgetting all about PCs and using exclusively his iPad. I have seen people showing up at meetings with tablets for at least a year now… some of them just regular people who are not trying to look good. They genuinely get a lot out of their tablets. They have been frustrated with their laptops for too long. A lot of these people are decision makers.

As others have remarked, not everything is great about the tablet. You can write your first novel on it, but I would urge you to get a physical keyboard connected to it (and then it is no longer a real tablet). But PC technology has stagnated. All the faults PCs had 5 years ago are still with us: slow boot sequences, viruses, confusing configurations… and PCs have gotten less and less hackable.

I admit, the desktop computer is a good match for our office jobs: a desk with a desktop PC on it. It goes together. I don’t know how companies and governments can reduce their stock of PCs. But they sure can spend less money renewing their PCs. They will need to if they keep buying more tablets.

I admit that I am not quite sure how students will get by without a PC, but they won’t be using their PCs to watch videos or read in 2015. PCs will start to look like this old typewriter you have in the basement.

I also think that tablets are a much better match for retirees. PCs as sold by Dell are currently a mess. They are confusing and hard to maintain. Cheap tablets are a much better match.

Conclusion: I don’t know whether tablets will kill PCs. But if they do, this will be a big deal and I want to be able to say “I told you so”.

Organizations would not pass the Turing test

In Computer Science, we often informally judge intelligence by using the Turing test. The Turing test is quite simple: if you can convince an observer that you are a human beings, then you are probably at least as smart as a human being.

Yet no organization could ever convince you that it is a real person. Corporations and governments make you sign forms. They are rude, impersonal. They typically react slowly. They only apologize after days of consultations with lawyers. No sane human being behaves this way.

Have you ever tried to have a social exchange with a government or a corporation? No matter how nice they try to be (“your call is important to us”) they fail to convince. In fact, I would rather deal with software constructs than organizations. They are often much closer to passing the Turing test.

Why is it then, that so many (both from the left and the right) want to grant more power to these organizations, to rely more fully on them?

Update: John Regeh contributed the following quote:

A crowd is not the sum of the individuals who compose it. Rather it is a species of animal, without language or real consciousness, born when they gather, dying when they depart. (Gene Wolfe)

Your programming language does not know about elementary mathematics

In Mathematics, we typically require equality to form equivalence classes. That is, it should be reflexive: A should be equal to A. Moreover, it should be symmetric: if A is equal to B, then B should be equal to A. Finally, it should be transitive: if A is equal to B, and B is equal to C, then A must be equal to C.

These conditions appear to make perfect sense. Yet almost all programming languages fail to enforce reflexivity, symmetry and transitivity.

  • Languages such as XPath fail to be transitive. Indeed, consider the following numbers

    x = 10000000000000001;
    y = 10000000000000000.0;
    z = 10000000000000000;

    Then x = y and y = z are true, but x = z is false. The problem with XPath is that it automatically casts integer-to-float comparisons to float-to-float comparisons.

    But even if you restrict yourself to integer types, XPath still fails to provide transitivity: (1,2) is equal to (2,3) which is equal to (3,4), yet (1,2) fails to be equal to (3,4).

  • If you try the same test with the values x, y, z in JavaScript, you will find that transitivity is preserved because… x = z. That’s because JavaScript has no integer type. JavaScript uses finite-precision floating point numbers to represent all numbers.

    However, floating-point numbers have problems of their own. For one thing, they are not reflexive. That is, there is a value such as x = x evaluates to false. (The NaN marker.) This is only one of several challenges that floating point numbers represent.

Conclusion. When working out a theory, it is easy to come up with simple axioms. People often think that software is just a representation of mathematics that can be whatever we want it to be. Thus, software should obey intuitive and simple axioms. However, what we get, instead, are organically grown compromises. There is sometimes little difference between studying software and studying nature. Both can be surprising.

Update: John Regehr pointed me to comparisons in PHP. It is slightly insane trying to figure out what is equal to what.

Update 2: Jeff Erickson reminded me of a very funny presentation by Gary Bernhardt.

To improve your intellectual productivity

We would all like to be smarter, to produce better software, better research papers or better art. It is not difficult to see that, by just about any metric, productivity amongst human beings follow a power law. For example, 80% of all research papers are written by 20% of the researchers. (I made this one up, but the actual numbers are similar: see Lotka’s law.)

However, it is a difficult topic to study because the very idea of measuring intellectual productivity is, at best, controversial.

Yet we can’t help but ask: what does determine your intellectual productivity? Here are some tentative answers:

  • Luck. A lot of intellectual work is based on trial and error. A couple of early breakthroughs, due to pure chance, can motivate you and inspire you to work for years. Repeated failures can discourage you and push to abandon leading edge work. Luck alone can explain the power law in the distribution of intellectual productivity.
  • Persistence. Some people are pig headed. I sure am. They will keep working even when things are not working. This is closely related to the willingness to take chances. Most great intellectuals dive in and take risks.
  • People. Some people drain you without helping you produce. Other people energize you and force you to improve your game. Even though I am an introvert and sometimes an insular person, I have not found a better way to boost my productivity than finding exactly the right people at the right time. Sometimes I will have worked for months on a problem just to have several breakthroughs because I started working with someone smart. The quality of my work can also be strongly affected by the people I work with: people with higher standards than mine tend to push me up, people with lower standards tend to make me more lenient. But it is not just a matter of meeting smart people. I do not think I would get a lot out of meetings with Einstein. I am not interested in Physics right now and I doubt Einstein would share my interests. You need to meet people who have truly compatible goals and interests. Some people have an innate ability to connect with others, to find the right collaborators. Others spend a lot of time meeting people and are therefore more likely to find good matches.Though I have been extremely lucky in this respect for the last few years, collaboration is not and will never be my strong point.
  • Method. Some people pay attention to what works and to what does not work. They experiment. They read about how great people worked. They learn meta-strategies like divide-and-conquer. They constantly tune their approach to intellectual productivity. Some people figure out that working seriously a few hours every day is better than doing pseudo-work all day. I find that a common source of low productivity is excessive pondering: you cannot be productive if you fail to deliver in time. Other people just stumble on the right approach early on. For example, ever since I was a teenager, I have collected ideas in a notebook. This means that I am never out of crazy (and usually bad) ideas.
  • Focus. Intellectual work is boring in a way that manual labor is not. Intellectual work can become frankly alienating. Some people seem to never get bored. They can do narrow mathematics for days and days without ever getting bored. They are satisfied by the work itself. I have weak boredom resilience myself: I constantly question the relevance of my work. While it sounds good, it can actually be quite bad as you can be far more productive if you just keep your head down and push forward. I also see a lot of people who willingly engage in low productivity activities. Then they wonder why they aren’t being productive! A lot of work is actually busy work: you feel as if you were working, but you aren’t producing lasting value.
  • Fire. You have to care to have some intellectual productivity. You must feel bad when nothing was achieved. And I mean really bad. Some people are only driven by fear of losing their financial support. Others cannot stand not producing anything intellectually. They do not wait for permission nor do they plan to retire. These people, those who feel an urge to produce, will naturally be much more productive.

Does time fix all?

As an undergraduate, finding useful references was painful. What the librarians had come up with were terrible time-consuming systems. It took an outsider (Berners-Lee) to invent the Web. Even so, the librarians were slow to adopt the Web and you could often see them warn students against using the Web as part of their research. Some of us ignored them and posted our papers online, or searched for papers online. Many, many years later, we are still a crazy minority but a new generation of librarians has finally adopted the Web.

What do you conclude from this story?

Whenever you point to a difficult systemic problem (e.g., it is time consuming to find references), someone will reply that “time fixes everything”. A more sophisticated way to express this belief is to say that systems are self-correcting.

Yet few systems are self-correcting. We simply bear the cost of the mistakes and inefficiencies. If some academic discipline fails to make us better off, we barely notice and we keep paying for it. What we see as corrections are often disruptions brought by people who worked from outside the supposedly self-correcting system.

Far from self-correcting, the system resists changes. For example, lectures are a terrible way to teach, yet they have been around for centuries and we are likely to lecture for at least decades, if not centuries. They are provably not cost-effective from a learning perspective. But it will take strong outside forces to get any change.

Maybe librarians would have eventually invented the Web. I doubt it, but given enough time everything is possible. Yet “in the long run we are all dead.” (Keynes) The argument that if enough time passes, the problem will be solved mostly makes sense if you plan to live forever. Neither you nor our civilization will be around forever.

We need better science and technological innovation today. It is not ok to say that if we wait another generation, we will find out how to control the climate and generate free energy. Our survival as a civilization depends on our ability to remain innovative today.

The stock market has been flat for the last ten years. It is not at all automatic that new prosperity will emerge through new innovations and inventions. It may very well not happen in my lifetime. I’m a techno-optimist, so I believe we will out-innovate our problems, but I don’t believe that the system will do it. Crazy people will have to act outside of the norm. We won’t have Internet-enabled brain implants if we just wait long enough: some Ph.D. student needs to sign waivers and make his mother cry so we have any chance to know how it feels to have the Internet in our brain.

Free-market advocates believe that free markets are self-correcting. That is a more believable notion except for the fact that there is no such thing as a free market in the real world.

By the way: things often become worse over time. My body is living proof: I’m weaker than when I was in my 20s.

Conclusion: People too often believe that the systems they are in are self-correcting. Yet corrections, when they happen, are often actually disruptions brought forth by outsiders. Trust that systems, when left alone, will do the right thing, is overly optimistic.

We should celebrate outsiders and protect them from the wrath of the insiders.

On feeding your CPU with data

Can you guess the speed difference between these two lines of code? The first line of code does N additions:

for (int i=0; i<N;i++) sum+=arr[i];

The second line of code does N/16 additions:

for (int i=0; i<N;i+=16) sum+=arr[i];

A naive programmer might expect the second option to be 16 times faster. The actual answer is much more complicated and worth further study. I have implemented a simple test.

I used the GNU GCC 4.5 compiler with the -O2 optimization flag. We can compute the complete sum faster using different compiler flags (-O3 -funroll-loops) and I present these results in a separate column (sum-unroll). In this last version, the compiler makes aggressive use of SSE instructions to vectorize the problem.

  N   sum sum-unroll 1/16
20K     1100     6700     20,000
400K     1000     3700     5100
500M     2100     3900     4200

The speeds are expressed in millions of integers per second.

On tiny arrays, most of the data resides close to the CPU. Hence, computations are essentially CPU bound: doing fewer computations means more speed.

The story with large arrays is different. There, skipping almost all of the data (15 integers out of 16) only buys you twice the speed! Moreover, once we take into account the vectorized version that the compiler produced (sum-unroll), the difference becomes almost insignificant!

My source code is available as usual.

Conclusion: We are in the big data era. Maybe ironically, I sometimes get the impression that our processors are drinking data out of a straw. Whereas a speed of 20,000 million integers per second is possible when the data is cached, I barely surpass 4000 million integers per second when reading from RAM.

Source: I stole the problem from Elazar Leibovich who posted it privately on Google+.