I have argued that when seeking professional success, it is best to avoid zero-sum games (e.g., competing for one prestigious slot). It is more fun, less distracting, and more productive to focus on non-zero-sum games. That is, you should try to grow the size of the pie and create value for others out of thin air, instead of obsessing about your ranking.

I love to blog and to publish open source software. When I blog, I do not try to make it on a list of top bloggers… I try to write interesting and useful blog posts. When I publish open source software, my hope is that others will derive value from it… I am not competing to be recognized as one of the top open source programmers…

Of course, I do win some recognition and increased social status (and sometimes money) if people say that they like my blog, my software or my papers. Let me consider a specific example. With Leonid Boytsov and others, I have been working on fast integer compression techniques. Our paper (Decoding billions of integers per second through vectorization) and our corresponding software (1, 2, 3) have been used by brilliant authors who won best paper awards at the leading information retrieval conferences this year (ECIR 2014 and SIGIR 2014). Our own work did not win any award and the journal where it appeared has a mid-level ranking. Should I be worried? Of course not! I am as delighted as I could be… our work is proving useful!!! We are providing tangible value to others… My only concern right now is to figure out how to make our next work even more useful…

My motivation is primarily internal: I do the work I do because it is interesting and meaningful on its own. I feel that I help people, or contribute to Science. Sure, if I do good work, my social status or financial well-being might be helped, and that is great… I like money and modest accolades… but that is not what is getting me up in the morning. I am never going to win a Turing Award or a billion dollars, and that is quite fine with me…

I was chatting with a research fellow last week. He felt depressed that his work was “too simple” to warrant a slot at an exclusive conference like VLDB. His peers were encouraging him to “complexify” his work so that he could impress referees. I tried to argue that this was just wrong. He, quite rightly, argued back that it is how the game works… I guess you have to pay tribute to the “system”. But I am afraid that such tributes displace the good work that would otherwise happen.

Michael Hay pointed me to recent research that backs my intuition that external motivations can be distracting. Wrzesniewski et al. (2014) looked at the motivations of cadets and how well they succeed.

Cadets may want to join a military academy for intrinsic reasons (to become a good soldier) or for extrinsic reasons (to get a good job). It is easy to predict that the soldiers with intrinsic motivations will outperform those with primarily extrinsic motivations. What is less obvious is that the cadets with mostly just intrinsic motivations will outperform those who have both kinds of motivations…

Following their entry into the Army, officers who entered West Point with stronger instrumentally based motives were less likely to be considered for early promotion and to stay in the military following their mandatory period of service, even if they also held internally based motives.

In other words, if you are pursuing a career in science or software because it leads to a good and prestigious job (extrinsic motivation) and because you believe it is a meaningful and important activity (intrinsic motivation), you will do worse than if you have mostly just intrinsic motivation.

I believe that it is because external motivations (prestige, money) distract you from doing good work. They may also lessen your intrinsic motivation.

My own mental abilities are greatly reduced if I have extrinsic motivations in mind. And there is a strong correlation between extrinsic motivations and zero-sum games (e.g., how well you are ranked).

Fiction writers used to have to submit their manuscripts to 6 or 7 big corporations. Only these corporations could seriously publish a book. Room on library stacks has always been scarce. Only a tiny number of authors could ever become independently wealthy under this system. Amazon.com is changing the game by getting rid of the scarcity. Hugh Howey put his books for sale on Amazon (receiving 70% of the sales instead of the 12.5% offered by publishing houses) and he made a killing. He is a millionaire by now. He did not have to convince a highly selective editor to accept his books. He simply wrote some great fiction. He created value and was rewarded for his work. He did not have to displace other authors on the bookshelves.

I am sure that many people do not consider him to be a real author because he self-published. Our society is very driven by this fight for status, for elitism. I think it is probably responsible for a lot of unnecessary stress. People overwork, pollute and die younger than they should because of it. And it affects science as well.

There is a common view in computer science research that the only publications that matter are those appearing at selective conferences. Conferences in computer science are characterized by a low acceptance rate (top conferences reject 90% of all papers). The more selective the conference, the better. Ideally you must prove that you belong to the top 1%. A journal article or a workshop paper is “wasted effort” in this respect.

Does it follow that journal articles are inefficient or even wasteful? According to DBLP, Donald Knuth (a living legend) published 120 journal articles and 12 conference papers. The Turing Award recipient Peter Naur has 25 journal articles and 7 conference papers in DBLP. (The Turing Award is the Nobel prize of computer science.) Robert E. Kahn, another Turing Award recipient, has 12 journal articles and 2 conference papers. Even if we assume that they participated in many conferences that are not indexed by DBLP, it is undeniable that some influential computer scientists like to publish in journals.

But maybe all of the important work appears first at conferences? According to Fortnow, leading conferences regularly refuse to accept such work:

(…) nearly half of the Gödel Prize winners (given to the best CS theory papers after they’ve appeared in journals) were initially rejected or didn’t appear at all in the top theoretical computer science conferences. (Fortnow, 2009)

To land a paper in a very selective conference, you still have to beat incredible odds… acceptance rates are routinely under 10%… surely this says something about your work? Maybe being accepted by the most selective conferences proves your worth. Maybe not:

The view that conference rejection rates are a good proxy for conference quality did not hold up to scrutiny (Freyne, 2010)

I believe that people like to tell themselves simple stories about how one should succeed. Many of these simple stories are based on half-truths. Fiction authors, for example, believe that they must land a competitive book deal to be writers, whereas readers do not care about any of that. This status-ranking game that you play is probably much less important in the long run than you make it out to be.

Franceschet wrote an interesting survey where he identified the 10 most prolific computer science authors, the 10 most cited authors and the most prestigious computer scientists (i.e., Turing Award recipients). There was no overlap among the three lists. It is worth taking time to reflect on this fact.

  • If you could find a way, somehow, to become the most prolific computer scientist in the world, you would still be unlikely to figure among the 10 most cited authors or to win a Turing Award.
  • If you could become the most cited computer scientist in the world, you might still not receive a Turing Award.

It means that if you are an extremely successful computer scientist, you are still likely to be left off someone’s top-10 list. Maybe it means that you should not worry about how you are ranked.

There are also significant differences in how the 3 types of authors published.

  • A long publication list is not needed for a Turing Award. Of the 16 Turing Award recipients considered by Franceschet, two thirds had fewer than 50 papers on DBLP and five had fewer than 30. Alan Kay published fewer than 20 papers according to DBLP.
  • “(…) high impact scholars publish significantly less than prolific ones, and more frequently in journals.”

So while telling the world about what you do matters… you have a lot of freedom about it.

What should a sane computer scientist do then? His main focus should be on producing lasting contributions to his field. He should then publish them where they are likely to be noticed.

If all you have ever done is fight for scarce spots at selective venues, you have achieved nothing of importance. Really important work creates tangible value that is self-evident.

Note: To be fair, I have never achieved anything of importance and probably never will. But I am having a lot of fun.

Further reading: Mentoring Advice on “Conferences Versus Journals” for CSE Faculty by Kevin W. Bowyer and How Are the Mighty Fallen: Rejected Classic Articles by Leading Economists by Gans and Shepherd.

Nomad cards and keys
We all have mobile phones or tablets with Lightning or micro-USB plugs, and we all have laptops with USB ports. Sure, you can easily find a USB-to-micro-USB cable, and your iPhone came with a USB-to-Lightning cable… but who wants to be carrying around a cable?

A company called Nomad makes small connectors that are much easier to carry than a cable. They have key-like connectors (NomadKey) and card-like ones (NomadCard). They fit in your key chain or in your wallet.

They sent me a few review units (both Lightning and micro-USB). I tested them out with many of my devices. They do the job well. My iPad complains that the connector is not approved, but it seems to be a harmless warning. (Nomad says that their connectors are certified so I cannot explain the warning.)

I tried carrying them around and using them at a local café. The NomadCard connectors work well but they do not feel as natural as the NomadKey connectors. The NomadKey is smaller and feels more durable, so I recommend it over the NomadCard.

I got them about a month ago and tested them only a few times. They do feel robust and I cannot see any wear or damage on them. This is somewhat surprising given how much I twisted them. I expect the NomadKey to last years.

Both types of connector look sharp… much sharper than a cable. They are $30 on Amazon. If you spend a lot of time connecting your tablet or smartphone to a USB cable outside of your office, it is probably worth the price.

Disclosure: I got the review units for free.

There is a long tradition among computer scientists of counting the number of operations as a way to estimate the running cost of an algorithm. Many computer scientists and naive programmers believe that adding extra instructions to a piece of code necessarily slows it down.

This is just wrong. If the initial piece of code was dirt cheap, adding more instructions may come for free.

Let me give you an example. I work a lot with bitmaps. A bitmap, in my context, is just an array of bits (bitset) that we pack into machine-size words (e.g., 64-bit integers). We can use them to represent sets of integers: the integer i is in the set if the ith bit is set to true. The union between two such sets can be computed with a bitwise OR:

for(int k = 0; k < length; ++k) {
  output[k] = input1[k] | input2[k];
}

Another common operation on bitmaps is the computation of their cardinality: the number of bits set to true or 1. The compiler I am using (GCC) provides a useful function to compute the number of 1s in a 64-bit word (__builtin_popcountl). So we can compute both the union of the two bitmaps and the cardinality of the result in one pass:

int card = 0;
for(int k = 0; k < length; ++k) {
  output[k] = input1[k] | input2[k];
  card += __builtin_popcountl(output[k]);
}
return card;

This new loop looks a lot more expensive than the previous one. It does a lot more work!

I implemented both in C and benchmarked them on a recent PC (Haswell, GCC 4.8). Here are the numbers of millions of iterations per second for the two loops:

array size (each of the 3 arrays)   bitwise OR   bitwise OR + cardinality
8192 kB                             550          550
4096 kB                             655          655
1024 kB                             1,400        1,310
8 kB                                1,900        1,500

You read this right: the two loops have identical cost for moderately big arrays!

What happens is that the __builtin_popcountl function is compiled down to a single instruction: popcnt, available on PC processors produced after 2008. This instruction is very efficient: it has a throughput of one instruction per CPU clock cycle. Moreover, recall that modern processors are superscalar: they can execute more than one operation at a time. To make things more complicated, the processor is also often starved for data as it needs to load data from RAM. In this case, the processor is able to compute the cardinality for free on megabytes of data. But even when the data mostly fits in cache, the penalty for doing the extra work can be small. The computation of the bitwise OR is so cheap that it leaves much of the silicon on your processor free to do other work at the same time if needed.

Note that even for short arrays, we probably exaggerate the benefit of the short loop: in these particular tests, we loop many times over the same memory sections so that it remains in CPU cache whereas in an actual application we would suffer cache misses.

To recap, if you have to compute the bitwise OR between two arrays, you can do extra work (such as computing the number of 1s in the result) for free!

My observation will not surprise an expert programmer… However, many others are unaware that their mental model of computation (essentially a simple sequential Turing machine) is wrong. It is of little consequence… until they try to make sense of performance results… Sadly, the naive model of software performance is widespread in academia…

Update: To make things more complicated, Nathan Kurz showed that we can rewrite the short loop using AVX instructions, thus producing much faster code on short arrays that remain in cache.

  • Manning sent me the second edition of their Hello World: Computer Programming for Kids and Other Beginners. At less than $25 on Amazon, this 450-page color book is a great deal. The first edition received rave reviews. It is a great book for any teenager who wants to learn to program.
  • Princeton University Press sent me a copy of Math Bytes by Tim Chartier (for as little as $13 on Amazon). It is a short, beautiful and fun book for nerds. The scope of topics is amazing. For example, what is the math behind the Angry Birds game? How would you paint a picture of Obama using M&Ms? Probably a good summer book.