Fernando Pérez gave a talk at PyCon 2014 with a brilliant slide:

The ideals versus the reality of science:

  • Ideal: the pursuit of verifiable answers. Reality: highly cited papers for your c.v.
  • Ideal: the validation of our results by reproduction. Reality: convincing referees who did not see your code or data.
  • Ideal: an altruistic, collective enterprise. Reality: a race to outrun your colleagues in front of the giant bear of grant funding.

Credit: Bill Tozier for the pointer.

We all rely daily on free and open source software, whether we know it or not. The entire Internet is held together by open source software. The cheap router that powers your Wi-Fi network at home uses the Linux kernel. Your Android phone is based on the Linux kernel. Google servers run Linux. In 2014, almost everyone is a Linux user.

For most people, the financial value of this software is an abstract concept. I think that most people assume that open source software must be cheap.

On the contrary, producing quality open source software is tremendously expensive. And the financial investment grows every year.

How much did it cost to write the millions of lines of the Linux kernel? García-García and de Magdaleno estimated the cost of the Linux kernel, as of 2010, at 1.2 billion euros. That is how much it would cost any one company to redo the Linux kernel from scratch.

You might assume that programmers working on the Linux kernel are hopeless nerds who live in their parents’ basement. In fact, most of them are highly qualified engineers earning 6-figure salaries or better. So the financial estimate represents real money. It is not a virtual cost.

Of course, the Linux kernel is a tiny fraction of all the open source software we rely upon. Most open source developers will never contribute to the Linux kernel: it is reserved for a small elite. According to the Linux Foundation, the cost of building a standard Linux distribution (in 2008) would have been over $10 billion.

So what is the value of all open source software beyond Linux?

It helps to realize that software is a huge business. In Europe, companies and governments spend over 200 billion euros a year building software. To put this in perspective, the movie industry in the US generates about 10 billion dollars in revenues. In the United States, 1 out of every 200 workers is a software engineer. A very sizeable fraction of all “engineering” today is in software.

Of course, not all of the software is open source. Still, Daffara estimates the financial value of open source software, for Europe alone, at over 100 billion euros a year.

So why don’t we have more open source drug designs, movie content, textbooks, and so on?

The common argument is that nobody will be willing to invest in, say, a new textbook, a new drug or a new show if anyone can copy and redistribute it for free: the investment is too large.

But I think that the real difference is cultural. In the software world, entire businesses grew surrounded by open source software. They learned to thrive with and through open source software. Companies that entirely reject open source are at a competitive disadvantage. The same happened in the fashion industry. Designers assume that other people will copy them. In fact, designers hope others will copy them.

Other industries, like the pharmaceutical or education industry, have internalized the patent and copyright systems. That is why college students have to pay over $100 for a typical textbook whereas they can get an operating system that cost billions to make for free.

I think that if we had a world where it was fair game to copy and distribute a textbook for free, we would still have textbooks. I think they would still be excellent. I also think that textbook authors would get paid, just like the programmers do.

Would the overall result be better? I do not know but it is fascinating to imagine what such a parallel universe might look like.

Credit: Thanks to Christopher Smith for useful pointers.

The new C++ standard introduced hash functions and hash tables in the language (as “unordered maps”).

As every good programmer should know, hash tables only work well if collisions between keys are rare. That is, if you have two distinct keys k1 and k2, you want their hash values h(k1) and h(k2) to differ most of the time.
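To make "collisions are rare" concrete, here is a minimal sketch that counts how many pairs of distinct keys collide under a given hash function. The helper name count_colliding_pairs is my own invention for illustration; it is not part of the standard library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts pairs of distinct keys (k1, k2)
// for which h(k1) == h(k2). A good hash function keeps this small.
template <typename Hash>
std::size_t count_colliding_pairs(const std::vector<std::string>& keys, Hash h) {
    std::size_t collisions = 0;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        for (std::size_t j = i + 1; j < keys.size(); ++j) {
            if (keys[i] != keys[j] && h(keys[i]) == h(keys[j])) {
                ++collisions;
            }
        }
    }
    return collisions;
}
```

You could pass std::hash<std::string>{} as the second argument to measure your own standard library on your own keys. A hash function that always returns the same value makes every pair collide.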

The C++ standard does not tell us how the keys are hashed but it gives us two rules:

  • The value returned by h(k) shall depend only on the argument k.
  • For two different values k1 and k2, the probability that h(k1) and h(k2) “compare equal” (sic) should be very small.

The first rule says that h(k) must be deterministic. This is in contrast with languages like Java where the hash value can depend on a random number if you want (as long as the value remains the same throughout the execution of a given program).

It is a reasonable rule. It means that if you are iterating through the keys of an “unordered set”, you will always visit the keys in the same order… no matter how many times you run your program.
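As a small sketch of this behaviour, the function below (iteration_order is a name I made up) builds an unordered set and returns the keys in the order the set visits them. The order itself is unspecified and may differ from one standard library to another; the sketch only illustrates that, with a deterministic hash, the same insertions in the same build yield the same order.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: inserts `keys` into an unordered set and
// returns them in whatever order the set iterates over them.
std::vector<std::string> iteration_order(const std::vector<std::string>& keys) {
    std::unordered_set<std::string> set(keys.begin(), keys.end());
    return std::vector<std::string>(set.begin(), set.end());
}
```

Calling iteration_order twice on the same input should give two identical vectors, even though you cannot predict what the order will be.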

It also means, unfortunately, that if you find two keys k1 and k2 such that h(k1) and h(k2) are equal, then they will always be equal, for every program and every execution of said programs.

The second rule is less reasonable. We have that h(k1) and h(k2) are constant values that are always the same. There is no random model involved. Yet, somehow, we want the probability that they are equal to be low.

I am guessing that they mean that if you pick k1 and k2 randomly, the probability that they will hash to the same value is low, but I am not sure. If it is what they mean, then it is a very weak requirement: a vendor could simply hash strings down to their first character. That is a terrible hash function!
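Here is what such a first-character hash could look like. This is a sketch of the bad idea, not something any vendor actually ships as far as I know; FirstCharHash and distinct_keys are names I chose for illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// A deliberately terrible hash: only the first character matters.
// It is deterministic and depends only on its argument, so it obeys
// the first rule, yet every string with the same first character collides.
struct FirstCharHash {
    std::size_t operator()(const std::string& s) const {
        return s.empty() ? 0 : static_cast<unsigned char>(s[0]);
    }
};

// The hash table remains correct with this hash, only slower:
// colliding keys pile up in the same bucket.
std::size_t distinct_keys(const std::vector<std::string>& keys) {
    std::unordered_set<std::string, FirstCharHash> set(keys.begin(), keys.end());
    return set.size();
}
```

With this functor, "apple" and "avocado" always hash to the same value, so a dictionary of English words would end up spread over a few dozen buckets at most.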

I am under the impression that the next revision of the C++ standard will fix this issue by following in Java’s footsteps and allowing hash functions to vary from one run of a program to another. That is, C++ will embrace random hashing. This will help us build safer software.
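To give an idea of what random hashing might look like, here is a minimal sketch of a hash functor whose values depend on a seed picked at construction. The class SeededHash is hypothetical, and the mixing loop is an FNV-1a-style loop that I chose for brevity. It is not a secure keyed hash and should not be relied upon against an adversary; it only shows the shape of the idea.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>

// Hypothetical seeded hash functor: the same seed always gives the same
// values, but a different seed gives different values for the same key.
class SeededHash {
public:
    // Picks a fresh seed, so values can change from one run to the next.
    SeededHash() : seed_(std::random_device{}()) {}
    // Fixed seed, useful for reproducible tests.
    explicit SeededHash(std::uint64_t seed) : seed_(seed) {}

    std::size_t operator()(const std::string& s) const {
        // FNV-1a-style loop, with the seed folded into the initial state.
        std::uint64_t h = 14695981039346656037ULL ^ seed_;
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ULL;
        }
        return static_cast<std::size_t>(h);
    }

private:
    std::uint64_t seed_;
};
```

Such a functor can be passed as the hash template argument of std::unordered_map or std::unordered_set. The values stay the same for the lifetime of the functor, which is all the hash table needs.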

I used to think that knowledge was strongly transferable. I believed that learning physics could make you a better mechanic. I believed that learning mathematics could make you a better physicist or computer scientist.

After learning a lot of physics and mathematics, I realized that I still found it difficult to write good software, learn about mechanics, design circuits, or understand economics.

This has fundamentally affected my worldview. For example, I no longer take for granted that studying theory can be useful in practice. For me, this is a radical change from my twenties when I believed that computer science wasn’t worth studying since it was just “applied mathematics”.

Since I no longer believe that knowledge is strongly transferable, I have become critical of schooling in general. I used to think that you got smarter with each new college class you took. So I took a lot of them. I took about 30% more classes than necessary to graduate in college. In high school, I took extra mathematics classes outside of the regular schedule (by choice, I don’t even think my parents knew). Thus I know a lot about useless topics.

Some of these classes have turned out to be useful. But less because of the knowledge that they have given me, and more because they have built up my confidence.

For example, all my training in abstract algebra helps me a bit when I want to study random hashing. How much did it help? Well, at some point, I realized that I needed to brush up on Galois fields. I picked up an undergraduate text, read one chapter, and I was good to go. That is, I knew that I could learn quickly about Galois fields if needed.

However, taking dozens of classes is an expensive way to build up your confidence. A better way would be for you to learn a few difficult things on your own. For example, I hardly know anything about electronics and I never took any class in it, but I know that I could become good at it because I have mastered similar skills.

In any case, because I believe that knowledge is only weakly transferable, I favour learning practical skills that are immediately useful. If you want to become a great software engineer, learn to program better… don’t study Latin.

Given a chance, many parents would cram their kids’ schedule with as many academic classes as possible. The hidden assumption is that kids get “smarter” as they take more classes. But this is almost surely wrong. Of course, there are clear benefits to taking swimming lessons (you learn to swim!) or karate lessons (you learn to fight!), but taking an extra mathematics class might not help as much as you think.

I am old enough that, as a kid, I did not have access to a calculator. My mother, a teacher, had an electronic calculator that you had to plug into the wall. She would use it to crunch the grades at the end of each term. It felt fantastically modern.

In any case, one night, when my mother was away, I decided to sneak into my mother’s bedroom, plug in the calculator and use it to do my math homework. I was in third grade. This was, let us be clear, cheating.

What happened? I got a terrible grade. Turns out that using a calculator does not guarantee the correct results.

I lived through a similar experience when I got to college. After learning algebra by hand for years, I soon realized that computers could do algebra too! I had discovered computer algebra systems. Wow! So mathematics would be easy from now on. But as I would soon learn, these systems allowed evil professors to ask even harder problems…

In 2009, for the proof of a result, I had a computer algebra system (in this case Maxima) run a long script to check many possible cases. This intensive search would have taken days or weeks to do by hand. Yet I was able to do this work with the help of my computer precisely because I have some degree of mathematical sophistication… A naive user would not have known what to ask of the computer. The better at mathematics you are, the more you can get out of computer algebra systems…

The Guardian recently reported that some expect that within 25 years, we will get instant language translation. Contrast this with the fact that Richard Stallman, one of the world’s leading hackers and free software advocates, is fluent enough in French to give talks. In fact, I find that Stallman sounds more dignified and polite in French, but maybe it is just me.

Is learning a foreign language a waste of time given that computers can translate?

Let me consider programming languages as an analogy. Most programming languages today are Turing complete. And all Turing complete programming languages are mathematically equivalent. So you only ever need to learn one programming language… they are all equivalent! But this is abstract nonsense as any programmer will tell you. And it is just as much nonsense to claim that computer translation makes learning a foreign language obsolete.

As an aside, reputed AI experts in the 70s and 80s were discouraging people from learning to program. They expected that very high level languages would make programmers obsolete by 2000 or 2010. I do not think they could have been more wrong. Who benefited most from software technology? People who learned to program.

Or consider spelling. Should I give up teaching my boys to spell flawlessly given that we all have spelling autocorrection? I would say that spelling and grammar checkers have, if anything, raised the bar. Kids today need to master grammar more thoroughly, and they need to spend more time studying the nuances of spelling.

The problem is the same whether we use calculators, computer algebra systems or language translation… You need to be smart to use technology effectively. As we acquire more and more technology, we need to get smarter.

It is true that as technology evolves, we should learn things differently. For example, there is little need to train students in very technical algebra. Much of the standard curriculum today is antiquated. Last night, I was doing a word problem with one of my boys and he got discouraged because he had to compute 70% of 280. I told him that I would never expect him to do this by hand… what mattered to me was that he was smart enough to realize that the answer was 0.7 times 280. Similarly, when learning a foreign language, you should probably focus more on how the languages differ than on vocabulary and spelling. But one thing is certain: with increased technology come higher intellectual requirements.
