Zed A. Shaw—author of several books on Ruby and Python—came up with an interesting criticism of Computer Science. He makes some good points:

Computer Science is a pointless discipline with no culture. (…) They rarely teach deep philosophy and instead would rather either teach you what some business down the street wants, or teach you their favorite pet language like LISP. (…) Another way to explain the shallowness of Computer Science is that it’s the only discipline that eschews paradox. Even mathematics has reams of unanswered questions and potential paradox in its core philosophy. (…) There’s an envelope of knowledge so vast in most other disciplines that just when you think you’ve learned it all you find something else you never knew. This is what makes them interesting.

Oh! I think there are many deep and exciting questions in Computer Science. (And not just whether P is equal to NP.) And do Sociology, Economics and History have more depth? But I agree that Computer Science is too often utilitarian. Some like to pretend that by catering to the perceived needs of industry, graduates will get better jobs. Unfortunately, too often, the students have to unlearn their so-called “practical knowledge” once they leave the campus. The honest truth: you don’t need three or four years of college to do great in the software industry.

Maybe more time should be spent on the deep questions. Here are a few discussion points that come to mind:

  • What is “meaning” and how can computation capture or codify it? What does it say about our brain? Is our brain a Turing machine?
  • Why are some programmers ten times more productive than others?
  • Can computers extend our intelligence? How intelligent can we become?

I like to sort things. If you should learn one thing about Computer Science, it is that sorting is fast and useful.

Here’s a little example. You want to check quickly whether an integer belongs to a set. Maybe you want to determine whether a userID is valid. The solutions:

I wrote a Java benchmark to compare the three solutions:

Binary search over a sorted array is only 10% slower than the HashSet. Yet the sorted array uses half the memory. Hence, using a sorted array is the clear winner for this problem.
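To make the comparison concrete, here is a minimal sketch of two of the approaches mentioned above: a sorted `int[]` queried with binary search, and a `HashSet<Integer>`. This is not the benchmark itself; the class name `MembershipSketch`, its helper methods, and the sample user IDs are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration, not the original benchmark code.
class MembershipSketch {
    // Sort a copy of the ids once; every later lookup is a binary search.
    static int[] buildSortedIndex(int[] ids) {
        int[] sorted = ids.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    // Arrays.binarySearch returns a non-negative index only when the key is present.
    static boolean containsSorted(int[] sorted, int id) {
        return Arrays.binarySearch(sorted, id) >= 0;
    }

    // The HashSet alternative stores each int as a boxed Integer.
    static Set<Integer> buildHashSet(int[] ids) {
        Set<Integer> set = new HashSet<>();
        for (int id : ids) {
            set.add(id);
        }
        return set;
    }

    public static void main(String[] args) {
        int[] userIds = {42, 7, 1001, 13, 256}; // made-up user IDs
        int[] sorted = buildSortedIndex(userIds);
        Set<Integer> set = buildHashSet(userIds);
        System.out.println(containsSorted(sorted, 13) + " " + set.contains(13));
        System.out.println(containsSorted(sorted, 14) + " " + set.contains(14));
    }
}
```

The sorted array keeps the integers as primitives in one contiguous block, whereas the HashSet needs a boxed `Integer` plus a table entry per element, which is a plausible reason for the memory difference reported above.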

If you think that’s a little bit silly, consider that column-oriented DBMSes like Vertica use binary search over sorted columns as an indexing technique.

Funding agencies in Canada seek to emulate American funding agencies by promoting excellence. What this means in concrete terms is that a few professors get most of the resources whereas the bulk of University professors are left with a pittance or nothing. The intuition behind this more competitive approach is that we must catch up with the American efficiency. We must reward the most productive researchers and stop wasting money on the unproductive ones. (Disclaimer: I am happy with the research grants I got so far. Luckily, I have been judged to be productive…)

But how is the American system holding up against the competition? I looked at the countries publishing the most research papers in Computer Science, in 1998 and then in 2008.

1998:

  1. USA (14,294 papers)
  2. Japan (2,941 papers)
  3. United Kingdom (2,706 papers)

2008:

  1. USA (15,744 papers)
  2. China (14,680 papers)
  3. United Kingdom (5,703 papers)

It appears that whereas most countries have at least doubled their production of research papers, the USA has stood nearly still. Because these numbers are for 2008, I conjecture that right now, in 2010, Chinese researchers already publish more than their American counterparts. Of course, American authors are more cited, but the gap between China and the USA is closing in this respect as well. Interestingly, Americans also appear to be losing their edge compared to the United Kingdom, France, Germany and Canada.
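As a quick sanity check on the figures listed above, the growth factor between 1998 and 2008 can be computed for the two countries that appear in both lists. The class name `PaperGrowthSketch` and the method `growthFactor` are hypothetical; only the paper counts come from the lists.

```java
// Hypothetical helper for the arithmetic; the counts are the ones listed above.
class PaperGrowthSketch {
    // Ratio of the 2008 paper count to the 1998 paper count.
    static double growthFactor(int papers1998, int papers2008) {
        return (double) papers2008 / papers1998;
    }

    public static void main(String[] args) {
        System.out.printf("USA: %.2f%n", growthFactor(14294, 15744));
        System.out.printf("United Kingdom: %.2f%n", growthFactor(2706, 5703));
    }
}
```

The ratios work out to roughly 1.10 for the USA and 2.11 for the United Kingdom, that is, about 10% growth against a little more than a doubling.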

While I do not have enough evidence to conclude, I conjecture that an all-or-nothing approach, so common in the USA, may not be so efficient after all. By leaving most University professors behind, you are wasting precious resources. And I fear that by emulating this model, Canada might be losing out too.

Source: SJR.

Should you attend the most selective school? Maybe not:

Students who attended more selective colleges do not earn more than other students who were accepted and rejected by comparable schools but attended less selective colleges. (Dale and Krueger, Estimating the payoff to attending a more selective college, 1999).

Should you present papers in the conference with the lowest acceptance rate? Looking at this plot, there seems to be little correlation between acceptance rate and impact factor:

[Figure: acceptance rate versus impact factor]

(Source: Sylvain Hallé’s blog.)

Conclusion: The best schools or the best conferences may not be those with low acceptance rates.

Science and business, so far, have been mostly model driven. That is, you collect a few data points, just enough to fit your model. Then you proceed from your model. However, things have changed:

| Old | New |
| --- | --- |
| Manually take samples of the water in a nearby lake (4 times a year) | Set up a wireless sensor in the lake (5000 samples a day) |
| Model an algorithm and test it once on an expensive mainframe computer | Build dozens of prototypes and test them on cheap laptops |
| Have an accountant prepare a business intelligence report, once a year | See how the business is doing through your dynamic data warehouse |

Hence, improving access to data is fast becoming a critical issue. In a thought-provoking post, Andre Vellino sketches the future of data Information Retrieval. Some key points:

  • Back in the early nineties, we had many electronic documents, but a comparatively poor infrastructure to share them. Then came the web and the search engines such as Google. Currently, we have many good data sets, but sharing and indexing them is painful. Clearly, we need to produce a better infrastructure for sharing data!
  • Research papers should reference data sets, by a unique identifier (such as a Digital Object Identifier), so that we can ask “What research relied on this data set?” or “Where can I find the data these authors have used?”

This is one instance where funding agencies should step in and encourage this work. It is not enough to encourage researchers to share their data. We need better tools too!
