Patience, persistence, perseverance

In gardening—as in research—there are 3 fundamental values one must cultivate.

  • Patience. Quick results are possible without much effort. However, it takes a minimum of 3 years for a new garden to reach maturity. The first year you set the ground, the second year you build up, and the last year you reap your best results.
  • Persistence. You have to continually work at your goals. You do not write great articles or great books the day before the deadline. You must watch over your plants every other day. If you go a week without visiting your garden, many of your fragile plants may die while the sturdy ones may grow out of control.
  • Perseverance. You will fail. No matter what. You may have to change your plans drastically, but you should never give up. So, make sure you are having fun.

Why academia is so conservative: academic freedom

To anyone who has worked in industry, academia feels like it is standing still. For example, many Computer Science programs still teach programming as it was done 10 years ago, if you are lucky. Most programs undergo only cosmetic changes over time.

I have the following explanation:

  1. Most people are out of touch. This is true everywhere. I remember when Java first came out. Years after Java had mostly caught up with C++ in speed, people still complained that it was slow. I still hear people say that Java is slow. Keeping up with the latest facts is hard. People prefer to rehash the same ideas, again and again. The human brain prefers to avoid change.
  2. It takes a long time to build new academic material. Older professors have strong incentives to teach and research the same topics again and again. A similar phenomenon occurs in all large organisations, but professors have academic freedom.
  3. Finally, leaving people behind is not an option in academia. Even in large companies, you can leave some people aside. In academia, even one individual who is left behind can create a lot of trouble for others. This is also true in large companies, but most employees do not have as much freedom as a professor: they cannot resist change as strongly as a professor can.

(We could test my explanation by determining whether there is a correlation between the level of academic freedom and the level of conservatism.)

I find it very interesting that increased individual freedom brings about more conservatism.

The art of paper review

I do not claim to be an expert at reviewing academic papers, but I have done my share of work. Here is my recipe:

  • Reproducibility, (self-)plagiarism and presentation are easy to evaluate and I usually spend quite a bit of time on these issues. Science should be reproducible. (Panos Ipeirotis seems to agree with me.) Plagiarism can be surprisingly hard to detect, but it is also amazingly frequent, so I usually search for a few word cooccurrences in Google. Presentation is, on average, quite poor. Figures are often ugly. Poor English is frequent.
  • The relevance and strength of the paper is something I usually have an opinion about. Alas, it is easy to be wrong about the importance of a paper, so I usually do not have much to say unless I have directly worked on the same problems for a couple of years.
  • Correctness is hard to check, especially if I am not a domain expert. I usually pick up on secondary details. Are the results credible? Do the authors mention some special cases that should have arisen in their analysis or experiments? I must unfortunately admit that I usually cannot be sure that the papers I have reviewed are correct. At best, I can voice an opinion about their credibility.

The one thing I learned about gardening this year

Since I moved to the Montreal suburbs, I have become an active gardener. I used to apply generous amounts of fertilizer. I also got into serious trouble. My lawn died. Not because I burned it, but because I got a bad case of grubs. Several of my perennials also died or failed to come back healthy after the winter. I learned the hard way that most perennials are better off without any (chemical) fertilizer.

It turns out that in most cases, chemical fertilizers are overrated. As a rule, you should not use them.

There is only one platform: the Web

In academic circles, there are intense platform wars. We have a talk next week from the head librarian about offering Google-like services, but on an academic platform. I won’t go to the talk because I no longer care about library portals. Regarding courseware, there are wars between Moodle and other hybrids. As a professor, all that I care about is to have some tools to post content online, conveniently. The last thing I want to do is live within a monolithic proprietary platform.

Everyone is fighting for his platform. What a tragic mistake! In 2008, only one platform matters. No, it is not Facebook, nor Windows Vista, nor Moodle, nor .NET. It is the Web. In some sense, the Web is the platform to rule them all.

Something deeper is going on as well. In Death of the software application, I argued that software as a discrete quantity was finished. How many software applications do you run right now? Nobody cares. Similarly, the term platform is obsolete: people hook up the software they need dynamically.

If someone asks you to pick a specific platform for a project, refuse to do so. Pick software the way a tailor picks fabric: a little bit here, a little bit there. Also, if you are building a Web platform, please stop right there. Turn around and rethink your objectives.

Graph diameter versus maximum node degree

Since I have had amazing luck in the past with questions to the readers of this blog, I have another question.

The diameter of a graph is the longest distance between any two nodes. The degree of a node is the number of edges or links from and to this node.

Intuitively, the higher the node degrees, the denser the graph. If you have n nodes and every node has degree n-1, then the graph diameter is 1. If you allow a smaller maximal degree, then you can get an infinite diameter by producing a disconnected graph.

An interesting question (to me) is:

Given a maximal node degree, and a number of nodes n, what is the smallest possible diameter?

I am sure this is textbook material, but I could not find the answer quickly. Using a hyper-rectangle, I am able to construct a graph having n nodes and log n diameter. Simply start with a 4-node rectangular graph: you have 4 nodes and a diameter of 2. Move to an 8-node cubic graph: you have 8 nodes and a diameter of 3. Generalizing this construction, you have 2^d nodes, each of degree d, and a diameter of d. Is this the best you can do?
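
The hypercube construction can be checked by brute force for small d. The following is a minimal sketch, with function names of my own choosing: nodes are the integers 0 to 2^d - 1, two nodes are linked when their binary representations differ in exactly one bit, and the diameter is found by running a breadth-first search from every node.

```python
from collections import deque


def hypercube_neighbors(node, d):
    """Neighbors of a node in the d-dimensional hypercube: flip one bit."""
    return [node ^ (1 << bit) for bit in range(d)]


def eccentricity(start, d):
    """Largest shortest-path distance from start, via breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in hypercube_neighbors(u, d):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def hypercube_diameter(d):
    """Diameter of the d-dimensional hypercube (2**d nodes, degree d)."""
    return max(eccentricity(start, d) for start in range(2 ** d))


for d in range(1, 7):
    print(2 ** d, "nodes, degree", d, ", diameter", hypercube_diameter(d))
```

This only confirms that the construction has diameter d. It says nothing about whether another graph with the same degree could do better.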

Anyhow. Why do I care about the answer? Because I keep reading that hubs are necessary in graphs to ensure that we have a small diameter. I am trying to quantify this statement. There are about 2^33 human beings. By my construction above, if everyone knows 33 people, then it is possible to get a diameter of 33. It seems like a relatively large diameter.

An obvious technique to shrink the diameter without using hubs is to increase the maximal node degree. I am wondering by how much I need to increase the maximal node degree so that I can get six degrees of separation between any two human beings.

(Yes, I know that social networks are not homogeneous. But stay with me. Assume they were.)
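
One way to get a rough handle on this question is the Moore bound, a standard counting argument: a graph with maximal degree D and diameter k has at most 1 + D(1 + (D-1) + ... + (D-1)^(k-1)) nodes, because each step away from a node can reach at most D-1 new nodes per node already reached. The sketch below (the function names are mine) searches for the smallest degree that this bound does not rule out. It is only a lower bound on the degree: graphs that actually attain the Moore bound are rare, so the true answer may be larger.

```python
def moore_bound(max_degree, diameter):
    """Upper bound on the number of nodes in a graph with the given
    maximal degree and diameter."""
    return 1 + max_degree * sum((max_degree - 1) ** i for i in range(diameter))


def min_degree_lower_bound(num_nodes, diameter):
    """Smallest maximal degree for which the Moore bound allows num_nodes
    nodes within the given diameter. A necessary condition, not a
    construction."""
    degree = 0
    while moore_bound(degree, diameter) < num_nodes:
        degree += 1
    return degree


print(min_degree_lower_bound(2 ** 33, 6))
```

With 2^33 nodes and a diameter of 6, the bound gives 46, so no graph in which everyone knows 45 people or fewer can achieve six degrees of separation. Whether 46 acquaintances actually suffice is a separate question that the bound does not answer.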

Seeking an efficient algorithm to group identical values

In the past, I have had luck with my requests for help, so here is another one.

Suppose you have a large array made of a large number of distinct values ({A,B,A,B,A,C,C}) and you want to group the identical values like so {A,A,A,B,B,C,C} or like so {C,C, A,A,A,B,B}. That is, you do not care about the order, you just want identical values to be clustered. How do you do it?

  • You could sort the array assuming that there is an ordering between the objects. There are highly efficient external-memory sorting routines. They mostly rely on sequential IO and are amazingly fast.
  • You can build a hash table. Because it is a linear time operation, for very large arrays, it should be faster than sorting, in theory. However, the catch is that external-memory hash tables are not very efficient because they rely on random IO and are prone to cache misses. Remember, kids, that 100 n > n log n despite what your math teacher taught you.
  • We can mix hashing and sorting. Scan the array, and randomly hash each value into one of L bins. You know that if the value x appears in bin i, then all values x are in the same bin. So, you can simply sort each bin and concatenate the bins in O(n log(n/L)) time, assuming your hashing is good enough.
  • One last possible trick might be to adapt fast duplicate detection algorithms such as the Teuhola-Wegner algorithm: J. Teuhola, L. Wegner, Minimal space, average linear time duplicate deletion, Communications of the ACM, 1991.
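
The approach that mixes hashing and sorting can be sketched in a few lines. This is an in-memory toy, not an external-memory implementation: the names group_identical, is_grouped and num_bins are mine, and Python's built-in hash stands in for whatever hash function you would really use. It assumes the values are hashable and can be ordered.

```python
from collections import defaultdict


def group_identical(values, num_bins=4):
    """Cluster identical values: hash into bins, sort each bin, concatenate."""
    bins = defaultdict(list)
    for value in values:
        # Equal values have equal hashes, so they land in the same bin.
        bins[hash(value) % num_bins].append(value)
    grouped = []
    for bin_values in bins.values():
        bin_values.sort()  # each bin holds about n / num_bins values
        grouped.extend(bin_values)
    return grouped


def is_grouped(values):
    """True if every distinct value occupies one contiguous run."""
    seen = set()
    previous = object()  # sentinel that equals nothing in the input
    for value in values:
        if value != previous:
            if value in seen:
                return False
            seen.add(value)
            previous = value
    return True


data = list("ABABACC")
result = group_identical(data)
print(result, is_grouped(result))
```

The order of the groups in the output depends on the hash function, which is fine here since only the clustering matters.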

So, what do you think?

Week-old cappuccinos taste bad

I am not a neat guy. I tend to drop coffee cups on my desk and I never look back. What I learned this year is that if you leave a cup of cappuccino for a week on your desk, then drink a shot of it by mistake while staring at your computer screen, you are left with a terrible taste in your mouth for the rest of the day. However, day-old cappuccinos are ok.

There is a lesson here, kids: do not leave projects without attention for weeks at a time. You will get a bad taste once you go back to them later.

Why you get annoying as you grow older

As a young Ph.D. student, I thought that my thesis supervisors were annoying. Looking back, ten years later, I think they were not nearly harsh enough.

  • I used to think that keeping detailed logs of what I had done was pedantic. As a young researcher or developer, I would just quickly jot down my ideas without looking back. I have since learned that the argument that seems so obvious to you now may escape you a year later. You have to write a lot. All the time. As a side benefit, if you try to explain carefully what you just did, you often find flaws faster. You also think better if you slow down.
  • The little things do matter. I used to believe that science was about the big issues. I could not be bothered about small details. I was so wrong! Science is about being anal retentive over little details. This off-by-one result may hide a significant result, or may confuse an eventual reader. You have to try hard to get everything right as early as possible.
  • Communication is 80% of the work. This may sound counterintuitive because most researchers only spend a small fraction of the time publishing or giving talks. But when they design experiments, or craft theorems, they are trying to make a point, to communicate an idea, to an imaginary peer. So, you have to design elegant experiments and theoretical results all the time. Hack all you want, but hack elegantly.

The truth will make you relevant

Scientists often cheat. Bad and famous scientists cheat. The cheating can be small or large: putting your name as an author on a paper that you barely read, omitting part of an experiment, making up experimental results, claiming that you have a proof of a given result, making something look more complicated than it really is, and so on.

Cheating can serve you well. It may help you get a larger grant, a better job, and so on. However, all these gains are short term ones. For longer term goals, I believe cheating eventually makes you less relevant.

This idea came to me as I was reading a comment on this blog:

A scientist or mathematician may achieve relevance as a side-effect of aiming for rigour. (Peter Turney, somewhere on this blog)

Update: One of my colleagues has written a book on scientific fraud (in French). Thanks to Sébastien Paquet for the link.

Job offer: education specialist

We are looking for someone to fill a permanent position as an education specialist (spécialiste en sciences de l’éducation). The job includes some research time. You must have a degree in education, or the equivalent. Some of our specialists have Ph.D.s. Some training in Computer Science would be great. The job location is Montreal and the language is French. If you are interested, do not get in touch with me, but send your resume:

Interested candidates must send their curriculum vitae and proof of their degree(s) before 4:30 p.m. on May 5, 2008 to:
Direction des ressources humaines
À l’attention de madame Nathalie Camiré
Concours no. 0804-912
455, rue du Parvis
C.P. 4800, succ. Terminus
Québec (Québec)
G1K 9H5

Rigor or relevance: choose one

Back when I was a Mathematics undergraduate student at the University of Toronto, I was told by some of my peers that I was not a Mathematician but a problem solver. This was meant as a derogatory remark, but I thought it was a correct assessment. In short, I cared only about a given theorem if it allowed me to solve some interesting problems. I was not interested in Mathematics for its own sake. Rigor was not enough, I wanted relevance.

A given scientific or mathematical result has two properties: rigor and relevance. You usually can have one, or the other, but not both.

Engineers and technologists are good at determining relevance. They will discard quickly results that they do not need. The average software engineer is unable to prove that his program is correct. Even when rigor is important, such as when designing medical gear, the engineer is often not interested in proving the optimality of the techniques being used. By sacrificing some rigor, the engineer is able to innovate: if he had to prove every detail, he could never get work done.

Scientists make a business out of correctness. To ensure rigor and depth simultaneously, scientists stay close to the shore. Most scientists specialize in a narrow niche and take months to study what might be considered to be a minor point. This same minor point will get revisited by others. Their work tends to be very incremental. However, scientists are bad at being critical of the relevance of their own work. Indeed, if they questioned their work too often, they might need to change topics too often, which would considerably reduce their productivity. This explains why we end up with fields such as String theory or classical AI. Notice that you cannot measure relevance by the number of citations from people in your field. In fact, the relevance of one’s research is usually never formally measured.

You would think that being critical would be a good thing in science, no? Alas, no. As an experiment, try to go to the next conference in your field and ask your peers whether what you are doing is relevant. It is a good recipe to become unpopular.

References:

Aubrey D.N.J. de Grey, Curiosity Is Addictive, and This Is Not an Entirely Good Thing, Rejuvenation Research. February 1, 2008, 11(1): 1-3.

Dijkstra’s second rule for successful scientific research: “We all like our work to be socially relevant and scientifically sound. If we can find a topic satisfying both desires, we are lucky; if the two targets are in conflict with each other, let the requirement of scientific soundness prevail.”

Google stole my marker

This year, I am the course coordinator for a Java course. One of our tutors went missing. Human resources tried to negotiate with him but he told them he did not care anymore.

I googled him. I got his resume, and then noticed that the top line says “2008: now with Google.”

I guess that must be a common phenomenon in hot spots like Stanford? It was a first for me.

Writing alone: benefits and pitfalls

Yesterday, I wrote about the types of collaboration we commonly observe in science. Today, I want to spend 5 minutes thinking about what happens when you write a science paper alone.

Benefits:

  • New projects can emerge and die quickly.
  • You set your own standards.
  • You increase your range of skills by having to do all of the work.

Pitfalls:

  • It takes slightly longer to write a paper alone since you cannot share the workload.
  • The feedback loop is slow: you can waste months or years without anyone telling you how stupid you are.
  • It is easier to go unnoticed when you work alone.

I believe that you can alleviate some of the pitfalls:

  • Do experimental work early and often. Nature is the best coauthor.
  • Read a lot and keep an open mind. Do not become overspecialized.
  • Manage your time tightly.
  • Make your work widely available.

Collaboration in Science: Three models

Scientists collaborate frequently. Most science articles have at least two authors.

Some collaborations work well, others fail. The first step to understanding what went wrong is to categorize the collaboration. I distinguish three types:

  • Hierarchical collaboration: the student collaborates with his supervisor, the researcher collaborates with his manager. This form of collaboration is usually long-lived. It usually depends on the available funding and is usually more conservative in nature. The lower you are in the hierarchy, the more you work, usually.
  • Symmetric collaboration: two mathematicians write papers by exchanging conjectures over email. This form of collaboration does not scale well to large numbers: the communication overhead grows quadratically.
  • Topical collaboration: a philosopher writes a paper with a software engineer to describe the philosophy of software engineering. This form of collaboration can suffer from communication problems. The collaboration is usually project-centered. It might be risky research. I would expect this form of collaboration to be especially fruitful. Oddly enough, I cannot think of any famous example of topical collaboration in science.

See also The lonely researcher: a loser?

The “e” prefix is obsolete

Nicholas Carr asked whether IT departments mattered. What is IT all about? e-Collaboration, e-Mail, e-Learning, e-Health, e-Business, and so on. Does the “e-” matter?

I am working on a graduate program in e-collaboration. At some point, I had to stop and think… isn’t all collaboration electronic? Even the construction workers use cell phones and PDAs.

Does anyone who is seriously sick fail to look up their disease on Wikipedia, and to join related message boards to meet other people who have the same disease?

Do you know any student who fails to use the Web to help them in their classes?

Do you know any business that is not also an e-Business? Even the shops at my local market have computers on their stands so that you can pay with a debit card.

Source: This idea came in an e-discussion with Daniel Tunkelang.

What is academic blogging about?

From the lowly Ph.D. student at a small school to the Harvard professor, researchers are blogging. Here are some of the reasons why they blog:

  • Research is a social activity. Blogging allows us to keep and create links with diverse researchers whose varied interests keep our minds open and fresh.
  • Blogging is a personal activity, whereas most of science is consensual. Hence, blogging helps to promote ideas that would not survive otherwise. It is easier to go against the grain in a blog than in a research journal.

My thesis is that blogging will ultimately be recognized as an activity encouraging true innovation.

References: