I like to think about graph theory problems these days. Here is one:

What type of graph has minimal diameter for a given number of vertices, given an upper bound on the in-degree and another upper bound on the out-degree?

I will give eternal fame (among the readership of this blog) to anyone who can provide a practical algorithm to construct such graphs. Pointing me to a reference counts.

(No, I have not even tried to solve the problem. I am just interested in the answer.)
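
For what it is worth, there is an easy lower bound on the diameter, usually called the directed Moore bound. If the out-degree is at most d, a vertex can reach at most 1 + d + d^2 + ... + d^k vertices within k steps, so a graph with n vertices and diameter k needs that sum to be at least n. The same argument on reversed edges applies to the in-degree, so d can be taken as the smaller of the two bounds. Here is a minimal shell sketch of that calculation. The function name min_diameter_bound is made up for illustration, and the bound says nothing about how to construct a graph that attains it.

```shell
#!/bin/sh
# Lower bound on the diameter of a directed graph with n vertices
# and maximum degree d (hypothetical helper, not a construction).
# Finds the smallest k such that 1 + d + d^2 + ... + d^k >= n.
min_diameter_bound() {
    n=$1
    d=$2
    # With degree 0, no vertex can reach any other vertex.
    if [ "$d" -lt 1 ] && [ "$n" -gt 1 ]; then
        echo "no finite diameter is possible" >&2
        return 1
    fi
    k=0
    reach=1   # most vertices reachable within k steps
    layer=1   # most vertices at distance exactly k
    while [ "$reach" -lt "$n" ]; do
        layer=$((layer * d))
        reach=$((reach + layer))
        k=$((k + 1))
    done
    echo "$k"
}

min_diameter_bound 10 2   # prints 3
```

With degree 1 the bound is n - 1, which a directed cycle attains. For larger degrees, the bound is only a floor: a graph reaching it may not exist.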

I read this on slashdot:

“I have a PhD in math, and I still don’t have the multiplication tables memorized”

Now I know I am not the only one!

In other news,

  • I still deduce my age from my birth date (takes me a minute or so each time);
  • I was identified as having a learning disability when I entered school (since I could not recite my phone number nor tie my shoes) and put in a special class;
  • I still don’t know my office phone number;
  • I don’t know my bank account number, nor how much money there is in it;
  • I don’t know my Social Insurance Number;
  • I get the birthdays of my sons mixed up.

But I know what a soliton is, I can solve nonlinear differential equations by multiscale methods, and I can program my very own bitmap index from scratch in C++. Oh! and I can grow coreopsis and echinacea from seeds.

Let us face it: the purpose of school should not be to teach specifics. And you should never judge kids by what you expect them to achieve. Let them surprise you!

Yahoo! managed to sort 10 billion 100-byte elements in 209 seconds. This was done in Java using Hadoop.

As a basis for comparison, on a fast and recent Mac Pro, it takes 6000 seconds to sort a 2 GB text file using Unix file utilities. Yahoo!’s problem is 500 times larger (10 billion 100-byte records is 1 TB), and they solve it about 30 times faster: their throughput is 4 orders of magnitude higher! Of course, they have fixed-length records, which helps tremendously.
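
The arithmetic behind that comparison can be checked with a few lines of shell. This is only a sketch: it assumes that 2 GB means 2 × 10^9 bytes and that the shell does 64-bit integer arithmetic, and the variable names are mine.

```shell
#!/bin/sh
# Figures quoted in the post.
yahoo_bytes=$((10000000000 * 100))   # 10 billion records of 100 bytes
yahoo_secs=209
unix_bytes=$((2 * 1000000000))       # assumption: 2 GB = 2 * 10^9 bytes
unix_secs=6000

# How much larger is the Yahoo! data set?
size_ratio=$((yahoo_bytes / unix_bytes))

# Ratio of throughputs (bytes sorted per second), rounded down.
speedup=$((yahoo_bytes * unix_secs / (unix_bytes * yahoo_secs)))

echo "size ratio: $size_ratio"          # prints "size ratio: 500"
echo "throughput ratio: $speedup"       # prints "throughput ratio: 14354"
```

A ratio of about 14,000 is a little over 10^4, hence the four orders of magnitude.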

However, I wonder how much energy (power usage) was spent on the sort operation?

A couple of weeks ago, I needed to back up my MacBook Pro to an external disk (a FireWire G-Drive) because my hard drive was failing. I started shopping for a good backup solution, but none offered all of the following features:

  • support for incremental backups: if a change is made, you only back up the files that differ;
  • adequate handling of IO errors (no all-out abort);
  • inexpensive.

Indeed, I tried two different tools, but they refused to back up my disk due to numerous IO errors. They would not even tell me how to fix my problem.

As it turns out, your Mac already has all it needs, by default, to do just that. First, create a file called “backup.sh”, make it executable (chmod +x backup.sh) and copy the following content to it:


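#!/bin/sh
# -E: on Apple's bundled rsync, copy extended attributes and resource forks
RSYNC="/usr/bin/rsync -E"
# my external disk is located
# at /Volumes/G-DRIVE\ MINI/
# -a: archive mode, -x: stay on one file system, -S: handle sparse files,
# --delete: remove files from the backup that no longer exist on the source;
# any extra arguments are passed through to rsync, and / is the source
sudo $RSYNC -a -x -S --delete --exclude-from backup_excludes.txt "$@" / /Volumes/G-DRIVE\ MINI/
# mark the backup disk as bootable
sudo bless -folder /Volumes/G-DRIVE\ MINI/System/Library/CoreServices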

Then run it! Go to a shell and type “./backup.sh”. It will ask for your administrator password.
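
The script expects a file called “backup_excludes.txt” in the directory you run it from. Here is a minimal sketch of one way to create it. The patterns are only illustrative guesses at what you might want to skip on a Mac (temporary files, swap, mount points, Spotlight indexes), so adjust them to your machine. rsync reads one pattern per line, and a leading slash anchors the pattern to the root of the transfer.

```shell
#!/bin/sh
# Write an example exclude list for rsync's --exclude-from option.
# Every pattern below is an assumption; edit to taste.
cat > backup_excludes.txt <<'EOF'
# temporary files and swap
/tmp/*
/private/tmp/*
/private/var/vm/*
# mount points and device files
/Volumes/*
/Network/*
/dev/*
# Spotlight indexes and trash
/.Spotlight-V100
/.Trashes
EOF
```

Lines starting with # are treated as comments by rsync.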

If you ever need to restore your files, then create a file called “restore.sh” with the following content:


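#!/bin/sh
RSYNC="/usr/bin/rsync -E"
# copy from the external disk back to the internal disk;
# --delete removes files on the internal disk that are not in the backup
sudo $RSYNC -a -x -S --delete --exclude-from backup_excludes.txt "$@" /Volumes/G-DRIVE\ MINI/ /Volumes/Macintosh\ HD/
# mark the internal disk as bootable again
sudo bless -folder /Volumes/Macintosh\ HD/System/Library/CoreServices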

Executing restore.sh may prove dangerous: it overwrites your internal disk. Make sure you have tried booting from the external disk first. To boot from an external disk, hold down the Option key while rebooting and pick the disk from the startup menu.

Everything else being equal, picking the right problems is the key factor determining your success as a researcher (no matter how you define success). In a previous post, I proposed three categories of research problems:

  1. explain a previously unexplained observation;
  2. perfect an existing technique;
  3. invent a new problem.

It appears that all three categories are equally valid. Which one you prefer is a matter of style.

Today, I would like to propose a new, orthogonal, categorization in terms of the depth of the problem you tackle. Some problems

  1. are narrow and well defined: you can complete them in a few months;
  2. form a set of narrow and well-defined problems, likely to keep you busy for years.

I have myself tended toward the first category (see “my research process”). The benefit of a focused burst of research producing a distinct result should not be underestimated. The most obvious benefit is that you can quickly move on, and thus you can afford to try your hand at random problems. It is the equivalent of a hit-and-run. If you are the curious sort, it allows you to learn about a new topic without investing your career in it. However, it makes applying for grants more difficult. You are also less likely to achieve recognition because each contribution may have less depth.

The second category means that you must find yourself a niche and work on it for years. Preferably, not too many people in the world should be aware of the problems you have identified. The catch is: how can you know, ahead of time, that the topic and the problems you see now will still be interesting in two or three years? Are you investing in vain? Presumably, if you can follow this strategy, grant applications and recognition may come more easily. But what happens if you get bored?

The two categories relate to how you read papers. If you read papers thinking “maybe I could build on their work”, then you will naturally tend to the first category. Reading a lot of papers on different topics favors random hit-and-run research projects. Are you reading the list of accepted papers looking for clues as to what you will work on next? Are you attending talks to pick up random new ideas?

However, if you tend to “pull” research papers out of the (virtual) library based on your own ideas, then you will more likely gravitate toward the deeper research projects. In this case, your mental filters are much stronger: you tend to filter out everything that does not directly relate to your goals. You may still attend many conferences, and read lists of accepted papers, but your brain will filter most of the data out.
