Assessing a researcher… in 2007

Erik Duval asks for help. He points out that it is extremely difficult to figure out who cites him, how often, and so on. The tool offered by librarians (Web of Science) gives highly accurate but also highly incomplete results. Google Scholar fares better, but its data are noisy and overestimate how often your papers are cited. An astute reader comments on his article, pointing out that you also have to take the blogosphere and other media into account.

My take on this? The tool that actually matters is the one other researchers use. If everyone uses Web of Science daily and you are not in it, then too bad.

I have one paper on PubMed, so, at least, I vaguely exist, as far as medical researchers are concerned. I have 5 papers on MathSciNet, so I exist a bit more for mathematicians. And so on.

Myself? I use Google Scholar. If you are not on Google Scholar, you do not exist for me.

How much impact you are having as a researcher is a fundamentally multidimensional problem. No researcher dominates any other researchers in every way. Trying to find one scalar measure that sums it up is futile, though it can be fun.

See also my posts Are we destroying research by evaluating it? and On the upcoming collapse of peer review.

Software Is Hard: you bet!

In Software Is Hard, Kyle Wilson proposes a law of software development:

It is impossible, by examining any significant piece of completed code, to determine within a factor of two how many man-hours it took to produce that code.

(Oh! I think that he is being generous. I doubt you can reliably estimate, within a factor of 10, how long it took to complete a non-trivial piece of software, even if you know who did it and under what conditions.)

And a corollary:

If you can’t tell how long a piece of code would take when you have the finished product available, what chance do you think you have before the first line of code is written?

This is a very convincing case for why writing and maintaining software are not engineering activities.

Tape as the future of storage: are Sun and Dell insulting our intelligence?

The CEO of Sun (Jonathan Schwartz) has decided to refocus his company on storage, or so he said on his blog. So far so good. I do not think you can endlessly assume that people will want more computing power, as it comes with a growing electricity bill, but you can assume they will always need more storage as long as the electricity bill remains constant.

The trouble, of course, is that the more hard drives you have, the more electricity you use. That is fine for a PC, but in a multi-exabyte data warehouse where most of the data is never accessed, you want electricity usage to be independent of how much data you store.

Solution? Just power down your hard drives, I say.

Here is what Jonathan had to say:

Today, only tape can maintain the integrity of that data without electricity. And for the datacenters we serve, many are seeing the cost of electricity threatening to eclipse their hardware budgets (yes, I’m serious).

OK. Hard drives are not very reliable, so you need multiple copies of the data to be safe, and you need to refresh these copies all the time. This does require some electricity, but copying hard drives on a regular basis should not be so costly if the drives spend most of their time powered down. And, frankly, I would not trust a tape backup forever either: surely you have to copy those as well from time to time.

Meanwhile, hard drives are at least ten times cheaper than tape backups, given the same storage. In fact, it appears that 1 TB of tape backup costs $5000 whereas 1 TB of hard drives costs well under $500. This means that I can buy 8 hard drives, and duplicate my data 8 times, and still save money over tape backups. With the hard drives, I will have much better performance due to much better random access latency. So why would I ever buy tape backups?
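To make the arithmetic concrete, here is a quick back-of-the-envelope sketch in Python, using only the rough 2007 prices quoted above (the figures are this post's ballpark numbers, not a market survey):

TAPE_COST_PER_TB = 5000   # dollars per terabyte of tape backup (ballpark from above)
DISK_COST_PER_TB = 500    # dollars per terabyte of hard drive (upper bound from above)
REPLICAS = 8              # number of duplicated hard-drive copies

disk_total = REPLICAS * DISK_COST_PER_TB  # eight full copies on disk
tape_total = TAPE_COST_PER_TB             # a single copy on tape
print(f"{REPLICAS} disk replicas: ${disk_total}, 1 tape copy: ${tape_total}")
# prints: 8 disk replicas: $4000, 1 tape copy: $5000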

Are you going to tell me that 8 replicated hard drives are less reliable than one tape backup? I bet they take up just about as much space, too.

The story was recently spun to Canadian journalists, and it seems that Dell agrees with Sun. Here is what a spokesman for Dell had to say:

Our studies suggest about 81 per cent of companies are looking to increase their level of investment in tape and over half are using tape for compliance issues.

My take on this story is that Sun and Dell are insulting our intelligence. They are selling this idea that tape-backups-are-great to managers who have money to waste, but real engineers know better.

(Yes, maybe I am wrong. If so, tell me how!)

The medium is the message, in Computer Science?

We should all know that the medium is the message. What does it mean for research in Computer Science?

I have done some work recently on tag clouds. What is fascinating is how a new widget can change, in several significant ways, how we perceive what would otherwise be classical problems. You cannot substitute pie charts for tag clouds, no matter how hard you try: pie charts simply do not scale nearly as well. For every new trick hackers invent, there are many academic papers to write.
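To illustrate why a tag cloud scales where a pie chart does not, here is a minimal sketch of the usual trick: map each tag's frequency to a font size, typically on a logarithmic scale, so that hundreds of tags remain readable. The function name and the size range below are my own choices for the example, not taken from any particular paper:

import math

def tag_cloud_sizes(tag_counts, min_size=10, max_size=36):
    """Map tag frequencies to font sizes on a logarithmic scale."""
    lo = math.log(min(tag_counts.values()))
    hi = math.log(max(tag_counts.values()))
    span = (hi - lo) or 1.0  # avoid dividing by zero when all counts are equal
    return {
        tag: round(min_size + (max_size - min_size) * (math.log(c) - lo) / span)
        for tag, c in tag_counts.items()
    }

print(tag_cloud_sizes({"python": 120, "xml": 40, "tape": 3, "wii": 8}))
# prints something like {'python': 36, 'xml': 28, 'tape': 10, 'wii': 17}

A hundred pie slices become indistinguishable; a hundred tags simply get smaller, but still legible, fonts.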

I have been saying for a year now that videos on the Web will change e-Learning in a lasting manner. You cannot think about online learning the same way now that we have YouTube. Whereas adding an online video to a Web site used to cost hundreds of dollars, you can now do it for a few cents. Selecting and composing these videos so that they make sense to the viewer is another matter. I have been saying for some time that I do not want a recommender system, I want a composer system: do not just give me the best videos, help me build coherent lists of videos.

What about online word processors and spreadsheets? This pushes the data on the Web. This is a bad thing for security (depending on who you talk to), but it might be a great thing for Business Intelligence. There is a very real opportunity there for a company like Google to help a company make sense of what its employees are doing. I am not talking about micromanaging your employees: who has time to figure out who is cheating with whom? I am talking about cross-referencing, aggregating, and indexing the data in ways that were unthinkable in the desktop era.

Why don’t people use university libraries?

I was recently asked by someone who manages a librarian newsletter why I thought that library tools did not make it into the Top 100 Tools for Learning compiled by Jane Hart. I immediately replied that Google Scholar made it to the list.

Then I had to think back. What about the last time I used a library tool (other than Google Scholar)? I can remember bad feelings like “why is the user interface so complicated?”. Then “why can’t I find what I am looking for?”. And “why do I have to choose which database I want to search in, can’t it just search them all?”. And “why do I have to go to another, different tool, to know who cited this paper and when?”.

Beyond these frustrations, I came up with some more specific reasons why library tools are not used:

  • They are not user-friendly. They were clearly designed with the “we shall train the users” motto in mind. Sorry. I do not want any training. I want you and your tools to get out of my way and let me find what I am looking for.
  • You may not consider this workshop paper that appeared on the Web two months ago a “worthwhile” reference, but I do.
  • It is ok for you to have to mail order a journal article and wait a week for it. Me? I want it now, on my screen, or else…
  • So, your search engine covers more prestigious journals than Google Scholar? It can count citations appearing in prestigious journals with absolute accuracy? Because, of course, you can only trust the “reliable sources”. Well, you are a librarian and you care about these things, but I am a researcher and I do not care as much as you do. I have a social network; I know which researchers I can absolutely trust and which ones I should investigate further. In minutes, or even seconds, I can assess the quality of a paper. It is not a problem for me to trim out junk: the Web has trained me well to do it. Students should learn to do the same.

Update: This blog post was cited in the Fall 2007 newsletter of the Online Computer Library Center, Inc.

Canadian dollar reaches parity with the American dollar

For all my adult life, the American dollar has been worth more than the Canadian dollar, often much more. No longer! We reached parity today. Maybe Americans should reflect on what it means for them that the value of their currency is falling so fast. (Hint: stuff is going to cost more.)

Google Presentations: What did I tell you?

Less than a year ago, I predicted that Google would come up with a PowerPoint-like tool. They did it. And you know what? It is pretty good.

Downes had this to say about it:

What I’d really like is a slide library I can simply draw from to create presentations. But you can’t even drag and drop slides inside presentations.

To this, I reply:

  • You can’t drag and drop slides within your presentation, but you can move them around the slide deck using buttons, which achieves the same result.
  • You can’t drag and drop slides from one presentation to another, but you can copy a slide in one presentation and paste it into another.

(Picture source)

No, you do not have to settle for a poor language because you have bad programmers

I do not entirely believe the title of this post. Clearly, if you hire subpar programmers, you have to settle for whatever programming languages they know. These days, it is probably going to be Java. And you could do a lot worse than choose Java. Or maybe it is PHP. Again, PHP is fine.

The real question is… should you prevent your programmers from using Ruby or Python because you worry about what will happen to the next guy who needs to maintain their code?

On this issue, Eugene makes a great point. What languages like Java offer that “crazier languages” like Ruby do not is built-in testing: in Java, types are checked at compile time, except, of course, when they are not, such as when you use collections of objects. In dynamically typed languages, fewer checks are done at compile time.

The solution? Simply get programmers to use unit testing more aggressively. In my experience, unit testing is relatively painless to put in place. It is a great way to document what you expect your code to do, way beyond what static typing offers.
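For the record, here is the kind of lightweight test I have in mind, written in Python; the little rounding helper is made up purely for the example, but it shows how the tests double as documentation of the expected behaviour:

import unittest

def round_to_cents(amount):
    """Round a dollar amount to the nearest cent (toy helper for the example)."""
    return round(amount * 100) / 100

class RoundToCentsTest(unittest.TestCase):
    # Each test documents one thing the function is supposed to do.
    def test_exact_amounts_are_unchanged(self):
        self.assertEqual(round_to_cents(19.99), 19.99)

    def test_extra_precision_is_rounded_away(self):
        self.assertEqual(round_to_cents(3.14159), 3.14)

    def test_negative_amounts_work_too(self):
        self.assertEqual(round_to_cents(-2.567), -2.57)

if __name__ == "__main__":
    unittest.main()

Run it with python -m unittest and you get, in effect, the checks a compiler would have given you, plus documentation of intent that static typing alone cannot provide.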

So, next time a programmer working for you wants to use Ruby, just say yes, but require him to do unit testing.

My Experience as a proud Wii user

We got our Wii on Friday. This was a busy week-end!

First, some background. I have owned and played video games ever since I was twelve or so. My wife also played the first-generation Nintendo console extensively as a kid. I have two young boys who are still a bit too young for video games.

In the recent past, I was a Sega Dreamcast owner. I bought it because my wife didn’t want me to play DOS-era strategy games alone on my laptop. (Some enjoy chess; I prefer turn-based strategy games.) This console offered a very good gaming experience. It was underpowered for first-person shooters and had no strategy games, but we had lots of fun playing Resident Evil as a family. I played most of the titles all the way through.

Then, I upgraded to the PlayStation 2. It was a bit of a disappointment. The first-person shooters were ok. The action RPG games on the PS2 were pretty good. There were no strategy games. Resident Evil was still there, but it was no better than it had been on the Dreamcast. In the end, my wife never played much with the PS2, and I gave up on many of the titles.

The Wii is very different. First of all, my wife has played way more than I have so far. She just loves Wii Sports and she can probably beat me at most games by now. Even the lowly Wii Play got us playing together a bit. In fact, the main difference between the PS2 and the Wii is that I am now back to the “console as a family game machine.” I bought the classic Mario and Zelda for the Wii Virtual Console, and again, my wife enjoyed them quite a bit (nostalgia?).

I also bought Super Paper Mario. So far my experience with the game has been so-so, but I have never been a fan of Mario and I hear it gets more interesting later.

The Wii has good news and weather channels. I think I will actually make good use of those. One very important step I took was to enable the Web browser on the Wii. It actually works very well. It is a great way to browse the Web as a family.

Technologically, the Wii might be underpowered, though I doubt it. Yes, its processor only runs at 729 MHz and it only has 88 MiB of RAM, but the processor of the PS2 ran at 294 MHz and it only had 32 MiB of RAM. So the Wii easily has twice the power of a PS2, and it does not need to support high definition, which is a good thing because drawing fewer pixels takes less time. My guess is that we will simply see less stunning visuals on the Wii, but I doubt it will matter to Wii owners. And since the Wii is outselling other consoles at least 2-to-1, expect the Wii to have great games soon. Isn’t that what matters?

(I should mention that I expect games to be cheaper-to-make on a Wii.)

The Wii remotes are a breakthrough. I will never be able to go back to standard controllers. In fact, I now wonder why my mouse does not have the same sensors as the Wii remotes. While the remote is not as accurate as a mouse, it feels just as accurate.

(Picture source)

How to make Smultron even better

Smultron is, by far, the best text editor on MacOS. And it is free. Now, I just found out how to make it even better. One annoying problem with Smultron is that if the underlying file gets updated, Smultron often forgets to reload it. You can make this less likely. First close Smultron, then, in a shell, type:

defaults write org.smultron.Smultron TimeBetweenDocumentUpdateChecks 1

Explanation: by default, Smultron checks for file updates every 10 seconds; this command tells it to check every second instead.

Food for thought: Searching attachments in Gmail

What a good question! Why can’t I search my attachments from within Gmail?

I think I have an answer, though people will not like it.

I once worked as a project architect for an e-Health project. This was circa 1999 and we were trying to automate the exchange of data between laboratories using some kind of specialized email application. I had written a customized email client that would receive the data in XML form and present it appropriately (now, you could do the same thing with XSLT in minutes).
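To give an idea of what I mean by doing the same thing with XSLT in minutes, here is a rough sketch in Python using the lxml library; the stylesheet and the element names are invented for the example and have nothing to do with the actual 1999 schema:

from lxml import etree

# A toy stylesheet that turns a lab result into a one-line HTML summary.
stylesheet = etree.XML(b"""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/result">
    <p><xsl:value-of select="patient"/>: <xsl:value-of select="value"/></p>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(stylesheet)
document = etree.XML(b"<result><patient>Doe, J.</patient><value>5.4 mmol/L</value></result>")
print(transform(document))
# prints: <p>Doe, J.: 5.4 mmol/L</p>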

I ended up quitting the project. One reason was the users’ strong desire to send the medical data as Word documents. This is extremely annoying to someone who is working on data schemas to facilitate automated interchange. Basically, secretaries love Word (unstructured data entry) and they hate structured data. And they probably have good reasons too! The worst moment came when the people putting up the money decided that we should fully support Word documents. I really gave up at that point. Try as I might, I could not convince them that storing the data in Word documents was incompatible with a smart data interchange format.

So, why do I think that the GMail team finds that fully supporting attachments is not a priority?

Because they think that exchanging documents via email is inefficient and will probably go away, at least for serious tasks, in the near future. If your document has any kind of serious content, it should not be archived in your email system. Seriously. Post it on an intranet, on a Web site, using a collaborative editing tool, and so on. Maybe these solutions are not yet mature enough for you, but they soon will be.

Email is for ten-liners, nothing more. Email is not appropriate for sending large documents, publishing your essays, and so on. The only reason I use email for other purposes is that there is no other way yet, but there soon will be.

Machine Smarter Than Naked Human Being?

I keep seeing the statement that machines can now beat human beings at chess.

To me, this is like saying that a car can move faster than a human being.

A naked, unassisted human being is pretty useless. Most of us, me included, would not last a month without tools, naked in the woods. I might not even last a day. But does it matter? Human beings do rule the Earth, last time I checked.

I can move faster, over longer distances, than most animals. I can kill even the largest beast from 100 meters away. I can get images from the sky as sharp as anything a bird can see. I can swim for hours without coming back to the surface.

I can do all these things because I use tools.

And I submit to you that the only way a machine can surpass human beings is by not being a tool anymore. A tool, no matter how great, cannot surpass the one who holds it. A drill does not make holes faster than a human being, a car does not move faster than a human being, and so on. They only surpass an unassisted (naked) human being.

So, when does a machine stop being a tool for human beings? Presumably when human beings no longer control it.

Science and Technology Advice is Not Free

Ever since I set up a web page, even before I knew what a blog was, I have had the following odd experience. People get in touch with me in some way, typically by email (but not always), because they do not know how to do something and want me to help them. I am not talking about a business seeking my help, but rather a random individual, sometimes a software developer, sometimes an engineer, sometimes a student… seeking free help. Students are especially likely to seek free help.

When it concerns my published research, this is usually very pleasing to me: I love to answer questions about my research.

However, quite often, the questions are just “work”. By that I mean that they are exactly the type of questions that you pay a consultant to answer. Being a pseudo-polite person, I usually answer quickly, sometimes giving a pointer, but most often just saying I cannot help.

At some point, when I was very active as a consultant, prior to rejoining the research world, I would sometimes agree to answer, for a fee. Only once did such an individual agree to pay: I then produced some sample Java code… it took me about an hour to test, document, and ship the result. The fellow wanted to compress some images in a certain way. I charged him US$150. He complained that I charged too much and never paid.

I am aware that doctors, lawyers and accountants sometimes answer questions without charging people. But the type of questions I get often require 15 minutes or more of work. My expertise has been acquired over the years through hard work. In fact, I estimate that I have invested far more in growing my expertise than most doctors, lawyers and accountants.

Today, I got two questions in my inbox. One of them was “I downloaded this script from your web site, and it does not work for me, can you tell me why.” This fellow expects me to invest 15 minutes, 30 minutes or more. Meanwhile, he will attempt to turn around and charge other folks for the result, either because he is an employee or a consultant himself. Do you think he will be willing to pay my fee? Then a graduate student emailed me a question about some algorithm I once used in a paper. This algorithm is generic. Explaining it to the fellow in question would require about half an hour. Do you think he is willing to pay my fee?

This annoys me profoundly because it suggests that in some people’s mind, science and technology skills are not valuable. Somehow, my time is less valuable than an accountant’s time. I am annoyed that people consider medical, legal, or accounting advice to be worth paying for, but science and technology advice should be free.

I am not saying serious businesses hold this view. But the public does.

(Of course, several businesses do seek free services. Anyone who has been a consultant knows this. There is a category of clients you always get: they want stuff for free on the implicit promise that they will make it up to you later.)

To be fair, the public also does not feel like paying for financial advice: if you call up a financial expert, he will probably charge you nothing, but get a commission on whatever he can sell to you.

I do not know where this whole advice-should-be-free attitude comes from. I much prefer paying for advice. I want my financial advisor to get paid by me, not by the fund managers.

An even deeper issue is that when the public considers science and technology skills to be free, I think you eventually end up with very good doctors, lawyers, and accountants, but you outsource engineering and science to other countries. You also end up getting poor advice about where to invest your money, because you are not willing to pay your financial advisor directly.

It may not matter all that much where you go to college

Paul Graham, the millionaire, Harvard graduate, graduate of an art school in Italy, the same guy who wrote that Americans would keep the upper hand because all of the best professors are parked in a few small elite colleges instead of wasting their time all over the country teaching lesser kids, the guy who has written that elite colleges were important because that is where the most brilliant kids meet up and create the best start-ups, the guy who wrote that keeping housing extremely expensive and refusing to tax the rich was key to producing innovation, well, this guy had a revelation last week:

It may not matter all that much where you go to college.

Wow!

And he had this revelation because, time and time again, when recruiting kids for his start-up incubator, he found out that kids who graduated from MIT, Stanford or Harvard were not smarter.

How he explains it is great too:

Because how much you learn in college depends a lot more on you than the college. A determined party animal can get through the best school without learning anything. And someone with a real thirst for knowledge will be able to find a few smart people to learn from at a school that isn’t prestigious at all. At most colleges you can find at least a handful of other smart students, and most people have only a handful of close friends in college anyway. The odds of finding smart professors are even better. The curve for faculty is a lot flatter than for students, especially in math and the hard sciences; you have to go pretty far down the list of colleges before you stop finding smart professors in the math department.

(Also see my post Big schools are no longer giving researchers an edge?)

The Web warps space and time

Thomas has evidently been reading David Weinberger. He points out that

The Web folds space in a way that (most of) human knowledge is within our arm’s reach.

He then asks how Frank Herbert would have felt.

Whenever I read a pre-Web SciFi novel, I ask myself whether the author could have foreseen the Web.

In any case, it is true that the Web warps time and space. By that I mean that where the documents physically reside is of no concern to you. The Web also speeds up information retrieval tremendously.

Suppose you were the only human being with access to the Web as we know it. You would be able to pull out knowledge faster than any other human being. You would appear like a superhero. You could instantly tell the price of a given product in hundreds of stores. You could get a satellite view of any house in your area in seconds. And so on.