Data centers as a utility?

Seems like Gartner predicts data centers are going to become a utility:

The office environment will dramatically change in 50 years’ time, with desktop computers disappearing, robots handling more manual tasks, and global connectivity enabling more intercontinental collaboration. Data centers located outside the city will run powerful database and processing applications, serving up computing power as a utility; many more people will work remotely, using handheld devices to stay connected wherever they go, although those devices will be much more sophisticated and easier to use than current handhelds.

If you haven’t switched to Firefox, do it now.

I’ve finally moved all my machines to Mozilla Firefox 1.0. It is, by far, the best browser I have ever used, and it is totally, truly free. Unfortunately, the French version is lagging behind a bit. Unless you are running something other than Windows, Linux, or MacOS, you have no excuse to use another browser. None.

Update: Sean asks why I switched away from Konqueror. The main reasons are XML support and Gmail. Gmail doesn’t support Konqueror for some reason, and I badly need a browser with decent support for XSLT. Also, there is a comment below saying that Firefox is not stable on OS X 10.2.

On tools for academic writing and a shameless plug

First, the shameless plug: my long-time friend, Jean-François Racine, has published a book, available in both hardcover and paperback. The title is “The Text of Matthew in the Writings of Basil of Caesarea”.

More seriously, and maybe he had told me about this before, he told me about a specialized word processor he uses, called Nota Bene. More interesting is a component of this word processor called Orbis. Specifically, Orbis generates vocabulary lists along with the frequency of occurrence of each word, and it allows you to define synonym lists to expand search capabilities.
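To make the idea concrete, here is a minimal Python 3 sketch of a vocabulary list with frequencies and a search expanded through a synonym list. It is only an illustration of the concept: the function names, the sample sentence and the synonym table are all made up, and nothing here reflects how Orbis itself is implemented.

```python
import re
from collections import Counter

def vocabulary(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def search(text, term, synonyms):
    """Count occurrences of a term and of every synonym listed for it."""
    wanted = {term} | set(synonyms.get(term, []))
    counts = vocabulary(text)
    return {w: counts[w] for w in sorted(wanted) if counts[w] > 0}

# Made-up sample text and synonym list, for illustration only.
SAMPLE = "The boat left the harbour. A ship followed the boat, and another vessel waited."
SYNONYMS = {"boat": ["ship", "vessel"]}

counts = vocabulary(SAMPLE)
hits = search(SAMPLE, "boat", SYNONYMS)
print(counts.most_common(3))
print(hits)
```

Searching for "boat" also reports "ship" and "vessel" because the synonym list widens the set of words that are looked up.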

An Amazon Web Services (AWS) 4.0 application in just a few lines

I have something of a debate going with my friend Yuhong about the correct way to use a Web Service. Yuhong seems to prefer SOAP. I much prefer REST. What is a REST Web Service? For the most part, a REST Web Service is really, really simple. You simply follow a URL and magically, you get back an XML file which is the answer to your query. One benefit of REST is that you can easily debug it and write quick scripts for it. What is a SOAP Web Service? I don’t know. I really don’t get it. It seems to be a very complicated way to do the same thing: instead of sending the query as a URL, you send the query as an XML file. The bad thing about it is that if it breaks, then you have no immediate way to debug it: you can’t issue a SOAP request using your browser (or maybe you can, but I just don’t know how).
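Here is a small Python 3 sketch of what "follow a URL, get back XML" looks like in practice. The endpoint, the parameter names and the sample answer are all hypothetical, and the sketch parses a canned XML string instead of contacting a server, so take it as an illustration of the REST style and not as the Amazon API.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint, for illustration only.
BASE_URL = "http://example.com/search"

def build_rest_url(base_url, **params):
    """Encode the whole query into a URL you could paste into a browser."""
    return base_url + "?" + urlencode(sorted(params.items()))

def titles_from_xml(xml_text):
    """Pull the text of every <Title> element out of an XML answer."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.iter("Title")]

# A made-up answer standing in for what a server would send back.
SAMPLE_ANSWER = """<Items>
  <Item><Title>Kind of Blue</Title></Item>
  <Item><Title>Sketches of Spain</Title></Item>
</Items>"""

rest_url = build_rest_url(BASE_URL, Artist="Miles Davis", Page=1)
titles = titles_from_xml(SAMPLE_ANSWER)
print(rest_url)
print(titles)
```

Because the entire query lives in the URL, debugging amounts to printing the URL and opening it in a browser.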

Now, things never break, do they? Well, that is the problem, they break often because either I’m being stupid or I don’t know what I’m doing or the people on the other side don’t know what they are doing or the people on the other side are experimenting a bit or whatever else. I find that being able to quickly debug my code is the primary feature any IT technology should have. The last thing I want from a technology is for it to be hard to debug.

Here is the problem that I solved this weekend. I have this list of artists and I want to get a list of all corresponding music albums so I can put it all into a relational (SQL) database. Assuming that your list of artists is in a file called artists_big.txt and that you want the result to be in a file called amazonresults.sql, the following does a nice job thanks to the magic of Amazon Web Services:

Yes: the code goes over because I cannot allow HTML to wrap lines (Python allows wrapping lines, but not arbitrarily so, white space is significant in Python). There is just no way around it that I know: suggestions with sample HTML code are invited.

import libxml2, urllib2, urllib

ID = ""  # please enter your own ID here
# The REST URL template and the XML namespace URI are missing from this copy
# of the script; take both from the AWS documentation. The template must
# contain three % placeholders, in this order: ID, artist, page number.
# ("namespace" is a name introduced here to hold the missing value.)
url = ""
namespace = ""
outputcontent = ["ASIN","Artist","Title","Amount","NumberOfTracks","SalesRank","AverageRating","ReleaseDate"]
input = open("artists_big.txt")
output = open("amazonresults.sql", "w")
log = open("amazonlog.txt", "w")
output.write("DROP TABLE music;\nCREATE TABLE music (ASIN TEXT, Artist TEXT, Title TEXT, Amount INT, NumberOfTracks INT, SalesRank INT, AverageRating NUMERIC, ReleaseDate DATE);\n")

def getNodeContentByName(node, name):
    # iterating over a libxml2 node walks its whole subtree
    for i in node:
        if i.name == name:
            return i.content
    return None

for artist in input:  # go through all artists
    artist = artist.strip()  # drop the trailing newline
    if not artist:
        continue
    print "Recovering albums for artist : ", artist
    page = 1
    while True:  # recover all pages
        resturl = url % (ID, urllib.quote(artist), page)
        log.write("Issuing REST request: " + resturl + "\n")
        try:
            data = urllib2.urlopen(resturl).read()
        except urllib2.HTTPError, e:
            log.write("could not retrieve :\n" + resturl + "\n")
            break  # give up on this artist
        try:
            doc = libxml2.parseDoc(data)
        except libxml2.parserError, e:
            log.write("could not parse (is valid XML?):\n" + data + "\n")
            break
        ctxt = doc.xpathNewContext()
        ctxt.xpathRegisterNs("aws", namespace)
        isvalid = (ctxt.xpathEval("//aws:Items/aws:Request/aws:IsValid")[0].content == "True")
        if not isvalid:
            log.write("The query %s failed\n" % (resturl))
            errors = ctxt.xpathEval("//aws:Error/aws:Message")
            for message in errors:
                log.write(message.content + "\n")
            break
        for itemnode in ctxt.xpathEval("//aws:Items/aws:Item"):
            attr = {}
            for nodename in outputcontent:
                content = getNodeContentByName(itemnode, nodename)
                if content is not None:
                    content = content.replace("'", "''")  # escape quotes for SQL
                    if nodename == "SalesRank":
                        content = content.replace(",", "")  # 1,234 -> 1234
                    attr[nodename] = content
            if not attr:
                continue
            keys = attr.keys()
            columns = "(" + ",".join(keys) + ")"
            row = "(" + ",".join(["'" + attr[k] + "'" for k in keys]) + ")"
            command = "INSERT INTO music " + columns + " VALUES " + row + ";\n"
            output.write(command)
        NumberOfPages = int(ctxt.xpathEval("//aws:Items/aws:TotalPages")[0].content)
        if page >= NumberOfPages:
            break
        page += 1

output.close()
log.close()
print "You should now be able to run the file in postgresql. Start the postgres client doing psql, and using \\i amazonresults.sql in the postgresql shell."
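The step that turns one album into an INSERT statement is the easiest part to get wrong, since artist names and titles often contain apostrophes. Here is a minimal Python 3 sketch of that step in isolation. The helper names sql_quote and make_insert are made up for this example, and doubling the apostrophe is the standard SQL way of escaping a quote inside a string literal.

```python
def sql_quote(value):
    """Wrap a string in single quotes, doubling any quote inside it."""
    return "'" + value.replace("'", "''") + "'"

def make_insert(table, attr):
    """Build one INSERT statement from a dict of column name -> string value."""
    keys = sorted(attr)  # fixed order so columns and values line up
    columns = ", ".join(keys)
    values = ", ".join(sql_quote(attr[k]) for k in keys)
    return "INSERT INTO %s (%s) VALUES (%s);" % (table, columns, values)

# Made-up album data, for illustration only.
statement = make_insert("music", {
    "Artist": "Guns N' Roses",
    "Title": "Appetite for Destruction",
    "SalesRank": "1234",
})
print(statement)
```

When you talk to the database directly instead of writing a .sql file, passing the values as query parameters through the database driver is the safer choice, because the driver does the quoting for you.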

Update: this was updated to take into account these comments from Amazon following the upgrade of AWS 4.0 from beta to release:

1) Service Name change: You will need to modify the Service Name parameter in your application from AWSProductData to AWSECommerceService. We realize that it may take some time to implement this change in your applications. In order to make this transition as easy as possible, we will continue supporting AWSProductData for a short time.

2) REST/WSDL Endpoints: You will need to modify your application to connect to [URL missing] instead of [URL missing]. For other locales, the new endpoints are [URLs missing].
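For a REST client, the first change comes down to swapping one value in the query string. Here is a minimal Python 3 sketch of that edit. It assumes the query parameter is named Service, and it uses a hypothetical example.com address since the real endpoints are not reproduced here. The rename_service helper is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def rename_service(url, old="AWSProductData", new="AWSECommerceService"):
    """Return the URL with Service=old replaced by Service=new."""
    parts = urlsplit(url)
    query = [
        (key, new if (key == "Service" and value == old) else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical request URL, for illustration only.
old_url = "http://example.com/onca/xml?Service=AWSProductData&Operation=ItemSearch&Artist=Miles%20Davis"
new_url = rename_service(old_url)
print(new_url)
```

URLs that already carry the new service name, or no Service parameter at all, keep the same parameter values and only get re-encoded.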

Academic life: a balancing act

Today, I realized that the life of a researcher/professor is really a balancing act. A professor…

  • has a rich personal life;
  • gives great courses;
  • gets a lot of funding;
  • has many students;
  • publishes a lot of papers each year;
  • consults on industrial/governmental projects;
  • manages something (department, project, program).

It is no surprise that many professors end up being overworked. I think you simply cannot pull off all these things at once. Maybe 2 or 3 from the list. You have to choose or life will choose for you.

Does your university think that “Jobs are for the little people”?

Tall, Dark, and Mysterious is a math professor somewhere in Canada, possibly in British Columbia. She graduated from a big school and now teaches at a smaller (lesser?) school.

Well, is it a lesser school? That’s where her tale becomes interesting. Myself, I attended UofT. I don’t know if the rule is true, probably not, but it seems that the larger the school, the more it suffers from the jobs-are-for-little-people syndrome as documented in a post by Tall, Dark, and Mysterious. Here is an insightful quote:

University isn’t job training, because universities are adamant about university not being job training. And it’s not because they’re too busy enriching students’ lives and fostering a love of learning. Underneath all of the cheap idealism – trumpeted by gainfully employed people, many of whom haven’t learned how to play a musical instrument, how to speak a foreign language, or how to play a new sport because none of those things are related to their jobs and because they’re too old to be doing that sort of thing – about learning for the sake of learning is a willful inability to confront the fact that students are not at universities to learn for the sake of learning.

Some insight from John Travolta

This morning, I was chatting with a colleague, Richard Hotte, and we were discussing research, funding, and the relation between the two. I’ve also had these discussions with Martin Brooks. Richard pointed out that John Travolta figured it all out some years ago when he received a prize at Cannes for the movie Pulp Fiction. According to Richard, Travolta was asked then how it felt to finally win a prize that he had surely coveted for many years. John Travolta answered that his goal was never to win a prize, but rather to grow as an actor. The prize was nice, but not his goal.

This echoes my own feeling about funding for research. If your goal is funding, you’ll probably get some, maybe a lot, but unfortunately, you may never become a great researcher.

Use of blogs in higher education

Through Downes, I found this great paper, Exploring the use of blogs as learning spaces in the higher education sector.

Looks like a great paper:

This paper explores the potential of blogs as learning spaces for students in the higher education sector. It refers to the nascent literature on the subject, explores methods for using blogs for educational purposes in university courses (eg. Harvard Law School), and records the experience of the Brisbane Graduate School of Business at Queensland University of Technology, with its ‘MBA blog’. The paper concludes that blogging has the potential to be a transformational technology for teaching and learning.