Is Python going bad? or The curse of unicode…

I’ve wasted a considerable amount of time in the last two days upgrading my RSS aggregator so that it will have better support for Atom feeds. I use the feedparser library.

One thing that gets to me is how unintuitive unicode is under Python. For example, the following is a string…

t="éee"

Just copy this into your Python interpreter, and it will work nicely. For example,


>>> t='éee'
>>> print t
éee

However, for some reason, if I just type “t”, then it can’t print it properly…

>>> t
'\xe9ee'

See how it is already confusing? (And we haven’t used unicode yet!)

Next, we can map this string to unicode…

r=unicode(t)

which has the following result…

>>> r=unicode(t)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

Ah… so it tries to interpret t as ascii… fair enough, we know it is “latin-1” or “iso8859-1”. It is already quite strange that “print” knows what to do with my string, but nothing else in Python seems to know… so we do


>>> r=unicode(t,'latin-1')
>>> r
u'\xe9ee'
>>> print r
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

because, see, you can’t print unicode directly… but you can do the following…


>>> print r.encode('latin-1')
éee
>>> print r.encode('iso-8859-1')
éee

but also


>>> r.encode('latin-1')
'\xe9ee'
>>> r.encode('iso-8859-1')
'\xe9ee'
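
For comparison, here is a rough sketch of the same round trip. It assumes Python 3, where bytes and text are separate types, rather than the Python 2 interpreter used in the sessions above, and the variable names are only illustrative. "Decode" goes from bytes to text, "encode" goes from text to bytes.

```python
# Assumes Python 3; the sessions above were run under Python 2.
raw = b"\xe9ee"  # the latin-1 bytes for 'éee'

# decode: bytes -> text. ASCII cannot represent byte 0xe9, so this fails.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the right codec makes the decode succeed.
text = raw.decode("latin-1")

# encode: text -> bytes. The same text gives different bytes per codec.
latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")
```

The latin-1 encoding gives back the original bytes, while the UTF-8 encoding of the same text is one byte longer.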

What is my beef?

  • If ‘print’ assumes ‘latin-1’ then shouldn’t everything else? Why is this not consistent? If it is unsafe to assume ‘latin-1’, then why does print do it?
  • The encode, decode thing is a mess. We had a perfectly valid construct for converting things to strings, and that’s ‘str’. Now, we have a new one called ‘encode’. So that, given some unicode, I can do either t.encode('ascii') or str(t) for the same result. Bad. Now, I’m stuck forever in a world where I have to figure out whether I encode or decode a string, and which is which. This is hard. This is confusing.
  • A string object should know its encoding so I don’t have to. What happens if I receive a string from some library and I need to convert it to unicode? How am I supposed to know what the encoding of the string is? There is no sensible way to communicate this right now, which makes debugging a pain. The only excuse I see is that sometimes it is impossible for Python to know the encoding… well, then it should just fail and require the programmer to specify the encoding. There are way too many things that can go wrong when you expect the programmer to keep track of his strings and which is encoded how…
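
To make the last point concrete, here is a minimal sketch of what I mean by a string that knows its encoding. It assumes Python 3, and the EncodedString class is purely hypothetical: nothing like it exists in Python or in feedparser. It only illustrates the "fail rather than guess" behaviour.

```python
# Hypothetical helper for illustration only; not a real Python or feedparser API.
class EncodedString:
    """Raw bytes bundled with the name of the codec that produced them."""

    def __init__(self, data, encoding=None):
        self.data = data
        self.encoding = encoding

    def to_text(self):
        # Refuse to guess: an unknown encoding is an error, not a silent default.
        if self.encoding is None:
            raise ValueError("encoding unknown; the caller must specify one")
        return self.data.decode(self.encoding)


known = EncodedString(b"\xe9ee", "latin-1")    # the library told us the codec
unknown = EncodedString(b"\xe9ee")             # nobody told us anything
```

Calling known.to_text() returns the text, while unknown.to_text() raises an error instead of quietly assuming ASCII or latin-1.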

Qualities for a good Ph.D. supervisor

Offline someone commented that more than half of the Ph.D. students are foreigners and that the Ph.D. is serving as a funding source. True. But that’s somewhat of a cynical view if you ask me.

In any case, you are a young student, and despite reading my blog, you still want to do a Ph.D. Maybe because you come from a Third World country and getting some cash to study is a compelling idea on its own. But whatever your reasons, here’s what you should be looking for, I humbly suggest…

  • Look at the past projects and students. Did all the students this prof. supervised end up on welfare? Or can you google them as Harvard professors now? Can you find traces of the past projects this prof. was involved with or did they all fail? Look beyond the fanfare: look for evidence the prof. can’t control easily. Google past students.
  • Is the prof. aware of what the world is like, right now? Does he know the employment rate and career possibilities for young Ph.D.s or does he just pretend he knows? Where does he get his facts from, if he has any?
  • Does he give you the full story, with the pros and cons of doing a Ph.D. with him? Pros and cons of the research life?
  • Does he need to consume graduate students to get his research going or is the training of students only tangential to his research? In other words, can the guy still do research without students or are students cheap labour?

Selling your services as a scientific paper writer?

Nice post on Critical Mass today about a researcher who sold his services as co-author on eBay and actually got 50 bids and many phone calls. It would appear that many people, from industry to students, are willing to pay so they can produce high quality scientific content with their names on it.

I’m not sure it is an interesting line of work though. Most people don’t realize how expensive it is to write a good scientific journal article. Probably upward of $50K. It’d be difficult to sell 10 pages for $50K. Of course, there are other types of “research”, like journalistic research, where you get 10 pages for much less… but real science is awfully expensive.

What’s more interesting to me is the reasons why people were interested: “There’s this whole constellation of things they could get from it. They could get credentials. They would get the ability to have their questions actually answered.”

Why do I pick on this bit of news? Because I was actually offered jobs like this, and I always turned them down. I was offered money to write journal articles at least twice by totally different people. It was meant to promote a product or a service, in the end, or rather, give the product or service some credibility. I think this is misguided since there is an actual proper form for such publications: patents, technical reports or white papers.

In any case, it would actually be doable: sell your services as a scientist who publishes papers to give credibility to products and services. It would be similar to a patent consultant, I guess, except that law is not so involved anymore. I found that a lot of people everywhere think they have very unique ideas. They’d love to have their ideas validated and pushed in a very prestigious publication, just like having patents.

Writing papers is like taking pictures for Playboy. You look at beauty most of the time, and you have to capture the beauty… you have to make sure enough is being shown, but not too much. It is seen as a very romantic job where you are living a dream, but you are, in fact, just doing your job. The only difference is that few people write papers attracting as many eyeballs as Playboy pictures and most earn less money too.

The world is changing, and I’m there!

Tonight, I really feel like the world is changing.

The typical problem scientists and scholars in general have is that we need to be able to predict paradigm changes, or at least study them. But how can you know that things are changing while you are in it? Can humans study humans? Well, I’m not a social scientist, so I don’t have to worry, officially, about such issues… but all scholars are affected to some level by this paradox.

Well, I’ve been using inDiscover.net. Yes, I’m linking to my own project, well, it isn’t my project, but I’m involved from the side. So, it is self-promotion. Fine. Still, using inDiscover.net has made me realize how the world has changed. A bit like when Stephen Downes worked on his MuniMall portal project and, while the project was a failure, he realized that the world was changing and he embarked on a mission (see his value statement on his site).

Look on the right-hand side of my blog: you should see my current playlist from inDiscover.net. All of this music is free. It is out there. You can download the MP3s and listen to the same music I listen to. No matter where you are in the world. You can then share your playlist with the world. You can have my playlist, live, as XML, that you can incorporate in any application, any web site.

I hope to write later on why I think this is a paradigm shift. We are beyond the world of blogs, beyond the Web… this is deep. I think it will eventually change society all the way.

Ok, I’m making many claims here… I need to write this up, but it is late…

Changing jobs for a Linux addict

People who are happy with whatever operating system they are offered probably find it much easier to change jobs. When you are a Linux addict, it means that you have to secretly install Linux and then reverse-engineer the network configuration so that you can, well, print.

This time I installed Gentoo in my office. As it turns out, it was rather painless, but as an overworked prof., it was still hurtful to waste a day configuring the machine. The toughest part this time was getting the printing to work. As it turns out, I had to tell cups to use smb://mylogin:[email protected]/uer_com instead of smb://tlmnt4/uer_com. Somehow, cups couldn’t just reuse my known username and password. Hmmm… I wonder why I never had this problem before? I really wish cups was easier to configure. But at least, it works now.

There is also a special Java applet system called [email protected] here. But I think I mostly figured out how to get it running “ok” under Linux (in French), though I had to waste another day on it.

I have good reasons to believe I must be the only prof. around using Linux. My addiction to command line interfaces has a thing or two to do with it. You can emulate pretty much the unix environment under Windows, but it is never quite the same in terms of productivity.

I’ve been told that MacOS X would be a good choice too. Except that I couldn’t have done what I just did here: take the “free” PC they put in my office and transform it into a Linux box.

Some facts people often don’t know:

  • Networking is mostly painless under Linux. Since most networks use DHCP, the configuration is a joke. With samba, you can access pretty much all of the network services you need even when they are hosted on a Windows server.
  • With OpenOffice and latex2rtf, you can pretty much live within the MS Word universe and not get noticed. People will complain that the documents don’t look quite as they expect, but I’m a prof. and I can always claim that I’m not very good with word processing. You can consume and produce Word documents. Not very good ones, but unless you do secretarial work, it will be ok.
  • Email is not a problem even in a supposedly Windows-only world: just use the Exchange server as a POP server and you’ll be fine. Microsoft is not yet crazy enough to prevent people from checking their mail using the POP protocol. You might not get all the features from Exchange, but what you are missing won’t hurt you, much.

In the end, you can be very productive with Linux.

Now, if I could find how to turn the system bell off once and for all, I’d be happy.

Update: it turns out that we can turn off the system bell easily under Linux. I never knew this. Just do “xset b off”. You can put this in your .xsession file too.

Job prospects for new Ph.D.s are good?

This time, I found stats for Philosophy Ph.D.s. It would appear that the ratio of candidates to job advertisements is shrinking quite a bit. I got this from Leiter Reports. In the same blog, we find evidence that 9 out of 10 Philosophy Ph.D.s get a cool middle-class job, assuming they went to Princeton.

Well, there seems to be a lot more evidence than I thought that things are improving for new Ph.D.s.

As it turns out, though, my claims are also supported by anecdotes: Fang and Nielsen.

Research: when does it matter?

Does academic research matter?

I’m not a very good historian, but I seem to recall that research as we know it arose out of the German model. It proved invaluable at least in the Second World War. Or did it?

Of course, Tim Berners-Lee owes to academic research some of the ideas that led to the Web. Some. But not that much, really. Tim Bray is not exactly from academia, is he? Yet, XML changed the world in a deep way.

It seems like academic research is more and more irrelevant… or is it progressively more underfunded, or mismanaged… or just simply totally irrelevant?

Here’s a theory: we’ve come to define success by the number of publications… yet, amazing folks like Tim Bray don’t necessarily go out of their way to submit papers. They listen, they talk, they write within communities and then they publish proposals. They probably hack some software too. So, maybe academic research is becoming irrelevant because we define success wrongly?

Why would the public respect people whose main achievement is a (smallish) number of 10-page documents they get into books hardly anybody ever reads?

Overproduction of Ph.D. a myth?

According to Owen, or maybe, according to what I understand from his email, overproduction of Ph.D.s is a myth. Schools can’t get decent Ph.D. holders.

True. Maybe.

Owen has evidence: CRA Stats.

More precisely, according to this table, 60% of CS Ph.D. holders go to a Ph.D. granting university (on a tenure-track or not?), 4% to a non-Ph.D. granting school, and 29% go into industry. A meagre 2% join the government, and 1% are self-employed.

Death of the invisible adjunct

I stumbled on a channel set up by Seb called Topic Exchange: Channel ‘invisible_adjunct’. I was an avid reader of the invisible adjunct. For those who don’t know, the invisible adjunct was one of the many Ph.D. holders who have a decent publication record and are in every way competent scholars, but who still fall through the system and end up beggars at some university. For those of you not familiar with the context, let’s just say that there is tremendous pressure on professors (like myself) to train more and more and more Ph.D. students.

This comes in part from the government, which likes to measure universities by numbers: how many Ph.D.s does this university produce… and so on…

Now, of course, if all these graduates are unemployed… well who cares? And who’s going to believe you when you say that Ph.D. holders can’t find jobs? Who’s going to pity them? Surely, they can’t find a job because they want to earn $500k a year? Right.

No. Most Ph.D. graduates are lucky to find a post-doc. A post-doc, in Canada, pays around $30k. Sometimes more, sometimes (amazingly) less. I’m not saying that some of them don’t find great jobs. It happens. But it is statistically insignificant.

So, the invisible adjunct is someone who just gave up. She’s not alone.

Some people are smarter and they leave early, like wolfangel… but many don’t.

I particularly like a recent post by Erin.

Here’s a comment which rings very true:

My experience with a Ph.D. program was that the myopic focus on only academic skills was very damaging to the students in my program. Academics who have devoted their whole life to study of a particular subject in an educational setting have little understanding of the learning that takes place on the job or as part of living life.

But here’s some advice from a couple of science Ph.D.s :

There is only one reason to get a Ph.D. — because the career path you want to pursue requires it. Do not do it because you think it will make you feel important, because it will do the opposite. Do not do it because “there are a lot of things you could do with it”, because there are plenty of things you can do without it. Do not do it because you think it will be an intellectual adventure, because you’d do much better with a library card.

I think that all students should read and seriously consider such statements before undertaking a Ph.D. I’m not saying a Ph.D. is a bad thing… but, well, read the quotes above!

There is more:

your odds of getting the PhD are smaller than you think, your odds of getting a job are slighter still, and your odds of getting tenure at a place yet smaller, and then all of this happening at a place you would otherwise choose to live? Infinitesimal.