Piled Higher and Deeper

Thanks to geomblog, I found out there is such a thing as a daily comic about working on a Ph.D. It is pretty funny, though I was among the lucky ones when I wrote my Ph.D.: I was very naïve.

What I want to see is a follow-up where the Ph.D. student actually gets a job!

I read somewhere last night that, according to a study, only 15% of Ph.D.s in science working in Québec (Canada) hold a professorship. It can be either a good or a bad thing. As for myself, after I got my Ph.D., I never could find a decent job offer in Québec that wasn’t a professorship. I know a few jobs outside academia that are quite good, but I certainly don’t know many. Where are all those Ph.D.s, and are they happy?

NSERC – Policy on Intellectual Property

NSERC is the main funding body for research in science and engineering in Canada. It has an interesting policy on IP:

NSERC expects that any IP resulting from research it funds wholly or in part will be owned by the university or the inventor, according to university policy. Access to IP should be accorded to other sponsors in recognition of, and in proportion to, the sponsor’s contribution to the collaboration.

Alas, I must say that I violated this rule, against my will, in the past, but I will try harder to stick by it from now on.

The interesting question here is whether things like assigning copyright to a publisher violate the funding body’s rules. Probably.

Academia really needs to get its act together with respect to IP as I’m not the only one who plays with grey areas…

Online courses force a deeper understanding

From eLearn Magazine, I got this quote from a paper by George P. Schell:

Online courses force a deeper understanding of information technology simply because they require immersion in the technology that supports the subject being taught. If students fail to master the technology skills required by the course they ultimately fail the course itself. We’ve long understood that immersion, such as learning a foreign language by living where the language is spoken, is a very effective method for quickly and deeply learning a subject.

Does JavaScript scale?

This post talks about how hard it is to debug JavaScript.

In general, pushing the UI to Javascript makes it hard to develop and debug. There are limited tools, the language is too lenient (no objects, weak typing), and testing involves cycling through webpages over and over again.

Obviously, this statement is false: there are objects in JavaScript and some nice features… JavaScript is not too lenient: there are many solid languages without strong typing and they work just fine. But I must say that, indeed, it is quite hard to debug JavaScript. Incredibly so.
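To make the point concrete, here is a small sketch showing that JavaScript does have objects with methods, and that loose typing does not prevent you from inspecting types at run time. The names Counter and describe are made up for illustration; they come from neither the quoted post nor any library.

```javascript
// Hypothetical example: a constructor function with a method on its prototype.
function Counter(start) {
  this.count = start;
}

Counter.prototype.increment = function () {
  this.count += 1;
  return this.count;
};

// The same function accepts any kind of value;
// typeof reports the value's type at run time.
function describe(value) {
  return typeof value;
}

var c = new Counter(41);
var result = c.increment();
var kinds = [describe(c), describe(result), describe("text")];
```

Here result is 42 and kinds holds "object", "number" and "string". This much is easy; the hard part, as said above, is debugging this kind of code once it lives in a browser.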

My friend Scott Flinn would say “it’s the browser, stupid”… since he thinks that JavaScript is fine, but that JavaScript in the browser is bad.

That’s why, if I ever attempt to write non-trivial JavaScript, I will try to use a command-line interpreter. The command line is a powerful programming tool, despite what Microsoft and Borland think.

Paquets… or Seb in French

Seb is now available in French through Paquets… de quoi? Multi-language blogging is an interesting topic I covered elsewhere in my blog. I will eventually open a blog in French, but I’m worried that having too many blogs will kill the fun. I like having one spot to call mine… if I have to run around all over the place, it might get tiring…

My experience so far with Google ads

This is depressing. My blog gets millions of page loads per day (not really). So, being greedy (not really), I decided to put some ads on it. Hence, I put up some Google ads, following in Yuhong’s footsteps. Well, so far, not a single click. Not one of you guys clicked on one of the ads.

I never thought I would make any money, but I still expected a few clicks a week.

To be fair, I think these ads are fairly useless. Right now, I see ads about blogging software. Maybe I write too much about blogging?

Why encyclopaedic row speaks volumes about the old guard

John Naughton wrote this well-phrased bit about people doubting Wikipedia:

we have become so imbued by the conventional wisdom of managerial capitalism that we think the only way to do things is via hierarchical, top-down, tightly controlled organisations

I certainly can see this phenomenon among many researchers. For them, research is about specifying what ought to be in a top-down approach.

Planning is important: for things you can plan. You cannot plan an encyclopedia. You cannot design an encyclopedia using a top-down approach. You cannot design most software using a top-down approach (you can, but your project will fail when you face what you didn’t plan for). You can’t do research in a top-down approach (but you can if you want to build a particle accelerator).

Tim Berners-Lee’s first executive summary of the World Wide Web

I copy this here for historical reasons. Notice how Tim didn’t simply point to a specification; he actually pointed to a working demo of what the Web could be. (The complete version can be found on the W3C Web site.)

From :Tim Berners-Lee (timbl@info_.cern.ch)
Subject :WorldWideWeb: Summary
Date :1991-08-06 13:37:40 PST
     Information provider view
The WWW browsers can access many existing data systems via existing protocols  
(FTP, NNTP) or via HTTP and a gateway. In this way, the critical mass of data  
is quickly exceeded, and the increasing use of the system by readers and  
information suppliers encourage each other.
Making a web is as simple as writing a few SGML files which point to your  
existing data. Making it public involves running the FTP or HTTP daemon, and  
making at least one link into your web from another. In fact,  any file  
available by anonymous FTP can be immediately linked into a web. The very small  
start-up effort is designed to allow small contributions.  At the other end of  
the scale, large information providers may provide an HTTP server with full  
text or keyword indexing.
The WWW model gets over the frustrating incompatibilities of data format  
between suppliers and reader by allowing negotiation of format between a smart  
browser and a smart server. This should provide a basis for extension into  
multimedia, and allow those who share application standards to make full use of  
them across the web.
This summary does not describe the many exciting possibilities opened up by the  
WWW project, such as efficient document caching. the reduction of redundant  
out-of-date copies, and the use of knowledge daemons.  There is more  
information in the online project documentation, including some background on  
hypertext and many technical notes. (...)

You can also check out Linus’ first email presenting Linux.

Slope One Predictors for Online Rating-Based Collaborative Filtering (SDM’05 / April 20-23th 2005)

I’m very proud of this little paper called Slope One Predictors for Online Rating-Based Collaborative Filtering. The paper reports on some of the core collaborative filtering research leading to the inDiscover web site. I’ll be presenting it at SIAM Data Mining 2005 in April (Newport Beach, California).

This is a case where, with Anna Maclachlan, we did something that few researchers do these days: we looked for something simpler. The main result of the paper is that you can use extremely simple and easy to implement algorithms and get very competitive results.

The current trend, in academia, is to develop crazy algorithms that require not 10 lines of code, not 100 lines of code, but several thousands. I think the same is true in some industries: think of Web Services or Java (with the infinite number of new acronyms).

Well, I like complex algorithms and, as a math guy, I like a challenge, but once in a while, I think it pays to stop and ask: “wait! what if the average Joe wants to implement this?”

So, if you write real code and are interested in collaborative filtering, go check this paper.
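To give a feel for how little code is involved, here is a minimal sketch of the weighted Slope One idea as I would summarize it: for each pair of items, keep the average difference between their ratings, then predict a missing rating by adding those average differences to the ratings the user has already given, weighting each by the number of users who rated both items. The function names (buildDeviations, predict) and the tiny data set are made up for illustration; the paper has the precise definitions and the variants.

```javascript
// Illustrative sketch only; see the paper for the exact schemes.
// ratings has the shape { user: { item: rating, ... }, ... }
function buildDeviations(ratings) {
  var diffs = {};   // diffs[j][i]: sum over users of (rating of j - rating of i)
  var counts = {};  // counts[j][i]: number of users who rated both j and i
  Object.keys(ratings).forEach(function (user) {
    var r = ratings[user];
    Object.keys(r).forEach(function (j) {
      diffs[j] = diffs[j] || {};
      counts[j] = counts[j] || {};
      Object.keys(r).forEach(function (i) {
        if (i === j) return;
        diffs[j][i] = (diffs[j][i] || 0) + (r[j] - r[i]);
        counts[j][i] = (counts[j][i] || 0) + 1;
      });
    });
  });
  return { diffs: diffs, counts: counts };
}

// Predict the rating of `target` for a user, given that user's known ratings.
function predict(model, userRatings, target) {
  var numerator = 0;
  var denominator = 0;
  Object.keys(userRatings).forEach(function (i) {
    var c = (model.counts[target] || {})[i];
    if (i === target || !c) return;        // no user rated both items
    var dev = model.diffs[target][i] / c;  // average deviation of target over i
    numerator += (dev + userRatings[i]) * c;
    denominator += c;
  });
  return denominator > 0 ? numerator / denominator : null;
}

// Made-up ratings for three users and three items.
var ratings = {
  john: { A: 5, B: 3, C: 2 },
  mark: { A: 3, B: 4 },
  lucy: { B: 2, C: 5 }
};
var model = buildDeviations(ratings);
var guess = predict(model, ratings.lucy, "A");
```

With these made-up ratings, guess works out to 13/3, about 4.33: item A is rated on average 0.5 above B (two users) and 3 above C (one user), so Lucy’s prediction is ((0.5 + 2) × 2 + (3 + 5) × 1) / 3.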

The Medici Effect

Harold talks about The Medici Effect:

Johansson tells you to look for reversals which may give you insights into new ways of doing things. He uses a restaurant as an example, saying that the assumption is that restaurants have menus, but the reversal would be a restaurant without a menu. This would be one where, “The chef informs each customer what he bought that day … the diner selects the desired food items and the chef creates a dish from them, specifically for each customer.”

I never tried this reversal approach, but it appears interesting. Maybe I do it without realizing it… I’ll experiment!

He also says to work on a problem, let it go, then come back to it. Well, I’m an expert at this approach, except that sometimes, I never come back… 😉

Mozilla Web Developer’s documentation

I wasted a lot of time last night searching for JavaScript documentation. My friend Scott Flinn was nice enough to give me these pointers regarding DOM and general Web work:

This is much better than flying blind, but I wish I had something more like the Java API documentation.

BTW if you don’t know Scott Flinn, you should. He is probably the best technical resource I have ever met. And I don’t mean “technical resource” in an insulting way. He simply understands hands-on technology very deeply. He is also a pessimist like myself, so we do get along, I think.

Here’s some advice from Scott:

If you just want core JavaScript stuff, then you use Rhino or
SpiderMonkey (the Mozilla implementations in Java and C++ respectively).
I swear by Rhino. You just drop js.jar into your extensions directory
and add this simple script to your path:

java org.mozilla.javascript.tools.shell.Main

Then ‘rhino’ will give you a command line prompt that will evaluate
arbitrary JavaScript expressions. The nice part is that you have
a bridge to Java, so you can do things like:

js> sb = new java.lang.StringBuffer( 'This' );
js> sb.append( ' works!' );
This works!
js> sb
This works!

What I did was download Rhino, open the archive, and type “java -jar js.jar”. It brought up a console. System.out doesn’t work, but you can print using the “print” command. (Update: obviously, you have to write java.lang.System.out…)

Joel on Software – Advice for Computer Science College Students

Through slashdot, I saw this nice article by Joel on what you should do if you want to become a programmer and are studying Computer Science:

Here are Joel’s Seven Pieces of Free Advice for Computer Science College Students (worth what you paid for them):

  • Learn how to write before graduating.
  • Learn C before graduating.
  • Learn microeconomics before graduating.
  • Don’t blow off non-CS classes just because they’re boring.
  • Take programming-intensive courses.
  • Stop worrying about all the jobs going to India.
  • No matter what you do, get a good summer internship.

Meritocracy in America

Through Downes’, I found out about this nice article about the USA not being such a meritocracy:

America’s great universities are increasingly reinforcing rather than reducing these educational inequalities. Poorer students are at a huge disadvantage, both when they try to get in and, if they are successful, in their ability to make the most of what is on offer. This disadvantage is most marked in the elite colleges that hold the keys to the best jobs. Three-quarters of the students at the country’s top 146 colleges come from the richest socio-economic fourth, compared with just 3% who come from the poorest fourth (the median family income at Harvard, for example, is $150,000). This means that, at an elite university, you are 25 times as likely to run into a rich student as a poor one.

State of blogging

Through Downes’, I found this link to a report on the state of blogging. The numbers are amazing:

  • 8 million American adults say they have created blogs;
  • blog readership jumped 58% in 2004 and now stands at 27% of internet users;
  • 5% of internet users say they use RSS aggregators or XML readers to get the news and other information delivered from blogs and content-rich Web sites as it is posted online;
  • 12% of internet users have posted comments or other material on blogs;
  • 62% of internet users do not know what a blog is.

Current state of affairs in the XML world (according to me)

I’ve been working hard on an XML course for the last few months. While I’ve done a lot of e-business-related work in recent years, I didn’t consider myself an XML expert.

Still, I was one of the early adopters of XML, starting out in 1997-1998 when it was still a cowboyland. I kept hacking away at XML until about 2001 and then I went away and did other things. I actually did a commercial project (as a technology architect) in 2001, but it was one of my last projects before going back to academia (Acadia University).

What happened between 2001 and now? Well, Mozilla for one thing or, rather, true standards-compliant XML support in widely available browsers. Also, a lot, and I mean a lot, of new “standards” have come along, things like XHTML and so on.

In truth, I don’t think much changed since 2001. Not as much as I thought.

After studying carefully what’s out there, I come away with the following conclusions:

  • Internet Explorer doesn’t support basic things like XHTML and its general support for XML is quite lacking. It is simply not a good XML tool. Mozilla (including Firefox) is pretty good but there are a few gotchas: Mozilla just ignores DTDs (it is not validating), which brings about many problems (like missing entities), and you can’t save or source-view the output of an XSLT transformation. Still, Mozilla is good enough. I don’t know about Opera, but I have heard good things.
  • DTDs are just fine and, more often than not, they are overkill. XML Schema and other formal ways to specify XML applications get little support in actual software and are just not so useful.
  • Namespaces are a mess: they complicate things, they are incompatible with DTDs, and using URIs as identifiers is a confusing idea. Yet, they work well enough and are usable.
  • XSLT 1.0 is truly powerful and very convenient. Couple XSLT with EXSLT extensions and you really can do pretty much anything you want. Exporting XML to HTML or to LaTeX is really easy. However, some things are tricky, like grouping. The best and fastest free XSLT engine I could find is 4suite. XSLT 2.0 is still pretty much unsupported. Either way, whether you use EXSLT extensions or XSLT 2.0, you need things like regular expressions.
  • Programming for XML through something like DOM is a major pain. Maybe XOM is better, but the basic idea is that if you have a powerful high-level language like Python, Ruby, JavaScript or Perl, DOM-like programming on top of XML is boring. It seems like DOM was designed with Java or C++ in mind, languages that are already a pain to use in the first place. Still, DOM works well enough, but I think you need to have XPath support in your language; otherwise, things could get really verbose. (In Python, use the libxml2 wrapper.)
  • Current RDF/XML is a pain, period. RDF itself is sane.

So, I think that a good XML project probably uses XSLT, maybe DTDs, a lot of XPath, but as little of the rest as possible.

Increase in older students forecast

Here’s an article giving interesting figures:

College Board figures show the number of students older than 25 has increased from 29.9 percent in fall 1999 to 31.1 percent in fall 2003. However, roughly 40 percent of regional and U.S. students are older than age 24, according to the board.

In short, a large fraction (about a third) of university students in the USA are adult students (older than 25). Of course, this includes some graduate students, but I bet it includes a large number of students working to get a degree.

ASCIIMathML: a brilliant JavaScript/XML hack

Once in a while, you find something on the Web that makes you go “Wow!”. Ever since MathML came along, I’ve been fairly disappointed because it looked like it was designed to work only inside expensive commercial tools. ASCIIMathML proves I was wrong. You can write standard HTML files with some convenient mathematical notation in it, and a piece of JavaScript will dynamically convert it to MathML which displays fairly nicely in Firefox. However, I always seem to be missing some key fonts.

XML and JavaScript are a potent mix.
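To give a flavour of the kind of conversion involved, here is a toy sketch. It is not ASCIIMathML’s code and says nothing about how that script is actually written: it only turns a single superscript such as x^2 into a MathML string, and the function name toyToMathML is made up for illustration.

```javascript
// Toy illustration only: NOT how ASCIIMathML is implemented.
// Converts a simple "letter^digits" expression into a MathML string.
function toyToMathML(ascii) {
  var match = /^([a-z])\^(\d+)$/.exec(ascii);
  if (!match) return null; // anything more complex is out of scope here
  return '<math xmlns="http://www.w3.org/1998/Math/MathML">' +
    "<msup><mi>" + match[1] + "</mi><mn>" + match[2] + "</mn></msup>" +
    "</math>";
}

var markup = toyToMathML("x^2");
```

A real converter has to cope with a full notation and insert the result into the page’s DOM; this only shows that the mapping from plain text to MathML markup is ordinary string and tree manipulation.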