The Alps

Found an interesting UK indie rock band on inDiscover: The Alps. They have a number of freely available songs, but I only listened to “The Shining”.

There are also some new bands to explore on inDiscover: Steel Poniez (bunch of ladies) and School for the Dead (bunch of dudes).

Go! Download now! It is free! It is legal! It is fun! And be sure to rate these artists so that you can help others find the good ones!

McGrath on XML usage for Web clients

Interesting post by Sean McGrath (not inDiscover’s Sean McGrath, but Propylon’s Sean McGrath) on how Gmail (Google Mail) was designed. For those who don’t know, Gmail is a revolutionary Web mail service à la Hotmail, a step beyond anything else I had ever seen. He explains that Gmail is a thin client that runs thanks to JavaScript (and not Java!!!).

This brings him to raise an interesting question: why doesn’t Gmail send XML back and forth? Indeed, isn’t XML the data format of the Web? Here’s what he has to say:

Web clients carry around a basic, low level programming language called Javascript. The real beauty of Javascript is that it is dynamic – you can blurr the distinction between code and data. You can hoist the level of abstraction you work with in your app by layering domain specific concepts on top of it in the form of functions and data structures. You can sling across data structures already teed up for use on the other end with the aid of the magic of “eval”. You can implement complex behaviour by sending across a program to be run rather than trying to explain what you want done declaratively to the other side.

Now, in such a world – would you send XML data to and from? Developers with a static typing programming language background might be inclined to say yes but I suspect javascriptophiles, lispers, pythoneers and rubyites are more likely to say no. Reason being, it is so much more natural to exchange lumps of code – mere text strings remember – that can be eval’ed to re-create the data structure you have in the XML.

I think he is very much on target in the sense that people who see everything as Java or C# are likely to perceive XML very much differently from people using higher level languages.
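To make McGrath’s point concrete, here is a minimal sketch. It is written in Python rather than JavaScript (my substitution), and the message format and field names are made up for illustration; they have nothing to do with what Gmail actually sends. It rebuilds the same record once from XML and once from a string holding a literal. I use ast.literal_eval, which only accepts literals, as a safer stand-in for a full “eval”.

```python
import ast
import xml.etree.ElementTree as ET

# Hypothetical message serialized two ways (not Gmail's real wire format).
XML_MESSAGE = """
<thread id="42">
  <subject>Lunch?</subject>
  <unread>true</unread>
  <labels><label>inbox</label><label>friends</label></labels>
</thread>
"""

LITERAL_MESSAGE = (
    "{'id': 42, 'subject': 'Lunch?', 'unread': True, "
    "'labels': ['inbox', 'friends']}"
)


def thread_from_xml(text):
    """Rebuild the data structure by walking the XML tree by hand."""
    root = ET.fromstring(text.strip())
    return {
        "id": int(root.get("id")),                      # attributes are strings
        "subject": root.findtext("subject"),
        "unread": root.findtext("unread") == "true",    # no native booleans
        "labels": [label.text for label in root.iter("label")],
    }


def thread_from_literal(text):
    """Rebuild the data structure by evaluating the text as a literal."""
    return ast.literal_eval(text)


if __name__ == "__main__":
    print(thread_from_xml(XML_MESSAGE))
    print(thread_from_literal(LITERAL_MESSAGE))
```

The XML path needs hand-written code to walk the tree and convert types, while the literal path gets the structure back in one call. The flip side is that evaluating text from an untrusted source is risky, which is why the sketch avoids a plain eval.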

The lesson here, people, is that you should master a range of languages and not just one or two. And no, taking a class in Haskell once in your life doesn’t qualify.

Tim Bray opposing Web Services

Tim Bray, who co-authored the XML specification among other things, takes a stand against Web Services. Here’s what he says:

No matter how hard I try, I still think the WS-* stack is bloated, opaque, and insanely complex. I think it’s going to be hard to understand, hard to implement, hard to interoperate, and hard to secure.

I look at Google and Amazon and EBay and Salesforce and see them doing tens of millions of transactions a day involving pumping XML back and forth over HTTP, and I can’t help noticing that they don’t seem to need much WS-apparatus.

I’m deeply suspicious of “standards” built by committees in advance of industry experience, and I’m deeply suspicious of Microsoft and IBM, and I’m deeply suspicious of multiple layers of abstraction that try to get between me and the messages full of angle-bracketed text that I push around to get work done.

It should be noted that Tim has recently taken a job with Sun Microsystems. His current employer is very actively involved in Web Services, so I believe he takes this stand despite the current interest of his employer.

So, you want to do a Ph.D.?

Seb sent me this extract of a book. The extract is called So, you want to do a Ph.D.? As usual with this sort of book, it is delightful.

Here’s a fun quote:

One thing which is seldom mentioned is what happens to you after you finish the PhD. A classic story is as follows. A student focuses clearly, submits the thesis and starts looking for a lecturing job, only to discover that they need two years of lecturing experience and preferably a journal publication as well if they are to be appointable for a job in a good department in their field. If they had known this two years previously, they could have started doing some part-time lecturing and submitted a paper or two to a journal.

I haven’t read the entire book, of course, and I’m somewhat worried that it might not be sufficiently focused on why one does a Ph.D. and might be a tad too cynical. Learning the rules is very nice and very important, and I wish I had learned them in time. However, there is also the issue of figuring out whether these rules make sense, and knowing when to break them. Well, I guess that learning the rules to begin with is a very good start.

Building the Open Warehouse

Here’s a link to slides from a talk by Roger Magoulas (O’Reilly Media, Inc.) about building the open warehouse. The talk was presented at the O’Reilly Open Source Convention 2004.

Commodity hardware, faster disks, and open source software now make building a data warehouse more of a resource and design issue than a cost issue for many organizations. Now a robust analysis infrastructure can be built on an open source platform with no performance or functional compromises.

This talk will cover a proven analysis architecture, the open source tool options for each architecture component, the basics of dimensional modeling, and a few tricks of the trade.

Why open source? Aside from the cost savings, open source lets you leverage what your staff already knows — tools like Perl, SQL and Apache — rather than having to procure and staff for the proprietary tools that dominate the commercial space.

Data Warehouse Architecture
– Consolidated Data Store (CDS)
– Process to condition, correlate and transform data
– Multi-topic data marts
– dimensional models
– Multi-channel data access

Open Source Components
Database: MySQL
– fast, effective
Data Movement: Perl/DBI/SQL
– flexible data access
Data Access: Perl/Apache/SQL
– template toolkit for ad hoc SQL
– Perl hash for crosstabs/pivot
– Perl for reports
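
The “hash for crosstabs/pivot” idea can be sketched as follows. This is my own illustration, not code from the talk: I use a Python dict where the slides mention a Perl hash, and the sample rows are invented.

```python
from collections import defaultdict


def crosstab(rows):
    """Pivot (row_label, column_label, value) triples into a nested dict.

    Values that share the same row and column labels are summed.
    """
    table = defaultdict(lambda: defaultdict(float))
    for row_label, column_label, value in rows:
        table[row_label][column_label] += value
    # Convert to plain dicts so the result prints and compares cleanly.
    return {row: dict(columns) for row, columns in table.items()}


# Invented sample data: (category, month, revenue), as a query might return it.
SAMPLE_ROWS = [
    ("Perl", "2004-07", 90.0),
    ("SQL", "2004-07", 40.0),
    ("Perl", "2004-08", 50.0),
    ("SQL", "2004-08", 160.0),
    ("SQL", "2004-08", 10.0),
]

if __name__ == "__main__":
    for category, by_month in crosstab(SAMPLE_ROWS).items():
        print(category, by_month)
```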

Dimensional Model
– organizes data for queries and navigation from detail to summary
– normalized fact table for quantitative data
– denormalized dimensions with descriptive data
– conforming dimensions available to multiple facts
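
For readers who have never seen a dimensional model, here is a minimal sketch of one. The assumptions are all mine: the table and column names are hypothetical, the data is invented, and I use Python’s built-in sqlite3 module as a stand-in for the MySQL setup described in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized dimensions: descriptive attributes in one wide table each.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        title TEXT,
        category TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        day TEXT,
        month TEXT,
        year INTEGER
    );
    -- Fact table: keys into the dimensions plus quantitative measures only.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        units INTEGER,
        revenue REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Book A', 'Perl'), (2, 'Book B', 'Perl'), (3, 'Book C', 'SQL');
    INSERT INTO dim_date VALUES
        (20040726, '2004-07-26', '2004-07', 2004),
        (20040801, '2004-08-01', '2004-08', 2004);
    INSERT INTO fact_sales VALUES
        (20040726, 1, 3, 90.0), (20040726, 3, 1, 40.0),
        (20040801, 2, 2, 50.0), (20040801, 3, 4, 160.0);
""")


def revenue_by_category(connection):
    """Summarize the detailed facts by a descriptive dimension attribute."""
    cursor = connection.execute("""
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        INNER JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """)
    return cursor.fetchall()


if __name__ == "__main__":
    print(revenue_by_category(conn))
```

Going from detail to summary is then a matter of choosing which dimension attribute to group by (title, category, day, month), without touching the fact table’s layout.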

Performance Considerations
– configuration
– indexing
– SQL-92 joins
– aggregate tables and aggregate navigation
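
Aggregate tables and aggregate navigation can be illustrated with a small sketch. Again, this is my own guess at the idea and not the speaker’s code: the table names, the pick_table helper and the data are hypothetical, and sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (day TEXT, month TEXT, product TEXT, revenue REAL);
    INSERT INTO fact_sales VALUES
        ('2004-07-26', '2004-07', 'Book A', 90.0),
        ('2004-07-27', '2004-07', 'Book C', 40.0),
        ('2004-08-01', '2004-08', 'Book B', 50.0),
        ('2004-08-02', '2004-08', 'Book C', 160.0);
    CREATE INDEX idx_sales_month ON fact_sales (month);
    -- Aggregate table: the same facts, pre-summarized at the month grain.
    CREATE TABLE agg_sales_month AS
        SELECT month, SUM(revenue) AS revenue
        FROM fact_sales
        GROUP BY month;
""")

# Columns each table can group by, smallest (cheapest) table first.
TABLES_BY_GRAIN = [
    ("agg_sales_month", {"month"}),
    ("fact_sales", {"day", "month", "product"}),
]


def pick_table(group_by):
    """Aggregate navigation: choose the smallest table that can answer."""
    for table, columns in TABLES_BY_GRAIN:
        if group_by in columns:
            return table
    raise ValueError(f"no table can group by {group_by!r}")


def revenue_by(connection, group_by):
    """Total revenue per value of group_by, read from the cheapest table."""
    table = pick_table(group_by)  # also whitelists the column name
    sql = (
        f"SELECT {group_by}, SUM(revenue) FROM {table} "
        f"GROUP BY {group_by} ORDER BY {group_by}"
    )
    return connection.execute(sql).fetchall()


if __name__ == "__main__":
    print(pick_table("month"), revenue_by(conn, "month"))
    print(pick_table("product"), revenue_by(conn, "product"))
```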

The presentation should provide you with the basic architecture, toolkit, design principles, and strategy for building an effective open source data warehouse.

Graduate student/faculty relations

Sharleen talks about how evil junior faculty can be in their approach with grad students:

(…) in academia, (…), there are limited options, and a poor grad student may have to work with the asshole who has naive, unethical, or objectionable approaches to working with grad students. Now, we could simply say, the ones who survive are the ones who deserve to get jobs/get the PhD. We could point out that the market is much tougher. But if we respond this way, we’re not critiquing the culture of academia (a culture which, if I may point out, is largely responsible for the other problems that we all bitch about); we’re justifying it.

I’m unsure why she points at junior faculty as the source of the problem. She’s probably got some personal experience going.

However, I agree with her criticism of the tough love approach to supervising graduate students. I don’t think it can be justified from a pedagogical point of view, it is not justified from a management point of view, and so, indeed, it might be some kind of power trip.

On the other hand, I disagree with her implication that there are no choices. In most cases, the graduate student can go with another supervisor. It might be costly, but it is almost always an option. Or else, you can simply go out there and find a job and be happy.

Repeat after me: the world is big and there are almost always options. Unless you are a slave stranded somewhere, you can almost certainly find another job, another graduate program, another project… it might be costly, it might imply extra work, but it is most often possible.

The reason why these professors are getting away with treating graduate students badly is that graduate students allow it. If they chose not to go with this “evil” supervisor, there wouldn’t be any problems any more.

That’s how the real world works. Evil employers will have trouble finding good employees. The good employees will leave for a better employer. That’s the market at work.

The day when the employees stop leaving, because they are scared or tired, the market stops working and the trouble starts.

Generally speaking, academia doesn’t have so much a culture problem as it has a market problem: too many potential candidates for some positions leading to a general degradation of the working conditions for everyone involved.

The art of supervising students

I had an off-line discussion with a collaborator about student supervision and how frustrating it can be. As a professor, you have, from time to time, to supervise students. It could be a graduate student you are supervising as part of their studies, it could be an undergraduate project, it could be an assistant you’ve hired.

You know you have a bad student if the student

  • cannot keep track of tasks assigned to him and be responsible for such tasks;
  • lies to you about what has been done and what hasn’t been done;
  • repeatedly ignores some of your phone calls or emails.

In my experience, a bad student is a drain on your resources and a professor simply has to drop such a student as soon as possible. Even if you have funding or need of a student, you are better off with no student than a bad student.

So, what about my title? The art of supervising students?

My experience has been that there is no need to be tough or strict with the students. There is nothing magical you can do: forcefully organizing many meetings with the student often won’t help. If you have a bad student (see above), cut your losses as early as possible. Otherwise, trust the student.

Here are a few rules based on my experience:

  • Be clear about the tasks you expect the student to perform and the time it should take.
  • Be available to the student in a personalized way: some students benefit from frequent meetings, others do not.
  • Get to know and leverage the student’s strengths, and know his weaknesses: you are better off doing some of the tasks yourself.
  • Trust the student: most students have tremendous potential and will deliver greatness given a chance.

e-Learning or else…

Important post today by Yuhong, on her experience with e-Learning. She recalls a few facts:

  • a decent videoconference setup for a classroom is less than $5000;
  • MIT is setting itself up to become the major competitor in the future education market through e-Learning: WebLab and OpenCourseWare;
  • we know of some tremendously successful endeavours like MusicGrid, led by Martin Brooks.

I think that Yuhong misses the most important example of all: the U.K. Open University. It is an entire university based on e-Learning and distance education, and yet it is one of the best schools in the U.K.

I think Downes once wrote that while physical classrooms won’t go away, they will increasingly become a lifestyle choice. In the near future, when my son attends college (if he does so), he will find a very different landscape. There will be many high-quality learning opportunities outside classrooms, to the extent that he may avoid classrooms entirely and actually get an even better education. On the other hand, the remaining classrooms will be high-tech classrooms with remote instructors, remote laboratories and so on.

You don’t believe me? About a quarter of current students [in the U.K.] are now doing all or part of their courses online.

How to Misuse SQL’s FROM Clause

I stumbled on an interesting SQL article on the Misuse of the FROM Clause. The author argues that FROM clauses should refer to only two types of tables:

  • those from which you want values returned
  • those that allow you to join two or more tables in the above category

In other words, if your select is on tables A and B, then you can select from tables A and B, and any table that can be joined with A and B, but no others.

The argument he offers is based on performance concerns. It does seem to me that any query not fulfilling this requirement would have to be relatively complex.
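
Here is a small sketch of what can go wrong when the rule is broken. It is my own illustration and not an example from the article: the tables and data are invented, and I run the SQL through Python’s built-in sqlite3 module. The second query adds a regions table to the FROM clause even though it supplies no returned values and joins nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 30.0);
    INSERT INTO regions VALUES (1, 'East'), (2, 'West'), (3, 'North');
""")

# Every table in FROM either supplies returned values or joins such tables.
WELL_FORMED = """
    SELECT c.name, o.total
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
"""

# regions supplies no values and joins nothing: each row is repeated per region.
MISUSED = """
    SELECT c.name, o.total
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    CROSS JOIN regions AS r
"""


def row_count(connection, sql):
    """Run a query and return how many rows it produces."""
    return len(connection.execute(sql).fetchall())


if __name__ == "__main__":
    print("well-formed:", row_count(conn, WELL_FORMED))
    print("misused:", row_count(conn, MISUSED))
```

With three orders and three regions, the second query returns every order three times, so the database does more work to produce duplicated rows.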

If we taught you to memorize, we failed you

Tall, Dark, and Mysterious wrote about this student she has in her class who is actually a fairly typical student:

“I memorized how to do the problem you did in class, but then on the test you put a DIFFERENT problem, and you never showed us how to do THAT one, and it’s not fair! My method of doing math by memorizing formulas and then blindly applying them to problems that are identical to the ones I’ve seen has gotten me A’s until now, so what gives?”

Repeat after me: memorization is not learning. Learning has to be a higher level task.