Being a Nice Researcher and the Real World: pure, applied, and industrial research

This morning, I am deeply upset. Some of you who know me will know why. No, I don’t care so much that Bush was elected, though it does puzzle me. Read on; you might find out why I’m upset.

I did my Ph.D. with the intent of getting into “industrial research”. Yes, there are different types of research. You have “pure” research, where people have no idea why they do the research except that it looks nice. For example, classifying all algebras having property X is pure research. You have “applied” research, which is often closely related to “pure” research except that the topic is a bit closer to “real world” concerns. For example, finding a new way to solve the Navier-Stokes equations is applied research. It should be noted that “applied” research doesn’t equate with “useful in the real world” research. In fact, it could very well be that some “pure” research is more applicable in the real world. Finally, you have “industrial” research. Industrial research is meant to be useful in the real world; that is its primary purpose. The excitement comes not from elegance alone, but mostly from solving real problems people have. You might say that some people working on the Linux kernel are industrial researchers, at least when they innovate. A given researcher may cover all three research types, switching hats depending on the project he is working on.

All types of research are equally worthy, but they are not equal in all things. For example, “pure” research might give you a lot of prestige in some universities. Pure researchers have “pure” concerns, and it is often thought that only the smarter researchers can be pure researchers. Applied researchers often have an edge when it comes time to get research funding, because the applied researcher can easily argue that his work might be applicable in the real world. Finally, the industrial researcher might be looked down upon by some people: because industrial research must be tied closely to the real world, it will sometimes trade fashion or elegance for convenience or applicability. Often, “industrial” research may appear to be simpler, maybe easier. Believe me, it is not easier. On the other hand, the “industrial” researcher can license technology or even start a company. This, in turn, may bring tremendous leverage to such a researcher. Unfortunately, these things take time, and in a university setting where tenure should be first on your mind, the industrial researcher might have a harder time. Hence, in many schools, most professors are pure or applied researchers. Fortunately, the industrial researcher has a broader choice of employment.

This is actually a very interesting topic. Many questions may come to mind… For example, is there any such thing as industrial research in mathematics? You bet! See SIAM.

Now that you see what I mean by an industrial researcher, you might understand that one of his strengths lies in dealing with companies. He is able to understand the concerns of their engineers, and his research already accounts for many of these concerns. However, for him, it is crucial that people do not get in the way between him and a company. He needs, he wants, technology transfer.

That’s why I’m upset today: I feel someone pushed me aside and got in the way. People are always eager to take someone else’s work, and then to ignore that person when money is involved and the author is not being difficult. I hope this will get fixed, but either way, I’ve been reminded that the industrial researcher must keep a very close eye on, and much control over, the technology transfer process. And not be nice.

At Cross Purposes: What the experiences of today’s doctoral students reveal about doctoral education

Here’s a report released in 2001 on the state of doctoral education. It looks like a serious study and the conclusion is scary:

What we learned may not be entirely surprising because our findings confirm many of the concerns that have been raised in the last 10 years. However, our data provide detailed, confirmatory evidence of particular tension points. We found that:

  • The training doctoral students receive is not what they want, nor does it prepare them for the jobs they take.
  • Many students do not clearly understand what doctoral study entails, how the process works and how to navigate it effectively.

The Alps

Found an interesting UK Indie Rock band on inDiscover: The Alps. They have a number of freely available songs, but I only listened to “The Shining”.

There are also some new bands to explore on inDiscover: Steel Poniez (bunch of ladies) and School for the Dead (bunch of dudes).

Go! Download now! It is free! It is legal! It is fun! And be sure to rate these artists so that you can help others find the good ones!

McGrath on XML usage for Web clients

Interesting post by Sean McGrath (not inDiscover‘s Sean McGrath, but Propylon’s Sean McGrath) on how Gmail (Google Mail) was designed. For those who don’t know, Gmail is a revolutionary Web mail service à la Hotmail, a step beyond anything else I had ever seen. He explains that Gmail is a thin client running on JavaScript (and not Java!!!).

This brings him to raise an interesting question. Why doesn’t Gmail send XML back and forth? Indeed, isn’t XML the data format of the Web? Here’s what he has to say:

Web clients carry around a basic, low level programming language called Javascript. The real beauty of Javascript is that it is dynamic – you can blurr the distinction between code and data. You can hoist the level of abstraction you work with in your app by layering domain specific concepts on top of it in the form of functions and data structures. You can sling across data structures already teed up for use on the other end with the aid of the magic of “eval”. You can implement complex behaviour by sending across a program to be run rather than trying to explain what you want done declaratively to the other side.

Now, in such a world – would you send XML data to and from? Developers with a static typing programming language background might be inclined to say yes but I suspect javascriptophiles, lispers, pythoneers and rubyites are more likely to say no. Reason being, it is so much more natural to exchange lumps of code – mere text strings remember – that can be eval’ed to re-create the data structure you have in the XML.

I think he is very much on target in the sense that people who see everything as Java or C# are likely to perceive XML very differently from people using higher-level languages.
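To make the contrast concrete, here is a minimal sketch in Python (one of the languages the quote names); the payloads and field names are made up for illustration, and `ast.literal_eval` stands in for a safe `eval`:

```python
import ast
import xml.etree.ElementTree as ET

# The XML route: parse a document, then walk it to rebuild a data structure.
xml_payload = "<user><name>Sean</name><unread>3</unread></user>"
root = ET.fromstring(xml_payload)
user_from_xml = {"name": root.findtext("name"),
                 "unread": int(root.findtext("unread"))}

# The eval route: the wire format is the language's own literal syntax,
# so a single call re-creates the structure on the other end.
python_payload = "{'name': 'Sean', 'unread': 3}"
user_from_eval = ast.literal_eval(python_payload)

assert user_from_xml == user_from_eval
```

The eval route skips the parse-then-rebuild step entirely, which is exactly why scripting-language users feel less need for XML between two endpoints they control.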

The lesson here, people, is that you should master a range of languages, not just one or two. And no, taking a class in Haskell once in your life doesn’t qualify.

Tim Bray opposing Web Services

Tim Bray, who helped invent XML among other things, takes a stand against Web Services. Here’s what he says:

No matter how hard I try, I still think the WS-* stack is bloated, opaque, and insanely complex. I think it’s going to be hard to understand, hard to implement, hard to interoperate, and hard to secure.

I look at Google and Amazon and EBay and Salesforce and see them doing tens of millions of transactions a day involving pumping XML back and forth over HTTP, and I can’t help noticing that they don’t seem to need much WS-apparatus.

I’m deeply suspicious of “standards” built by committees in advance of industry experience, and I’m deeply suspicious of Microsoft and IBM, and I’m deeply suspicious of multiple layers of abstraction that try to get between me and the messages full of angle-bracketed text that I push around to get work done.

It should be noted that Tim has recently taken a job with Sun Microsystems. His current employer is very actively involved in Web Services, so I believe he takes this stand despite the current interest of his employer.

So, you want to do a Ph.D.?

Seb sent me this extract of a book. The extract is called So, you want to do a Ph.D.? As usual with this sort of book, it is delightful.

Here’s a fun quote:

One thing which is seldom mentioned is what happens to you after you finish the PhD. A classic story is as follows. A student focuses clearly, submits the thesis and starts looking for a lecturing job, only to discover that they need two years of lecturing experience and preferably a journal publication as well if they are to be appointable for a job in a good department in their field. If they had known this two years previously, they could have started doing some part-time lecturing and submitted a paper or two to a journal.

I haven’t read the entire book, of course, and I’m somewhat worried that the book might not be sufficiently focused on why one does a Ph.D. and might be a tad too cynical. Learning the rules is very nice and very important, and I wish I had learned them when it mattered. However, there is also the issue of figuring out whether these rules make sense, and knowing when to break them. Well, I guess that learning the rules to begin with is a very good start.

Building the Open Warehouse

Here’s a link to slides from a talk by Roger Magoulas (O’Reilly Media, Inc.) about building the open warehouse. The talk was presented at the O’Reilly Open Source Convention 2004.

Commodity hardware, faster disks, and open source software now make building a data warehouse more of a resource and design issue than a cost issue for many organizations. Now a robust analysis infrastructure can be built on an open source platform with no performance or functional compromises.

This talk will cover a proven analysis architecture, the open source tool options for each architecture component, the basics of dimensional modeling, and a few tricks of the trade.

Why open source? Aside from the cost savings, open source lets you leverage what your staff already knows — tools like Perl, SQL and Apache — rather than having to procure and staff for the proprietary tools that dominate the commercial space.

Data Warehouse Architecture
– Consolidated Data Store (CDS)
– Process to condition, correlate and transform data
– Multi-topic data marts
  – dimensional models
– Multi-channel data access

Open Source Components
Database: MySQL
– fast, effective
Data Movement: Perl/DBI/SQL
– flexible data access
Data Access: Perl/Apache/SQL
– template toolkit for ad hoc SQL
– Perl hash for crosstabs/pivot
– Perl for reports

Dimensional Model
– organizes data for queries and navigation from detail to summary
– normalized fact table for quantitative data
– denormalized dimensions with descriptive data
– conforming dimensions available to multiple facts
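As a sketch of what a dimensional query looks like, here is a minimal star schema in Python with sqlite3 (the talk uses MySQL and Perl; the tables, columns, and figures below are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
  -- Denormalized dimension: descriptive attributes, one row per title.
  CREATE TABLE dim_title (title_id INTEGER PRIMARY KEY, title TEXT, topic TEXT);
  -- Fact table: quantitative measures keyed to the dimension.
  CREATE TABLE fact_sales (title_id INTEGER, month TEXT, units INTEGER);
  INSERT INTO dim_title VALUES (1, 'Learning Perl', 'perl'), (2, 'SQL Cookbook', 'sql');
  INSERT INTO fact_sales VALUES (1, '2004-06', 120), (2, '2004-06', 80), (1, '2004-07', 90);
""")

# Navigate from detail to summary: join the fact to its dimension and aggregate.
rows = con.execute("""
  SELECT d.topic, SUM(f.units) AS units
  FROM fact_sales f JOIN dim_title d ON f.title_id = d.title_id
  GROUP BY d.topic ORDER BY units DESC
""").fetchall()
print(rows)  # [('perl', 210), ('sql', 80)]
```

The pattern scales: any descriptive attribute in the dimension (topic, author, publication year) becomes a grouping axis over the same fact rows, which is the “navigation from detail to summary” the slide describes.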

Performance Considerations
– configuration
– indexing
– SQL-92 joins
– aggregate tables and aggregate navigation

The presentation should provide you with the basic architecture, toolkit, design principles, and strategy for building an effective open source data warehouse.

Graduate student/faculty relations

Sharleen talks about how evil junior faculty can be in their approach to grad students:

(…) in academia, (…), there are limited options, and a poor grad student may have to work with the asshole who has naive, unethical, or objectionable approaches to working with grad students. Now, we could simply say, the ones who survive are the ones who deserve to get jobs/get the PhD. We could point out that the market is much tougher. But if we respond this way, we’re not critiquing the culture of academia (a culture which, if I may point out, is largely responsible for the other problems that we all bitch about); we’re justifying it.

I’m unsure why she points at junior faculty as the source of the problem. She probably has some personal experience behind this.

However, I agree with her criticism of the tough love approach to supervising graduate students. I don’t think it can be justified from a pedagogical point of view, it is not justified from a management point of view, and so, indeed, it might be some kind of power trip.

On the other hand, I disagree with her implication that there are no choices. In most cases, the graduate student can go with another supervisor. It might be costly, but it is almost always an option. Or else, you can simply go out there, find a job, and be happy.

Repeat after me: the world is big and there are almost always options. Unless you are a slave stranded somewhere, you can almost certainly find another job, another graduate program, another project… it might be costly, it might imply extra work, but it is most often possible.

The reason why these professors are getting away with treating graduate students badly is that graduate students allow it. If they chose not to go with this “evil” supervisor, there wouldn’t be any problems any more.

That’s how the real world works. Evil employers will have trouble finding good employees. The good employees will leave for a better employer. That’s the market at work.

The day when the employees stop leaving, because they are scared or tired, the market stops working and the trouble starts.

Generally speaking, academia doesn’t have so much a culture problem as it has a market problem: too many potential candidates for some positions leading to a general degradation of the working conditions for everyone involved.

The art of supervising students

I had an off-line discussion with a collaborator about student supervision and how frustrating it can be. As a professor, you have, from time to time, to supervise students. It could be a graduate student you are supervising as part of their studies, an undergraduate doing a project, or an assistant you’ve hired.

You know you have a bad student if the student

  • cannot keep track of tasks assigned to him and be responsible for such tasks;
  • lies to you about what has been done and what hasn’t been done;
  • repeatedly ignores some of your phone calls or emails.

In my experience, a bad student is a drain on your resources, and a professor simply has to drop such a student as soon as possible. Even if you have funding or need a student, you are better off with no student than with a bad one.

So, what about my title? The art of supervising students?

My experience has been that there is no need to be tough or strict with the students. There is nothing magical you can do: forcefully organizing many meetings with the student often won’t help. If you have a bad student (see above), cut your losses as early as possible. Otherwise, trust the student.

Here are a few rules based on my experience:

  • Be clear about the tasks you expect the student to perform and the time it should take.
  • Be available to the student in a personalized way: some students benefit from frequent meetings, others do not.
  • Get to know and leverage the student’s strengths, and know his weaknesses: for tasks that fall squarely in his weak spots, you are better off doing them yourself.
  • Trust the student: most students have tremendous potential and will deliver greatness given a chance.

e-Learning or else…

Important post today by Yuhong, on her experience with e-Learning. She recalls a few facts:

  • a decent videoconference setup for a classroom is less than $5000;
  • MIT is setting itself up to become the major competitor in the future education market through e-Learning: WebLab and OpenCourseWare;
  • we know of some tremendously successful endeavours like MusicGrid, led by Martin Brooks.

I think that Yuhong misses the most important example of all: the U.K. Open University. An entire university based on e-Learning and distance education, and yet, it is one of the best schools in the U.K.

I think Downes once wrote that while physical classrooms won’t go away, they will increasingly become a lifestyle choice. In the near future, when my son attends college (if he does so), he will find a very different landscape. There will be many high-quality learning opportunities outside classrooms, to the extent that he may avoid classrooms entirely and actually get an even better education. On the other hand, the remaining classrooms will be high-tech classrooms with remote instructors, remote laboratories and so on.

You don’t believe me? About a quarter of current students [in U.K.] are now doing all or part of their courses online.

How to Misuse SQL’s FROM Clause

I stumbled on an interesting SQL article on the Misuse of the FROM Clause. The author argues that FROM clauses should refer to only two types of tables:

  • those from which you want values returned
  • those needed to join two or more tables in the first category

In other words, if your select is on tables A and B, then you can select from tables A and B, and any table that can be joined with A and B, but no others.

The argument he offers is based on performance concerns. It does seem to me that any query not fulfilling this requirement would have to be relatively complex.
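Here is a hedged illustration of the rule using Python and sqlite3; the tables and the rewrite via EXISTS are my own invention, not the article’s example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
  CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
  CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
  INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
  INSERT INTO posts VALUES (10, 1, 'On SQL'), (11, 1, 'On SOAP');
""")

# Misuse: `posts` appears in FROM even though no column of it is returned;
# the join also duplicates authors with several posts, forcing a DISTINCT.
misuse = con.execute("""
  SELECT DISTINCT a.name FROM authors a, posts p WHERE p.author_id = a.id
""").fetchall()

# Following the rule: FROM holds only the table we select from, and the
# restriction moves into an EXISTS subquery instead.
clean = con.execute("""
  SELECT a.name FROM authors a
  WHERE EXISTS (SELECT 1 FROM posts p WHERE p.author_id = a.id)
""").fetchall()

assert misuse == clean == [('Ann',)]
```

Both queries return the same rows, but the second states the intent (a restriction, not a join) and avoids materializing the duplicate rows that DISTINCT must then remove.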

If we taught you to memorize, we failed you

Tall, Dark, and Mysterious wrote about this student she has in her class who is actually a fairly typical student:

“I memorized how to do the problem you did in class, but then on the test you put a DIFFERENT problem, and you never showed us how to do THAT one, and it’s not fair! My method of doing math by memorizing formulas and then blindly applying them to problems that are identical to the ones I’ve seen has gotten me A’s until now, so what gives?”

Repeat after me: memorization is not learning. Learning has to be a higher level task.

More on the CS enrollment drop

I’ve written on this blog about the recent drop in enrollment for Computer Science degrees in North America: I gave an estimate of a drop by 25%. Looks like it is worse:

The number of new undergraduate majors in U.S. computer science programs has fallen 28 percent since 2000, reports the Computing Research Association, a group of more than 200 North American computer science, computer engineering and related academic departments.

The explanation would be that students do not want a Dilbertesque life:

One reason, say those in the field, is that technology jobs appear less lucrative than they did during the dot-com boom. Then, students thought a computer science degree would lead to riches and a quick retirement. Many took on the major.

Another reason might be that Business Schools are now competing with Computer Science departments for students:

Colleges have also begun to integrate computer instruction into other majors such as e-commerce programs in business schools. A computer science degree, therefore, can be unnecessary.

Don’t memorize, change your neural pathways!

Some days ago, I stated on this blog that I had a Ph.D. in mathematics (true fact) and that I knew neither my own phone number nor my multiplication tables (also true). My wife knows it is true. She still claims she has superior brain power because not only does she know our phone number, but she even knows our postal code, and she knows many other things. There is no question that my wife is one of the smartest ladies in Montreal. Hey! There is a reason why I fell in love with her!

Still, I claim not to be a brain-damaged moron despite these apparent shortcomings. You see, I do not memorize on purpose, because I think my time is better used solving problems and learning new tricks.

From Downes’, I got the following bit of wisdom telling me I’m not alone in thinking that memorizing facts is not the key to learning…

My own research – reserach that can be extended through the many resources on this site – has already convinced me that neural structures are, as they say, plastic. For me what this means is that learning based on the fostering of habits is more important than learning based on transmission of facts, that, indeed, the facts aren’t that important at all, not nearly as important modelling effective practice, paying attention to environment, immersive, experiential based education.

So, please, do me a favor: if you teach, do not ask your students to memorize. Ask them to change their neural pathways, their thinking patterns… let their PDAs and the Web be a fact storage unit, don’t waste their brains.

Update: A colleague who has training in history and holds a Ph.D. says he could never remember dates, and only memorized one: December 25th, 800. So I can say that I’m not alone in thinking that memorization is only a minor part of learning.

Don’t Be Afraid to Drop the SOAP

Through Downes’, I found another article speaking up against SOAP: Don’t Be Afraid to Drop the SOAP. Here’s a few things it holds against SOAP, all of which are things I can testify to:

  • SOAP is difficult to debug. The SOAP message format is verbose even by XML standards, and decoding it by hand is a great way to waste an afternoon. As a result, development took almost twice as long as anticipated.
  • The fact that all requests happened live over the network further hampered debugging. Unless the user was careful to log debugging output to a file it was difficult to determine what went wrong.
  • SOAP doesn’t handle large amounts of data well. This became immediately apparent as we tried to load a large data import in a single request. Since SOAP requires the entire request to travel in one XML document, SOAP implementations usually load the entire request into memory. This required us to split large jobs into multiple requests, reducing performance and making it impossible to run a complete import inside a transaction.
  • Network problems affected operations that needed to access multiple machines, such as the program responsible for moving templates and elements. Requests would frequently timeout in the middle, sometimes leaving the target system in an inconsistent state.

SOAP leads to strongly coupled, poorly scalable, and bandwidth hungry solutions?

Here’s some comments by Joe Walnes on his experience with SOAP. The scary thing is that he comes to exactly the same conclusions as I did on my own… Any SOAP supporter out there wants to answer these:

On the last system I worked on, we were struggling with SOAP and switched to a simpler REST approach. It had a number of benefits.

Firstly, it simplified things greatly. With REST there was no need for complicated SOAP libraries on either the client or server, just use a plain HTTP call. This reduced coupling and brittleness. We had previously lost hours (possibly days) tracing problems through libraries that were outside of our control.

Secondly, it improved scalability. Though this was not the reason we moved, it was a nice side-effect. The web-server, client HTTP library and any HTTP proxy in-between understood things like the difference between GET and POST and when a resource has not been modified so they can offer effective caching – greatly reducing the amount of traffic. This is why REST is a more scalable solution than XML-RPC or SOAP over HTTP.

Thirdly, it reduced the payload over the wire. No need for SOAP envelope wrappers and it gave us the flexibility to use formats other than XML for the actual resource data. For instance a resource containing the body of an unformatted news headline is simpler to express as plain text and a table of numbers is more concise (and readable) as CSV.
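As a minimal sketch of the “just a plain HTTP call” point, using only Python’s standard library (the server, resource path, and headline are made up for illustration):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A toy REST resource: GET returns the headline as plain text, no envelope.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Markets rally on SOAP fatigue"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client side is a plain HTTP call: no stubs, no WSDL, no SOAP library.
url = f"http://127.0.0.1:{server.server_port}/headline"
with urllib.request.urlopen(url) as resp:
    headline = resp.read().decode()

server.shutdown()
print(headline)
```

Because the exchange is an ordinary GET of a plain-text body, every proxy and cache between client and server can participate, which is the scalability point Walnes makes above.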

Victor Shoup’s A Computational Introduction to Number Theory and Algebra

Through Didier, I got to Victor Shoup’s Home Page. He has an on-line textbook called A Computational Introduction to Number Theory and Algebra. It is unclear whether he intends the textbook to remain free, but it is pretty cool that he posts the book on his home page. Shoup is an expert in cryptography.

How Technology Will Destroy Schools

Through Downes’, I found an article by David Wiley with the provocative title How Technology Will Destroy Schools (he is actually being needlessly provocative; he means “schools as they exist now”). The gist of his argument goes as follows:

The development of (…) technology will obviate the need for certain types of instruction — like the teaching of facts. Why spend time memorizing when the same information is available just as quickly from the network as it is from your own memory? But never fear, schools! The technology will create the need for new types of instruction — in higher level information literacy skills. Perhaps this will finally force some change through the public schools.

Well, I must admit: I have a Ph.D. in mathematics and I never learned my multiplication tables. There you go. I never saw the point of learning these tables, so I didn’t. Instead, I learned a few tricks to do multiplications… like 9 times 8 is almost 10 times 8: you just have to subtract 1 times 8.
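The trick spelled out, as a couple of lines anyone can check (the second product is my own extra example of the same pattern: round up to a friendly number, then subtract the excess):

```python
# 9 × 8 via the "round up to 10" trick: 9 × 8 = (10 - 1) × 8 = 10 × 8 - 8
print(10 * 8 - 8)  # 72

# Same pattern for, say, 19 × 7: 19 × 7 = (20 - 1) × 7 = 20 × 7 - 7
print(20 * 7 - 7)  # 133
```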

Mathematics is not about learning facts. I suspect that all disciplines have a component above learning the facts. You can’t be an expert in something if you only know the facts… because I can easily input the facts into a piece of software and compete with you, but we all know that software can’t compete (yet) against human experts. I’m not very good at memorizing facts; I’ve never been good at it. In fact, I’m not good at memorizing anything, and that’s why I always have a PDA with me. Yet, I’m an expert at some things.

It is the difference between real knowledge and shallow knowledge. Most of our education system is based on acquiring and testing shallow knowledge. Most but not all.

How are we going to get past shallow knowledge through technology, as Wiley predicts we will? I think that blogs, games, and simulations are good examples. Yes, we can role-play without technology, but it becomes so much cheaper to deploy gaming scenarios through technology (because you only have to do it once) that it might become more commonplace in the future.

Maybe my son Lohan, by the time he makes it to school, will have “gaming instruction” where he will enter a gaming universe to learn basic mathematics. Who knows.

I’m not holding my breath though; I think we lack the human power to pull it off in the next 5 years.

What the Bubble Got Right

A beautiful article by Paul Graham: What the Bubble Got Right. It is a good analysis of the dot-com era. I totally agree with the analysis too! People tend to overestimate the impact of technology over the short term, but underestimate it over the longer term. The dot-com bubble was proof of that. It is not so much that the new economy was a sham… but rather that the new economy will take a bit more than 2 years to settle… Here’s the conclusion of this beautiful article:

When one looks over these trends, is there any overall theme? There does seem to be: that in the coming century, good ideas will count for more. That 26 year olds with good ideas will increasingly have an edge over 50 year olds with powerful connections. That doing good work will matter more than dressing up– or advertising, which is the same thing for companies. That people will be rewarded a bit more in proportion to the value of what they create.

If so, this is good news indeed. Good ideas always tend to win eventually. The problem is, it can take a very long time. It took decades for relativity to be accepted, and the greater part of a century to establish that central planning didn’t work. So even a small increase in the rate at which good ideas win would be a momentous change– big enough, probably, to justify a name like the “new economy.”

As a side-note, the mere fact that such a good article is waiting at the end of a URL, for all to see and absolutely free, should remind you of how powerful, after all, the Web really is. I was raised in an era where you needed to go buy a magazine to read such a great article. Then you’d get many bad articles, but what could you do: there were few magazines, and your choices were limited. Things have changed, they have changed tremendously.

Data centers as a utility?

Seems like Gartner predicts data centers are going to become a utility:

The office environment will dramatically change in 50 years’ time, with desktop computers disappearing, robots handling more manual tasks, and global connectivity enabling more intercontinental collaboration. Data centers located outside the city will run powerful database and processing applications, serving up computing power as a utility; many more people will work remotely, using handheld devices to stay connected wherever they go, although those devices will be much more sophisticated and easier to use than current handhelds.