Yesterday, I attended Olivier’s talk at the Curves and Surfaces conference. Olivier is a fellow blogger and researcher. Alas, I was too tired after an afternoon of talks and went to sleep, so I did not hunt Olivier down.

In any case, he presented one of the most interesting talks so far in the conference. While quite possibly extremely expensive (it sounds like his algorithms are bound to be O(n^2)?!?), the approach he proposed, Learning on Manifolds, seems well grounded for practical problems. Among the key ideas is that you draw a graph with edges between close data points, and then you compute some Laplacian. By computing the eigenvectors of the Laplacian, you can “cluster” the data into different manifolds, assuming you know how many clusters you want. There are some hidden assumptions, like the fact that you have a rather uniform density, and that all components of all data points are known (no missing data!).
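For readers unfamiliar with the approach, here is a minimal sketch of the general idea as I understood it (my own illustration, not Olivier’s algorithm): build a weighted graph from pairwise similarities, form the unnormalized Laplacian, and use the sign of its second-smallest eigenvector (the Fiedler vector) to split the data into two clusters. The Gaussian similarity and the two-cluster restriction are my simplifying assumptions.

```python
import numpy as np

def spectral_bisect(points, sigma=1.0):
    """Split points into two clusters via the sign of the Fiedler
    vector of the unnormalized graph Laplacian (spectral bisection)."""
    # Weighted graph: Gaussian similarity between all pairs of points.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W  # unnormalized graph Laplacian
    # Eigenvector of the second-smallest eigenvalue (Fiedler vector);
    # its sign pattern approximates the best balanced cut.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)

# Two well-separated synthetic blobs; the sign split recovers them.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal([0, 0], 0.3, (20, 2)),
                 rng.normal([4, 0], 0.3, (20, 2))])
labels = spectral_bisect(pts)
```

For more than two clusters, the standard recipe is to take the first k eigenvectors and run k-means on the rows, which matches the “assuming you know how many clusters you want” caveat above.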

Naturally, Olivier was asked “where are your analytical results?” I felt that while he answered correctly, Olivier was a bit shy. Basically, I would have answered “where are your experimental results?” Because they have none. They claim that their analytical results are measures of goodness, but that’s only a claim. A theorem is not necessarily a valuable thing; only the right theorems at the right times are valuable. Olivier had experimental evidence on his side, at least on toy problems. But Olivier was probably wise not to answer the way I would have, because, no doubt, I would have been booed out of the room.

In this kind of work, there are two measures of how good something is. It can work well in practice, which you can verify by trying to solve some real-world problem, or, at least, a toy problem… or else, you can produce analytical results, or theorems, and claim that these theoretical results are an adequate measure of “goodness”. Well, guess what, theory alone is not enough, just like experimental evidence alone is not enough. If you come up with a theorem that says “if the solution is in C^{1} then this and this will happen”, and no solution is ever in C^{1}, what then? Anyone who has tried to work on industrial problems knows that while theory *can* be very helpful, it can be equally deceiving. Also, few real problems meet smoothness conditions such as C^{k} classes. And I say this as someone who has a Ph.D. in Mathematics!

An earlier talk in the conference discussed the problem of computing PageRank when the damping (or regularization) factor goes to zero. No, I don’t know what it had to do with Curves and Surfaces, but it was enjoyable anyhow. When I asked the author why he was assuming that removing the damping factor would improve search on the Internet, he clearly did not grasp my question. His answer was basically equivalent to saying “I don’t know that it will be better, but it can’t hurt.” Maybe it is true, maybe it isn’t, though I feel that he might have missed a few papers on the topic, since I don’t think researchers regard the damping factor as merely a harmful hack to achieve fast numerical convergence. I was annoyed that despite all his fancy algorithms, he didn’t bother to present us with some experimental evidence, so we could see if his algorithms did a good job or not. He did not even present results over toy cases. Surely, he thought, presenting the mathematics is sufficient?
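To make the damping business concrete, here is a toy PageRank by power iteration (a generic sketch, not the speaker’s algorithm; the graph and parameter names are mine). With alpha the probability of following a link, the (1 − alpha)/n teleportation term is the regularization; the limit the talk considered is alpha going to 1, where that term vanishes.

```python
import numpy as np

def pagerank(adj, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank. adj[i][j] = 1 means a link i -> j.
    alpha is the probability of following a link; (1 - alpha) is the
    probability of teleporting to a uniformly random page."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; dangling nodes (no outlinks)
    # are treated as linking uniformly to every page.
    P = np.where(out > 0, adj / np.maximum(out, 1), 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = (1 - alpha) / n + alpha * (rank @ P)
        if np.abs(new - rank).sum() < tol:
            break
        rank = new
    return rank

# Tiny graph: pages 0 and 1 both link to page 2; page 2 links to page 0.
adj = np.array([[0, 0, 1],
                [0, 0, 1],
                [1, 0, 0]], dtype=float)
ranks = pagerank(adj)
```

Note that the damping factor also governs convergence: the power iteration contracts at rate alpha, so as alpha approaches 1, convergence slows, which is one reason it is usually kept well below 1.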

I wish we could change the culture and get people to work more often on real industrial problems.

No, I don’t think he made any sort of comparison and I don’t recall mention of ISOMAP or LLE (though they may have been cited?), and I think that in the talk, he only tested his technique on a toy case with images having two degrees of freedom, and 4 significant eigenvectors (corresponding to the corners of a 2D patch). I got the feeling he had his own Matlab implementation and that’s all he used. He didn’t claim, however, to either present a survey or that all the work presented was his own.

Naturally, the usual disclaimers apply: don’t trust me too much and check with Olivier for details.

Did he compare with ISOMAP and LLE ?

With the warning that my understanding of what he presented is very limited (CS 2006 papers will only be out next year) and that I don’t know what ISOMAP or LLE are (other than what I could learn in 10 seconds through Google), one argument he had was that you don’t actually want to compute or estimate the manifold for the problem types he is interested in. Moreover, solving for the manifold can actually be an ill-posed problem in the sense that an actual manifold might not exist in the real problems, while the “approximate idea” of a manifold might still be useful. I don’t know what ISOMAP or LLE do… My shallow understanding says that he casts the problem as a graph problem and remains there for all time (though it is a weighted graph!). The manifold idea kicks in only through the computation of the Laplacian, which is used to find your clusters through a dominant eigenvector approach.

So, all he ever does is classification/clustering. You’ve got nodes in your graph and you want to cluster them. Soooo… he does, essentially, discrete mathematics… (or at least solves discrete problems).

One point that wasn’t entirely clear in his presentation is how you find the nearest points; he got a question about this from Wolfgang Dahmen, but I don’t think he had worried about the computational complexity.

(Need I remind you again not to trust me too much?)

Now, I could imagine machine learning techniques where, given dense and uniform data points, you actually try to compute the manifold (maybe by local patches?) for the purpose of, say, prediction/extrapolation/interpolation… but that wouldn’t be at all what he presented, and I think he made a case against the practicality of such approaches.

A lot of learning theory seems to have to do with “regression”: given data points, find the “function” (and I extend “function” to include “manifold”) which best fits the data. Given some assumptions on how nice the function (or the manifold) is, you can then prove some nice (theoretical) results about the rate of convergence and such. This is not what Olivier presented.

My blog post was actually a rant about some of these techniques that were not justified by applications: for example, smoothness is often a crazy expectation in the real world. There are many assumptions you can make, but differentiability of your data seems so unrealistic! (to me, but then again, it is only a blog post)

Inviting accusations of self-promotion, I can point you to a few papers discussing the use of graph Laplacian-based techniques for ad hoc information retrieval. The CIKM paper in particular reduces a few classic IR techniques to graph regularization.

My thinking is very clear-cut: theory and practice are at their best when they collaborate. A theorem is nothing without practical validation, and experimental results are nothing without a theorem. We should never go too far without a theory, and we should never go too far without experimental validation.

I’m amazed at how few people share my views on this. I end up sounding like an extremist because I advocate something that ought to be common sense (I think).

Thanks a lot for attending my talk and for your interest in this work. I must start by saying that what I presented is not my own work, but it was more a survey of recent work that has been done in the Machine Learning community.

The goal was to show the people in the audience that this community had worked on interesting problems and proposed interesting approaches to deal with them.

My own work was more on the theoretical side (trying to understand how these things work, what are their properties, how to unify the various algorithms that have been proposed…). But as you point out, I have been a bit shy when answering Ron DeVore’s question about analytical results.

I am fine with the fact that, as a mathematician, Ron needs properly formulated questions to start looking for results to prove.

But on the other hand, I think the ultimate test is practical efficiency.

In a way, my brain is now split into two INDEPENDENT parts: one part of my brain likes to prove theorems, the other part likes to solve practical problems.

Unfortunately, I never managed to have them collaborate in a productive manner 😉

Still, I concede that it is necessary to try and formalize the problems one is working on. Not in order to prove things but in order to understand what you are doing.

For example, in the context of graph Laplacians, I think it is important to understand the meaning of the quantities you manipulate. In particular, the fact that there are analogies between discrete quantities and continuous ones is important to guide intuition.

But proving that the former converge to the latter is just a mathematical exercise, which is often frustrating because you have to introduce a lot of assumptions (which the practical side of my brain is fighting against) in order to prove anything…

If you are interested, I will put the slides on my publications page. Most important is to look at the last slide, which contains bibliographic references (and even more important is to look for the references inside these papers, not the papers themselves, which are too theoretically oriented ;)