Peter released a technical report (available from arXiv) on computing the Tucker decomposition of large tensors. The Tucker decomposition is a multidimensional generalization of the Singular Value Decomposition (SVD). The report includes a new algorithm designed by Peter which is more accurate than competing Matlab implementations when you have very large tensors (3 or 4 dimensional) and need external-memory computations.
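To make the SVD analogy concrete, here is a minimal sketch of the classical truncated higher-order SVD (HOSVD), one standard way to compute a Tucker decomposition. It is not the algorithm from Peter's report, and it assumes the whole tensor fits in memory (so no external-memory tricks). The function names (`unfold`, `mode_product`, `hosvd`, `reconstruct`) are illustrative, not taken from any library.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize the tensor: the chosen mode indexes rows, all other modes are flattened into columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply the tensor by a matrix along one mode (the mode-n product)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def hosvd(tensor, ranks):
    """Truncated HOSVD: one ordinary SVD per mode, then project onto the factors.

    Returns a core tensor of shape `ranks` and one factor matrix per mode,
    each with orthonormal columns.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_product(core, factor.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the (approximate) tensor from the core and the factor matrices."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_product(out, factor, mode)
    return out
```

For a matrix (a 2-dimensional tensor), the two factor matrices are the left and right singular vectors, which is the sense in which Tucker generalizes the SVD. With full ranks, `reconstruct(*hosvd(t, t.shape))` recovers `t` up to rounding error; with smaller ranks you get a compressed approximation.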

There exist incremental SVD algorithms. It does seem to me that a nice property of Turney’s tensor algorithm is that it can be made part of an incremental scheme efficiently.

Another challenge would be to have a serious look at parallel implementations. I think that Turney’s scheme could benefit tremendously from several processors.

One thing you never read about is how people do research in their mind. People do describe how to write papers, how to get an academic job, but somehow, I cannot recall anyone describing their thought process.

Mine is simple enough. It includes both theoretical and experimental work. So here it is…

  • I usually start with a specific problem. This problem must be about something significant: a few people worldwide might want to know about the solution. It must be sufficiently narrow that I can address it in a few months. I try to apply Turney’s principle: be ambitious. In other words, it should not be obvious when I begin that I will succeed. Yes, this means that I do not know I will be able to write a paper at the end! And yes, this means that I sometimes fail. Ideally, I pick a problem so original that I am the only one working on it, worldwide. Almost invariably, the nicest problems take one of the following forms: 1) I want to explain theoretically something I observe experimentally; 2) I want to improve on an existing method by at least an order of magnitude (in accuracy, simplicity, speed). Merely aiming to improve an existing approach by a small amount is something I avoid, if only because I know that given enough time, I can always hope to improve any technique by a tiny amount. There is no challenge, no surprise, no risk of failure!
  • A good problem is one that I can then break down into at least one simple conjecture. A simple conjecture is one that I can realistically hope to make progress on within a few days or a few hours. Sometimes I verify the conjecture experimentally, sometimes theoretically; it does not matter. I avoid working on several small conjectures at the same time: I try to handle them one at a time. Sometimes, the result of my work on a conjecture will be another conjecture. Sometimes these conjectures turn out to be silly, in retrospect.
  • Once I have processed the first simple conjecture, I try to come up with other ones that will bring me closer to a solution to my problem, always picking the next most promising one.
  • Very often, I will give up on a problem or the problem will change drastically over time. Or the problem will generate worthwhile subproblems. At any given time, I have about a dozen different problems on my radar, but only about 2 or 3 active ones, and only about 2 or 3 conjectures I am working on.

Collaboration messes up this process because I no longer control the overall problem. But I will still decompose the problem into conjectures that I take one at a time. One benefit of working with someone else is that you have someone who will read and check your conjectures. You can also check someone else’s conjecture, which is refreshing. You are also much less likely to make crucial mistakes in the process if you work with others (especially if your collaborators are any good).

To a large extent, my process does not rely on brilliant insights or luck. I merely grind at the problem slowly, each time getting closer to the solution (hopefully). I do not care about making mistakes. I am very, very often wrong. In the past, I have wasted months working on useless problems, generating useless conjectures: this tends to happen more frequently if I work alone.

What makes me more productive, mostly, are nice problems. Often, picking the small conjectures is rather simple: after all, I do not need to be right, I just need to grind at the problem. If there is any talent involved at all in my process, it has to do with how I pick the overall problem. But even then, I think that passion matters more than talent. The more I care about the problem, the faster I make progress. And more importantly, the happier I am as I work.

Funding opportunities, networking, fame and fortune play no role in the above process. At no point do I worry about what others will think except maybe when I pick the overall problem. And even then, I only check, in my mind, that a few people will care, enough that some journal will publish it, eventually. This egocentric process is probably suboptimal. However, my overarching goal is not to be famous, but rather to enjoy myself and get paid in the process. This is not to say I do not care about my peers: I want to earn their respect.

I can sometimes offload some of the conjectures to people working on my projects. However, my process does not scale up very well. I can work in small teams (2 or 3 people), but I could not run a large laboratory (10 people or more) with the above process. I am more of a craftsman than a tycoon.


I wish I could realistically attend this. They are holding a tribute to Jim Gray, the famous database researcher. Jim has been lost at sea. We cannot conclude he is dead, though it becomes increasingly difficult to find an explanation for his disappearance. Mike Stonebraker, of PostgreSQL fame, will give a talk on “Why Jim Got the Turing Award.” Should be interesting.

I have written about Jim quite a bit here: Jim Gray missing at sea, What is infinite storage?, Science in an exponential world, That’s why I tinker, A “Measure of Transaction Processing” 20 Years Later, and so on.

Of all database researchers, Jim is the one who has had the biggest impact on my research and my teaching. Indeed, the cool thing about Jim is that he did not work on abstract nonsense. You can actually take his papers, and give the gist of them to your students, and you will have helped your students a lot.

Somehow I missed this news bite by a few days. It seems AT&T is the winner of the $50,000 Progress Prize in the Netflix Prize Collaborative Filtering competition! You can download a paper that lays out their strategy.

(Source: Turney)

I recently asked what were the most important problems in 2007. I was fortunate enough to get two answers.

Peter Turney gave the most elaborate answer. He said that Artificial Intelligence, and specifically getting computers to think in terms of analogies, is the most important problem. He backs up his claim with a reference. He then said that the second most important problem is understanding cooperation and how to improve it. Fortunately, I work on this problem: I have done work in collaborative filtering and collaborative data exploration. I think we need to support better collaborative data intelligence through more flexible database techniques (this is currently my main research drive). Naturally, I will never solve this problem on my own: I only hope to contribute fragments.

Sérgio gave us the following two important problems:

  • Management of large collections of digital media (photos, video, audio). People are accumulating more and more and it is getting harder to find and share items. Microsoft has very good research on this.
  • Digital preservation. The long-term preservation of digital content is still unsolved and is an active research field.

I think you can generalize the first problem to “manage large collections of data,” and if you accept that the solution involves collaborative techniques, then I also work on this problem.

So, all of you who stayed silent, what do you think are the important problems?

Are you working on the above problems? If not, why?
