Tim Bray on solving the economic crisis
For reasons I will not go into, this quote feels very satisfying today:
Solution to economic crisis: sack everyone who has an MBA. (Tim Bray)
For reasons I will not go into, this quote feels very satisfying today:
Solution to economic crisis: sack everyone who has an MBA. (Tim Bray)
John Cook gives us a nice recipe to quickly find all squares in a set of integers. For example, given 3, 4, 9, 15, you want your algorithm to identify 4 and 9 as squares.
The naïve way to solve this problem goes as follows:
This may prove too expensive since the square-root operation must be computed using a floating-point algorithm.
A better way is to look at the first 4 bits of each integer. If the integer is a square, then the first 4 bits must have value 0, 1, 4, or 9. If you have a random distribution of numbers, this means that you can quickly discard 3 out of 4 numbers.
It is not immediately obvious that you will speed up the retrieval because inserting this check will add some overhead. However, it doubles the speed according to John. It is even less obvious that checking the first 8 or 16 bits instead of just the first 4 bits, can help. John says it does not help in one C++ implementation, but it does in a C# implementation.
This sort of strategy is entirely general. The research question is how much work should you do on fast dismissal? Too much effort toward dismissing lots of candidates might be counterproductive. Too little and your performance might not improve optimally.
Recently I started to wonder whether we could make it multipass: you first dismiss a few candidates with a cheap test, then on the survivors you use a more expensive test and so on. For example, you first check the first 4 bits, and if you cannot dismiss the candidate, you check the next 4 bits and so on. It is not a surprising idea, but figuring out whether it is worth the effort is a research question.
To make my point, I have worked on fast retrieval under the Dynamic Time Warping (DTW) distance, a nonlinear distance measure between time series. The DTW does not satisfy a triangle inequality. It is commonly used as a pattern recognition technique when comparing time series. It was initially designed to compare voice samples, allowing for changes in voice rhythm.
Eamonn Keogh from UCIUCR has come up with a simple but nearly optimal way to compute a lower bound to the DTW between any two times series, called LB_Keogh (named after himself). Just like in the John Cook algorithm, this lower bound quickly discards the false negatives. If you are interested, Eamonn has applied LB_Keogh to just about every time series problem you can think of. (Update: one hundred people or more also used LB_Keogh in their work, see comments below.)
I improved over LB_Keogh as follows. If LB_Keogh is not good enough (and only if it is not good enough), I compute a tighter lower bound (called LB_Improved). Surprisingly, in many cases, I can improve the retrieval time by a factor of two or more.
I have published my work as a software library, but also as the following paper:
Daniel Lemire, Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound, to appear in Pattern Recognition.
This sort of work is much more difficult than it appears. I could have easily made my method look good by optimizing it, while leaving the competing methods unoptimized. By publishing my implementation, I go a long way toward keeping me honest. If I fooled myself and the reviewers, someone might find out by surveying my source code.
My wife sometimes asks me why I am not working on important problems like world hunger. Instead, I am one of the top world expert in tag-cloud drawing. I am sure she thinks that I just fool around, faking serious research.
I actually take my research very seriously.
I like to distinguish abstract from concrete research. Concrete research is when you seek to obtain results in special cases. For example, an AI researcher may try to first understand how we can detect spam. Eventually he might move on to even more sophisticated tasks. In such a form of research, there are no overarching formal plans. You could say it is inductive, maybe. Researchers are often driven to this form of research because the deeper problems are simply too difficult to address directly. (I define a problem to be too difficult when you cannot make noticeable progress in a matter of months.) They hope for a breakthrough to an important problem to come as they work on a narrow issue.
Abstract research derives from a formal plan. Semantic Web is one such a plan. Tim Berners-Lee even drew diagrams early on of what the beast should look like. The research issues are clearly laid out. As a researcher you are tackling an extremely difficult problem, unsure whether you will ever make any noticeable progress. Researchers follow this path because they believe that only a focused effort in a definite direction can solve the difficult problems. Funding agencies love abstract research.
It might be a matter of biology, but my brain has always been much more productive in concrete research. I resist the inductive/deductive classification because it feels wrong. However, times and times again, working on a tractable, but possibly insignificant problem, has lead me to understand a deeper issue. When the problems are too big, my brain gets into circular and incorrect arguments. I need to chop down the problems to a manageable size. The problems need to be hard enough to push me to the limits, but easy enough that I can make weekly progress. Moreover, I cannot never know exactly what I will be doing a month later, as a researcher.
I will make a stronger claim: abstract research is never done. Researchers will give the illusion that they are working directly on some grand problem (like world hunger), but, in reality, they will work at a much smaller scale. And when a researcher solves a grand problem in what seems like a short time, and with few concrete possibly irrelevant steps, I attribute it to luck or lies.
See also my post my research process.
We are trying to design a master degree in Information Technology. To me, this sort of program should be a professional master degree, that is, it does not lead naturally to a research career or a Ph.D.
My business colleagues argue in favour of research methodology courses. Apparently, students need to learn how to conduct interviews and such. In any case, I then pointed out that my master degree did not contain any such course. One of my business colleague then said a deadly thing:
Of course, you got a technical master degree!
This got me really angry. Really, really, really angry. I do not think I ever got so angry in my life.
For the record, my master degree was in Mathematics at the University of Toronto. Is Mathematics technical? If technical is to have a “practical” connotation, I can tell you that none of my graduate courses were technical. Are fewnomials practical? I think not.
But the deeper implication was that anything having to do with Science was technical. That is, it deals with nuts and bolts. And I think that it is squarely wrong. From my view point, business is far more technical. And I ran my own business for several years. The business side of things was always the boring-but-easy component.
There is a distinct feeling in North America that business is king, and science & technology are things monkeys or foreigners can do. Yet, in my experience, it is a lot harder to design a usable web application than negotiate a business deal. I believe that India and China are getting a sweet deal by doing our science & technology while we focus on business. A very sweet deal indeed.
I think that Amazon, Google, Cisco, Microsoft and so on, thrive because many of their engineers have a deep knowledge of Computer Science. Kill the science and you kill the business.
But even if you discard science. Writing good source code is hard. Very hard. And it is not hard for technical reasons, not any more than painting, movie-making and sculpture are technical challenges.
In any case, I believe that North America is headed for a wall if it fails to recognize that its prosperity is due to culture, science and technology. And given that 40% of all students at my school go for a business degree, I am nervous.
See also my post Career Swings where I wrote:
I cannot believe that in 2015, we’ll all be lawyers, business managers, salesman, and medical doctors. I cannot believe that technology will stand still and mathematics beyond basic algebra will be a lost art. I cannot believe my two sons will have business degrees and make three times my salary by managing a bunch of underpaid Indian programmers.
The novel Spin won the Hugo Award for Best Novel in 2006.
It is what I would call a “temporal disparity” novel. Earth becomes suddenly surrounded in a temporal shield that slows time down for human beings. Alas, the Sun is aging very fast for the poor human beings. Are we going to die? Who is creating this field?
This is almost exactly the reverse story from Georges-Jean Arnaud‘s La grande séparation (1971-1973). In Arnaud’s story, a planet has a similar temporal field, but it accelerates time on the planet. Even though the planet has primitive technology, it is constantly surveyed for any sign of technological development. Spin offers the counterpart story.
A temporal disparity leads to a technological disparity: a small band of savages can evolve into a technologically superior race while you are having coffee.
Pros
The novel is very good. The author writes with good scientific rigour. The writing is supported by repeatedly introducing new mysteries in every chapter… to keep you coming for more. The characters are believable and well drawn.
Cons
The author tried to limit the scope of the story to few characters, but not all of them are good characters. The writing style reminds me a bit of Card’s Ender’s game series. There is the extra smart kid who grows up to be the is the only one able to see through what is happening. I found this particular element of the novel irritating. A major catastrophe hits the Earth and only one man seems to be able to put it all together? I am a bit disappointed by how the author dealt with anything outside the Earth, including the Martians. He could have done so much more!
Sequels are upcoming.
Powered by WordPress
© 2004-2012, Daniel Lemire (lemire at gmail dot com). This work is licensed under a Creative Commons License.