Database Questions for 2010: What’s On My Mind

I started 2009 with an interest in Web  2.0 OLAP and collaborative data processing. The field of collaborative data processing has progressed tremendously. Last year, we got Google Fusion Tables and data warehousing products are getting more collaborative.

In 2010, my research might focus more on database theory—while maintaining a strong experimental bias. Specifically, I am currently thinking about:

  • Lightweight compression. The goal of lightweight compression is save CPU cycles—not storage! Of course, the CPU architecture is critical. Thus, you have to worry about instruction-level parallelism. Measuring the quality of the compression by the compression ratio is wrong.
  • Row reordering. Some compression formats, such as run-length encoding, are sensitive to the order of the objects in a database. Reordering them is NP-hard. The efficiency of the heuristics depends on the compression format. I will continue my earlier work on this topic.
  • Concurrency and parallelism. Some believe that multicore CPUs can be used to compress data even more aggressively. It might be misguided. Instead, we must focus on embarrassingly parallel problems. Already, we can scan a large in-memory table using several CPU cores quite fast. In 2010, we should empower our users so that they can explore their data more freely.
  • String hashing. I have argued on this blog that universal hashing of long strings is impossible. While hashing strings is textbook material, our understanding of hashing can be improved further.

Further reading: Search Questions for 2010: What’s On My Mind by Daniel Tunkelang

8 thoughts on “Database Questions for 2010: What’s On My Mind”

  1. Don’t bother!
    There are cheaper ways to improve software performance 😉
    CS research is mostly another form of “mathematical disease”.
    Oh! Happy New Year nevertheless…

  2. I just love these arrogant people who think that CS education and academia are useless. Of course, only they — real warriors — save the world and write the code. Unfortunately, quite often their code contains such horrible jokes as bubble sort. Which, in turn, other folks have to debug and to fix.

  3. Unfortunately, quite often their code contains such horrible jokes as bubble sort. Which, in turn, other folks have to debug and to fix.

    Debugging bubble sort?
    What are the chances that a bug show up in bubble sort relative to any other sort?
    It’s ugly and horrendously inefficient but if you have a dozen items to sort it could be that it IS the right solution.
    You didn’t really grok the points in the linked comment, did you?

  4. Chances are huge! I had to fix this once myself. The premature optimization problem arises usually, when there are two comparable algorithms and you chose a more complicated. Then, of course, it is a waste of time. But to chose the right algorithm, you should first understand the problem. That requires some CS knowledge. Moreover, Moore’s law would not hold forever. Sooner or later you have to optimize to get this 10 or 50 percent increase. In some applications such as large-scale applications (e.g. web search), you have to do this already.

  5. I had to fix this once myself.

    OK, fix that C one:

    for (i = size – 1; i;) {
        for (j = i; ++j < size && a[j-1] > a[j]; swap(a[j-1], a[j]));
        while (–i && a{i] <= a[i+1]);

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    Markdown is turned off in code blocks:
     [This is not a link](

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see