Final Word on SIAM Data Mining 2007

So, the conference is over. For me, this was a pretty good experience: I was not sick, I met cool people, and some folks appreciated my work. The conference was well organized: the coffee was good and the hotel was well chosen. For people who know me, this is quite a review since I usually complain a lot about my trips.

However, I am a tad disappointed. Actually, I was disappointed the minute I looked at the list of accepted papers. Data Mining has lost its way.

What is Data Mining? It seems that people have totally forgotten what it is about. No, Data Mining is not Machine Learning, though Machine Learning can be applied to Data Mining problems. Data Mining is primarily concerned with very large data sets; that is its essence. Any algorithm running in quadratic time with respect to the size of the data set is automatically out.
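To put a rough number on the quadratic-time point, here is a back-of-the-envelope sketch in Python. The throughput of one billion simple operations per second and the helper name `estimated_seconds` are assumptions made for illustration, not measurements of any real system.

```python
import math

# Assumed throughput: one billion simple operations per second (illustrative only).
OPS_PER_SECOND = 1e9


def estimated_seconds(n, ops_per_second=OPS_PER_SECOND):
    """Return rough running times in seconds for three growth rates."""
    return {
        "linear": n / ops_per_second,
        "n log n": n * math.log2(n) / ops_per_second,
        "quadratic": n * n / ops_per_second,
    }


if __name__ == "__main__":
    # Compare a million data points with a billion data points.
    for n in (10**6, 10**9):
        for name, seconds in estimated_seconds(n).items():
            print(f"n={n:.0e} {name:>9}: {seconds:.3g} s")
```

Under these assumptions, a linear pass over a billion data points takes about a second, while a quadratic algorithm needs about a billion seconds, which is roughly three decades.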

Data Mining is not only about prediction or classification. Data Mining is also about visualization, explanations, approximations, databases, Business Intelligence, and so on. It is about applying MapReduce to large data sets. It is about scaling up to billions of data points. It is about dirty data.

Something is wrong with the review process: obviously, the program committee is overly focused on Machine Learning. I cannot complain because my paper was accepted, but, surely, a broader range of papers should have been accepted.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

2 thoughts on “Final Word on SIAM Data Mining 2007”

  1. Well, I think it is not biased towards machine learning, but towards math (that is, problems that can have a well-defined optimum goal). It might be due to the M in SIAM (Mathematics), although there are also the words “Industrial” and “Applied” in there.

    For a bystander (I’ve never been to nor submitted to SIAM, yet), it seems that SIAM’s accepted papers always contain more formulas than those of other DM conferences. Not that there’s something wrong with formalizing or writing things in a compact way… but it shouldn’t be only about that (or a +0.73% enhancement in the algorithm’s efficiency).

  2. Well, to be fair, there were also a couple of interesting application papers in SDM-07. For instance,

    A System for Keyword Search on Textual Streams
    Vagelis Hristidis, Oscar Valdivia, Michail Vlachos and Philip S. Yu

    Preventing Information Leaks in Email
    Vitor R. Carvalho and William W. Cohen

    Rank Aggregation for Similar Items
    D. Sculley
