So, the conference is over. For me, this was a pretty good experience: I was not sick, I met cool people, and some folks appreciated my work. The conference was well organized: the coffee was good and the hotel was well chosen. For people who know me, this is quite a positive review since I usually complain a lot about my trips.

However, I am a tad disappointed. Actually, I was disappointed the minute I looked at the list of accepted papers. Data Mining has lost its way.

What is Data Mining? It seems that people have totally forgotten what it is about. No, Data Mining is not Machine Learning, though Machine Learning can be applied to Data Mining problems. Data Mining is primarily concerned with very large data sets. That is the essence of Data Mining. Any algorithm running in quadratic time with respect to the size of the data set is automatically out.
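To make the quadratic-time point concrete, here is a rough back-of-the-envelope sketch in Python. The data set size (one billion points) and the throughput (one billion simple operations per second) are assumptions chosen for illustration, not measurements, and the function names are hypothetical.

```python
import math

# Assumption: a hypothetical machine doing one billion simple operations per second.
OPS_PER_SECOND = 1e9


def linear(n):
    """Operation count for a single pass over the data."""
    return n


def linearithmic(n):
    """Operation count for a sort-like algorithm, n * log2(n)."""
    return n * math.log2(n)


def quadratic(n):
    """Operation count for an all-pairs algorithm, n * n."""
    return n * n


def estimated_seconds(n, cost):
    """Estimated running time in seconds for `cost(n)` operations."""
    return cost(n) / OPS_PER_SECOND


if __name__ == "__main__":
    n = 10**9  # assumption: one billion data points
    for name, cost in [("linear", linear),
                       ("n log n", linearithmic),
                       ("quadratic", quadratic)]:
        print(f"{name}: about {estimated_seconds(n, cost):.3g} seconds")
```

Under these assumptions, a linear pass takes about a second and a sort-like pass about half a minute, while the quadratic algorithm needs a billion seconds, which is more than thirty years.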

Data Mining is not only about prediction or classification. Data Mining is also about visualization, explanations, approximations, databases, Business Intelligence, and so on. It is about applying MapReduce to large data sets. It is about scaling up to billions of data points. It is about dirty data.

Something is wrong with the review process: obviously, the program committee is overly focused on Machine Learning. I cannot complain because my paper was accepted, but, surely, a broader range of papers should have been accepted.

2 Comments

  1. Well, I think it is not biased towards machine learning, but towards math (that is, problems that can have a well-defined optimum goal). It might be due to the M in SIAM (Mathematics), although there are also the words “Industrial” and “Applied” in there.

    For a bystander (I’ve never attended nor submitted to SIAM, yet), it seems that SIAM’s accepted papers always contain more formulas than those of other DM conferences. Not that there’s something wrong with formalizing or writing things in a compact way… but it shouldn’t be only about that (or a +0.73% enhancement in the algorithm’s efficiency).

    Comment by innar — 28/4/2007 @ 5:51

  2. Well, to be fair, there were also a couple of interesting application papers in SDM-07. For instance,

    A System for Keyword Search on Textual Streams
    Vagelis Hristidis, Oscar Valdivia, Michail Vlachos and Philip S. Yu

    Preventing Information Leaks in Email
    Vitor R. Carvalho and William W. Cohen

    Rank Aggregation for Similar Items
    D. Sculley

    Comment by Anonymous — 5/5/2007 @ 10:24
