Last week, The Register announced that Google moved “away from MapReduce.” Given that several companies adopted MapReduce (hence copying Google), is Google moving a step ahead of its copycats? Moreover, Tony Bain is asking today whether Stonebraker was right in stating that MapReduce was “a giant step backward.” Is MapReduce itself any good?
As reported by The Register, one problem with MapReduce is that it is essentially batch-processing oriented. Once you start a job, you can’t easily update the input data and expect the output to be sane. Thus, MapReduce is poor at real-time processing. Yet, it will remain fine for latency-oblivious applications such as Extract-Transform-Load or number crunching.
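To make the batch-versus-real-time distinction concrete, here is a minimal Python sketch using a toy word count. It is not Google's code and not any real MapReduce API: the names `map_phase`, `reduce_phase`, `batch_word_count` and `IncrementalWordCount` are made up for illustration, and everything runs in a single process. The batch version has to rescan every document whenever one of them changes, whereas the incremental version touches only the document that changed.

```python
from collections import Counter, defaultdict


def map_phase(documents):
    """Emit (word, 1) pairs for every word of every document."""
    for text in documents.values():
        for word in text.lower().split():
            yield word, 1


def reduce_phase(pairs):
    """Sum the emitted counts per word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def batch_word_count(documents):
    """Batch style: the whole input is processed from scratch on each run."""
    return reduce_phase(map_phase(documents))


class IncrementalWordCount:
    """Hypothetical incremental alternative: apply only the change."""

    def __init__(self):
        self.counts = Counter()
        self.docs = {}

    def upsert(self, doc_id, text):
        old = self.docs.get(doc_id)
        if old is not None:
            # Retract the contribution of the previous version.
            self.counts.subtract(old.lower().split())
        self.counts.update(text.lower().split())
        self.docs[doc_id] = text
        # Unary plus drops words whose count fell to zero.
        self.counts = +self.counts

    def snapshot(self):
        return dict(self.counts)


if __name__ == "__main__":
    docs = {"a": "map reduce map", "b": "reduce"}
    index = IncrementalWordCount()
    for doc_id, text in docs.items():
        index.upsert(doc_id, text)

    # One document changes: the batch job must be rerun on everything,
    # while the incremental index handles a single upsert.
    docs["a"] = "stream"
    index.upsert("a", "stream")
    print(batch_word_count(docs) == index.snapshot())
```

Both approaches end up with the same counts. The difference is how much work each update costs, which is what matters when fresh results are expected within minutes.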
We now expect Google to index my blog posts within minutes after I post them. Google had to trade its batch-oriented architecture for a real-time indexing approach. However, it is unclear whether this puts Google technologically ahead of, say, Microsoft Bing.
The big picture is maybe more interesting. We used to view the Web as a large collection of documents—as a library. Indexes updated daily were just fine. We now view the Web as an endless stream of data—like a live meeting between billions of people.
Further reading: Julian Hyde, Data in Flight, ACM Queue, 2009.