Yahoo! managed to sort 10 billion 100-byte elements in 209 seconds. This was done in Java using Hadoop.

As a basis for comparison, on a fast and recent Mac Pro, it takes 6000 seconds to sort a 2 GB text file using Unix file utilities. Yahoo!’s problem (10 billion × 100 bytes = 1 TB) is 500 times larger, and they solve it about 30 times faster: overall, they are roughly 4 orders of magnitude faster! Of course, they have fixed-length records, which helps tremendously.
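For readers who want to check the arithmetic, here is a minimal sketch of the comparison. It uses only the figures quoted above and assumes decimal units (1 GB = 10^9 bytes); the variable names are mine and purely illustrative.

```python
import math

# Figures quoted in the post (decimal units are an assumption).
yahoo_bytes = 10_000_000_000 * 100   # 10 billion records of 100 bytes = 1 TB
yahoo_seconds = 209
mac_bytes = 2 * 10**9                # 2 GB text file
mac_seconds = 6000

size_ratio = yahoo_bytes / mac_bytes        # how much larger the Yahoo! input is
time_ratio = mac_seconds / yahoo_seconds    # how much less wall-clock time they need
throughput_ratio = size_ratio * time_ratio  # combined speed-up in bytes sorted per second
orders_of_magnitude = math.log10(throughput_ratio)

print(f"size ratio:       {size_ratio:.0f}x")
print(f"time ratio:       {time_ratio:.1f}x")
print(f"throughput ratio: {throughput_ratio:.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

The combined ratio is a throughput comparison only: it says nothing about how many machines or how much energy each side used.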

However, I wonder how much energy (power usage) was spent on the sort operation?

2 Comments »

  1. The Terabyte sort seems pretty silly. Of course throwing a shitload of resources at a problem is bound to give “impressive results”, but where is the benefit for the average user?
    I.e., your 6000-second sort.
    This looks like Formula 1 racing, which is supposed to further technological progress and does so once in a while, but at what cost?
    The Penny sort on the same page seems more sensible.
    BTW, from experience with my linear sort, the 6000 seconds you report for 2 GB falls within a plausible range of elapsed time, due to disk access latency when sorted records are shuffled around rather than a compute-bound limit. You might check it.

    Comment by Kevembuangga — 9/7/2008 @ 14:29

  2. The 6000 seconds is definitely not “internal memory” since the whole machine has 2 GiB of RAM and it tries to sort 2 GiB of data. So there is quite a bit of IO overhead. Sure.

    Comment by Daniel Lemire — 9/7/2008 @ 17:29
