When a terabyte is small

With Kamel and Owen, I am working on a paper involving database indexes. We had over a terabyte of space, and yet, in the middle of the production of the paper, we ran out of space. Only a year ago, I thought that one terabyte was large.

So, I ask our technician about getting a new drive. He comes back with a small 500 GB drive. I ask how much they cost; he says “$200.”

This is a new frontier for me. Producing a simple research paper required us to generate more than one terabyte of data. Moreover, we will generate much more data before the paper is finished.

Assuming I write, say, 4 research papers a year, I will generate over 4 terabytes of data a year at my current rate. At $200 per 500 GB drive, that is going to cost me about $1600 in storage.
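The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope check: the function name and parameters are made up for illustration, it assumes 1 TB = 1000 GB, and it uses exactly one terabyte per paper, which the post treats as a lower bound.

```python
import math

def yearly_storage_cost(papers_per_year, tb_per_paper, drive_gb, drive_price):
    """Return (drives needed, total price) for one year of papers.

    Hypothetical helper: assumes 1 TB = 1000 GB and no redundancy or backups.
    """
    total_gb = papers_per_year * tb_per_paper * 1000
    drives = math.ceil(total_gb / drive_gb)  # drives are bought whole
    return drives, drives * drive_price

# 4 papers a year, 1 TB each, stored on 500 GB drives at $200 apiece
drives, cost = yearly_storage_cost(4, 1.0, 500, 200)
print(drives, cost)
```

Since each paper actually needs more than one terabyte, the real figure would be higher than what this prints.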

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

3 thoughts on “When a terabyte is small”

  1. I think this is one big obstacle for current research in IR. The time spent dealing with “infrastructure” keeps growing, which leaves less time for real research. I think that, in the broad field of IR, “industry research” is going to produce many more results in the coming years than “academia research”.

    Google’s Peter Norvig is quoted as saying that Google does not have the best minds; they have a great infrastructure that allows them to experiment much faster.

    How can academia deal with this?

  2. LOL!!!
    You are probably not old enough to know this rule:
    No matter the size of the drive, it is ALWAYS 95 to 98% full, so for the “next run” (whatever this is) you first have to upgrade.
    This is probably even more “solid” than Moore’s law.
    In the very early 70s, a 5-megabyte drive was “large”…
