For large data sets on disk, indexes are often essential. However, if your data fits in RAM, indexes are often unnecessary. They may even be harmful.
Consider a table with 10,000,000 rows and 10 columns. Using normalization, you can replace each value with a 32-bit integer, so the whole table takes 10,000,000 × 10 × 4 bytes ≈ 381 MB. How long does it take to scan it all?
- When the data is on disk, scanning it takes 0.5 s. To maximize buffering, I used a memory-mapped file.
- When the data is in memory, it takes 0.06 s.
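The author's actual benchmark code is linked below; as a rough sketch of what such an in-memory scan might look like (the row-major `int[]` layout, the sum-as-workload, and the timing method here are my assumptions, not the original code):

```java
// Minimal sketch of an in-memory full-table scan, assuming a flat
// row-major int[] layout. Run with enough heap (e.g. -Xmx1g).
public final class ScanBenchmark {
    // Scan every value once; summing keeps the JIT from removing the loop.
    public static long scan(int[] table) {
        long sum = 0;
        for (int v : table) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        final int rows = 10_000_000;
        final int cols = 10;
        int[] table = new int[rows * cols]; // about 400 MB
        java.util.Arrays.fill(table, 1);
        long start = System.nanoTime();
        long sum = scan(table);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println("sum = " + sum + " in " + seconds + " s");
    }
}
```

On modern hardware a loop like this is limited by memory bandwidth, which is why the whole 381 MB can be scanned in a fraction of a second.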
Can you bring the 0.06 s figure down with an index? Maybe. But consider that:
- Indexes use memory, sometimes forcing you to store more data on disk.
- Indexes typically slow down construction and updates.
- Indexes typically improve the performance of only some queries; this is especially true with multidimensional data.
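To illustrate the last point with a toy example of my own (a hypothetical hash index over one column, not anything from the benchmark): a hash index answers point queries in constant time, but an aggregate over the same column still has to scan every value, so the index buys nothing there while still costing memory.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration: an index helps point queries but not aggregates.
public final class IndexTradeoff {
    // Build a hash index mapping each value to its occurrence count.
    // This costs extra memory on top of the column itself.
    public static Map<Integer, Integer> buildIndex(int[] column) {
        Map<Integer, Integer> index = new HashMap<>();
        for (int v : column) {
            index.merge(v, 1, Integer::sum);
        }
        return index;
    }

    // An aggregate such as a sum cannot use the hash index:
    // it must still visit every value in the column.
    public static long sum(int[] column) {
        long total = 0;
        for (int v : column) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] column = {5, 3, 5, 8, 5};
        Map<Integer, Integer> index = buildIndex(column);
        // "How many rows equal 5?" is an O(1) lookup with the index.
        System.out.println(index.getOrDefault(5, 0));
        // "What is the sum of the column?" is still a full scan.
        System.out.println(sum(column));
    }
}
```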
Verify my results: My Java code is available on paste.bin. I ran the tests on an iMac with an Intel Core i7 (2.8 GHz) processor.
Further reading: Understanding what makes database indexes work
Code: Source code posted on my blog is available from a github repository.