An index helps you find an item without scanning all of the data. David DeWitt and Michael Stonebraker have spoken out against index-light systems such as MapReduce, SimpleDB, and CouchDB.
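To make the first sentence concrete, here is a minimal Python sketch contrasting a full scan with a lookup through an index. The records, the field names, and the helpers `find_by_scan`, `build_index` and `find_by_index` are made up for illustration; they are not taken from any of the systems mentioned above.

```python
from collections import defaultdict

# Hypothetical records; the fields are illustrative only.
records = [
    {"id": 1, "city": "Montreal"},
    {"id": 2, "city": "Toronto"},
    {"id": 3, "city": "Montreal"},
]

def find_by_scan(rows, city):
    """Without an index: examine every row."""
    return [row["id"] for row in rows if row["city"] == city]

def build_index(rows, field):
    """Build the index once: map each value of `field` to the matching ids."""
    index = defaultdict(list)
    for row in rows:
        index[row[field]].append(row["id"])
    return index

def find_by_index(index, city):
    """With an index: one hash lookup, no scan of the rows."""
    return list(index.get(city, []))

city_index = build_index(records, "city")
print(find_by_scan(records, "Montreal"))
print(find_by_index(city_index, "Montreal"))
```

Both calls return the same ids. The difference is that the scan touches every record on every query, whereas the index is paid for once, when it is built, and has to be kept up to date afterwards.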

But David DeWitt and Michael Stonebraker failed to tell us that schemas fall apart as you scale up. To them, database theory took us out of the dark ages and these new kids are taking us back into the caves. I have a different take:

  • Initially, you have a messy start-up. You do the accounting, Joe takes care of hiring the new staff and your wife answers the phone. This is an analogy to the early database days before schemas and relational models.
  • The company grows and you organize it clearly. You now have an IT department, an accounting department, and so on. This is analogous to the classical database technology David and Michael say we should respect.
  • Eventually, you have 1500 employees, half of them working from home in India. Nobody knows how many IT departments you have or whether you have one at all. By analogy, as you scale up, the classical database schemas and indexes become much less useful.

Update: Here is a comment by Mark C. Chu-Carroll:

(…) indexing is a great tool if your data is tabular, and you have a central index that you can work with. But if your task isn’t fundamentally relational, and what you really need is computation then indexes aren’t going to help.

Steve Jobs just introduced the MacBook Air. The MacBook Air is thin and light, but what matters to me is that it uses a solid-state drive:

Using technology similar to that in the iPod nano and other Flash-based products, MacBook Air introduces a solid-state drive. This drive has no moving parts and can access data more quickly than standard hard drives, so you’ll enjoy a boost in performance when starting up your computer and opening files and applications. In addition, solid-state drives offer greater durability and improved resistance to data loss in the event of an accidental drop.

This follows recent announcements by storage vendors such as IBM and EMC who have started offering solid-state drives for enterprise needs.

Solid-state drives are compelling:

  • Solid-state drives have access speeds about 250 times faster than hard drives.
  • Solid-state drives use less power (over 30% less).
  • Solid-state drives are silent.
  • Solid-state drives are typically much smaller.
  • Solid-state drives are between 15 and 20 times more expensive, but prices are coming down.

I estimate that typical RAM is now only 10 to 20 times faster to access than a solid-state drive. These new drives narrow the gap between internal and external memory.

So, external memory becomes internal memory? Maybe not. For example, solid-state drives tend to have poor random write performance. You had better write the data sequentially.
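One common way to turn random updates into sequential writes is an append-only log: every update goes to the end of the file, and a small in-memory index remembers where the latest version of each key lives. The Python sketch below only illustrates that idea. The `AppendOnlyStore` class and its methods are hypothetical names, and the code says nothing about how a particular drive or file system behaves.

```python
import os
import tempfile

class AppendOnlyStore:
    """Toy key-value store that only ever appends to its file."""

    def __init__(self, path):
        self.path = path
        self.offsets = {}  # key -> (offset, length) of the latest value
        open(path, "ab").close()  # create the file if it does not exist

    def put(self, key, value):
        data = value.encode("utf-8")
        # The new value starts where the file currently ends.
        offset = os.path.getsize(self.path)
        with open(self.path, "ab") as f:
            f.write(data)
        self.offsets[key] = (offset, len(data))

    def get(self, key):
        offset, length = self.offsets[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length).decode("utf-8")

# Usage: updating a key never overwrites data in place.
path = os.path.join(tempfile.mkdtemp(), "store.log")
store = AppendOnlyStore(path)
store.put("a", "first")
store.put("a", "second")  # the old value stays in the file; only the index changes
print(store.get("a"))
```

The price of this design is wasted space: stale values stay in the file until something compacts it, which this sketch does not attempt.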

Disclaimer. I wish I were an expert on solid-state drives, but I am not. Please correct me if I am wrong.

Maybe you got monetarily richer over the last few years, but do you have more time outside work? A time tax is a required task with no productive output. When you do not keep these taxes under control, you end up with no free time for your family and friends.

To avoid these time taxes, I know of some strategies:

  • Working at home and online is a highly effective way to avoid time taxes. I am always amazed at how much faster I can buy an item on the Web. Lining up for a cashier is so 20th century! Working at home means you spend less time in hallways chatting randomly with colleagues and students.
  • You should avoid synchronous meetings including phone calls. Having to be at a certain place at a certain time introduces several small time taxes in your schedule.
  • Focusing on a few essential and simple projects means you will spend less time filling up forms and doing other bureaucratic chores. There is no evidence that complex projects are better for your career. You know you are focusing when your life is simple.
  • Your emails should be fewer than five sentences long, and you should write them during a set period of the day so that email remains asynchronous.
  • Learning to say no is essential to keep your sanity — and your significant other. Without control, random service tasks or secondary research projects can eat up all of your free time.

References: Harold and Nine shifts site.

Andre points us to SciImago — a Web site to mine science journals. Using their aggregates per country and some data from Wikipedia, I put together a table of the number of science papers produced per country going back to 1996.

Country | Science papers (1996-2006) | Population (current) | Papers per capita
US | 3,437,213 | 303,202,683 | 0.011
Japan | 983,020 | 127,718,000 | 0.0077
UK | 962,640 | 60,587,300 | 0.015
Germany | 888,287 | 82,251,000 | 0.010
China | 758,042 | 1,323,128,240 | 0.00057
France | 640,163 | 64,102,140 | 0.010
Canada | 473,763 | 33,148,682 | 0.014
Italy | 461,292 | 59,206,382 | 0.0077
Spain | 330,399 | 45,116,894 | 0.0073
India | 286,109 | 1,131,043,000 | 0.00025
Sweden | 194,921 | 9,174,082 | 0.021
Switzerland | 188,134 | 7,508,700 | 0.025
Israel | 120,257 | 7,222,222 | 0.0166
Norway | 70,314 | 4,738,085 | 0.015

What is fascinating is that the picture changes dramatically if you just look at the most recent year (2006):

Country | Science papers (2006) | Population (current) | Papers per capita
US | 340,268 | 303,202,683 | 0.0011
China | 166,205 | 1,323,128,240 | 0.000125
UK | 107,528 | 60,587,300 | 0.0018
Japan | 97,073 | 127,718,000 | 0.00076
Germany | 95,310 | 82,251,000 | 0.0012
France | 67,652 | 64,102,140 | 0.0011
Canada | 56,571 | 33,148,682 | 0.0017
Italy | 54,298 | 59,206,382 | 0.0009
Spain | 41,914 | 45,116,894 | 0.0009
India | 38,140 | 1,131,043,000 | tiny
Switzerland | 22,966 | 7,508,700 | 0.003
Sweden | 20,926 | 9,174,082 | 0.002
Israel | 13,049 | 7,222,222 | 0.0018
Norway | 8,670 | 4,738,085 | 0.0018

These numbers suggest some significant changes:

  • The US is still leading in the number of papers produced, but it no longer dominates. And it may not lead for many more decades if China keeps this up.
  • Canada, Switzerland, Norway, Spain and Italy are improving their per capita numbers.
  • Switzerland has a surprisingly high number of papers per capita.
  • Japan has a surprisingly low number of papers per capita.
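The comparison behind these observations can be reproduced with a few lines of Python. The sketch below uses figures copied from the two tables for three countries. The function names are hypothetical, and dividing the 1996-2006 total by eleven years to get a yearly average is an assumption about how to compare the two tables fairly.

```python
# (papers 1996-2006, papers 2006, population), copied from the tables above
data = {
    "US": (3_437_213, 340_268, 303_202_683),
    "China": (758_042, 166_205, 1_323_128_240),
    "Canada": (473_763, 56_571, 33_148_682),
}

YEARS = 11  # 1996 through 2006 inclusive (assumed)

def per_capita(papers, population):
    """Papers per inhabitant."""
    return papers / population

def growth_ratio(total, latest, years=YEARS):
    """2006 output divided by the average yearly output over the whole period."""
    return latest / (total / years)

for country, (total, latest, population) in data.items():
    print(
        country,
        f"per capita (2006): {per_capita(latest, population):.6f}",
        f"2006 vs. yearly average: {growth_ratio(total, latest):.2f}",
    )
```

A ratio above 1 means the country published more in 2006 than in an average year of the period. Under these assumptions the ratio is much larger for China than for the US.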

Carr made the headlines recently because he predicted the death of the IT department. Some time ago, I wrote:

(…) institutions are no longer required to get the system running. No vice-president, no staff. It means you can run the world from your kitchen. Or at least, get some research done.

So, what is exciting about Carr’s prediction?

I will now make a new prediction: the very concept of a software application will go away soon. Software will remain, but not as a countable quantity. What are software applications good for? When selling software, it is convenient to sell a unit of software. But software is increasingly a service, not a product. Software is fluid, fast-changing, and network-based. My prediction will come true because we will have a harder and harder time telling software applications apart. Instead of linking to applications, we will link to features or entry points.

See also taking charge of your IT where I wrote:

Computers are about giving users more control, not less. We shall delegate less to human beings in the future, not more. But we will grow more dependent on computers.
