Paul Graham gives a list of attributes characterizing start-ups. It strikes me that many of these attributes could describe research projects as well:

  1. Good research projects fail. If there is no risk of failure, you are doing unoriginal research. (Except that some of my best papers have come out of my biggest failures…)
  2. Good research directions change frequently. Otherwise, how can you be following the truth where it must lead you? (Except that if you keep changing direction, you’ll never get anything done.)
  3. It takes little money. While some research projects are expensive, Einstein changed physics forever without a research grant. (Except that if you are worrying about your next paycheck, you can hardly worry about research.)
  4. Good research is threatening. If your research never upsets anyone, maybe you are not pushing hard enough? (Except that you should not be bold just for boldness’ sake.)
  5. Research is a solitary task. Ultimately, all research projects involve many hours working alone. (Except that research is fundamentally social!)

Bitmap indexes are used by search engines (such as Apache Lucene) and are available in DBMSes such as Oracle and PostgreSQL. They are also used in column stores such as the open-source engines Eigenbase and C-Store, as well as in many commercial solutions such as Vertica.

Bitmap indexes are silly data structures. Map each value to an array of booleans. Hence, if you have n rows in your table, and k distinct values, you get an n by k matrix containing booleans. Thus, some people falsely assume that bitmap indexes are only adequate when there are few distinct values (e.g., the gender column, male and female being the only two options). However—using techniques based on run-length encoding—the total size of your bitmaps is proportional to the size of the original table, irrespective of the number of distinct values!
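To make this concrete, here is a minimal Python sketch of the idea. The function names and the sample city column are my own assumptions for illustration, and the run-length encoding is the textbook one, not the format of any particular engine. Since each row sets exactly one bit across all bitmaps, there can be at most n runs of ones in total, which is why the compressed size follows the table size rather than n times k.

```python
from itertools import groupby

def build_bitmap_index(column):
    """Map each distinct value to a list of booleans, one per row."""
    index = {}
    for row, value in enumerate(column):
        index.setdefault(value, [False] * len(column))[row] = True
    return index

def run_length_encode(bitmap):
    """Encode a bitmap as a list of (bit, run length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bitmap)]

# Hypothetical column with 12 rows and 3 distinct values.
cities = ["Montreal"] * 4 + ["Paris"] * 3 + ["Tokyo"] * 5
index = build_bitmap_index(cities)
encoded = {value: run_length_encode(bits) for value, bits in index.items()}
total_runs = sum(len(runs) for runs in encoded.values())
```

Uncompressed, this index holds 12 × 3 booleans. Run-length encoded, it is a handful of runs, and adding more distinct values to the same 12 rows cannot add more than 12 runs of ones.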

Bitmap indexes are fast because they benefit from vectorization. Suppose that the predicate “sex=male” is satisfied on rows 1, 5, 32, 45, 54 and 63. I can determine which rows satisfy the extended predicate “(sex=male) AND (city=Montreal)” using a single instruction! The secret? A bitwise AND between the bitmaps “sex=male” and “city=Montreal”. More generally, you can compute unions, differences and intersections between sets of integers in [1,N] using only N/64 operations with 64-bit words. All microprocessors have built-in parallelism because they operate on several bits at once.
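Here is a small Python sketch of that intersection. The rows for “sex=male” are the ones given above; the rows for “city=Montreal” are made up for the example, and the helper names are mine. Python integers are not machine words, so this only illustrates the logic: in C or Java each `&` below would be one 64-bit instruction.

```python
WORD_BITS = 64

def pack(rows, n_rows):
    """Pack row numbers in [0, n_rows) into a list of 64-bit words."""
    words = [0] * ((n_rows + WORD_BITS - 1) // WORD_BITS)
    for r in rows:
        words[r // WORD_BITS] |= 1 << (r % WORD_BITS)
    return words

def unpack(words):
    """Recover the sorted row numbers from a list of words."""
    return [i * WORD_BITS + b
            for i, w in enumerate(words)
            for b in range(WORD_BITS) if (w >> b) & 1]

def intersect(a, b):
    """One bitwise AND per word: N/64 operations for N rows."""
    return [x & y for x, y in zip(a, b)]

male = pack([1, 5, 32, 45, 54, 63], 64)   # rows from the text
montreal = pack([5, 7, 32, 50, 63], 64)   # hypothetical rows
both = unpack(intersect(male, montreal))  # [5, 32, 63]
```

All 64 rows fit in a single word, so the whole intersection is one AND. Unions and differences work the same way with `|` and `& ~`.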

To benefit from vectorization, you need to store the data in a word-aligned manner: that is, you store consecutive segments of bits uncompressed. The longer the words, the less compression. Roughly speaking, 64-bit bitmap indexes are nearly twice as large as 32-bit bitmap indexes. What is the effect on the processing speed? We found that despite being much larger, 64-bit bitmap indexes were faster. That is right: it was faster to load twice as much data from disk!
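The size effect is easy to reproduce with a toy model. The sketch below is my own simplification, not the format we actually benchmarked: a word that mixes zeros and ones is stored verbatim, and each maximal run of identical all-zero or all-one words costs a single word. The sparse test bitmap is also an assumption.

```python
def word_aligned_size_in_bits(rows, n_rows, word_bits):
    """Size of a bitmap under a simplified word-aligned compression."""
    n_words = (n_rows + word_bits - 1) // word_bits
    ones_per_word = {}
    for r in set(rows):
        w = r // word_bits
        ones_per_word[w] = ones_per_word.get(w, 0) + 1
    stored_words = 0
    previous_kind = None
    for w in range(n_words):
        ones = ones_per_word.get(w, 0)
        if ones == 0:
            kind = "zeros"
        elif ones == word_bits:
            kind = "ones"
        else:
            kind = "dirty"
        # Dirty words are stored as-is; a run of clean words costs one word.
        if kind == "dirty" or kind != previous_kind:
            stored_words += 1
        previous_kind = kind
    return stored_words * word_bits

# Hypothetical sparse bitmap: one set bit every 1000 rows.
sparse_rows = range(0, 64000, 1000)
size32 = word_aligned_size_in_bits(sparse_rows, 64000, 32)
size64 = word_aligned_size_in_bits(sparse_rows, 64000, 64)
```

On this sparse bitmap, both word sizes store the same number of words (one per set bit, plus one per gap), so the 64-bit version takes twice the space. Denser bitmaps narrow the gap, hence "nearly twice" in practice.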

Yet, we often equate concise data structures with more speed. This assumption can be misguided. Given a choice between more compression, or more vectorization, I would choose more vectorization.

Further reading: See my posts Compressed bitmaps in Java, To improve your indexes: sort your tables!, and The mythical bitmap index.

Last night, I watched a great talk by Dan Pink—author of several self-help books. He made a compelling point and he cited research papers. I went and read these research papers and I had great fun. Essentially, boosting your motivation with external rewards can lower the quality—though maybe not the quantity—of your work.

Indeed, Ariely et al. tell us in Large stakes and big mistakes that rewards may fail to increase worker productivity on cognitively demanding tasks. Eriksson et al. describe the phenomenon as follows:

(…) increased conscious attention to one’s own process of performance, implicit competition, cash incentives, or the presence of an audience can reduce performance (…)

These results have been independently verified several times (see Mobbs et al. for example).

My conclusion: researchers or engineers pushed hard by external rewards will produce more work, but not better work. To get higher-quality, creative work, you need a low-pressure environment. This may explain why many creative people cut themselves off from the world for extended periods.

Rewarding researchers with better funding, better pay and prizes may increase the volume of their contributions while lowering the quality of their work.

Further reading: We Perform Best When No One Tells Us What To Do

Currently, I am finishing off House of Suns by Alastair Reynolds. I am fascinated by Reynolds’ universe. Let me quote the beginning of the book:

I was born in a house with a million rooms, built on a small, airless world on the edge of an empire of light and commerce that the adults called the Golden Hour, for a reason that I did not yet grasp. I was a girl then, a single individual called Abigail Gentian.

Abigail is not using a figure of speech. Why does she live in a house with a million rooms? Why was her empire called the Golden Hour? Why does she point out that she was once a single individual, and a girl? Each of these questions has an intriguing answer.

Here are some science-fiction books that I liked this summer.

  • I began the summer with Pandora’s Star (Commonwealth Saga, Book 1) by Peter F. Hamilton. What I found fascinating is the universe painted by Hamilton: a future where human beings live forever, travel between stars instantaneously, transform their bodies at will, extend their minds, and so on. If it sounds like utopia, it is! Except that a small group of terrorists is creating trouble, claiming that unseen aliens are manipulating us. Clearly these terrorists are madmen. Or are they? After the conclusion of the three books of the Commonwealth Saga, the story continues with the Void Trilogy, 1,200 years later. I will be waiting for the conclusion in The Evolutionary Void with high expectations. The quality of the writing is exceptional and the characters are compelling. Even the aliens are original, and that’s quite a feat.
  • I am not a big fan of David Weber. He is a good writer, but his stories are shallow and full of holes. Yet, his Safehold series is engaging. I had great fun with Off Armageddon Reef (Book 1 of the series). The fancy and unbelievable science-fiction plot is an excuse to relive the construction of the British Empire. If you like to think about how technological innovations come about and you like military history, this is a great series. Do not expect spaceships or laser cannons.

Many funding agencies and some universities require researchers to publish their articles as open access. That is, research articles must be freely available to all. The main argument in favor of these policies is social justice: why should publishers acquire the exclusive rights to work funded by students, governments and other benefactors?

Professor Steven Shavell goes further: we should abolish copyright for academic work altogether. At first, I was confused: once your research articles are under open access, what more is there? Quite a lot, it turns out.

You see, these compulsory Open Access policies exclusively target research articles published in journals. They exclude books, book chapters, conference proceedings, reports and so on.

Why? My colleague and Open Access leader Stevan Harnad explains:

Books are still largely preferred by users in analog form, not digital-only—journal articles are increasingly sought and used in digital form, (…) It is not clear that for most or even many authors of “academic works” (…) the sole “benefit” sought is scholarly uptake and impact (…), rather than also the hope of some royalty revenue

Using the economic model as an argument is the stance of publishers resisting Open Access. To them, we answered: your financial well-being is not our concern. Yet, a researcher writing a book with funding from the government should be allowed to ignore the ideals of Open Access for his own profit? Hardly fair, I say! Anyhow, I have written dozens of articles in conference proceedings and I have edited a couple of books, but I never received a penny. Professors do not edit books or write book chapters for profit.

Consider the larger picture. Would our arguments change if paper books were replaced by Amazon or Sony ebook readers? And what happens if publishers start paying authors for research articles? Would our arguments in favor of Open Access melt? Yes, paper is expensive, but Open Access policies do not prevent publishers from charging anything they like! They just have to make sure it is cheaper to buy the book than to print it locally.

What is the real reason we want to exclude books, book chapters and proceedings from Open Access policies? For-profit, non-academic publishers already make ebooks available for free. What is our real excuse?

Disclaimer: I hold no grudge against the big publishers such as Springer.

Further reading: Is Open Access publishing the solution? Really?
