How many Computer Science researchers are there?


In our current work on database indexes, we decided to use DBLP as a data source. Among other things, we use the author name as a dimension. From one plot, I noticed that there must be about half a million distinct authors. I doubted this number, and Kamel was nice enough to investigate further. It turns out that there are 531,480 different authors in DBLP! (As a basis for comparison, there are about 945,000 articles.)

I don’t know about you, but this feels like a large number. We started to look for explanations. I already reported that the USA is producing 1,500 new Computer Science Ph.D.s a year. Still, there cannot be many more than 100,000 recently active Computer Science authors holding a Ph.D.
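To put these figures side by side, here is a small sketch that computes a few ratios using only the numbers quoted above. The constant and function names are my own, chosen for illustration, and the 100,000 figure is the rough ceiling suggested in this post, not a measured value.

```python
# Figures quoted in the post; nothing here is new data.
DBLP_AUTHORS = 531_480        # distinct author names in DBLP
DBLP_ARTICLES = 945_000       # approximate number of articles in DBLP
US_PHDS_PER_YEAR = 1_500      # new US Computer Science Ph.D.s per year
ACTIVE_PHD_AUTHORS = 100_000  # rough ceiling on active Ph.D. authors (a guess)


def sanity_ratios():
    """Return simple ratios between the figures quoted in the post."""
    return {
        # Articles divided by distinct author names.
        "articles_per_author_name": DBLP_ARTICLES / DBLP_AUTHORS,
        # How many DBLP author names exist per estimated active Ph.D. author.
        "authors_per_active_phd": DBLP_AUTHORS / ACTIVE_PHD_AUTHORS,
        # Years of US Ph.D. output, at the current rate, to match the count.
        "years_of_us_phd_output": DBLP_AUTHORS / US_PHDS_PER_YEAR,
    }


if __name__ == "__main__":
    for name, value in sanity_ratios().items():
        print(f"{name}: {value:.2f}")
```

Note that the first ratio is not the average number of publications per author: most papers have several co-authors, so each article is counted once here but would count once per co-author in a true per-author average.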

Owen pointed us to the recent CACM article "Are your citations clean?" by Lee et al. Alas, while DBLP is certainly dirty, in that some researchers appear under two or more different names, this alone cannot explain why we end up with half a million authors!

The best explanation so far is that many undergraduate or M.Sc. students have papers on DBLP. So much so that they make up the majority of the authors in DBLP.

Do you buy this theory? If not, do you have a better explanation?

(As a side-effect, it should not be very hard to be in the top 10% among the most prolific DBLP authors!)

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

5 thoughts on “How many Computer Science researchers are there?”

  1. You should also take into account papers from industrial research labs and industry in general. It would be very interesting to see a grouping of those 500K authors by affiliation-at-time-of-publication (or even current affiliation).

  2. There might also be a substantial number of authors from math/physics/econ/whatever other field who have published in a CS journal/conference at some point.

  3. On the other hand, they’re not cataloging CS researchers who publish in non-CS journals. OK, that probably only reduces the number of hits for some strange folks, not the number of authors.

  4. Just stumbled across this blog entry. Here are some statistics. As of 31 Dec 2007, DBLP lists 588,150 different authors, of whom 48,126 have 10 or more publications, 20,345 have 20 or more, and 1,178 have 100 or more.

    BTW, check out now linked from every DBLP author page, providing convenient prefix search, faceted search, etc. You probably have noticed it already. Feedback very welcome.
