Whether you submit your work to a scientific journal or just post it on a blog, you can expect to receive harsh criticism from time to time. Sometimes you are facing arrogant or ignorant readers. Other times, your work is genuinely flawed. My own work is frequently flawed, as you know if you read this blog.

Over time, I have learned that even if the reviewer is wrong, taking the time to respond carefully can be tremendously useful. If you are 100% correct, then you get to build up your confidence and can later answer similar criticism quickly. Very often, however, you did not do everything perfectly. Maybe your arguments and data are correct, but you might have presented them better.

There are specific strategies to deal with harsh reviews:

  • Expose yourself regularly to criticism from total strangers. In my experience, if you rarely publish, you are more likely to have difficulty dealing with criticism. I have been called an idiot, I have had to deal with overly aggressive people and I have been ridiculed on occasion. Of course, I occasionally get depressed after receiving harsh criticism, especially if I thought I had produced great work and feel unappreciated, but I am typically able to recover mentally in minutes or, at least, hours. Part of it is just habit: my brain has learned that harsh criticism does not necessarily signify upcoming pain.
  • It is critically important to distinguish yourself from your work. If someone repeatedly produces inferior work, his reputation will suffer. However, everyone (even Nobel prize winners) gets it wrong from time to time. It is important to keep in mind that most reviewers do not care that much about you. In fact, they often quickly forget about you while you ruminate over their review.
  • The best way to address criticism is to take it one comment at a time. If someone finds ten different flaws in your work, don’t look at it as one message: break it into ten components and address each one separately. This approach scales up linearly: it just takes ten times longer to address ten flaws than one. Brian Martin describes it well:

    I’ve found a way to make the revision process easier. I don’t reread my text, because that just cements my previous approach. Instead, I go through the recommendations of the referees and the editor one by one, making changes. After I finish all those changes, large and small, I print out the whole article and read through it, fixing up expression and making it flow.

    Tackling recommendations one by one is important psychologically. Looking at a list of criticisms, sometimes pages of them, can be demoralizing; the task seems too big. Focusing on a single point is easier. Once it’s done, you can check it off and proceed to the next point, either immediately or tomorrow.

    Sometimes responding to a point requires additional work, such as obtaining and reading some new theory or doing some new calculation. It’s helpful to write down every step that’s required – for example, (1) order Smith’s book, (2) read the theory section, (3) write a one-paragraph summary – and tackle them one by one.

Open access journals make articles freely available. Some of them even allow the authors to keep the copyright of their work. It would seem that they offer a compelling alternative to traditional journals, especially if you hope to reach people outside academia.
However, open access may allow you to get a free copy of an article, but your rights might still be limited. For example, videos on YouTube are freely available, but you are not allowed to copy or reuse them freely.

The directory of open access journals gives a list of over 300 open access journals in Computer Science. Thus, finding an adequate open access journal where you can submit your work is relatively easy.

However, there are a few sore points.

1. Indexing of open access Computer Science journals is generally weak

A journal needs to be indexed so that your fellow researchers can find out about your work. Most open access journals will be indexed by Google Scholar, but other indexes are important in Computer Science such as DBLP and the ACM Digital Library. Scopus is also often used by hiring and promotion committees. (Scopus is run by Elsevier.)

As I review the open access journals in Computer Science, I find that indexing is often a sore point. The next table shows that the ACM Digital Library does a poor job at indexing open access journals. In fact, I could find only two open access journals indexed by ACM. It cannot be explained by the prestige of the respective journals: some of the open access journals that ACM fails to index are as good as or better than others it indexes. And, of course, no ACM publication is open access. Quite clearly, ACM is doing little to help open access.

DBLP Scopus ACM
Chicago Journal of Theoretical Computer Science yes
Discrete Mathematics and Theoretical Computer Science yes yes
Electronic Journal of Combinatorics yes yes
IEEE Data Engineering Bulletin yes
Journal of Artificial Intelligence Research yes yes yes
Journal of Computational Geometry yes
Journal of Computers yes
Journal of Emerging Technologies in Web Intelligence
Journal of Machine Learning Research yes yes yes
Journal of Universal Computer Science yes yes
Journal of Graph Algorithms and Applications yes yes
Open Research Computation yes
Theory of Computing yes

Elsevier and Springer allow authors of papers in some regular journals to make them available under an open access format in exchange for a one-time fee. Their journals are typically well indexed so they may offer good alternatives.

2. Many open access Computer Science journals require complete copyright transfer

To publish an article, a journal does not require complete copyright ownership. The only valid justification for requiring that the author gives away his copyright is to restrict access. When reviewing open access journals in Computer Science, I see that several of them inexplicably require complete copyright transfer:

Journal                                                  Author keeps copyright   Publication fee
Chicago Journal of Theoretical Computer Science          yes
Discrete Mathematics and Theoretical Computer Science    no
Electronic Journal of Combinatorics                      yes
IEEE Data Engineering Bulletin                           no
Journal of Artificial Intelligence Research              no                       none
Journal of Computational Geometry                        yes
Journal of Computers                                     no                       €360
Journal of Emerging Technologies in Web Intelligence     no
Journal of Machine Learning Research                     yes
Journal of Universal Computer Science                    no
Journal of Graph Algorithms and Applications             no
Open Research Computation                                yes                      €1195
Theory of Computing                                      yes

Conclusion: There is still much room for progress.

There is a growing list of famous scientists who have pledged to boycott Elsevier as a publisher. If I were in charge of Elsevier, I would be very nervous: academic publishers need famous authors more than the famous authors need the publishers. After all, famous scientists could simply post their work online, and people would still read it.

Elsevier has committed too many sins to give an exhaustive list: they have created fake academic journals so that pharmaceutical corporations could claim that certain facts appeared in a journal, they have sponsored evil regulations, and they have restrictive views on what constitutes fair use. Unbelievably, they were also involved in arms trade. They probably have the devil on their board of directors.

The boycott is currently led by a famous mathematician, Timothy Gowers. Gowers accuses Elsevier of charging exorbitant prices for its journals.

Focusing solely on database-related journals, I decided to look at how much journals charge per article.

Journal                                     Publisher   Price per article
Distributed and parallel databases          Springer    61.50
Information systems journal                 Wiley       58.16
Information Systems                         Elsevier    53.44
Knowledge and information systems           Springer    25.39
Data & knowledge engineering                Elsevier    24.55
VLDB journal                                Springer    22.19
Information Sciences                        Elsevier    21.67
IEEE Trans. knowledge & data engineering    IEEE        10.80
ACM Trans. on database systems              ACM         6.64
SIGMOD Record                               ACM         0.00

Observations:

  • The price distribution appears almost random. I can see no relation between prestige or paper length and prices.
  • Elsevier is hardly alone in charging high prices for papers. Wiley and Springer are just as expensive. Of course, it is possible that Elsevier ends up charging more through deals and bundling.
  • ACM is very inexpensive on a per-article basis. However, ACM often asks the authors to pay page charges whereas Elsevier rarely does in my experience.
  • Though SIGMOD Record is limited to short contributions, its price is unbeatable. And it has no page charge. Moreover, it is generally a well regarded publication venue among database researchers.

My take: The evidence is strong that high-quality inexpensive journals are possible. Current journals are up to an order of magnitude too expensive. However, Elsevier is selling what we want to buy: prestigious journals that people outside the best schools cannot afford. Just like middle-income Americans get into debt to keep up with the top 1%, colleges increase their library budgets to keep up with Stanford and Harvard.

The solution to overpriced journals is to reduce library purchasing power. Most colleges do not have the infinite budgets Harvard and Stanford have, and they should not act like they do. In fact, if we could reduce the purchasing power of most libraries to zero, then researchers and students would be forced to pay $20 or more per article. You can be quite certain that they would mostly read the cheaper (and more competitive) journals. And Stanford researchers want to be cited by the researchers from the lesser institutions so they would also migrate away from overpriced journals. Reduced budgets would still allow publishers like Elsevier to make generous profits, but they would only profit by offering great products at an affordable price.

Disclaimer: I am currently reviewing a paper for Pattern Recognition (an Elsevier journal), and I recently published in Discrete Applied Mathematics (another Elsevier journal).

Update: Though you can get articles from SIGMOD Record for free if you go to the SIGMOD Record home page, ACM sells them through its Digital Library for over $10 apiece.

Hashing is a programming technique that maps objects (such as strings) to integers. It is a necessary component of hash tables, one of the most frequently used data structures in Computer Science.

Typically, hash tables have the property that looking up or storing a value associated with a key requires constant time. If you use user identifiers to retrieve names and phone numbers, you can scale up to millions and millions of users without performance penalty. However, the worst-case complexity of a hash table is linear: it may need to go through most values each time you want to look up a key. Thankfully, the worst case is typically improbable: it only happens when too many objects hash to the same value. In practice, hash functions are chosen so as to spread hash values uniformly (pseudo-randomly).

Most programming languages like Java or C++ use deterministic hash functions. This means that a given string will always hash to the same integer, for all Java software in the whole world. And overall, deterministic hashing works quite well. Unfortunately, deterministic hashing is insecure. If you are building a web application, and hackers know which hash function you are using, they can create a denial-of-service attack and bring down your application. The gist of it is not complicated: it suffices to ensure that the hash tables fall back on their worst-case performance.

This is very serious: it means that if you rely on the default hash functions of your programming language (e.g., String.hashCode in Java), your application could be at risk. On this issue, Alexander Klink and Julian Wälde issued a well written security advisory.

The fix is relatively simple: programming languages need to adopt random hashing. In random hashing, every time the software is initialized, a new hash function is picked, at random. This does not make attacks impossible, but it makes them much more difficult.
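To make the idea concrete, here is a minimal Java sketch of random hashing. It is only an illustration under stated assumptions: the class name RandomizedHash and its fields are hypothetical, and this is not the scheme used by any particular language. The object draws a random odd multiplier and a random starting value once, when it is created, so the strings that collide differ from one run to the next. A serious implementation would likely want a stronger family of hash functions than this one.

```java
import java.security.SecureRandom;

// Hypothetical sketch: a string hash function picked at random at start-up.
class RandomizedHash {
    private final int multiplier; // random odd multiplier, fixed for the lifetime of the object
    private final int seed;       // random starting value

    // Pick a new hash function at random each time the software is initialized.
    RandomizedHash() {
        SecureRandom rng = new SecureRandom();
        this.multiplier = rng.nextInt() | 1; // force the multiplier to be odd
        this.seed = rng.nextInt();
    }

    // Explicit parameters, useful for tests and reproducible debugging.
    RandomizedHash(int multiplier, int seed) {
        this.multiplier = multiplier;
        this.seed = seed;
    }

    // Fold the characters into the hash value using 32-bit wraparound arithmetic.
    int hash(String s) {
        int h = seed;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        RandomizedHash hasher = new RandomizedHash();
        // The value changes from run to run, but stays stable within one run.
        System.out.println(hasher.hash("hello"));
        System.out.println(hasher.hash("hello"));
    }
}
```

Within a single run the function is still deterministic, so hash tables keep working. The second constructor is one way to keep bugs reproducible: log the parameters and replay them when debugging.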

The problem is not novel. In 2003, Crosby and Wallach raised the issue and many responsible vendors fixed their products. Alas, the only programming languages to adopt random hashing were Ruby and Perl. Others are more reluctant.

So, how easy is it to hack the hash functions in, say, Java? Java uses an iterated hash function. At each iteration, iterated hash functions compute a new hash value from the preceding hash value and the next character. Strings in Java are hashed using the function
F(y,c) = 31 y + c.
where y is the previous hash value and c is the current character value. Thus, the hash value of a string made of the characters 65, 66 (corresponding to “AB” in ASCII) is 31 times 65 + 66 which is 2081.
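The computation above can be sketched in a few lines of Java. The class and method names (IteratedHash, step, hash) are hypothetical, and the sketch assumes the hash starts from zero and relies on ordinary 32-bit integer wraparound.

```java
// Hypothetical sketch of the iterated hash function described in the text.
class IteratedHash {
    // Compression step F(y, c) = 31 y + c, with 32-bit wraparound on overflow.
    static int step(int y, char c) {
        return 31 * y + c;
    }

    // Apply the compression step to each character in turn, starting from 0.
    static int hash(String s) {
        int y = 0;
        for (int i = 0; i < s.length(); i++) {
            y = step(y, s.charAt(i));
        }
        return y;
    }

    public static void main(String[] args) {
        System.out.println(hash("AB"));      // prints 2081, that is, 31 * 65 + 66
        System.out.println("AB".hashCode()); // the built-in String hash prints the same value
    }
}
```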

Why does Java use the number 31? The choice is somewhat arbitrary (and 31 might fail to be ideal) but because it is an odd number, the compression function F is permuting, which helps distribute the hash values more uniformly.

It is fairly hard to construct reasonable strings that collide over 32 bits in Java. However, a modest hash table will use only the first few bits of the hash values. Let us consider only the first 16 bits. It is not difficult to check that the strings “Ace”, “BDe”, “AdF” and “BEF” all have the same hash value in Java.
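The claim is easy to check with a short program. This sketch assumes that "the first 16 bits" means the 16 low-order bits of the value returned by String.hashCode; the class name CollisionCheck and the helper lowBits are hypothetical.

```java
// Hypothetical sketch: compare the hash values of the four strings.
class CollisionCheck {
    static final String[] STRINGS = {"Ace", "BDe", "AdF", "BEF"};

    // Keep only the given number of low-order bits of the Java hash value (bits < 32).
    static int lowBits(String s, int bits) {
        return s.hashCode() & ((1 << bits) - 1);
    }

    public static void main(String[] args) {
        for (String s : STRINGS) {
            // Each line prints 65635 for the hash value and 99 for its 16 low-order bits.
            System.out.println(s + " -> " + s.hashCode()
                    + " (low 16 bits: " + lowBits(s, 16) + ")");
        }
    }
}
```

As the output shows, these four particular strings agree on the whole hash value, so they collide no matter how many bits the hash table keeps.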

Of course, having 4 strings colliding will not disrupt hash tables. But because the hash function is iterated, we can multiply the number of collisions. Indeed, any two same-length sequences of these four colliding strings will also collide. This means that you can construct 16 strings of length 6 all colliding (“AceAce”, “AceBDe”, “AceAdF”, “AceBEF”, “BDeAce”, “BDeBDe”, “BDeAdF”, “BDeBEF”, “AdFAce”, “AdFBDe”, “AdFAdF”, “AdFBEF”, “BEFAce”, “BEFBDe”, “BEFAdF”, and “BEFBEF”). You can keep going to 64 strings of length 9. And so on.
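The construction can be sketched as follows. The class name CollisionGenerator and the method generate are hypothetical. It works because, with an iterated hash, the hash of a concatenation uv depends only on the hash of u, the hash of v and the length of v, so swapping one block for another block of the same length and hash value leaves the result unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: build many colliding strings from four colliding blocks.
class CollisionGenerator {
    static final String[] SEEDS = {"Ace", "BDe", "AdF", "BEF"};

    // Returns all 4^blocks concatenations of the seeds, each of length 3 * blocks.
    static List<String> generate(int blocks) {
        List<String> result = new ArrayList<String>();
        result.add("");
        for (int b = 0; b < blocks; b++) {
            List<String> next = new ArrayList<String>(result.size() * SEEDS.length);
            for (String prefix : result) {
                for (String seed : SEEDS) {
                    next.add(prefix + seed); // extend every string by one more block
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> strings = generate(2); // the 16 strings of length 6
        for (String s : strings) {
            // Every line shows the same hash value.
            System.out.println(s + " -> " + s.hashCode());
        }
    }
}
```

Each extra block multiplies the number of colliding strings by four, so generate(8) already yields the 65536 strings of the larger experiment, assuming the experiment was built this way.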

How badly does this impact the performance of a hash table? I tried inserting all the colliding strings in a Java Hashtable container. For comparison, I also inserted randomly chosen strings into either a Hashtable or a TreeMap (tree structure). The net result is that what should be a tiny cost (0.006 s) becomes a massive cost (30 s). A server able to process thousands of queries per second might quickly become bogged down trying to process a couple of queries per second.

Number of strings   Hash table: average time (s)   Hash table: worst time (s)   Tree: average time (s)
16384               0.002                          1.1                          0.005
65536               0.006                          30                           0.03

For these tests, I am using a MacBook Air with a 1.8 GHz Intel Core i7 running Java 6. My code is available.
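Timings depend on the machine, so a complementary way to see the effect is to count key comparisons. The following is not the benchmark code mentioned above, only a hypothetical sketch: CountingHashSet is a bare-bones chained hash set written for this illustration, and the ordinary keys ("key0", "key1", and so on) are an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a minimal chained hash set that counts key comparisons.
class CountingHashSet {
    private final List<List<String>> buckets = new ArrayList<List<String>>();
    long comparisons = 0;

    CountingHashSet(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new ArrayList<String>());
        }
    }

    // Insert a key after scanning its bucket for a duplicate.
    void add(String key) {
        int index = (key.hashCode() & 0x7FFFFFFF) % buckets.size();
        List<String> bucket = buckets.get(index);
        for (String existing : bucket) {
            comparisons++;
            if (existing.equals(key)) {
                return;
            }
        }
        bucket.add(key);
    }

    // Number of comparisons needed to insert all keys into a fresh table.
    static long cost(List<String> keys, int capacity) {
        CountingHashSet set = new CountingHashSet(capacity);
        for (String key : keys) {
            set.add(key);
        }
        return set.comparisons;
    }

    // All 4^blocks concatenations of four strings that share one hash value.
    static List<String> colliding(int blocks) {
        String[] seeds = {"Ace", "BDe", "AdF", "BEF"};
        List<String> result = new ArrayList<String>();
        result.add("");
        for (int b = 0; b < blocks; b++) {
            List<String> next = new ArrayList<String>();
            for (String prefix : result) {
                for (String seed : seeds) {
                    next.add(prefix + seed);
                }
            }
            result = next;
        }
        return result;
    }

    // The same number of keys, without engineered collisions.
    static List<String> ordinary(int n) {
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < n; i++) {
            result.add("key" + i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> bad = colliding(3); // 64 colliding strings
        System.out.println(cost(bad, 128)); // prints 2016, that is, 64 * 63 / 2
        System.out.println(cost(ordinary(bad.size()), 128));
    }
}
```

With n colliding keys, every insertion scans the whole bucket, so the total grows like n(n-1)/2. That quadratic growth is consistent with the worst-case times in the table rising much faster than the number of strings.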

Why aren’t programming languages adopting random hashing? A potential issue is that language designers like determinism. They much prefer reproducible bugs. Nevertheless, any expert programmer should be aware of this problem.

Update: I initially reported that Ruby was the only language to adopt random hashing. In fact, Perl adopted random hashing with version 5.8.1. In version 5.8.2, Perl adopted a hybrid that switches between a deterministic and a random hash function when needed. (Thanks to Mike Giroux for the pointer.)

Open access is the idea that scholarship should be accessible to all. Many believe that we should require publicly funded researchers to make their work available to the public. That is, if some professor discovers a new algorithm or a new remedy while on a government grant, you should be able to download and read the paper freely.

To non-scientists, open access may sound like a socialist utopia. Why would anyone give away carefully curated content for free? The problem is that the content is overwhelmingly produced by scientists who have no share of the profit made by the publishers. These scientists are often funded directly or indirectly by the government. Journal editors are typically not paid. Reviewers are almost never paid. In fact, the opposite is true. Over the years, I have given thousands of dollars in page charges or conference registration to publishers. For example, several ACM journals request $60 per page from the authors (so that publishing 30 pages costs $1800). That is right: as a scientist, you are often asked to pay to get your work published. Thankfully, most of these fees are paid by research grants, which often come from the government.

Open access is a problem for publishers, however. In the current system, the publisher has a monopoly on the journals it has published over the years. This means that as long as researchers need access to these journals, the publisher can charge millions for access. Open access kills this monopoly. Certainly publishers can increase their profits by increasing the page charges, but authors can also take their papers to other, more reasonable journals.

Nevertheless, I could never get excited about open access. I find it annoying that I cannot download papers freely, but astrophysicists have already solved this problem without any government intervention or lawsuit. Indeed, nearly all recent astrophysics papers are on arXiv. What matters is the culture: physicists care about being read, they love the web. In this sense, open access is a short-sighted fight.

Thus, a much more significant vision is Nielsen’s open science. Michael Nielsen is arguing for a culture shift in science: from a science obsessed with individual performance (and publications) to a science culture that more closely resembles that of open source software or Wikipedia.

I fear, however, that despite all the (well deserved) press that Michael Nielsen’s latest book has been getting, too few people understand the importance of this shift. It is not about becoming hippies. It is not a socialist utopia. On the contrary, the system we have right now is akin to a highly regulated industry. All power is in the hands of the government and a few large organizations (universities, publishers) working in tandem. The barrier to entry is kept artificially high. Open science is really about creating “open markets” with freer exchanges. It has the potential to boost our collective productivity by orders of magnitude through the removal of unneeded friction.

Meanwhile, American corporations are concerned with copyright violations on the Internet. Thus, they are pushing a bill, the Stop Online Piracy Act (SOPA), which would allow the government to shut down any web site that is suspected of violating copyright. Using SOPA, a publisher could have a repository of research papers shut down. While at it, the publishers are also promoting a bill, the Research Works Act, which would make it illegal for government agencies to require open access from publicly funded researchers.

If you are one of the thousands of members of the Association for Computing Machinery (ACM) or the Institute of Electrical and Electronics Engineers (IEEE), then you are indirectly supporting this new legislation. Indeed, the ACM and IEEE are members of the Association of American Publishers (AAP). The AAP is a lobbyist for both proposals.

And we finally get a hint at why it is so hard to open up science: the business of science has become intertwined with businesses like the publishing business. ACM has to speak both as an association of computing professionals, and as a publishing house.

What should be a critical support service, the publication of results, ends up driving much of our culture. The journals become the science. The medium becomes the message.

In effect, we have too much organizational scar tissue in science. It could be that we need to reboot the system. As a starting point, we should collectively recognize the problem. Repeat after me: scholarship is not a publishing business.

Update:

The ACM charges the authors of any conference for the publication of proceedings. However, they do not require payment for publishing in their journals: instead they request page charges.
