Are CAPTCHAs a good idea?

A CAPTCHA is a small test used to distinguish human users from robots. CAPTCHAs are popular as an anti-spam tool.

Until a few months ago, I had an annoying CAPTCHA on this blog. I have since removed it and I will not go back.

What happened?

  1. The long-term problem with CAPTCHAs is that computers are getting so good at passing Turing tests that we must stretch the cognitive abilities of human beings to tell them apart from machines. Thus, we end up requiring greater and greater effort from users. It is simply unsustainable. It is a race that can only end in victory for the spammers.
  2. I thought, naively, that I could get around this problem with a home-made CAPTCHA. After all, I am certainly not important enough for spammers to write code specifically to pass my CAPTCHA. Unfortunately, spammers appear to be recruiting human beings. There is a large pool of people on Earth who will gladly get paid just to post spammy comments on minor blogs. Thus, no matter how good you are at distinguishing human beings from bots, you still cannot win with CAPTCHAs.
  3. Though not perfect, automated spam detection has gotten quite good. For my blog, I use the free service Akismet. It can stop most naive attempts to spam bloggers. I also have some fixed rules that will send a comment directly to the spam box. There is a small fraction of the legitimate comments that I will never get to see, but this is already true with email. I have come to grips with the fact that messages online sometimes get lost.

So the default on this blog is that comments go to a moderation queue and I have to approve them, one by one. About half of the comments that pass my filters are still spam. If I were hosting a more popular service, I would probably still find a way to prevent abuse without using CAPTCHAs.
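The pipeline described above (fixed rules first, then an automated classifier, then manual moderation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rules, the link threshold, and the names `matches_fixed_rule`, `triage`, and `Queues` are hypothetical and are not the rules actually used on this blog. The `looks_like_spam` parameter stands in for an external classifier such as Akismet; it is a placeholder, not Akismet's real API.

```python
import re
from dataclasses import dataclass, field

# Hypothetical fixed rules, for illustration only.
MAX_LINKS = 2
BLOCKED_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\bviagra\b", r"\bpayday loans?\b")
]
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)


@dataclass
class Queues:
    """Holds comments sent to the spam box and to the moderation queue."""
    spam: list = field(default_factory=list)
    moderation: list = field(default_factory=list)


def matches_fixed_rule(text: str) -> bool:
    """Return True if the comment trips one of the hard-coded rules."""
    if len(URL_PATTERN.findall(text)) > MAX_LINKS:
        return True
    return any(pattern.search(text) for pattern in BLOCKED_PATTERNS)


def triage(text: str, queues: Queues, looks_like_spam=lambda text: False) -> str:
    """Route a comment to the spam box or to the moderation queue.

    `looks_like_spam` is a placeholder for an external spam classifier;
    by default it flags nothing.
    """
    if matches_fixed_rule(text) or looks_like_spam(text):
        queues.spam.append(text)
        return "spam"
    # Nothing is published automatically: a human still approves each comment.
    queues.moderation.append(text)
    return "moderation"
```

Note that no comment is ever published directly in this sketch. Whatever survives the filters still waits for manual approval, which matches the default described above.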

Credit: Thanks to John Regehr for inspiring this post.

Update: Sathappan Muthu pointed out to me a very cool CAPTCHA service:

Daniel Lemire, "Are CAPTCHAs a good idea?," in Daniel Lemire's blog, January 2, 2013.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

30 thoughts on “Are CAPTCHAs a good idea?”

  1. On spammers paying humans to break CAPTCHAs, there was a great paper on that a while back by Stefan Savage, Geoffrey Voelker, and other folks at UCSD CS. Definitely worth a read if you have a chance, here it is:

    Note this line: “Today, there are many service providers that can solve large numbers of CAPTCHAs via on-demand services with retail prices as low as $1 per thousand.”

  2. @Daniel Lemire, that’s an interesting point, that the cost CAPTCHAs put on users is much higher now than the cost they put on spammers.

    From that 2010 USENIX paper, the cost on spammers appears to be a mere $.001 per captcha. The cost on users, back of the envelope, might be roughly 300 times higher ($20/hour average wage, 1 minute spent on the captcha, equals $.333 as the user cost per captcha).

    That’s not very efficient, is it?

  3. @Greg Linden

    Thanks for the great reference.

    I would argue however that this otherwise excellent paper makes little of the cost to users. I find CAPTCHAs to be increasingly annoying… just consider the fact that I routinely fail them once, twice and sometimes three times. I know that I am not alone. They have gotten hard. I think that they are bound to get harder and harder so this problem is not going away. So I am much less confident than these authors that history will remember CAPTCHAs as a success.

  4. Once I attempted to post a comment on your blog, but was foiled by your captcha. I was able to figure out the correct answer 🙂 but I think there was a bug in the web form, and my comment didn’t seem important enough to email you about.

  5. On the other hand, I wonder whether a captcha combined with other methods of spam detection would be more effective, since otherwise there is no cost for the spammer to try to leave billions of comments on your site.

  6. I went through the same sort of decision a few months ago. Unfortunately, as soon as I removed the CAPTCHA I was inundated with emails from bots. I went with a really simple home-spun method based upon single digit maths (and no hard to read text) and 99% of the spam disappeared overnight.

  7. I thought about a spam filter but I like to get my email on the move on an iPad, which doesn’t filter the spam. My hosted email can filter but I don’t quite trust it not to lose something. Of course, 75% of my email seems to be spam anyway but when I had nothing filtering my contact form that would increase to probably 98%. Finding that one good message in a sea of crud wasn’t fun!

  8. Thanks for the post!

    I used to not really mind captchas but then during the last few years they got so difficult or obscure that now I often don’t get them on the first try. It’s hard to express how infuriating this is.

    For me Akismet has a success rate well over 99%. Out of ~2400 non-spam comments on my blog it has misclassified about 3 as spam (I skim the spam before deleting it). It has let through fewer than about 50 spam comments.

  9. I think it’s a matter of importance as well. CAPTCHAs might be appropriate for something that is a highly significant form submission (such as registering a new account), whereas for the more common occurrence of submitting a comment, it might be overkill. I would hate to have to fill out a CAPTCHA every time I tweet!

    I had a similar CAPTCHA to the one you used to use on my own blog. A few days ago, I disabled comments entirely on my blog because I was receiving so much spam that the spammers actually managed to consume my bandwidth for the month! (I chose to disable comments instead of adding some kind of moderation because it was the most economical decision for me. I receive almost zero legitimate comments as it is, so it’s not worth my time to rework the comment form right now. I’ll leave that for the summer.)

  10. So I think that people are drawing the wrong conclusion from our paper. That you can get 1k CAPTCHAs solved for $1 (or less if you are willing to exit the retail market) is not important in and of itself. Sure it sounds cheap in isolation, but the key question is how it fits into the overall cost structure for the spammer. Thus, the $0.001 you paid is only cheap if the value reaped by solving that CAPTCHA is higher still. To put it another way, what is the ROI on posting spam on this blog? How much traffic will it drive (either via PR or direct clicks) and how much of that traffic will convert and what is the marginal revenue per conversion? Only after you know this do you understand if the cost-per-sale (the CAPTCHA solving premium) was cheap or perhaps expensive. What CAPTCHAs do, and tend to do relatively cheaply, is filter out those scammers with inefficient business models.

    That said, blogs are not an ideal setting for CAPTCHAs because they don’t maintain account state, so they don’t allow you to amortize the solve over the lifetime of an account (although I’ve seen some people try to fake this by eliminating the CAPTCHA requirement from IP addresses with successful solves in the past).

  11. I am in full agreement that captchas are a failure. I am somewhat colourblind, and there is nothing more annoying than having to make 4 or 5 attempts to post a comment. Here I am trying to make a contribution to debate, and the system is trying to prevent my engagement.

    As noted, Akismet is pretty effective. I think I would also add a “user flag” system: let comments go through directly, and have a flag that e-mails me immediately, to use “crowd power” to detect spam or other abuses. That way I could allow comments to flow through directly.

  12. I suppose manual filtering can’t be all that onerous, assuming you were going to read all the legitimate comments posted to your blog and that Akismet is reasonably efficient.

  13. @Stefan Savage, I think there are two issues: (1) Are captchas effective? (2) Are they efficient?

    As you said, to be effective, they only have to increase the costs to the point that the dumbest attempts at spam are no longer lucrative. They do appear to do that.

    However, you don’t address whether they are efficient, which is what I was trying to get at. Back of the envelope, it appears they are not — costs on users are roughly x300 costs on spammers, as I said earlier — and it’s something that would be interesting to explore further.

    Efficiency of anti-spam and security measures in general is starting to get more attention. It’s a topic Bruce Schneier touches on sometimes as well as a couple people at MSR (e.g. Cormac Herley’s “So Long, and No Thanks for the Externalities: the Rational Rejection of Security Advice by Users”). Would love to see even more on it.

  14. @Muigai

    As @John Regehr wrote, Akismet is now very good. It catches most spam.

    @Stefan Savage

    I think you need to factor in the cost to the users somewhere in the modelling though. After all, we could simply ask users to pass through 100 different CAPTCHAs. We would magically make CAPTCHAs 100x more expensive for the spammers.

    Reducing usability can have a tremendous cost. I think that @Greg Linden’s analysis is right: the cost to users can dwarf any other cost.

    Of course, Google has invested a lot in CAPTCHAs by buying reCAPTCHA so they have an incentive to use them, but even so, they rarely use them. Many other popular online services do not require CAPTCHAs.

    The question of whether CAPTCHAs are a success or a failure is somewhat subjective here since I have no hard data, and that’s why I concluded the title of my blog post with a question mark.

    @William Hartmann

    You can contribute to the Gutenberg Project directly by reviewing OCR texts. I am not sure we should justify CAPTCHAs because of their contribution to the Gutenberg Project.

  15. I think there are two different issues here. One is a very pragmatic one around using CAPTCHAs for transactional activities (e.g., a single blog comment) where the mechanism cost is proportional to the number of transactions. This is not an ideal use of the technology and one might be better served pushing out a cookie with the CAPTCHA solve that allowed subsequent transactions to bypass the test. Alternatively (as per some of the original uses of CAPTCHAs) one could use an even cheaper filter (e.g., known-bad IP address range, past history, posts a URL, etc.) to predicate CAPTCHA use.

    However, I think the larger issue is how to structure thinking about how one evaluates security mechanisms such as CAPTCHAs. One question is indeed effectiveness, but this is not a binary issue. All security mechanisms are filters: for a given cost (in overhead, false positives, time, etc.), they filter out a class of attackers for whom the cost structure of the attack now exceeds the value of the attack itself. In many large-scale scenarios (account signup, blogs, etc.) the marginal value of most successful attacks is quite small and thus small incremental changes to the cost structure can render the vast majority of potential attacks unprofitable. Does that mean someone can’t defeat the CAPTCHA? No, it doesn’t. Does it mean that there aren’t spammers for whom it is worth paying for CAPTCHA solving service? No again. In fact, if you didn’t have CAPTCHAs in front of account signups at places like Yahoo, Google, Microsoft, etc. you would see much more abuse than you do now.

    I’m sympathetic to the question about cost to normal users. Our group tends to focus on criminal motivations so this wasn’t something we looked at (but I think it would be a great empirical study for someone to do. Anecdotally I understand that CAPTCHA-based turnaways are rare, but I’ve never seen a real study on this, in part I suspect because it’s very difficult to account for user motivation). However, I don’t think you can look at this as simply as multiplying prevailing wages by 10 seconds and comparing against solving prices. Using this structure, one might conclude for example that it would be better not to have passwords because if you multiply the time we all have to type them in by our prevailing wages it is very likely to outstrip the value of stolen accounts on the open market.

    Back to the question at hand, if you truly care about total utility, then there are more variables to account for. For example, you either need to model the amount of time wasted by users having to read the spam that would otherwise be eliminated, or you need to model the cost of the alternative (e.g., having a human moderate the posts individually :-). The reason CAPTCHAs have been so successful for account signups is that they can significantly reduce abuse volume at extremely low operational expense, when compared with alternative solutions (and, anecdotally at least, the use of CAPTCHAs has not kept people from signing up for Gmail, Yahoo Mail, Hotmail, etc…).

    The real threat to CAPTCHAs ultimately is the extent to which automated solvers can be generalized such that the cost to create new automated solvers is significantly reduced (or, considering it conversely, that improvements in vision, audio processing, etc., will increase the time/cost to develop new CAPTCHA algorithms that are acceptable to users and not easily solvable using off-the-shelf automated tools).

  16. @Stefan Savage

    The real threat to CAPTCHAs ultimately is the extent to which automated solvers can be generalized such that the cost to create new automated solvers is significantly reduced (…)

    Yes. Human beings at their keyboards are not getting better over time at image processing. Software, however, is getting better.

    I could even imagine the day when people use software to help them pass CAPTCHAs. 😉

  17. I am not sure that the cost to reward ratio is linked to the price you can pay to low paid workers to solve them on your behalf. The cost has to be related to the cost to the owner of the web site or blog.

    I was often receiving enough spam emails from my contact form to take an hour or two per week of my time to deal with. Aside from the cost of my time, which is essentially zero as the site is my hobby, this is a couple of hours that I am unable to create new content.

    The original captcha removed the vast majority of emails. Presumably, for me at least, the benefit of sending spam to me did not warrant paying people to solve captchas. However, it became gradually more difficult for people to solve as the text became more adjusted to try to avoid algorithm solutions. The simpler solution performs almost as well but hopefully annoys people less.

  18. Just to re-iterate what William Hartmann mentioned; I think the concept of harnessing ‘human idle time’ is a fantastic idea which reCAPTCHA (and others) leverage. For that reason alone I think they’re a worthy ‘feature’.

  19. @Josh

    It is great if you want to spend some of your free time (“idle time”) contributing to projects like Gutenberg, but when I am trying to get something done and someone throws a hard CAPTCHA at me, I would not say that it is harnessing ‘human idle time’.

    Like everyone, I think that the concept behind reCAPTCHA is brilliant… but I can’t seem to pass these tests… not on the first attempt at least… I find that it is a usability nightmare. I’m not having any fun at all. It frustrates me.

    Maybe I am particularly bad at this or impatient, but I don’t think I am alone.
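The back-of-the-envelope comparison discussed in the thread above (the cost a CAPTCHA puts on a user versus the cost it puts on a spammer) can be reproduced with a short Python sketch. The $1 per thousand solves comes from the quoted paper; the $20/hour wage and the 60-second solve time are the commenter's rough guesses, not measurements. The function names, including the `spam_is_profitable` helper, are hypothetical and exist only for this illustration.

```python
# Figures quoted in the thread; wage and solve time are rough guesses.
SOLVE_PRICE_PER_THOUSAND_USD = 1.00  # retail solving price quoted from the paper
HOURLY_WAGE_USD = 20.00              # assumed average wage of a user
SECONDS_PER_CAPTCHA = 60             # assumed time a user spends on one CAPTCHA


def spammer_cost(price_per_thousand: float = SOLVE_PRICE_PER_THOUSAND_USD) -> float:
    """Cost in USD for a spammer to have one CAPTCHA solved."""
    return price_per_thousand / 1000


def user_cost(hourly_wage: float = HOURLY_WAGE_USD,
              seconds: float = SECONDS_PER_CAPTCHA) -> float:
    """Value in USD of the time a legitimate user spends on one CAPTCHA."""
    return hourly_wage * seconds / 3600


def cost_ratio(seconds: float = SECONDS_PER_CAPTCHA) -> float:
    """How many times more a CAPTCHA costs the user than the spammer."""
    return user_cost(seconds=seconds) / spammer_cost()


def spam_is_profitable(expected_revenue_per_post: float) -> bool:
    """Hypothetical helper: paying for a solve only pays off if a posted
    comment is expected to earn more than the solve costs."""
    return expected_revenue_per_post > spammer_cost()


if __name__ == "__main__":
    print(f"spammer cost per CAPTCHA: ${spammer_cost():.3f}")
    print(f"user cost per CAPTCHA:    ${user_cost():.3f}")
    print(f"ratio at 60 seconds: {cost_ratio():.0f}x")
    print(f"ratio at 10 seconds: {cost_ratio(seconds=10):.0f}x")
```

With a one-minute solve time the ratio is about 333, which is the "roughly 300 times" figure from the thread. With the 10-second solve time mentioned in a later comment it falls to about 56, so the conclusion is sensitive to the assumed solve time, and this simple model ignores the other variables raised in the discussion (time lost reading spam, cost of manual moderation).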
