December 4, 2006

comment verification

Posted by cwage @ 5:25 pm

There's some interesting discussion on NiT on the topic of comment verification, in which my wordverify plugin is mentioned -- specifically, on the annoyance of image-based obfuscated letters as verification. For starters, I'll just repost what I've got on the wordverify page about what Wordverify aims to accomplish:

The idea is that a lot of commentspam is driven by automation, naturally, and the introduction of a human element in submitting an extra bit of verification can help kill a lot of this spam. SecureImage is an example of a great plugin that uses ImageMagick to display an image with random letters that the commenter must verify. WordVerify provides a simpler alternative to this method, by just requiring the entry of a single word. This provides a healthy compromise for smaller blogs that don’t necessarily need the security of a dynamic image. The chances of any comment spammer bothering to screen-scrape my blog just to comment-spam it, much less OCR an image, are pretty low. For smaller blogs, the simple addition of a codeword is probably more than enough.

Even this description is lacking, but I'll get to that. Mack asks:

I hate word verification. Most of the time, I have trouble distinguishing the letters, i's and l's, for example, so inevitably I get it wrong, and have to start all over. So, I started wondering, how many people just don't bother to get into the "settings" of their blogger account to turn this decidedly inconvenient feature off? Surely most bloggers don't get enough traffic to warrant having this extra security feature, do they?

No, but that's not the issue. The issue is that they're all using Blogger. Or WordPress. Let me explain:

It's not really an issue of "big" or "small" so much as it is an issue of whether or not you're a target. You're a target if spam to your blog can be automated -- if the mechanism to comment on your blog is predictable. This means you're a target if you use a popular blogging service like Blogspot or popular blogging software like WordPress.

You don't have to be a high-profile blogger to get comment spam. You just have to have a blog. Spamming is easy. The Save Claudia website (running WordPress) was getting comment spam and trackback spam within a few days of going live.

The idea behind image-based comment verification is that it introduces a human element into the process -- something that is not easily (or at least cheaply) automated. But this approach is still defeatable. The problem is not the method of verification itself -- the problem is that it's the same for every blog on Blogspot, or the same for every installation of WordPress. It doesn't really matter how complicated you make the verification process -- short of implementing a true Turing test, it's probably always going to be defeatable. If it's the same on every blog, it can be automated. So, we have two choices. The first is to resort to ever more complicated human-verification methods, standardized on each blogging platform, in a never-ending arms race with comment spammers. That's the decision driving the image-verification approach: it's complicated enough and expensive enough (resource-wise) to defeat that it works. For now.

Alternatively, we can perhaps do something smarter: we give the individual blog owner the control to mix up the verification process and make it harder to predict what's being asked, rather than making the question harder. That's the philosophy behind Wordverify, and it's a barebones simple approach to accomplishing that: it allows you to change not just the codeword you need to enter, but also the phrase that asks or demands that you enter it.

This means that defeating my installation of wordverify requires a human to go to my blog, see the phrase Please enter 'confront' without the quotes. and realize that they need to send codeword=confront in the POST. That step can then be automated, yes, but if it is, it's a simple matter for me to change the codeword and the phrase, so that a human is again needed to tweak the automated script. This of course is unlikely to happen, since no spammer cares that much about specifically spamming quietlife.net. I'd probably be retired on ad revenue alone if that were the case.
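To make the mechanism concrete, here is a minimal sketch in Python (not the plugin's actual code). The `codeword` field name and the example phrase come from the description above; the class name, the function name, the replacement word "harvest" and the case-insensitive comparison are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class VerifyConfig:
    """Owner-editable settings (hypothetical structure, for illustration only)."""
    phrase_template: str = "Please enter '{word}' without the quotes."
    codeword: str = "confront"
    field_name: str = "codeword"  # POST field the comment form submits

    def prompt(self) -> str:
        # Text shown next to the comment form.
        return self.phrase_template.format(word=self.codeword)


def is_verified(post_data: dict, config: VerifyConfig) -> bool:
    """Accept the comment only if the submitted word matches the current codeword.

    Ignoring case and surrounding whitespace is an assumption, not something
    the post specifies.
    """
    submitted = post_data.get(config.field_name, "")
    return submitted.strip().lower() == config.codeword.lower()


config = VerifyConfig()
print(config.prompt())

# A script hard-coded against the current phrase gets through...
scripted_post = {"codeword": "confront"}
print(is_verified(scripted_post, config))

# ...until the owner changes both the word and the phrase.
config.codeword = "harvest"
config.phrase_template = "Type the word {word} in the box below."
print(is_verified(scripted_post, config))
```

The last three lines are the whole argument in miniature: the script that worked a moment ago fails as soon as the owner edits two settings.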

It's for this reason that I beg to differ with Jeffraham P, who says that it's "cool, but easily defeated by spammers with skillz." It's not. It's easily defeated by spammers with more free time than me, intent on specifically spamming my blog. This is almost guaranteed to never happen. It's been almost a year since I wrote and installed Wordverify, and in that time I've gotten approximately zero automated comment spam. I don't think MQL has even had a human spammer (the Centresource blog has, but that's another story).

The point is: comment-spamming happens because comment forms are all the same. Normal verification processes are circumventable, because they're all the same. Even obfuscated image-based verification processes are defeatable, because you simply add OCR into the mix, and, yep, they're all the same. Until there are more options in the mix, spammers are going to continue to target what gets the most bang for the buck.

So, do I think wordverify is the end-all/be-all solution to comment spam? No -- but I think it's more elegant and more to-the-point than the more irritating and convoluted obfuscated-letters-in-an-image techniques. Rather than making the test for a human more complicated, blogging software and services should work on making the process more variable and harder to automate.
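One way "more variable" could look, sketched in Python under stated assumptions: each blog owner defines a small pool of questions and one is picked at random each time the form is rendered. The pool contents, the function names and the answer normalization are all hypothetical, and this is not something Wordverify currently does.

```python
import random

# Hypothetical owner-defined pool: (question shown to the commenter, accepted answers).
CHALLENGES = [
    ("Please enter 'confront' without the quotes.", {"confront"}),
    ("What is the opposite of hot?", {"cold"}),
    ("How many letters are in the word 'spam'?", {"4", "four"}),
]


def pick_challenge(rng: random.Random) -> int:
    """Return the index of a randomly chosen challenge."""
    return rng.randrange(len(CHALLENGES))


def check_answer(index: int, submitted: str) -> bool:
    """Compare the submitted answer against the challenge that was shown."""
    _question, accepted = CHALLENGES[index]
    return submitted.strip().lower() in accepted


rng = random.Random()
idx = pick_challenge(rng)
print(CHALLENGES[idx][0])          # question to render with the comment form
print(check_answer(idx, "cold"))   # True only if the question about "hot" was picked
```

In a real form the chosen index would need to be remembered on the server side (for example in the session) rather than trusted from the client, otherwise a script could always claim it was asked the one question it knows.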

  • http://jonathanhickman.com jonathan hickman

    Surely most bloggers don’t get enough traffic to warrant having this extra security feature, do they?

    I would consider my blog to be a "small blog." I get about a hundred page views a day, and I get about 40 spams a day (almost all on old posts).

  • Cubes

    Why not just have a large selection of instructional phrases which require an understanding of the English language (I'm thinking of the old PC games where the license key was something like "enter the fifth word on page 37 of the manual"). Switch randomly from "Enter 'foo' without the quotes" to "Type in the third letter of the last word of this sentence" to "Count the number of words in this sentence" and so on. I guess the spammers could watch and figure it out, but if everyone came up with their own set of Q&As, they'd have a much harder time of it because it wouldn't just be screen scraping and OCR, they'd have to have humans back in the mix, a lot.

  • http://chris.quietlife.net Chris

    Yeah, one of my "todos" for the wordverify plugin itself is to allow for an array of phrases/words to be rotated randomly.

    And even with the phrase/word there's a lot of flexibility to be had: "What is the opposite of hot?" (cold), or "What were the reasons for the decline of the Roman empire?"

  • http://blog.mxchange.org Quix0r

    As Chris wrote: most comment spam is automated. So the bots are sending their POST request to wp-comments.php, right?

    My idea for my anti-spam plugin was to insert a blog-unique authorization key into the "action" part of the form tag, e.g. wp-comments-post-xxxxx.php. The xxxxx is around 50 chars long and is a salted MD5/SHA1 hash of many "unique" pieces of data like your server's IP number, the modification timestamp of the plugin and the constant WP_SECRET (plus your own secret key, which you type in).

    This plugin is currently stopping 99.9% of my comment spam here. :)

  • http://86x54.info Alex

    Very interesting message. Thank you!

  • http://jprestonian.blogspot.com/ Jeffraham Prestonian

    What I mean by "easy" is that the verification word is transmitted to the browser as plain text -- it takes very little savvy to search for "Please enter * without the quotes." if you're interested in scripting a spambot to defeat this. I can't program a lick, but I bet I could do it in an hour or so.

  • http://chris.quietlife.net Chris

    it takes very little savvy to search for “Please enter * without the quotes.” if you’re interested in scripting a spambot to defeat this. I can’t program a lick, but I bet I could do it in an hour or so.

    As I said before: It’s easily defeated by spammers with more free time than me, intent on specifically spamming my blog. This is almost guaranteed to never happen.

    This is not a plugin designed to keep people from commenting on my blog. It's a plugin designed to eliminate automated comment spam. Any script you write to specifically spam my blog is as easily broken by changing the phrase or word as it was for you to write.

  • http://blog.mxchange.org Quix0r

    :-) Please note that this trick is only good as long as the spambot is not analyzing your HTML code to search for the authorization key. :-( This can be avoided by adding some complex JavaScript code which "renders" the key.

    Well, I guess it's only a question of time until their bots have the capability to run JavaScript without any trouble. :-/

  • http://chris.quietlife.net Chris

    What would they search for?
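
For readers who want to see the blog-unique form action from Quix0r's comments in code, here is a rough Python sketch. It is not Quix0r's plugin: the function names, the way the inputs are joined and the example values are assumptions. Only the ingredients (server IP, plugin modification time, WP_SECRET, an owner-typed key) and the wp-comments-post-xxxxx.php pattern come from the comment.

```python
import hashlib
import hmac


def action_key(server_ip: str, plugin_mtime: int, wp_secret: str, owner_key: str) -> str:
    """Derive a per-blog key by hashing several blog-specific values together.

    SHA-1 is used because the comment mentions MD5/SHA1; the "|" separator
    is an arbitrary choice for this sketch.
    """
    material = "|".join([server_ip, str(plugin_mtime), wp_secret, owner_key])
    return hashlib.sha1(material.encode("utf-8")).hexdigest()


def form_action(key: str) -> str:
    """File name the comment form would post to, following the pattern in the comment."""
    return f"wp-comments-post-{key}.php"


def is_valid_action(requested: str, key: str) -> bool:
    """Reject posts aimed at any other target, such as the stock file name."""
    return hmac.compare_digest(requested.encode("utf-8"), form_action(key).encode("utf-8"))


# Example values are placeholders, not real configuration.
key = action_key("192.0.2.10", 1165253100, "example-wp-secret", "example-owner-key")
print(form_action(key))
print(is_valid_action("wp-comments-post.php", key))  # a bot posting to the default target
```

As the thread points out, this only helps against bots that post blindly to the default target; a bot that reads the form's action attribute from the page gets the key for free.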