Thursday, 28 June 2012

OCR and Blogspot's Word Verification

Have you noticed the new-style word verification on some Blogger blogs? Recently the interface has switched to showing two "words" to check that you're human.

Except... that's not quite the whole story. While one half is still a standard word verification, the other half is now taken from an image - either scanned or photographed. Lately (as in this example) I've been getting a load of things that look like photos of house numbers.

I was quite pleased when I found out what's actually going on here. While the standard distorted letters are still being used as a "captcha" for verifying commenters, the other half is being used to collect vast sets of data for optical character recognition (OCR) from images. Every time a commenter completes word verification, they're telling Google what they think the letters and numbers in the image are. Google can collect responses for the same image from dozens of people, settle on a "right answer" (presumably the one most of them agree on), and use these image/label pairs to improve its OCR.
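For the curious, here's a rough sketch of how a "right answer" could be picked from lots of people's guesses. This is only my illustration of the general idea (a simple majority vote), not Google's actual method; the function name `consensus_label` and the thresholds `min_votes` and `min_agreement` are made up for the example.

```python
from collections import Counter


def consensus_label(responses, min_votes=3, min_agreement=0.6):
    """Return the most common transcription, or None if there is no clear winner.

    Hypothetical helper: the thresholds are illustrative assumptions.
    """
    # Tidy up the answers so " 1847" and "1847" count as the same guess,
    # and throw away empty submissions.
    cleaned = [r.strip().upper() for r in responses if r.strip()]

    # Too few answers to trust any of them yet.
    if len(cleaned) < min_votes:
        return None

    # Pick the most frequent answer and see what share of people gave it.
    label, votes = Counter(cleaned).most_common(1)[0]
    if votes / len(cleaned) >= min_agreement:
        return label
    return None


# Five people look at the same house-number photo; one types nonsense.
print(consensus_label(["1847", "1847", " 1847", "1847", "asdf"]))
```

The point of a threshold like `min_agreement` is that one person typing nonsense doesn't spoil the label, while an image nobody can agree on simply gets no label at all.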

You can test this for yourself by typing utter nonsense for the image part of the verification: it still publishes your comment. It doesn't seem to check whether you're entering even approximately the right number of characters - presumably all the analysis is done at a later stage.
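The behaviour above is what you'd expect if only the distorted word is actually graded. A minimal sketch of that logic, assuming the server knows the answer to the distorted word and just stores whatever was typed for the image; the name `check_verification` and its parameters are hypothetical, and I don't know how Google really implements this.

```python
def check_verification(control_answer, typed_control, typed_unknown, log):
    """Pass or fail on the known word only; keep the image guess for later.

    Hypothetical sketch: `log` stands in for wherever guesses get stored.
    """
    # Only the distorted word with a known answer decides pass/fail.
    passed = typed_control.strip().lower() == control_answer.strip().lower()

    if passed:
        # The image half is not checked at all here, not even its length.
        # It is just recorded so it can be analysed at a later stage.
        log.append(typed_unknown)

    return passed


stored = []
# Correct distorted word, utter nonsense for the image: still accepted.
print(check_verification("morning", "Morning", "zzzzzz", stored))
```

In this sketch a guess is only stored when the known word was typed correctly, on the assumption that someone who got that half right was probably trying on the other half too.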


Tabor said...

I hate word verifications but I hate spammers more.

Mama Hen said...

That is interesting! I have sometimes thought I typed the wrong number and it still went through. I hope you are doing well, my friend! Have a great day!

Mama Hen

Elizabeth Braun said...

How interesting! Thanks for the insight.=)

Charlotte Klein said...

Wow really? I didn't know that. Will have to try!

christine said...

how intriguing!

