Saturday, April 25, 2015

user behavior - "Typocaptcha" - an alternative to CAPTCHA?



We all hate CAPTCHAs, but for some applications they're a necessary evil. Today I wondered if there's a better alternative we just haven't thought of yet. I considered the dilemma: how do you create something that is indecipherable to a computer, but readable to a human?


Then I remembered an email doing the rounds years ago along the lines of:



I cdn'uolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg: the phaonmneel pweor of the hmuan mnid. Aoccdrnig to a rseearch taem at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Scuh a cdonition is arppoiatrely cllaed Typoglycemia.



In case you can't read the above:



I couldn't believe that I could actually understand what I was reading: the phenomenal power of the human mind. According to a research team at Cambridge University, it doesn't matter in what order the letters in a word are, the only important thing is that the first and last letter be in the right place. The rest can be a total mess and you can still read it without a problem. This is because the human mind does not read every letter by itself, but the word as a whole. Such a condition is appropriately called Typoglycemia.




The effect is known as typoglycemia. Although it wasn't actually researched at Cambridge, there is an element of truth to it: people find text scrambled this way surprisingly easy to read.
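For illustration, the scrambling rule described in the email (first and last letter stay put, the letters in between are shuffled) could be sketched roughly as follows. The function names `scramble_word` and `scramble_text` are made up for this example, and it assumes plain ASCII words; it is not how the original email was produced.

```python
import random
import re


def scramble_word(word, rng):
    """Shuffle the interior letters of a word, keeping first and last in place."""
    if len(word) < 4:
        # Words with fewer than two interior letters cannot change.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text, seed=None):
    """Scramble every run of ASCII letters; spaces and punctuation pass through."""
    rng = random.Random(seed)  # a seed makes the output repeatable
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


if __name__ == "__main__":
    print(scramble_text("Which animal is bigger - a fox or an elephant?"))
```

Note that a random shuffle can occasionally leave a word unchanged, so a real generator would probably want to retry in that case.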


Could Typocaptcha be the future? Read these three questions:



  • Wihch anmial is bgeigr - a fox or an eahlpnet?

  • Waht aianml is siad to nveer freogt?

  • Waht tpye of aimnal was Wlat Dsi'enys Dmbuo?


In case you haven't guessed, the answer to all three questions is the same:



elephant




There are millions of possible combinations of questions, but before getting into the 'how', it all boils down to user experience.


Would Typocaptcha result in a better or worse user experience when compared to CAPTCHA?


P.S. I am aware that this would not be very accessible to visually impaired users, much like CAPTCHAs aren't.



Answer



This is not effective against a targeted attack by someone who uses a word list, such as /usr/share/dict/words, to solve your anagrams. A task like "unscramble the words in standard input, assuming the first and last letters are correct, given a word list file for the language" is probably so straightforward that it'd make a good puzzle for our Code Golf site. Scrambled words that match more than one dictionary word, such as could and cloud, could be disambiguated with an n-gram database derived from the Project Gutenberg corpus. Once the clues are readable, the attacker collects each one, builds a database of correct responses with the help of Mechanical Turk, and gains the technical ability to spam your site.
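To show how little work the unscrambling step takes, here is a minimal sketch under a few assumptions: the names `signature`, `build_index` and `unscramble` are invented for this example, only runs of ASCII letters are handled (so words containing apostrophes are not), output is lowercased, and ambiguous words are merely listed rather than resolved with n-grams.

```python
import re
from collections import defaultdict


def signature(word):
    """Key shared by a word and all of its first/last-preserving scrambles."""
    w = word.lower()
    return (w[0], "".join(sorted(w[1:-1])), w[-1])


def build_index(words):
    """Map each signature to the set of dictionary words that produce it."""
    index = defaultdict(set)
    for w in words:
        w = w.strip()
        if w:
            index[signature(w)].add(w.lower())
    return index


def unscramble(text, index):
    """Replace each scrambled word with its dictionary match, if any."""
    def fix(match):
        token = match.group(0)
        candidates = sorted(index.get(signature(token), ()))
        if not candidates:
            return token  # unknown word: leave it untouched
        if len(candidates) == 1:
            return candidates[0]
        # Several words fit (e.g. could/cloud); an n-gram model would pick one.
        return "{" + "|".join(candidates) + "}"
    return re.sub(r"[A-Za-z]+", fix, text)


if __name__ == "__main__":
    # A tiny inline list keeps the sketch self-contained; in practice one
    # could pass an open word list file such as /usr/share/dict/words.
    words = ["which", "animal", "is", "bigger", "could", "cloud"]
    print(unscramble("Wihch anmial is bgeigr", build_index(words)))
```

Building the index is a single pass over the word list, and each lookup is a dictionary access, so the scrambling adds essentially no cost for an attacker.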


If an English proficiency test like this is effective for anything, it'd be for shutting out human users who happen to live in the wrong country. If you have a license to offer your service only to customers in (say) the United States, then someone coming in through a VPN who's not a native English speaker is less likely to actually be a U.S. resident. So it might be useful for the sign-up page of an Internet music or video streaming service, since those markets are still heavily balkanized by decades-long exclusive territorial distribution contracts. In fact, this technique has been seen in the wild: two levels of the first WarioWare game for Game Boy Advance were typo tests in Japanese, which made it hard for people who downloaded an infringing copy to play through until Nintendo later released the English version of the game in the North American market.

