Further improvements

There are several ways to improve the CAPTCHA OCR performance further:

  • Experimenting with different threshold levels
  • Eroding the thresholded text to emphasize the shape of characters
  • Resizing the image (sometimes increasing the image size helps)
  • Training the OCR tool on the CAPTCHA font
  • Restricting results to dictionary words
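
The last item in the list, restricting results to dictionary words, can be approximated as a post-processing step on the OCR output. The following is a minimal sketch using the standard library difflib module. The function name restrict_to_dictionary, the WORDS list, and the 0.6 cutoff are assumptions made for this illustration and are not part of the book's code; it also assumes each CAPTCHA is a single lowercase dictionary word.

```python
import difflib


def restrict_to_dictionary(ocr_text, words, cutoff=0.6):
    """Return the word in `words` closest to `ocr_text`, or None.

    `words` is a hypothetical list of words the CAPTCHA may contain.
    `cutoff` is the minimum similarity ratio (0 to 1) to accept a match.
    """
    candidate = ocr_text.strip().lower()
    if candidate in words:
        # The OCR result is already a known word, so keep it unchanged.
        return candidate
    # Otherwise fall back to the single most similar known word, if any.
    matches = difflib.get_close_matches(candidate, words, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical word list; a real one would be loaded from a dictionary file.
WORDS = ['strange', 'closed', 'message', 'window']

# 'strangc' stands in for an OCR result with one misread character.
print(restrict_to_dictionary('strangc', WORDS))  # prints: strange
```

Returning None when nothing is similar enough lets the caller skip the submission and request a new CAPTCHA instead of sending a guess that is likely to be wrong.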

If you want to experiment with improving performance, the sample data used is available at http://github.com/kjam/wswp/blob/master/data/captcha_samples, and a script to test the accuracy is available at http://github.com/kjam/wswp/blob/master/code/chp7/test_samples.py. However, the current 88 percent accuracy is sufficient for our purpose of registering an account, because real users also make mistakes when entering CAPTCHA text. Even 10 percent accuracy would be sufficient, because the script could be rerun until it succeeds, though this would be rather impolite to the server and may lead to your IP address being blocked.
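
To put rough numbers on the retry argument, the following sketch computes how many attempts are needed at a given accuracy. It assumes every attempt receives a fresh CAPTCHA and succeeds independently with the same probability, which is a simplification; the function names and the 95 percent target are chosen for this illustration only.

```python
import math


def success_within(accuracy, attempts):
    """Probability that at least one of `attempts` tries succeeds."""
    return 1 - (1 - accuracy) ** attempts


def attempts_needed(accuracy, target):
    """Smallest number of tries that reaches the `target` success probability."""
    if not (0 < accuracy < 1 and 0 < target < 1):
        raise ValueError('accuracy and target must be between 0 and 1')
    # Solve 1 - (1 - accuracy) ** n >= target for the smallest integer n.
    return math.ceil(math.log(1 - target) / math.log(1 - accuracy))


for accuracy in (0.88, 0.10):
    tries = attempts_needed(accuracy, 0.95)
    print('accuracy %.2f: %d attempt(s) for a 95%% chance, %.1f expected'
          % (accuracy, tries, 1 / accuracy))
```

Under these assumptions, 88 percent accuracy needs 2 attempts for a 95 percent chance of success, while 10 percent accuracy needs 29, which shows how much extra load a low-accuracy script would place on the server.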
