Saturday, 13 August 2011

Recaptcha OCR

Well, it has been a long time since I posted here.
I worked on this a few times in the meantime, and have now gone after reCAPTCHA.
You can try it out here: http://deRecaptcha.dyndns.info:2000/
It's nothing fancy, but you get the idea.
For the decoder itself, go to the decoder tab.
Enjoy

If you have any questions you can always ask.

Thursday, 30 September 2010

Captcha basics: pre-processing

So before writing my updates about the captchas, I wanted to explain that there are a few phases that occur every time (which are all solvable with enough resources).
This post describes the pre-processing basics.
The three phases:
  1. Pre-processing
  2. Segmentation
  3. Characterizing
Pre-processing:

To make segmenting the characters harder, captcha generators usually insert random pixels (noise) into the images.
[Image: noisy pixels]

As seen in the image above, there are a lot of noisy pixels added and some of the numbers are faded out a little.

Looked at closely, a noisy pixel has values like this:
A:255  R:189  G:21  B:86

Whereas a normal pixel looks like this:
A:255  R:15  G:12  B:31

Conclusion: In the dark pixels that we want, the R, G and B values are ALWAYS around the same value as each other, while in the noisy pixels they are far apart.
Note: the A value (alpha channel) sets the transparency, which is not used here.
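To make the comparison concrete, here is a small sketch that measures how far apart the colour channels of a pixel are. It assumes "difference" means the largest channel minus the smallest one; the function name channel_spread is made up for this example.

```python
def channel_spread(r, g, b):
    """Gap between the largest and smallest colour channel of one pixel.

    Assumption: this is one reasonable reading of "difference" between
    the channel values; the alpha value is ignored.
    """
    return max(r, g, b) - min(r, g, b)


noisy = (189, 21, 86)    # the noisy pixel from the post
normal = (15, 12, 31)    # the normal pixel from the post

print(channel_spread(*noisy))    # 168
print(channel_spread(*normal))   # 19
```

With this measure the noisy pixel scores 168 and the normal one 19, so the two kinds of pixel are easy to tell apart.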

Solution: Remove all pixels where the difference between the colour values is higher than 30. Some stray pixels will remain that are not part of a number, so we also remove pixels if they aren't in a group bigger than 5 pixels.
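The two clean-up steps could be sketched roughly as below. This is not the code behind the service, just an illustration under a few assumptions: the image is a plain list of rows of (R, G, B) tuples, the background is light, "difference" means the largest channel minus the smallest, and a "group" means pixels touching each other, diagonals included. The names clean, is_ink and INK_MAX are invented for the example.

```python
from collections import deque

SPREAD_LIMIT = 30   # from the post: drop pixels whose channels differ by more than 30
MIN_GROUP = 5       # from the post: keep only groups bigger than 5 pixels
INK_MAX = 128       # assumption: ink is darker than this, so the light background is skipped


def is_ink(pixel):
    """True for a dark pixel whose R, G and B values are close together."""
    r, g, b = pixel
    return max(r, g, b) - min(r, g, b) <= SPREAD_LIMIT and max(r, g, b) < INK_MAX


def clean(image):
    """Return a mask (list of rows of booleans), True where a pixel is kept."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Step 1: throw away colourful (noisy) pixels and the background.
    mask = [[is_ink(p) for p in row] for row in image]
    seen = [[False] * width for _ in range(height)]
    # Step 2: flood-fill each group of touching pixels and drop the small ones.
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            group = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                group.append((cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(group) <= MIN_GROUP:
                for gy, gx in group:
                    mask[gy][gx] = False
    return mask
```

Calling clean(image) gives back a mask of the pixels that survive both steps. The thresholds 30 and 5 come from this captcha and would likely need tuning for another one.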

Result:
[Image: captcha cleaned]

After these steps you are able to perfectly segment all the characters.

Starting the blog

Well, basically I plan on updating this blog with OCR solutions for captchas.
Since captchas are a pain in the ass, I'm going to post progress and results on OCR'ing them.
I will probably host the solutions as a free service and not as programs.
Anyway, I hope this blog will help people, and if you have requests for specific captchas to OCR, you can always send me a message.