Captcha's are dead: Captcha basics: pre-processing

Before writing my updates about the captchas, I want to explain that a few phases come up every time (all of which are solvable with enough resources).
This post describes the basics of pre-processing.
The three phases:

Pre-processing
Segmentation
Characterizing

Pre-processing:

To make segmenting the characters harder, captcha generators usually insert random pixels (noise) into the images.

Noisy pixels

As seen in the image above, there are a lot of noisy pixels added and some of the numbers are faded out a little.

Looked at closely, a noisy pixel has values like these:
A:255 R:189 G:21 B:86

Whereas a normal pixel looks like this:
A:255 R:15 G:12 B:31

Conclusion: The dark pixels that we want ALWAYS have R, G and B values that are close to each other, while the R, G and B values of the noisy pixels lie far apart.
Note: the A value (alpha channel) sets the transparency, which is not used here.
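
To make the comparison concrete, here is a small Python sketch that measures how far apart the channels of the two sample pixels are. It assumes "difference" means the largest channel value minus the smallest one. That reading, and the helper name channel_spread, are my own and are not taken from the original tool.

```python
def channel_spread(r, g, b):
    """Largest channel value minus smallest (assumed meaning of 'difference')."""
    return max(r, g, b) - min(r, g, b)


noisy = (189, 21, 86)   # sample noisy pixel from above
normal = (15, 12, 31)   # sample normal pixel from above

print(channel_spread(*noisy))   # 168
print(channel_spread(*normal))  # 19
```

The noisy pixel lands far above a threshold of 30 and the normal pixel stays below it.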

Solution: Remove all pixels where the difference between the R, G and B values is higher than 30. Some noise pixels will remain, but they are not part of a number, so we also remove every pixel that isn't in a group bigger than 5 pixels.
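
The two steps can be sketched in plain Python roughly as follows. This is only an illustration and makes several assumptions of its own: the image is a list of rows of (R, G, B) tuples, "removing" a pixel means painting it white, the background is light (every channel at least 200), and pixels that touch diagonally belong to the same group. The names clean, max_spread, min_group and background_min are made up for this example.

```python
from collections import deque

WHITE = (255, 255, 255)  # assumption: removed pixels are painted white


def channel_spread(pixel):
    """Largest channel value minus smallest (assumed meaning of 'difference')."""
    return max(pixel) - min(pixel)


def clean(image, max_spread=30, min_group=5, background_min=200):
    """Return a cleaned copy of image (a list of rows of (R, G, B) tuples)."""
    height = len(image)
    width = len(image[0]) if height else 0

    # Step 1: blank every pixel whose channels lie too far apart.
    out = [[WHITE if channel_spread(p) > max_spread else p for p in row]
           for row in image]

    def is_foreground(p):
        # Assumption: anything darker than the light background is foreground.
        return min(p) < background_min

    # Step 2: flood-fill each group of touching foreground pixels and
    # erase the groups that are not bigger than min_group pixels.
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if seen[y][x] or not is_foreground(out[y][x]):
                continue
            group = []
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                group.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and not seen[ny][nx]
                                and is_foreground(out[ny][nx])):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            if len(group) <= min_group:
                for gx, gy in group:
                    out[gy][gx] = WHITE
    return out
```

Calling clean(image) on such a pixel grid returns a new grid and leaves the input untouched. The thresholds are parameters so they can be tuned for each captcha.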

Result:

Captcha cleaned

After these steps you are able to perfectly segment all the characters.

Thursday, 30 September 2010