Thursday, 30 September 2010

Captcha basics: pre-processing

Before writing my updates about the captchas, I want to explain that there are a few phases that occur every time (all of which are solvable with enough resources).
This post describes the basics of pre-processing.
The three phases:
  1. Pre-processing
  2. Segmentation
  3. Characterizing
Pre-processing:

To make segmenting the characters harder, captcha makers usually insert random pixels (noise) into the images.
Noisy pixels

As seen in the image above, there are a lot of noisy pixels added and some of the numbers are faded out a little.

Looked at closely, a noisy pixel has values like this:
A:255  R:189  G:21  B:86

Whereas a normal pixel looks like this:
A:255  R:15  G:12  B:31

Conclusion: in the black pixels that we want, the R, G and B values are ALWAYS close to each other, while in the noisy pixels they are far apart.
Note: the A value (alpha channel) sets the transparency, which is not used here.
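To make that observation concrete, here is a minimal Python sketch that measures how far apart the colour channels of a pixel are. The post does not say exactly how "the difference" is calculated, so taking the largest channel minus the smallest is my assumption, and the names channel_spread and looks_like_ink are made up for this example.

```python
def channel_spread(pixel):
    """Gap between the largest and smallest of the R, G and B values.

    Assumption: "difference" means max channel minus min channel.
    The alpha value is left out because it is not used here.
    """
    r, g, b = pixel
    return max(r, g, b) - min(r, g, b)


def looks_like_ink(pixel, max_spread=30):
    """True when the channels are close enough to be a character pixel."""
    return channel_spread(pixel) <= max_spread


noisy = (189, 21, 86)   # the noisy sample pixel from above
normal = (15, 12, 31)   # the normal sample pixel from above

print(channel_spread(noisy))   # 168
print(channel_spread(normal))  # 19
```

With a limit of 30, the normal sample pixel passes and the noisy one does not.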

Solution: remove all pixels where the difference between the R, G and B values is higher than 30. Some pixels that are not part of a number will remain, so we also remove pixels that aren't in a group bigger than 5 pixels.
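The two steps could be sketched in Python roughly like this. It is only an illustration, not the code used for the images in this post. It assumes the image is a list of rows of (R, G, B) tuples, that the background is pure white, that "removing" a pixel means painting it white, and that pixels touching diagonally belong to the same group. The function name clean and its parameters are made up for this example.

```python
from collections import deque

WHITE = (255, 255, 255)  # assumed background colour


def clean(image, max_spread=30, min_group=6, background=WHITE):
    """Return a copy of image with coloured noise and tiny specks removed."""
    height = len(image)
    width = len(image[0]) if height else 0

    # Step 1: a pixel counts as ink when it is not background and its
    # R, G and B values lie within max_spread of each other.
    ink = [[p != background and max(p) - min(p) <= max_spread for p in row]
           for row in image]

    # Step 2: flood fill every group of touching ink pixels and drop the
    # groups smaller than min_group (6 means "bigger than 5 pixels").
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not ink[y][x] or seen[y][x]:
                continue
            group = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                group.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and ink[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(group) < min_group:
                for gy, gx in group:
                    ink[gy][gx] = False

    # Everything that is not ink becomes background.
    return [[image[y][x] if ink[y][x] else background for x in range(width)]
            for y in range(height)]
```

Calling clean(image) on such a grid keeps the dark character strokes and paints everything else white. The limits 30 and 5 come from this post and will probably need tuning for other captchas.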

Result:
Captcha cleaned

After these steps you are able to perfectly segment all the characters.

2 comments: