This post describes the basics of pre-processing.
The three phases:
- Pre-processing
- Segmentation
- Characterizing
To make segmenting the characters harder, captcha generators usually insert random pixels (noise) into the images.
![]()
Noisy pixels
As seen in the image above, a lot of noisy pixels have been added and some of the numbers are faded out a little.
Looked at closely, a noisy pixel has values like these:
A:255 R:189 G:21 B:86
Whereas a normal pixel looks like this:
A:255 R:15 G:12 B:31
Conclusion: In the black pixels that we want, the R, G and B values are ALWAYS around the same value as each other, while in the noisy pixels they are far apart.
Note: the A value, or alpha channel, sets the transparency, which is not used here.
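A quick sketch to check this on the two sample pixels above. It assumes that "difference" means the largest colour channel minus the smallest; `channel_spread` is just an illustrative name, not something from a library.

```python
def channel_spread(r, g, b):
    """Largest minus smallest colour channel (assumed meaning of 'difference')."""
    return max(r, g, b) - min(r, g, b)


noisy = (189, 21, 86)   # R, G, B of the noisy sample pixel
normal = (15, 12, 31)   # R, G, B of the normal sample pixel

print(channel_spread(*noisy))   # 168
print(channel_spread(*normal))  # 19
```

The noisy pixel is far apart on this measure and the normal pixel is not, which is what makes a simple threshold workable.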
Solution: Remove all pixels where the difference between the R, G and B values is higher than 30. Some noisy pixels will remain, but they are not part of a number, so we also remove pixels that aren't in a group bigger than 5 pixels.
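The two steps could be sketched roughly like this. This is not the exact code behind the result, just a minimal version under a few assumptions: the image is a grid (list of rows) of `(R, G, B)` tuples, the background is pure white, "removing" a pixel means painting it white, the difference is the largest channel minus the smallest, and a group means pixels that touch each other, diagonals included. The names `clean`, `max_spread` and `min_group` are made up for the example.

```python
from collections import deque

WHITE = (255, 255, 255)  # assumed background colour


def clean(pixels, max_spread=30, min_group=5):
    """Return a cleaned copy of a 2D grid (list of rows) of (R, G, B) tuples."""
    height, width = len(pixels), len(pixels[0])

    # Step 1: blank every pixel whose channels differ by more than max_spread.
    out = [[p if max(p) - min(p) <= max_spread else WHITE for p in row]
           for row in pixels]

    # Step 2: blank connected groups that are not bigger than min_group pixels.
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if seen[y][x] or out[y][x] == WHITE:
                continue
            # Breadth-first search collects one 8-connected group.
            group, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                group.append((cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < height and 0 <= nx < width
                                and not seen[ny][nx] and out[ny][nx] != WHITE):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(group) <= min_group:
                for gy, gx in group:
                    out[gy][gx] = WHITE
    return out
```

The grid can come from whatever image library you use. Both thresholds are parameters because the values 30 and 5 fit this particular captcha and will probably need tuning for another one.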
Result:
![]()
Captcha cleaned
After these steps the characters are clean enough to be segmented.
I f'ing hate captchas time to destroy them with maths
nice post