There is no need to actually OCR the image

Posted Aug 28, 2006 17:43 UTC (Mon) by tack (subscriber, #12542)
In reply to: There is no need to actually OCR the image by spitzak
Parent article: Fighting image spam

I occasionally receive scanned newspaper articles that may be of interest to me.

I prefer the approach of OCRing the image and feeding the result to a Bayesian classifier. One could use some of the techniques described here as an optional first pass to determine whether the image likely contains a lot of text, and OCR it only in that case, which would cut down on the CPU overhead.
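
A rough sketch of that pipeline in Python might look like the following. Everything in it is an assumption made for illustration: `edge_density` is a made-up stand-in for a "does this look like text" test (not one of the techniques from the article), the image is taken to be a plain list of rows of 0-255 grayscale values, `ocr` is whatever callable wraps your OCR engine, and `NaiveBayes` is a toy two-class classifier rather than any real spam filter's implementation.

```python
import math
import re
from collections import Counter


def edge_density(pixels, jump=128):
    """Fraction of horizontally adjacent pixel pairs whose brightness
    differs by at least `jump`. Hypothetical cheap text-likeness test."""
    pairs = edges = 0
    for row in pixels:
        for a, b in zip(row, row[1:]):
            pairs += 1
            if abs(a - b) >= jump:
                edges += 1
    return edges / pairs if pairs else 0.0


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$]+", text.lower())


class NaiveBayes:
    """Toy two-class (spam/ham) naive Bayes classifier over word counts."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.words[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total_docs = sum(self.docs.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior and word likelihoods, summed in log space.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            n = sum(self.words[label].values())
            for tok in tokenize(text):
                score += math.log((self.words[label][tok] + 1) / (n + vocab + 1))
            log_score[label] = score
        # Turn the two log scores into P(spam | text); clamp to avoid overflow.
        diff = min(log_score["ham"] - log_score["spam"], 700.0)
        return 1.0 / (1.0 + math.exp(diff))


def classify_image(pixels, ocr, classifier, min_density=0.1):
    """Return P(spam) for the image text, or None if OCR was skipped."""
    if edge_density(pixels) < min_density:
        return None  # probably not text-heavy; skip the expensive OCR step
    return classifier.spam_probability(ocr(pixels))
```

The point of passing `ocr` in as a parameter is that the costly call only happens for images that pass the cheap check; the `min_density` and `jump` thresholds are arbitrary and would need tuning against real mail.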


Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds