The OCRopus project launches
[Posted April 11, 2007 by corbet]
From: "mark cox" <markcox-AT-email.com>
To: lwn-AT-lwn.net
Subject: Google Announces the OCRopus Open Source OCR System
Date: Wed, 11 Apr 2007 09:59:42 +1000
http://google-code-updates.blogspot.com/2007/04/announcin...
Announcing the OCRopus Open Source OCR System
<http://google-code-updates.blogspot.com/2007/04/announcin...>
Monday, April 09, 2007
Posted by Thomas Breuel, OCRopus Project Leader
We're happy to announce the OCRopus OCR Project <http://www.ocropus.org/>, a
Google-sponsored project to develop advanced OCR
<http://en.wikipedia.org/wiki/Optical_character_recognition> technologies
in the IUPR research group <http://www.iupr.org/doku.php>, headed by
Prof. Thomas Breuel at the DFKI (German Research Center for Artificial
Intelligence, Kaiserslautern, Germany).
The goal of the project is to advance the state of the art in optical
character recognition and related technologies, and to deliver a high
quality OCR system suitable for document conversions, electronic libraries,
visually impaired users, historical document analysis, and general desktop
use. In addition, we are structuring the system so that other researchers
in the field can easily reuse it.
The OCRopus <http://www.ocropus.org/> engine is based on two research
projects: a high-performance handwriting recognizer developed in the
mid-1990s and deployed by the US Census Bureau, and novel high-performance
layout analysis methods.
The project is expected to run for three years and support three Ph.D.
students or postdocs. We are announcing a technology preview release of the
software under the Apache license (English-only, combining the Tesseract
character recognizer with IUPR layout analysis and language modeling tools),
with additional recognizers and functionality in future releases.
The IUPR research group has extensive experience in OCR and related
technologies, and will be basing the work on previous research and existing
software in the area. Existing software components include high-performance
handwriting recognition software that has received top evaluations by NIST
and was deployed by the US Census Bureau, the recently open sourced Tesseract
OCR system <http://sourceforge.net/projects/tesseract-ocr>, a separate
Google project for probabilistic natural language modeling, and software for
layout analysis and character recognition. The IUPR research group
gratefully acknowledges funding from the German BMBF, the state of
Rhineland-Palatinate, and other public and private partners (please see
www.iupr.org <http://www.iupr.org/doku.php> for more details).
We are hoping for contributions by the open source community in areas such
as adapting the system to additional languages, creating a GNOME desktop
application, integration with GNOME desktop search, web-based tools for
proofing and training, language modeling, additional character recognition
engines, and other useful tools and add-ons.
The project web page can be found at ocropus.org <http://www.ocropus.org/>.