Development

Introducing the OCRopus Project

The OCRopus Project is a new open-source optical character recognition (OCR) effort that was launched this week by Google:

OCRopus is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities. The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90's and deployed by the US Census bureau, and novel high-performance layout analysis methods. OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.

According to the FAQ document, OCRopus is mainly intended to be used for character recognition of scanned and digitally photographed text. Output will be in HTML+CSS format. The OCRopus plug-in architecture will support multiple character recognition plug-ins. Scanning of non-English text will be provided by language-specific plug-in modules. The Processing Steps diagram gives a graphical overview of the code flow.
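
The plug-in arrangement described in the FAQ can be pictured with a small sketch. This is not OCRopus code: the registry, the `register` decorator, the two toy recognizers and `to_html` are hypothetical names, used only to illustrate how language-specific recognizers might be selected at run time and their output wrapped in HTML.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a language code to a recognizer function.
# None of these names come from OCRopus; they only illustrate the idea.
Recognizer = Callable[[bytes], str]
_RECOGNIZERS: Dict[str, Recognizer] = {}


def register(language: str) -> Callable[[Recognizer], Recognizer]:
    """Decorator that records a recognizer plug-in for one language."""
    def wrap(func: Recognizer) -> Recognizer:
        _RECOGNIZERS[language] = func
        return func
    return wrap


@register("en")
def recognize_english(image: bytes) -> str:
    # Stand-in for a real recognition engine; returns a fixed string.
    return "english text"


@register("de")
def recognize_german(image: bytes) -> str:
    # Second stand-in, registered under a different language code.
    return "deutscher Text"


def to_html(image: bytes, language: str) -> str:
    """Pick the plug-in for `language` and wrap its output in HTML."""
    try:
        recognizer = _RECOGNIZERS[language]
    except KeyError:
        raise ValueError(f"no recognizer plug-in for {language!r}")
    return f'<p lang="{language}">{recognizer(image)}</p>'
```

Adding a language in this sketch means registering one more function; the calling code does not change.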

The software is being released under the Apache license and is written in C++ and Python. One of the main components of OCRopus is Tesseract OCR, which was released as open-source code by HP and UNLV in 2005.

The lead OCRopus developer is Professor Thomas Breuel from the German Research Center for Artificial Intelligence in Kaiserslautern. Funding has been set aside to support a number of graduate students.

The source code is available for an early release of the project: "The technology preview release is basically the first check-in of the source code into the subversion repository. What you can expect is that this code performs about as well as Tesseract in terms of character-level performance, but that is able to cope better with non-trivial layouts. There is no packaging, binary distribution, or full autoconf yet." The getting started document explains the dependencies and shows how to build the software.

The project roadmap calls for an alpha release in the third quarter of 2007, a beta release in the first quarter of 2008 and a 1.0 release in the third quarter of 2008.

Open-source contributions are being requested: "We are hoping for contributions by the open source community in areas such as adapting the system to additional languages, creating a Gnome desktop application, integration with Gnome desktop search, web-based tools for proofing and training, language modeling, additional character recognition engines, and other useful tools and add-ons."

Help is being requested for porting to non-Linux platforms. Support for KDE is not yet mentioned, but should be possible with a bit of developer effort.

Comments (none posted)

System Applications

Database Software

PostgreSQL Weekly News

The April 8, 2007 edition of the PostgreSQL Weekly News is online with the latest PostgreSQL DBMS articles and resources.

Full Story (comments: none)

SQLite 3.3.15 released

Version 3.3.15 of SQLite, a lightweight DBMS, is out. "An annoying bug introduced in 3.3.14 has been fixed. There are also many enhancements to the test suite."

Comments (none posted)

Filesystem Utilities

Earth 0.2 released

Stable version 0.2 of Earth is out with bug fixes and other improvements. "Earth allows you to find files across a large network of machines and track disk usage in real time. It consists of a daemon that indexes filesystems in real time and reports all the changes back to a central database. This can then be queried through a simple, yet powerful, web interface. Think of it like Spotlight or Beagle but operating system independent with a central database for multiple machines with a web application that allows novel ways of exploring your data." See the What's New document for change details.

Comments (none posted)

Interoperability

Samba 3.0.25rc1 released

Version 3.0.25rc1 of Samba has been announced. "This is the first release candidate of the Samba 3.0.25 code base and is provided for testing only. An RC release means that we are close to the final release but the code may still have a few remaining minor bugs. This release is *not* intended for production servers. There has been a substantial amount of development since the 3.0.23/3.0.24 series of stable releases. We would like to ask the Samba community for help in testing these changes as we work towards the next significant production upgrade Samba 3.0 release."

Full Story (comments: none)

Mail Software

Sendmail 8.14.1 released

Version 8.14.1 of the Sendmail mail transfer agent has been announced. "Sendmail, Inc., and the Sendmail Consortium announce the availability of sendmail 8.14.1 which fixes some bugs, e.g., If a milter rejected a recipient the MTA still kept it in its list of recipients and delivered to it if the transaction was accepted. The new DaemonPortOptions which begin with a lower case character can now be set."

Comments (none posted)
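
The recipient bug mentioned in the Sendmail announcement can be illustrated with a toy model. This is not sendmail or milter code: `accepted_recipients` and `reject_example_org` are hypothetical names, and the sketch only shows the fixed behaviour, where a recipient rejected by the filter is removed from the delivery list.

```python
from typing import Callable, List


def accepted_recipients(recipients: List[str],
                        milter_accepts: Callable[[str], bool]) -> List[str]:
    """Return only the recipients the filter accepted.

    Toy model of the fixed behaviour: a rejected recipient is dropped
    from the list instead of being kept and delivered to later.
    """
    return [rcpt for rcpt in recipients if milter_accepts(rcpt)]


def reject_example_org(rcpt: str) -> bool:
    # Hypothetical policy: refuse any address at example.org.
    return not rcpt.endswith("@example.org")
```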

Web Site Development

Dimdim 1.6.0 alpha released (SourceForge)

Version 1.6.0 alpha of Dimdim, a web conferencing application, has been announced; it features usability improvements. "With Dimdim you can show Presentations, Applications and Desktops to any other person over the internet without installing anything on the Attendee side. You can chat, show your webcam and talk with others in the meeting."

Comments (none posted)

Release of Remo 0.1.4 alpha

Version 0.1.4 alpha of Remo, the Rule Editor for ModSecurity, is out with a number of new features.

Full Story (comments: none)

Desktop Applications

Audio Applications

Ardour 2.0rc1 released

The first release candidate of Ardour 2.0, a multi-track audio workstation, is out. "A couple of weeks after 2.0 beta12, the Ardour team brings you 2.0rc1, and the OS X Tiger universal DMG. Dozens of bug fixes, a few usability improvements, even a couple of new features (e.g. rename & delete snapshots). This is first release candidate for 2.0, but it is missing last minute tweaks, specifically up to date and complete translations. We hope to release RC2 within the next 7-10 days which will hopefully be the final release before 2.0." See the full release announcement for more details.

Comments (none posted)

Data Visualization

Asymptote 1.25 released

Release 1.25 of Asymptote is out with some path changes. "Asymptote is a powerful descriptive vector graphics language that provides a natural coordinate-based framework for technical drawing. Labels and equations are typeset with LaTeX, for high-quality PostScript output. A major advantage of Asymptote over other graphics packages is that it is a programming language, as opposed to just a graphics program."

Comments (none posted)

Desktop Environments

Compiz/Beryl merger is official

The Compiz and Beryl projects have sent out an announcement that their merger is now official. There are a lot of details to be worked out, but the decision to proceed has been made. "We will create a code review panel consisting of the best developers from each community who will see that any code included in a release package meets the highest standards and is suitable for distribution in an officially supported package." There's no word on the naming issue, though. (LWN looked at the merger proposal back in March).

Full Story (comments: 14)

GNOME 2.18.1 released

Version 2.18.1 of the GNOME desktop environment has been released. "This is the first release in a series of point releases for the 2.18 branch. Come and see all the bug fixing, all the new translations and all the updated documentation brought to you by the wonderful team of GNOME contributors! While development has started on the GNOME 2.19/2.20 road, work on the stable branch continues to make it even more solid."

Full Story (comments: none)

GNOME Software Announcements

The following new GNOME software has been announced this week: You can find more new GNOME software releases at gnomefiles.org.

Comments (none posted)

KDE Software Announcements

The following new KDE software has been announced this week: You can find more new KDE software releases at kde-apps.org.

Comments (none posted)

KDE Commit-Digest (KDE.News)

The April 8, 2007 edition of the KDE Commit-Digest has been announced. The content summary says: "Bluetooth support in Solid. 'Breadcrumb' navigation widget from Dolphin is made more modular to allow use in other KDE contexts. Support for different caret (text cursor) styles in Konsole. Various bugfixes in TagLib. Better AIM protocol file transfer support in Kopete. KWord gets the ability (through Kross scripting) to use an OpenOffice.org instance to import from supported file formats. KPackage starts to be ported to the SMART package management scheme..."

Comments (none posted)

The Road to KDE 4: Strigi and File Information Extraction (KDE.News)

The KDE.News "Road to KDE4" series is back. "This week I am featuring Strigi, an information extraction subsystem that is being fully deployed for KDE 4.0. KDE has previously had the ability to extract information about files of various types, and has used them in a variety of functional contexts, such as the Properties Dialog. Strigi promises many improvements over the existing versions."

Comments (none posted)

Xorg Software Announcements

The following new Xorg software has been announced this week: More information can be found on the X.Org Foundation wiki.

Comments (none posted)

Educational Software

TCExam 4.0.011 released (SourceForge)

Version 4.0.011 of TCExam has been announced. "TCExam is a Web-based Assessment Software system (e-exam or CBT - Computer Based Testing) that enables educators and trainers to author, schedule, deliver, and report on surveys, quizzes, tests and exams. The software is used all over the world by universities, schools, companies and independent teachers."

Comments (none posted)

Electronics

Atom 200704 released

OpenCollector has announced the release of version 200704 of Atom. "Atom is a new functional hardware description language embedded in Haskell. Unlike Confluence and HDCaml, Atom is an adventure above RTL. Borrowing on ideas developed by Arvind, Hoe, and others, Atom compiles rule-based circuit descriptions down to Verilog for simulation and synthesis. This method of design works particularly well for complex control logic."

Comments (none posted)

Financial Applications

LedgerSMB 1.2.0 released

Version 1.2.0 of LedgerSMB is out with a security fix. "LedgerSMB 1.2.0 has been released, completing a comprehensive SQL injection audit of the code inherited from SQL-Ledger. Numerous SQL injection issues were fixed. In fact, most fields were not properly quoted and escaped. These problems should affect all known versions of SQL-Ledger as well. The fix was delayed because the scale of the changes made required extensive testing-- these were not trivial changes. Users are advised to upgrade as soon as possible."

Full Story (comments: none)
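
The class of problem described in the LedgerSMB announcement, fields that are not properly quoted and escaped, is usually avoided with bound query parameters. The following is a minimal sketch of that technique, not LedgerSMB code: it uses Python's standard `sqlite3` module as a stand-in database, and the `customer` table, `find_customer` and `demo` are assumptions made for illustration.

```python
import sqlite3


def find_customer(conn: sqlite3.Connection, name: str) -> list:
    """Look up customers by name using a bound parameter.

    The `?` placeholder keeps the value separate from the SQL text, so
    input such as "x' OR '1'='1" is compared as a plain string.
    """
    cur = conn.execute("SELECT id, name FROM customer WHERE name = ?", (name,))
    return cur.fetchall()


def demo() -> tuple:
    # Hypothetical in-memory table, created only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer (name) VALUES (?)",
                     [("alice",), ("bob",)])
    normal = find_customer(conn, "alice")
    hostile = find_customer(conn, "x' OR '1'='1")
    conn.close()
    return normal, hostile
```

Had the query been built by string concatenation, the second lookup would have matched every row; with the placeholder it matches none.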

Games

Globulation2 alpha 22 released

The alpha 22 release of Globulation2, a real time strategy game, has been announced. "Globulation 2 brings a new type of gameplay to RTS games. The player chooses the number of units to assign to various tasks, and the units do their best to satisfy the requests. This allows players to manage more units and focus on strategy rather than individual unit's jobs. Globulation 2 also features AI allowing single-player games or any possible combination of human-computer teams."

Comments (2 posted)

Interoperability

Wine Weekly Newsletter

The April 10, 2007 edition of the Wine Weekly Newsletter is online with coverage of the Wine project. Topics include: Winebot, X Error, No Packages Yet For Ubuntu 7.04, Fedora Packages, On the Fly Debugging, Sound Test and Nautilus File Management.

Comments (none posted)

Miscellaneous

Alerttail 0.1.1 released

Version 0.1.1 of Alerttail is available for download. "Alerttail executes actions when "some text" has been written to a file. This software tails a file and when a line matches some text pattern alerttail will execute a list of actions defined on its own configuration file. Imagine you want to be warned when some text is written to a log file, you could just configure alerttail asking it to notify you with a gtk notify popup."

Comments (none posted)
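
The tail-and-match behaviour described for Alerttail can be sketched in a few lines. This is not Alerttail's code or configuration format: `follow`, `watch` and the rule layout (a pattern paired with a list of callables) are assumptions chosen for illustration.

```python
import os
import re
import time
from typing import Callable, Iterable, Iterator, List, Tuple

Action = Callable[[str], None]


def follow(path: str, poll_seconds: float = 0.5) -> Iterator[str]:
    """Yield lines appended to `path`, polling in the manner of `tail -f`.

    Simplified: a line that is only partly written may be yielded in pieces.
    """
    with open(path, "r") as handle:
        handle.seek(0, os.SEEK_END)  # start at the current end of the file
        while True:
            line = handle.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)


def watch(lines: Iterable[str], rules: List[Tuple[str, List[Action]]]) -> int:
    """Run every action of each rule whose pattern matches a line.

    Returns the number of (line, rule) matches that occurred.
    """
    compiled = [(re.compile(pattern), actions) for pattern, actions in rules]
    matches = 0
    for line in lines:
        for regex, actions in compiled:
            if regex.search(line):
                matches += 1
                for action in actions:
                    action(line.rstrip("\n"))
    return matches
```

A call such as `watch(follow("/var/log/messages"), [("ERROR", [print])])` would print each new matching line; the log path here is only an example, and a desktop notification could be substituted for `print`.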

Languages and Tools

C

Getting Familiar with GCC Parameters (O'ReillyNet)

Mulyadi Santosa discusses ways to optimize gcc compilation on O'Reilly. "gcc (GNU C Compiler) is actually a collection of frontend tools that does compilation, assembly, and linking. The goal is to produce a ready-to-run executable in a format acceptable to the OS. For Linux, this is ELF (Executable and Linking Format) on x86 (32-bit and 64-bit). But do you know what some of the gcc parameters can do for you? If you're looking for ways to optimize the resulted binary, prepare for a debugging session, or simply observe the steps gcc takes to turn your source code into an executable, getting familiar with these parameters is a must. So, please read on."

Comments (19 posted)

Caml

Caml Weekly News

The April 10, 2007 edition of the Caml Weekly News is out with new Caml language articles.

Full Story (comments: none)

Haskell

Call for Contributions - HC and A Report

A Call for Contributions has gone out for the May, 2007 edition of the Haskell Communities & Activities Report. The submission deadline is May 2. "If you are working on any project that is in some way related to Haskell, write a short entry and submit it to me. Even if the project is very small or unfinished or you think it is not important enough -- please reconsider and submit an entry anyway!"

Comments (none posted)

Java

Controlling Threads by Example (O'Reilly)

Viraj Shetty works with Java threads on O'Reilly. "One of the useful features in Java is the built-in support for writing multithreaded applications. A thread is an execution path in the program that has its own local variables, program counter, and lifetime. If the task being executed on the thread takes a long time, there needs to be a mechanism to stop, monitor, pause, and resume the task. This article will take a nontrivial example with threads and refactor the code to include these capabilities."

Comments (none posted)
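
The stop, pause and resume mechanism that the threads article describes can be sketched with cooperative flags. The example below is written in Python rather than Java and is not taken from the article; `ControlledTask` and its methods are hypothetical names, and the counting loop stands in for a long-running task.

```python
import threading
from typing import Optional


class ControlledTask:
    """Counts up to `steps` on a worker thread that can be paused or stopped."""

    def __init__(self, steps: int) -> None:
        self.completed = 0
        self._steps = steps
        self._may_run = threading.Event()
        self._may_run.set()               # not paused initially
        self._stop_requested = threading.Event()
        self._thread = threading.Thread(target=self._work)

    def _work(self) -> None:
        for _ in range(self._steps):
            self._may_run.wait()          # blocks here while paused
            if self._stop_requested.is_set():
                break                     # cooperative stop
            self.completed += 1           # stand-in for one unit of real work

    def start(self) -> None:
        self._thread.start()

    def pause(self) -> None:
        self._may_run.clear()

    def resume(self) -> None:
        self._may_run.set()

    def stop(self) -> None:
        self._stop_requested.set()
        self._may_run.set()               # wake a paused worker so it can exit

    def join(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    def is_alive(self) -> bool:
        return self._thread.is_alive()
```

The worker checks the flags between units of work, so it is never interrupted in the middle of a step; that is the design choice this sketch assumes.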

Real-time Java, Part 1: Using the Java language for real-time systems (developerWorks)

IBM developerWorks begins a series on real-time Java. "This article, the first in a five-part series on real-time Java, describes the key challenges to using the Java language to develop systems that meet real-time performance requirements. It presents a broad overview of what real-time application development means and how runtime systems must be engineered to meet the requirements of real-time applications. The authors introduce an implementation that addresses real-time Java challenges through a combination of standards-based technologies."

Comments (none posted)

Perl

Weekly Perl 6 mailing list summary (O'Reilly)

The April 3, 2007 edition of the Weekly Perl 6 mailing list summary is out with coverage of the latest Perl 6 developments.

Comments (none posted)

PHP

PHP OpenID 2.0.0-rc1 released

Version 2.0.0-rc1 of PHP OpenID has been announced. "PHP OpenID 2.0.0-rc1 implements revision 294 of the OpenID 2 specification. I'd very much like it if you can give it a try. With only a few changes to your application, you should be able to upgrade from version 1.2.2. Otherwise, the library transparently supports OpenID 1 and OpenID 2 relying parties and servers. This release also incorporates numerous bugfixes and feedback from library users."

Full Story (comments: none)

Python

pycairo release 1.4.0

Release 1.4.0 of pycairo, a set of Python bindings for the Cairo multi-platform 2D graphics library, has been announced. A number of new methods have been added and some obsolete methods have been removed.

Comments (none posted)

Python 2.5.1 release candidate 1 is out

Release candidate 1 of Python 2.5.1 has been announced. "This is the first bugfix release of Python 2.5. Python 2.5 is now in bugfix-only mode; no new features are being added. According to the release notes, over 150 bugs and patches have been addressed since Python 2.5, including a fair number in the new AST compiler (an internal implementation detail of the Python interpreter)."

Full Story (comments: none)

Python-URL! - weekly Python news and links

The April 11, 2007 edition of the Python-URL! is online with a new collection of Python article links.

Full Story (comments: none)

Tcl/Tk

Tcl-URL! - weekly Tcl news and links

The April 11, 2007 edition of the Tcl-URL! is online with new Tcl/Tk articles and resources.

Full Story (comments: none)

XML

Introducing RDFa, Part Two (O'Reilly)

Bob DuCharme presents part two of an O'Reilly XML.com series on RDFa. "In this second part of a two-part series, Bob DuCharme concludes his introduction of RDFa--a new, XHTML-friendly standard syntax for RDF metadata that allows you to embed RDF metadata into the Web in a novel way."

Comments (none posted)

Editors

Emacs 22 on April 23

A brief message has been sent to the emacs-devel list stating that the final Emacs 22 pre-test release will happen on April 16. If all goes well, the long-awaited Emacs 22.1 release will happen on Monday, April 23. The last major Emacs release was in 2001. (LWN looked at the upcoming Emacs release last October).

Full Story (comments: 29)

Libraries

Pantheios 1.0.1 beta 24 released (SourceForge)

Version 1.0.1 beta 24 of Pantheios is available with bug fixes. "Pantheios is an Open Source C/C++ Logging API library, offering an optimal combination of 100% type-safety, efficiency, genericity and extensibility. It is simple to use and extend, highly-portable (platform and compiler-independent) and, best of all, it upholds the C tradition of you only pay for what you use."

Comments (none posted)

Version Control

colorsvn 0.3.2 announced

Stable version 0.3.2 of colorsvn has been released. "colorsvn is the Subversion output colorizer. Colorsvn was extracted from kde-sdk and was extended with build process and configuration."

Comments (1 posted)

Page editor: Forrest Cook

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds