Introducing the OCRopus Project
The OCRopus Project is a new open-source optical character recognition (OCR)
effort that was launched this week by Google:
OCRopus is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.
The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90's and deployed by the US Census bureau, and novel high-performance layout analysis methods.
OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.
According to the
FAQ document, OCRopus is mainly intended to be used for character
recognition of scanned and digitally photographed text.
Output will be in HTML+CSS format.
The OCRopus plug-in architecture will support
multiple character recognition plug-ins.
Scanning of non-English text will be provided by language-specific
plug-in modules.
The
Processing Steps diagram gives a graphical overview of the
code flow.
The software is being released under the Apache license and is written
in C++ and Python. One of the main components of OCRopus is
Tesseract OCR,
which was released as open-source code by HP and UNLV in 2005.
The lead OCRopus developer is Professor Thomas Breuel
from the German Research Center for Artificial Intelligence in
Kaiserslautern. Funding has been set aside to support a
number of graduate students.
The
source code
is available for an early release of the project:
"The technology preview release is basically the first check-in of the source code into the subversion repository. What you can expect is that this code performs about as well as Tesseract in terms of character-level performance, but that is able to cope better with non-trivial layouts. There is no packaging, binary distribution, or full autoconf yet."
The getting started document explains the dependencies and shows
how to build the software.
The project roadmap calls for an alpha release in the third quarter of
2007, a beta release in the first quarter of 2008 and a 1.0 release
in the third quarter of 2008.
Open-source contributions are being requested:
"We are hoping for contributions by the open source community in areas such
as adapting the system to additional languages, creating a Gnome desktop
application, integration with Gnome desktop search, web-based tools for
proofing and training, language modeling, additional character recognition
engines, and other useful tools and add-ons."
Help is being requested for porting to non-Linux platforms.
Support for KDE is not yet mentioned, but should be possible with
a bit of developer effort.
System Applications
Database Software
PostgreSQL Weekly News
The April 8, 2007 edition of the PostgreSQL Weekly News
is online with the latest PostgreSQL DBMS articles and resources.
SQLite 3.3.15 released
Version 3.3.15 of
SQLite,
a lightweight DBMS, is out.
"
An annoying bug introduced in 3.3.14 has been fixed. There are also many enhancements to the test suite."
Filesystem Utilities
Earth 0.2 released
Stable version 0.2 of
Earth
is out with bug fixes and other improvements.
"
Earth allows you to find files across a large network of machines and track disk usage in real time. It consists of a daemon that indexes filesystems in real time and reports all the changes back to a central database. This can then be queried through a simple, yet powerful, web interface. Think of it like Spotlight or Beagle but operating system independent with a central database for multiple machines with a web application that allows novel ways of exploring your data."
See the
What's New document for change details.
Interoperability
Samba 3.0.25rc1 released
Version 3.0.25rc1 of Samba has been announced.
"
This is the first release candidate of the Samba 3.0.25 code
base and is provided for testing only. An RC release means
that we are close to the final release but the code may still
have a few remaining minor bugs. This release is *not* intended
for production servers. There has been a substantial amount
of development since the 3.0.23/3.0.24 series of stable releases.
We would like to ask the Samba community for help in testing
these changes as we work towards the next significant production
upgrade Samba 3.0 release."
Mail Software
Sendmail 8.14.1 released
Version 8.14.1 of the Sendmail mail transfer agent
has been announced.
"
Sendmail, Inc., and the Sendmail Consortium announce the availability of sendmail 8.14.1 which fixes some bugs, e.g.,
If a milter rejected a recipient the MTA still kept it in its list of recipients and delivered to it if the transaction was accepted.
The new DaemonPortOptions which begin with a lower case character can now be set."
Web Site Development
Dimdim 1.6.0 alpha released (SourceForge)
Version 1.6.0 alpha of Dimdim, a web conferencing application,
has been announced. It features usability improvements.
"
With Dimdim you can show Presentations, Applications and Desktops to any other person over the internet without installing anything on the Attendee side. You can chat, show your webcam and talk with others in the meeting."
Release of Remo 0.1.4 alpha
Version 0.1.4 alpha of Remo, the Rule Editor for ModSecurity, is out
with a number of new features.
Desktop Applications
Audio Applications
Ardour 2.0rc1 released
The first release candidate of
Ardour 2.0,
a multi-track audio workstation, is out.
"
A couple of weeks after 2.0 beta12, the Ardour team brings you 2.0rc1, and the OS X Tiger universal DMG. Dozens of bug fixes, a few usability improvements, even a couple of new features (e.g. rename & delete snapshots). This is the first release candidate for 2.0, but it is missing last minute tweaks, specifically up to date and complete translations. We hope to release RC2 within the next 7-10 days which will hopefully be the final release before 2.0."
See the full
release announcement
for more details.
Data Visualization
Asymptote 1.25 released
Release 1.25 of
Asymptote
is out with some path changes.
"
Asymptote is a powerful descriptive vector graphics language that provides a natural coordinate-based framework for technical drawing. Labels and equations are typeset with LaTeX, for high-quality PostScript output.
A major advantage of Asymptote over other graphics packages is that it is a programming language, as opposed to just a graphics program."
Desktop Environments
Compiz/Beryl merger is official
The Compiz and Beryl projects have sent out an announcement that their
merger is now official. There are a lot of details to be worked out, but the
decision to proceed has been made. "
We will create a code review panel consisting of the best
developers from each community who will see that any code included in a
release package meets the highest standards and is suitable for distribution
in an officially supported package. " There's no word on the naming
issue, though. (LWN
looked at
the merger proposal back in March).
GNOME 2.18.1 released
Version 2.18.1 of the GNOME desktop environment has been released.
"
This is the first
release in a series of point releases for the 2.18 branch.
Come and see all the bug fixing, all the new translations and all the
updated documentation brought to you by the wonderful team of GNOME
contributors! While development has started on the GNOME 2.19/2.20
road, work on the stable branch continues to make it even more solid."
GNOME Software Announcements
The following new GNOME software has been announced this week:
You can find more new GNOME software releases at
gnomefiles.org.
KDE Software Announcements
The following new KDE software has been announced this week:
You can find more new KDE software releases at
kde-apps.org.
KDE Commit-Digest (KDE.News)
The April 8, 2007 edition of the
KDE Commit-Digest has been
announced.
The content summary says:
"
Bluetooth support in Solid. 'Breadcrumb' navigation widget from Dolphin is made more modular to allow use in other KDE contexts. Support for different caret (text cursor) styles in Konsole. Various bugfixes in TagLib. Better AIM protocol file transfer support in Kopete. KWord gets the ability (through Kross scripting) to use an OpenOffice.org instance to import from supported file formats. KPackage starts to be ported to the SMART package management scheme..."
The Road to KDE 4: Strigi and File Information Extraction (KDE.News)
The KDE.News "Road to KDE4" series
is back. "
This week I am featuring Strigi,
an information extraction subsystem that is being fully deployed for KDE
4.0. KDE has previously had the ability to extract information about files of
various types, and has used them in a variety of functional contexts, such as
the Properties Dialog. Strigi promises many improvements over the existing
versions."
Xorg Software Announcements
The following new Xorg software has been announced this week:
More information can be found on the
X.Org Foundation wiki.
Educational Software
TCExam 4.0.011 released (SourceForge)
Version 4.0.011 of TCExam
has been announced.
"
TCExam is a Web-based Assessment Software system (e-exam or CBT - Computer Based Testing) that enables educators and trainers to author, schedule, deliver, and report on surveys, quizzes, tests and exams. The software is used all over the world by universities, schools, companies and independent teachers."
Electronics
Atom 200704 released
OpenCollector
has announced the release of version 200704 of
Atom.
"
Atom is a new functional hardware description language embedded in Haskell. Unlike Confluence and HDCaml, Atom is an adventure above RTL. Borrowing on ideas developed by Arvind, Hoe, and others, Atom compiles rule-based circuit descriptions down to Verilog for simulation and synthesis. This method of design works particularly well for complex control logic."
Financial Applications
LedgerSMB 1.2.0 released
Version 1.2.0 of LedgerSMB is out with a security fix.
"
LedgerSMB 1.2.0 has been released, completing a comprehensive SQL
injection audit of the code inherited from SQL-Ledger. Numerous SQL
injection issues were fixed. In fact, most fields were not properly
quoted and escaped. These problems should affect all known versions of
SQL-Ledger as well. The fix was delayed because the scale of the
changes made required extensive testing-- these were not trivial changes.
Users are advised to upgrade as soon as possible."
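The class of bug described here, values pasted into SQL text without quoting or escaping, can be illustrated briefly. The sketch below is not LedgerSMB code (LedgerSMB is written in Perl and runs on PostgreSQL); it uses Python's standard sqlite3 module, and the customer table and function names are made up purely for illustration.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customer (name) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


def find_customer_unsafe(conn, name):
    # Vulnerable: the value becomes part of the SQL text itself, so a
    # quote character in `name` can change the meaning of the query.
    query = "SELECT id, name FROM customer WHERE name = '%s'" % name
    return conn.execute(query).fetchall()


def find_customer_safe(conn, name):
    # Placeholder binding: the value is passed separately from the SQL
    # text and is only ever treated as data.
    return conn.execute(
        "SELECT id, name FROM customer WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    hostile = "nobody' OR '1'='1"
    print(find_customer_unsafe(conn, hostile))  # matches every row
    print(find_customer_safe(conn, hostile))    # matches nothing
```

With the hostile input, the unsafe version returns every row in the table, while the bound-parameter version looks for a customer literally named with that string and finds none.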
Games
Globulation2 alpha 22 released
The alpha 22 release of
Globulation2,
a real time strategy game, has been announced.
"
Globulation 2 brings a new type of gameplay to RTS games. The player chooses the number of units to assign to various tasks, and the units do their best to satisfy the requests. This allows players to manage more units and focus on strategy rather than individual units' jobs. Globulation 2 also features AI allowing single-player games or any possible combination of human-computer teams."
Interoperability
Wine Weekly Newsletter
The April 10, 2007 edition of the
Wine Weekly Newsletter
is online with coverage of the Wine project. Topics include:
Winebot, X Error, No Packages Yet For Ubuntu 7.04, Fedora Packages,
On the Fly Debugging, Sound Test and Nautilus File Management.
Miscellaneous
Alerttail 0.1.1 released
Version 0.1.1 of
Alerttail
is available for download.
"
Alerttail executes actions when "some text" has been written to a file.
This software tails a file, and when a line matches some text pattern alerttail will execute a list of actions defined in its own configuration file.
Imagine you want to be warned when some text is written to a log file; you could just configure alerttail asking it to notify you with a gtk notify popup."
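The tail-and-match loop described in the announcement can be sketched in a few lines of Python. This is not Alerttail's code or its configuration format; the function names follow and dispatch, the rule layout, and the log path in the usage comment are all assumptions made for illustration.

```python
import re
import time


def follow(path, poll_interval=0.5):
    """Yield lines appended to `path` after this generator starts.

    Hypothetical helper: it polls the file instead of using a kernel
    notification API, and it does not handle log rotation or lines
    that are only partially written when read.
    """
    with open(path, "r") as handle:
        handle.seek(0, 2)  # whence=2 is end of file: skip existing content
        while True:
            line = handle.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll_interval)


def dispatch(lines, rules):
    """Run each rule's actions on every line its pattern matches.

    `rules` is a list of (compiled_pattern, [callable, ...]) pairs.
    Returns the number of actions executed.
    """
    fired = 0
    for line in lines:
        for pattern, actions in rules:
            if pattern.search(line):
                for action in actions:
                    action(line)
                    fired += 1
    return fired


# Usage sketch (path and pattern are placeholders; this loops forever):
#   rules = [(re.compile(r"some text"), [print])]
#   dispatch(follow("/var/log/messages"), rules)
```

Keeping the matching step separate from the file-following step means the rules can be exercised against a plain list of lines, without a live log file. A desktop notification, as in the announcement's example, would simply be another callable in a rule's action list.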
Languages and Tools
C
Getting Familiar with GCC Parameters (O'ReillyNet)
Mulyadi Santosa
discusses gcc's command-line parameters on O'ReillyNet.
"
gcc (GNU C Compiler) is actually a collection of frontend tools that does compilation, assembly, and linking. The goal is to produce a ready-to-run executable in a format acceptable to the OS. For Linux, this is ELF (Executable and Linking Format) on x86 (32-bit and 64-bit). But do you know what some of the gcc parameters can do for you? If you're looking for ways to optimize the resulting binary, prepare for a debugging session, or simply observe the steps gcc takes to turn your source code into an executable, getting familiar with these parameters is a must. So, please read on."
Caml
Caml Weekly News
The April 10, 2007 edition of the Caml Weekly News
is out with new Caml language articles.
Haskell
Call for Contributions - HC and A Report
A
Call for Contributions has gone out for the May, 2007
edition of the Haskell Communities & Activities Report.
The submission deadline is May 2.
"
If you are working on any project that is in some way related
to Haskell, write a short entry and submit it to me. Even
if the project is very small or unfinished or you think it is
not important enough -- please reconsider and submit an entry
anyway!"
Java
Controlling Threads by Example (O'Reilly)
Viraj Shetty
works with Java threads on O'Reilly.
"
One of the useful features in Java is the built-in support for writing multithreaded applications. A thread is an execution path in the program that has its own local variables, program counter, and lifetime. If the task being executed on the thread takes a long time, there needs to be a mechanism to stop, monitor, pause, and resume the task.
This article will take a nontrivial example with threads and refactor the code to include these capabilities."
Real-time Java, Part 1: Using the Java language for real-time systems
(developerWorks)
IBM developerWorks
begins
a series on real-time Java. "
This article, the first in a
five-part series on real-time Java, describes the key challenges to using
the Java language to develop systems that meet real-time performance
requirements. It presents a broad overview of what real-time application
development means and how runtime systems must be engineered to meet the
requirements of real-time applications. The authors introduce an
implementation that addresses real-time Java challenges through a
combination of standards-based technologies."
Perl
Weekly Perl 6 mailing list summary (O'Reilly)
The April 3, 2007 edition of the
Weekly Perl 6 mailing list summary is out with coverage of the latest
Perl 6 developments.
PHP
PHP OpenID 2.0.0-rc1 released
Version 2.0.0-rc1 of PHP OpenID has been announced.
"
PHP
OpenID 2.0.0-rc1 implements revision 294 of the OpenID 2
specification. I'd very much like it if you can give it a try. With
only a few changes to your application, you should be able to upgrade
from version 1.2.2. Otherwise, the library transparently supports
OpenID 1 and OpenID 2 relying parties and servers.
This release also incorporates numerous bugfixes and feedback from
library users."
Python
pycairo release 1.4.0
Release 1.4.0 of pycairo, a set of Python bindings for the Cairo
multi-platform 2D graphics library,
has been announced. A number of new methods have been added and some
obsolete methods have been removed.
Python 2.5.1 release candidate 1 is out
Release candidate 1 of Python 2.5.1 has been announced.
"
This is the first bugfix release of Python 2.5. Python 2.5 is now
in bugfix-only mode; no new features are being added. According to
the release notes, over 150 bugs and patches have been addressed
since Python 2.5, including a fair number in the new AST compiler
(an internal implementation detail of the Python interpreter)."
Python-URL! - weekly Python news and links
The April 11, 2007 edition of the Python-URL! is online with
a new collection of Python article links.
Tcl/Tk
Tcl-URL! - weekly Tcl news and links
The April 11, 2007 edition of the Tcl-URL! is online with new
Tcl/Tk articles and resources.
XML
Introducing RDFa, Part Two (O'Reilly)
Bob DuCharme presents
part two of an O'Reilly XML.com series on RDFa.
"
In this second part of a two-part series, Bob DuCharme concludes his introduction of RDFa--a new, XHTML-friendly standard syntax for RDF metadata that allows you to embed RDF metadata into the Web in a novel way."
Editors
Emacs 22 on April 23
A brief message has been sent to the emacs-devel list stating that the
final Emacs 22 pre-test release will happen on April 16. If all
goes well, the long-awaited Emacs 22.1 release will happen on Monday,
April 23. The last major Emacs release was in 2001. (LWN
looked at the upcoming Emacs
release last October).
Libraries
Pantheios 1.0.1 beta 24 released (SourceForge)
Version 1.0.1 beta 24 of Pantheios
is available with bug fixes.
"
Pantheios is an Open Source C/C++ Logging API library, offering an optimal combination of 100% type-safety, efficiency, genericity and extensibility. It is simple to use and extend, highly-portable (platform and compiler-independent) and, best of all, it upholds the C tradition of you only pay for what you use."
Version Control
colorsvn 0.3.2 announced
Stable version 0.3.2 of
colorsvn
has been released.
"
colorsvn is the Subversion output colorizer. Colorsvn was extracted from kde-sdk and was extended with build process and configuration."
Page editor: Forrest Cook