The Debian archive is known to be one of the largest software
collections available in the free software world. With more than
16,000 source packages and 30,000 binary packages, users sometimes have trouble
finding packages that are relevant to them. Debian developer Enrico
Zini has been working on infrastructure to solve this problem.
During the recent mini-debconf Paris, Enrico gave a talk presenting what he
has been working on in the last few years, which "hasn't gotten yet the
attention it deserves".
Enrico is
known in the Debian community for the introduction of debtags, a system
used to classify all packages using facets.
Each facet describes a specific kind of property: type of user-interface,
programming language it's written in, type of document manipulated,
purpose of the software, etc. His most recent work builds on that. It is
available in Debian and Ubuntu
in the apt-xapian-index
package. Its purpose is to allow advanced queries over the database of
available packages.
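The facet idea can be illustrated with a small toy model. The sketch below is not debtags or apt-xapian-index code; it only assumes that a tag is written as a "facet::value" string, and the package names and tag assignments are made up for the example.

```python
# Toy model of faceted classification. Package names and tag
# assignments are hypothetical; this is not the debtags implementation.
from collections import defaultdict

PACKAGES = {
    "pkg-editor": {"interface::x11", "implemented-in::c", "use::editing"},
    "pkg-shell-tool": {"interface::commandline", "implemented-in::python",
                       "use::searching"},
    "pkg-viewer": {"interface::x11", "implemented-in::python", "use::viewing"},
}


def split_tag(tag):
    """Split a 'facet::value' string into (facet, value)."""
    facet, _, value = tag.partition("::")
    return facet, value


def by_facet(packages):
    """Map each (facet, value) pair to the set of packages carrying it."""
    index = defaultdict(set)
    for name, tags in packages.items():
        for tag in tags:
            index[split_tag(tag)].add(name)
    return index


def select(packages, *wanted):
    """Return the sorted names of packages carrying every wanted tag."""
    return sorted(name for name, tags in packages.items()
                  if set(wanted) <= tags)
```

Calling select(PACKAGES, "interface::x11", "implemented-in::python") combines two facets and keeps only the packages that satisfy both.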
Users of apt-xapian-index
He started by presenting some early users of the infrastructure. The most
widely known is Ubuntu's software center. Its search feature provides
results almost instantly thanks to apt-xapian-index. But it is
a very simple interface that doesn't exploit many of the advanced
features provided by the apt-xapian-index.
Another early adopter, making use of some more advanced features, is
GoPlay!. It's a
graphical user interface to find games. It makes use of debtags to
classify games so that you can browse, for example, all 3D action/arcade
games related to cars. GoPlay has even been extended to be a more generic
debtags-based package browser and the package now also provides
GoLearn!, GoAdmin!, GoNet!, GoOffice!, GoSafe!, and GoWeb!.
Fuss-launcher is
an application launcher and not a package browser, but by using
apt-xapian-index, it's able to reuse information provided at the package
level to make it easier to find installed applications. Package
descriptions tend to be more verbose than those embedded in .desktop files.
Enrico also showed another nice feature to the audience: if you drag a
document onto its window, it will show you a list of applications that can
open it.
Last but not least, apt-xapian-index provides a command line search
tool that is vastly superior to the traditional apt-cache
search: it's axi-cache search (axi stands for
apt-xapian-index). Enrico compared the output of a search on the letter
"r". While apt-cache spits out a seemingly endless list of packages containing
this letter somewhere in the description, axi-cache only
listed packages related to GNU R. He also demonstrated the contextual tab
completion. It makes it easy to use debtags and to refine your search. Once
you have typed a first keyword, the tab-completion for the second one only
contains keywords or debtags that are actually able to provide more
restrictive results. Advanced queries with logical operations (AND, OR,
NOT, XOR) are also supported.
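The contextual completion described above can be approximated in a few lines. This is a toy model only, not how axi-cache is implemented: the index contents are invented, and the rule "offer a term only if it yields a smaller, non-empty result set" is an assumption drawn from the description of the talk.

```python
# Toy model of contextual completion. INDEX is made-up data; the real
# tool queries a Xapian database instead of a Python dictionary.
INDEX = {
    "r-stats": {"statistics", "r", "devel::interpreter"},
    "r-plot": {"plotting", "r", "statistics"},
    "car-game": {"game", "racing", "use::gameplaying"},
}


def matches(index, terms):
    """Return the packages that carry every term (a logical AND)."""
    return {name for name, have in index.items() if set(terms) <= have}


def completions(index, chosen, prefix=""):
    """Suggest terms that strictly narrow the current, non-empty result set."""
    current = matches(index, chosen)
    candidates = set()
    for name in current:
        candidates |= index[name]
    suggestions = []
    for term in sorted(candidates - set(chosen)):
        if not term.startswith(prefix):
            continue
        narrowed = matches(index, list(chosen) + [term])
        # Keep the term only if it removes some results but not all of them.
        if narrowed and narrowed != current:
            suggestions.append(term)
    return suggestions
```

After the keyword "r" has been typed, "statistics" is not offered because both matching packages already carry it. The other logical operations map onto Python set operators in this model: `|` for OR, `-` for NOT, and `^` for XOR.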
Features of the backend
Enrico then dived into the internals. Xapian's search engine is at the
root of this infrastructure. He likes it because it's a simple library
(i.e. no daemon) and it has nice Python bindings. While apt-xapian-index's
core work is to index the descriptions of all the packages, it actually
stores much more and can be easily extended with plugins (written in
Python).
For instance, the information stored encompasses:
words appearing in the description of the packages (including the translated
descriptions if the user uses a non-English locale);
their origin;
their section;
their size and installed size;
the time they were first seen;
icons, categories, descriptions from the .desktop files they
contain (through app-install-data);
aliases for names of some popular applications that are not
available on Linux (for instance "excel" maps to the debtag
office::spreadsheet).
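The alias entry in the list above amounts to a lookup applied to the query before searching. The following is a hypothetical sketch: the "excel" mapping comes from the talk, but the function name and the choice to replace the word rather than add to it are assumptions.

```python
# Hypothetical alias expansion. Only the "excel" entry comes from the
# talk; replacing the word (instead of adding the tag) is an assumption.
ALIASES = {"excel": ["office::spreadsheet"]}


def expand_query(words, aliases=ALIASES):
    """Replace known application names with the debtags they map to."""
    expanded = []
    for word in words:
        key = word.lower()
        expanded.extend(aliases.get(key, [key]))
    return expanded
```

With this model, a query for "Excel budget" would be searched as the tag office::spreadsheet plus the plain word "budget".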
He already has plans to store more: adding popularity contest data (see wishlist
bugs #602180 and #602182) will make it possible to
sort query results in a useful way. The most widely used applications are
good choices when it comes to community support, and they are likely of
better quality due to the larger user base. Adding timestamps of the last
installation/upgrade/removal will make it easier to pinpoint a regression
to a specific package update.
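Both planned additions are easy to picture with a toy model. The data below is invented and the function names are hypothetical; nothing here reflects how the index would actually store popularity counts or timestamps.

```python
# Toy model of the two planned uses. All numbers, dates, and package
# names are made up for illustration.
from datetime import datetime

POPCON = {"pkg-a": 120, "pkg-b": 4500, "pkg-c": 30}
LAST_UPGRADE = {
    "pkg-a": datetime(2010, 11, 2, 9, 0),
    "pkg-b": datetime(2010, 11, 14, 18, 30),
    "pkg-c": datetime(2010, 11, 15, 8, 15),
}


def rank_by_popularity(matches, popcon):
    """Order matching packages by descending install count, then by name."""
    return sorted(matches, key=lambda name: (-popcon.get(name, 0), name))


def upgraded_between(last_upgrade, start, end):
    """Return the packages whose last upgrade falls inside [start, end]."""
    return sorted(name for name, when in last_upgrade.items()
                  if start <= when <= end)
```

If a regression appeared some time between November 14 and 16, upgraded_between() narrows the suspects to the packages upgraded in that window.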
The generated index is world-readable and can be used from any
application provided it can use the Xapian library—which is written
in C++ but has bindings for Perl, Python, PHP, Java, Tcl, C#, and
Ruby.
Call for experimentation
Enrico believes that many useful applications have yet to be invented
on top of apt-xapian-index's features. He's calling for experimentation
and asking for new ideas. The only practical limit that he has encountered
is the size of the index: currently it varies between 50 MB (Debian
unstable without translation) and 70 MB (Debian stable/testing/unstable
with one translation). He would like it to not grow over 100 MB since it's
installed by default (due to aptitude recommending it) and he's not
comfortable with the idea of using more than 20% of the disk footprint
of a basic install just for this service. That's why the index was
configured to not store the position of the terms: it's thus not possible
to find packages whose description contains the word "statistical"
immediately followed by the word "computing". You can however find
those which have both terms somewhere in their description.
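The trade-off is easier to see with a small inverted index built in plain Python. This is a toy model with invented descriptions, not Xapian code: storing positions makes each index entry larger but allows phrase matching, while storing only membership still supports "both terms somewhere" queries.

```python
# Toy inverted index, with and without term positions. The package
# descriptions are made up; Xapian's on-disk format is not modeled.
DOCS = {
    "pkg-a": "environment for statistical computing and graphics",
    "pkg-b": "computing cluster with statistical reports",
}


def build_index(docs, with_positions):
    """Map each term to the documents (and optionally positions) it occurs in."""
    index = {}
    for name, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            entry = index.setdefault(term, {})
            if with_positions:
                entry.setdefault(name, []).append(pos)
            else:
                entry[name] = None  # membership only, no positions kept
    return index


def and_query(index, first, second):
    """Documents containing both terms anywhere in their text."""
    return sorted(set(index.get(first, {})) & set(index.get(second, {})))


def phrase_query(index, first, second):
    """Documents where `second` immediately follows `first`."""
    hits = []
    for name in and_query(index, first, second):
        pos_first, pos_second = index[first][name], index[second][name]
        if pos_first is None or pos_second is None:
            raise ValueError("index stores no positions; phrase search unavailable")
        if any(p + 1 in pos_second for p in pos_first):
            hits.append(name)
    return hits
```

On an index built with with_positions=False, and_query() still returns both packages, but phrase_query() can only report that the information is missing.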
Enrico wondered if apt-xapian-index offers too much freedom.
That could explain why few people experimented with it despite his
numerous blog posts
with code samples and information on how to get started using it.
But it's not difficult to imagine use cases for this data. It could be used
to extend tools like rc-alert or wnpp-alert, for example. They provide a long
list of Debian packages that are looking for some help and are
installed on the machine. With apt-xapian-index, it would be possible to
restrict the results to the set of packages written in a specific programming
language or for a particular desktop environment.
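As a rough sketch of that idea, the filtering reduces to intersecting three sets: packages needing help, packages installed locally, and packages carrying a given tag. Everything below is hypothetical; the real rc-alert and wnpp-alert tools are not Python and no such function exists in them.

```python
# Hypothetical filter for an rc-alert/wnpp-alert style list. Package
# names and tag assignments are invented for the example.
NEEDING_HELP = ["pkg-one", "pkg-two", "pkg-three"]
INSTALLED = {"pkg-one", "pkg-three", "pkg-four"}
TAGS = {
    "pkg-one": {"implemented-in::python"},
    "pkg-two": {"implemented-in::python"},
    "pkg-three": {"implemented-in::c"},
    "pkg-four": {"implemented-in::python"},
}


def restrict_alerts(needing_help, installed, tags_by_package, wanted_tag):
    """Keep installed packages that need help and carry the wanted tag."""
    tagged = {name for name, tags in tags_by_package.items()
              if wanted_tag in tags}
    return sorted(set(needing_help) & set(installed) & tagged)
```

A Python developer would call restrict_alerts(NEEDING_HELP, INSTALLED, TAGS, "implemented-in::python") and get back only the packages they are both running and able to work on.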
The more likely explanation is that too few people know about
the tool. There are many more itches to scratch where
apt-xapian-index's features could be very useful, and my guess is
that Enrico's wishes will eventually come true.
Comments (9 posted)
Brief items
If I go on record with my official opinion that you sir are indeed
crazy, does that qualify me for a reimbursement check from Red Hat
corporate for services rendered as an independent contractor?
--
jef"would love to get paid just for having an opinion"spaleta
Second, I believe that the 6-months release cycle everyone is doing
right now is crazy. It gives just some weeks of development time between
one release and another, and limited time to freeze and fix bugs, so the
developers must run against the clock and depend on the time after the
release to fix remaining or not that throughly tested issues with updates.
--
Eugeni Dodonov
Some folks at LPC suggested we should switch from grub to syslinux
rather than grub2. Meego uses syslinux. I have little clue how both
compare, but maybe it's worth considering syslinux given that we already
use it for the cd booting and maybe we should consolidate our options
and use syslinux everywhere?
--
Lennart Poettering
Comments (none posted)
There is a new release of the openSUSE Medical Version available. It includes a
long list of specialized software of interest to the medical industry.
"TEMPO is open source software for 3D visualization of brain
electrical activity. TEMPO accepts EEG file in standard EDF format and
creates animated sequence of topographic maps. Topographic maps are
generated over 3D head model and user is able to navigate around head and
examine maps from different viewpoints."
Comments (none posted)
The third milestone in the openSUSE 11.4 development series is available
for testing. M3 includes LibreOffice, and systemd is available for testing.
Full Story (comments: none)
A terse set of notes from the Ubuntu Developer Summit, recently concluded in
Orlando, has been posted. "This page summarizes many of the outcomes of the
event, and for each track there is a link to further detailed notes. Please
note: these are proceedings and plans, and some of these things may not get
completed as planned for whatever reason. As such, please read this list as
a set of goals, and not a promise of what Ubuntu 11.04 will
include."
Comments (none posted)
Distribution News
Debian GNU/Linux
The release team has an update on the release of Debian 6.0 "squeeze".
Topics include release notes, freeze status, bug squashing parties, and
current blocker bugs.
Full Story (comments: none)
Debian Project Leader Stefano Zacchiroli has a few bits about what he's
been up to recently. Topics include the squeeze release, sprints, events,
and delegations.
Full Story (comments: none)
The Debian Multimedia Maintainers have been busy getting multimedia
applications ready for squeeze. Click below to see what's in and what's
out.
Full Story (comments: none)
The Debian kernel team had a very productive meeting in Paris recently.
Click below for a summary.
Full Story (comments: none)
The Debian Women project has announced a series of training sessions which
will be held on IRC by experienced community members. "The main goal
of this initiative is to encourage more people, and specifically women, to
contribute to Debian while introducing them to different aspects of the
Debian Project. Topics will span over a wide range of subjects related to
daily Debian maintenance efforts as well as advanced tasks."
Full Story (comments: none)
Fedora
Fedora project leader Jared Smith has sent out a message intending to
clarify the Fedora Board's decision to exclude SQLNinja.
"Considering these questions against the other security tools that
were commonly mentioned in feedback I received (such as tcpdump), it is
pretty easy to see how they're different than SQLNinja. I should also note
that much of the objections to our decision were against blocking security
tools in general, not the SQLNinja package specifically. (In my own
limited investigation, I have yet to find a single security professional
who was actively using the tool before our decision.)"
The question will apparently be revisited at some future time.
Full Story (comments: 10)
John Poelstra welcomes Robyn Bergeron as the new Fedora Program Manager.
"Through the end of 2010 and a little bit beyond, I will be working along side Robyn Bergeron to transition my official Fedora responsibilities to her. This will include getting the Fedora 15 team schedules into shape, feature wrangling, bugzilla maintenance, and any number of other things. Robyn and I are committed to making this transition as smooth, complete and timely as we can, and expect the transition to be completed before the Fedora 15 feature submission deadline."
Full Story (comments: none)
The Fedora Project has sent out a reminder that Fedora 12 will reach end of
life on December 2, 2010.
Full Story (comments: none)
Click below for a recap of the November 15 meeting of the Fedora board.
Topics include a draft charter for a Community Working Group, elections,
and more. You can also see Máirín Duffy's summary of the meeting.
Full Story (comments: none)
Mandriva Linux
Eugeni Dodonov looks at the schedule for Mandriva 2010.2 (expected December 22)
and Mandriva 2011 (expected May 30, 2011). "Starting with Mandriva
2011 release, the release policy for Mandriva will change to 1 release per
year. This will allow us to develop even greater releases, and - of course
- will give us more time to test, validate and further improve the overall
quality of the release."
Full Story (comments: 1)
Ubuntu family
Click to see the minutes from the November 16 meeting of the Ubuntu
technical board. Topics include KDE micro version, couchdb on lucid, and
ARB exception proposal.
Full Story (comments: none)
Newsletters and articles of interest
The H has a review of RHEL 6. "Heaps of new features become apparent when comparing the
RHEL 6 [2.6.32] kernel with the version 2.6.18 kernel of RHEL 5, although more than a few of them are already old hat in many other distributions. For instance, the Completely Fair Scheduler (CFS) highlighted by Red Hat has been part of the Linux kernel since version 2.6.23. The "tickless" kernel, which stops the timer interrupt from going off a hundred or a thousand times per second when a system is idle, is already well-tested. This trick reduces both the power consumption and the basic load of RHEL 6 systems that operate as virtualised guests, which frees up the host CPU for productive tasks."
Comments (11 posted)
PCWorld reviews Mint 10. "Launched in 2006, Linux Mint has quickly become the third most popular Linux distribution out there behind only Ubuntu and Fedora, and version 10 makes it easy to see why. Based on Ubuntu 10.10, or Maverick Meerkat, Julia offers numerous enhancements that put it at the forefront of usability."
Comments (none posted)
Susan Linton takes a look at PCLinuxOS. "PCLinuxOS is a rolling release distribution, which means users can usually update through the package management rather than perform a fresh install every six months. But a few times a year developers release Quarterly Updates for new users or machines. Recently it was that time again when several varieties of PCLOS saw new releases."
Comments (none posted)
Linux Planet takes a look at grml. "You don't lack for options with grml. The boot
menu not only offers the standard options to get into grml, but a FreeDOS
option, a minimal BSD (MirOS bsd4grml), PXE boot, hardware detection tool,
and Memtest. You also can choose to load grml entirely into RAM in case you
need the CD-ROM for something, and it's faster. You can use it on a USB
stick instead of a CD. There are several failsafe options if you have
trouble booting grml due to incompatible hardware. In short - you have
options." (LWN looked at grml back in April 2006.)
Comments (none posted)
Page editor: Rebecca Sobol