Distributions
A high-level search interface for Debian packages
The Debian archive is known to be one of the largest software
collections available in the free software world. With more than
16,000 source packages and 30,000 binary packages, users sometimes have trouble
finding packages that are relevant to them. Debian developer Enrico
Zini has been working on infrastructure to solve this problem.
During the recent mini-debconf Paris, Enrico gave a talk presenting what
he has been working on in the last few years, which "hasn't gotten yet
the attention it deserves".
Enrico is known in the Debian community for the introduction of debtags, a system used to classify all packages using facets. Each facet describes a specific kind of property: type of user-interface, programming language it's written in, type of document manipulated, purpose of the software, etc. His most recent work builds on that. It is available in Debian and Ubuntu in the apt-xapian-index package. Its purpose is to allow advanced queries over the database of available packages.
Users of apt-xapian-index
He started by presenting some early users of the infrastructure. The most widely known is Ubuntu's software center. Its search feature provides results almost instantly thanks to apt-xapian-index. But it is a very simple interface that doesn't exploit many of the advanced features provided by apt-xapian-index.
![[GoPlay!]](https://static.lwn.net/images/2010/debian-goplay-sm.png)
Another early adopter, making use of some more advanced features, is GoPlay!. It's a graphical user interface to find games. It makes use of debtags to classify games so that you can browse, for example, all 3D action/arcade games related to cars. GoPlay! has even been extended to be a more generic debtags-based package browser and the package now also provides GoLearn!, GoAdmin!, GoNet!, GoOffice!, GoSafe!, and GoWeb!.
Fuss-launcher is an application launcher and not a package browser, but by using apt-xapian-index, it's able to reuse information provided at the package level to make it easier to find installed applications. Package descriptions tend to be more verbose than those embedded in .desktop files. Enrico also showed another nice feature to the audience: if you drag a document onto its window, it will show you a list of applications that can open it.
Last but not least, apt-xapian-index provides a command line search tool that is vastly superior to the traditional apt-cache search: it's axi-cache search (axi stands for apt-xapian-index). Enrico compared the output of a search on the letter "r". While apt-cache spits out a very long list of packages containing this letter somewhere in the description, axi-cache only listed packages related to GNU R. He also demonstrated the contextual tab completion. It makes it easy to use debtags and to refine your search. Once you have typed a first keyword, the tab-completion for the second one only contains keywords or debtags that are actually able to provide more restrictive results. Advanced queries with logical operations (AND, OR, NOT, XOR) are also supported.
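The idea behind contextual completion can be illustrated with a small sketch. This is not how axi-cache is implemented; it is a toy model in plain Python, with made-up keyword sets for each package, that only suggests terms which would strictly narrow the current results. The function names `search` and `completions` are assumptions made for this illustration.

```python
# Toy data: package name -> set of keywords and debtags (all hypothetical).
PACKAGES = {
    "r-base": {"statistics", "language", "field::statistics"},
    "r-cran-lattice": {"statistics", "graphics", "field::statistics"},
    "gnuplot": {"graphics", "plot"},
    "python-scipy": {"statistics", "language"},
}


def search(terms):
    """Return the names of packages that carry every term in `terms`."""
    return {name for name, tags in PACKAGES.items() if set(terms) <= tags}


def completions(terms):
    """Suggest extra terms that strictly narrow the current result set."""
    current = search(terms)
    # Only terms attached to a current result can keep the result non-empty.
    candidates = set().union(*(PACKAGES[name] for name in current)) - set(terms)
    return sorted(
        term for term in candidates
        if 0 < len(search(list(terms) + [term])) < len(current)
    )


if __name__ == "__main__":
    print(completions(["statistics"]))
    # Logical operations map onto set operations: & (AND), | (OR),
    # - (AND NOT), ^ (XOR).
    print(search(["statistics"]) ^ search(["graphics"]))
```

In this model, a term that every current result already carries is never offered, since adding it would not make the result set any smaller.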
Features of the backend
Enrico then dived into the internals. Xapian's search engine is at the root of this infrastructure. He likes it because it's a simple library (i.e. no daemon) and it has nice Python bindings. While apt-xapian-index's core work is to index the descriptions of all the packages, it actually stores much more and can be easily extended with plugins (written in Python).
For instance, the information stored encompasses:
- words appearing in the description of the packages (including the translated descriptions if the user uses a non-English locale);
- their origin;
- their section;
- their size and installed size;
- the time they have been first seen;
- icons, categories, descriptions from the .desktop files they contain (through app-install-data);
- aliases for names of some popular applications that are not available on Linux (for instance "excel" maps to the debtag office::spreadsheet).
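To make the plugin idea more concrete, here is a rough sketch of how several small indexers could each contribute terms for a package, and how an alias such as "excel" could be resolved to a tag at lookup time. It is plain Python and does not use Xapian or the real apt-xapian-index plugin interface; the function names, the `tag:` and `section:` prefixes, and the sample records are all assumptions made for the illustration.

```python
# Hypothetical plugins: each takes a package record (a dict) and returns
# the index terms it wants to attach to that package.
def description_terms(pkg):
    return {word.lower().strip(".,;") for word in pkg["description"].split()}


def section_terms(pkg):
    return {"section:" + pkg["section"]}


def tag_terms(pkg):
    return {"tag:" + tag for tag in pkg.get("tags", ())}


PLUGINS = [description_terms, section_terms, tag_terms]

# Aliases for applications that are not packaged, mapped to a tag term.
ALIASES = {"excel": "tag:office::spreadsheet"}

# Made-up sample records, only loosely modeled on real packages.
SAMPLE = [
    {"name": "gnumeric", "section": "math",
     "description": "Spreadsheet application for GNOME.",
     "tags": ["office::spreadsheet"]},
    {"name": "r-base", "section": "gnu-r",
     "description": "GNU R statistical computation and graphics system",
     "tags": ["field::statistics"]},
]


def build_term_index(packages):
    """Map each term to the set of package names carrying it."""
    index = {}
    for pkg in packages:
        for plugin in PLUGINS:
            for term in plugin(pkg):
                index.setdefault(term, set()).add(pkg["name"])
    return index


def lookup(index, word):
    """Look up one word, following an alias if one is defined."""
    term = ALIASES.get(word.lower(), word.lower())
    return index.get(term, set())


if __name__ == "__main__":
    index = build_term_index(SAMPLE)
    print(lookup(index, "excel"))
```

Adding a new kind of information in this model means appending one more function to `PLUGINS`; the rest of the indexer stays unchanged.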
He already has plans to store more: adding popularity contest data (see wishlist bugs #602180 and #602182) will make it possible to sort query results in a useful way. The most widely used applications are good choices when it comes to community support, and they are likely of better quality due to the larger user base. Adding timestamps of the last installation/upgrade/removal will make it easier to pinpoint a regression to a specific package update.
The generated index is world-readable and can be used from any application provided it can use the Xapian library—which is written in C++ but has bindings for Perl, Python, PHP, Java, Tcl, C#, and Ruby.
Call for experimentation
Enrico believes that many useful applications have yet to be invented on top of apt-xapian-index's features. He's calling for experimentation and asking for new ideas. The only practical limit that he has encountered is the size of the index: currently it varies between 50 MB (Debian unstable without translation) and 70 MB (Debian stable/testing/unstable with one translation). He would like it to not grow over 100 MB since it's installed by default (due to aptitude recommending it) and he's not comfortable with the idea of using more than 20% of the disk footprint of a basic install just for this service. That's why the index was configured to not store the position of the terms: it's thus not possible to find packages whose description contains the word "statistical" immediately followed by the word "computing". You can however find those which have both terms somewhere in their description.
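The trade-off between index size and phrase search can be shown with a toy inverted index. The sketch below is plain Python rather than Xapian, and the two descriptions (including the package name "foo-tools") are invented. Without positions, each term only records which packages contain it, which is enough for an AND query; a phrase query additionally needs to know where each word occurs, so it only works on the larger, positional variant.

```python
def index_descriptions(descriptions, keep_positions):
    """Build term -> {package: [positions]} or term -> {package} postings."""
    index = {}
    for name, text in descriptions.items():
        for position, word in enumerate(text.lower().split()):
            if keep_positions:
                index.setdefault(word, {}).setdefault(name, []).append(position)
            else:
                index.setdefault(word, set()).add(name)
    return index


def match_all(index, words):
    """AND query: packages containing every word, in any order."""
    postings = [set(index.get(word, ())) for word in words]
    return set.intersection(*postings) if postings else set()


def match_phrase(index, first, second):
    """Phrase query: `second` directly after `first` (positional index only)."""
    hits = set()
    for name in match_all(index, [first, second]):
        after_first = {p + 1 for p in index[first][name]}
        if after_first & set(index[second][name]):
            hits.add(name)
    return hits


# Invented descriptions for illustration.
DESCRIPTIONS = {
    "r-base": "gnu r statistical computing system",
    "foo-tools": "computing helpers for statistical work",
}

if __name__ == "__main__":
    flat = index_descriptions(DESCRIPTIONS, keep_positions=False)
    positional = index_descriptions(DESCRIPTIONS, keep_positions=True)
    print(match_all(flat, ["statistical", "computing"]))
    print(match_phrase(positional, "statistical", "computing"))
```

The flat index matches both packages for the two words, while only the positional index can tell that just one of them contains the exact phrase.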
Enrico wondered if apt-xapian-index offers too much freedom. That could explain why few people experimented with it despite his numerous blog posts with code samples and information on how to get started using it. But it's not difficult to imagine use cases for this data. It could be used to extend tools like rc-alert or wnpp-alert, for example. They provide a long list of Debian packages that are looking for some help and are installed on the machine. With apt-xapian-index, it would be possible to restrict the results to the set of packages written in a specific programming language or for a particular desktop environment.
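As a sketch of that use case, the snippet below filters a list of packages needing help down to those carrying given debtags. It does not call rc-alert, wnpp-alert, or apt-xapian-index; the package names and the tag assignments are placeholders, and `restrict` is a hypothetical helper.

```python
# Hypothetical stand-ins: packages an alert tool reported as needing help,
# and the debtags recorded for each installed package.
NEEDS_HELP = ["pkg-a", "pkg-b", "pkg-c"]
DEBTAGS = {
    "pkg-a": {"implemented-in::python", "interface::commandline"},
    "pkg-b": {"implemented-in::c", "uitoolkit::gtk"},
    "pkg-c": {"implemented-in::python", "uitoolkit::gtk"},
}


def restrict(packages, required_tags):
    """Keep only the packages that carry every tag in `required_tags`."""
    wanted = set(required_tags)
    return [name for name in packages if wanted <= DEBTAGS.get(name, set())]


if __name__ == "__main__":
    print(restrict(NEEDS_HELP, ["implemented-in::python"]))
```

A real tool would take the package list from the alert script's output and the tags from the index instead of from hard-coded dictionaries.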
The more likely explanation is that too few people know about the tool. There are many more itches to scratch where apt-xapian-index's features could be very useful, and my guess is that Enrico's wishes will eventually come true.
Brief items
Distribution quotes of the week
openSUSE Medical Version 0.0.6 released
There is a new release of the openSUSE Medical Version available. It includes a long list of specialized software of interest to the medical industry. "TEMPO is open source software for 3D visualization of brain electrical activity. TEMPO accepts EEG file in standard EDF format and creates animated sequence of topographic maps. Topographic maps are generated over 3D head model and user is able to navigate around head and examine maps from different viewpoints."
openSUSE 11.4 Milestone 3
The third milestone in the openSUSE 11.4 development series is available for testing. M3 includes LibreOffice and systemd is available for testing.
Ubuntu Developer Summit proceedings
A terse set of notes from the Ubuntu Developer Summit, recently concluded in Orlando, has been posted. "This page summarizes many of the outcomes of the event, and for each track there is a link to further detailed notes. Please note: these are proceedings and plans, and some of these things may not get completed as planned for whatever reason. As such, please read this list as a set of goals, and not a promise of what Ubuntu 11.04 will include."
Distribution News
Debian GNU/Linux
Squeeze Release Update - Upgrades, deep freeze info, BSPs
The release team has an update on the release of Debian 6.0 "squeeze". Topics include release notes, freeze status, bug squashing parties, and current blocker bugs.
bits from the DPL: sprints, events, delegations, assets
Debian Project Leader Stefano Zacchiroli has a few bits about what he's been up to recently. Topics include the squeeze release, sprints, events, and delegations.
Bits from the Debian Multimedia Maintainers
The Debian Multimedia Maintainers have been busy getting multimedia applications ready for squeeze. Click below to see what's in and what's out.
Debian linux-2.6 Paris meeting
The Debian kernel team had a very productive meeting in Paris recently. Click below for a summary.
Debian Women IRC Training Sessions
The Debian Women project has announced a series of training sessions which will be held on IRC by experienced community members. "The main goal of this initiative is to encourage more people, and specifically women, to contribute to Debian while introducing them to different aspects of the Debian Project. Topics will span over a wide range of subjects related to daily Debian maintenance efforts as well as advanced tasks."
Fedora
A "clarification" from Fedora on the SQLNinja decision
Fedora project leader Jared Smith has sent out a message intending to clarify the Fedora Board's decision to exclude SQLNinja. "Considering these questions against the other security tools that were commonly mentioned in feedback I received (such as tcpdump), it is pretty easy to see how they're different than SQLNinja. I should also note that much of the objections to our decision were against blocking security tools in general, not the SQLNinja package specifically. (In my own limited investigation, I have yet to find a single security professional who was actively using the tool before our decision.)" The question will apparently be revisited at some future time.
Welcoming New Fedora Program Manager Robyn Bergeron
John Poelstra welcomes Robyn Bergeron as the new Fedora Program Manager. "Through the end of 2010 and a little bit beyond, I will be working along side Robyn Bergeron to transition my official Fedora responsibilities to her. This will include getting the Fedora 15 team schedules into shape, feature wrangling, bugzilla maintenance, and any number of other things. Robyn and I are committed to making this transition as smooth, complete and timely as we can, and expect the transition to be completed before the Fedora 15 feature submission deadline."
Fedora 12 end of life
The Fedora Project has sent out a reminder that Fedora 12 will reach end of life on December 2, 2010.
Fedora Board Recap 2010-11-15
Click below for a recap of the November 15 meeting of the Fedora board. Topics include a draft charter for a Community Working Group, elections, and more. You can also see Máirín Duffy's summary of the meeting.
Mandriva Linux
Next Mandriva release dates and schedule
Eugeni Dodonov looks at the schedule for Mandriva 2010.2 (expected December 22) and Mandriva 2011 (expected May 30, 2011). "Starting with Mandriva 2011 release, the release policy for Mandriva will change to 1 release per year. This will allow us to develop even greater releases, and - of course - will give us more time to test, validate and further improve the overall quality of the release."
Ubuntu family
Minutes from the Technical Board meeting, 2010-11-16
Click to see the minutes from the November 16 meeting of the Ubuntu technical board. Topics include KDE micro version, couchdb on lucid, and ARB exception proposal.
Newsletters and articles of interest
Distribution newsletters
- DistroWatch Weekly, Issue 380 (November 15)
- Fedora Weekly News Issue 251 (November 10)
- openSUSE Weekly News, Issue 149 (November 13)
Red Hat Enterprise Linux 6 (The H)
The H has a review of RHEL 6. "Heaps of new features become apparent when comparing the RHEL 6 [2.6.32] kernel with the version 2.6.18 kernel of RHEL 5, although more than a few of them are already old hat in many other distributions. For instance, the Completely Fair Scheduler (CFS) highlighted by Red Hat has been part of the Linux kernel since version 2.6.23. The "tickless" kernel, which stops the timer interrupt from going off a hundred or a thousand times per second when a system is idle, is already well-tested. This trick reduces both the power consumption and the basic load of RHEL 6 systems that operate as virtualised guests, which frees up the host CPU for productive tasks."
Linux Mint 10 'Julia' Is Now Official (PCWorld)
PCWorld reviews Mint 10. "Launched in 2006, Linux Mint has quickly become the third most popular Linux distribution out there behind only Ubuntu and Fedora, and version 10 makes it easy to see why. Based on Ubuntu 10.10, or Maverick Meerkat, Julia offers numerous enhancements that put it at the forefront of usability."
PCLinuxOS Releases a Slew of Quarterly Updates (Linux Journal)
Susan Linton takes a look at PCLinuxOS. "PCLinuxOS is a rolling release distribution, which means users can usually update through the package management rather than perform a fresh install every six months. But a few times a year developers release Quarterly Updates for new users or machines. Recently it was that time again when several varieties of PCLOS saw new releases."
grml, the No-Frills Linux Rescue CD--USB (Linux Planet)
Linux Planet takes a look at grml. "You don't lack for options with grml. The boot menu not only offers the standard options to get into grml, but a FreeDOS option, a minimal BSD (MirOS bsd4grml), PXE boot, hardware detection tool, and Memtest. You also can choose to load grml entirely into RAM in case you need the CD-ROM for something, and it's faster. You can use it on a USB stick instead of a CD. There are several failsafe options if you have trouble booting grml due to incompatible hardware. In short - you have options." (LWN looked at grml back in April 2006.)
Page editor: Rebecca Sobol