Leading items
The Grumpy Editor's guide to RSS aggregators
This article is part of the LWN Grumpy Editor series.
Most sites with news-oriented content export one or more files with information about the most recently-posted articles; LWN's is over here. An RSS aggregator will grab the headline files from sites of interest and present them, in some unified format, to the reader. The result is a single interface to new postings from a multitude of sites, and an end to the tedious business of plowing through a long list of bookmarks.
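For those who have never looked inside one of these headline files, a short sketch may help show what an aggregator actually does with them: fetch the file, parse it, and present the entries. The Python feedparser module is used here purely for illustration - it is not necessarily what any of the tools reviewed below use internally - and any feed URL can stand in for LWN's.

```python
# Fetch and parse a headline file, then list its entries - roughly the core
# loop of any aggregator. feedparser is used only as a convenient example.
import feedparser

feed = feedparser.parse("https://lwn.net/headlines/rss")

print(feed.feed.title)               # the site's name, as given in the feed
for entry in feed.entries:           # one entry per recently-posted article
    print(entry.title, entry.link)
```

Everything an aggregator adds beyond that - organization, unread tracking, polling policy - is what distinguishes the tools reviewed below.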
There is a huge variety of RSS aggregators out there. To narrow things down, your editor concentrated on standalone utilities with graphical interfaces. There are some console-based aggregators available, and quite a few web-based sites and systems. Your editor, believing (hoping) that an interface designed specifically for the aggregation task will work best, has chosen to pass over the other approaches for now.
When looking at RSS aggregators, there are a few issues to think about:
- How hard is it to get sites into the tool? Most, but not all, aggregators can have an RSS feed URL dropped into them, making the task easy. Just about every aggregator can import a feed list in the OPML format (a small example appears at the end of this introduction), which makes switching between them painless.
- Which feed formats are supported? All aggregators can handle most
varieties of RSS; the newer Atom format is not yet as widely
supported.
- How does the tool help with organizing feeds? As the list of feeds
grows long, it is natural to want to organize them into categories.
After all, it does not do to mix those serious, work-oriented sites
with the more frivolous fare (LWN, say).
- Does the tool make it easy to keep up with a large number of feeds? A tool which allows the reader to pass through a mixed presentation of all new articles (perhaps limited to a specific category) will be faster than one which requires each site to be explicitly "opened."
- How does the tool handle updates? LWN's RSS feed accounts for a huge part of our total traffic, and the situation is probably the same for other sites. If your aggregator is pulling the feed every ten minutes, you are helping to create a great deal of wasted traffic. The defaults for polling intervals should be conservative, and, when available, the aggregator should use the update time suggestions found in the feed itself. There is no point in polling the "cute puppy of the day" site several times each hour.
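To make that last point concrete: RSS feeds can carry update hints of their own (the <ttl> element, or the syndication module's sy:updatePeriod and sy:updateFrequency fields), and HTTP offers conditional requests. The sketch below - a minimal illustration, not code taken from any of the tools reviewed here - shows the conditional-request side of polite polling: remember the ETag and Last-Modified values from the previous fetch and send them back, so that an unchanged feed costs the server only a small "304 Not Modified" response.

```python
# A minimal sketch of polite feed polling using HTTP conditional requests.
import urllib.error
import urllib.request

def poll_feed(url, etag=None, last_modified=None):
    """Fetch a feed only if it has changed since the previous poll."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            # The feed changed; save the new validators for the next poll.
            return (response.read(),
                    response.headers.get("ETag"),
                    response.headers.get("Last-Modified"))
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Nothing new; no feed body was transferred at all.
            return None, etag, last_modified
        raise
```

An aggregator which stores those validators alongside each feed, and which backs off to the interval the feed itself suggests, is far easier on busy servers.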
Various other factors come into play as well, as will be seen in the discussions of the individual tools, below.
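Before getting to the tools themselves, a quick look at the OPML lists mentioned above may be useful. The sketch below shows a tiny, made-up subscription list and one way an aggregator might pull the feed URLs out of it; real exports carry more attributes (titles, htmlUrl, dates and so on), but the basic structure is the same.

```python
# A minimal, made-up OPML subscription list and a helper to extract its feed
# URLs. The URLs are examples only.
import xml.etree.ElementTree as ET

SAMPLE_OPML = """<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.1">
  <head><title>Exported subscriptions</title></head>
  <body>
    <outline text="News">
      <outline text="LWN" type="rss" xmlUrl="https://lwn.net/headlines/rss"/>
      <outline text="Frivolous fare" type="rss"
               xmlUrl="https://example.org/puppy-of-the-day.rss"/>
    </outline>
  </body>
</opml>"""

def feed_urls(opml_text):
    """Return the feed URLs found in an OPML document."""
    root = ET.fromstring(opml_text)
    # Feeds are <outline> elements carrying an xmlUrl attribute; outlines
    # without one are just folders used for grouping.
    return [outline.get("xmlUrl")
            for outline in root.iter("outline")
            if outline.get("xmlUrl")]

print(feed_urls(SAMPLE_OPML))
```

Since nearly every aggregator can read and write this format, a reader's subscription list is never held hostage by any one tool.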
Akregator
Akregator is a KDE-based
tool with a reasonably long history. It is able to handle both RSS and
Atom feeds.
Akregator provides a file manager-like navigation pane on the left, allowing the user to file feeds in a hierarchical system of folders. Each entry includes the number of unread articles for that feed - a nice feature that is not provided by all aggregators. Clicking on a folder will display a mixture of articles from all feeds in that folder. A prominent button allows the user to mark all articles as being read. It is also possible to mark articles as being "important." The display can be filtered (by way of a pulldown menu) so that only important, new, or unread articles are shown. A search bar at the top can be used to further limit the results to those matching a given string. Of the tools reviewed, Akregator is probably the most flexible in how it can be told to select articles for display.
While most aggregators hand off the task of displaying web pages to a browser, akregator will, by default, display selected pages internally, using a tabbed interface. This behavior can be changed, of course, and a middle-click sends the URL to an external browser in any case.
For some reason, it is not possible to drag a feed URL from firefox and drop it into an akregator window. So firefox users have to copy-and-paste the URL into the "new feed" dialog. Dropping a URL from konqueror does work, however. Feeds can be configured with their own archiving and update interval preferences; akregator does not appear to use update intervals supplied with the feeds themselves. If desired, akregator can generate notifications when new articles are found.
Overall, akregator feels like a quick, flexible, and solid tool; definitely one of the better aggregators out there.
Blam
Blam is a GNOME-based, C#/Mono application; it would appear to lack a web
site of its own. It is one of the simpler applications, lacking features
found in some of the other aggregators.
The blam left pane is a simple, alphabetical list of feeds; there is no ability to rearrange or group them. A total count of unread articles is given, but there is no user-visible per-feed count. (Actually, there is - but the default width of the left pane hides it). There is no ability to mix articles from multiple feeds into a single stream. Marking a feed as read requires accessing a pulldown menu. Unlike almost every other aggregator, blam sorts articles (by default) from the oldest to the newest.
Formatting of RSS items is done with gecko, with visually pleasing results. Clicking on a URL displays the page in firefox; there does not appear to be an option to make blam work with other browsers.
Blam does not automatically poll feeds by default; an explicit user action is required. If automatic polling is turned on, the default interval is fifteen minutes, which is rather short. Blam can handle Atom feeds, but appears unable to work with feeds requiring authentication. Blam does not appear to be able to perform notifications, though it does put an icon into the GNOME notification area.
Overall, your editor's opinion is that blam has some potential and a solid base for the creation of a powerful tool. But the current version, despite its 1.8.2 number, is not ready for widespread use.
Liferea
Liferea (the "Linux feed
reader") is a GNOME-based tool with a number of capabilities. It
can handle Atom feeds, and can also handle feeds with enclosures (the sort
normally used with podcasts). Update intervals provided with feeds are
respected (though they can be overridden by the user). Liferea can do
notifications if so desired.
Despite its GNOME origins, Liferea has a large number of configuration options; only akregator compares on that score. It can be set up to automatically download enclosures into a user-specified directory, so those who follow podcasts can find new files waiting for them without having to explicitly grab them. Liferea can be quickly configured to work with a large variety of external browsers. Unfortunately, the switch controlling whether already-read articles are displayed is hidden inside the configuration dialogs; that adds up to a fair amount of clicking if the user wants to change the display mode often.
Liferea has a plugin mechanism which can be used to load filters for feeds of interest. There is a respectable list of filters, many of which generate specialized RSS feeds from web sites.
In general, Liferea is a pleasant and powerful tool - arguably the most advanced of the GNOME-based aggregators.
RSSOwl
RSSOwl is a feed reader written in Java. Your editor, it must be admitted, felt some trepidation when
yum wanted to download over 120MB of packages to install this
thing, but the investigative spirit cannot balk at such obstacles. So down it
came, along with its vast Java life support system. It's not every RSS
aggregator which requires eclipse just to install.
A quote on the RSSOwl site reads "Simply the best RSS reader. Fast, lightweight and cross platform". Your editor begs to differ on the
"fast, lightweight" portion of that claim. Not only was RSSOwl not fast,
but, while it was running, nothing on the system was fast. It may
be that, on a different Java platform, things might be different. But, on
your editor's 1GB-memory system, RSSOwl managed to put everything into
full-scale thrash mode.
When first started, RSSOwl maximizes its window, a behavior which your editor finds to be flat-out rude. Once it gets itself established (and has been politely told how much screen space it may use), it is a reasonably capable aggregator. It comes with a long list of built-in feeds, and it has a search capability for finding more. Your editor, however, needed his system back and was not able to allow a search to run to completion.
RSSOwl does not, by default, render HTML in article descriptions. This behavior can be changed, though doing so drags the gecko engine into the mix. Feeds are grouped hierarchically in the left pane, but it is not possible to mix articles from multiple feeds. Opening a feed requires a double-click - RSSOwl is the only aggregator reviewed which requires extra clicks in this way. Each feed opens in its own tab. The search feature is more capable than most, with the ability to work with boolean expressions.
For whatever reason, RSSOwl is able to export an RSS feed to a PDF file. That must be useful to somebody, somewhere.
RSSOwl handles Atom feeds, and it can deal with feeds requiring authentication. There is also an interface to AmphetaRate, which can be used to generate recommendations for other sites of interest.
RSSOwl is certainly a capable tool, and it has some unique features. At its current level of performance, however, it is not particularly usable - at least on the Fedora platform.
Straw
Straw is a GNOME-based
aggregator written in Python. Its 0.26 version number suggests a young
project, but the first Straw release happened back in 2002. Straw is a
reasonably capable feed reader, but it has a couple of quirks.
One of those is that there is no hierarchical ordering of RSS feeds. Instead, each feed may be assigned one or more keywords, and the view of feeds can be restricted to a specific keyword. For added fun, the set of legal keywords must be managed in a separate dialog; until a keyword has been officially created in this manner, Straw will not acknowledge its existence. Once the keywords have been established, the left-pane view can be restricted to any one keyword.
Browsing through feeds is reasonably quick, once one gets the hang of Straw's keyboard bindings, which use a lot of upper-case characters. If one types lower-case keystrokes at the Straw window, the reward is an unlabeled text entry field which materializes toward the bottom of the screen; experimentation shows that this field can be used to move directly to a feed by typing its name. There is no way to mix articles from multiple feeds.
Straw does allow the configuration of per-feed update intervals, though it does not appear to use feed-supplied intervals. There is a reasonable search capability, but the resulting window behaves a bit strangely. Articles from multiple feeds will appear there, but the normal keyboard commands will not step through them - it is necessary to use the mouse.
Despite its relatively long history, Straw feels unfinished to your editor. There are enough questionable user interface decisions to make Straw relatively difficult to use - though somebody, clearly, likes it that way.
Sage
There are a few RSS aggregators which have been implemented as Firefox
extensions, but the most advanced of those appears to be Sage. This aggregator is well
integrated into the browser, which does present certain advantages.
The Sage screen has three panes. The left column contains a hierarchical list of subscribed feeds above a window containing a list of headlines from the currently-selected feed. The bulk of the window, however, contains a "newspaper style" rendering of the feed text in a somewhat strange two-column layout with a fair amount of empty space. Clicking on a title will pull up the full page. Sage allows the organization of this window to be changed by way of style sheets; predictably, a fair number of customized style sheets are available.
Sage's feed discovery feature is nice: bring up a site of interest and click on the little magnifying glass icon. The Sage code will dig through the page and present any feeds it finds, allowing the user to subscribe to any or all of them. No more time spent looking for that little "XML" icon.
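For the curious, this sort of discovery conventionally works by looking for <link rel="alternate"> elements in the page's header. The sketch below illustrates the general mechanism; it is not Sage's actual code.

```python
# A sketch of feed autodiscovery: collect the feed URLs a page advertises
# through <link rel="alternate"> elements. Not Sage's implementation.
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedFinder(HTMLParser):
    """Collect the feed URLs advertised by a page's <link> elements."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower()
        mime = (attrs.get("type") or "").lower()
        if rel == "alternate" and mime in FEED_TYPES and attrs.get("href"):
            self.feeds.append(attrs["href"])

# Usage: feed the page source to the parser, then read back the results.
finder = FeedFinder()
finder.feed('<link rel="alternate" type="application/rss+xml" '
            'href="https://example.org/feed.rss">')
print(finder.feeds)    # ['https://example.org/feed.rss']
```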
There does not appear to be any option allowing the configuration of update intervals. Sage is not able to display a mixture of feeds on a single screen. There is also no ability to search for strings in feed text (though the normal Firefox search mechanism can be used in the article display screen).
Sage is a slick and well-developed product, and there is real value in integrating the aggregator into the browser. If nothing else, there's one less window hanging around and cluttering up the screen. Still, the task of displaying a page is somewhat different from that of finding pages to look at in the first place. A tool which maintains its focus on the latter task should be able to provide a better interface than the Swiss army knife approach of cramming all of the tools into a single package.
Conclusion
On that note, one might well ask: how well do the current tools help us find the articles that interest us, quickly? The current readers have some nice features, and your editor favors akregator and liferea as the most productive of the lot at this time. If your purpose is to keep up with the latest from a variety of news sites, either of those applications will do the job nicely.
Your editor can't help but feel that much of the RSS and aggregation technology we are seeing now is just a stage in a longer transition, however. The net is not just about dispatches from news sites. People are using web logs, RSS feeds, "planet" sites and aggregator software in an attempt to organize, follow, and participate in conversations. When evaluated for that purpose, current RSS aggregators have quite a bit of ground to cover. Don Marti has written some worthwhile comments on this topic.
So there is some ground to be covered, yet. And that, in turn, suggests that having a number of active development projects in this area is a good thing. If the developers behind these applications can go beyond mere aggregation, they stand a good chance of creating a new and powerful interface to the net and the discussions taking place there. Your editor, while pleased with the state of these tools as they exist now, is looking forward to where they will go from here.
Gutenberg 2.0: the birth of open content
A previous LWN.net feature examined the parallels between open source and open access, which strives for the free online availability of the academic knowledge distilled into research papers. Although it has some particular characteristics of its own, open access can be considered part of a wider move to gain free online access to general digital content.
The roots of this open content movement, as it came to be called, go back to a time before the Internet existed, when even computers were relatively rare beasts. In 1971, the year Richard Stallman joined the MIT AI Lab, Michael Hart was given an operator's account on a Xerox Sigma V mainframe at the University of Illinois. Since he estimated this computer time had a nominal worth of $100 million, he felt he had an obligation to repay this generosity by using it to create something of comparable and lasting value.
His solution was to type in the US Declaration of Independence, roughly 5K of ASCII, and to attempt to send it to everyone on ARPANET (fortunately, this trailblazing attempt at spam failed). His insight was that once turned from analogue to digital form, a book could be reproduced endlessly for almost zero additional cost - what Hart termed "Replicator Technology". By converting printed texts into etexts, he was able to create something whose potential aggregate value far exceeded even the heady figure he put on the computing time he used to generate it.
Hart chose the name "Project Gutenberg" for this body of etexts, making a bold claim that they represented the start of something as epoch-making as the original Gutenberg revolution. Indeed, he goes further: he sees the original Gutenberg as the well-spring of the Industrial Revolution, and his own project as the precursor of the next Industrial Revolution, where Replicator Technology will be applied not just to digital entities as with Project Gutenberg but to analogue ones too.
The Replicator idea is similar to one of the key defining characteristics of
free software: that it can be copied endlessly, at almost no marginal cost.
Hart's motivation for this move - the creation of a huge permanent store of human knowledge - is very different from Stallman's reason for starting the GNU project, which is powered by his commitment to spreading freedom. But on the Project Gutenberg site, there is a discussion about the ambiguity of the word "free" that could come straight from Stallman: "The word free in the English language does not distinguish between free of charge and freedom. .... Fortunately almost all Project Gutenberg ebooks are free of charge and free as in freedom."
There are other interesting parallels between the two men. After they had their respective epiphanies, both labored almost entirely alone to begin with - Hart entering page after page of books into a computer, and Stallman coding the first few programs of the GNU project. Even 20 years after Project Gutenberg had begun, Hart had only created 10 ebooks (today, the figure is 17,000). Given the dedication required, it is no surprise that both are driven men, sustained by their sense of moral duty and of the unparalleled possibilities for changing the world that the digital realm offers.
Both, too, were aided enormously as the Internet grew and spread, since it allowed the two projects to adopt a distributed approach for their work. In the case of Project Gutenberg, this was formalized with the foundation of the Distributed Proofreaders team in October 2000; since then - and thanks in part to a Slashdotting in November 2002 - hundreds of books are being turned into ebooks every month.
Moreover, just as free software paid back the debt by creating programs that pushed Internet adoption to even higher levels, so Project Gutenberg returned the compliment by making key early titles like "Zen and the Art of the Internet" (June 1992) and "The Hitchhikers Guide to the Internet" (September 1992) available to help new Internet users find their way around.
The Internet was also the perfect low-cost distribution medium for the digital creations of Hart and Stallman. After starting out at the University of Illinois, Project Gutenberg was mirrored at the University of North Carolina, under the auspices of Paul Jones, one of the pioneers in facilitating free access to all kinds of digital files. In 1992, SunSITE was launched there, designed as "a central repository for a collection of public-domain software, shareware and other electronic material such as research articles and electronic images" according to the press release of the time. SunSITE became iBiblio.org in 2000 (after briefly turning into MetaLab in 1998), and received a $4 million grant from the Center for the Public Domain, set up by Red Hat co-founders Bob Young and Marc Ewing. Over time, iBiblio became Project Gutenberg's official host and primary distribution site.
To the collection of open content at SunSITE was soon added an early GNU/Linux archive, managed successively by Jonathan Magid, Erik Troan, and Eric Raymond. Given this close association between SunSITE and GNU/Linux, it was only natural that it became the host for the Linux Documentation Project (LDP) when it was founded in 1992 by Matt Welsh, and this soon grew into another important early collection of free content. The LDP began with the Linux FAQ, and expanded to include a kernel hackers guide and system administrator guide when Michael K. Johnson and Lars Wirzenius joined the project. These texts were originally created in LaTeX, but documentation later appeared in the then-new HTML. Around the same time, in April 1993, there were discussions between people like Tim Berners-Lee, Guido van Rossum and Nathan Torkington about the idea of working with Project Gutenberg to distribute HTML versions of its etexts, in part, presumably, to use the well-established Project Gutenberg to help promote the fledgling Web format.
An early concern about the LDP materials was that they might be published commercially without permission. To avoid this, a fairly restrictive license was employed, which allowed reproduction in electronic or printed form, but only non-commercially, and without modifications. This was later relaxed, and the current license allows derivative works. This issue of whether to allow changes has been a vexed one from the earliest days of online content: what were probably the first digital documents available on a network, the RFCs (which first appeared in 1969, even before ARPANET), had also forbidden modifications.
Since Project Gutenberg's materials are almost exclusively drawn from the public domain (a few copyrighted works have been included with the author's permission), it might be expected that the license would allow any kind of use, including modifications. However, it imposes a number of conditions on those who wish to use the name Project Gutenberg in the ebooks they distribute; in this case, only verbatim copies are permitted, and commercial distributors must pay royalties. If all references to the Project are stripped out, leaving the bare text, the latter can be used in any way.
One other condition for etexts distributed under the Project Gutenberg name is worth noting: just as the GPL does for software, the Project Gutenberg license insists that the "source code" of etexts distributed in non-ASCII formats be freely available.
In fact, an explicit connection between Project Gutenberg and free software is to be found at the top of every page on the Project Gutenberg Web site, which offers thanks to those who wrote the programs which the site employs - GNU/Linux, Apache, PostgreSQL, PHP, Perl and Python - and a link to the Free Software Foundation.
Licensing proved to be the crucial issue for freely-available materials, and it was only when it was fully resolved that open content really began to take off. The next feature in this series will look at how that happened, and what some of the immediate consequences were.
Glyn Moody writes about open source and open content at opendotdotdot.