Your editor reads a lot of web sites. Quite a lot of web sites. This
reading has generally been a process of stepping through the bookmark list,
checking to see what is new on each of many interesting sites. Actually
going to sites to check for new news has
been an obsolete mode of operation for some time, but your editor can be a
little slow to come around, sometimes. Nonetheless, the nagging feeling
that there had to be a better way eventually got strong enough to inspire
an inquiry into the state of the art in RSS aggregators.
Most sites with news-oriented content export one or more files with
information about the most recently-posted articles; LWN's is over here. An RSS aggregator will grab the
headline files from sites of interest and present them, in some unified
format, to the reader. The result is a single interface to new postings
from a multitude of sites, and an end to the tedious business of plowing
through a long list of bookmarks.
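As a rough illustration of what an aggregator does with these headline files, here is a minimal Python sketch which parses two feeds and merges their items into one list. It assumes plain RSS 2.0 with no XML namespaces; the sample feeds, the site names, and the function names parse_rss() and aggregate() are invented for the example and are not taken from any of the tools reviewed here.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text, source):
    """Return (source, title, link) tuples for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((source, title, link))
    return items


def aggregate(feeds):
    """Merge the items of several feeds (name -> XML text) into a single list."""
    merged = []
    for source, xml_text in feeds.items():
        merged.extend(parse_rss(xml_text, source))
    return merged


# Hypothetical sample data standing in for files fetched from two sites.
SITE_A = """<rss version="2.0"><channel><title>Site A</title>
<item><title>First story</title><link>http://a.example/1</link></item>
<item><title>Second story</title><link>http://a.example/2</link></item>
</channel></rss>"""

SITE_B = """<rss version="2.0"><channel><title>Site B</title>
<item><title>Other news</title><link>http://b.example/1</link></item>
</channel></rss>"""

if __name__ == "__main__":
    for source, title, link in aggregate({"Site A": SITE_A, "Site B": SITE_B}):
        print(f"[{source}] {title} <{link}>")
```

A real aggregator would also have to fetch the files, cope with RSS 1.0 and Atom, and remember which items have been read; none of that is attempted here.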
There is a huge variety of RSS aggregators out there. To narrow things
down, your editor concentrated on standalone utilities with graphical
interfaces. There are some console-based aggregators available, and quite
a few web-based sites and systems. Your editor, believing (hoping) that an
interface designed specifically for the aggregation task will work best,
has chosen to pass over the other approaches for now.
When looking at RSS aggregators, there are a few issues to think about:
- How hard is it to get sites into the tool? Most, but not all,
aggregators can have an RSS feed URL dropped into them, making the
task easy. Just about every aggregator can import a feed list in the
OPML format, which makes switching between them easy.
- Which feed formats are supported? All aggregators can handle most
varieties of RSS; the newer Atom format is not yet as widely
supported.
- How does the tool help with organizing feeds? As the list of feeds
grows long, it is natural to want to organize them into categories.
After all, it does not do to mix those serious, work-oriented sites
with the more frivolous fare (LWN, say).
- Does the tool make it easy to keep up with a large number of feeds? A
tool which makes it easy to pass through a mixed presentation of all
new articles (perhaps limited to a specific category) will be faster
than one which requires each site to be explicitly "opened."
- How does the tool handle updates? LWN's RSS feed accounts for a huge
part of our total traffic, and the situation is probably the same for
other sites. If your aggregator is pulling the feed every ten
minutes, you are helping to create a great deal of wasted traffic.
The defaults for polling intervals should be conservative, and, when
available, the aggregator should use the update time suggestions found
in the feed itself. There is no point in polling the "cute puppy of
the day" site several times each hour.
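The update-interval point can be sketched in a few lines of Python. The example below assumes that the suggestion arrives as the RSS 2.0 <ttl> element (a number of minutes); other mechanisms, such as the syndication module used with RSS 1.0, are not handled. The default and minimum values are assumptions chosen for illustration, not settings taken from any particular aggregator.

```python
import xml.etree.ElementTree as ET

# Assumed values for illustration only.
DEFAULT_INTERVAL_MIN = 60   # used when the feed makes no suggestion
MINIMUM_INTERVAL_MIN = 30   # never poll more often than this


def poll_interval_minutes(xml_text,
                          default=DEFAULT_INTERVAL_MIN,
                          minimum=MINIMUM_INTERVAL_MIN):
    """Pick a polling interval, honoring the feed's <ttl> when it is usable."""
    root = ET.fromstring(xml_text)
    ttl_text = root.findtext("channel/ttl")
    if ttl_text is None:
        return default
    try:
        ttl = int(ttl_text.strip())
    except ValueError:
        # A malformed suggestion is ignored rather than trusted.
        return default
    if ttl <= 0:
        return default
    # Respect the feed's wish to be polled rarely, but enforce the floor.
    return max(ttl, minimum)


# Hypothetical feeds: one asks for a daily check, one asks for too much.
PUPPY_FEED = "<rss version='2.0'><channel><ttl>1440</ttl></channel></rss>"
EAGER_FEED = "<rss version='2.0'><channel><ttl>5</ttl></channel></rss>"
SILENT_FEED = "<rss version='2.0'><channel></channel></rss>"

if __name__ == "__main__":
    for name, feed in [("puppy", PUPPY_FEED), ("eager", EAGER_FEED),
                       ("silent", SILENT_FEED)]:
        print(name, poll_interval_minutes(feed))
```

The floor is a design choice: a feed may ask to be polled less often than the default, but a very short suggestion is not allowed to turn the aggregator into a traffic generator.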
Various other factors come into play as well, as will be seen in the
discussions of the individual tools, below.
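Since OPML import comes up with nearly every tool, a short sketch of what such an import involves may be useful. The Python below assumes the common convention in which a feed is an <outline> element carrying an xmlUrl attribute, and an <outline> without one is a folder; the sample list, its URLs, and the read_opml() name are made up for the example.

```python
import xml.etree.ElementTree as ET


def read_opml(opml_text):
    """Return (folder_path, title, feed_url) tuples from an OPML feed list."""
    root = ET.fromstring(opml_text)
    body = root.find("body")
    feeds = []

    def walk(node, folder):
        for outline in node.findall("outline"):
            name = outline.get("title") or outline.get("text") or ""
            url = outline.get("xmlUrl")
            if url:
                feeds.append((folder, name, url))
            else:
                # An outline without xmlUrl is treated as a folder.
                walk(outline, folder + (name,))

    if body is not None:
        walk(body, ())
    return feeds


# Hypothetical subscription list with two folders and one unfiled feed.
SAMPLE_OPML = """<opml version="1.1">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Work">
      <outline text="Kernel news" type="rss" xmlUrl="http://kernel.example/rss"/>
    </outline>
    <outline text="Fun">
      <outline text="Puppy of the day" type="rss" xmlUrl="http://puppy.example/rss"/>
    </outline>
    <outline text="Unfiled" type="rss" xmlUrl="http://misc.example/atom"/>
  </body>
</opml>"""

if __name__ == "__main__":
    for folder, title, url in read_opml(SAMPLE_OPML):
        print("/".join(folder) or "(top level)", "-", title, "-", url)
```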
Akregator
Akregator is a KDE-based
tool with a reasonably long history. It is able to handle both RSS and
Atom feeds.
Akregator provides a file manager-like navigation pane on the left,
allowing the user to file feeds in a hierarchical system of folders. Each
entry includes the number of unread articles for that feed - a nice feature
that is not provided by all aggregators. Clicking on a folder will display
a mixture of articles from all feeds in that folder. A prominent button
allows the user to mark all articles as being read.
It is also possible to mark articles as being "important." The display can
be filtered (by way of a pulldown menu) so that only important, new, or
unread articles are shown. A search bar at the top can be used to further
limit the results to those matching a given string.
Of the tools reviewed, Akregator is probably the most flexible in how
it can be told to select articles for display.
While most aggregators hand off the task of displaying web pages to a
browser, akregator will, by default, display selected pages internally,
using a tabbed interface. This behavior can be changed, of course, and a
middle-click sends the URL to an external browser in any case.
For some reason, it is not possible to drag a feed URL from firefox and
drop it into an akregator window. So firefox users have to copy-and-paste
the URL into the "new feed" dialog. Dropping a URL from konqueror does
work, however. Feeds can be configured with their own archiving and update
interval preferences; akregator does not appear to use update intervals
supplied with the feeds themselves. If desired, akregator can generate
notifications when new articles are found.
Overall, akregator feels like a quick, flexible, and solid tool; definitely
one of the better aggregators out there.
Blam
Blam is a GNOME-based, C#/Mono application; it would appear to lack a web
site of its own. It is one of the simpler applications, lacking features
found in some of the other aggregators.
The blam left pane is a simple, alphabetical list of feeds; there is no
ability to rearrange or group them. A total count of unread articles is
given, but there is no user-visible per-feed count. (Actually, there is -
but the default width of the left pane hides it). There is no ability to
mix articles from multiple feeds into a single stream. Marking a feed
as read requires accessing a pulldown menu. Unlike almost every other
aggregator, blam sorts articles (by default) from the oldest to the newest.
Formatting of RSS items is done with gecko, with visually pleasing
results. Clicking on a URL displays the page in firefox; there does not
appear to be an option to make blam work with other browsers.
Blam does not automatically poll feeds by default; an explicit user action
is required. If automatic polling is turned on, the default interval is
fifteen minutes, which is rather short. Blam can handle Atom feeds, but
appears unable to work with feeds requiring authentication.
Blam does not appear to be able to
perform notifications, though it does put an icon into the GNOME
notification area.
Overall, your editor's opinion is that blam has some potential and a solid
base for the creation of a powerful tool. But the current version, despite
its 1.8.2 number, is not ready for widespread use.
Liferea
Liferea (the "Linux feed
reader") is a GNOME-based tool with a number of capabilities. It
can handle Atom feeds, and can also handle feeds with enclosures (the sort
normally used with podcasts). Update intervals provided with feeds are
respected (though they can be overridden by the user). Liferea can do
notifications if so desired.
Despite its GNOME origins, Liferea has a large number of configuration
options; only akregator compares on that score. It can be set up to
automatically download enclosures into a user-specified directory, so
those who follow podcasts can find new files waiting for them without having to
explicitly grab them. Liferea can be quickly configured to work with a
large variety of external browsers. Unfortunately, the switch controlling whether
already-read articles are displayed is hidden inside the configuration
dialogs; that adds up to a fair amount of clicking if the user wants to
change the display mode often.
Liferea has a plugin mechanism which can be used to load filters for feeds
of interest. There is a
respectable list of filters, many of which generate specialized RSS
feeds from web sites.
In general, Liferea is a pleasant and powerful tool - arguably the most
advanced of the GNOME-based aggregators.
RSSOwl
RSSOwl is a feed reader written in
Java. Your editor, it must be admitted, felt some trepidation when
yum wanted to download over 120MB of packages to install this
thing, but the investigative spirit cannot balk at such obstacles. So down it
came, along with its vast Java life support system. It's not every RSS
aggregator which requires eclipse just to install.
A quote on the RSSOwl site reads "Simply the best RSS reader. Fast,
lightweight and cross platform." Your editor begs to differ on the
"fast, lightweight" portion of that claim. Not only was RSSOwl not fast,
but, while it was running, nothing on the system was fast. It may
be that, on a different Java platform, things would be different. But, on
your editor's 1GB-memory system, RSSOwl managed to put everything into
full-scale thrash mode.
When first started, RSSOwl maximizes its window, a behavior which your
editor finds to be flat-out rude. Once it gets itself established (and has
been politely told how much screen space it may use), it is a reasonably
capable aggregator. It comes with a long list of built-in feeds, and it
has a search capability for finding more. Your editor, however, needed his
system back and was not able to allow a search to run to completion.
RSSOwl does not, by default, render HTML in article descriptions. This
behavior can be changed, though doing so drags the gecko engine into the mix. Feeds are
grouped hierarchically in the left pane, but it is not possible to mix
articles from multiple feeds. Opening a feed requires a double-click -
RSSOwl is the only aggregator reviewed which requires extra clicks in this
way. Each feed opens in its own tab. The search feature is more capable than
most, with the ability to work with boolean expressions.
For whatever reason, RSSOwl is able to export an RSS feed to a PDF file.
That must be useful to somebody, somewhere.
RSSOwl handles Atom feeds, and it can deal with feeds requiring
authentication. There is also an interface to AmphetaRate, which
can be used to generate recommendations for other sites of interest.
RSSOwl is certainly a capable tool, and it has some unique features. At
its current level of performance, however, it is not particularly usable -
at least on the Fedora platform.
Straw
Straw is a GNOME-based
aggregator written in Python. Its 0.26 version number suggests a young
project, but the first Straw release happened back in 2002. Straw is a
reasonably capable feed reader, but it has a couple of quirks.
One of those is that there is no hierarchical ordering of RSS feeds.
Instead, each feed may be assigned one or more keywords, and the view of
feeds can be restricted to a specific keyword. For added fun, the set of
legal keywords must be managed in a separate dialog; until a keyword has
been officially created in this manner, Straw will not acknowledge its
existence. Once the keywords have been established, the left-pane view can
be restricted to any one keyword.
Browsing through feeds is reasonably quick, once one gets the hang of
Straw's keyboard bindings, which use a lot of upper-case characters. If
one types lower-case keystrokes at the Straw
window, the reward is an unlabeled text entry field which materializes
toward the bottom of the screen; experimentation shows that this field can
be used to move directly to a feed by typing its name. There is no way to
mix articles from multiple feeds.
Straw does allow the configuration of per-feed update intervals, though it
does not appear to use feed-supplied intervals. There is a reasonable
search capability, but the resulting window behaves a bit strangely.
Articles from multiple feeds will appear there, but the normal keyboard
commands will not step through them - it is necessary to use the mouse.
Despite its relatively long history, Straw feels unfinished to your
editor. There are enough questionable user interface decisions to make
Straw relatively difficult to use - though somebody, clearly, likes it that
way.
Sage
There are a few RSS aggregators which have been implemented as Firefox
extensions, but the most advanced of those appears to be Sage. This aggregator is well
integrated into the browser, which does present certain advantages.
The Sage screen has three panes. The left column contains a hierarchical
list of subscribed feeds above a window containing a list of headlines from
the currently-selected feed. The bulk of the window, however, contains a
"newspaper style" rendering of the feed text in a somewhat strange
two-column layout with a fair amount of empty space. Clicking on a title
will pull up the full page. Sage allows the organization of this window to
be changed by way of style sheets; predictably, a fair number of
customized style sheets are available.
Sage's feed discovery feature is nice: bring up a site of interest and
click on the little magnifying glass icon. The Sage code will dig through
the page and present any feeds it finds, allowing the user to subscribe to
any or all of them. No more time spent looking for that little "XML" icon.
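How this sort of discovery can work is easy to sketch. The Python below is not Sage's code; it assumes the common convention in which a page advertises its feeds with <link rel="alternate"> elements carrying an RSS or Atom MIME type. The sample page, its URLs, and the discover_feeds() name are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used to advertise feeds in a page's <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect (title, absolute URL) pairs for feeds advertised in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        href = attr.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Relative links are resolved against the page's own URL.
            self.feeds.append((attr.get("title") or "",
                               urljoin(self.base_url, href)))


def discover_feeds(html_text, base_url):
    finder = FeedLinkFinder(base_url)
    finder.feed(html_text)
    return finder.feeds


# Hypothetical page advertising two feeds and one unrelated stylesheet.
SAMPLE_PAGE = """<html><head>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="Headlines" href="/headlines.rss">
<link rel="alternate" type="application/atom+xml" title="Comments" href="http://feeds.example/comments.atom" />
</head><body>News</body></html>"""

if __name__ == "__main__":
    for title, url in discover_feeds(SAMPLE_PAGE, "http://news.example/index.html"):
        print(title, url)
```

Pages which only link to a feed through an icon in the body would be missed by this approach; a more thorough tool would have to look at ordinary links as well.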
There does not appear to be any option allowing the configuration of update
intervals. Sage is not able to display a mixture of feeds on a single
screen. There is also no ability to search for strings in feed text
(though the normal Firefox search mechanism can be used in the article
display screen).
Sage is a slick and well-developed product, and there is real value in
integrating the aggregator into the browser. If nothing else, there's one
less window hanging around and cluttering up the screen. Still, the task
of displaying a page is somewhat different from that of finding pages to
look at in the first place. A tool which maintains its focus on the latter
task should be able to provide a better interface than the Swiss army knife
approach of cramming all of the tools into a single package.
Conclusion
On that note, one might well ask: how well do the current tools work at
enabling us to find the articles of interest to us, quickly? The current
readers have some nice features, and your editor favors akregator and
liferea as the ones which are the most productive at this time. If your
purpose is to keep up with the latest from a variety of news sites, either
of those applications will do the job nicely.
Your editor can't help but feel that much of the RSS and aggregation
technology we are seeing now is just a stage in a longer transition, however. The net is
not just about dispatches from news sites. People are using web logs, RSS
feeds, "planet" sites and aggregator software in an attempt to organize,
follow, and participate in conversations. When evaluated for that purpose,
current RSS aggregators have quite a bit of ground to cover. Don Marti has
written some
worthwhile comments on this topic.
So there is some ground to be covered, yet. And that, in turn, suggests
that having a number of active development projects in this area is a good
thing. If the developers behind these applications can go beyond mere
aggregation, they stand a good chance of creating a new and powerful interface
to the net and the discussions taking place there. Your editor, while
pleased with the state of these tools as they exist now, is looking forward
to where they will go from here.
March 29, 2006
This article was contributed by Glyn Moody
A previous LWN.net
feature examined the
parallels between open source and open access, which strives for the free
online availability of the academic knowledge distilled into research
papers. Although it has some particular characteristics of its own, open
access can be considered part of a wider move to gain free online access to
general digital content.
The roots of this open content movement, as it came to be called, go back to
before the Internet existed, and when even computers were relatively rare
beasts. In 1971, the year Richard
Stallman joined the MIT AI Lab, Michael Hart
was given an operator's account on a Xerox
Sigma V mainframe at the
University of Illinois. Since he estimated this computer time had a nominal
worth of $100 million, he felt he had an obligation to repay this generosity
by using it to create something of comparable and lasting value.
His solution was to type in the US Declaration of Independence, roughly 5K
of ASCII, and to attempt to send it to everyone on ARPANET (fortunately,
this trailblazing attempt at spam failed). His insight was that once turned
from analogue to digital form, a book could be reproduced endlessly for
almost zero additional cost - what Hart termed "Replicator Technology". By
converting printed texts into etexts, he was able to create something whose
potential aggregate value far exceeded even the heady figure he put on the
computing time he used to generate it.
Hart chose the name "Project Gutenberg" for this body of etexts, making a
bold claim that they represented the start of something as epoch-making as
the original Gutenberg revolution. Indeed, he goes further: he sees the
original Gutenberg as the well-spring of the Industrial Revolution, and his
own project as the precursor of the next Industrial Revolution, where
Replicator Technology will be applied not just to digital entities - as with
Project Gutenberg - but to analogue ones too.
The Replicator idea is similar to one of the key defining characteristics of
free software: that it can be copied endlessly, at almost no marginal cost.
Hart's motivation for this move - the creation of a huge permanent store of
human knowledge - is very different from Stallman's reason for starting the
GNU project, which is powered by his commitment to spreading freedom. But on
the Project Gutenberg site, there
is a
discussion about the ambiguity of the
word "free" that could come straight from Stallman: "The word free in the
English language does not distinguish between free of charge and freedom.
.... Fortunately almost all Project Gutenberg ebooks are free of charge and
free as in freedom."
There are other interesting parallels between the two men. After they had
their respective epiphanies, both labored almost entirely alone to begin
with - Hart entering page after page of books into a computer, and Stallman
coding the first few programs of the GNU project. Even 20 years after
Project Gutenberg had begun, Hart had only created 10 ebooks (today, the
figure is 17,000). Given the dedication required, it is no surprise that
both are driven men, sustained by their sense of moral duty and of the
unparalleled possibilities for changing the world that the digital realm
offers.
Both, too, were aided enormously as the Internet grew and spread, since it
allowed the two projects to adopt a distributed approach for their work. In
the case of Project Gutenberg, this was formalized with the foundation of
the Distributed Proofreaders team in
October 2000; since then - and thanks in part to a Slashdotting in November
2002 - hundreds of books are being turned into ebooks every month.
Moreover, just as free software paid back the debt by creating programs that
pushed Internet adoption to even higher levels, so Project Gutenberg
returned the compliment by making key early titles like "Zen and the Art of
the Internet" (June 1992) and "The
Hitchhikers Guide to the Internet"
(September 1992) available to help new Internet users find their way around.
The Internet was also the perfect low-cost distribution medium for the
digital creations of Hart and Stallman. After starting out at the University
of Illinois, Project Gutenberg was mirrored at the University of North
Carolina, under the auspices of Paul Jones,
one of the pioneers in facilitating free access to all kinds of digital
files. In 1992, SunSITE was launched there, designed as "a central
repository for a collection of public-domain software, shareware and other
electronic material such as research articles and electronic images"
according to the press release of the time. SunSITE became
iBiblio.org in 2000 (after briefly turning
into MetaLab in 1998), and received a $4
million grant from the Center for the Public Domain, set up by Red Hat
co-founders Bob Young and Marc Ewing. Over time, iBiblio became Project
Gutenberg's official host and primary distribution site.
To the collection of open content at SunSITE was soon added an early
GNU/Linux archive, managed
successively by Jonathan Magid, Erik Troan, and Eric Raymond. Given this
close association between SunSITE and GNU/Linux, it was only natural that it
became the host for the Linux Documentation Project (LDP)
when it was founded in 1992 by Matt Welsh, and this soon grew into another
important early collection of free content. The LDP began with the Linux
FAQ, and expanded to include a kernel hackers guide and system administrator
guide when Michael K. Johnson and Lars Wirzenius joined the project. These
texts were originally created in LaTeX, but documentation later appeared in
the then-new HTML. Around the same time, in April 1993, there were
discussions between people like Tim Berners-Lee, Guido van Rossum and Nathan
Torkington about the idea of working with Project Gutenberg to distribute
HTML versions of its etexts, in part, presumably, to use the
well-established Project Gutenberg to help promote the fledgling Web format.
An early concern about the LDP materials was that they might be published
commercially without permission. To avoid this, a fairly restrictive license
was employed, which allowed reproduction in electronic or printed form, but
only non-commercially, and without modifications. This was later relaxed,
and the current license allows derivative
works. This issue of whether to allow changes has been a vexed one from the
earliest days of online content: what were probably the first digital
documents available on a network, the RFCs (which first appeared in 1969,
even before ARPANET), had also forbidden modifications.
Since Project Gutenberg's materials are almost exclusively drawn from the
public domain (a few copyrighted works have been included with the author's
permission), it might be expected that the
license would allow any kind of
use, including modifications. However, it imposes a
number of conditions on those who wish to use the name Project Gutenberg in
the ebooks they distribute; in this case, only verbatim copies are
permitted, and commercial distributors must pay royalties. If all
references to the Project are stripped out, leaving the bare text, the
latter can be used in any way.
One other condition for etexts distributed under the Project Gutenberg name
is worth noting. The license stipulates:
if you provide access to or distribute copies of a Project
Gutenberg work in a format other than "Plain Vanilla ASCII" or
other format used in the official version posted on the official
Project Gutenberg-tm web site (www.gutenberg.net), you must, at no
additional cost, fee or expense to the user, provide a copy, a means
of exporting a copy, or a means of obtaining a copy upon request, of
the work in its original "Plain Vanilla ASCII" or other form.
Just as the GPL does for software, the Project Gutenberg license insists
that the "source code" of etexts distributed in non-ASCII formats be freely
available.
In fact, an explicit connection between Project Gutenberg and free software
is to be found at the top of every page on the Project Gutenberg Web site, which
offers thanks to those who wrote the programs which the site employs -
GNU/Linux, Apache, PostgreSQL, PHP, Perl and Python - and a link to the Free
Software Foundation.
Licensing proved to be the crucial issue for freely-available materials, and
it was only when it was fully resolved that open content really began to
take off. The next feature in this series will look at how that happened,
and what some of the immediate consequences were.
Glyn Moody writes about open source and open content at
opendotdotdot.
Page editor: Jonathan Corbet