LWN.net Weekly Edition for February 23, 2006
The Grumpy Editor's guide to bayesian spam filters
This article is part of the LWN Grumpy Editor series.
In recent years, much of the development energy in anti-spam circles has gone into bayesian filters. The bayesian approach was kicked off in 2002 with the publication of Paul Graham's A plan for spam. In its simplest form, a bayesian filter keeps track of words found in email messages and, for each word, a count of how many times that word appeared in spam and in legitimate mail ("ham"). Over time, these statistics can be used to examine incoming mail and come up with a probability that each is spam.
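The word-counting idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any filter reviewed here: the tokenizer, the add-one smoothing, and the names `tokenize` and `WordCountFilter` are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: real filters use far more elaborate rules.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]+")


def tokenize(text):
    """Split a message into lowercase word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


class WordCountFilter:
    """Per-word spam/ham counts combined naive-Bayes style."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each distinct word once per message.
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def spam_probability(self, text):
        # Sum log odds to avoid underflow on long messages.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for word in set(tokenize(text)):
            # Add-one smoothing keeps unseen words from zeroing the result.
            p_spam = (self.counts["spam"][word] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][word] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so that exp() cannot overflow on extreme scores.
        log_odds = max(-500.0, min(500.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Training amounts to calling `train(text, "spam")` or `train(text, "ham")` for each message; `spam_probability()` then returns a value between 0 and 1 for new mail.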
Bayesian filters have proved to be surprisingly effective. A well-trained filter can catch a high proportion of incoming spam with a very low false positive rate. The filter will adapt to each user's particular email stream, which is an important feature. It should not be surprising that different people have wildly different legitimate email. It turns out, though, that the spam stream can also vary quite a bit. An account which looks like it could belong to a woman, for example, will tend to get messages offering to alter the sizes of different parts of the recipient's anatomy than a man's account. So the ability to tune a filter to a specific mail stream - ham and spam - will increase its accuracy.
There is quite a large selection of free bayesian filters out there. Your editor decided to have a look around to see if there is any reason to prefer one over the others. To that end, a number of characteristics were examined:
- Accuracy. A filter which does not accurately classify mail will not
be of much use, so good results in this area are required. In
particular, false positives (legitimate mail classified as spam) must
be avoided.
- Training. Bayesian filters must be trained before they become
effective. Some filters, it turns out, are easier to train than
others. In general, the training process is somewhat like
house-training a puppy: it's a painful process, involving contact with
unpleasant materials, and with a messy failure mode. And, somewhere
in the process, something you care about is likely to get chewed up.
So, in general, this is a process which one would like to be done
with quickly and not have to do again later on.
There are people who lovingly tweak and tune their spam filters the way an automobile enthusiast tweaks his car. Your editor is not one of those people. Life is too short - and too busy - to spend a lot of time screwing around with spam filters.
- Speed. The difference in performance between the fastest and slowest
filters covers two orders of magnitude. Since filtering tends to be
done in the background, speed will not be crucially important to all
users. When filtering is being done on a busy mail server, however,
processing speed can matter a lot.
- Ease of integration. How much work is it to hook a filter into the mail stream?
To carry out the tests, your editor collected two piles of mail from his personal stream; one was purely spam, and the other ham. Just over 1,000 messages from each pile were set aside to be used to train the filters. Then, 6,000 hams and 9,000 spams were fed to each filter, with the filter's verdict and processing time being recorded. Each mis-classified message was immediately fed back to the filter to further train it. Multiple runs were made with different parameters, but, in general, your editor resisted the urge to go tweaking knobs. Some of these filters offer a vast number of obscure parameters to set; one can only hope they come with reasonable defaults.
As a side note, your editor was surprised and dismayed at how difficult the task of producing pure sets of spam and ham was. The process started with mail sorted by SpamAssassin at delivery time. Your editor then passed over the entire set, twice, reclassifying each message which was in the wrong place. It was only after some early tests started reporting "false positives" that it became clear how much spam still lurked in the "ham" pile. It took more manual passes, and many passes with multiple filters, to clean them all out. The developers who claim that their filters do a better job than a human does are right - when that human is your editor, at least. It also turns out that a few incorrectly classified messages can badly skew the results; bayesian filters are easily confused if you train them badly.
Anyway, the results will be presented in five batches of 1200 hams and 1800 spams. Nothing special was done between these batches; this presentation is intended to show how the filter's behavior evolves as it is trained on more messages. All of the results are also pulled together in a summary table at the end of the article.
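The test procedure described above (classify each message, count the mistakes, feed each mistake straight back, report per batch) might look roughly like the following sketch. The `classify`/`train` interface and the toy `KeywordFilter` are hypothetical stand-ins rather than the scripts actually used for these tests, and the timing here includes the retraining step, which is an assumption.

```python
import time
from dataclasses import dataclass


@dataclass
class BatchResult:
    false_negatives: int = 0      # spam that got through
    false_positives: int = 0      # ham flagged as spam
    seconds_per_message: float = 0.0


def run_batches(spam_filter, messages, batch_size):
    """messages is a list of (text, is_spam) pairs; returns one result per batch."""
    results = []
    for start in range(0, len(messages), batch_size):
        batch = messages[start:start + batch_size]
        result = BatchResult()
        began = time.perf_counter()
        for text, is_spam in batch:
            verdict = spam_filter.classify(text)
            if verdict != is_spam:
                if is_spam:
                    result.false_negatives += 1
                else:
                    result.false_positives += 1
                # Each mistake is immediately used for further training.
                spam_filter.train(text, is_spam)
        # Wall-clock time per message, retraining included.
        result.seconds_per_message = (time.perf_counter() - began) / len(batch)
        results.append(result)
    return results


class KeywordFilter:
    """Toy filter for demonstration: remembers words it was told are spammy."""

    def __init__(self):
        self.spam_words = set()

    def classify(self, text):
        return any(word in self.spam_words for word in text.split())

    def train(self, text, is_spam):
        if is_spam:
            self.spam_words.update(text.split())
        else:
            self.spam_words.difference_update(text.split())
```

Any real filter could be dropped in by wrapping its command-line interface in an object with the same two methods.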
Bogofilter
Bogofilter was originally written by Eric Raymond shortly after Paul Graham's article was posted. It has evolved over time, and has picked up a wider community of contributors and maintainers. Bogofilter uses a modified version of the bayesian technique, with a number of knobs to tweak. It is written in C and is quite fast.
Training for bogofilter is somewhat complex; your editor was unable to train it into a stable configuration by feeding it hams and spams directly. The presence of several different training scripts in the source tree's "contrib" directory suggests that others have had to put some work into training as well. In the end, the contributed "trainbogo.sh" script appeared to do a reasonable job, but it required about three runs to get bogofilter into a stable state.
Bogofilter offers two approaches to ongoing training. By default, the filter is not trained by new messages as it classifies them. People who use bogofilter in this mode will set aside bogofilter's mistakes for later training. When the -u option is provided, however, bogofilter will train itself on all messages it feels sufficiently strongly about. Use of -u makes retraining on mistakes even more important, or the filter will become increasingly likely to misclassify mail. In general, training a bayesian filter on its own output must be done with care. It can help the filter to keep up with the spam stream as it evolves, but it also is a positive feedback loop which can go badly wrong if not carefully watched.
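The feedback loop can be made concrete with a small sketch. It is not bogofilter's implementation: the cutoffs, the 0.5 decision threshold, and the `score`/`train`/`untrain` callables are assumptions chosen for illustration. The point is that a wrongly self-trained message has to be unlearned as well as relearned, or the error stays in the database.

```python
class SelfTrainingFilter:
    """Wraps a scorer and trains on verdicts that clear a confidence cutoff."""

    def __init__(self, score, train, untrain, spam_cutoff=0.99, ham_cutoff=0.01):
        self.score = score            # text -> probability of spam
        self.train = train            # (text, is_spam) -> None
        self.untrain = untrain        # (text, is_spam) -> None, reverses train
        self.spam_cutoff = spam_cutoff
        self.ham_cutoff = ham_cutoff
        self.auto_trained = {}        # text -> label used for self-training

    def classify(self, text):
        p = self.score(text)
        if p >= self.spam_cutoff or p <= self.ham_cutoff:
            # Confident verdict: learn from the filter's own output.
            label = p >= self.spam_cutoff
            self.train(text, label)
            self.auto_trained[text] = label
        return p >= 0.5

    def correct(self, text, actual_is_spam):
        """Report a mistake: undo any wrong self-training, then retrain."""
        learned = self.auto_trained.pop(text, None)
        if learned is not None and learned != actual_is_spam:
            self.untrain(text, learned)
        self.train(text, actual_is_spam)
```

Messages scoring between the two cutoffs are classified but never used for training, which is what keeps the loop from amplifying uncertain guesses.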
Your editor ran bogofilter (v1.01) in both modes (starting with a freshly trained database in each case). Here are the results:
Filter | Batch 1 Fn | Batch 1 Fp | Batch 1 T | Batch 2 Fn | Batch 2 Fp | Batch 2 T | Batch 3 Fn | Batch 3 Fp | Batch 3 T | Batch 4 Fn | Batch 4 Fp | Batch 4 T | Batch 5 Fn | Batch 5 Fp | Batch 5 T | Size |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
bogofilter | 141 | 0 | 0.02 | 69 | 0 | 0.01 | 96 | 0 | 0.02 | 48 | 0 | 0.02 | 52 | 0 | 0.02 | 5 |
bogofilter -u | 87 | 0 | 0.05 | 54 | 0 | 0.05 | 41 | 0 | 0.05 | 45 | 0 | 0.06 | 41 | 0 | 0.09 | 32 |
Legend: Fn is the number of false negatives (spam which makes it through the filter); Fp is false positives (legitimate mail misclassified as spam), and T is the processing time (clock, not CPU), in seconds per message. The Size value at the end is the final size of the word database, in MB.
Here we see that bogofilter without the -u option settles at around 50 missed spams out of a set of 1800 (a 2.8% error rate), but with no false positives at all. If bogofilter trains itself, the false negative rate drops to roughly 2.3% (41 of 1800 in the final batch). As we will see, these results are not as good as some of the other filters reviewed.
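For readers who want to check the arithmetic, the per-batch and overall miss rates follow directly from the false-negative counts in the table above and the 1800 spams in each batch. A short sketch (the helper name `miss_rate` is made up for this example):

```python
SPAMS_PER_BATCH = 1800

# False-negative counts per batch, copied from the table above.
false_negatives = {
    "bogofilter": [141, 69, 96, 48, 52],
    "bogofilter -u": [87, 54, 41, 45, 41],
}


def miss_rate(missed, total):
    """Percentage of spam that got through the filter."""
    return 100.0 * missed / total


for name, per_batch in false_negatives.items():
    rates = [round(miss_rate(fn, SPAMS_PER_BATCH), 1) for fn in per_batch]
    total_spam = SPAMS_PER_BATCH * len(per_batch)
    overall = round(miss_rate(sum(per_batch), total_spam), 1)
    print(f"{name}: per batch {rates}, overall {overall}%")
```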
Without self-training, bogofilter requires a roughly constant 0.02 seconds
of time to classify a message; with self-training, that time increases as
the word database grows.
Bogofilter is clearly fast - the fastest of all the filters reviewed here.
One of the ways in which it gains that speed
is to not bother with attachments in the mail. The web page says
"Experience from watching the token streams suggests that spam with
enclosures invariably gives itself away through cues in the headers and
non-enclosure parts."
Bogofilter is intended to be integrated as a simple mail filter, optimally invoked via procmail. It can place a header in processed messages (making life easy for procmail) and also returns the spam status in its exit code (making life easy for grumpy editor testing scripts). Bogofilter has options for dumping out the word database and, for a given message, listing the words which most influenced how that message was classified. Nothing special has been done to make retraining easy; most users will probably create folders of mistakes and occasionally feed them to the filter.
CRM114
An interesting - if intimidating - offering is CRM114, subtitled "the controllable regex mutilator." While the main use of CRM114 appears to be spam filtering, it has a wider scope; it can be trained, for example, to filter interesting lines out of logfiles. According to the project page:
This tool comes with a 275-page book in PDF format, and needs every one of those pages. Setting up CRM114 is not for the faint of heart; it involves the manual creation of database files and the editing of a long configuration file which, while not quite up to sendmail.cf standards, is still one of the more challenging files your editor has encountered in some time. Once all of that is done, however, the "crm" executable can be hooked into a procmail recipe in the usual way without too much trouble.
The CRM114 documentation recommends against any sort of initial training of the filter. The developers are strong believers in the "train on errors" approach, saying that there are "mathematically complicated" reasons why pre-training leads to worse results. For users who don't get the hint, they do provide a way to perform pre-training:
The prospect of hand-decoding binary spam attachments is likely to put off most people who were pondering pre-training their filters - and, one assumes, that is the desired result. Of course, one can also use the normal training commands to feed messages into the system in a pre-training mode, but the documentation doesn't say that.
While filter training can be done on the command line, users can also retrain the filter by forwarding errors back to themselves. The message must be edited to include a training command and password; the developers also recommend removing anything which shouldn't be part of the training. Strangely, users are also told to remove the markup added by CRM114 itself - something which, one would think, could be handled automatically.
Your editor tested the 20060118 "BlameTheReavers" release of CRM114. The first test was done without training, as recommended; then, just to be stubborn, a test was run with a pre-trained filter.
Filter | Batch 1 Fn | Batch 1 Fp | Batch 1 T | Batch 2 Fn | Batch 2 Fp | Batch 2 T | Batch 3 Fn | Batch 3 Fp | Batch 3 T | Batch 4 Fn | Batch 4 Fp | Batch 4 T | Batch 5 Fn | Batch 5 Fp | Batch 5 T | Size |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CRM114 | 1 | 1 | 0.06 | 1 | 1 | 0.06 | 3 | 2 | 0.06 | 4 | 6 | 0.06 | 5 | 6 | 0.07 | 24 |
CRM pretrain | 6 | 2 | 0.07 | 1 | 2 | 0.07 | 5 | 2 | 0.07 | 1 | 2 | 0.07 | 1 | 6 | 0.07 | 24 |
Some things jump out immediately from those numbers. CRM114 is quite fast. It is also quite effective very early on; the first 3000 messages were processed with exactly one false positive and one false negative - starting with an untrained filter. On the other hand, its performance appears to worsen over time, and, in particular, the false positive rate grows in a discouraging way. The false positives varied from vaguely spam-like messages (Netflix updates, for example) to things like kernel patches. Your editor concludes that CRM114 operates as a very aggressive and quick-learning filter, but that it is also relatively unstable.
DSPAM
DSPAM is a GPL-licensed filter written in C. It is clearly aimed at large installations - places with dedicated administrators and, possibly, relatively unsophisticated users. As a result, it has a few features not found in other systems. For example, it has a web interface with statistics, facilities to allow users to manage filter training, and "pretty graphs to dazzle CEOs." Users who don't want to train the filter through a web page can forward mistakes to a special address instead.
There are several ways to hook DSPAM into the mail system, including a command-line filter, a POP proxy, and an SMTP front-end which can be put between the net and the mail delivery agent. There are several choices of backend storage (SQLite, Berkeley DB, PostgreSQL, MySQL, Oracle, and more), and a number of different filtering techniques. The filter can also run in a client-server mode, much like SpamAssassin.
DSPAM is also a package with a dual-license option; companies interested in shipping the software without providing source can purchase a separate license from the developer.
The system is intended to require relatively little maintenance. It has a set of tools, meant to be run from cron, which handle much of the routine housekeeping. Among other things, DSPAM will automatically trim its word list - getting rid of terms which have not been seen for a while and which have little influence on message scoring.
Initial training can be performed using the dspam_train utility; it uses a train-on-errors approach. Thereafter, DSPAM offers several training modes. The recommended "teft" mode trains on every message passing through the system. There is a train-on-errors mode, and a "tum" mode ("train until mature") which emphasizes relatively new and uncommon words. Your editor ran DSPAM (in the standalone, command-line mode) using all three training schemes, with the following results:
Filter | Batch 1 Fn | Batch 1 Fp | Batch 1 T | Batch 2 Fn | Batch 2 Fp | Batch 2 T | Batch 3 Fn | Batch 3 Fp | Batch 3 T | Batch 4 Fn | Batch 4 Fp | Batch 4 T | Batch 5 Fn | Batch 5 Fp | Batch 5 T | Size |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DSPAM teft | 17 | 0 | 0.1 | 17 | 0 | 0.1 | 11 | 0 | 0.1 | 3 | 0 | 0.1 | 7 | 0 | 0.1 | 305 |
DSPAM toe | 23 | 0 | 0.1 | 21 | 3 | 0.1 | 12 | 0 | 0.1 | 3 | 1 | 0.1 | 8 | 4 | 0.1 | 276 |
DSPAM tum | 26 | 0 | 0.1 | 23 | 0 | 0.1 | 12 | 0 | 0.1 | 7 | 0 | 0.1 | 15 | 0 | 0.1 | 305 |
So DSPAM shows strong spam detection in all three modes with a mid-range execution time; it is much slower than bogofilter, but much faster than some of the alternatives. The comprehensive training mode would appear to be the most effective; the TUM mode increases the false negative rate slightly, and the TOE mode introduces false positives. Note that the DSPAM database is quite large; to a great extent, this volume is taken up by a directory full of message hashes used to keep track of which messages have been used to train the filter.
SpamAssassin
SpamAssassin, which is written in Perl, is unique among the filters tested in that it combines a bayesian filter with a large set of heuristic scoring rules. The filter, in essence, is just another rule which gets mixed in with the rest. The rule database takes a great deal of effort (on the part of the SpamAssassin developers) to maintain, and testing messages against all of those rules makes SpamAssassin relatively slow. There is a huge advantage to this approach, however: SpamAssassin works well starting with the first message it sees, and it is able to train its own bayesian filter using the results from the rules.
Another nice feature in SpamAssassin is its word list maintenance. Most bayesian filters seem to grow their word lists without bound. Since spam can contain a great deal of random nonsense (actually, much of your editor's ham does as well), the word list can quickly fill up with tokens which are highly unlikely to ever help in classifying messages. Documentation for some other filters suggests occasionally dumping the word list and starting over. SpamAssassin, instead, will occasionally (and automatically) delete tokens which have not been seen for some time. So the word list stays within bounds. In general, SpamAssassin is relatively good at minimizing the need for the user to perform maintenance tasks.
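Expiring stale tokens is simple to picture: each entry carries a last-seen timestamp, and a periodic sweep drops entries older than some limit. The sketch below shows that general idea only; it is not SpamAssassin's actual expiry algorithm, and the class name, storage layout, and age limit are assumptions.

```python
import time


class ExpiringWordList:
    """Word list that forgets tokens not seen within max_age_seconds."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self.tokens = {}  # token -> [spam_count, ham_count, last_seen]

    def record(self, token, is_spam, now=None):
        now = time.time() if now is None else now
        entry = self.tokens.setdefault(token, [0, 0, now])
        entry[0 if is_spam else 1] += 1
        entry[2] = now  # refresh the last-seen time on every sighting

    def expire(self, now=None):
        """Delete stale tokens and return how many were removed."""
        now = time.time() if now is None else now
        stale = [t for t, e in self.tokens.items() if now - e[2] > self.max_age]
        for token in stale:
            del self.tokens[token]
        return len(stale)
```

Random nonsense words appear once and never again, so a sweep like this removes them while leaving frequently seen tokens alone.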
The sa-learn utility is used for most bayesian filter tasks. It can retrain the filter on mistakes, dump out word information, and force cleanup operations. SpamAssassin can be run in a client/server mode, which improves performance on busy systems. The client/server mode can also help to put a bound on SpamAssassin's memory use, which can be a little frightening. Standalone SpamAssassin on a small-memory system can create severe thrashing.
Your editor ran two sets of tests with SpamAssassin 3.1.0, running in the client/server mode, with network blacklist tests enabled. (Before somebody asks: the test was run on a standalone system to avoid any possible contamination by your editor's regular mail stream). Exactly one scoring tweak was made: the score for BAYES_99 (invoked when the bayesian filter is absolutely sure that the message is spam) was set to 5.0, enabling the filter to condemn messages on its own. That change helps to emphasize the bayesian side of SpamAssassin, and, in your editor's experience, makes it more effective. The first test involved a pre-trained database, as was done with the other filters. The second test, instead, started with an empty bayesian database in an effort to see how well the tool trains itself. Here are the results:
Filter | Batch 1 Fn | Batch 1 Fp | Batch 1 T | Batch 2 Fn | Batch 2 Fp | Batch 2 T | Batch 3 Fn | Batch 3 Fp | Batch 3 T | Batch 4 Fn | Batch 4 Fp | Batch 4 T | Batch 5 Fn | Batch 5 Fp | Batch 5 T | Size |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SpamAssassin | 8 | 0 | 1.1 | 3 | 0 | 1.1 | 5 | 0 | 1.1 | 3 | 0 | 1.0 | 2 | 0 | 1.0 | 10 |
SA untrained | 32 | 0 | 0.6 | 9 | 0 | 1.0 | 18 | 0 | 1.0 | 15 | 0 | 1.0 | 7 | 0 | 1.0 | 10 |
The results here show that SpamAssassin filters up to 99.9% of incoming spam, at the cost of significant amounts of CPU time. The untrained run shows higher error rates, but does eventually converge on something similar to the pre-trained version. But, at over one second per message, each testing run (comprising 15,000 messages) took a rather long time.
SpamAssassin operates as a filter, adding a header to messages as they pass through. That header can be used in procmail recipes; the thunderbird mail agent is also set up to optionally use the SpamAssassin header.
SpamBayes
SpamBayes is a filter written in Python. The SpamBayes hackers have, perhaps more than some of the other filter developers, made tweaks to the bayesian algorithm in an attempt to improve performance. Those hackers have also put more effort into mail system integration than some; as a result, SpamBayes comes with an Outlook plugin, POP and IMAP proxy servers, and a filter for Lotus Notes. It is still possible to use SpamBayes as a command-line filter with procmail, however.
There is a separate script (sb_mboxtrain.py) which is used to train the filter. Your editor followed the instructions and found it seemingly easy to use - it nicely understands things like MH and Maildir folders. However, when used as documented, sb_mboxtrain.py happily (and silently) puts the resulting word database in an undisclosed location, and filtering works poorly. Adding a few options to make the database location explicit took care of the problem.
SpamBayes 1.0.4 was tested in two modes: retraining just on errors, and training on all messages.
Filter | Batch 1 Fn | Batch 1 Fp | Batch 1 T | Batch 2 Fn | Batch 2 Fp | Batch 2 T | Batch 3 Fn | Batch 3 Fp | Batch 3 T | Batch 4 Fn | Batch 4 Fp | Batch 4 T | Batch 5 Fn | Batch 5 Fp | Batch 5 T | Size |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SpamBayes | 71 | 0 | 0.4 | 44 | 0 | 0.4 | 29 | 0 | 0.4 | 21 | 0 | 0.4 | 20 | 1 | 0.4 | 4 |
SB train all | 90 | 0 | 0.8 | 58 | 0 | 0.8 | 54 | 0 | 0.8 | 46 | 0 | 0.9 | 46 | 0 | 0.9 | 16 |
SpamBayes takes a while to truly train itself, but it does eventually get to a 98.9% filtering rate - better than some, but not truly amazing. The word database remains relatively small, but processing time is significant - especially if comprehensive training is used. Everything gets worse with comprehensive training: the spam detection rate drops while processing time increases. SpamBayes does manage to almost entirely avoid false positives in both modes, though; the only one seen came in the final batch of the train-on-errors run.
SpamProbe
SpamProbe is a filter written in C++ and released under the Q Public License. Unlike most filters, which record statistics on individual words, SpamProbe is also able to track pairs of words (DSPAM can do that too). SpamProbe looks at text attachments, discarding other types of attachments with one exception: there is a simple parser for GIF images. This parser creates various words describing images in a message (based on sizes, color tables, GIF extensions, etc.) and uses them in evaluating each message.
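Tracking word pairs means that, in addition to each word, every adjacent pair is recorded as a token of its own, so "cheap pills" can carry a different weight than "cheap" and "pills" separately. A minimal sketch of that tokenization follows; the whitespace splitting and the function name are assumptions, and SpamProbe's real tokenizer is certainly more involved.

```python
def single_and_pair_tokens(text):
    """Return each word plus each adjacent word pair as separate tokens."""
    words = text.lower().split()
    # zip() pairs every word with its successor.
    pairs = [f"{first} {second}" for first, second in zip(words, words[1:])]
    return words + pairs
```

The cost of this approach is a much larger word database, since the number of distinct pairs grows far faster than the number of distinct words.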
SpamProbe is packaged as a single command with a vast number of options. There is an "auto-train" mode for getting the filter trained in the first place. There are two filtering modes which the author calls "train" and "receive." Both will filter the message; the "train" mode only updates the word database "if there was insufficient confidence in the message's score," while "receive" always updates the database. The author recommends "train" mode; your editor tested SpamProbe 1.4a in both modes:
Filter | Batch 1 Fn | Batch 1 Fp | Batch 1 T | Batch 2 Fn | Batch 2 Fp | Batch 2 T | Batch 3 Fn | Batch 3 Fp | Batch 3 T | Batch 4 Fn | Batch 4 Fp | Batch 4 T | Batch 5 Fn | Batch 5 Fp | Batch 5 T | Size |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SpamProbe train | 80 | 0 | 0.2 | 39 | 1 | 0.1 | 37 | 1 | 0.1 | 39 | 1 | 0.1 | 27 | 0 | 0.1 | 81 |
SP receive | 90 | 0 | 0.6 | 51 | 2 | 0.6 | 39 | 1 | 0.6 | 42 | 0 | 0.7 | 35 | 1 | 0.9 | 201 |
SpamProbe's "receive" mode demonstrates that, with bayesian filters, more training is not always better. The added training slows down processing significantly, to the point that SpamProbe is almost as slow as SpamAssassin, but the end results are worse than those obtained without comprehensive training. SpamProbe has a significant false positive rate in either mode, but the "receive" mode makes it worse. In either mode, SpamProbe generates vast amounts of disk traffic, rather more than was observed with the other filters.
Unlike most other filters, SpamProbe does not insert a header in filtered mail. Instead, it emits a single line giving its verdict; the author then suggests using a tool like formail to create a header using that score. So integration of SpamProbe is a little harder than with some other tools.
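The extra glue step amounts to taking the one-line verdict and writing it into a header that later delivery rules can match. The sketch below does that with Python's standard email module rather than formail; the header name `X-SpamProbe` and the sample verdict text are assumptions made for the example, not SpamProbe's documented output format.

```python
from email import message_from_string


def add_verdict_header(raw_message, verdict_line, header="X-SpamProbe"):
    """Return the message text with the filter's verdict stored in a header."""
    msg = message_from_string(raw_message)
    # Remove any existing copy so a sender cannot forge a clean verdict.
    del msg[header]
    msg[header] = verdict_line.strip()
    return msg.as_string()
```

A delivery rule can then test the header's value to decide which folder the message lands in.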
Summary
Here is a summary table combining all of the filter runs described above:
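Test | False neg. | False neg. % | False pos. | False pos. % | Time (s/msg) | Size (MB) |
---|---|---|---|---|---|---|
bogofilter | 406 | 4.5% | | | 0.02 | 5 |
bogofilter -u | 268 | 3.0% | | | 0.06 | 32 |
CRM114 | 14 | 0.2% | 16 | 0.3% | 0.06 | 24 |
CRM114 pretrain | 14 | 0.2% | 15 | 0.3% | 0.06 | 24 |
DSPAM teft | 50 | 0.6% | | | 0.1 | 305 |
DSPAM toe | 67 | 0.7% | 15 | 0.3% | 0.1 | 276 |
DSPAM tum | 83 | 0.9% | | | 0.1 | 305 |
SpamAssassin | 21 | 0.2% | | | 1.1 | 10 |
SpamAssassin untrained | 81 | 0.9% | | | 0.9 | 10 |
SpamBayes | 185 | 2.1% | 1 | 0.02% | 0.4 | 4 |
SpamBayes train all | 294 | 3.3% | | | 0.8 | 16 |
SpamProbe train | 222 | 2.5% | 3 | 0.05% | 0.1 | 81 |
SpamProbe receive | 257 | 2.9% | 4 | 0.07% | 0.7 | 201 |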
In the above table, the "false positives" columns were left blank for tests in which there were none. Since false positives will be the bane of any spam filter, it is good if they stand out.
One should, of course, take all of the above figures with a substantial grain of salt. They reflect performance on your editor's particular mail stream; things could be very different with somebody else's mail. Still, your editor's mail stream is varied enough that, perhaps, a few conclusions can be drawn.
One of those would be that SpamAssassin is still hard to beat. It is, by far, the slowest of the filters, but it is highly effective with a minimum amount of required setup and maintenance on the user's part. For the most part, it Just Works, and it works quite well. In situations where an administrator is setting things up for a large group of users, DSPAM may well be indicated. The broad flexibility of that tool makes it easy to integrate into just about any mail system, and the web interface makes its operation relatively transparent to users. Just be sure you have a big disk for its databases.
CRM114 is an interesting project; its combination of technologies has the potential to make it the most accurate of all the filters. It has the look of a hardcore power tool. This tool, however, is not ready for prime time at this point. It is a major hassle to set up, and, for your editor at least, keeping the filter stable was a challenge. The other three filters all have their strong points, but none of them had the level of spam detection that your editor would like to see.
There are, of course, other filters out there as well. Some of the graphical mail clients have started to integrate their own filters. There is a great convenience in having a "junk" button handy, but the integrated filters sacrifice transparency and, in your editor's (admittedly limited) experience, they do not seem to develop the same level of accuracy. There is also ifile, which is intended to be a more general mail classifier. That tool is no longer under development, however.
In the end, none of the filters reviewed is perfect - it would be nice to see no spam at all. But some of them are surprisingly close. Think back, for a minute, to the days when we were complaining about getting a dozen spams per day - or per week. Who would have thought that we would be able to cope with thousands of spams per day and still deal with our mail? The developers of these filters have, in a significant way, saved the net, and your editor thanks them.
Parallel universes: open access and open source
The growing success of free software has led to a widening of the culture clash between "open" and "closed" to include other domains. One recent skirmish, for example, concerned a particularly important kind of digital code - the sequence of the human genome - and whether it would be proprietary, owned by companies like Celera, or freely available. Openness prevailed, but in another arena - scholarly publishing - advocates of free (as in both beer and freedom) online access to research papers are still fighting the battles that open source won years ago. At stake is nothing less than control of academia's treasure-house of knowledge.
The parallels between this movement - what has come to be known as open access - and open source are striking. For both, the ultimate wellspring is the Internet, and the new economics of sharing that it enabled. Just as the early code for the Internet was a kind of proto-open source, so the early documentation - the RFCs - offered an example of proto-open access. And for both their practitioners, it is recognition, not recompense, that drives them to participate.
Like all great movements, open access has its visionary - the RMS figure - who constantly evangelizes the core ideas and ideals. In 1976, the Hungarian-born cognitive scientist Stevan Harnad founded a scholarly print journal that offered what he called open peer commentary, using an approach remarkably close to the open source development process. The problem, of course, was that the print medium was unsuited to this kind of interactive development, so in 1989 he launched a Usenet/Bitnet magazine called Psycoloquy, where the feedback process of the open peer commentary could take place in hours rather than weeks. Routine today, but revolutionary for scholarly studies back then.
Harnad has long had an ambitious vision of a new kind of scholarly sharing (rather as RMS does with code): one of his early papers is entitled Post-Gutenberg Galaxy: The Fourth Revolution in the Means of Production of Knowledge, while a later one is called bluntly: A Subversive Proposal for Electronic Publishing. Meanwhile, the aims of the person who could be considered open access's Linus to Harnad's RMS, Paul Ginsparg, a professor of physics, computing and information science at Cornell University, were more modest.
At the beginning of the 1990s, Ginsparg wanted a quick and dirty solution to the problem of putting high-energy physics preprints (early versions of papers) online. As it turns out, he set up what became the arXiv.org preprint repository on 16 August, 1991 - nine days before Linus made his fateful "I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones" posting. But Ginsparg's links with the free software world go back much further.
Ginsparg was already familiar with the GNU manifesto in 1985, and, through his brother, an MIT undergraduate, even knew of Stallman in the 1970s. Although arXiv.org only switched to GNU/Linux in 1997, it has been using Perl since 1994, and Apache since it came into existence. One of Apache's founders, Rob Hartill, worked for Ginsparg at the Los Alamos National Laboratory, where arXiv.org was first set up (as an FTP/email server at xxx.lanl.gov). Other open source programs crucial to arXiv.org include TeX, GhostScript and MySQL.
In 1994, Harnad espoused the idea of self-archiving in his Subversive Proposal, whereby academics put a copy of their papers online locally (originally on FTP servers) as well as publishing them in hardcopy journals. The spread of repositories soon led to interoperability issues. The 1999 Open Archives Initiative (in which Ginsparg was a leading figure) aimed to deal with this by defining a standard way of exposing an article's metadata so that it could be harvested efficiently by search engines.
Beyond self-archiving - later termed green open access by Harnad - lies publishing in fully open online journals (gold open access). The first open access magazine publisher, BioMed Central - a kind of Red Hat of the field - appeared in 1999. In 2001 the Public Library of Science (PLoS) was launched; PLoS is a major publishing initiative inspired by the examples of arXiv.org, the public genomics databases and open source software, and was funded by the Gordon and Betty Moore Foundation (to the tune of $9 million over five years).
Just as free software gained the alternative name open source at the Freeware Summit in 1998, so free open scholarship (FOS), as it was called until then by the main newsletter that covered it - written by Peter Suber, professor of philosophy at Earlham College - was renamed open access as part of the Budapest Open Access Initiative in December 2001. Suber's newsletter turned into Open Access News and became one of the earliest blogs; it remains the definitive record of the open access movement, and Suber has become its semi-official chronicler (the Eric Raymond of open access - without the guns).
After the Budapest meeting (funded by speculator-turned-philanthropist George Soros, who played the role taken by Tim O'Reilly at the Freeware Summit), several other major declarations in support of open access were made, notably those at Bethesda and Berlin (both 2003). Big research institutions started actively supporting open access rather as big companies like IBM and HP did with open source earlier. Key early backers were the Howard Hughes Medical Institute (2002) in the US and the Wellcome Trust (2003) in the UK, the largest private funders of medical research in their respective countries.
Both agreed to pay the page charges that gold open access titles need in order to provide the content free to readers - typically $1000 per article. This is not as onerous as it sounds: the annual subscription for a traditional scientific journal can run to $20,000 (even though the authors of the papers receive nothing for their work). For a major research institution, the cumulative cost adds up to millions of dollars a year in subscriptions. This annual tax is very like the licensing fees in the proprietary software world. What an institution saves by refusing to pay these exorbitant subscriptions - as the libraries at Cornell, Duke, Harvard and Stanford Universities have done in the US - it can use to fund page charges, just as companies can use monies saved on software licensing costs to pay for the support and customization they need.
With all this activity, governments started getting interested in open access, and so did the big publishers, worried by the potential loss of revenue (the Microsoft of the scientific publishing world, the Anglo-Dutch company Elsevier, has had operating profits of over 30%). The UK House of Commons Science and Technology committee published a lengthy report recommending obligatory open access for publicly-funded research: it was ignored by the UK government because of pressure from British publishing houses. In 2004, the US NIH issued a draft of its own plans for open access support and was forced to water them down because of fierce lobbying from science publishers.
Given the many similarities between the respective aims of open source and open access, it is hardly surprising that there are direct links between them. In 2002, MIT released its DSpace digital repository application under a BSD license, while Eprints, the main archiving software used for creating institutional repositories, went open source under the GPL. As the latter's documentation proudly proclaims:
There is a commercial, supported version too. Open Journal Systems is another journal management and publishing system released under the GPL.
As the mainstream open source projects mature, the applications used by the open access movement could well prove increasingly attractive to coders who are looking for a challenge and an area where they can make a significant contribution not just to free software, but also to widening free access to knowledge itself.
Glyn Moody writes about open source and open access at opendotdotdot.
Security
A new Linux worm
A Linux worm (called "Mare.D" by some) that exploits an old PHP XML-RPC vulnerability has been sighted in the wild and was reported on Sunday to the full-disclosure mailing list. An update later in the day makes it clear that this is a new attack, based on an earlier worm, kaiten, which attempts to connect infected systems to a botnet.
The attack starts with a crafted XML-RPC request targeted at Wordpress, Drupal, phpBB and other content management systems that were known to be vulnerable in June 2005, when this problem was first reported. The request contains code which will be executed by PHP; this code, in turn, retrieves another script from a (now defunct) server and executes it. The second script then retrieves yet another pair of executables from the server; these are the main payload of the attack.
The first of these programs is the 'spreader' which attempts to find other vulnerable hosts and infect them. The other program, instead, connects to an IRC server which functions as the 'command and control' (C&C) element for a botnet. The IRC server would instruct the client to download yet another program which opens a backdoor shell when executed. It is unknown what else the attacker planned with the bots as the C&C server has been shut down.
It is interesting to note that this worm does not compromise root and does not gain complete control of the host, but it does provide enough privileges to make it attractive for a botnet. The exploit will allow the attacker to run with the permissions of the user who owns the httpd process (typically 'apache' or 'httpd'), which is sufficient to perform the two most likely bot tasks: spamming and distributed denial of service. On the flip side, because it did not gain root privileges, it cannot do very much to hide itself and it should be very easy to detect on an infected system.
Overall, the impact of this attack is relatively small thanks, in part, to fast action to shut down the servers providing the scripts and controlling the botnet. But it seems likely that the backdoor shell is running on some hosts which got an "execute" command for that script before the servers were terminated. Another possibility is that there are different versions of the attack floating around, using different server addresses; those servers may still be running.
As is the case for many malware attacks, this would only affect systems that did not have up-to-date software. Eight months seems like enough time to update affected systems, so the fact that there are still vulnerable systems out there is a sad testament to how little attention is paid to security by some, probably many, Linux system administrators.
More information about this exploit can be found in the Shadowserver article and updates on this attack are being posted to the Securiteam blog.
New vulnerabilities
bluez-hcidump: buffer overflow
Package(s): bluez-hcidump | CVE #(s): CVE-2006-0670
Created: February 18, 2006 | Updated: March 10, 2006
Description: A buffer overflow in l2cap.c in hcidump allows remote attackers to cause a denial of service (crash) through a wireless Bluetooth connection via a malformed Logical Link Control and Adaptation Protocol (L2CAP) packet.
BomberClone: remote execution of arbitrary code
Package(s): bomberclone | CVE #(s): CVE-2006-0460
Created: February 17, 2006 | Updated: March 14, 2006
Description: Stefan Cornelius of the Gentoo Security team discovered multiple missing buffer checks in BomberClone's code. By sending overly long error messages to the game via network, a remote attacker may exploit buffer overflows to execute arbitrary code with the rights of the user running BomberClone.
CASA: buffer overflow
Package(s): CASA | CVE #(s): CVE-2006-0736
Created: February 22, 2006 | Updated: February 22, 2006
Description: The pam_micasa module from the CASA authentication system suffers from a remotely exploitable buffer overflow. "Since this module is added to /etc/pam.d/sshd automatically on installation of CASA it was possible for remote attackers to gain root access to any machine with CASA installed." If you are using CASA, fixing this one in a hurry would be a good idea.
gnupg: false positive signature verification
Package(s): gnupg | CVE #(s): CVE-2006-0455
Created: February 17, 2006 | Updated: March 10, 2006
Description: Tavis Ormandy noticed that gnupg, the GNU privacy guard - a free PGP replacement, verifies external signatures of files successfully even though they don't contain a signature at all. See this update from the gnuPG team for more information.
heimdal: remote denial of service
Package(s): heimdal | CVE #(s): CVE-2006-0677
Created: February 17, 2006 | Updated: February 24, 2006
Description: A remote Denial of Service vulnerability was discovered in the heimdal implementation of the telnet daemon. A remote attacker could force the server to crash due to a NULL de-reference before the user logged in, resulting in inetd turning telnetd off because it forked too fast.
metamail: buffer overflow
Package(s): metamail | CVE #(s): CVE-2006-0709
Created: February 21, 2006 | Updated: March 17, 2006
Description: A buffer overflow bug was found in the way Metamail processes certain mail messages. An attacker could create a carefully-crafted message such that when it is opened by a victim and parsed through Metamail, it runs arbitrary code as the victim.
tar: buffer overflow
Package(s): tar | CVE #(s): CVE-2006-0300
Created: February 22, 2006 | Updated: April 10, 2006
Description: A buffer overflow (exploitable via a carefully-crafted archive file) has been discovered in GNU tar, versions 1.14 and above.
tin: buffer overflow
Package(s): tin | CVE #(s): CVE-2006-0804
Created: February 19, 2006 | Updated: November 24, 2006
Description: An allocation off-by-one bug exists in the TIN news reader version 1.8.0 and earlier which can lead to a buffer overflow.
tutos: SQL injection and cross-site scripting
Package(s): tutos | CVE #(s): CVE-2004-2161 CVE-2004-2162
Created: February 22, 2006 | Updated: February 22, 2006
Description: The tutos groupware package has (old) SQL injection and cross-site scripting vulnerabilities.
Page editor: Jonathan Corbet
Kernel development
Brief items
Kernel release status
The current 2.6 prepatch is 2.6.16-rc4, announced by Linus on February 17. Things are settling down, and this prepatch contains "only" 100 fixes or so, many concentrated in the SCSI subsystem. Details can be found in the long-format changelog.
As of this writing, the mainline git repository contains about 75 post-rc4 patches, including one reverting a change which broke systems running non-current versions of HAL (see below).
The current -mm tree is 2.6.16-rc4-mm1. Recent changes to -mm include the addition of Al Viro's "bird" tree, a big x86-64 update, some memory management tweaks, some software suspend patches, a big "generic bit operations" patch set, and the lightweight robust futex patch.
For 2.4 users, Marcelo has released the second 2.4.33 prepatch with several fixes, some of which are security-related.
Kernel development news
Quote of the week
-- Jens Axboe gets tired of the cdrecord discussion (going strong into its second month).
The kevent interface
The Linux asynchronous I/O implementation is notoriously incomplete; among the many things on the "to do" list is asynchronous network I/O. Network writes are already, to some extent, asynchronous, but only if the kernel is able to copy user data into a kernel buffer. The current interface cannot be simultaneously zero-copy and asynchronous. There is also no way to set up asynchronous, zero-copy reads. Evgeniy Polyakov has recently posted a patch which tries to fill that gap - and quite a bit more besides - through the addition of three new system calls and a completely new kernel event subsystem.
Evgeniy's patch adds a new "kevent" type. The kernel can generate and report kevents for a number of possible situations, including:
- The arrival of network data or connections.
- Any situation which can be reported by the poll() system call.
- Events which can be returned by inotify(), such as the creation or removal of files.
- Network asynchronous I/O events.
- Timer events.
All of this becomes possible through the addition of a complex system call:
    struct kevent_user_control
    {
        unsigned int cmd;
        unsigned int num;
        unsigned int timeout;
    };

    long kevent_ctl(int fd, struct kevent_user_control *ctl);
The file descriptor argument to kevent_ctl() has little to do with any requested events; it is, instead, mostly used as a place for the kevent subsystem to stash some of its own housekeeping information. That file descriptor must be allocated, however, with a call like:
    ctl.cmd = KEVENT_CTL_INIT;
    int kevent_fd = kevent_ctl(0, &ctl);
The returned file descriptor can be used to add, remove, modify, and wait for events. Event requests are passed from user space in a structure like:
    struct kevent_id
    {
        __u32 raw[2];
    };

    struct ukevent
    {
        struct kevent_id id;
        __u32 type;
        __u32 event;
        __u32 req_flags;
        /* ... */
    };
Here, the embedded id structure usually holds a file descriptor number for which associated events are desired. For timer events, instead, it holds the timeout period. The type and event fields describe what sorts of events are desired; type can be one of: KEVENT_SOCKET (data and/or connections on sockets), KEVENT_INODE (file creation and removal), KEVENT_POLL (any poll() event), KEVENT_TIMER (timer events), or KEVENT_NAIO (network asynchronous I/O). The event field is a bitmask which depends on type; as an example, for inode events, it can contain KEVENT_INODE_CREATE and/or KEVENT_INODE_REMOVE. The main thing seen in req_flags is KEVENT_REQ_ONESHOT, indicating that only one event should be returned.
The attentive reader may have noticed that the kevent_ctl() interface has no parameter for the ukevent structure. Instead, the user-space process is expected to place one or more ukevent structures immediately after the kevent_user_control structure in memory, and to set the num field to how many of those structures are present. A process interested in events should create this set of structures and pass them to kevent_ctl() with a cmd value of KEVENT_CTL_ADD. After that, the kernel will start generating events at the appropriate times. Other possible cmd values are KEVENT_CTL_REMOVE and KEVENT_CTL_MODIFY, which have the obvious effect.
The final supported command is KEVENT_CTL_WAIT, which will wait for the number of events specified in the num field. An optional timeout value can also be provided. The returned events will, once again, go into memory just after the kevent_user_control structure. It is also possible to pass the kevent file descriptor to poll() or select().
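The request layout described above can be sketched in a few lines of C. This is an illustration only, assuming the structure definitions quoted earlier, with the elided ukevent fields left out and uint32_t standing in for __u32. The helper name kevent_build_request() and the numeric value given to SKETCH_KEVENT_CTL_ADD are hypothetical, and no system call is made.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Structures as quoted in the article; uint32_t stands in for __u32. */
struct kevent_user_control {
    unsigned int cmd;
    unsigned int num;
    unsigned int timeout;
};

struct kevent_id {
    uint32_t raw[2];
};

struct ukevent {
    struct kevent_id id;
    uint32_t type;
    uint32_t event;
    uint32_t req_flags;
    /* ... remaining fields are elided in the article and omitted here */
};

/* Hypothetical value: the real constant is defined by the patch. */
#define SKETCH_KEVENT_CTL_ADD 1u

/*
 * Build one contiguous buffer holding the control header followed
 * immediately by num ukevent structures. Returns NULL on allocation
 * failure; the caller frees the buffer.
 */
unsigned char *kevent_build_request(unsigned int cmd,
                                    const struct ukevent *events,
                                    unsigned int num, size_t *len)
{
    struct kevent_user_control ctl = { cmd, num, 0 };
    size_t total = sizeof(ctl) + (size_t)num * sizeof(*events);
    unsigned char *buf;

    assert(len != NULL);
    assert(num == 0 || events != NULL);

    buf = calloc(1, total);
    if (buf == NULL)
        return NULL;

    /* memcpy() avoids any assumption about alignment within the buffer. */
    memcpy(buf, &ctl, sizeof(ctl));
    if (num > 0)
        memcpy(buf + sizeof(ctl), events, (size_t)num * sizeof(*events));
    *len = total;
    return buf;
}
```

With the patch applied, a buffer built this way would presumably be what a process hands to kevent_ctl() along with the kevent file descriptor; the same area would then receive returned events after a wait.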
Extending this mechanism to asynchronous network I/O requires the addition of two more system calls:
    long aio_send(int kevent_fd, int socket_fd, void *buffer, size_t size, unsigned flags);
    long aio_recv(int kevent_fd, int socket_fd, void *buffer, size_t size, unsigned flags);
Either one of these calls will put together and enqueue a special kevent request on the given kevent_fd file descriptor. The I/O will remain outstanding; once it completes, the associated event will be returned to the process. Until the completion event, the buffer should not be touched. There is also a provision for an aio_sendfile() system call, though it has not yet been implemented.
At the lower levels, enabling asynchronous I/O for a protocol requires the addition of two new methods to the proto structure:
    int (*async_recv) (struct sock *sk, void *dst, size_t size);
    int (*async_send) (struct sock *sk, struct page **pages, unsigned int poffset, size_t size);
In Evgeniy's patch, only the TCP protocol has been extended in this manner.
There has been very little discussion of this patch on the netdev mailing list (where it was posted). Your editor suspects that, while the functionality provided by the patch is welcome, the user-space interface, perhaps, needs a little bit of work before it will be ready for inclusion into the mainline kernel.
Sysfs and a stable kernel ABI
Some things are fairly predictable. There is a long list of regressions in the 2.6.16 kernel, and some of those do not appear to be getting a whole lot of developer attention. But when one of those bugs causes a developer's iPod to stop working with Linux, it will get fixed in a timely manner. This time around, it also set off a discussion on what it really means to have a stable application interface to the kernel.
Back in the dim and distant past (last year), the "user events" mechanism was added to the kernel. One of the first events to be implemented was block device mount and unmount operations. Over time, however, it was concluded that user events were not the right way to communicate this information. So a new interface - allowing interested user-space processes to call poll() on /proc/mounts - was added to the kernel. Then, a patch was merged for 2.6.16 which removes the mount and unmount events.
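For reference, the poll()-based interface mentioned above can be exercised roughly as follows. This is a minimal sketch, assuming the kernel flags a mount-table change as an exceptional condition (POLLERR and/or POLLPRI, depending on kernel version) on a descriptor opened on /proc/mounts. The helper names wait_for_mount_change() and mounts_changed_within() are made up for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for a change to be flagged on fd, which the
 * caller is assumed to have opened read-only on /proc/mounts.
 * Returns 1 if a change was flagged, 0 otherwise, -1 if poll() failed.
 */
int wait_for_mount_change(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int ret;

    assert(fd >= 0);

    pfd.fd = fd;
    pfd.events = POLLPRI;   /* POLLERR is reported whether requested or not */
    pfd.revents = 0;

    ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;
    return (pfd.revents & (POLLERR | POLLPRI)) ? 1 : 0;
}

/* Convenience wrapper: open /proc/mounts, wait once, close again. */
int mounts_changed_within(int timeout_ms)
{
    int fd = open("/proc/mounts", O_RDONLY);
    int ret;

    if (fd < 0)
        return -1;
    ret = wait_for_mount_change(fd, timeout_ms);
    close(fd);
    return ret;
}
```

A long-running monitor would keep the descriptor open between calls rather than reopening it, since only changes made after the open() are reported.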
When Pekka Enberg (the iPod user) fingered this patch as the cause of the problem, the author of that patch (Kay Sievers) responded: "Upgrade HAL, it's too old for that kernel." This response didn't sit well with Andrew Morton:
We. Don't. Do. That.
Linus, too, was unimpressed:
For now, the issue has been resolved by reverting the patch in question. The feature removal schedule has been updated to note that the mount and unmount events will disappear in February of 2007. iPod owners can rest easy for now.
But this episode drives home a point which is worth noting. Longstanding kernel policy has been that, while kernel internals can change at any time, the user-space interface must remain absolutely stable. Even when an interface turns out to have been badly designed, it must continue to work. Interfaces can be augmented or superseded, but they cannot be broken.
Not that long ago, the kernel ABI consisted entirely of the system call interface and a few files in /proc. While regressions were not unknown, the fact is that keeping a couple hundred system calls in a stable state is a relatively straightforward task. People notice when a system call interface is changed. In more recent times, the interface to the kernel has gotten much wider; it includes several netlink-based protocols and a number of kernel-based virtual filesystems like configfs and sysfs. It can be easy for kernel developers to lose track of the fact that, when they work on one of those interfaces, they risk breaking the user-space ABI. And it can be easy for patches which change the user-space interface to slip past the review process.
This risk is especially acute with sysfs. The directory tree exported via sysfs matches, in a very close way, the data structures maintained within the kernel. Every sysfs directory corresponds to a kobject embedded within some kernel structure, and every sysfs attribute is tied, somehow, to an attribute of the associated structure within the kernel. There are some advantages to this arrangement; sysfs has become a clear window into the organization of the system as seen by the kernel. And, because sysfs is so closely tied to the kernel's data structures, most developers need not even think about it. When a new type of device, for example, is added to the kernel, the associated sysfs entries will generally just happen by themselves.
But every entry in sysfs - 3400 attributes in 1175 directories on your editor's relatively simple system - is part of the kernel ABI. That's 3400 attributes tied to 1175 kernel internal data structures which cannot be changed without the risk of breaking user-space code. Sysfs has evolved into a highly complex - and, to a great extent, undocumented - binary interface to the kernel. In the short term, that makes sysfs susceptible to inadvertent regressions as developers make changes without thinking about the possible user-space effects.
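The scale of that interface is easy to measure on any given machine. The sketch below is one hedged way to do it: it walks a directory tree without following symbolic links and tallies directories and regular files, which on /sys should approximate the directory and attribute counts quoted above. The function names count_tree() and walk_tree() are illustrative only, and the numbers will vary with kernel version and hardware.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Recursively tally directories and regular files below (and including) path. */
static void walk_tree(const char *path, unsigned long *dirs, unsigned long *files)
{
    struct stat st;
    struct dirent *entry;
    DIR *dir;

    if (lstat(path, &st) != 0)
        return;
    if (S_ISREG(st.st_mode)) {
        (*files)++;
        return;
    }
    if (!S_ISDIR(st.st_mode))
        return;                 /* symbolic links and other types: ignored */

    (*dirs)++;
    dir = opendir(path);
    if (dir == NULL)
        return;                 /* unreadable: counted, but not descended */

    while ((entry = readdir(dir)) != NULL) {
        char child[4096];
        int n;

        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        n = snprintf(child, sizeof(child), "%s/%s", path, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof(child))
            continue;           /* path too long for this sketch */
        walk_tree(child, dirs, files);
    }
    closedir(dir);
}

/* Returns 0 on success, -1 if root cannot be examined at all. */
int count_tree(const char *root, unsigned long *dirs, unsigned long *files)
{
    struct stat st;

    assert(root != NULL && dirs != NULL && files != NULL);
    *dirs = 0;
    *files = 0;
    if (lstat(root, &st) != 0)
        return -1;
    walk_tree(root, dirs, files);
    return 0;
}
```

Calling count_tree("/sys", &dirs, &files) on a running system yields figures comparable to those above; the root directory itself is included in the directory count.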
In the longer term, a different problem might arise. The kernel developers have always been willing to make incompatible changes to the internal API if the end result is a better, more capable, or safer interface. This freedom to change things is widely exploited; see the LWN 2.6 API changes page to see just how widely. As kernel data structures get tied into sysfs, however, they become part of an ABI which cannot be broken. In a few years, the kernel hackers may find themselves in the position of wanting to make significant internal structural changes, only to be thwarted by the inability to change the associated sysfs structure. At that point, the choice will be to either (1) not make the changes, or (2) interpose some sort of compatibility translation layer between sysfs and the kernel structures it represents. Neither looks like a whole lot of fun.
Wasabi white paper on kernel modules
The folks at Wasabi Systems have published a white paper on the legal status of loadable kernel modules. "As attorneys ourselves, we cannot find a coherent legal argument for excluding LKMs from [GPL] coverage. So why does the Free Software Foundation tolerate them? Because of its dual interests. On the one hand, it seeks to enforce the GPL. On the other hand, it seeks to promote the use of free software such as Linux." Or, perhaps, because the FSF has little copyright interest in the kernel.
Patches and updates
Kernel trees
Core kernel code
Development tools
Device drivers
Documentation
Filesystems and block I/O
Janitorial
Memory management
Networking
Miscellaneous
Page editor: Jonathan Corbet
Distributions
News and Editorials
Creating a Live CD with Kadischi
Last week Ladislav Bodnar covered the Linux-Live method of creating custom Slackware-based live CDs. This week I thought I'd look at a much newer project, Kadischi, to create a Fedora-based live CD. Kadischi is still in early development and unfortunately I was unable to create a working CD in the brief time that I spent working with it.
Kadischi is well documented with translations available in French, Dutch and Swedish. For the most part I found myself copying commands straight from the documentation into a root terminal.
I started out on Tuesday evening, after finishing up LWN's daily updates, by booting up a box with Fedora Core 4 previously installed. Soon I had the Kadischi documentation in a web browser and a terminal su'ed to root.
Step one is to make sure you have all the required packages. So I did yum install and ran into a minor hiccup. Red Hat, including fedora.redhat.com, was down for maintenance. This didn't affect the wiki site, or any repositories, but I couldn't get to http://fedora.redhat.com/download/mirrors/, where yum was looking for a good mirror to use. After editing a few repo files to look for a particular repository instead of the mirror list I was able to grab the packages I needed. The anaconda package requires a couple of patches to enable the --livecd option. The documentation told me what to type and I soon had mine patched.
Step two is to get the Kadischi code, which is currently only available from CVS. Once again the commands you need are in the documentation, ready to paste or type into a terminal, and I soon had my own version of Kadischi. During the 'make install' I noticed a few complaints about undefined macros, and I didn't really pay any attention to them. That was undoubtedly a mistake. Those who pay more attention to such details will, no doubt, fare better.
You can tell Kadischi to build your ISO image anywhere you like on your system. The default is /tmp. Then you enter the basic command:
kadischi path-to-the-repository path-to-the-iso-image
You can build your own repository or use an existing Fedora repository. You can also choose to run Anaconda interactively or automatically using kickstart files. I tried it with a pointer to the Fedora Core 4 mirror list (once that was available again), told it to create /tmp/fedora-live.iso, and found myself running Anaconda in my terminal. I chose a basic desktop install and let Anaconda do its thing. Once Anaconda was done with its part, Kadischi started running post-install scripts, and in theory after that you would have an ISO image ready to burn onto a CD. Instead I ended up with:
/tmp/livecd-build_no3/system/lib/modules/None is not a directory.
*** Fatal error: /usr/local/share/kadischi/livecd-mkinitrd.sh returned non zero (256) exit code. Aborting execution.
Cleaning up temporary files...
Done.
Overall I thought it went pretty well for a first try of a beta product. Most people with a little software experience, especially if they are more motivated to actually create live CDs, should not have a problem getting Kadischi to build and run. The use of custom repositories and kickstart scripts makes Kadischi highly flexible, allowing for the creation of highly customized Fedora CDs.
New Releases
Announcing Fedora Core 5 Test 3
The Fedora Project has announced the third release of the Fedora Core 5 development cycle, available for the i386, x86_64, and PPC/PPC64 architectures. "Beware that Test releases are recommended only for Linux experts/enthusiasts or for technology evaluation, as many parts are likely to be broken and the rate of change is rapid." According to the current schedule the final release is due on March 15.
Openwall GNU/*/Linux (Owl) 2.0 released
Owl 2.0, a security-enhanced distribution put together by Solar Designer and colleagues, is now available. "Owl 2.0 is built around Linux kernel 2.4.32-ow1, glibc 2.3.6 (with our security enhancements), gcc 3.4.5, and recent versions of over 100 other packages. It offers binary- and package-level compatibility for most packages intended for Red Hat Enterprise Linux 4 (RHEL4) and Fedora Core 3 (FC3), as well as for many FC4 packages."
SUSE Linux 10.1 Beta4 seeks adventurous experts and their *test* systems
The openSUSE Project has released a fourth beta of SUSE Linux 10.1. "Beta4 has a number of ROUGH edges, so read the following before you decide to download and test it. I advise to not put it on any production system!" That said, you can find out more about known bugs and a list of mirror sites in the announcement (click below).
Ubuntu Flight CD 4
The Ubuntu/Kubuntu/Edubuntu Flight CD 4 is ready for testing. This is the fourth snapshot of the current development version "Dapper Drake". While it is believed to be reasonably free of showstopper CD-build or installer bugs, it is not recommended for production systems. Edubuntu has its own announcement.
Distribution News
Ubuntu Dapper news
Xgl and compiz packages are available in universe for Dapper. "As noted before, these are highly experimental packages. If it crashes, this is unsurprising. Please do feel free to file bugs, but right now they'll probably just be forwarded upstream. Please do not be surprised if it doesn't work. If you're running binary drivers, things get even more complicated and there's a reasonable chance that things will fail to work in strange and unexpected ways."
Ben Collins has issued a call for testing of the current Dapper kernel. "The main point of this testing (for me at least) is to catch regressions from breezy. It is far more important for us to make sure we don't lose users because of a non-upgrade issue, than to fix a long standing superficial quirk."
Meanwhile, the feature freeze for Dapper has been announced. "The feature freeze for Dapper begins this Thursday, February 23rd. This means that feature goal development be substantially complete. Features which are behind schedule may be granted exceptions (for priority goals with a clear roadmap to completion) or deferred to the next release."
Yellow Dog Linux v4.1 now available from the Terra Soft Store
Terra Soft Solutions has announced the availability of Yellow Dog Linux v4.1 at the Terra Soft on-line Store.
New CMS for FedoraNEWS.ORG
A new CMS (Content Management System) for FedoraNEWS.ORG has been launched. This new site allows Fedora users to self-register and submit stories, it has RSS feeds for stories, calendars for Fedora events, and more.
Debian mirror split, amd64 update
Anthony Towns has an update on the mirror split and on amd64.
New Distributions
Kaboot
Kaboot is a Gentoo-based Linux LiveCD/USB distribution. It's currently available in four flavors: Recovery, Lite, Science and Kaboot Komplete (a full-featured KDE desktop). (Found in this week's edition of the Gentoo Weekly Newsletter.)
Distribution Newsletters
Fedora Weekly News Issue 34
This edition of the Fedora Weekly News covers Red Hat Magazine February 2006, Fedora Project Wiki Policy Change Update, OLPC (One Laptop Per Child) Base Software, Tools to roll your own distribution - Kadischi, Must-have Firefox and Thunderbird extensions, Google Windows apps coming to Linux, New CMS for FedoraNEWS.ORG, and more.
Gentoo Weekly Newsletter
The Gentoo Weekly Newsletter for the week of February 20, 2006 covers the opening of FOSDEM next Saturday, Qmail request for comments, yet another Gentoo-based distribution, Gentoo on Intel Macs, and several other topics.
DistroWatch Weekly, Issue 139
The DistroWatch Weekly for February 20, 2006 is out. "Mark Shuttleworth, the founder of Ubuntu Linux and one of the most prominent personalities of the Free Software world, is the focus of today's issue. The featured article is then followed by a news round-up quoting Mandriva's position on Xgl, discussing the current delays in the development of both SUSE Linux 10.1 and Fedora Core 5, revealing "Ebuntu", a new Ubuntu derivative with Enlightenment 17, and monitoring the career path of Daniel Robbins, the founder of Gentoo Linux. The issue concludes with the usual sections detailing the upcoming releases and new distributions."
Package updates
Fedora updates
Updates for Fedora Core 4: netpbm (make xwdtopnm work on x86_64), compat-db (bug fixes), kdebase (bug fixes), hplip (upgrade to 0.9.8), xterm (upgrade to upstream v208), kdemultimedia (bug fixes).
Slackware updates
The Slackware change log entry for February 16 shows that everything using libreadline.so.4 has been recompiled, with the exception of AbiWord. "the --disable-gnome option no longer seems to work with abiword-2.4.2 -- it still demands libgnomeprint and all of its dependencies. Anyone know a way around this one? If not, AbiWord will likely be removed soon." There are lots of additional updates and new gnupg packages in testing. The next entry shows updates to dvd+rw-tools, bind and tin.
Newsletters and articles of interest
Design a custom network install image with Instalinux (Linux.com)
Linux.com looks at another method of creating a custom Linux system. "Former Hewlett-Packard employee Chris Slater has created Instalinux.com, a site based on HP's Linux Common Operating Environment (LinuxCOE) System Designer. Instalinux lets you build a custom Linux boot image and perform network installs quickly, especially when you have several machines with the same requirements. I tested Instalinux recently, with good results."
My desktop OS: Slackware 10.2 (NewsForge)
NewsForge has a quick look at Slackware 10.2. "Slackware installation and configuration requires some Linux knowledge. The distribution is not as user-friendly as other Linux packages. When making partitions you need to use fdisk or cfdisk. After installing the software on my laptop, I configured the kernel to activate ACPI and other important hardware by following documentation that you can find on the Internet about kernel compilation."
Distribution reviews
The evolution of Fedora Core Linux (LinuxNoob)
Niall C. Brady looks at Fedora Core. "I use Fedora core daily and I've used every final release of Fedora since Yarrow (Fedora Core Release 1). When I get time, I also look at some of the test releases to see how Fedora is changing, and if there's one thing certain about Fedora, it's change. I decided to write this article to hopefully give people a chance to learn a little bit more about Fedora since the first release came to life back in November 2003, how the distro has matured and what to expect for Fedora Core release 5 in mid-March 2006."
Page editor: Rebecca Sobol
Development
Urwid, a Console UI library for Python
Urwid is a terminal-based user interface library for the Python language that is reminiscent of the old Unix curses terminal control library. Urwid is used for implementing user interfaces that work with simple ASCII terminals.
The basic Urwid feature set includes:
- A list box mechanism with support for scrolling.
- An edit box for entering and modifying text.
- Pushbutton, check box, and radio button widgets.
- Simple character-style graphical capabilities.
- The ability to adapt to a dynamically resizeable terminal window.
- Support for capturing the screen.
- Support for embedded tables of widgets.
- Support for UTF-8, 8 bit ASCII, and other character encodings.
- A text attribute markup language.
- Support for multiple text alignment and line wrapping modes.
- Support for user-defined text layout classes.
- Support for a web-based Apache/CGI display mode.
- Runs on Linux, OSX, Cygwin-based, and other systems.
There is also an online web_display module live demo site for those who wish to test Urwid in action.
Installation is quite simple: packages are available for Debian-based systems, and a simple setup.py script is provided for other platforms. A number of useful demo programs are included for testing and reference. Demos include a directory browser, a numerical calculator, a text editor, a test suite, a fibonacci generator and a package tour. See the screenshot and programming examples for some images. Your author was able to run all of the demos with no trouble.
Version 0.9.0 of Urwid was released this week: "This is the first release of Urwid with UTF-8 input and display support. A new raw_display module was added to enable UTF-8 display. This module also fixes the "dead corner" in the bottom right of the screen and improves legibility of bright text in some terminals."
Urwid is a useful tool for applications that need the portability offered by a text-only user interface; it fills a void between full-fledged GUI applications and a simple command line interface. Being Python-based, it is portable, easy to install, and simple to use.
System Applications
Database Software
PostgreSQL Weekly News
The February 19, 2006 edition of the PostgreSQL Weekly News is out with the latest PostgreSQL database articles.
Security
Sussen 0.15 released
Version 0.15 of Sussen is out with multiple enhancements. "Sussen is a tool that checks for vulnerabilities and configuration issues on computer systems. It is based on the Open Vulnerability and Assessment Language."
Telecom
Speex 1.1.12 released
Version 1.1.12 of Speex, an audio CODEC, is available. Changes include: the echo canceler has been converted to fixed-point, improvements have been made to the experimental Vorbis-based masking model, and several bugs have been fixed.
Web Site Development
Midgard 1.7.4 released
Version 1.7.4 of the Midgard web development platform has been released. "Midgard's 1.7 branch is a major overhaul of the whole Content Management System. Besides the stable and mature Content Management features of first generation Midgard, it also ships a preview version of second generation Midgard capabilities, allowing developers to have a glimpse at the new day of Midgard2. 1.7.4 is maintenance and bugfix release."
mnoGoSearch 3.2.37 released
Version 3.2.37 of mnoGoSearch, a web site search engine, has been released with numerous bug fixes. See the Change Log file for release details.
Desktop Applications
Audio Applications
jack_capture 0.2.3 released
Version 0.2.3 of jack_capture, an application for copying JACK audio stream data to a file, is out with minor enhancements. Also, the initial release of das_watchdog has been announced.
mp3splt-gtk 0.3 and libmp3splt 0.3 released (SourceForge)
Versions 0.3 of mp3splt-gtk and libmp3splt are out with various improvements. "mp3splt is a free utility to split mp3/ogg files (without decoding), selecting begin/end time; if file is an album, you can get splitpoints automatically from internet or a local cue/cddb file. It splits also Mp3Wrap and AlbumWrap archives."
Rivendell version 0.9.66 is out
Version 0.9.66 of Rivendell, a radio automation system, is out. Changes include CD ripper enhancements, new build targets, RDCatch enhancements and more.
Business Applications
RUNA WFE 2.0 RC4 is released (SourceForge)
Version 2.0 RC4 of RUNA WFE is available with new features and performance improvements. "RUNA WFE is a workflow/BPM environment for JBOSS JBPM engine (written in Java). It is a cross-platform end user solution for business process management. It provides rich web interface with tasklist, form player, graphical process designer, bots and more."
Desktop Environments
Accelerated Indirect GL X
Now there are two 3D-enhanced X servers available: some Red Hat hackers have released some code which they call Accelerated Indirect GL X, or AIGLX. "We have a lightly modified X server (that includes a couple of extensions), an updated Mesa package that adds some new protocol support and a version of metacity with a composite manager. The end result is that you can use GL effects on your desktop with very few changes, the ability to turn it on and off at will, and you don't have to replace your X server in the process." Much of this code will ship with Fedora Core 5; for the impatient, there are some packages available which will make AIGLX work on the just-announced FC5t3 release. The site includes the obligatory demo animations and a few digs at Novell's competing XGL work.
GARNOME 2.13.91 Released (GnomeDesktop)
Version 2.13.91 of GARNOME is available for testing. "We are pleased to announce the release of GARNOME 2.13.91 Desktop and Developer Platform. This release includes all of GNOME 2.13.91 (aka 2.14.0 Beta 2) plus a whole bunch of updates that were released after the GNOME freeze date."
GNOME 2.14.0 Beta 2 Released
Version 2.14.0 Beta 2 of GNOME has been announced. "We are pleased to announce the delicious release of tasty GNOME 2.14.0 Beta 2 (2.13.91). This is one of the last delicate releases in the delectable 2.13 development series and represents a toothsome release that is now API/ABI, feature, string and UI frozen. This means that we're pretty close to the succulent final 2.14.0 release. The delightful GNOME contributors are now busy fixing the most important nectareous bugs that are still out there, localizing the whole pleasant-tasting desktop or updating our scrumptious documentation."
A Look at GNOME 2.14
GNOME hacker Davyd Madeley has posted a look at the upcoming GNOME 2.14 release, with lots of highlights and screen shots. "One application that got a lot of attention is GNOME Terminal which can now display the entire contents of the dictionary on the screen literally in a second, or in under 2 seconds using antialiased fonts (using antialiased fonts it took xterm 1m 13s to do the same!)."
GNOME Software Announcements
The following new GNOME software has been announced this week:
- Celestia 1.4.1 (new features and bug fixes)
- Eye of GNOME 2.13.91 (bug fixes and translation work)
- Fantasdic 1.0-beta1 (initial release of DICT client)
- Glom 0.9.6 (new features and bug fixes)
- GNOME User Documentation 2.13.1 (documentation work)
- GnuCash 1.9.1 (unstable testing release)
- Gossip 0.10 (new features, bug fixes and translation work)
- gparted 0.2.1 (new features, bug fixes and translation work)
- gst-plugins 0.10.1 (new features and bug fixes)
- gst-plugins 0.10.2 (bug fixes)
- kiwi 1.9.6 (new features and bug fixes)
- nautilus-python 0.4.3 (new features and bug fixes)
- Pango 1.11.6 (unstable development release)
KDE Software Announcements
No KDE software announcements were received this week; you can find other KDE software releases at kde-apps.org.
Electronics
Covered 20060218 released
Development version 20060218 of Covered, a Verilog code coverage analysis tool, is out. Here is a change summary: "A lot of work has gone into adding a lot more Verilog-2001 support, added Verilog-1995 support, GUI improvements/fixes, user documentation additions/updates, adding LXT dumpfile support and the usual bug fixes. I have also removed the diagnostic directory from the Covered tarball and am making it available as its own tarball since it is growing by leaps and bounds these days."
XCircuit 3.4.14 released
Stable version 3.4.14 of XCircuit, an electronic schematic drawing package, is out. Changes include new key bindings.
Circuit Design on Your Linux Box Using gEDA (Linux Journal)
Linux Journal reviews gEDA, a collection of electronics tools. "A lot of attention-and hype-has focused on bringing traditional office-productivity programs, such as the OpenOffice.org suite, to Linux. However, another important-and far less-hyped-area where Linux's desktop abilities come to the fore is in engineering software, and in particular, CAD (computer-aided design). Non-engineers tend to think of the term CAD as referring to mechanical design software, and they are partially right. We are used to seeing complicated drawings of mechanical assemblies appearing on computer screens in advertising and television. However, CAD doesn't mean only mechanical design. Electronics designers also long have used computer-based design tools to help them perform their design tasks."
Financial Applications
KMyMoney 0.8.3 released (SourceForge)
Version 0.8.3 of KMyMoney is available. "KMyMoney is the Personal Finance Manager for KDE. It operates similar to MS-Money and Quicken, supports different account types, categorisation of expenses, QIF import/export, multiple currencies and initial online banking support. The KMyMoney team is pleased to announce the availability of Release 0.8.3. This is an update to our latest stable release, and contains several bug fixes and some improvements to the user interface."
Games
Goblin Exported to Ember (WorldForge)
The WorldForge game project continues to put the Blender animation suite to good use. The game Ember has a new animated goblin: "Exported the goblin out of Blender 2.3. I added a couple of animations, a run and a taunt. The plan for this model is to clean it up when I bring it into Blender 2.41. This is going to happen when the Cal3D or the OGRE scripts are fixed."
Interoperability
Wine 0.9.8 released
Version 0.9.8 of Wine has been announced. Changes include: better Web browser support, the beginnings of a Wordpad application, many richedit improvements, a number of Direct3D fixes, a few more options in winecfg and lots of bug fixes.
Medical Applications
OpenEHR Release 1.0 published (LinuxMedNews)
LinuxMedNews covers the release of openEHR 1.0. "Release 1.0 of openEHR was published on 10/Feb/2006. openEHR is a set of public specifications, tested in implementation, for a distributed EHR/EHR computing platform and is designed for use at all levels of e-Health. It integrates with existing data sources, terminologies and is multi-lingual."
FreeMED 0.8.2 and REMITT 0.3.1 released (LinuxMedNews)
LinuxMedNews covers the release of FreeMED 0.8.2 and REMITT 0.3.1, two medical applications. "The FreeMED Software Foundation is proud to announce the release of version 0.8.2 of FreeMED and version 0.3.1 of REMITT. These releases are stable releases in the FreeMED 0.8.x and REMITT 0.3.x release cycles."
Music Applications
das_watchdog 0.1.2 announced
Version 0.1.2 of das_watchdog has been announced. "I have fixed up the compilation problems, corrected the DISPLAY environment variable, and let both the program and makefile give warning/error if the softirq-timer/0 or ksoftirqd/0 processes aren't set to have highest priority. It might still not work, but at least you get a message about /why/ it doesn't work, and what you can do to fix it."
E-Radium V0.61e and Das_Watchdog V0.2.0 announced
New versions of E-Radium, a music event editor, and Das_Watchdog, a realtime process monitor, are available.
Rosegarden-4 1.2.3 released
Version 1.2.3 of Rosegarden-4 is out with many improvements. "The Rosegarden team are delighted to announce the release of version 1.2.3 of Rosegarden 4, an audio and MIDI sequencer and musical notation editor for Linux. Rosegarden is among the largest and most insanely ambitious Linux music software projects, and is the only Linux application to offer full composition and recording capabilities to musicians who prefer to use classical notation."
Shelljam 0.0.2 MIDI keyboard
Version 0.0.2 of Shelljam has been announced. "Shelljam is a way of playing electronic music live using standard computer hardware. It is implemented using fast portable libraries. It is designed to be suitable for live performance and studio work."
Office Suites
OpenOffice.org build oob680.1.0 released
Build oob680.1.0 of the OpenOffice.org office suite is available. "This package contains Desktop integration work for OpenOffice.org, several back-ported features & speedups, and a much simplified build wrapper, making an OO.o build / install possible for the common man. It is a staging ground for up-streaming patches to stock OO.o."
Streaming Media
Democracy Internet TV launches
The "Democracy" Internet television project has announced its existence with a press release proclaiming the availability of its GPL-licensed video player. It is a Windows download for now, though there is a developer release of a Linux-based player available. "Democracy builds on cutting edge RSS, Firefox, and BitTorrent technology to empower anyone to watch, share, broadcast and download video over the internet in a way that enables higher digital resolution, full screen video playback, continuous non-buffered play, and an open standards environment free of adware or spyware -- a much more TV-like experience than traditional web video, and with far more diversity and freedom than traditional TV."
Web Browsers
Minutes of the Firefox Team Status Meeting (MozillaZine)
The minutes from the February 14, 2006 Firefox team meeting have been announced. "Issues discussed include status updates on planned Firefox 2 features, list of features to be included in Alpha 1 release, product updates and action items."
Minutes of the mozilla.org Staff Meeting (MozillaZine)
The minutes from the February 13, 2006 mozilla.org staff meeting have been announced. "Issues discussed include Upcoming Releases, Firefox 2, Marketing and Foundation updates."
Miscellaneous
OmegaT 1.6.0 RC7 released (SourceForge)
Version 1.6.0 RC7 of OmegaT is out. "OmegaT is a free and open source multiplatform Computer Assisted Translation tool with fuzzy matching, translation memory, keyword search, glossaries, and translation leveraging into updated projects. OmegaT project is proud to announce the 7th Release Candidate of OmegaT 1.6.0. 1.6.RC7 contains more than 50 bugfixes over the 1.4.5.04 release, so we consider it being more stable than 1.4.5.04 in terms of bugs."
Languages and Tools
C++
Ultimate++ 602 beta 3 released (SourceForge)
Version 602 beta 3 of Ultimate++ has been announced. "U++ is a complete C++ cross-platform rapid application development suite, where "rapid" is achieved by the 'smart and aggressive' use of C++ features. The new version brings fix of multi-threading issues in Linux, new Report package, fixes and optimization of Assist++ parser and refinements of project organization and build system."
Caml
Caml Weekly News
The February 14-21, 2006 edition of the Caml Weekly News is out. Topics include: Weblogs 1.2 released, ocaml+twt v0.81, ocaml ncurses bindings, What library to use for arbitrary precision decimals and Menhir available under GODI.
Java
This week on harmony-dev
The February 12-18, 2006 edition of This week on harmony-dev is online with coverage of Harmony, an open-source Java implementation.
Lisp
The LispDoc.com search engine
The LispDoc.com site has been launched. "William Bland has made available online LispDoc.com (The Lisp Dictionary), which is a search engine for Common Lisp documentation and is itself written in Common Lisp. It currently indexes a number of Common Lisp reference documentation sources and books."
Perl
Managing Rich Data Structures (O'Reilly)
Dave Baker uses Perl to work with Rich Data Structures on O'Reilly. "If you're like me, you've written plenty of scripts that use simple text files to store snippets of data. Those scripts might have evolved over time into using several snippets of data for each item, which translates into lots and lots of little text files in a data directory somewhere. After reading that Linux doesn't like more than a hundred or so text files per directory, and thinking about the amount of space wasted on my hard drive due to the small size of the snippets compared to the size of a sector and the hassle of all those little files when making a backup, I decided to move from snippets to a single database. Here's how I did it."
PHP
PHP Weekly Summary for February 20, 2006
The PHP Weekly Summary for February 20, 2006 is out. Topics include: C++ extensions, casting and Unicode, iterator usage in PHP classes, asymmetric comparison, zip in 5.1.2, true labelled break, safe_mode gone, Deprecation marker, sys_getloadavg and stream_close.
PostScript
flpsed - a pseudo PostScript editor
Version 0.3.6 of flpsed is out with random page access and other new functionality. "flpsed is a WYSIWYG pseudo PostScript editor. "Pseudo", because you can't remove or modify existing elements of a document. But flpsed lets you add arbitrary text lines to existing PostScript 1 documents. Added lines can later be reedited with flpsed. Using pdftops, which is part of xpdf one can convert PDF documents to PostScript and also add text to them. flpsed is useful for filling in forms, adding notes etc. GsWidget is now part of flpsed."
Python
Dr. Dobb's Python-URL!
The February 20, 2006 edition of Dr. Dobb's Python-URL! is online with the latest Python language articles and resources.
Ruby
Ruby Weekly News
The February 19th, 2006 edition of the Ruby Weekly News looks at the latest discussions from the ruby-talk mailing list.
Tcl/Tk
Dr. Dobb's Tcl-URL!
The February 22, 2006 edition of Dr. Dobb's Tcl-URL! is online with new Tcl/Tk articles and resources.
Cross Compilers
GNU HC11/HC12 Release 3.1 is available
Release 3.1 of the GNU Development Chain for 68HC11 and 68HC12 microprocessors is out. Changes include upgrades to gcc 3.3.6 and gdb 6.4, and some bug fixes.
IDEs
Integrating Ant with Eclipse, Part 1 (O'ReillyNet)
O'Reilly presents part one in a series on integrating Ant with Eclipse. "Ant and Eclipse are the top Java build system and IDE, both by wide margins, so it's only natural you'd want to integrate them. In this excerpt from Ant: The Definitive Guide, 2nd Edition, Steve Holzner shows how to create and run Ant build.xml files from within Eclipse."
pydev 1.0.2 released
Version 1.0.2 of pydev, a Python IDE plugin for Eclipse, has been announced. Changes include new Jython debugging support, bug fixes and more. (Thanks to Bobby Hesselbo.)
Profilers
Interview with Valgrind Author Julian Seward (KDE.News)
KDE.News has an interview with Julian Seward, author of Valgrind. "JS: My background is in compiler technology, having been fascinated by them for a good couple of decades. I've also been interested in issues of software correctness for a long time. Eventually I combined these interests into creating Valgrind, a simulation-based tool which you can use to debug and profile your programs."
Page editor: Forrest Cook
Linux in the news
Recommended Reading
RIAA, others says CD ripping, backups not fair use (ARS Technica)
ARS Technica looks into a triennial review of the DMCA by the content industry. "But supporting the status quo isn't in their interest. No, the idea is to embrace and extend. To wit, the joint reply also argues that making backups of your CDs is also not fair use. "The [submitted arguments in favor of granting exemptions to the DMCA] provide no arguments or legal authority that making back up copies of CDs is a noninfringing use. In addition, the submissions provide no evidence that access controls are currently preventing them from making back up copies of CDs or that they are likely to do so in the future. Myriad online downloading services are available and offer varying types of digital rights management alternatives. For example, the Apple FairPlay technology allows users to make a limited number of copies for personal use. Presumably, consumers concerned with the ability to make back up copies would choose to purchase music from a service that allowed such copying. Even if CDs do become damaged, replacements are readily available at affordable prices.""
New initiative aims to improve the quality of patents (NewsForge)
NewsForge covers a quality over quantity initiative at the United States Patent and Trademark Office (USPTO). "The United States Patent and Trademark Office (USPTO), in looking for ways to improve the quality of the patents it issues, has turned to the biggest patent holder in the country, which also happens to be one of the biggest supporters of open source software (OSS). IBM's 2,941 patents from 2005 make it far and away the top patentee for the thirteenth consecutive year, but Big Blue -- with the help of the USPTO, Open Source Development Labs (OSDL), Novell, Red Hat, and SourceForge -- is now aiming for quality over quantity, and is enlisting the OSS community to do it."
Trade Shows and Conferences
Does Open Source Matter? To IT, It Does, Says Nicholas Carr (InformationWeek)
InformationWeek covers the OSBC keynote of Nicholas Carr. ""We're in the early stages of a revolution in IT. We're entering a true utility era for IT" in which open source code, from the Apache Web Server, Linux operating system and other pieces of open source code working with them will form a commoditized base for most enterprise computing, he said in a keynote speech Tuesday at the Open Source Business Conference in San Francisco. Open source code "will fundamentally change the way software is bought and used in the IT world," he said. The exact shape of things to come is still hard to discern, but he predicted a commodity base of open-source software was likely to become available through large centralized suppliers." (Thanks to Peter Masiar.)
Historic Libre Graphics Meeting set for next month (NewsForge)
Nathan Willis looks forward to the first Libre Graphics Meeting, on NewsForge. "LGM is free to attend and will be held at the university campus at La Doua, Villeurbanne, in Lyon, France. Speakers are scheduled for Friday and Saturday. Among them is the GIMP's Øyvind Kolås, who will present a talk on his implementation of the long-awaited Generic Graphical Library (GEGL) concept, Gggl. Marti Maria of LittleCMS will talk about color management, and adding it to graphics applications. Neil Howe, chief technology officer of Xara, will present an update on the company's work at opening the source of the Xara Extreme vector graphics editor and porting it to Linux."
Final FOSDEM interviews
The final set of FOSDEM interviews has been published; they are: Jeff Waugh, Tomasz Kojm, Alex Russell, and Mark Spencer. "FOSDEM sounds like it would be a great chance to help spread the word about Asterisk. It is ironic, really, that Asterisk is *so* well known in the communications space (Network World was so kind as to name me among the '50 most powerful people in networking' this year) but yet in the Linux world it is surprisingly unknown."
The SCO Problem
SCO Attacks Open Group (Groklaw)
Groklaw has a new SCO filing which reveals the latest mutation of SCO's story: "Seeking to make Linux a viable, commercial-ready UNIX-on-Intel alternative, IBM misappropriated UNIX technology from SCO and provided that technology to The Open Group for purposes of The Open Group's 'Single UNIX Specification 2001' and The Open Group's efforts to work on 'UNIX Developer Guide -- Programming Interface'" Some of the Groklaw folks have had fun debunking this one. Also on Groklaw: IBM has gotten around to sending subpoenas to interesting companies like Sun, Microsoft, and Baystar.
Companies
Jim Starkey joined MySQL AB
Jim Starkey is the original creator of InterBase, which became Firebird. Here's a blog entry at Firebird News in which he made it publicly known that he now works for MySQL AB. "My company, Netfrastructure, Inc., has been acquired by MySQL, AB. As part of the agreement, I will be working full time for MySQL." (Thanks to Marius Popa.)
Oracle's open source buying spree (NewsForge)
Joe "Zonker" Brockmeier covers the issue of large corporations controlling open-source projects. "When asked, MySQL CEO Marten Mickos says that the company's ambition is "a successful independent existence" and called Oracle's purchase of Sleepycat "a great validation of the power of open source. We have been predicting for a long time that the incumbent vendors will adopt open source in some form: by acquiring companies, by launching open source initiatives, and by opensourcing old closed source products. This is all in line with that." Not everyone is as sanguine about Oracle's buying spree as Mickos. PostgreSQL developer Josh Berkus has worried for years about the "perils of corporate-owned open source" and says that Oracle's acquisition of Sleepycat "is a perfect case in point.""
Does Oracle Understand What It's Buying? (Technocrat.net)
Bruce Perens wonders if Oracle truly understands what it gets out of its open source acquisitions. "You can't really buy an Open Source project. The GPL was designed to make it possible for any Open Source participant to circumvent any other party who gets in the way. Other Open Source licenses are similar. Larry Ellison can buy business and influence over an Open Source project, but if he tries to have absolute control, Open Source developers will code elsewhere, replace whatever Larry holds close, and create new businesses."
Will major vendors dilute open source? (Network World)
Network World looks at recent acquisitions of free software companies. "'I believe what will really determine the success or failure of commercial firms purchasing open source vendors is the extent to which they can keep the key developers,' says Barry Strasnick, CIO at CitiStreet, a benefits management company in Quincy, Mass. 'One of the main reasons that CitiStreet likes to deal with vendors such as JBoss is that our senior technical staff can deal with their technical staff, instead of having to deal with useless layers in between,' he says."
Linux Adoption
South Korea plans 'Linux showcase city' (ZDNet UK)
ZDNet UK reports that the South Korean government has plans to showcase the use of Linux, by paying for a city and a university to deploy the software on their servers and desktops. "The government believes the showcase city and university will encourage other organisations to migrate to open source software. "The test beds will prompt other cities and universities to follow suit through the showcasing of Linux as the major operating system without any technical glitches and security issues," said MIC director Lee Do-kyu, according to The Korea Times."
Linux taken for a ride in the Old West (ZDNet)
ZDNet covers a migration to open source in Steamboat Springs, Colorado. "ZDNet UK spoke to Kent Morrison, the manager of information systems at Steamboat Springs, to find out more about the city's migration to open source. Morrison is responsible for two other members of staff in the town's IT department, which supports 160 networked workstations and approximately 220 email accounts across the town."
Linux at Work
PCs for the poor: Which design will win? (ZDNet)
ZDNet examines a number of technologies that are competing for the sub $100 PC space. "Only about 1 billion, or 16 percent of the 6.5 billion people living today, use the Internet, according to a running tally at Advanced Micro Devices. Designing machines that are resilient, powerful and cheap enough to reach those not yet online, though, has proven a lot tougher than expected. India's Simputer, an inexpensive handheld, flopped. Brazil has worked for years on a Linux PC for the poor, to no avail. "Initiatives of this sort need serious consideration from everyone. Developing nations need to start teaching about technology early in schools," said Luis Anavitarte, an analyst at Gartner. "But the reality kind of changes when we look at the costs and the functionality of these devices.""
Legal
California takes up transparency, open source voting (NewsForge)
NewsForge covers California State Senator Debra Bowen, who is overseeing hearings on whether the state should move toward using electronic voting systems that rely on open source software. "Open Voting Consortium President and CEO Alan Dechert is also focused on a more open, transparent vote, and backs not only Bowen and her bid for Calif. Secretary of State, but also state legislation to be introduced soon that requires disclosure of voting code and systems. Calif. Assemblywoman Jackie Goldberg pushed legislation requiring consideration of open source software for electronic voting systems in 2004 as well."
Reports from the USPTO Meeting (Groklaw)
Groklaw has posted a series of reports from the U.S. patent office meeting on prior art databases. From Bruce Perens's statement: "I respect that there are questions we've been asked to avoid, because this isn't the right forum. I'd just like to make sure that this activity is not confused as addressing the problems that software patenting presents for Open Source. It only deals with patent quality, and I hope that anyone reporting on this meeting understands that patent quality is a little piece of the overall problem for Open Source."
EU Council passes directive on data retention (Heise)
Heise reports that the European Union data retention directive has passed its last hurdle. "At their meeting in Brussels on Tuesday, the Ministers of Justice and Home Secretaries of the EU have paved the way for the retention of telephone and Internet data without grounds for suspicion. Without any further discussion, they approved a directive already passed last December with votes from the main people's parties in the EU Parliament. This directive makes it mandatory for telecommunications providers to retain data from the last six to 24 months for some 450 million EU citizens."
Resources
Asterisk on OpenWrt (NewsForge)
Joe Barr is running Asterisk on OpenWrt. "I installed Asterisk on OpenWrt White Russian RC4 on a Linksys WRT54GS wireless router. It's my first Asterisk installation. I admit that I scraped the knuckles on both hands getting Asterisk correctly configured, but now that I've done it, I would say it was worth all the frustrations it caused me. Not only do I now have a functional personal PBX, I've also learned a little about the black art of telephony along the way."
February GNOME Journal
The February 2006 GNOME Journal has been posted. Topics covered include GStreamer 0.10, Cairo, GNOME marketing, and an interview with Jeff Waugh. "There are no killer apps. I am quite serious about that. If we look at the kinds of things we describe as 'killer apps', they're almost always killer network effects. Look at the success of LAMP - which is the killer app? Is it Linux, Apache, one of the Free databases, or one of the Free languages that rocks for web stuff? None of them."
Building a High-Availability MySQL Cluster (O'ReillyNet)
O'ReillyNet looks at building a high-availability MySQL cluster. "When building highly available clusters, people often choose one extra physical machine per service, creating an A-B, fail-over schema. With static websites, there is no problem making the application highly available; you can just store the data in two places. However, the moment you add a database to your environment, things start to become more difficult. The easy way out is to move the database to a different machine and move that server into a SEP field."
Preventing SSH Dictionary Attacks With DenyHosts (HowtoForge)
HowtoForge prevents SSH dictionary attacks with DenyHosts. "In this HowTo I will show how to install and configure DenyHosts. DenyHosts is a tool that observes login attempts to SSH, and if it finds failed login attempts again and again from the same IP address, DenyHosts blocks further login attempts from that IP address by putting it into /etc/hosts.deny. DenyHosts can be run by cron or as a daemon. In this tutorial I will run DenyHosts as a daemon."
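The mechanism described in the quote (count repeated failed logins per IP address, then list the offenders in /etc/hosts.deny) can be illustrated with a rough sketch. This is not DenyHosts code: the log-line pattern, the threshold of three failures, and the function names hosts_to_deny and deny_entries are all assumptions made for illustration, and the sketch only builds the entries rather than writing to any file.

```python
import re
from collections import Counter

# Assumed pattern for sshd "Failed password" log lines; real log formats
# differ between systems and sshd versions.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)


def hosts_to_deny(log_lines, threshold=3):
    """Return the sorted IP addresses with at least `threshold` failed logins."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(1)] += 1
    return sorted(ip for ip, count in failures.items() if count >= threshold)


def deny_entries(ips):
    """Format IP addresses as hosts.deny-style lines for the sshd service."""
    return [f"sshd: {ip}" for ip in ips]


# Made-up sample log lines using documentation-reserved IP addresses.
sample_log = [
    "Feb 20 10:00:01 host sshd[101]: Failed password for root from 203.0.113.7 port 4000 ssh2",
    "Feb 20 10:00:03 host sshd[102]: Failed password for invalid user admin from 203.0.113.7 port 4001 ssh2",
    "Feb 20 10:00:05 host sshd[103]: Failed password for root from 203.0.113.7 port 4002 ssh2",
    "Feb 20 10:01:00 host sshd[104]: Failed password for alice from 198.51.100.2 port 5000 ssh2",
    "Feb 20 10:02:00 host sshd[105]: Accepted password for alice from 198.51.100.2 port 5001 ssh2",
]

for entry in deny_entries(hosts_to_deny(sample_log)):
    print(entry)
```

Only the address that failed three times reaches the threshold here; the single failure followed by a successful login does not. A real deployment would also need to handle whitelisting and expiry of old entries, which this sketch leaves out.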
Reviews
Among Linux music players, Banshee really wails (Linux.com)
Linux.com has posted a glowing review of Banshee. "Banshee is perfect for managing your entire music collection, and particularly items stored on iPod music players. The software allows you to carry out many tasks in ways similar to Apple's iTunes, including playing music directly from the device and creating playlists with your songs. Banshee supports a wide variety of codecs, including Ogg Vorbis, FLAC, and MP3. The player read ID3 tags perfectly for my music collection, and sorting through the tracks -- comprising several file formats -- was incredibly easy."
Must-have Firefox and Thunderbird extensions (NewsForge)
Joe 'Zonker' Brockmeier reviews several Firefox and Thunderbird extensions on NewsForge. "The Quicktext extension comes in handy for anyone who needs to send out form letters or canned responses via email. This extension lets you define templates that you can insert into an email message from a menu, or (even better) using hotkey combinations. Templates can be simple or very complex."
PalmSource releases Linux platform for smartphone (Mobilisled)
Mobilisled looks at the Access Linux Platform (ALP) from PalmSource, a Linux platform for mobile phones. "Major components include a standard, commercial-grade Linux kernel, an optimised implementation of GIMP ToolKit , GStreamer -- an open source, modular and multi-threaded streaming media framework and the SQLite embedded database engine. The company is also adding in a few extras of its own, including the NetFront browser, PalmSource messaging and telephony middleware, the PalmSource mobile applications including PIMs, multimedia, messaging and HotSync capabilities along with Palm Desktop."
Miscellaneous
KDE 4 developers look toward new desktop possibilities (NewsForge)
Stephen Feller discusses the changes coming to KDE 4 in a NewsForge article. "Developers on the projects expected to make up the next major version of the K Desktop Environment (KDE) want KDE 4 to offer features and software interaction beyond what is available now, and better, easier access for users to their files and information. Among the ideas are universally available personal information and a desktop that is tailored for and responds to the things users do most. Ian Geiser, a KDE developer and official US representative for the KDE project, says KDE 4 will most likely be released in late 2006, though internal debate could push the release back to early 2007. Developer Till Adam says developers are still trying to figure out the combined vision for KDE 4, and how everything fits together."
KDE and GNOME collaborating on free desktop promotion (NewsForge)
NewsForge takes a look at collaboration between desktops. "KDE and GNOME undeniably occupy a very small share of the desktop market. If GNOME took 20% of that share from KDE, it'd make a marginal gain. But if KDE and GNOME together took a 10% of the desktop market by 2010 (a stated goal of the GNOME marketing project), they'd both gain a massive amount."
Linux is running on ...
Linux-Watch reports that Edgar 'Gimli' Hucek has gotten Gentoo Linux running on a Mactel. Also ZDNet reports that Dave Miller is running Linux on Sun's new UltraSparc T1 "Niagara"-based server. Neither is running perfectly yet. "The boot didn't go entirely swimmingly, however: Later in the process, the file system caused a serious problem called a kernel panic."
Page editor: Forrest Cook
Announcements
Non-Commercial announcements
The Open Group is developing new API sets
The Open Group has announced an effort to develop new API sets. "The Open Group's Base Working Group is developing four new sets of APIs for consideration as input into the next revision of the Austin Group joint standard (IEEE Std POSIX 1003.1 and The Open Group Base Specifications)."
PubForge: software for public broadcasting
A new site called PubForge aims to collect open-source software for use in the field of public broadcasting. "PubForge is looking for open-source software projects focused on the needs of public broadcasters to feature and distribute. What have you built lately?"
Sony BMG Settles Up with Music Fans for Copy-Protection Debacle
The Electronic Frontier Foundation has sent out a press release concerning the Sony BMG settlement. "The Electronic Frontier Foundation (EFF) is urging music fans who purchased Sony BMG music CDs containing flawed digital rights management (DRM) to submit their claims now for clean CDs and extra downloads as part of a class action lawsuit settlement. "This settlement gives consumers what they thought they were buying in the first place -- clean, safe music that will play on their computers and their iPods as well as their stereo systems," said EFF Staff Attorney Kurt Opsahl."
Commercial announcements
Active Endpoints Announces Free BPEL Design Environment
Active Endpoints, Inc. has announced the availability of free downloads of its ActiveBPEL Designer Business Process Execution Language software. "ActiveBPEL Designer is a comprehensive Eclipse Ready(TM) BPEL authoring environment. The product includes extensive productivity features to speed developers through the BPEL design and testing cycle, including advanced visual controls, runtime simulation, integrated debugging and 100% standard BPEL code generation."
ActiveState acquired by employees and Pender Financial Group
The word has been out for a while, but now we have a real press release: ActiveState, a developer of programming tools for scripting languages, has been acquired by a group consisting of Pender Financial and its own employees.
atsec information security Completes Security Evaluation of RHEL 4
atsec information security corporation has completed the Common Criteria (CC) evaluation of Red Hat Enterprise Linux 4 on a range of IBM server platforms.
CodeWeavers and WorldVistA Collaborate on Health Information Software
CodeWeavers, Inc. has announced a collaboration with WorldVistA with the goal of making healthcare management software freely available around the globe. "As the centerpiece of that partnership, CodeWeavers is porting the CPRS (Computerized Patient Record System) component of VistA, a free electronic health records software application developed by the U.S. Department of Veterans Affairs, for use on Linux open source computers."
Mandriva to participate in NEPOMUK
Mandriva has announced that its EDGE-IT subsidiary has won a seat in the NEPOMUK Social Semantic Desktop project, funded under the European Union's Sixth Framework Programme. The total project budget is €17 million, of which €1.8 million is reserved for EDGE-IT. Funding from the EU represents 50% of the budget. Major partners of the project include DFKI, SAP, Thales, and IBM.
Novell provides solution for Catholic Healthcare West
Novell has announced that Catholic Healthcare West, a 9,500-bed hospital system, has implemented a Novell(R) solution to increase security and strengthen its compliance initiatives. The organization centralized identity management with Novell Identity Manager 3 to ensure only authorized users have access to patient data in accordance with HIPAA regulations, reducing administration time by 70 percent. The hospital system also anticipates saving $1.5 million by deploying Novell's SUSE(R) Linux Enterprise Server.
PolyServe Matrix Server Certified with Novell GroupWise
PolyServe, Inc. has announced its product certification on Novell GroupWise 7 and SUSE Linux. "PolyServe today announced Novell has certified the configuration of Novell GroupWise 7 running on PolyServe Matrix Server(TM) V3 shared data clustering software. This configuration delivers improved availability and manageability for e-mail, instant messaging, scheduling and other collaboration capabilities provided by GroupWise."
Sun to Acquire Aduva
Sun Microsystems, Inc. has announced plans to acquire Aduva. "Aduva technology allows enterprises to automate the processes associated with patch and dependency management -- providing a solution that scales from individual servers, up to large scale data centers with tens of thousands of machines in complex networks. Aduva currently runs an active dependency service for Solaris and Linux servers, easing the burden on systems administrators deploying a continuous stream of patches, updates, and changes required throughout the data center lifecycle. Aduva's multi-platform services will be available for operation by individual customers behind their own firewalls, or as an automated service from Sun's Grid."
Visual Numerics' IMSL Fortran Library to Support PathScale EKOPath
PathScale Inc. has announced a port of the Visual Numerics IMSL Fortran Numerical Library to the PathScale EKOPath Compiler Suite. "Support of Visual Numerics' Fortran Library will enable scientific and technical institutions using PathScale's award-winning EKOPath Compiler to better perform advanced numerical analysis, leading to an acceleration of breakthroughs in science and engineering."
New Books
Pearson publishes Python Essential Reference, 3rd edition
Pearson has published the third edition of the book Python Essential Reference by David Beazley.
Resources
Programming with PostgreSQL
PostgreSQL ile Programlama is a Turkish book that is available for download under the GNU Free Documentation License. "PostgreSQL ile Programlama [Programming with PostgreSQL], is a book looks through how to connect a PostgreSQL database using C, PHP and Python interfaces."
Pervasive Software creates PostgreSQL Resource Directory
Pervasive Software Inc. has announced a new PostgreSQL database Resource Directory. ""The PostgreSQL database community is doing a great job adding tools around this powerful, enterprise-ready open source database," said Gilbert van Cutsem, general manager, Database Products at Pervasive Software. "One inconvenience facing the community is that there is an overwhelming number of pockets of knowledge out there. That is why we compiled a central directory of what we believe are the best and most useful sites.""
Upcoming Events
Bleepfest: London, UK
The 2006 Bleepfest will take place in London, UK on March 25. "Bleepfest 06 will be a part-day and night event that will be like the Demos of old and where people can have the option to display what they're doing "off stage" to small groups around them or to plug into the PA and be an "event". Events will have time spaces between them so that everybody else isn't drowned out. The object is to attract people who like to play with music as well as people who are quite serious about it. The object is also to create a fun and friendly environment where people can wander around and get new ideas."
The KDE DevRoom at FOSDEM
The schedule for the KDE DevRoom at FOSDEM has been announced. The conference takes place in Brussels, Belgium on February 25 and 26, 2006.
FOSS Means Business, Ireland
The FOSS Means Business conference will take place in Belfast, Northern Ireland on March 16, 2006.
The Second GPLv3 Conference
The Free Software Foundation has announced the second conference on the GPLv3 draft. This one will be held on April 21 and 22 in Porto Alegre, Brazil, alongside the International Free Software Forum.
Kapor and Shuttleworth to Speak at 2006 MySQL Users Conference
Mitch Kapor and Mark Shuttleworth will be the keynote speakers at the 2006 MySQL Users Conference. The event will be held on April 24-27, 2006 in Santa Clara, CA.
TimeSys Webinars on Embedded Linux Development
TimeSys has announced a series of four Webinars on embedded Linux development, taking place on March 7, 14, 21 and 28. "During each session, a technical expert will guide attendees through practical embedded development tasks using LinuxLink by TimeSys(TM), a continuously updated, Web-based resource for embedded Linux development."
Events: February 23 - April 20, 2006
Date | Event | Location |
---|---|---|
February 24 - 26, 2006 | PyCon 2006 | (Dallas/Addison Marriott Quorum hotel) Addison, TX |
February 25 - 26, 2006 | FOSDEM 2006 | (ULB Campus) Brussels, Belgium |
February 26 - 28, 2006 | OSDC::Israel::2006 | (Netanya Academic College) Netanya, Israel |
February 27 - March 3, 2006 | SELinux Symposium and Developer Summit | (Wyndham Hotel) Baltimore, MD |
February 28 - March 3, 2006 | Black Hat Europe Briefings and Training 2006 | (Grand Hotel Krasnapolsky) Amsterdam, the Netherlands |
March 3 - 4, 2006 | LinuxForum 2006 | Copenhagen, Denmark |
March 3 - 5, 2006 | Akademy-es 2006 | Barcelona, Spain |
March 6 - 9, 2006 | O'Reilly Emerging Technology Conference (ETech) | (Manchester Grand Hyatt) San Diego, CA |
March 8 - 10, 2006 | New Orleans Plone Symposium | (Astor Crowne Plaza) New Orleans, LA |
March 16, 2006 | FOSS means Business | (Spires Conference Centre) Belfast, Northern Ireland |
March 17 - 19, 2006 | Libre Graphics Meeting 2006 | (Ecole d'Ingénieurs CPE) Lyon, France |
March 18 - 19, 2006 | Rockbox International Developers Conference 2006 | Stockholm, Sweden |
March 19 - 24, 2006 | Novell BrainShare 2006 | (Salt Palace Convention Center) Salt Lake City, UT |
March 21 - 23, 2006 | UKUUG Spring Conference 2006 | Durham, UK |
March 25, 2006 | Penguin Day | Seattle, WA |
March 25, 2006 | Bleepfest 06 | (Christchurch Spitalfields Crypt) London, England |
March 29 - 31, 2006 | PHP Quebec 2006 | (Plaza Montreal Hotel) Montreal, Canada |
April 3 - 6, 2006 | Embedded Systems Conference (ESC) | (McEnery Convention Center) San Jose, CA |
April 3 - 7, 2006 | CanSecWest/core06 | (Marriott Renaissance Harbourside hotel) Vancouver, Canada |
April 3 - 4, 2006 | Freedom To Connect 2006 (FTC) | (AFI Silver Theater) Washington, DC |
April 3 - 6, 2006 | LinuxWorld Conference and Expo | (Boston Convention and Exposition Center) Boston, MA |
April 7 - 9, 2006 | Notacon 3 | (Holiday Inn Select Cleveland) Cleveland, OH |
April 11 - 12, 2006 | CELF Embedded Linux Conference | San Jose, California |
April 15 - 16, 2006 | LayerOne 2006 | (Pasadena Hilton) Pasadena, California |
April 19 - 22, 2006 | Forum Internacional Software Livre 7.0 (FISL) | Porto Alegre, Brazil |
April 20 - 22, 2006 | International Conference on Availability, Reliability and Security (AReS 2006) | Vienna, Austria |
Web sites
New KDE Localisation Website (KDE.News)
KDE.News mentions the new KDE Localisation web site. "After 6 months of development, the KDE Localisation (l10n) website has been launched replacing the old i18n.kde.org. It uses the default KDE layout, and its admins hope this site will help the KDE translation process work better than ever. Read on for the details. This refactoring, for the moment, mostly modifies scripts, pages, and styles on the site."
validator.annodex.org service announced
An Annodex Media Validation Service is available. "The Annodex Foundation is pleased to announce the general availability of the Annodex Media Validation Service, a free service that checks Web-accessible Ogg, CMML and Annodex media resources for conformance to Annodex and Xiph.Org specifications:" http://validator.annodex.org/.
Audio and Video programs
l.o.s.s. open-source sound project
The l.o.s.s. open-source sound project has been announced. "As well as a CD of curated work (also available for free download), the project's online presence is intended to become a focal point for artists working with open source software, and releasing their work through CC licenses."
Page editor: Forrest Cook