LWN.net Weekly Edition for February 23, 2006

The Grumpy Editor's guide to bayesian spam filters

This article is part of the LWN Grumpy Editor series.
If there is one thing which is especially effective at making your editor grumpy, it is spam. The incoming flood consumes bandwidth, threatens to drown out the stream of real mail, and creates ongoing system administration hassles. With current tools, it is possible to keep spam from destroying the utility of the email system. But it's not always easy or fun.

In recent years, much of the development energy in anti-spam circles has gone into bayesian filters. The bayesian approach was kicked off in 2002 with the publication of Paul Graham's "A Plan for Spam." In its simplest form, a bayesian filter keeps track of words found in email messages and, for each word, a count of how many times that word appeared in spam and in legitimate mail ("ham"). Over time, these statistics can be used to examine incoming mail and come up with a probability that each message is spam.
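The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the algorithm used by Graham or by any of the filters reviewed here; the class name, the whitespace tokenizer, the once-per-message counting, the add-one smoothing, and the assumption of equal prior odds are all choices made for the example.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy word-count spam filter (illustrative; not any real filter's code)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Assumption: lowercase, whitespace-separated words,
        # each counted at most once per message.
        return set(text.lower().split())

    def train(self, text, label):
        """Record the words of one message under 'spam' or 'ham'."""
        self.messages[label] += 1
        self.counts[label].update(self.tokens(text))

    def spam_probability(self, text):
        """Combine per-word evidence into a single probability of spam."""
        log_odds = 0.0  # equal prior odds assumed
        for word in self.tokens(text):
            # Add-one smoothing keeps unseen words from yielding zero.
            p_spam = (self.counts["spam"][word] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][word] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so that exp() cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

After a handful of calls to train(), spam_probability() returns a value near 1 for messages dominated by words seen mostly in spam and near 0 for messages dominated by words seen mostly in ham. Real filters differ mainly in how they tokenize mail and how they combine the per-word evidence.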

Bayesian filters have proved to be surprisingly effective. A well-trained filter can catch a high proportion of incoming spam with a very low false positive rate. The filter will adapt to each user's particular email stream, which is an important feature. It should not be surprising that different people have wildly different legitimate email. It turns out, though, that the spam stream can also vary quite a bit. An account which looks like it could belong to a woman, for example, will tend to get offers to alter the sizes of different parts of the recipient's anatomy than an account which looks like a man's. So the ability to tune a filter to a specific mail stream - ham and spam - will increase its accuracy.

There is quite a large selection of free bayesian filters out there. Your editor decided to have a look around to see if there is any reason to prefer one over the others. To that end, a number of characteristics were examined:

  • Accuracy. A filter which does not accurately classify mail will not be of much use, so good results in this area are required. In particular, false positives (legitimate mail classified as spam) must be avoided.

  • Training. Bayesian filters must be trained before they become effective. Some filters, it turns out, are easier to train than others. In general, the training process is somewhat like house-training a puppy: it's a painful process, involving contact with unpleasant materials, and with a messy failure mode. And, somewhere in the process, something you care about is likely to get chewed up. So this is a process which one would like to be done with quickly and not have to do again later on.

    There are people who lovingly tweak and tune their spam filters the way an automobile enthusiast tweaks his car. Your editor is not one of those people. Life is too short - and too busy - to spend a lot of time screwing around with spam filters.

  • Speed. The difference in performance between the fastest and slowest filters covers two orders of magnitude. Since filtering tends to be done in the background, speed will not be crucially important to all users. When filtering is being done on a busy mail server, however, processing speed can matter a lot.

  • Ease of integration. How much work is it to hook a filter into the mail stream?

To carry out the tests, your editor collected two piles of mail from his personal stream; one was purely spam, and the other ham. Just over 1,000 messages from each pile were set aside to be used to train the filters. Then, 6,000 hams and 9,000 spams were fed to each filter, with the filter's verdict and processing time being recorded. Each mis-classified message was immediately fed back to the filter to further train it. Multiple runs were made with different parameters, but, in general, your editor resisted the urge to go tweaking knobs. Some of these filters offer a vast number of obscure parameters to set; one can only hope they come with reasonable defaults.
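The procedure just described amounts to a simple loop, sketched below. The classify and train callables are hypothetical stand-ins for whatever command-line invocation a given filter needs; they are not the interface of any of the tools reviewed, and this is not the actual test script.

```python
import time


def run_test(classify, train, messages):
    """Feed (text, is_spam) pairs to a filter, retraining on each mistake.

    classify(text) -> bool and train(text, is_spam) are hypothetical
    callables standing in for a real filter's interface.
    """
    false_neg = false_pos = 0
    elapsed = 0.0
    for text, is_spam in messages:
        start = time.perf_counter()
        verdict = classify(text)
        elapsed += time.perf_counter() - start  # wall-clock time, not CPU
        if verdict != is_spam:
            if is_spam:
                false_neg += 1  # spam which made it through
            else:
                false_pos += 1  # legitimate mail flagged as spam
            train(text, is_spam)  # feed the mistake straight back
    per_message = elapsed / len(messages) if messages else 0.0
    return false_neg, false_pos, per_message
```

Passing one batch at a time as a list of pairs yields per-batch counts of false negatives and false positives along with the average time per message.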

As a side note, your editor was surprised and dismayed at how difficult the task of producing pure sets of spam and ham was. The process started with mail sorted by SpamAssassin at delivery time. Your editor then passed over the entire set, twice, reclassifying each message which was in the wrong place. It was only after some early tests started reporting "false positives" that it became clear how much spam still lurked in the "ham" pile. It took more manual passes, and many passes with multiple filters, to clean them all out. The developers who claim that their filters do a better job than a human does are right - when that human is your editor, at least. It also turns out that a few incorrectly classified messages can badly skew the results; bayesian filters are easily confused if you train them badly.

Anyway, the results will be presented in five batches of 1200 hams and 1800 spams. Nothing special was done between these batches; this presentation is intended to show how the filter's behavior evolves as it is trained on more messages. All of the results are also pulled together in a summary table at the end of the article.

Bogofilter

Bogofilter was originally written by Eric Raymond shortly after Paul Graham's article was posted. It has evolved over time, and has picked up a wider community of contributors and maintainers. Bogofilter uses a modified version of the bayesian technique, with a number of knobs to tweak. It is written in C and is quite fast.

Training for bogofilter is somewhat complex; your editor was unable to train it into a stable configuration by feeding it hams and spams directly. The presence of several different training scripts in the source tree's "contrib" directory suggests that others have had to put some work into training as well. In the end, the contributed "trainbogo.sh" script appeared to do a reasonable job, but it required about three runs to get bogofilter into a stable state.

Bogofilter offers two approaches to ongoing training. By default, the filter is not trained by new messages as it classifies them. People who use bogofilter in this mode will set aside bogofilter's mistakes for later training. When the -u option is provided, however, bogofilter will train itself on all messages it feels sufficiently strongly about. Use of -u makes retraining on mistakes even more important, or the filter will become increasingly likely to misclassify mail. In general, training a bayesian filter on its own output must be done with care. It can help the filter to keep up with the spam stream as it evolves, but it also is a positive feedback loop which can go badly wrong if not carefully watched.
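The feedback loop is easier to see in code. In the sketch below, a filter trains itself only on messages it scores with high confidence; the score and train callables and the two cutoff values are assumptions made for illustration, not bogofilter's interface or its defaults.

```python
def classify_and_self_train(score, train, text,
                            spam_cutoff=0.99, ham_cutoff=0.01):
    """Classify one message, training on it only when the score is extreme.

    score(text) -> float in [0, 1] and train(text, is_spam) are
    hypothetical callables; the cutoffs are illustrative values.
    """
    p = score(text)
    if p >= spam_cutoff:
        train(text, True)   # confident spam: reinforce the spam counts
    elif p <= ham_cutoff:
        train(text, False)  # confident ham: reinforce the ham counts
    # Anything in between is classified but left out of the database.
    return p >= 0.5
```

The danger is visible here: a confidently wrong verdict is written into the database as if it were correct, making the same mistake more likely next time. That is why mistakes must be found and retrained promptly when a filter runs in this mode.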

Your editor ran bogofilter (v1.01) in both modes (starting with a freshly trained database in each case). Here are the results:

                 Batch 1        Batch 2        Batch 3        Batch 4        Batch 5
Filter            Fn Fp  T       Fn Fp  T       Fn Fp  T       Fn Fp  T       Fn Fp  T      Size
bogofilter       141  0  0.02    69  0  0.01    96  0  0.02    48  0  0.02    52  0  0.02      5
bogofilter -u     87  0  0.05    54  0  0.05    41  0  0.05    45  0  0.06    41  0  0.09     32

Legend: Fn is the number of false negatives (spam which makes it through the filter); Fp is false positives (legitimate mail misclassified as spam), and T is the processing time (clock, not CPU), in seconds per message. The Size value at the end is the final size of the word database, in MB.

Here we see that bogofilter without the -u option tends toward around 50 missed spams out of a set of 1800 (a 2.8% error rate), but with no false positives at all. If bogofilter trains itself, the false negative rate drops to closer to 2.2%. As we will see, these results are not as good as those of some of the other filters reviewed.

Without self-training, bogofilter requires a roughly constant 0.02 seconds of time to classify a message; with self-training, that time increases as the word database grows. Bogofilter is clearly fast - the fastest of all the filters reviewed here. One of the ways in which it gains that speed is to not bother with attachments in the mail. The web page says "Experience from watching the token streams suggests that spam with enclosures invariably gives itself away through cues in the headers and non-enclosure parts."

Bogofilter is intended to be integrated as a simple mail filter, optimally invoked via procmail. It can place a header in processed messages (making life easy for procmail) and also returns the spam status in its exit code (making life easy for grumpy editor testing scripts). Bogofilter has options for dumping out the word database and, for a given message, listing the words which most influenced how that message was classified. Nothing special has been done to make retraining easy; most users will probably create folders of mistakes and occasionally feed them to the filter.

CRM114

An interesting - if intimidating - offering is CRM114, subtitled "the controllable regex mutilator." While the main use of CRM114 appears to be spam filtering, it has a wider scope; it can be trained, for example, to filter interesting lines out of logfiles. According to the project page:

Criteria for categorization of data can be by satisfaction of regexes, by sparse binary polynomial matching with a Bayesian Chain Rule evaluator, a Hidden Markov Model, or by other means.

This tool comes with a 275-page book in PDF format, and needs every one of those pages. Setting up CRM114 is not for the faint of heart; it involves the manual creation of database files and the editing of a long configuration file which, while not quite up to sendmail.cf standards, is still one of the more challenging files your editor has encountered in some time. Once all of that is done, however, the "crm" executable can be hooked into a procmail recipe in the usual way without too much trouble.

The CRM114 documentation recommends against any sort of initial training of the filter. The developers are strong believers in the "train on errors" approach, saying that there are "mathematically complicated" reasons why pre-training leads to worse results. For users who don't get the hint, they do provide a way to perform pre-training:

If you really feel you must start by preloading some sample spam, copy your most recent 100Kbytes or so of your freshest spam and nonspam into two files in the current directory. These files MUST be named "spamtext.txt" and "nonspamtext.txt" They should NOT contain any base64 encodes or "spammus interruptus", straight ASCII text is preferred. If they do contain such encodes, decode them by hand before you execute this procedure.

The prospect of hand-decoding binary spam attachments is likely to put off most people who were pondering pre-training their filters - and, one assumes, that is the desired result. Of course, one can also use the normal training commands to feed messages into the system in a pre-training mode, but the documentation doesn't say that.

While filter training can be done on the command line, users can also retrain the filter by forwarding errors back to themselves. The message must be edited to include a training command and password; the developers also recommend removing anything which shouldn't be part of the training. Strangely, users are also told to remove the markup added by CRM114 itself - something which, one would think, could be handled automatically.

Your editor tested the 20060118 "BlameTheReavers" release of CRM114. The first test was done without training, as recommended; then, just to be stubborn, a test was run with a pre-trained filter.

                 Batch 1        Batch 2        Batch 3        Batch 4        Batch 5
Filter            Fn Fp  T       Fn Fp  T       Fn Fp  T       Fn Fp  T       Fn Fp  T      Size
CRM114             1  1  0.06     1  1  0.06     3  2  0.06     4  6  0.06     5  6  0.07     24
CRM pretrain       6  2  0.07     1  2  0.07     5  2  0.07     1  2  0.07     1  6  0.07     24

Some things jump out immediately from those numbers. CRM114 is quite fast. It is also quite effective very early on; the first 3000 messages were processed with exactly one false positive and one false negative - starting with an untrained filter. On the other hand, its performance appears to worsen over time, and, in particular, the false positive rate grows in a discouraging way. The false positives varied from vaguely spam-like messages (Netflix updates, for example) to things like kernel patches. Your editor concludes that CRM114 operates as a very aggressive and quick-learning filter, but that it is also relatively unstable.

DSPAM

DSPAM is a GPL-licensed filter written in C. It is clearly aimed at large installations - places with dedicated administrators and, possibly, relatively unsophisticated users. As a result, it has a few features not found in other systems. For example, it has a web interface with statistics, facilities to allow users to manage filter training, and "pretty graphs to dazzle CEOs." Users who don't want to train the filter through a web page can forward mistakes to a special address instead.

There are several ways to hook DSPAM into the mail system, including a command-line filter, a POP proxy, and an SMTP front-end which can be put between the net and the mail delivery agent. There are several choices of backend storage (SQLite, Berkeley DB, PostgreSQL, MySQL, Oracle, and more), and a number of different filtering techniques. The filter can also run in a client-server mode, much like SpamAssassin.

DSPAM is also a package with a dual-license option; companies interested in shipping the software without providing source can purchase a separate license from the developer.

The system is intended to require relatively little maintenance. It has a set of tools, meant to be run from cron, which handle much of the routine housekeeping. Among other things, DSPAM will automatically trim its word list - getting rid of terms which have not been seen for a while and which have little influence on message scoring.

Initial training can be performed using the dspam_train utility; it uses a train-on-errors approach. Thereafter, DSPAM offers several training modes. The recommended "teft" mode trains on every message passing through the system. There is a train-on-errors mode, and a "tum" mode ("train until mature") which emphasizes relatively new and uncommon words. Your editor ran DSPAM (in the standalone, command-line mode) using all three training schemes, with the following results:

                 Batch 1        Batch 2        Batch 3        Batch 4        Batch 5
Filter            Fn Fp  T       Fn Fp  T       Fn Fp  T       Fn Fp  T       Fn Fp  T      Size
DSPAM teft        17  0  0.1     17  0  0.1     11  0  0.1      3  0  0.1      7  0  0.1     305
DSPAM toe         23  0  0.1     21  3  0.1     12  0  0.1      3  1  0.1      8  4  0.1     276
DSPAM tum         26  0  0.1     23  0  0.1     12  0  0.1      7  0  0.1     15  0  0.1     305

So DSPAM shows strong spam detection in all three modes with a mid-range execution time; it is much slower than bogofilter, but much faster than some of the alternatives. The comprehensive training mode would appear to be the most effective; the TUM mode increases the false negative rate slightly, and the TOE mode introduces false positives. Note that the DSPAM database is quite large; to a great extent, this volume is taken up by a directory full of message hashes used to keep track of which messages have been used to train the filter.

SpamAssassin

SpamAssassin, which is written in Perl, is unique among the filters tested in that it combines a bayesian filter with a large set of heuristic scoring rules. The filter, in essence, is just another rule which gets mixed in with the rest. The rule database takes a great deal of effort (on the part of the SpamAssassin developers) to maintain, and testing messages against all of those rules makes SpamAssassin relatively slow. There is a huge advantage to this approach, however: SpamAssassin works well starting with the first message it sees, and it is able to train its own bayesian filter using the results from the rules.

Another nice feature in SpamAssassin is its word list maintenance. Most bayesian filters seem to grow their word lists without bound. Since spam can contain a great deal of random nonsense (actually, much of your editor's ham does as well), the word list can quickly fill up with tokens which are highly unlikely to ever help in classifying messages. Documentation for some other filters suggests occasionally dumping the word list and starting over. SpamAssassin, instead, will occasionally (and automatically) delete tokens which have not been seen for some time. So the word list stays within bounds. In general, SpamAssassin is relatively good at minimizing the need for the user to perform maintenance tasks.
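The idea behind this sort of housekeeping can be shown with a short sketch. It is not SpamAssassin's actual expiry algorithm, which is more involved; the database layout (a dictionary mapping each token to its spam count, ham count, and last-seen time) and the function name are assumptions for the example.

```python
def expire_tokens(db, now, max_age):
    """Delete tokens whose last-seen time is more than max_age ago.

    db maps token -> (spam_count, ham_count, last_seen); this layout is
    an assumption for illustration. Returns the number of tokens removed.
    """
    stale = [tok for tok, (_spam, _ham, seen) in db.items()
             if now - seen > max_age]
    for tok in stale:
        del db[tok]
    return len(stale)
```

Random strings which turn up once in a single spam are never seen again, so they age out, while words which continue to appear in mail keep a fresh timestamp and survive.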

The sa-learn utility is used for most bayesian filter tasks. It can retrain the filter on mistakes, dump out word information, and force cleanup operations. SpamAssassin can be run in a client/server mode, which improves performance on busy systems. The client/server mode can also help to put a bound on SpamAssassin's memory use, which can be a little frightening. Standalone SpamAssassin on a small-memory system can create severe thrashing.

Your editor ran two sets of tests with SpamAssassin 3.1.0, running in the client/server mode, with network blacklist tests enabled. (Before somebody asks: the test was run on a standalone system to avoid any possible contamination by your editor's regular mail stream.) Exactly one scoring tweak was made: the score for BAYES_99 (invoked when the bayesian filter is absolutely sure that the message is spam) was set to 5.0, enabling the filter to condemn messages on its own. That change helps to emphasize the bayesian side of SpamAssassin, and, in your editor's experience, makes it more effective. The first test involved a pre-trained database, as was done with the other filters. The second test, instead, started with an empty bayesian database in an effort to see how well the tool trains itself. Here are the results:

                 Batch 1        Batch 2        Batch 3        Batch 4        Batch 5
Filter            Fn Fp  T       Fn Fp  T       Fn Fp  T       Fn Fp  T       Fn Fp  T      Size
SpamAssassin       8  0  1.1      3  0  1.1      5  0  1.1      3  0  1.0      2  0  1.0      10
SA untrained      32  0  0.6      9  0  1.0     18  0  1.0     15  0  1.0      7  0  1.0      10

The results here show that SpamAssassin filters up to 99.9% of incoming spam, at the cost of significant amounts of CPU time. The untrained run shows higher error rates, but does eventually converge on something similar to the pre-trained version. But, at over one second per message, each testing run (comprising 15,000 messages) took a rather long time.

SpamAssassin operates as a filter, adding a header to messages as they pass through. That header can be used in procmail recipes; the Thunderbird mail agent is also set up to optionally use the SpamAssassin header.

SpamBayes

SpamBayes is a filter written in Python. The SpamBayes hackers have, perhaps more than some of the other filter developers, made tweaks to the bayesian algorithm in an attempt to improve performance. Those hackers have also put more effort into mail system integration than some; as a result, SpamBayes comes with an Outlook plugin, POP and IMAP proxy servers, and a filter for Lotus Notes. It is still possible to use SpamBayes as a command-line filter with procmail, however.

There is a separate script (sb_mboxtrain.py) which is used to train the filter. Your editor followed the instructions and found it seemingly easy to use - it nicely understands things like MH and Maildir folders. However, when used as documented, sb_mboxtrain.py happily (and silently) puts the resulting word database in an undisclosed location, and filtering works poorly. Adding a few options to make the database location explicit took care of the problem.

SpamBayes 1.0.4 was tested in two modes: retraining just on errors, and training on all messages.

                 Batch 1        Batch 2        Batch 3        Batch 4        Batch 5
Filter            Fn Fp  T       Fn Fp  T       Fn Fp  T       Fn Fp  T       Fn Fp  T      Size
SpamBayes         71  0  0.4     44  0  0.4     29  0  0.4     21  0  0.4     20  1  0.4       4
SB train all      90  0  0.8     58  0  0.8     54  0  0.8     46  0  0.9     46  0  0.9      16

SpamBayes takes a while to truly train itself, but it does eventually get to a 98.9% filtering rate - better than some, but not truly amazing. The word database remains relatively small, but processing time is significant - especially if comprehensive training is used. Everything gets worse with comprehensive training, in fact: the spam detection rate drops while processing time increases. SpamBayes very nearly avoids false positives in both modes, however; the only one seen came at the end of the train-on-errors run.

SpamProbe

SpamProbe is a filter written in C++ and released under the Q Public License. Unlike most filters, which record statistics on individual words, SpamProbe is also able to track pairs of words (DSPAM can do that too). SpamProbe looks at text attachments, discarding other types of attachments with one exception: there is a simple parser for GIF images. This parser creates various words describing images in a message (based on sizes, color tables, GIF extensions, etc.) and uses them in evaluating each message.
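Tracking word pairs simply means that the tokenizer emits each adjacent pair of words as an extra token alongside the single words. A minimal sketch follows; the regular expression and the function name are assumptions for illustration and do not reflect SpamProbe's actual tokenizer.

```python
import re


def words_and_pairs(text):
    """Return single-word tokens followed by adjacent word-pair tokens."""
    # Assumption: a word is a run of letters, digits, '$', apostrophes,
    # or hyphens; everything else is treated as a separator.
    words = re.findall(r"[a-z0-9$'-]+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs
```

For the text "Buy cheap pills" this yields the three single words plus "buy cheap" and "cheap pills". Pairs let a filter tell apart phrases whose individual words are unremarkable, at the cost of a considerably larger word database.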

SpamProbe is packaged as a single command with a vast number of options. There is an "auto-train" mode for getting the filter trained in the first place. There are two filtering modes which the author calls "train" and "receive." Both will filter the message; the "train" mode only updates the word database "if there was insufficient confidence in the message's score," while "receive" always updates the database. The author recommends "train" mode; your editor tested SpamProbe 1.4a in both modes:

                 Batch 1        Batch 2        Batch 3        Batch 4        Batch 5
Filter            Fn Fp  T       Fn Fp  T       Fn Fp  T       Fn Fp  T       Fn Fp  T      Size
SpamProbe train   80  0  0.2     39  1  0.1     37  1  0.1     39  1  0.1     27  0  0.1      81
SP receive        90  0  0.6     51  2  0.6     39  1  0.6     42  0  0.7     35  1  0.9     201

SpamProbe's "receive" mode demonstrates that, with bayesian filters, more training is not always better. The added training slows down processing significantly, to the point that SpamProbe is almost as slow as SpamAssassin, but the end results are worse than those obtained without comprehensive training. SpamProbe has a significant false positive rate in either mode, but the "receive" mode makes it worse. In either mode, SpamProbe generates vast amounts of disk traffic, rather more than was observed with the other filters.

Unlike most other filters, SpamProbe does not insert a header in filtered mail. Instead, it emits a single line giving its verdict; the author then suggests using a tool like formail to create a header using that score. So integration of SpamProbe is a little harder than with some other tools.

Summary

Here is a summary table combining all of the filter runs described above:

Test                    False neg.     False pos.    Time   Size
bogofilter               406   5.5%                  0.02      5
bogofilter -u            268   3.0%                  0.06     32
CRM114                    14   0.1%     16   0.3%    0.06     24
CRM114 pretrain           14   0.2%     15   0.3%    0.06     24
DSPAM teft                50   0.6%                  0.1     305
DSPAM toe                 67   0.7%     15   0.3%    0.1     276
DSPAM tum                 83   0.9%                  0.1     305
SpamAssassin              21   0.2%                  1.1      10
SpamAssassin untrained    81   0.9%                  0.9      10
SpamBayes                185   2.1%      1  0.02%    0.4       4
SpamBayes train all      294   3.3%                  0.8      16
SpamProbe train          222   2.5%      3  0.05%    0.1      81
SpamProbe receive        257   2.9%      4  0.07%    0.7     201

In the above table, the "false positives" columns were left blank for tests in which there were none. Since false positives will be the bane of any spam filter, it is good if they stand out.
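Each summary row appears to be the five batches added together, with the totals expressed as a percentage of the 9,000 spams or 6,000 hams in the test set. As a rough check, the sketch below redoes that arithmetic for the SpamProbe "receive" run, using the per-batch false negative and false positive counts as read from the SpamProbe results; the function name and the tuple layout are assumptions for illustration.

```python
def summarize(batches, total_spam=9000, total_ham=6000):
    """Total up (false_neg, false_pos) pairs and convert them to percentages."""
    fn = sum(b[0] for b in batches)
    fp = sum(b[1] for b in batches)
    return fn, 100 * fn / total_spam, fp, 100 * fp / total_ham


# SpamProbe "receive": (Fn, Fp) for batches 1 through 5
fn, fn_pct, fp, fp_pct = summarize([(90, 0), (51, 2), (39, 1), (42, 0), (35, 1)])
print(f"{fn} ({fn_pct:.1f}%), {fp} ({fp_pct:.2f}%)")  # 257 (2.9%), 4 (0.07%)
```

The result agrees with the SpamProbe "receive" row of the summary table.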

One should, of course, take all of the above figures with a substantial grain of salt. They reflect performance on your editor's particular mail stream; things could be very different with somebody else's mail. Still, your editor's mail stream is varied enough that, perhaps, a few conclusions can be drawn.

One of those would be that SpamAssassin is still hard to beat. It is, by far, the slowest of the filters, but it is highly effective with a minimum amount of required setup and maintenance on the user's part. For the most part, it Just Works, and it works quite well. In situations where an administrator is setting things up for a large group of users, DSPAM may well be indicated. The broad flexibility of that tool makes it easy to integrate into just about any mail system, and the web interface makes its operation relatively transparent to users. Just be sure you have a big disk for its databases.

CRM114 is an interesting project; its combination of technologies has the potential to make it the most accurate of all the filters. It has the look of a hardcore power tool. This tool, however, is not ready for prime time at this point. It is a major hassle to set up, and, for your editor at least, keeping the filter stable was a challenge. The other three filters all have their strong points, but none of them had the level of spam detection that your editor would like to see.

There are, of course, other filters out there as well. Some of the graphical mail clients have started to integrate their own filters. There is a great convenience in having a "junk" button handy, but the integrated filters sacrifice transparency and, in your editor's (admittedly limited) experience, they do not seem to develop the same level of accuracy. There is also ifile, which is intended to be a more general mail classifier. That tool is no longer under development, however.

In the end, none of the filters reviewed is perfect - it would be nice to see no spam at all. But some of them are surprisingly close. Think back, for a minute, to the days when we were complaining about getting a dozen spams per day - or per week. Who would have thought that we would be able to cope with thousands of spams per day and still deal with our mail? The developers of these filters have, in a significant way, saved the net, and your editor thanks them.

Parallel universes: open access and open source

February 22, 2006

This article was contributed by Glyn Moody

The growing success of free software has led to a widening of the culture clash between "open" and "closed" to include other domains. One recent skirmish, for example, concerned a particularly important kind of digital code – the sequence of the human genome – and whether it would be proprietary, owned by companies like Celera, or freely available. Openness prevailed, but in another arena – scholarly publishing – advocates of free (as in both beer and freedom) online access to research papers are still fighting the battles that open source won years ago. At stake is nothing less than control of academia's treasure-house of knowledge.

The parallels between this movement – what has come to be known as “open access” – and open source are striking. For both, the ultimate wellspring is the Internet, and the new economics of sharing that it enabled. Just as the early code for the Internet was a kind of proto-open source, so the early documentation – the RFCs – offered an example of proto-open access. And for the practitioners of both, it is recognition – not recompense – that drives them to participate.

Like all great movements, open access has its visionary – the RMS figure – who constantly evangelizes the core ideas and ideals. In 1976, the Hungarian-born cognitive scientist Stevan Harnad founded a scholarly print journal that offered what he called “open peer commentary,” using an approach remarkably close to the open source development process. The problem, of course, was that the print medium was unsuited to this kind of interactive development, so in 1989 he launched a Usenet/Bitnet magazine called “Psycoloquy”, where the feedback process of the open peer commentary could take place in hours rather than weeks. Routine today, but revolutionary for scholarly studies back then.

Harnad has long had an ambitious vision of a new kind of scholarly sharing (rather as RMS does with code): one of his early papers is entitled “Post-Gutenberg Galaxy: The Fourth Revolution in the Means of Production of Knowledge”, while a later one is called bluntly: “A Subversive Proposal for Electronic Publishing.” Meanwhile, the aims of the person who could be considered open access's Linus to Harnad's RMS, Paul Ginsparg, a professor of physics, computing and information science at Cornell University, were more modest.

At the beginning of the 1990s, Ginsparg wanted a quick and dirty solution to the problem of putting high-energy physics preprints (early versions of papers) online. As it turns out, he set up what became the arXiv.org preprint repository on 16 August, 1991 – nine days before Linus made his fateful “I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones” posting. But Ginsparg's links with the free software world go back much further.

Ginsparg was already familiar with the GNU manifesto in 1985, and, through his brother, an MIT undergraduate, even knew of Stallman in the 1970s. Although arXiv.org only switched to GNU/Linux in 1997, it has been using Perl since 1994, and Apache since it came into existence. One of Apache's founders, Rob Hartill, worked for Ginsparg at the Los Alamos National Laboratory, where arXiv.org was first set up (as an FTP/email server at xxx.lanl.gov). Other open source programs crucial to arXiv.org include TeX, Ghostscript and MySQL.

In 1994, Harnad espoused the idea of self-archiving in his “Subversive Proposal”, whereby academics put a copy of their papers online locally (originally on FTP servers) as well as publishing them in hardcopy journals. The spread of repositories soon led to interoperability issues. The 1999 Open Archives Initiative (in which Ginsparg was a leading figure) aimed to deal with this by defining a standard way of exposing an article's metadata so that it could be “harvested” efficiently by search engines.

Beyond self-archiving – later termed “green” open access by Harnad – lies publishing in fully open online journals (“gold” open access). The first open access magazine publisher, BioMed Central – a kind of Red Hat of the field – appeared in 1999. In 2001 the Public Library of Science (PLoS) was launched; PLoS is a major publishing initiative inspired by the examples of arXiv.org, the public genomics databases and open source software, and funded by the Gordon and Betty Moore Foundation (to the tune of $9 million over five years).

Just as free software gained the alternative name “open source” at the Freeware Summit in 1998, so free open scholarship (FOS), as it was called until then by the main newsletter that covered it - written by Peter Suber, professor of philosophy at Earlham College - was renamed “open access” as part of the Budapest Open Access Initiative in December 2001. Suber's newsletter turned into Open Access News and became one of the earliest blogs; it remains the definitive record of the open access movement, and Suber has become its semi-official chronicler (the Eric Raymond of open access - without the guns).

After the Budapest meeting (funded by speculator-turned-philanthropist George Soros, who played the role taken by Tim O'Reilly at the Freeware Summit), several other major declarations in support of open access were made, notably those at Bethesda and Berlin (both 2003). Big research institutions started actively supporting open access – rather as big companies like IBM and HP did with open source earlier. Key early backers were the Howard Hughes Medical Institute (2002) in the US and the Wellcome Trust (2003) in the UK, the largest private funders of medical research in their respective countries.

Both agreed to pay the page charges that “gold” open access titles need in order to provide the content free to readers – typically $1000 per article. This is not as onerous as it sounds: the annual subscription for a traditional scientific journal can run to $20,000 (even though the authors of the papers receive nothing for their work). For a major research institution, the cumulative cost adds up to millions of dollars a year in subscriptions. This annual tax is very like the licensing fees in the proprietary software world. What an institution saves by refusing to pay these exorbitant subscriptions – as the libraries at Cornell, Duke, Harvard and Stanford Universities have done in the US – it can use to fund page charges, just as companies can use monies saved on software licensing costs to pay for the support and customization they need.

With all this activity, governments started getting interested in open access, and so did the big publishers, worried by the potential loss of revenue (the Microsoft of the scientific publishing world, the Anglo-Dutch company Elsevier, has had operating profits of over 30%). The UK House of Commons Science and Technology committee published a lengthy report recommending obligatory open access for publicly-funded research: it was ignored by the UK government because of pressure from British publishing houses. In 2004, the US NIH issued a draft of its own plans for open access support – and was forced to water them down because of fierce lobbying from science publishers.

Given the many similarities between the respective aims of open source and open access, it is hardly surprising that there are direct links between them. In 2002, MIT released its DSpace digital repository application under a BSD license, while EPrints, the main archiving software used for creating institutional repositories, went open source under the GPL. As the latter's documentation proudly proclaims:

The EPrints software has been developed under GNU/Linux. It is intended to work on any GNU system. It may well work on other UNIX systems too. Other systems people have got EPrints up and running on include Solaris and MacOSX. There are no plans for a version to run under Microsoft Windows.

There is a commercial, supported version too. Open Journal Systems is another journal management and publishing system released under the GPL.

As the mainstream open source projects mature, the applications used by the open access movement could well prove increasingly attractive to coders who are looking for a challenge and an area where they can make a significant contribution – not just to free software, but also to widening free access to knowledge itself.

Glyn Moody writes about open source and open access at opendotdotdot.

Comments (17 posted)

Page editor: Jonathan Corbet

Security

A new Linux worm

February 21, 2006

This article was contributed by Jake Edge.

A Linux worm (called "Mare.D" by some) that exploits an old PHP XML-RPC vulnerability has been sighted in the wild and was reported on Sunday to the full-disclosure mailing list. An update later in the day makes it clear that this is a new attack which is based on an earlier worm, kaiten, and which attempts to connect infected systems to a botnet.

The attack starts with a crafted XML-RPC request targeted at WordPress, Drupal, phpBB and other content management systems that were known to be vulnerable in June 2005, when this problem was first reported. The request contains code which will be executed by PHP; this code, in turn, retrieves another script from a (now defunct) server and executes it. The second script then retrieves yet another pair of executables from the server; these are the main payload of the attack.
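
For administrators wondering whether their own servers have been probed, the web server's access log is a reasonable place to start. The following Python sketch counts POST requests to paths that look like XML-RPC endpoints, grouped by client address. It assumes an Apache-style common or combined log format, and the endpoint pattern (xmlrpc.php) is a guess based on the usual file name in the affected packages; the function name xmlrpc_posts() is made up for this example. A POST to such a path is not, by itself, evidence of an attack, since legitimate blogging clients use the same interface.

```python
import re
from collections import Counter

# Assumption: Apache "common" or "combined" log format.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)'
)
# Assumption: XML-RPC endpoints are named xmlrpc or xmlrpc.php.
XMLRPC_PATH = re.compile(r'/xmlrpc(\.php)?(\?|$)', re.IGNORECASE)


def xmlrpc_posts(lines):
    """Count POST requests to XML-RPC-looking paths, per client host."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not a log line we understand
        if match.group("method") != "POST":
            continue
        if XMLRPC_PATH.search(match.group("path")):
            hits[match.group("host")] += 1
    return hits
```

Feeding it the lines of an access log (for example, an open file object) yields a Counter; hosts with many hits across several different paths are the ones worth a closer look.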

The first of these programs is the 'spreader', which attempts to find other vulnerable hosts and infect them. The other program connects to an IRC server which functions as the 'command and control' (C&C) element for a botnet. The IRC server would instruct the client to download yet another program which opens a backdoor shell when executed. It is unknown what else the attacker planned to do with the bots, as the C&C server has been shut down.

It is interesting to note that this worm does not compromise root and does not gain complete control of the host, but it does provide enough privileges to make it attractive for a botnet. The exploit will allow the attacker to run with the permissions of the user who owns the httpd process (typically 'apache' or 'httpd'), which is sufficient to perform the two most likely bot tasks: spamming and distributed denial of service. On the flip side, because it did not gain root privileges, it cannot do very much to hide itself and it should be very easy to detect on an infected system.
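
Since the payload runs under the web server's account, one simple check is to look for processes owned by that account which are not the web server itself. The sketch below parses the output of "ps -eo user,pid,args" and flags such processes. The account names come from the paragraph above; the list of expected executables and the function name suspicious_processes() are assumptions for illustration and would need adjusting for any real system. Malware can also change the name it shows in the process list, so a clean result proves little.

```python
import os

WEB_USERS = {"apache", "httpd"}  # account names mentioned in the article
# Assumption: executables the web server account legitimately runs here.
EXPECTED = {"httpd", "apache", "apache2", "php", "php-cgi"}


def suspicious_processes(ps_output, web_users=WEB_USERS, expected=EXPECTED):
    """Return (pid, command line) pairs for unexpected web-user processes.

    ps_output is the text printed by "ps -eo user,pid,args", header included.
    """
    flagged = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # blank or malformed line
        user, pid, args = parts
        exe = os.path.basename(args.split()[0])
        if user in web_users and exe not in expected:
            flagged.append((int(pid), args))
    return flagged
```

Anything flagged deserves a look at its executable path and open network connections; programs running from world-writable directories are a common sign of trouble.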

Overall, the impact of this attack is relatively small thanks, in part, to fast action to shut down the servers providing the scripts and controlling the botnet. But it seems likely that the backdoor shell is running on some hosts which got an "execute" command for that script before the servers were terminated. Another possibility is that there are different versions of the attack floating around, using different server addresses; those servers may still be running.

As is the case for many malware attacks, this would only affect systems that did not have up-to-date software. Eight months seems like enough time to update affected systems, so the fact that there are still vulnerable systems out there is a sad testament to how little attention is paid to security by some, probably many, Linux system administrators.

More information about this exploit can be found in the Shadowserver article and updates on this attack are being posted to the Securiteam blog.

Comments (20 posted)

New vulnerabilities

bluez-hcidump: buffer overflow

Package(s):bluez-hcidump CVE #(s):CVE-2006-0670
Created:February 17, 2006 Updated:March 10, 2006
Description: A buffer overflow in l2cap.c in hcidump allows remote attackers to cause a denial of service (crash) through a wireless Bluetooth connection via a malformed Logical Link Control and Adaptation Protocol (L2CAP) packet.
Alerts:
Debian DSA-990-1 2006-03-10
Ubuntu USN-256-1 2006-02-21
Mandriva MDKSA-2006:041 2006-02-17

Comments (none posted)

BomberClone: remote execution of arbitrary code

Package(s):bomberclone CVE #(s):CVE-2006-0460
Created:February 17, 2006 Updated:March 14, 2006
Description: Stefan Cornelius of the Gentoo Security team discovered multiple missing buffer checks in BomberClone's code. By sending overly long error messages to the game over the network, a remote attacker may exploit buffer overflows to execute arbitrary code with the rights of the user running BomberClone.
Alerts:
Debian DSA-997-1 2006-03-13
Gentoo 200602-09 2006-02-16

Comments (none posted)

CASA: buffer overflow

Package(s):CASA CVE #(s):CVE-2006-0736
Created:February 22, 2006 Updated:February 22, 2006
Description: The pam_micasa module from the CASA authentication system suffers from a remotely exploitable buffer overflow. "Since this module is added to /etc/pam.d/sshd automatically on installation of CASA it was possible for remote attackers to gain root access to any machine with CASA installed." If you are using CASA, fixing this one in a hurry would be a good idea.
Alerts:
SuSE SUSE-SA:2006:010 2006-02-22

Comments (none posted)

gnupg: false positive signature verification

Package(s):gnupg CVE #(s):CVE-2006-0455
Created:February 17, 2006 Updated:March 10, 2006
Description: Tavis Ormandy noticed that gnupg, the GNU Privacy Guard (a free PGP replacement), verifies external signatures of files successfully even though they don't contain a signature at all. See this update from the GnuPG team for more information.
Alerts:
SuSE SUSE-SA:2006:014 2006-03-10
SuSE SUSE-SR:2006:005 2006-03-03
SuSE SUSE-SA:2006:013 2006-03-01
Trustix TSLSA-2006-0008 2006-02-17
SuSE SUSE-SA:2006:009 2006-02-20
Gentoo 200602-10 2006-02-18
OpenPKG OpenPKG-SA-2006.001 2006-02-18
Mandriva MDKSA-2006:043 2006-02-17
Fedora FEDORA-2006-116 2006-02-17
Ubuntu USN-252-1 2006-02-17
Debian DSA-978-1 2006-02-17

Comments (2 posted)

heimdal: remote denial of service

Package(s):heimdal CVE #(s):CVE-2006-0677
Created:February 17, 2006 Updated:February 24, 2006
Description: A remote denial of service vulnerability was discovered in the heimdal implementation of the telnet daemon. A remote attacker could force the server to crash due to a NULL pointer dereference before the user logged in, resulting in inetd turning telnetd off because it forked too fast.
Alerts:
SuSE SUSE-SA:2006:011 2006-02-24
Ubuntu USN-253-1 2006-02-17

Comments (none posted)

metamail: buffer overflow

Package(s):metamail CVE #(s):CVE-2006-0709
Created:February 21, 2006 Updated:March 17, 2006
Description: A buffer overflow bug was found in the way Metamail processes certain mail messages. An attacker could create a carefully-crafted message such that when it is opened by a victim and parsed through Metamail, it runs arbitrary code as the victim.
Alerts:
Gentoo 200603-16 2006-03-17
Debian DSA-995-1 2006-03-13
Mandriva MDKSA-2006:047 2006-02-22
Red Hat RHSA-2006:0217-01 2006-02-21

Comments (none posted)

tar: buffer overflow

Package(s):tar CVE #(s):CVE-2006-0300
Created:February 22, 2006 Updated:April 9, 2006
Description: A buffer overflow (exploitable via a carefully-crafted archive file) has been discovered in GNU tar, versions 1.14 and above.
Alerts:
Fedora-Legacy FLSA:183571-2 2006-04-04
Gentoo 200603-06 2006-03-10
Debian DSA-987-1 2006-03-07
OpenPKG OpenPKG-SA-2006.006 2006-03-05
Red Hat RHSA-2006:0232-01 2006-03-01
Trustix TSLSA-2006-0010 2006-02-24
Ubuntu USN-257-1 2006-02-23
Mandriva MDKSA-2006:046 2006-02-21

Comments (none posted)

tin: buffer overflow

Package(s):tin CVE #(s):CVE-2006-0804
Created:February 19, 2006 Updated:November 24, 2006
Description: An allocation off-by-one bug exists in the TIN news reader version 1.8.0 and earlier which can lead to a buffer overflow.
Alerts:
Gentoo 200611-18 2006-11-24
OpenPKG OpenPKG-SA-2006.005 2006-02-19

Comments (none posted)

tutos: SQL injection and cross-site scripting

Package(s):tutos CVE #(s):CVE-2004-2161 CVE-2004-2162
Created:February 22, 2006 Updated:February 22, 2006
Description: The tutos groupware package has (old) SQL injection and cross-site scripting vulnerabilities.
Alerts:
Debian DSA-980-1 2006-02-22

Comments (none posted)

Updated vulnerabilities

ADOdb: PostgreSQL command injection

Package(s):adodb CVE #(s):CVE-2006-0410
Created:February 6, 2006 Updated:April 17, 2006
Description: Andy Staudacher discovered that ADOdb does not properly sanitize all parameters. By sending specially crafted requests to an application that uses ADOdb and a PostgreSQL backend, an attacker might exploit the flaw to execute arbitrary SQL queries on the host.
Alerts:
Gentoo 200604-07 2006-04-14
Debian DSA-1031-1 2006-04-08
Debian DSA-1030-1 2006-04-08
Debian DSA-1029-1 2006-04-08
Gentoo 200602-02 2006-02-06

Comments (none posted)

adzapper: denial of service

Package(s):adzapper CVE #(s):CVE-2006-0046
Created:February 9, 2006 Updated:February 15, 2006
Description: If the adzapper proxy advertisement add-on is installed as a squid plugin, it can cause high proxy host CPU resource consumption, resulting in a denial of service.
Alerts:
Debian DSA-966-1 2006-02-09

Comments (none posted)

apache: cross-site scripting

Package(s):apache CVE #(s):CVE-2005-3352
Created:December 14, 2005 Updated:May 10, 2006
Description: Versions 1 and 2 of the Apache web server suffer from a cross-site scripting vulnerability in the mod_imap module; see this bugzilla entry for details.
Alerts:
Slackware SSA:2006-129-01 2006-05-10
SuSE SUSE-SR:2006:004 2006-02-24
Fedora-Legacy FLSA:175406 2006-02-18
Gentoo 200602-03 2006-02-06
Fedora FEDORA-2006-052 2006-01-20
Red Hat RHSA-2006:0158-01 2006-01-17
Ubuntu USN-241-1 2006-01-12
Trustix TSLSA-2005-0074 2005-12-23
Mandriva MDKSA-2006:007 2006-01-05
Red Hat RHSA-2006:0159-01 2006-01-05
OpenPKG OpenPKG-SA-2005.029 2005-12-14

Comments (none posted)

auth_ldap: format string vulnerability

Package(s):auth_ldap CVE #(s):CVE-2006-0150
Created:January 10, 2006 Updated:February 28, 2006
Description: The auth_ldap package is an httpd module that allows user authentication against information stored in an LDAP database. A format string flaw was found in the way auth_ldap logs information. It may be possible for a remote attacker to execute arbitrary code as the 'apache' user if auth_ldap is used for user authentication.
Alerts:
Fedora-Legacy FLSA:177694 2006-02-27
Debian DSA-952-1 2006-01-23
Mandriva MDKSA-2006:017 2006-01-19
Red Hat RHSA-2006:0179-01 2006-01-10

Comments (none posted)

blender: integer overflow

Package(s):blender CVE #(s):CVE-2005-4470
Created:January 6, 2006 Updated:June 15, 2006
Description: Damian Put discovered that Blender did not properly validate a 'length' value in .blend files. Negative values led to an insufficiently sized memory allocation. By tricking a user into opening a specially crafted .blend file, this could be exploited to execute arbitrary code with the privileges of the Blender user.
Alerts:
Debian-Testing DTSA-29-1 2006-06-15
Debian DSA-1039-1 2006-04-24
Gentoo 200601-08 2006-01-13
Ubuntu USN-238-2 2006-01-06
Ubuntu USN-238-1 2006-01-06

Comments (none posted)
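
The general defense against this class of bug is to validate any length read from a file before using it. The following Python sketch shows the idea for a made-up record layout (a signed 32-bit little-endian length followed by that many bytes); it is not the .blend format, and read_block() is a hypothetical name. In Python a negative length yields a wrong slice rather than memory corruption, but the checks are the same ones a C parser would need before allocating.

```python
import struct


def read_block(buf, offset=0):
    """Read one length-prefixed block from buf, validating the length.

    Hypothetical layout: signed 32-bit little-endian length, then payload.
    """
    if offset < 0 or len(buf) - offset < 4:
        raise ValueError("truncated header")
    (length,) = struct.unpack_from("<i", buf, offset)
    start = offset + 4
    if length < 0:
        # A signed field can go negative; refuse it outright.
        raise ValueError("negative length: %d" % length)
    if length > len(buf) - start:
        raise ValueError("length exceeds available data")
    return buf[start:start + length]
```

Rejecting both negative and oversized values up front means the rest of the parser never sees a length it cannot trust.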

bzip2: race condition and infinite loop

Package(s):bzip2 CVE #(s):CAN-2005-0953 CAN-2005-1260
Created:May 17, 2005 Updated:January 10, 2007
Description: A race condition in bzip2 1.0.2 and earlier allows local users to modify the permissions of arbitrary files via a hard link attack on a file while it is being decompressed; bzip2 changes that file's permissions after the decompression is complete. Also, specially crafted bzip2 archives may cause an infinite loop in the decompressor.
Alerts:
rPath rPSA-2007-0004-1 2007-01-09
Debian DSA-741-1 2005-07-07
Red Hat RHSA-2005:474-01 2005-06-16
OpenPKG OpenPKG-SA-2005.008 2005-06-10
SuSE SUSE-SR:2005:015 2005-06-07
Debian DSA-730-1 2005-05-27
Mandriva MDKSA-2005:091 2005-05-18
Ubuntu USN-127-1 2005-05-17

Comments (2 posted)
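
One common way to avoid this kind of race is to set permissions through the open file descriptor rather than by path name, so that the change applies to the file that was actually written no matter what the name points to afterward. The Python sketch below illustrates the pattern on a Unix-like system; it is not bzip2's code, and write_then_set_mode() is a name invented for this example.

```python
import os


def write_then_set_mode(path, data, mode):
    """Create path, write data, and set its mode via the open descriptor."""
    # O_EXCL makes the open fail if the name already exists (including a
    # symbolic link); 0o600 keeps the file private while it is written.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as out:
        out.write(data)
        out.flush()
        # fchmod() acts on the open file, not on whatever the path
        # name refers to at this moment.
        os.fchmod(out.fileno(), mode)
```

A path-based chmod() after closing the file would leave a window in which the name could be swapped for a link to something else.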

ktools: buffer overflow

Package(s):centericq CVE #(s):CVE-2005-3863
Created:December 7, 2005 Updated:August 29, 2006
Description: From the Debian-Testing alert: Mehdi Oudad "deepfear" and Kevin Fernandez "Siegfried" from the Zone-H Research Team discovered a buffer overflow in kkstrtext.h of the ktools library, which is included in (at least) centericq and motor.
Alerts:
Gentoo 200608-27 2006-08-29
Debian DSA-1088-1 2006-06-03
Debian DSA-1083-1 2006-05-31
Gentoo 200512-11 2005-12-20
Debian-Testing DTSA-23-1 2005-12-05

Comments (none posted)

cpio: arbitrary code execution

Package(s):cpio CVE #(s):CVE-2005-4268
Created:January 2, 2006 Updated:May 8, 2007
Description: Richard Harms discovered that cpio did not sufficiently validate file properties when creating archives. Files with, for example, a very large size caused a buffer overflow. By tricking a user or an automatic backup system into putting a specially crafted file into a cpio archive, a local attacker could probably exploit this to execute arbitrary code with the privileges of the target user (which is likely root in an automatic backup system).
Alerts:
rPath rPSA-2007-0094-1 2007-05-07
Red Hat RHSA-2007:0245-02 2007-05-01
Ubuntu USN-234-1 2006-01-02

Comments (none posted)

curl: buffer overflow

Package(s):curl CVE #(s):CVE-2005-4077
Created:December 8, 2005 Updated:March 27, 2006
Description: The curl file transfer utility has a buffer overflow vulnerability in the URL authentication code. If an overly long URL is used, a buffer overflow can result, allowing for local unauthorized access.
Alerts:
Gentoo 200603-25 2006-03-27
Debian DSA-919-2 2006-03-10
Trustix TSLSA-2005-0072 2005-12-16
Red Hat RHSA-2005:875-01 2005-12-20
Gentoo 200512-09 2005-12-16
Ubuntu USN-228-1 2005-12-12
Fedora FEDORA-2005-1137 2005-12-12
Fedora FEDORA-2005-1136 2005-12-12
Debian DSA-919-1 2005-12-12
OpenPKG OpenPKG-SA-2005.028 2005-12-10
Mandriva MDKSA-2005:224 2005-12-08
Fedora FEDORA-2005-1129 2005-12-08
Fedora FEDORA-2005-1130 2005-12-08

Comments (none posted)

cyrus-imapd: buffer overflows

Package(s):cyrus-imapd CVE #(s):CAN-2005-0546
Created:February 23, 2005 Updated:April 9, 2006
Description: Cyrus-imapd, prior to version 2.2.12, contains several buffer overflows which could be exploited by an (authenticated) attacker to run code on the server system.
Alerts:
Fedora-Legacy FLSA:156290 2006-04-04
Red Hat RHSA-2005:408-01 2005-05-17
Fedora FEDORA-2005-339 2005-04-27
OpenPKG OpenPKG-SA-2005.005 2005-04-05
Conectiva CLA-2005:937 2005-03-17
Mandrake MDKSA-2005:051 2005-03-04
Ubuntu USN-87-1 2005-02-28
SuSE SUSE-SA:2005:009 2005-02-24
Gentoo 200502-29 2005-02-23

Comments (none posted)

dia: missing input sanitizing

Package(s):dia CVE #(s):CAN-2005-2966
Created:October 4, 2005 Updated:April 6, 2006
Description: Joxean Koret discovered that the SVG import plugin did not properly sanitize data read from an SVG file. By tricking a user into opening a specially crafted SVG file, an attacker could exploit this to execute arbitrary code with the privileges of the user.
Alerts:
Debian DSA-1025-1 2006-04-06
Mandriva MDKSA-2005:187 2005-10-20
Gentoo 200510-06 2005-10-06
Debian DSA-847-1 2005-10-08
SuSE SUSE-SR:2005:022 2005-10-07
Ubuntu USN-193-1 2005-10-04

Comments (none posted)

elog: multiple vulnerabilities

Package(s):elog CVE #(s):CVE-2005-4439 CVE-2006-0347 CVE-2006-0348 CVE-2006-0597 CVE-2006-0598 CVE-2006-0599 CVE-2006-0600
Created:February 10, 2006 Updated:February 15, 2006
Description: Several security problems have been found in elog, an electronic logbook to manage notes.
Alerts:
Debian DSA-967-1 2006-02-10

Comments (none posted)

emacs21: format string vulnerability in "movemail"

Package(s):emacs21 CVE #(s):CAN-2005-0100
Created:February 7, 2005 Updated:May 15, 2006
Description: Max Vozeler discovered a format string vulnerability in the "movemail" utility of Emacs. By sending specially crafted packets, a malicious POP3 server could cause a buffer overflow, which could be exploited to execute arbitrary code with the privileges of the user and the "mail" group.
Alerts:
Fedora-Legacy FLSA:152898 2006-05-12
Debian DSA-685-1 2005-02-17
Mandrake MDKSA-2005:038 2005-02-15
Gentoo 200502-20 2005-02-15
Fedora FEDORA-2005-146 2005-02-14
Fedora FEDORA-2005-145 2005-02-14
Red Hat RHSA-2005:133-01 2005-02-15
Red Hat RHSA-2005:110-01 2005-02-15
Red Hat RHSA-2005:134-01 2005-02-10
Red Hat RHSA-2005:112-01 2005-02-10
Fedora FEDORA-2005-116 2005-02-08
Fedora FEDORA-2005-115 2005-02-08
Debian DSA-671-1 2005-02-08
Debian DSA-670-1 2005-02-08
Ubuntu USN-76-1 2005-02-07

Comments (none posted)

enscript: arbitrary code execution

Package(s):enscript CVE #(s):CAN-2004-1184 CAN-2004-1185 CAN-2004-1186
Created:January 21, 2005 Updated:May 27, 2006
Description: Erik Sjölund has discovered several security-relevant problems in enscript, a program to convert ASCII text into PostScript and other formats. Unsanitized input can cause the execution of arbitrary commands via EPSF pipe support. Due to missing sanitizing of filenames, a specially crafted filename can cause arbitrary commands to be executed. Multiple buffer overflows can cause the program to crash.
Alerts:
rPath rPSA-2006-0083-1 2006-05-26
Fedora-Legacy FLSA:152892 2005-12-17
Red Hat RHSA-2005:040-01 2005-02-15
Mandrake MDKSA-2005:033 2005-02-10
Gentoo 200502-03 2005-02-02
Red Hat RHSA-2005:039-01 2005-02-01
Fedora FEDORA-2005-096 2005-01-31
Fedora FEDORA-2005-092 2005-01-28
Fedora FEDORA-2005-091 2005-01-28
Fedora FEDORA-2005-016 2005-01-26
Fedora FEDORA-2005-015 2005-01-26
Ubuntu USN-68-1 2005-01-24
Debian DSA-654-1 2005-01-21

Comments (none posted)

evolution: format string issues

Package(s):evolution CVE #(s):CAN-2005-2549 CAN-2005-2550
Created:August 15, 2005 Updated:March 23, 2006
Description: Evolution has format string issues. SITIC advisory SA05-001 contains more information.
Alerts:
Debian DSA-1016-1 2006-03-23
SuSE SUSE-SA:2005:054 2005-09-16
Red Hat RHSA-2005:267-01 2005-08-29
Gentoo 200508-12 2005-08-23
Mandriva MDKSA-2005:141 2005-08-17
Fedora FEDORA-2005-742 2005-08-11
Fedora FEDORA-2005-743 2005-08-11

Comments (2 posted)

fetchmail: multidrop bug

Package(s):fetchmail CVE #(s):CVE-2005-4348
Created:December 20, 2005 Updated:May 27, 2006
Description: Fetchmail contains a bug which allows a malicious mail server to crash the client by sending a message without headers. This occurs when running in multidrop mode.
Alerts:
rPath rPSA-2006-0084-1 2006-05-26
Fedora-Legacy FLSA:164512 2006-05-12
Slackware SSA:2006-045-01 2006-02-15
Debian DSA-939-1 2006-01-13
Ubuntu USN-233-1 2006-01-02
Mandriva MDKSA-2005:236 2005-12-23
Fedora FEDORA-2005-1187 2005-12-20
Fedora FEDORA-2005-1186 2005-12-20

Comments (none posted)

ffmpeg: buffer overflow

Package(s):ffmpeg CVE #(s):CVE-2005-4048
Created:December 15, 2005 Updated:March 17, 2006
Description: The avcodec_default_get_buffer() function of the ffmpeg library has a buffer overflow vulnerability. A user can be tricked into playing a maliciously created PNG movie, allowing the attacker to run arbitrary code with the user's privileges.
Alerts:
Debian DSA-1005-1 2006-03-16
Debian DSA-1004-1 2006-03-16
Debian DSA-992-1 2006-03-10
Gentoo 200603-03 2006-03-04
Gentoo 200602-01 2006-02-05
Gentoo 200601-06 2006-01-10
Ubuntu USN-230-2 2005-12-16
Ubuntu USN-230-1 2005-12-14
Mandriva MDKSA-2005:228 2005-12-14
Mandriva MDKSA-2005:229 2005-12-14
Mandriva MDKSA-2005:232 2005-12-14
Mandriva MDKSA-2005:230 2005-12-14
Mandriva MDKSA-2005:231 2005-12-14

Comments (none posted)

firefox: multiple vulnerabilities

Package(s):firefox CVE #(s):CAN-2005-2701 CAN-2005-2702 CAN-2005-2703 CAN-2005-2704 CAN-2005-2705 CAN-2005-2706 CAN-2005-2707 CAN-2005-2968
Created:September 22, 2005 Updated:February 15, 2006
Description: The Firefox browser has multiple vulnerabilities including problems with XBM image file processing, Unicode sequence processing, XMLHttp requests, malicious XBL binding, a JavaScript engine buffer overflow, about: pages, opening of new windows, and command line URL processing.
Alerts:
Slackware SSA:2006-045-02 2006-02-15
Fedora-Legacy FLSA:168375 2006-01-09
Ubuntu USN-200-1 2005-10-11
Ubuntu USN-155-3 2005-10-04
Debian DSA-838-1 2005-10-02
Gentoo GLSA 200509-11:02 2005-09-18
SuSE SUSE-SA:2005:058 2005-09-30
Mandriva MDKSA-2005:170 2005-09-26
Mandriva MDKSA-2005:169 2005-09-26
Slackware SSA:2005-269-01 2005-09-26
Fedora FEDORA-2005-934 2005-09-26
Fedora FEDORA-2005-933 2005-09-26
Fedora FEDORA-2005-932 2005-09-26
Fedora FEDORA-2005-931 2005-09-26
Fedora FEDORA-2005-930 2005-09-26
Fedora FEDORA-2005-929 2005-09-26
Fedora FEDORA-2005-928 2005-09-26
Fedora FEDORA-2005-927 2005-09-26
Fedora FEDORA-2005-926 2005-09-26
Ubuntu USN-186-2 2005-09-25
Ubuntu USN-186-1 2005-09-23
Red Hat RHSA-2005:789-01 2005-09-22
Red Hat RHSA-2005:785-01 2005-09-22

Comments (none posted)

Foomatic: Arbitrary command execution in foomatic-rip

Package(s):foomatic CVE #(s):CAN-2004-0801
Created:September 20, 2004 Updated:May 31, 2006
Description: There is a vulnerability in the foomatic-filters package. This vulnerability is due to insufficient checking of command-line parameters and environment variables in the foomatic-rip filter. This vulnerability may allow both local and remote attackers to execute arbitrary commands on the print server with the permissions of the spooler.
Alerts:
SuSE SUSE-SA:2006:026 2006-05-30
Fedora-Legacy FLSA:2076 2004-11-05
Conectiva CLA-2004:880 2004-10-27
Fedora FEDORA-2004-303 2004-09-21
Gentoo 200409-24 2004-09-20

Comments (none posted)

gaim: buffer overflow

Package(s):gaim CVE #(s):CAN-2005-2103
Created:August 10, 2005 Updated:February 27, 2006
Description: Gaim suffers from a heap-based buffer overflow which can be exploited via a hostile "away message" to execute arbitrary code.
Alerts:
Fedora-Legacy FLSA:158543 2006-02-25
Slackware SSA:2005-242-03 2005-08-31
Fedora FEDORA-2005-751 2005-08-17
Fedora FEDORA-2005-750 2005-08-17
Mandriva MDKSA-2005:139 2005-08-15
Gentoo 200508-06 2005-08-15
Ubuntu USN-168-1 2005-08-12
Red Hat RHSA-2005:589-01 2005-08-09

Comments (none posted)

gdb: multiple vulnerabilities

Package(s):gdb CVE #(s):CAN-2005-1704 CAN-2005-1705
Created:May 20, 2005 Updated:August 11, 2006
Description: Tavis Ormandy of the Gentoo Linux Security Audit Team discovered an integer overflow in the BFD library, resulting in a heap overflow. A review also showed that by default, gdb insecurely sources initialization files from the working directory. Successful exploitation would result in the execution of arbitrary code on loading a specially crafted object file or the execution of arbitrary commands.
Alerts:
Red Hat RHSA-2006:0354-01 2006-08-10
Red Hat RHSA-2006:0368-01 2006-07-20
Mandriva MDKSA-2005:215 2005-11-23
Fedora FEDORA-2005-1033 2005-10-27
Fedora FEDORA-2005-1032 2005-10-27