LWN.net Weekly Edition for May 13, 2010
Adventures in Linux gaming
It has been an interesting week in the world of Linux games—really in the intersection of Linux and commercial games. First was the announcement of the release of the source code that underlies the Ryzom massively multi-player online role playing game (MMORPG). In addition, though, came word that the Humble Indie Bundle, a collection of cross-platform games being sold using a novel method, generated over $1 million in a week's time, with roughly a quarter of it coming from Linux users. It has long been said that there is no market for commercial Linux games, but these two events may shine a light on different business models that just might be successful.
Humble or successful?
The basic idea behind the Humble Indie Bundle is to take five (eventually six) games developed outside of the major game studios ("indie"), package them together, and allow the customer to set the price. All of the games (World of Goo, Aquaria, Gish, Lugaru HD, Penumbra Overture, and Samorost 2—the latter was donated to the bundle a few days later) are DRM-free: "Feel free to play them without an internet connection, back them up, and install them on all of your Macs and PCs freely." They are cross-platform for Linux, Mac OS X, and Windows as well. But sponsor Wolfire Games and the other game creators took it a step further and split the proceeds with two charities.
By default, whatever price is chosen will be split seven ways (five games plus two charities), but the buyer can change the allocation any way they choose. The two charities are Child's Play, which provides toys, games, and books for children in hospitals, and the Electronic Frontier Foundation (EFF). Assuming an even split, each organization and game developer has brought in more than $150,000 since the promotion started on May 4.
Linux buyers account for around 14% of the purchases, but, interestingly, account for 23% of revenue as reported on May 7. Wolfire Games has been a strong advocate of cross-platform games, as it believes there is money to be made from Mac and Linux games. While the success of the bundle may not be repeatable exactly, it should give hope to game developers that there is money out there for cross-platform games, and to players on non-Windows platforms that there will be more games available.
A quick look at two of the games showed them to be fairly interesting, certainly worth looking into further when some grumpy guy isn't yammering on about some sort of deadline. One of the two, Lugaru, has been released as free software under the GPLv2. Anyone lacking an "anthropomorphic rebel bunny rabbit with impressive combat skills" in their life is encouraged to check out the source or the game itself.
Ryzom
The Ryzom MMORPG has had a history of, almost, becoming open source, starting back in 2006, when the Free Ryzom Campaign tried to buy the assets of the original developer, Nevrax, which had fallen into bankruptcy. Then in 2008 it looked like there might be another opportunity to acquire Ryzom via bankruptcy proceedings, but that didn't happen either. But on May 6, the current owner, Winch Gate Properties Ltd, announced that the server and client code, along with thousands of textures and 3D objects, were being released under the Affero GPLv3 (code) and Creative Commons Attribution-ShareAlike (artwork and objects) licenses.
According to Winch Gate CTO Vianney Lecroart, after acquiring Ryzom, the company first focused on getting it up and running: "We just had 30 hard drives and we had to scan all them, buy servers, configure [them], reconnect everything, it was very hard and long process." At first, Ryzom was free to play, while Winch Gate got the billing system working, and then switched back to a "pay to play" model. After that, it spent some time making things more stable, reworking the "starting island to make it easier to understand" and adding the Kitin's Lair area for more experienced players, he said.
The reason it is being open sourced now, Lecroart said, is that "we wanted to focus first on players"; now that is done, the company could turn to freeing the code. In addition, in just a week since the release, patches have been submitted that Winch Gate applied "as fast as we can". The roadmap on the development portal shows a release expected in July that will concentrate on build tools and packaging, and another in November that will focus on getting the current Windows-only client working on Linux and Mac OS X. The current client will run under Wine, and the roadmap mentions a Linux-native version that has been compiled and "works".
None of the Ryzom world data is part of the release, so those who want to run their own server—already available for Linux—will need to create their own world. Existing players could be harmed by the release of the world data as it would give others a potential leg up on the locations of interesting places or, more importantly, loot. There might also be a "spoiler" effect that could take away much of the fun of playing the game. But lack of world data does make it rather difficult to get started. Another problem is that the world building tools are all Windows-only and, because they use Windows-specific libraries and APIs, will be difficult to port. Currently the roadmap shows those being available as web-based tools in June 2011.
Winch Gate has put up a small instance of the Ryzom server, OpenShard, which is free to "connect, tweak, and hack [on]", Lecroart said. In addition, the current state page lists various community members who have the server up and running. "It's now up to them to add some content or do what they want on their server", he said.
The Free Software Foundation, which had pledged $60,000 to the original Free Ryzom effort, applauded the release and suggested ways that free software developers could get involved. The 13G of textures and 3D objects was of particular interest because those assets "can be adapted and used in other games". In addition, the FSF suggests that making Blender and other free software 3D modeling tools work with the Ryzom engine would be a worthwhile effort.
The "Help Us" page does not mention any kind of copyright assignment being required, nor does the Developer FAQ. Given the history of Ryzom—bouncing around from company to company, typically via bankruptcy—it's good to see that there won't be any organization that can make a proprietary fork. The AGPL also ensures that anyone using the engine to provide a service—game world—is required to release their code changes back to the community.
Linux and games
It is clear that Winch Gate hopes to gain some publicity—and Ryzom players—by freeing its code. It also seems like it is genuinely interested in what the community will do with the code, artwork, and objects. One would have to guess that the Ryzom player community is fairly small, given the various upheavals along the way, so the risk to Winch Gate is quite low. In the meantime, the community gets a chance to play with a professional MMORPG engine; it's anyone's guess where that will lead. Perhaps Winch Gate is hoping someday to run contract servers for a game world created by the community.
The Humble Indie Bundle has certainly raised the profile of Wolfire and the games that were included. World of Goo has made something of a name for itself in the Linux world—perhaps partially because Ted Ts'o mentioned it during the ext4 delayed allocation mess—but the others were flying under the radar. No more. It will be interesting to see where that leads as well.
What's new in GCC 4.5?
Version 4.5 of the GNU Compiler Collection was released in mid-April with many changes under the hood, as well as a few important user-visible features. GCC 4.5 promises faster programs using the new link-time optimization (LTO) option, easier implementation of compiler extensions thanks to the controversial plugin infrastructure, stricter standards conformance for floating-point computations, and better debugging information when compiling with optimizations.
The GNU Compiler Collection is one of the oldest free software projects still around. Version 1.0 of GCC was released in 1987. More than twenty years later, GCC is still under active development, and each new version adds important features. Supporting these new features in such an old codebase often requires major rewriting of substantial parts of GCC. GCC 4.0 was an important milestone in this regard, and GCC internals are still evolving at a rapid pace. However, these core improvements are sometimes not clearly visible as improvements for users. That is not the case with GCC 4.5. This article describes four new features in GCC 4.5, and also looks at an internal shift that may radically change how GCC is developed in the future.
Link-Time Optimization
Perhaps the most visible of the new features in GCC 4.5 is the Link-Time Optimization option: -flto. When source files are compiled and linked using -flto, GCC applies optimizations as if all the source code were in a single file. This allows GCC to perform more aggressive optimizations across files, such as inlining the body of a function from one file that is called from a different file, and propagating constants across files. In general, the LTO framework enables all the usual optimizations that work at a higher level than a single function to also work across files that are independently compiled.
The LTO option works almost like any other optimization flag. First, one needs to enable optimization (using one of the -O{1,2,3,s} options). In cases where compilation and linking are done in a single step, adding the option -flto is sufficient:
gcc -o myprog -flto -O2 foo.c bar.c
This effectively deprecates the old -combine option, which was too slow in practice and only supported for C.
With independent compilation steps, the option -flto must be specified at all steps of the process:
gcc -c -O2 -flto foo.c
gcc -c -O2 -flto bar.c
gcc -o myprog -flto -O2 foo.o bar.o
An interesting possibility is to combine the options -flto and -fwhole-program. The latter assumes that the current compilation unit represents the whole program being compiled. This means that most functions and variables are optimized more aggressively. Adding -fwhole-program in the final link step in the example above makes LTO even more powerful.
When using multiple steps, it is strongly recommended to use exactly the same optimization and machine-dependent options in all commands, because conflicting options during compilation and link-time may lead to strange errors. In the best case, the options used during compilation will be silently overridden by those used at link-time. In the worst case, the different options may introduce subtle inconsistencies leading to unpredictable results at runtime. This, of course, is far from ideal, and, hence, in the next minor release, GCC will identify such conflicting options and provide appropriate diagnostics. Meanwhile, some extra care should be taken when using LTO.
The current implementation of LTO is only available for ELF targets, and, hence, LTO is not available on Windows or Darwin in GCC 4.5. However, the LTO framework is flexible enough to support those targets and, in fact, Dave Korn has recently proposed a patch that adds LTO support for Windows to GCC 4.5.1 and 4.6, and Steven Bosscher has done the same for Darwin.
Finally, another interesting ongoing project, called whole program optimization [PDF], aims to make LTO much more scalable for very large programs (on the order of millions of functions). Currently, when compiling and linking with LTO, the final step stores information from all files involved in the compilation in memory. This approach does not scale well if there are many large files. In practice, there may be little interaction between some files and the information required could be partitioned and optimized independently, with little performance loss, or at least gracefully degrading the effectiveness of LTO depending on existing resources. The experimental -fwhopr option is a first step in this direction, but this feature is still under development and even the name of the option is likely to change. Therefore, GCC 4.6 will probably bring further improvements in this area.
Plugins
Another long-awaited feature is the ability to load user code as plugins that modify the behavior of GCC. A substantial amount of controversy surrounded the implementation of plugins. The possibility of proprietary plugins was probably the main factor stalling the development of this feature. However, the FSF recently reworked the Runtime Library Exception in order to prevent proprietary plugins. With the new Runtime Library Exception in place, the development of the plugin framework progressed rapidly. This, however, did not completely end the controversy surrounding plugins; while some developers think that plugins are essential for the future of GCC and for attracting new users and contributors, others fear that plugins may divert efforts from improving GCC itself.
The plugin framework of GCC can, in principle, work on any system that supports dynamic libraries. In GCC 4.5, however, plugins are only supported on ELF-based platforms, that is, most Unix-like systems, but not Windows or Darwin. A plugin is loaded with the new option -fplugin=/path/to/file.so. GCC makes available a series of events for which the plugin code can register its own callback functions. The events already implemented in GCC 4.5 allow plugins to interact with the pass manager to add, reorder, and remove optimization passes dynamically, modify the low-level representation used by the C and C++ front-ends, and add new custom attributes and compiler pragmas, among other possibilities described in the internal documentation.
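A minimal plugin skeleton, sketched from the GCC plugin API, looks roughly like the following. Note that this is only a sketch: it cannot build without GCC's installed plugin headers, and the callback shown is purely illustrative.

```c
/* hello-plugin.c -- a skeletal GCC plugin (sketch; requires GCC's
   plugin headers).  Build roughly as:
     gcc -I`gcc -print-file-name=plugin`/include -fPIC -shared \
         hello-plugin.c -o hello-plugin.so
   and load with: gcc -fplugin=./hello-plugin.so file.c */
#include "gcc-plugin.h"
#include "plugin-version.h"
#include <stdio.h>

/* GCC refuses to load plugins that do not assert GPL compatibility. */
int plugin_is_GPL_compatible;

/* Called when the PLUGIN_FINISH event fires, just before GCC exits. */
static void finish_callback(void *gcc_data, void *user_data)
{
    fprintf(stderr, "hello-plugin: compilation finished\n");
}

int plugin_init(struct plugin_name_args *plugin_info,
                struct plugin_gcc_version *version)
{
    /* Refuse to load into a GCC other than the one we were built for. */
    if (!plugin_default_version_check(version, &gcc_version))
        return 1;
    register_callback(plugin_info->base_name, PLUGIN_FINISH,
                      finish_callback, NULL /* user_data */);
    return 0;
}
```

Callbacks for other events, such as pass-manager manipulation, are registered the same way through register_callback().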
Despite plugins being a new feature in GCC 4.5, several projects are already making use of the plugin support. Among these projects are Dehydra, the static analysis tool for C++ developed by Mozilla, and MELT, a framework for writing optimization passes in a dialect of LISP. Also, the ICI/MILEPOST research project relies heavily on the new plugin framework in GCC 4.5.
Variable Tracking at Assignments
The Variable Tracking at Assignments (VTA) project aims to improve debug information when optimizations are enabled. When GCC compiles some code with optimizations enabled, variables are renamed, moved around, or even completely removed. When debugging such code and trying to inspect the value of some variable, the debugger would often report that the variable has been optimized out. With VTA enabled, the optimized code is internally annotated in such a way that optimization passes transparently keep track of the value of each variable, even if the variable is moved around or removed.
A small example of the differences between debug information in GCC 4.5 and previous releases is the following program:
typedef struct list {
    struct list *n;
    int v;
} *node;

node find_prev (node c, node w)
{
    while (c) {
        node opt = c;
        c = c->n;
        if (c == w)
            return opt;
    }
    return NULL;
}
Variable opt is removed when compiling with
optimization. Hence, in previous GCC versions, or when compiling
without VTA, one cannot inspect the value of opt even at
the highest debugging level. In GCC 4.5, however, VTA enables
inspection of the value of all variables at all points of the function.
The effect of VTA is even more noticeable for inlined functions. Before VTA, optimizations would often completely remove some arguments of an inlined function, making it impossible to inspect their values when debugging. With VTA, these optimizations still take place; however, appropriate debug information is generated for the missing arguments.
Finally, the VTA project has brought another feature, the new
-fcompare-debug option, which tests that the code
generated by GCC with and without debug information is identical. This
option is mainly used by GCC developers to test the compiler, but it
may be useful for users to check that their program is not affected by a
bug in GCC, though at a significant cost in compilation
time.
Standards-conforming excess precision
Perhaps the most reported bug in GCC is bug 323. The symptoms appear when different optimization levels produce different results in floating-point computations, and when two ways of performing the same calculation do not produce the same result. Although this is an inherent limitation of floating-point numbers, users are still surprised that different optimization levels lead to highly different results. One of the main culprits of the problem is the excess precision arising from the use of the x87 floating-point unit (FPU). That is, operations performed in the FPU have more precision than double precision numbers stored in memory. Hence, the final result of a computation may significantly depend on whether intermediate operations are stored in the FPU or in memory.
This leads to some unexpected and counter-intuitive results. For example, the same piece of code may produce different results with the same compilation flags on the same machine, depending on changes to seemingly unrelated code, because the unrelated code forces the compiler to save some intermediate result in memory instead of keeping it in an FPU register. One workaround for this behavior is the option -ffloat-store, which stores every floating-point variable in memory. This has, however, a significant cost in computation time. A more fine-grained workaround is to use the volatile qualifier on variables suffering from this problem.
While this problem will never be solved in computers with inexact representation of floating-point numbers, GCC 4.5 helps improve the situation by adding a new option -fexcess-precision=standard, currently only available for C, that handles floating-point excess precision in a way that conforms to ISO C99. This option is also enabled with standards conformance options such as -std=c99. However, standards-conforming precision incurs an extra cost in computation time. Therefore, users more interested in speed may wish to disable this behavior using the option -fexcess-precision=fast.
C++ compatible
GCC 4.5 is the first release of GCC that can be compiled with a C++ compiler. This may not seem very interesting or useful at the moment (but take a look at the much improved -Wc++-compat option). However, this is only the first step of an ongoing project to use C++ as the implementation language of GCC. Except for some front-end bits written in other languages, notably Ada, most of GCC is implemented in C. The internal structures of GCC are under continuous improvement and modularization aimed at creating cleaner interfaces, and many GCC developers think that this work would be easier using C++ than C. However, this proposal is not free of controversy, and it is not clear whether the switch would occur in GCC 4.6, later, or ever.
Other improvements
The above are only some examples of the many improvements and new features in GCC 4.5. A few other features that are worth mentioning:
- GCC now makes better use of the information provided by the restrict keyword, which is also supported in C++ as an extension, to generate better optimized code.
- The libstdc++ profile mode tries to identify suboptimal uses of the standard C++ library, and suggest alternatives that improve performance.
- Previous versions of GCC incorporated the MPFR library in order to consistently evaluate math functions with constant arguments at compile time. GCC 4.5 extends this feature to complex math functions by incorporating the MPC library.
- Many improvements have been made in the specific language front-ends, in particular from the very active Fortran front-end project. Also worth mentioning is the increasing support for the upcoming ISO C++ standard (C++0x).
Conclusion
We are living in interesting times on the compiler front, and GCC 4.5 is an indication that we can still expect new developments in the future. The release of GCC 4.5 brings its users several important, and somewhat controversial, features. It also includes the typical long list of small fixes and improvements, in which most users will be able to find at least one thing to their liking. GCC 4.5 may well be a transition point, where the foundational work that has been done during the 4.x release series is starting to show up in user-visible features that would have been impossible in the GCC 3.x release series. It is difficult to say at this moment what GCC 4.6 will bring us a year from now, as it will depend on what the contributors decide. Anyone can contribute to the future of GCC. This is free software, after all.
Acknowledgments
I would like to thank in general the community of GCC developers, and in particular, Ian Lance Taylor, Diego Novillo, and Alexandre Oliva, for their helpful comments and suggestions when writing this article.
Of hall monitors and slippery slopes
Since its inception in July of 2009, the Fedora Hall Monitor policy has had mixed reviews. The intent of the policy is to promote more civil discourse on various Fedora mailing lists—to embody the "be excellent to each other" motto that is supposed to govern project members' behavior. Questions were raised about the recent "hall monitoring" of a thread on fedora-devel because, instead of the usual reasons for stopping a thread—personal attacks, profanity, threats of violence, and the like—it was stopped, essentially, for going on too long.
Kevin Kofler's open letter about why he was not going to run again for a seat on the Fedora Engineering Steering Committee (FESCo) was the starting point of the problem thread. But the focus of the discussion was mostly on the update process for Fedora, something which has been roiling the Fedora waters for several months now. Kofler strongly believes that the proposals requiring more "karma"—votes in favor, essentially—in the bodhi package management system before pushing out updates are simply bureaucratic in nature and won't prevent problems with updates. Other FESCo members, apparently the vast majority of them, disagree, as FESCo member Matthew Garrett made clear.
But Kofler believes that package maintainers should be able to make these decisions, without hard and fast testing requirements imposed by FESCo, or the Fedora Packaging Committee (FPC). Kofler and others are quite happy with the status quo, whereas other community members—both FESCo and not—see that problems with upgrades are giving the project something of a black eye. Kofler was adamant in his response to Garrett.
Most of these arguments are familiar to those who follow fedora-devel. The participants in the discussion are often the same and the positions they take are fairly predictable. But the content was on-topic and the discourse wasn't descending into personal attacks or insults, so it was something of a surprise to many when hall monitor Seth Vidal stepped in and closed the thread:
No further posts to this thread will be allowed.
The last line turns out to have been somewhat premature, as the thread continued, though now focused on the hall monitors' decision. Toshio Kuratomi asked how the Hall Monitor policy—which is undergoing some changes as a result of this issue—could be applied to redundant threads.
Vidal quoted a blanket provision in the policy that allows thread closure posts for "aggressive or problematic mailing list threads" as the reason the action was taken. That didn't sit well with a number of folks. Kofler complained: "This vague paragraph can be abused to justify censoring pretty much everything." Adam Williamson had a more detailed analysis:
At least, that's how I always assumed it was intended when the policy came in, and I'm not at all sure I'm okay with a policy which says 'hall monitors can shut down any discussion they choose for any reason they like'.
Evidently, three users and two hall monitors had complained about the thread, which was enough to constitute "repeated complaints". But, because the topic had (mostly) shifted away from the update process and into things like hall monitoring and Fedora's "purpose" (or goal, i.e. "what is Fedora for?"), it was allowed to continue. In the end, the "thread closure" led to roughly doubling the size of a thread which may—or may not—have been winding down on its own.
In a post to fedora-advisory-board, Kuratomi requested that the board look into the issue with an eye toward clarifying the policy. He suggested three ways to resolve the issue: restricting the hall monitors' remit to just insults and personal attacks, specifically calling out redundant threads as an area for the hall monitors to police, or allowing thread closures based on the number of complaints received. Kuratomi is in favor of the first option, "as the others are taking us too far into the realm of giving a few people the power to decide what is and is not useful communication."
At its May 6 meeting, the board did discuss the issue. While it is clear that several board members are not in favor of having hall monitors, and were surprised when this particular thread was "clipped off", as Mike McGrath put it, there is more to the problem than just the policy. At its core, the problem is that Fedora is still struggling with its identity.
Some community members would like to see Fedora be a well-polished desktop distribution that gets released every six months and is relatively stable from there—a la Ubuntu. Others see Fedora as a refuge for those who don't like the Ubuntu approach, want to get frequent package updates, and live closer to the "bleeding edge". It is, at the very least, difficult for one distribution to support both of those models, but in some sense that is what Fedora is currently trying to do.
Because the project hasn't made a firm commitment to a particular direction, at least one to the exclusion of the other, there are advocates on both sides who are trying hard to pull the distribution in the direction they want. Kofler is loudly, and repetitively, making his case that Fedora will lose a sizable chunk of its users and contributors if it becomes more conservative about updates. Others argue that update woes are driving users and contributors away.
McGrath is firmly in the camp that Fedora should first decide what it is and what its goals are, and then ask those who are "chronically unhappy" with that direction to leave the project. That would lead to less contentious mailing list threads, among other things. It's a hard problem, he said, and "we don't want everyone who's unhappy with Fedora to leave."
In a discussion that lasted for more than an hour, the board looked at various facets of the problems, but hall monitor Josh Boyer brought it back to the particular thread in question. He asked if there was "anyone on the Board that thinks the recent hall monitor action was inappropriate". Matt Domsch and McGrath were both surprised at the action, while John Poelstra was not, and the rest of the board was non-committal.
No one said that they found the action inappropriate, but Domsch suggested that the board recommend "that hall monitors provide additional latitude to long threads that may be redundant, but that aren't violent".
Poelstra wanted to see some "overall objectives for having this policy" added to the policy document as well. Both he and Domsch took action items to edit the policy for board approval at its next meeting on May 13. The changes that were made seem much in keeping with what the board members were saying, so it seems likely that the board will approve them.
Seemingly arbitrary thread closures are clearly a concern to some in the community. Trying to determine which threads are "making progress" versus those that are just repetitive is difficult—and extremely likely to be contentious. While the goals of the hall monitor policy are generally good, it isn't clear that making decisions on specific threads to try to stop discussions getting "out of hand" is a good way forward. It is something of a "slippery slope". So many fine lines need to be drawn—and then challenged by dissenters—that it may just be an exercise in futility.
For the current problem thread, at least, the real underlying issues have yet to be completely addressed. As Fedora moves toward implementing the new packaging rules, which may slow down the usual Fedora update stream, the decline in users and contributors that Kofler envisions may occur. The opposite could happen as well. Only time will tell.
Security
ClamAV 0.96 adds executable virus signatures and more
Version 0.96 of the open source virus scanner Clam AntiVirus (ClamAV) was released in April, bringing with it support for new file formats, better signatures, and several major new features — such as the first official support for Windows. It also includes an entirely new method for virus signature authors to write the detection schemes at the heart of ClamAV, using a C-like language run in a bytecode interpreter. Finally, the project issued an update to the official virus database that disabled outdated and incompatible versions of the software.
ClamAV is one of the most popular anti-virus products running on Linux, in large part due to its easy integration with Linux server software. ClamAV runs as a daemon, and accepts local and TCP connections to scan files against its virus database. As such, it is a popular choice for Linux email and file servers. Tools also exist for desktop Linux machines, and the daemon has long run on other Unix-like operating systems. Apple has even included it in OS X since version 10.4.
New features
ClamAV 0.96 adds support for scanning several important new file formats, such as InstallShield, Cpio, and 7-Zip archive files, and 64-bit ELF, UPX 3.0, and OS X Mach-O universal binary executables. The scanner can now also detect another common deception technique: packaging Windows viruses with phony Portable Executable (PE) headers and icons. The new release also includes improved wildcard-matching in virus signatures, and supports DazukoFS, which is a "stackable" filesystem designed to facilitate virus scanning. It sits on top of an existing filesystem and implements file access control in user space by allowing a process to permit or block access to particular files based on their contents.
0.96 also introduces a "Personal Stats" feature, which allows ClamAV users to remotely track their specific installation's malware detection statistics. The project already keeps anonymous global statistics of ClamAV detections, which uploads the names of recently-found malware when checking for database updates. The personal stats option requires the user to actively create a host ID on the ClamAV server, which is then copied to the ClamAV configuration file and included in subsequent upstream reports.
ClamAV's freshclam service allows installations to check for updates to the official virus database over the Internet, several times per hour, and to download incremental updates. That functionality was at the root of the need to disable very old ClamAV instances with the release of 0.96.
Version 0.94 and older contained a bug in freshclam which failed to build the updated virus database if an incremental update contained a virus signature longer than 980 bytes. It was still possible for clients to download the full database, but the project was concerned that the traffic generated would tax the ClamAV servers excessively. The bug was fixed for 0.95, and users were warned six months in advance that on April 15, 2010, the database would be updated with a special signature that disabled installations still running 0.94 or older code.
More important than the bandwidth hit of clients attempting full-database retrievals — though there were no virus signatures longer than 980 bytes prior to 0.96's release — is that the limit prevented the creation of the new "logical signatures" at the core of ClamAV 0.96's other major enhancement, the bytecode interpreter.
Byte codes
0.96's bytecode engine is the new release's most fundamental change, and has sparked its share of controversy. In previous releases, the creators of the virus signatures stored in ClamAV's database were limited to pattern-matching techniques to recognize malware. With the bytecode engine, signature creators can now develop "logical" signatures that involve heuristics, complex routines, and even unpacking file contents for examination. It also theoretically allows signature creators to examine new file formats without waiting for the main ClamAV program to support them explicitly.
ClamAV can run bytecode-engine signatures through a built-in interpreter or through a Just-In-Time (JIT) compiler built with LLVM. The syntax of the signature definition language is described as "C-like"; although it has not been formally documented by the project, it is partially described in the ClamAV code itself, in the bytecode_api.h header file.
Understandably, when the feature was first announced during the 0.96 development cycle, several in the ClamAV community were uneasy about the ability to incorporate executable code in malware-detection signatures, and even attempted to deactivate the feature.
The developers responded with an explanation of the security measures taken to protect hosts from malicious or problematic routines in bytecode signatures. First, all bytecode distributed by the project will come with embedded source code that can be examined by the user with the clambc utility. Second, all bytecodes in the virus database will be cryptographically signed by the project to verify their integrity. Third, bytecodes themselves have access only to the limited ClamAV API, cannot access system calls or memory, and can only read from the currently-scanned file. Finally, bounds-checking and other security measures are inserted by the compiler and by LibClamAV itself. In addition, the entire feature can be deactivated with a simple line in the freshclam.conf configuration file.
Windows
With 0.96, ClamAV builds on Windows using Visual Studio for the first time. This means that the daemon and server-side tools should work on Windows machines just as they do on all Unix-based operating systems. By itself, the basic ClamAV package allows on-demand scanning with a command-line tool, but does not implement an on-access scanning service (i.e., automatically scanning files whenever they are read or written). On Unix systems, implementing this functionality has always been the domain of the third-party mail or file server code that connects to the ClamAV daemon.
In addition to building the server utilities on Windows, however, the project also announced the availability of an official graphical Windows client-side product. The appropriately-named ClamAV for Windows implements on-access scanning, but, intriguingly, it does not run on the Windows client computer itself. Rather, it connects to a cloud-based ClamAV service run by security company Immunet.
The client sends an SHA hash and file heuristics for each accessed file to the Immunet cloud, where it is scanned against the ClamAV database, and against other detection resources run by Immunet. A ClamAV for Windows FAQ page addresses several security concerns vital to this technique, assuring users that heuristics are only sent to Immunet for executable files, not documents, and points to Immunet's privacy policy.
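The lookup flow can be illustrated with a toy sketch: the client hashes the file's contents and the "cloud" checks that hash against a malware database. A real ClamAV for Windows client sends a SHA hash over the network; the FNV-1a function and the local blacklist array here are stand-ins so the sketch stays self-contained.

```c
#include <stddef.h>

typedef unsigned long long u64;

/* FNV-1a: a stand-in for the SHA hash the real client computes. */
u64 file_hash(const unsigned char *data, size_t len)
{
    u64 h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* "Cloud"-side check: is this file's hash in the malware database?
 * (The real service also applies heuristics and other resources.) */
int cloud_is_malware(u64 hash, const u64 *db, size_t db_len)
{
    for (size_t i = 0; i < db_len; i++)
        if (db[i] == hash)
            return 1;
    return 0;
}
```

The privacy trade-off described above is visible even in the toy: the server never sees file contents, only a fixed-size fingerprint of them.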
ClamAV for Windows is a free service, although the source code to the Windows front-end and to Immunet's cloud backend is not open source. The ClamAV project assures users that, in spite of this, it has no intention of deviating from the GPL for releases of ClamAV itself.
There have been other, unofficial Windows clients for ClamAV in the past. At present, the most popular is ClamWin, which does not itself provide on-access scanning, though that feature can be added through the use of Clam Sentinel.
Moving forward
Bytecode-based virus signatures are provided in their own database, bytecode.cvd, which thus far is quite small: only three signatures as of May 11. But it is clearly the way forward for the project. The old system's pattern-matching approach was very limited, and is at least partly responsible for ClamAV's weaker detection performance compared to the well-funded proprietary virus scanners.
Nevertheless, judging by the response on the mailing list, the added feature may not be an immediate hit with ClamAV users, especially considering how security-conscious they are as a group. Similar wariness is probably to be expected about the cloud-based ClamAV for Windows product, though over privacy as much as security concerns.
ClamAV has very little active competition in the open source anti-virus marketplace. Perhaps that is due to the "scratch-your-own-itch" mentality in the Linux and open source communities, which have never seen the level of virus and malware problems still found in Windows. Consequently, it may be that the most important new bullet point of ClamAV's 0.96 release is the project's ability to build on Windows itself. That will attract more developers who will build the kinds of add-ons for client and server software that the project needs to grow and evolve further.
Brief items
Quotes of the week
New vulnerabilities
amsn: man-in-the-middle attack
Package(s): amsn
CVE #(s): CVE-2010-0744
Created: May 10, 2010
Updated: May 12, 2010
Description: From the Red Hat bugzilla: Gabriel Menezes Nunes reported that aMSN messenger failed to properly validate SSL certificates when connecting to the MSN server. A remote attacker could use this flaw to conduct man-in-the-middle attacks and/or impersonate trusted servers.
boa: missing sanitization
Package(s): boa
CVE #(s): CVE-2009-4496
Created: May 12, 2010
Updated: May 12, 2010
Description: The boa HTTP server fails to sanitize data written to request logs, allowing an attacker to embed escape sequences there.
cacti: SQL injection
Package(s): cacti
CVE #(s): CVE-2010-1431
Created: May 7, 2010
Updated: May 12, 2010
Description: From the Mandriva advisory: SQL injection vulnerability in templates_export.php in Cacti 0.8.7e and earlier allows remote attackers to execute arbitrary SQL commands via the export_item_id parameter.
dvipng: arbitrary code execution
Package(s): dvipng
CVE #(s): CVE-2010-0829
Created: May 6, 2010
Updated: July 8, 2010
Description: From the Ubuntu advisory: Dan Rosenberg discovered that dvipng incorrectly handled certain malformed dvi files. If a user or automated system were tricked into processing a specially crafted dvi file, an attacker could cause a denial of service via application crash, or possibly execute arbitrary code with the privileges of the user invoking the program.
kernel: denial of service
Package(s): kernel
CVE #(s): CVE-2010-0730
Created: May 7, 2010
Updated: May 28, 2010
Description: From the Red Hat advisory: A flaw was found in the Memory-mapped I/O (MMIO) instruction decoder in the Xen hypervisor implementation. An unprivileged guest user could use this flaw to trick the hypervisor into emulating a certain instruction, which could crash the guest (denial of service).
moodle: multiple vulnerabilities
Package(s): moodle
CVE #(s): CVE-2010-1613 CVE-2010-1614 CVE-2010-1615 CVE-2010-1616 CVE-2010-1617 CVE-2010-1618 CVE-2010-1619
Created: May 10, 2010
Updated: October 11, 2010
Description: From the SUSE advisory: Moodle version 1.9.8 fixes several security issues including cross-site scripting (XSS) and SQL injection bugs.
mplayer, vlc: arbitrary code execution
Package(s): mplayer, vlc
CVE #(s): (none listed)
Created: May 11, 2010
Updated: May 12, 2010
Description: From the Debian advisory: tixxDZ (DZCORE labs) discovered a vulnerability in vlc, the multimedia player and streamer. Missing data validation in vlc's real data transport (RDT) implementation enables an integer underflow and, consequently, an unbounded buffer operation. A maliciously crafted stream could thus enable an attacker to execute arbitrary code.
mysql: privilege escalation
Package(s): mysql
CVE #(s): (none listed)
Created: May 10, 2010
Updated: May 12, 2010
Description: From the Mandriva advisory: A vulnerability was discovered in mysql which would permit mysql users without any kind of privileges to use the UNINSTALL PLUGIN function. A problem was also discovered in the mysqld init script which, under certain circumstances, could cause the service to exit too quickly, returning an [ OK ] status before the mysql server was actually started and bound to the mysql socket or IP address. This caused a problem for products like Pulse2.
sahana: information disclosure
Package(s): sahana
CVE #(s): CVE-2010-1191
Created: May 7, 2010
Updated: May 12, 2010
Description: From the Red Hat bugzilla: Visiting a certain URL would allow an attacker to view (and potentially modify) information which should otherwise be protected by authentication.
samba: privilege escalation
Package(s): samba
CVE #(s): CVE-2010-0787
Created: May 11, 2010
Updated: September 23, 2011
Description: From the Mandriva advisory: client/mount.cifs.c in mount.cifs in smbfs in Samba allows local users to mount a CIFS share on an arbitrary mountpoint, and gain privileges, via a symlink attack on the mountpoint directory file.
texlive-bin: multiple arbitrary code execution flaws
Package(s): texlive-bin
CVE #(s): CVE-2010-0739 CVE-2010-0827 CVE-2010-1440
Created: May 6, 2010
Updated: June 26, 2012
Description: From the Ubuntu advisory: Marc Schoenefeld, Karel Srot and Ludwig Nussel discovered that TeX Live incorrectly handled certain malformed dvi files. If a user or automated system were tricked into processing a specially crafted dvi file, an attacker could cause a denial of service via application crash, or possibly execute arbitrary code with the privileges of the user invoking the program. (CVE-2010-0739, CVE-2010-1440) Dan Rosenberg discovered that TeX Live incorrectly handled certain malformed dvi files. If a user or automated system were tricked into processing a specially crafted dvi file, an attacker could cause a denial of service via application crash, or possibly execute arbitrary code with the privileges of the user invoking the program. (CVE-2010-0827)
xar: package signature validation failure
Package(s): xar
CVE #(s): CVE-2010-0055
Created: May 12, 2010
Updated: May 13, 2010
Description: The xar tool fails to properly validate package signatures, leading to an "unspecified impact."
Page editor: Jake Edge
Kernel development
Brief items
Kernel release status
The current development kernel is 2.6.34-rc7, released on May 9. Linus says: "I think this is the last -rc - things have been pretty quiet on the patch front, although there's been some rather spirited discussions." The full changelog contains all the details.
According to the latest regression posting, there are 24 unresolved regressions in 2.6.34.
Stable updates: the 2.6.32.13 and 2.6.33.4 stable kernel updates were released on May 12. Both are large - on the order of 100 patches each - and fix a number of important problems.
Quotes of the week
Adaptive spinning futexes
As a general rule, a well-written program should, when it needs a resource currently owned by another program, step aside and allow other work to proceed until that resource becomes available. When it comes to low-level synchronization primitives, though, this rule does not always hold. Better overall system performance can often be achieved if a program busy-waits rather than sleeping. If the wait is short, the performance benefits that come from giving the resource to an already-running, cache-hot process outweigh the cost of the busy wait.

The best-supported (by the kernel) user-space synchronization primitive is the futex. Darren Hart has been working on a patch series intended to bring adaptive spinning to futexes in an attempt to improve the performance of multi-threaded applications. These patches, while still marked as "not ready for inclusion," have evolved considerably over time.
The core idea is simple: if a process attempts to acquire a futex which is already owned by another, it will spin in an acquisition loop until the holding process either releases the futex or is scheduled out. If all goes well, the new process will be able to grab the futex quickly and get on with its work in the most efficient way. In practice, adaptive spinning generally outperforms regular futexes, but only occasionally does better than the highly tweaked, assembly-coded adaptive spinning mutex code used by the pthreads library.
Adaptive spinning requires that the kernel know which process currently owns the futex; that is a minor problem because the current futex operations do not provide that information. So a new locking operation is required in situations where adaptive spinning is to be used.
There is an alternative approach which has been recommended by some developers: do the spinning in user space rather than in the kernel. User-space spinning might just be faster, but it's trickier, because it's harder for user space to know whether the current holder of a futex is executing or not. Providing the requisite information will require the design of a special (and fast) API - work which has not yet been done.
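The spin-then-sleep idea can be sketched in user space. The hypothetical lock below spins on a compare-and-swap for a fixed budget, then gives up the CPU; in a real adaptive futex the spin would continue only while the owner is on a CPU, and the fallback would be a FUTEX_WAIT sleep in the kernel rather than sched_yield(). All names and the spin budget are illustrative, not Darren Hart's actual code.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Arbitrary spin budget before giving up the CPU. */
enum { SPIN_BUDGET = 1000 };

typedef struct { atomic_int locked; } spinfutex_t;

static void spinfutex_lock(spinfutex_t *f)
{
    int spins = 0;
    int expected = 0;
    while (!atomic_compare_exchange_weak(&f->locked, &expected, 1)) {
        expected = 0;
        if (++spins >= SPIN_BUDGET) {   /* spin budget exhausted ... */
            spins = 0;
            sched_yield();              /* ... where a futex would sleep */
        }
    }
}

static void spinfutex_unlock(spinfutex_t *f)
{
    atomic_store(&f->locked, 0);        /* a futex would FUTEX_WAKE here */
}

/* Two threads hammer one counter to show that the lock excludes properly. */
static spinfutex_t demo_lock;
static long demo_counter;

static void *demo_worker(void *unused)
{
    (void)unused;
    for (int i = 0; i < 100000; i++) {
        spinfutex_lock(&demo_lock);
        demo_counter++;
        spinfutex_unlock(&demo_lock);
    }
    return NULL;
}

long run_demo(void)
{
    pthread_t a, b;
    demo_counter = 0;
    atomic_store(&demo_lock.locked, 0);
    pthread_create(&a, NULL, demo_worker, NULL);
    pthread_create(&b, NULL, demo_worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return demo_counter;
}
```

The interesting policy question - how long to spin, and whether the owner is still running - is exactly what the kernel-side and user-side proposals answer differently.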
Uprobes returns - again
The Uprobes module is becoming one of the longer-lasting stories in the kernel development community. For a few years now, developers have been trying to get this code - which allows the placement of dynamic tracepoints into user-space programs - into the mainline. We last looked at Uprobes back in January; now, as the 2.6.35 merge window approaches, Uprobes is back for another round.

At this point, Uprobes has been entirely separated from the utrace layer, which is not a part of this patch series. Utrace is controversial in its own right and has not proved helpful in getting Uprobes merged. Other changes which have been made include the addition of interfaces to the tracing and perf events subsystems. That means that dynamic probes can be inserted from the command line, then watched using the Ftrace interface or aggregated with perf.
On the other hand, Uprobes retains the "execute out of line" mechanism for the execution of instructions displaced by probes. XOL works, but it does so at the cost of injecting a new virtual memory area into the probed process; that is a larger disturbance than some developers would like to see. But the alternative - adding an emulator for those instructions to the kernel - is invasive in different ways.
Review comments so far have focused on relatively small details. That does not mean that Uprobes will be accepted when the merge window opens, but its chances do seem better than they have in the past.
Detecting idle patterns
The cpuidle subsystem is charged with putting the CPU into the optimal sleep state when there is nothing for it to do. One of the key inputs into this decision is the next scheduled timer event; that event puts an upper bound on how long the processor can expect to be able to sleep undisturbed. A more distant next timer event suggests that a deeper sleep state is appropriate.

But timer events are not the only way to wake up a processor; device interrupts will also do that. There are times when hardware can be expected to interrupt well before the next timer expiration, but those times can be hard for the processor to predict. There is seemingly an exception, though: sometimes hardware interrupts are so regular that they become a sort of timer tick in their own right. A moving mouse can generate that sort of pattern; network traffic can do it too. In such situations, the current cpuidle "menu" governor may repeatedly choose the wrong sleep state.
Arjan van de Ven has come to the rescue with a simple cpuidle patch which maintains an array of the last eight actual sleep periods. Whenever it is time to put the processor to sleep, the standard deviation of those sleep periods is calculated; if it is small, then the average sleep is considered to be a better guide to the expected sleep period than the next timer event.
As machine learning goes, this code is a relatively simple example. But it should be smart enough to catch simple patterns and run the hardware in something closer to an optimal mode.
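The heuristic can be sketched as follows. The function name, the microsecond units, and the "small stddev" threshold are illustrative guesses, not the actual kernel code; the structure - average the last eight sleeps, trust the average only when they cluster tightly - follows the description above.

```c
/* Keep the last eight observed sleep lengths; if their standard
 * deviation is small, trust their average over the next-timer estimate. */
#define IDLE_INTERVALS 8

/* Returns the predicted sleep length in microseconds. */
unsigned int predict_sleep_us(const unsigned int last[IDLE_INTERVALS],
                              unsigned int next_timer_us)
{
    double avg = 0.0, var = 0.0;

    for (int i = 0; i < IDLE_INTERVALS; i++)
        avg += last[i];
    avg /= IDLE_INTERVALS;

    for (int i = 0; i < IDLE_INTERVALS; i++)
        var += ((double)last[i] - avg) * ((double)last[i] - avg);
    var /= IDLE_INTERVALS;

    /* Small variance: the recent wakeups look periodic (mouse, network).
     * Comparing variance against (avg/4)^2 avoids needing sqrt(). */
    if (var < (avg / 4.0) * (avg / 4.0))
        return (unsigned int)avg;
    return next_timer_us;
}
```

With a jittery-but-periodic history like {1000, 1010, 990, 1005, ...} the function ignores a distant timer and predicts a sleep of about one millisecond; with a scattered history it falls back to the timer-based estimate.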
Kernel development news
The Next3 filesystem
The ext3 filesystem is tried and true, but it lacks a number of features deemed interesting by contemporary users. Snapshots - the ability to quickly capture the state of the filesystem at an arbitrary time - are at the top of many lists. It is currently possible to use the LVM snapshotting feature with ext3, but snapshots taken through LVM have some significant limitations. The Next3 filesystem offers an approach which might prove easier and more flexible: snapshots implemented directly in ext3.

Next3 was developed by CTERA Networks, which has started shipping it on its C200 network-attached storage device. This code has also been posted on SourceForge and proposed for merging into the mainline kernel. The Next3 filesystem adds a simple snapshot feature to ext3 in ways which are (mostly) compatible with the existing on-disk format. It looks like a useful feature, but its path into the mainline looks to be longer than its implementers might have hoped.
The Next3 filesystem is a new filesystem type - it's not just an addition to ext3. At its core, it works by creating a special, magic file to represent a snapshot of the filesystem. The files have the same apparent size as the storage volume as a whole, but they are sparse files, so they take almost no space at the outset. When a change is made to a block on disk, the filesystem must first check to see whether that block has been saved in the most recent snapshot already. If not, the affected block is moved over to the snapshot file, and a new block is allocated to replace it. Thus, over time, disk blocks migrate to the snapshot file as they are rewritten with new contents.
Gaining read-only access to a snapshot is a simple matter of doing a loopback mount of the snapshot file as an ext2 filesystem. The snapshot file is sufficiently magic that any attempts to read blocks in the holes (which represent blocks that have not been changed since the snapshot was taken) will be satisfied from a later snapshot - which will have captured the contents of that block when it was eventually changed - or from the underlying storage device. Deleting a snapshot requires moving changed blocks into the previous snapshot, if it exists, because the deleted snapshot holds blocks which are logically part of the earlier snapshots.
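The move-on-write mechanics can be modeled with a toy in-memory "disk" holding one int per block and a single sparse snapshot. This is purely illustrative - real Next3 operates on ext3 block allocation, and HOLE here stands in for an unallocated hole in the snapshot file:

```c
/* Toy model of Next3-style move-on-write with a single snapshot. */
#define NBLOCKS 8
#define HOLE    (-1)

int disk[NBLOCKS];
int snapshot[NBLOCKS];  /* sparse: HOLE where the block is unchanged */

void take_snapshot(void)
{
    /* Costs almost nothing: the snapshot starts out as all holes. */
    for (int i = 0; i < NBLOCKS; i++)
        snapshot[i] = HOLE;
}

void write_block(int blk, int data)
{
    /* First change since the snapshot? Move the old block into it. */
    if (snapshot[blk] == HOLE)
        snapshot[blk] = disk[blk];
    disk[blk] = data;
}

int read_snapshot(int blk)
{
    /* A hole means the block never changed: fall through to the disk
     * (or, with multiple snapshots, to a later snapshot). */
    return snapshot[blk] == HOLE ? disk[blk] : snapshot[blk];
}
```

The same fall-through on holes is what makes the loopback-mounted snapshot look like a complete, frozen filesystem even though it stores only the blocks that changed.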
The changes to the ext3 on-disk format are minimal, to the point that a Next3 filesystem can be mounted by the ordinary ext3 code. If snapshots exist, though, ext3 cannot be allowed to modify the filesystem, lest the changed blocks fail to be saved in the snapshot. So, when snapshots exist on the filesystem, it will be marked with a feature flag which forces ext3 to mount the filesystem readonly.
On the performance side, the news is said to be mostly good. Writes will take a little longer due to the need to move the old block to a snapshot file. The worst performance impact is seemingly on truncate operations; these may have to save a large number of blocks and can get a lot slower. It is also worth noting that the moving of modified blocks to the snapshot file will, over time, wreck the nice, contiguous on-disk format that ext3 tries so hard to create, with an unfortunate effect on streaming read performance. Files which must not be fragmented can be marked with a special flag which will cause blocks to be copied into the snapshot file rather than moved; that will slow writes further, but will keep the file contiguous on disk.
Next3 developer Amir Goldstein requested relatively quick review of the patches because he is trying to finalize some of the on-disk formatting. The answer he got from Ted Ts'o was probably not quite what he was looking for:
Amir's response was that, while porting the patches to ext4 is on the "we'll get around to it someday" list, that port is not an easy thing to do. The biggest problem, apparently, is making the movement of blocks into the snapshot file work properly with ext4's extent-oriented format. Beyond that, Amir says, he's not actually trying to get the changes into ext3 - he wants to merge a separate filesystem called Next3 which happens to be mostly compatible with ext3.
The "separate Next3" approach is unlikely to fly very far, though. As Ted put it, ext2, ext3, and ext4 are really just different implementations of the same basic filesystem format; this format has never really been forked. Next3, as a separate filesystem, would be a fork of the format. The fact that Next3 has taken over some data structure fields which are used to different purpose in ext4 has not helped matters:
The answer appears fairly clear: patches adding the snapshot feature might be welcome, but not as a fork of the ext3 filesystem. At a bare minimum, the filesystem format will have to be changed to avoid conflicts with ext4, but the real solution appears to be simply implementing the patches on top of ext4 instead of ext3. That is a fair amount of extra work which might have been avoided had the Next3 developers talked with the community prior to starting to code.
Moving x86 to LMB
The early days of the 2.6.34 development cycle were made more difficult for some testers by difficulties in the NO_BOOTMEM patches which came in during the merge window. The kinks in that code were eventually ironed out, but things might just get interesting again in 2.6.35 - Yinghai Lu is back with another set of patches which continues the process of completely reworking how early memory allocation is done on the x86 architecture. The potential for trouble with this kind of work is always there, but the end result does indeed seem worth aiming for.

Some review: in a running kernel, memory management is handled by the buddy allocator (at the page level), with the slab allocator on top. These allocators are complex pieces of code which cannot run in the absence of a mostly functional kernel, so they cannot be used in the early stages of the bootstrap process. What is used, instead, is an architecture-specific chain of simple allocators. For x86, things start with a brk()-like mechanism which yields to the "e820" early reservation code, which, in turn, gives way to the bootmem allocator. Once the bootstrap has gotten far enough, the slab allocator can take over from the bootmem code. Yinghai's 2.6.34 changes were meant to short out the bootmem stage, allowing the system to use the early reservation code until slab can run.
During the review process for that code, some reviewers asked why x86 did not use the "logical memory block" (LMB) allocator instead of its own early reservation code. LMB is currently used by the Microblaze, PowerPC, SuperH, and SPARC architectures, so it has the look of a generic solution. There are obvious advantages to using generic code over architecture-specific variants; there are more eyes to look at the code and the overall maintenance cost is reduced. So the idea of moving to LMB made obvious sense.
LMB is, as might be expected, a truly simplistic memory manager. Low-level architecture code gives it blocks of memory to manage as it discovers them with:
long lmb_add(u64 base, u64 size);
The LMB allocator will duly store that region into a fixed-length array of known memory blocks, coalescing it with existing blocks if need be. Memory may then be allocated with:
u64 lmb_alloc(u64 size, u64 align);
Allocated blocks are tracked in a second array which looks just like the first; an allocation is satisfied by iterating through the available blocks, trying to find a sufficiently large chunk which is not already reserved by somebody else. There are other functions for reserving specific regions of memory, allocating memory on specific NUMA nodes, etc. But, at its core, LMB is a simple allocator which is meant to do a good-enough job until something more sophisticated can take over.
Yinghai's patch set makes a number of changes to the LMB code itself, starting with a move from the lib directory over to mm with the rest of the memory-management code. Some new functions are added to match the different semantics supported by the early reservation code, which works in a two-step, "find a memory block, then reserve it" mode. There is also a new function to transfer LMB reservations into the bootmem allocator for configurations where bootmem is still in use. The 22-part series culminates with a switch to LMB calls for early allocations and the removal of the now-unused early reservation code.
There has been surprisingly little discussion for a patch series which makes such fundamental changes. It seems that most kernel developers pay relatively little attention to what happens at the architecture-specific levels. One exception is Ben Herrenschmidt, who keeps an eye on LMB from the PowerPC perspective. Ben disagrees with a number of the LMB-level changes, feeling that they complicate the API and potentially introduce problems. Instead, it looks like Ben would like to fix up the LMB code himself, letting Yinghai work on the x86-specific side of things.
To that end, Ben has posted a patch series of his own, saying:
Some of the changes simply clean up the LMB code, adding, for example, a for_each_lmb() macro for iterating through the array of memory blocks. The fixed-length arrays are made variable, phys_addr_t is used to represent physical addresses, and the code is substantially reorganized. There is much that Ben still plans to do, including, happily, the addition of actual documentation to the API, but even without all that, it's a significant cleanup for the LMB code.
As with Yinghai's patches there has been little in the way of discussion. It may be that these changes will remain below the radar while the two patch sets are integrated and - maybe - merged for 2.6.35. With luck, they'll remain below the radar thereafter as well, with few people even noticing the difference.
MeeGo and Btrfs
MeeGo is arguably the dark horse in the mobile platform race: it is new, unfinished, and unavailable on any currently-shipping product, but it is going after the same market as a number of more established platforms. MeeGo is interesting: it is a combined effort by two strong industry players which are trying, in the usual slow manner, to build a truly community-oriented development process. For the time being, though, important development decisions are still being made centrally. Recently, a significant decision has come to light: MeeGo will be based on the Btrfs file system by default.

Btrfs is seen as the long-term future of Linux filesystems, representing a much-needed clean break from the legacy filesystem designs we have been using for all these years. With the demise of reiser4 and the unavailability of ZFS, Btrfs would seem to be the only contender for that title. But talk about Btrfs is always framed in "it's not stable yet" terms, with few people willing to commit themselves to an actual date when the filesystem might be ready for production use. It is generally assumed that most cautious users will spend some years running on ext4 before making the jump to Btrfs. The 2.6.34 kernel will be released with this text still guarding the Btrfs configuration entry:
The MeeGo 1.0 release could happen as early as this month; given that, the above words might just seem a bit scary. In fact, they are more scary than they need to be: further on-disk format changes are not expected. The warning, it seems, will be scaled down for 2.6.35.
So why pick Btrfs for MeeGo? Arjan van de Ven described the decision this way:
He went on to describe a number of reasons why Btrfs makes sense for the MeeGo platform, starting with its data integrity features. The copy-on-write design which is at the core of Btrfs has a number of nice attributes, one of which is that users should never, ever see garbage data in files, even in a "pulled out the battery at the worst moment" situation. Device manufacturers, understandably, like that idea.
The on-disk compression feature is interesting for the MeeGo environment as well. It makes the initial system load take less space, making more available for the users of the device. But, as Arjan points out, manufacturers like it too: a smaller system image takes less time to shovel onto the storage device.
It would appear that there are a number of plans for the use of the Btrfs snapshot feature, starting with reversible package updates. With snapshots, a device can support a multi-user mode where each user appears to have the entire system to him- or herself. And the "reset to factory defaults" operation becomes a simple operation which does not require a separate recovery partition on the disk. Snapshots are not just for enterprise users anymore.
There are a number of other advantages, including small-file performance, built-in defragmentation (which is most useful for keeping boot time short), the storage management features, and more. In short, there's no doubt that Btrfs offers a useful set of features for any distribution; it's not hard to see why MeeGo wanted to use it. But that does leave an interesting open question: is Btrfs ready for inclusion into MeeGo, where it will, presumably, be installed onto systems intended for users who aren't looking to become development-stage filesystem testers?
Btrfs was initially merged for the 2.6.29 kernel; since then, patch activity looks like this:
So there is a steady rate of change to the filesystem, significant but not overwhelming. There is a wide range of contributors to this code, though the bulk of the work (by far) has been done by developers from Oracle and Red Hat. There are certainly people using Btrfs in normal use, and Fedora offers it as an experimental option. The mailing list shows a number of oops reports still, and it would appear that the famous ENOSPC issue (where the filesystem reacts poorly when the storage device overflows) is still not entirely solved. Significant feature patches - direct I/O support and RAID 4/5 support, for example - remain unmerged. In summary: Btrfs does not quite have that "it's done" look to it yet.
That said, it may well be getting close to ready for the sort of restricted and well-tested environment likely to be found in MeeGo deployments. Btrfs will also have stabilized further by the time devices actually start shipping with MeeGo - helped, no doubt, by the work of the MeeGo developers themselves. So, while this decision may appear to be ambitious now, it is not necessarily unreasonable. A dark-horse platform can only be helped by taking advantage of the best technology available to it.
Patches and updates
Kernel trees
Architecture-specific
Core kernel code
Development tools
Device drivers
Filesystems and block I/O
Memory management
Networking
Virtualization and containers
Benchmarks and bugs
Miscellaneous
Page editor: Jonathan Corbet
Distributions
News and Editorials
NLUUG: Minimizing downtime on servers using NanoBSD, ZFS, and jails
On May 6, NLUUG held its Spring Conference with the theme System Administration. There were a lot of talks about very specific tools or case studies, but one struck your author because it married conceptual simplicity with a useful goal: Minimizing service windows on servers using NanoBSD + ZFS + jails by Paul Schenkeveld. Over the last four years, Paul has searched for methods to upgrade applications on a server with minimal downtime. The system he implemented is now in production on various servers, requiring only a few seconds of downtime for an application upgrade, and the same for a rollback if the upgrade fails.
Jail time
System administrators have to keep their servers up-to-date, but when they upgrade the operating system or applications, the server may be unavailable to users for some time, varying from a few seconds to much longer. If the administrator is lucky, he can schedule the upgrade after office hours when nobody uses the server, but if that's not possible, there will be some noticeable downtime for users. In the latter case, it's extremely important that the downtime remains minimal, or the system administrator will face some angry shouting.
And that's where Murphy comes around the corner: upgrades are notorious for introducing unexpected problems. If an upgrade breaks things, the system administrator has to roll back to the previous state. But this introduces additional downtime, often longer than the upgrade downtime, because rollbacks are not as easy as upgrades.
To minimize the risk of upgrades, there has been a trend to use one (virtual) server per application. That way, a problematic upgrade impacts only one application at a time. As a FreeBSD user, Paul obviously chose to isolate applications in jails, a lightweight form of operating-system-level virtualization. Each jail can be upgraded separately, minimizing the risk that other applications on the same server have to be brought down to fix one application's failed upgrade.
ZFS snapshots to the rescue
The root filesystem of each jail in Paul's system is in fact a read-only snapshot of the filesystem of a template jail (Paul called it a "prototype jail"). This allows the system administrator to prepare upgrades offline: upgrade the operating system and applications (FreeBSD ports) in the template jail and create a new snapshot of its filesystem (with the zfs snapshot command). Upgrading a production jail then involves stopping the jail, changing its root filesystem to the new snapshot, and restarting it. This upgrade takes only a few seconds because it has been prepared offline. Using snapshots rather than multiple full root filesystem images saves space, because the filesystems of all the jails are mostly identical.
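The upgrade steps above can be sketched with ZFS and FreeBSD's jail rc script. The pool and jail names (tank, www) and the snapshot name are made up, and a writable clone is used here where the talk describes mounting the snapshot read-only; with DRYRUN=1, the default, the script only prints its plan:

```shell
#!/bin/sh
# Hypothetical sketch of the jail-upgrade flow; nothing here matches a
# real system's layout, and DRYRUN=1 keeps it to a printed plan.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# Snapshot taken after the offline upgrade of the template jail.
SNAP=tank/template@2010-05-06
run zfs snapshot "$SNAP"

# The few seconds of downtime: stop the jail, swap its root for a
# clone of the new snapshot, and start it again.
run /etc/rc.d/jail stop www
run zfs rename tank/www tank/www-previous
run zfs clone "$SNAP" tank/www
run /etc/rc.d/jail start www

# Rollback is the same dance in reverse, using tank/www-previous.
```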
When unexpected problems crop up, rolling back is as easy as stopping the jail, changing its root filesystem to the previous snapshot and then restarting it. The system administrator can then investigate and fix the problem offline, without any unneeded downtime.
Obviously, a read-only filesystem is not enough: the applications that run inside the jail need to be able to write their data. Therefore, the directory tree for a jail is set up as a combination of the read-only snapshot of the template jail's root filesystem with some read-write filesystems for the applications, such as /var and /home.
ZFS, which has been deemed production-ready since FreeBSD 8, easily handles many filesystems, has a flexible quota system, and offers fast snapshots (thanks to the copy-on-write nature of the filesystem). It is therefore a perfect match for building a robust upgrade and rollback scheme, something that has also been done by Nexenta.
Inspired by embedded systems
Combining jails and ZFS is straightforward, but Paul went further and thought about the underlying operating system. He took his inspiration from the way embedded systems are operated, which is why he looked at NanoBSD. This is a toolkit that comes with the FreeBSD base system and creates a FreeBSD system image for embedded applications that is suitable for use on a Compact Flash card.
NanoBSD is an interesting server choice for a number of reasons. First, it has the same functionality as FreeBSD, unless specific features are explicitly removed from the NanoBSD image when it is created. Moreover, every application that exists as a FreeBSD port or package can be installed and used in NanoBSD. The main differences are that the complete operating system is built and upgraded offline and that the root filesystem is mounted read-only.
The drive where NanoBSD is installed is divided into three partitions by default: two image partitions and one configuration partition. All of them are mounted read-only. The /etc and /var directories are memory disks (i.e. RAM disks). After the system boot, the configuration partition is briefly mounted read-only under the /cfg directory and all files in it are copied to /etc. If the system administrator wants to make persistent changes to a file (say, /etc/resolv.conf), they have to mount the configuration partition under /cfg, copy the modified files from /etc to /cfg and unmount the configuration partition. The whole system is set up in this way to minimize the number of writes to the flash drive and prolong its life. And with this completely read-only system, there is no necessity to run fsck after a non-graceful shutdown, so the system reboots with minimal downtime.
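The persistence dance for a changed file might look like this minimal sketch; with DRYRUN=1 (the default) it only prints the commands, since they assume a real NanoBSD image with its /cfg fstab entry:

```shell
#!/bin/sh
# Making a change to /etc/resolv.conf survive a NanoBSD reboot, as
# described above.  On a real NanoBSD system this would run as root.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi; }

run mount /cfg                  # config partition, normally unmounted
run cp /etc/resolv.conf /cfg/   # /etc itself is only a RAM disk
run umount /cfg                 # keep writes to the flash to a minimum
```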
The update process of NanoBSD is also well thought-out. While the running NanoBSD is installed on one of the image partitions, a newly built NanoBSD image is written to the other image partition. Then the system is rebooted and started from the newly installed partition. If anything goes wrong, the system can be rebooted back into the previous partition, which still contains the old, working image. The system administrator can then investigate and fix the problem offline.
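One possible shape of that update, sketched under the assumption of a CF card on ad0 with the inactive image on slice 2 (the NanoBSD documentation provides ready-made update scripts for exactly this); DRYRUN=1, the default, keeps it to a printed plan:

```shell
#!/bin/sh
# Hypothetical dual-image update; device and slice names (ad0, s2) are
# placeholders, and the image path is invented.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# Write the freshly built image to the inactive slice ...
run dd if=/tmp/nanobsd.img of=/dev/ad0s2 bs=64k

# ... mark that slice active for the next boot, and reboot into it.
# If the new image fails, "boot0cfg -s 1 ad0" brings back the old one.
run boot0cfg -s 2 ad0
run reboot
```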
Tying it all together
Combining these three technologies (NanoBSD, ZFS, and jails), Paul reached his goal of setting up a FreeBSD server that can be upgraded with minimal downtime. All user-visible applications run in jails. Underneath the jails runs a minimal FreeBSD operating system, built using the NanoBSD script. It holds the kernel, some low-level services, and the tools for building a new system image for upgrading the operating system. The NanoBSD system image can be put on a partition of a regular disk drive, but Paul prefers to put it on a separate flash drive: NanoBSD is specifically designed for that, and using a separate drive for the operating system makes things easier for the system administrator if the hard drives holding the jails fail.
System administration on this system is much the same as on a regular FreeBSD system, except for software maintenance. Of course the embedded roots of NanoBSD mean that the system administrator needs to be aware of the differences from a regular server operating system. The volatile nature of /etc is one example: it's easy to forget to copy all changed configuration files to /cfg to preserve the changes after a reboot.
The directory /var is also a memory disk, so by default NanoBSD doesn't keep log files, which is not helpful to the system administrator. One solution is to put /var on a hard disk instead of using a memory disk, but then the operating system depends on the hard disk, which Paul wanted to avoid. Therefore, he chose another solution: telling syslog to log to a syslog daemon on another host or to a syslog daemon inside a jail on the same system.
A general architecture
Each system administrator has their own way of configuring servers. At the beginning of his talk, Paul warned that he didn't mean to provide the best solution for every situation; he just wanted to describe the way he builds and maintains FreeBSD servers. Although his talk was specifically about the combination of NanoBSD, ZFS, and jails, the architecture he described is general enough to be usable elsewhere. The same ideas can be implemented with other minimal operating systems, other filesystems with snapshot abilities, and other forms of operating-system-level virtualization.
While FreeBSD jails are probably the most well-known type of operating-system-level virtualization, other operating systems have it too. OpenSolaris for instance has Zones, which are even more flexible than FreeBSD jails. Linux has similar solutions, such as OpenVZ and LXC (Linux containers). In particular, OpenVZ (and its proprietary variant Virtuozzo Containers) is popular among providers of virtual private servers. So on the virtualization level, the same architecture that Paul uses is perfectly possible on Linux.
The second important component in Paul's scheme is filesystem snapshots. Although ZFS is not available for Linux (at least not in kernel space), there are many other snapshot technologies. LVM (Logical Volume Manager), for example, has a snapshot facility, as does Btrfs (which we looked at a while back). So, with OpenVZ and LVM for example, Linux should be perfectly capable of creating OpenVZ containers based on read-only snapshots of a template container. Proxmox already makes use of LVM snapshots to create a backup of a container without downtime.
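A hypothetical Linux translation of the same idea could look like this; the volume group, container ID, and paths are invented, and with DRYRUN=1 (the default) the commands are only printed:

```shell
#!/bin/sh
# Sketch: an OpenVZ container whose root comes from a read-only LVM
# snapshot of a template volume.  Names are placeholders throughout.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# Read-only snapshot of the template's logical volume.
run lvcreate --snapshot --permission r --size 1G --name www-root /dev/vg0/template

# Mount it as the private area of container 101 and start the container
# (writable areas such as /var would be grafted in separately).
run mount -o ro /dev/vg0/www-root /var/lib/vz/private/101
run vzctl start 101
```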
The last step Paul took is maybe the most interesting one: at a time when operating systems, even for servers, are getting more and more bloated, it's refreshing to see him take some inspiration from the embedded world. While at first the limitations of an embedded operating system on read-only storage seem too cumbersome, that approach actually helps a lot in clearly separating the operating system from its configuration and applications, which can only be good. Moreover, the dual-image approach and fast reboots make upgrades of the host operating system more robust and less intrusive.
Because Linux shines in the embedded world, it's not very far-fetched to try this approach in the Linux world too. When leaving Paul's talk, your author heard a couple of people in the audience thinking out loud that it would be interesting to have such a system using Linux. A quick search didn't turn up anything useful, but it's clear that it should be possible, most likely with LVM snapshots and OpenVZ containers.
New Releases
Gilmore: Fedora 12 SPARC beta
Dennis Gilmore has announced the availability of Fedora 12 SPARC beta. "I have pushed a fedora 12 beta sparc tree the master mirror, you can find one close to you Here under /releases/test/12-Beta/sparc/ the tree is pretty much complete. there are a few broken deps that need resolving. Partitioning is fragile. it mostly works however sometimes you will be best off to do manual partitioning in rescue mode or breaking out into a shell."
Linux Mint LXDE Debian 5.0 PPC is a fact
Jeroen Diederen has announced that an installer for Linux Mint LXDE based on Debian Lenny is available. "After hours and hours of hard work I can now all tell you with a lot of pride that a first version of an installer for Linux Mint LXDE based on Debian Lenny is a fact. This means that this is the first official installer of Linux Mint for PowerPC. For the moment only a 32-bits version is available as I don't have a 64-bits machine to test. The advantage of LXDE over GNOME, KDE or XFCE is that it is much snappier on old G3/G4 machines. The Mint themes make the desktop look a lot sexier than standard Debian. Linux Mint LXDE PPC comes with a huge amount of preinstalled programs, aimed at the desktop user."
Lavergne: Lubuntu 10.04
Julien Lavergne has announced the release of Lubuntu 10.04. "Lubuntu is a project to make a Ubuntu variant using the LXDE desktop. It's designed to be a lightweight and easy-to-use desktop environment. Lubuntu is currently not part of the Ubuntu family, and not build with the current Ubuntu infrastructure. This release is considered as a "stable beta", a result that could be a final and stable release if it was included in the Ubuntu family. Please note also that Lubuntu 10.04 is not a LTS version."
Mandriva Directory Server 2.4.0 available
Mandriva Directory Server 2.4.0 is available for download. Click below for a look at the new features.
Slackware 13.1 BETA1
Slackware has announced Slackware 13.1 BETA1, as seen in the May 6, 2010 entry in the slackware-current changelog. "Hi folks! We have some pretty big changes today, with an update to the latest KDE SC 4.4.3, and the addition of support for ConsoleKit and PolicyKit which have been enhanced to use shadow authentication. Thanks to Andrew Psaltis for doing some great work on polkit-1, and to Robby Workman for spending months following the sometimes random developments coming from the CK/PK camp. :-) Thanks to Eric Hameleers for leading the KDE 4.4.x Slackware development and handling the out-of-tree testing through http://alien.slackbook.org/blog/. And with that, we're calling this Slackware 13.1 BETA1. A stable release should be just around the corner..."
Distribution News
Debian GNU/Linux
Release update: transitions status and freeze, RC-bugs
The Debian release team has an update on the "squeeze" release. "Due to the rate of change in unstable, it's not easy at the moment to accurately estimate when we might be able to freeze. In order to help us keep a clearer picture of which changes still need to occur before we can freeze, we will be introducing a "transition freeze" before the end of this month."
Bits from the NM process: advocacy messages
Enrico Zini presents a few bits about the New Maintainer process. "after some recent debian-newmaint and IRC conversation, it makes sense to post a quick note about advocacy messages."
New DMUP version 1.1.2
The Debian System Administration Team has announced an updated version of the Debian Machine Usage Policies (DMUP). "Please note that this version only fixes the most important issues, such as cleaning up DAM vs. DSA responsibilities. While we are working on a new and improved DMUP that will hopefully fix more warts and will concentrate on the really significant issues, this is a work in progress which might take a while yet."
porter chroot updates
Peter Palfrader has announced some new volunteers to help DSA in maintaining the development chroots on the porting machines. Click below for their names and some guidelines for install requests.
snapshot.debian.org implications for developers
Debian keeps a copy of all packages (sources + binaries) and accompanying metadata that get uploaded into any of our archives on snapshot.debian.org. "Short version: If you uploaded stuff to debian that is not redistributable you will have to let the snapshot people know to remove it."
Fedora
Fedora 13 release slips
The traditional slip of the Fedora release date has happened right on schedule. Due to a few remaining blocker bugs, Fedora 13 is now scheduled to be released on May 25.
Fedora election update
The Fedora Project is holding elections for 3 seats on the advisory board and 5 seats on the engineering steering committee. The F14 elections questionnaire is available, with questions from the community and answers from the candidates.
Making 3D Free for Innovation: Fedora 13 Graphics Drivers (Red Hat News)
Red Hat News continues its blog series highlighting features slated for Fedora 13 with a look at video drivers. "Following on the capabilities for the drivers for Intel and ATI based graphics cards, in Fedora 13, the Nouveau driver provides 3D hardware acceleration and is designed to support a wide array of NVidia based graphics cards. Because these drivers are fully free (as in freedom), open source software developers can build additional software against the functions they provide, taking fuller advantage of the hardware that users have purchased."
Fedora names recipient of 2010 Fedora Scholarship
The Fedora Project has announced that Ian Weller is the recipient of the 2010 Fedora Scholarship. "Now in its third year, the Fedora Scholarship recognizes college and university-bound students across the globe for their contributions to free software and the Fedora Project. Weller has contributed to Fedora for more than two years as a packager and designer, and has also played an integral role in Fedora's project-wide transition to a new wiki system in 2009."
Fedora 14 release name
The Fedora 14 release name is ... Laughlin. From the wiki page: "Robert Goddard was a professor of physics, and so is Robert Laughlin. He was awarded the 1998 Nobel Prize in physics for his explanation of the fractional quantum Hall effect. Moreover he argues for emergence which is a concept that says "The whole is more than the sum of its parts". Fedora is more than the sum of its software."
SUSE Linux and openSUSE
Wafaa: Getting openSUSE from A to Y
Andrew Wafaa has put out a call to action on accessibility for openSUSE on his blog. "So as a community do you think this is something we could get behind? I would love to see openSUSE 12.0 released as the most accessible distribution; Ubuntu currently makes the statement that they are the most accessible desktop system available and this is a statement I would love to show as false! Not through animosity but through sheer prowess of the Geeko! To do that we (yes that means you at the back, listening to your boom boom music and chatting to your friends on FaceSpace) have to roll our sleeves up and get educated and start educating!"
Ubuntu family
Shuttleworth: Unity, and Ubuntu Light
Mark Shuttleworth takes a look at an Ubuntu experience mainly designed for dual-boot netbooks. "A few months ago we took on the challenge of building a version of Ubuntu for the dual-boot, instant-on market. We wanted to be surfing the web in under 10 seconds, and give people a fantastic web experience. We also wanted it to be possible to upgrade from that limited usage model to a full desktop. The fruit of that R&D is both a new desktop experience codebase, called Unity, and a range of Light versions of Ubuntu, both netbook and desktop, that are optimised for dual-boot scenarios."
Shuttleworth: No default GNOME Shell in Ubuntu 10.10 (The H)
The H reports on Mark Shuttleworth's question and answer session that was part of Ubuntu Open Week. "Because he wants a singular user experience, Shuttleworth is also not considering the proposal to make application defaults configurable at installation. 'One of the really strong values we have is that two users of Ubuntu should, by default, either be having the same experience, or be expert enough to understand why they are not'. He sees benefits in the common user experience allowing users to help each other more easily; 'They can help each other, just talking about "the browser" ... 'for the beginning, out of the box experience, we benefit a lot from keeping it tight'."
Amber Graner: Call For Nominations for the Elected Ubuntu Women Project
Last January LWN covered the Ubuntu Women Project. At that time the project had asked the Ubuntu Community Council to appoint an interim leader and Amber Graner became the transitional project leader. Now Amber reports that the project has opened nominations for a new Ubuntu Women Project Leadership Committee. Nominations will be open until May 21, 2010 for a three-person leadership committee.
Maverick is open for development
Martin Pitt has announced that Maverick Meerkat (Ubuntu 10.10) is now open for development.
Other distributions
CentOS 3 6-Month End Of Life Notice
The CentOS project has announced that CentOS 3 will reach its end-of-life in six months. "It is recommended that any system still running CentOS 3 should be upgraded to a more recent version of CentOS before this date to ensure continued security and bug fix support."
New Distributions
Quirky
Quirky is a "puplet", a distribution based on Puppy Linux using the Woof build system. It was created by Puppy founder Barry Kauler as an outlet for some of his quirkier ideas. The recently released Quirky 1.0 "is quite straight, not very "quirky". Some of the interesting ideas that I want to try are still to come. The focus for now is to test a lot of the new stuff in Woof, such as rerwin's analog and 3G modem detection/setup scripts. Also, I have attempted to build a "multimedia special" that can play just about anything, despite the approx. 100MB size of the live-CD."
Distribution Newsletters
DistroWatch Weekly, Issue 353
The DistroWatch Weekly for May 10, 2010 is out. "Mandriva Linux, a distribution that was one of the first to understand the concept of user-friendliness on the desktop, is apparently for sale and in negotiations with two potential buyers. That's according to some unconfirmed reports that appeared on the Internet over the weekend. But the company itself remains mute on the issue, while the development of the upcoming version 2010.1 continues unabated. In other news, Red Hat explains the genealogy of its enterprise kernels, Debian and Slackware update KDE to version 4.4.3 in their respective development branches, Sabayon announces availability of daily, bleeding-edge DVD builds, and Astaro apologises for last week's updates that went terribly wrong. Also in this issue, The Economist magazine explains the reasons for setting up a Launchpad account, while The Times urges users to abandon Windows and to switch to Ubuntu. Finally, for the fans of lighter distributions we have a first-look review of CDlinux, Canonical's announcement about a new "Unity" desktop for netbooks, news about a special edition of Unity Linux with Enlightenment, and an introduction to an inaugural release of Quirky, a new mini-distribution from the developers of Puppy Linux. There is something for everyone - happy reading!"
Fedora Weekly News 224
The Fedora Weekly News for May 5, 2010 is out. "This issue begins with Project announcements, including another Fedora Community Gaming session this weekend, details on an upcoming Fedora Board IRC meeting, and opening of voting for naming Fedora 14. In news from the Fedora Planet, thoughts on a Fedora Kiosk spin, a report on Sugar-on-a-Stick testing, and what you don't know about NetworkManager. In Marketing team news, an update on the Allegheny Team and their accomplishments, report on some work for the F13 spins website, and an upcoming blog posting on hardware enablements in F13. One post from Fedora In the News this week, from Linux Gazette, covering some of the exciting aspects of Fedora 13 beta. In Design Team news, a new owner for Hackergotchis request service, and Security Advisories rounds out this issue with security-related Fedora packages released this past week for F11, F12 and F13. Read on!"
openSUSE Weekly News #122
The openSUSE Weekly News for May 8, 2010 is out. "Now the eighteenth Week goes to the End, and we are pleased to announce our new issue. In this week we are busy with the Milestone 6. After that we're interested in the last Milestone (7). We would like to invite all of our readers to test the new Milestones. And please file all found Bugs into our buzilla. So we're hoping, that you like the new Weekly News. We wish you many joy by reading it..."
Ubuntu Weekly Newsletter #192
The Ubuntu Weekly Newsletter for May 8, 2010 is out. "In this issue we cover: Maverick is open for development, Call for Ubuntu User Days Instructors, Window indicators, New Ubuntu Regional Membership Boards, Maverick UDS Translations Sessions, Patch Day Success, Ubuntu Open Week en Español closes on high note, Ubuntu Open Week - Lucid: Community, Canonical, Collaboration, Call For Nominations: Ubuntu Women Leader Leadership Committee, Ubuntu Server and Apache Tomcat - supporting MuleSoft, Full Circle Podcast #6: Mark's Space Brain from the Future, and much, much more!"
Distribution reviews
Peppermint OS: a review (Linux Critic)
The Linux Critic reviews Peppermint OS. "What do you get when you combine the flexibility, versatility and ease of maintenance of Ubuntu, the blinding speed and simplicity of LXDE, and a focus on social media and the cloud? You get Peppermint OS, that's what! Brought to you by the same developer responsible for Linux Mint 8 LXDE Community Edition, and for resurrecting Linux Mint Fluxbox CE as well, Peppermint OS is a lightweight, fast, stable implementation of what Kendall Weaver's vision of the perfect Linux distro might be for speed and the web."
RHEL 6 - your sensible but lovable friend (Channel Register)
The Register has a review of Red Hat Enterprise Linux 6. "For RHEL 6, Red Hat is using a Fedora development release based on the Linux 2.6.32 kernel - technically, it's a hybrid of several recent kernels. Red Hat engineers have hardened the Fedora base and added quite a few features - with a strong emphasis on virtualization."
Review: Ubuntu Enterprise Cloud (ComputerWorld)
ComputerWorld takes a look at creating a private cloud using Ubuntu Linux 10.04. "In keeping with its open source pedigree, Ubuntu Enterprise Cloud is integrated with the open source Eucalyptus private cloud platform, making it possible to create a private cloud with much less configuration than installing Linux first, then Eucalyptus. And for those thinking about eventually moving resources to the public cloud, or simply bursting to the public cloud when workloads spike, the Ubuntu/Eucalyptus internal cloud offering is designed to be compatible with Amazon's EC2 public cloud service."
Ubuntu 10.04 LTS Server Is Getting There (eWeek)
Joe "Zonker" Brockmeier reviews Ubuntu 10.04 LTS Server. "Canonical and the Ubuntu Project have done great things to help bring Linux to the mainstream desktop. But what about the server edition? If Ubuntu can bring the same level of polish to its server offerings, it should be a formidable competitor to Microsoft and other Linux vendors. Looking at Ubuntu Server 10.04, aka "Lucid Lynx," there's a lot to like and also some disappointments."
Lucid dream: Ars reviews Ubuntu 10.04 (ars technica)
Ryan Paul has written a lengthy review (split into nine pages) of Ubuntu 10.04. "Although Ubuntu still has a long road ahead before it will fulfill the aspirations of its creators, version 10.04 includes a number of impressive improvements that artfully advance the platform. The new theme and more cohesive branding contribute to a more compelling visual appearance, tighter integration with the cloud expands the boundaries of the desktop, and usability improvements enrich the user experience."
Page editor: Rebecca Sobol
Development
One-stop performance analysis using atop
Linux system administrators often receive complaints about the performance of their systems. It can be rather difficult to track down these problems and to find why, when, and how often they happen. Being able to zoom in on the processes that are responsible, and to see what has happened in the past, is very valuable. The atop utility was written with just these things in mind.
Performance analysis tools
Linux has a rich set of tools for performance analysis, but each has its own capabilities and limitations. In developing atop, the following were considered desirable features for the tool:
- The tool should obviously be able to show the current situation. However, many resource problems don't occur "now". Often complaints will come in about the system performance "last night" or "last week". Therefore the tool must be able to look into the past. Being able to look into the future would be a "nice to have" but was deemed too difficult to implement.
- It should show the load of the four main resources on a system level: CPU, memory, disk I/O, and network usage.
- The four main resources are consumed by or on behalf of processes, so the tool should be able to show which processes (over)load the four resources.
- A monitoring tool takes snapshots of the system, using a certain interval. If a process used resources since the last snapshot but has exited before the current snapshot, the tool should still be able to show which processes loaded which resources. In other words: the sum of resource usage by the processes should be equal to the system-wide reported resource usage.
Looking at this list of requirements, none of the existing standard analysis tools fits the bill. sar shows extensive data regarding CPU, memory, disk, and network usage from the past and the present. However, it cannot "zoom in" on processes: it only shows resource usage at the system level. vmstat and iostat can only show CPU, memory, and disk usage at the system level, and cannot show usage data from the past. Finally, top, one of the most widely used performance monitors, does show CPU and memory usage at both the system level and the process level. However, it only shows the current situation; it cannot show usage data from the past. It also does not show the resource consumption of exited processes, so with top it is possible that, at the system level, the CPU is shown as 90% busy while the sum of all CPU consumption at the process level is only 40% (the other 50% might have been used by processes that exited between the previous and the current snapshot).
This chart compares the characteristics of these other analysis tools with atop:
atop is free software and can be downloaded from its web site, though many Linux distributions include atop in their repositories. Installing atop also provides the atopsar command, which is comparable to sar but reads the same log files that atop generates and uses.
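Looking at "last night" might then go something like this. The log path follows atop's usual atop_YYYYMMDD naming, the date is made up, and with DRYRUN=1 (the default) the commands are only printed, in case atop is not installed or the log does not exist:

```shell
#!/bin/sh
# Sketch of browsing atop's logged snapshots after the fact.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# Browse the raw log interactively, starting at 23:00 ...
run atop -r /var/log/atop/atop_20100512 -b 23:00

# ... or pull a sar-style CPU report from the same log file.
run atopsar -r /var/log/atop/atop_20100512 -c
```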
Characteristics of atop
atop was created mainly because the other tools don't report about processes that exit between snapshots. When using "process accounting", the kernel writes a record to a log file for every process that exits. atop will use these records to make a process activity list that is as complete as possible, including processes that exited since the last snapshot.
atop shows the load of the CPUs, memory, disks, and network on a system level. Apart from the network, atop also shows which processes consume these resources (for network utilization per process, a kernel patch is provided). By default, atop shows generic information about processes (like PID, name, CPU utilization, memory utilization, disk utilization, and status). However, more information about the process's memory usage, disk I/O, and scheduling characteristics is available by using single-character keystrokes (for example, s for scheduling characteristics).
Users can always override the default sorting order that atop uses. For example, for more information about processes' memory usage, the M subcommand sorts the processes in descending order of their resident memory usage. But these processes can also be sorted on their disk I/O usage with the D subcommand. Typing A lets atop determine the most sensible sorting order given the most heavily used resource at the moment. In the system overview (the top half of the screen), a line is highlighted if that particular resource is overloaded.
Obviously, not all data about all resources can be shown on the screen at once. Therefore, if the window is resized, atop will automatically show more (or less) data depending on the room available. Configurable priorities are used to determine what data is no longer shown if there is too little space.
Using atop on a system level
The default screen of atop looks like this:
On a system level, not only CPU and memory statistics are visible, but also disk and network usage data. In the example above, the line with label CPU shows a total of (27+61+25+214+73)=400% CPU capacity, so there are 4 CPUs in this system. The lines labeled cpu show the individual CPUs (each rated at 100%). The CPUs are listed in order of busyness. A CPU is considered busy when it is in system mode, in user mode, or handling interrupts; it is considered idle when running the idle loop or waiting for I/O (wait). Therefore, in this case the sort order for the CPUs is 3, 1, 2, 0, as shown in the last column, just in front of the wait percentage "w".
The header line shows that ten seconds have elapsed since the previous snapshot. During this time, the four CPUs provided a total of 40 seconds of computing capacity. The line with label PRC shows the sum of the CPU time used by processes: 5.20 seconds in system mode (corresponding to the 27% sys plus 25% irq in the line labeled CPU) and 6.20 seconds in user mode (corresponding to the 61% user in the same line).
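The bookkeeping above can be checked with a little shell arithmetic; the percentages and the ten-second interval are copied from the example screen:

```shell
# Per-CPU percentages from the example CPU line (sys, user, irq, idle, wait)
sys=27; user=61; irq=25; idle=214; wait=73
total=$((sys + user + irq + idle + wait))   # total CPU capacity in percent
cpus=$((total / 100))                       # each CPU is rated at 100%
interval=10                                 # seconds between snapshots
capacity=$((cpus * interval))               # compute seconds available
echo "total=${total}%, cpus=${cpus}, capacity=${capacity}s"
# prints: total=400%, cpus=4, capacity=40s
```

With 40 seconds of capacity, the 5.20 seconds of system time reported by PRC works out to 52% of one CPU, matching the 27% sys plus 25% irq on the CPU line.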
A line labeled DSK gives information about a physical disk that has been active in the past interval. It shows the name of the disk, the I/O busy percentage, the number of read and write requests, and the average service time per request (avio). By making the screen wider, more data is shown: the disk bandwidth for reads and writes (in MiB/s) and the average queue length for that disk.
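The counters behind the DSK line come from the kernel, not from atop itself; on any Linux system the raw per-disk data can be inspected directly (a quick look, assuming a readable /proc):

```shell
# Raw per-disk counters that tools like atop sample and turn into rates.
# Each line lists major, minor, device name, then read/write counters
# (reads completed, sectors read, time reading, writes completed, ...);
# the format is documented in the kernel's iostats documentation.
head -n 5 /proc/diskstats
```

atop samples these counters at each snapshot and reports the differences as busy percentages, request counts, and average service times.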
If the system uses LVM or MD software RAID volumes, the same information is shown for each active logical volume or MD volume. In the figure below, it is clear that several writes (111 plus 1) to logical volumes may be combined into fewer writes to the physical disk (54). The combined transfers are larger and therefore have a higher average service time per transfer.
In the same way, data for memory availability, usage, paging, and page scanning is shown. The last lines in the system overview show network related data, per interface, on IP-layer level and on transport level.
Using atop on a process level
It is useful to be able to see how busy the system is, but if a system is too busy, the tool needs to be able to zoom in to find the culprit. This is where atop shines: it tries to make sure the books balance, while other tools do not take processes that exited into account. For example, with top it is possible that, at the system level, the CPUs are 99% busy even though top shows only two active processes that together have used only 5% of the CPU in the sample period. One notorious example of this is a kernel compilation: lots of short-lived processes eat up all of the CPU, but only a few of them show up in top's output.
atop uses process accounting to take into account processes that have exited. In the first full screenshot shown at system level, we can see bzip2 having used 61% of the CPU time. atop lists it in angle brackets, to show that the process has exited. In addition, the exit code (column EXC) is shown. It is interesting to see if a process is eating up CPU power in system mode (system call and interrupt handling) or in user mode. Fortunately, atop shows you both, per process.
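The per-process split between user and system mode comes from the kernel's per-task counters, which can be read straight from /proc; this is a sketch of where the numbers originate, not atop's actual code:

```shell
# Fields 14 and 15 of /proc/<pid>/stat are utime and stime, in clock
# ticks (see proc(5)); divide by CLK_TCK to convert to seconds.
# (Field numbering assumes the command name contains no spaces.)
pid=$$
utime=$(awk '{print $14}' "/proc/$pid/stat")
stime=$(awk '{print $15}' "/proc/$pid/stat")
hz=$(getconf CLK_TCK)
echo "pid $pid: ${utime} user ticks, ${stime} system ticks (HZ=${hz})"
```

atop samples these counters each interval and, via process accounting, also collects the final totals for processes that exited during the interval.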
The Linux process scheduler determines which process gets the CPU. Scheduling information can be seen by using the s subcommand:
One can see how many threads are in the "running" state (TRUN), "sleeping interruptible" (TSLPI), or "sleeping uninterruptible" (TSLPU). The scheduling policy (normal, round-robin realtime, fifo realtime, etc.) is also shown.
CPU is not the only scarce resource in the system. Therefore, atop can also show per-process usage statistics for memory, disk bandwidth, and network bandwidth. Zooming in on disk statistics (using the d subcommand), atop shows the following:
Recent kernels are often configured with the option "Per-task storage I/O accounting", so the kernel keeps track of how much data is passed by the write and read system calls related to disk I/O (WRDSK and RDDSK respectively). However, not all write system calls will lead to physical writes to disk. For example, if a region of a file is written and then overwritten before the data has been flushed from the page cache to disk, the first writes are shown by atop even though they were never written to disk. In this case, the column WCANCL shows the amount of data whose physical write was canceled. In the example above, the actions of tar canceled 172KiB worth of writes.
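When the kernel is built with this I/O accounting option, the per-task counters behind RDDSK, WRDSK, and WCANCL are exported in /proc/&lt;pid&gt;/io; a quick way to see the raw data atop works from:

```shell
# read_bytes/write_bytes count actual disk traffic (as opposed to
# rchar/wchar, which count all read/write system call traffic);
# cancelled_write_bytes is the source of atop's WCANCL column.
cat "/proc/$$/io"
```

On kernels without this option, some of these fields are absent and atop cannot fill in the corresponding columns.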
Extra information with patches
The kernel does not register network bandwidth usage per process. Patches are available that make the kernel keep track of network usage per process. After receiving the n subcommand, atop will show network related data per process:
The TCPRCV/TCPSND and UDPRCV/UDPSND columns show the number of packets received and sent per process by these transport layers. The RAWRCV and RAWSND columns show the number of "raw" packets received and sent. These are packets that go directly from the application to the IP layer, without passing through TCP or UDP. For example, the ping program sends ICMP ECHO REQUEST packets directly through the ICMP layer to receive ICMP ECHO REPLY packets.
The TCPSASZ column shows the average send transfer size. If the screen is wide enough, the average receive transfer size is also shown, both for TCP and for UDP.
Unfortunately, these patches are not part of the mainline kernel. In 2008 an attempt was made to merge them, but the modifications conflicted with other new features (like cgroups) that were under development at the time.
Back to the future
atop is useful as a tool for the here and now. But what if the system was slow in the past? The normal installation of the atop package starts an atop daemon nightly. This daemon takes snapshots and writes them to a log file (/var/log/atop/atop_YYYYMMDD). The default snapshot interval for a logging atop is 10 minutes, but obviously this is configurable. Every logfile is preserved for a month (also configurable), so performance events a full month back can still be observed.
The log file can be viewed using atop -r log_filename. The subcommand t forwards to the next sample in the atop logfile (i.e. 10 minutes by default), subcommand T rewinds one sample. Subcommand b branches to a specific time in the current logfile. All other subcommands to zoom in on specific resources also work. The logfile that the atop daemon creates can also be viewed using a sar-like interface using the command atopsar.
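A sketch of working with those daily logs; the path follows the packaged default named in the text, and the replay commands are shown as comments because atop runs fullscreen:

```shell
# Today's logfile, as written by the atop daemon (YYYYMMDD suffix):
log="/var/log/atop/atop_$(date +%Y%m%d)"
echo "$log"
# Replay it interactively (not suitable for scripts):
#   atop -r "$log"      't' = next sample, 'T' = previous, 'b' = jump to time
#   atopsar -r "$log"   same data rendered as sar-style text reports
```

The zoom subcommands (m, d, n, and so on) work on a replayed logfile exactly as they do live, which makes after-the-fact analysis follow the same workflow.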
One-stop analysis
Performance analysis is a cyclic process of measuring, drawing conclusions, measuring in more detail, drawing more detailed conclusions, and so on until you can really pinpoint the problem. Performance problems do not only occur in the here and now, but also in the past, so a performance analysis tool should be able to give information about both situations. As described in this article, atop has been designed to be a complete one-stop tool to guide you at least through the first cycles (system and process level for the four most critical resources: CPU, memory, disk I/O, and network).
More information about atop can be found on the atop website, as well as in the atop man page [PDF]. There is also a case study [PDF] available that shows how atop can be used to analyze a problem with processes leaking memory.
Brief items
Emacs 23.2 released
The Emacs 23.2 release is out. There's a lot of new stuff, but, arguably, the most significant change is the incorporation of the Collection of Emacs Development Environment Tools (CEDET), which brings a project management utility, function and variable name completion, a code generation mechanism, and, naturally, a mode for manipulating UML diagrams within the editor. See the NEWS file for a full list of additions and changes.
Jato v0.1.0 released
The Jato v0.1.0 release is out. "Jato is an open source, JIT-only virtual machine for Java that aims to support the JVM specification version 2 including 1.5 amendments. It is not complete but you can run some Java applications with it on 32-bit x86 Linux machines. Jato uses Boehm GC as its garbage collector and GNU Classpath to provide essential Java run-time libraries."
The Koha "Harley" release
As noted by LWN reader "hackerb9," LibLime has announced the "Harley" release of the Koha library management system. "This release includes all functionality in Koha as of October 2009, plus many new features not yet available. Harley is compatible with both 3.0x and upcoming 3.2 versions. The code for the individual new features included in this version has also been made available for download from the GIT repository." This release is an important step toward the resolution of the disagreement between LibLime and the Koha development community we reported on last week.
LLVM gets its own C++ standard library
The LLVM libc++ project has announced its existence. "libc++ is an implementation of the C++ Standard Library, with a focus on standards compliance, highly efficient generated code, and with an aim to support C++'0x when the standard is ratified." The project is said to be "approximately 85% complete at this point".
MythTV 0.23 released
The MythTV 0.23 release is out. "MythTV 0.23 brings a new event system, brand new python bindings, the beta MythNetvision internet video plugin, new audio code and surround sound upmixer, several new themes (Arclight and Childish), newly resynced ffmpeg libraries, and fixes for analog scanning, among many others." More information can be found in the release notes.
py.test-1.3.0 released
Version 1.3.0 of py.test - an automated testing tool for Python2 and Python3 programs - has been released. There's a long list of new features; click below for the changelog.
Ryzom source released as free software
Back in 2008, LWN covered the story of Ryzom, a popular multiplayer role-playing game which was owned by a company in bankruptcy. An attempt to acquire and liberate the source at that time failed, but good things come (sometimes) to those who wait: Ryzom is now available under the AGPLv3 license. The artwork, too, is available under the Creative Commons Attribution-ShareAlike license. "By freeing Ryzom code, Winch Gate is transforming the MMORPG marketplace and is setting a precedent for how gaming software should evolve--in freedom. The source code released totals over two (2) million lines of source code and over 20,000 high quality textures and thousands of 3D objects."
Newsletters and articles
Development newsletters from the last week
- Caml Weekly News (May 11)
- PostgreSQL Weekly News (May 9)
Aknin: Guido's Python
Yaniv Aknin has launched into a series of articles intended to describe how Python works from the inside; thus far, the introductory and Objects 101 installments are available. "I want to see Python more like Guido does, so I'm starting out on what should develop to a series on Python's internals. On the curriculum is mainly CPython, mainly py3k, mainly bytecode evaluation (I'm not a big compilation fan) - but practically everything around executing Python and Python-like code (Unladen Swallow, Jython, Cython, etc) might turn out to be fair game in this series."
Firefox 4: fast, powerful, and empowering
Mike Beltzner, Mozilla's Director of Firefox, shares some early plans for Firefox 4. "The primary goals for Firefox 4 will be making a browser: * Fast: making Firefox super-duper fast * Powerful: enabling new open, standard Web technologies (HTML5 and beyond!), * Empowering: putting users in full control of their browser, data, and Web experience."
News In The Linux Audio World (Linux Journal)
Dave Phillips has a brief report "about a new Linux audio blog, music made by particle acceleration, how to use a laptop as a virtual music stand, synth emulation from the terminal command prompt, and watching the Linux Audio Conference on-line."
Hammond: Looking Back on Review Board
On his blog, Christian Hammond looks back at three-and-a-half years of development on the code review tool, Review Board. "Attempt #3. I decided to build our own diff parser and generator from scratch. What a project. I knew nothing about diff generation and hardly knew where to start. I spent probably a good month or so just trying to work on this new diff code, and was so close to giving up so many times. It ended up being completely worth it, though, as we ended up with a very nice, extensible diff parser. [...] Without that third attempt, we'd be in the stone age. Review Board would not be as nice to use. We wouldn't have inter-line diffs (where we highlight what changed in a replace line), syntax highlighting, move detection (coming in 1.5), or function/class headers (where we show which function/class the part of the diff is in — also coming in 1.5)."
Williams: Eat Burgers on the Short Bus
Dan Williams, developer of NetworkManager, looks at D-Bus on his blog. In particular, he has found himself periodically giving mini-lessons about D-Bus; he captures that in this explanation of D-Bus concepts along with a concrete example of the interprocess communication (IPC) mechanism. "service: a program that responds to requests from clients. Each service is identified by a 'bus name' which clients use to find the service and send requests to it. The bus name usually looks like org.foobar.Baz. A program can claim more than one bus name; NM claims org.freedesktop.NetworkManager and org.freedesktop.NetworkManagerSystemSettings, each is a unique service which provides different functionality to clients."
Page editor: Jonathan Corbet
Announcements
Non-Commercial announcements
FSF launches free software extension listing for OpenOffice.org
The Free Software Foundation (FSF) has announced a project to assemble a replacement extension library for OpenOffice.org, which will list only those extensions which are free software. "'OpenOffice.org is free software, and an important contribution to the free software community. However, the program offers the user a library of extensions, and some of them are proprietary. Distributing OpenOffice.org in the usual way has the effect of offering users the nonfree extensions too,' said FSF executive director Peter Brown."
OpenOffice.org's Community Council responds to the FSF
The OpenOffice.org community council has posted a response to the Free Software Foundation's alternative extension listing. "The OpenOffice.org Community Council has been asked by the FSF to give the FSF an effective veto over which extensions should be permitted to appear in this repository. The Community Council has felt unable to do this. We believe passionately that FOSS delivers better software - including extensions, but that users must be free to make the comparison and reach their own conclusion."
GNU Project launches accessibility initiative
The GNU project has announced the launch of a new initiative aimed at improving accessibility in free software; it will be headed up by Chris Hofstader, a co-founder of the League for Programming Freedom. "GNU Accessibility is a free software pan-disability initiative to create features that can be used by people with low vision, deafness, learning and reading disabilities, and for people with mobility and other physical issues who can use an on-screen keyboard. According to the United Nations in 2005, there were 600 million people with disabilities in the world -- an exceptionally large and disenfranchised group."
Commercial announcements
Mandriva looking for a buyer
Here's an article (in French) on the mandrivalinux-online news site stating that the company is having cash-flow problems and is looking for a buyer. Possibly interested companies are said to include LightApp and Linagora. A translation of sorts is available via Google.
Articles of interest
CodePlex Users get Application Analytics Data (Port 25)
Microsoft's Port 25 blog ("Communication from the Open Source Community at Microsoft") reports on an upgrade to the CodePlex open source project hosting site that allows projects to instrument their code to get runtime analytics. For those open source projects that aren't using Visual Studio 2010, it recommends something called "Dotfuscator", which is a proprietary .NET binary obfuscator, to inject the instrumentation. (VS2010 evidently already includes it). This is a rather different view of open source than we are used to. "The Runtime Intelligence Service lets developers inject usage instrumentation directly into application binaries. When the application is run by an end-user, the instrumentation will collect analytics data from the application, but no personally identifiable information is ever collected, and applications can include opt-out dialogues."
Legal Announcements
U.S. Lets Hollywood Disable Home TV Outputs to Prevent Piracy (Bloomberg)
Bloomberg reports on a recent decision [PDF] by the US Federal Communications Commission allowing broadcasters to disable "unprotected" output on set-top boxes for early-run movies. "The FCC order 'will allow the big firms for the first time to take control of a consumer's TV set or set-top box, blocking viewing of a TV program or motion picture,' Gigi Sohn, president of Washington-based Public Knowledge, said in a statement." Naturally, requirements for this type of antifeature also make it impossible to have truly free set-top boxes.
Contests and Awards
Android application contest for PostgreSQL
There is a contest going on now to create open source PostgreSQL applications for the Android OS. The winner will get a brand-new developer version (unlocked) Nexus One phone from Google. Entries are due by July 6, 2010.
Education and Certification
Workshop on Essential Abstractions in GCC, 2010
There will be a 4-day instructional workshop aimed at providing details of the internals of GCC (GNU Compiler Collection), July 5-8, 2010, in Bombay, India. "Who should attend this workshop? Anybody who has done at least a first level undergraduate course in compiler construction and has some experience of either working in compilers or teaching compilers. A sound understanding of the process of compilation is a must. Familiarity with Unix/Linux (particularly, the command line style of working) is absolutely necessary."
Education for an Open Web
The Mozilla Foundation and the Shuttleworth Foundation are jointly offering an Education for the Open Web Fellowship. The call for proposals is open until June 7, 2010. "We invite applications from individuals interested in developing innovative approaches that educate people how to promote the open web. The applicant should demonstrate practical ideas that will allow large numbers of people to learn about, improve and promote the open nature of the Internet. The fellowship should not be considered an academic fellowship aimed at research."
XtreemOS summer school
XtreemOS summer school will take place July 5-9, 2010 in Günzburg, Germany. "The XtreemOS Summer School will include lectures on modern distributed paradigms such as Grid computing, Cloud computing, and network-centric operating systems. The Summer School will combine lectures from research leaders shaping the future of distributed systems and world leaders in deploying and exploiting distributed infrastructures. Hands-on laboratory exercises and practical sessions using XtreemOS will give participants experience on using modern distributed systems."
Meeting Minutes
GNOME Foundation Meeting Minutes Published - April 29, 2010
Click below for the minutes of the April 29, 2010 meeting of the GNOME Foundation.
Calls for Presentations
Announcing the 3rd Free Culture Research Conference
The 3rd Free Culture Research Conference takes place October 8-9, 2010 in Berlin, Germany. "The Free Culture Research Conference presents a unique opportunity for scholars whose work contributes to the promotion, study or criticism of a Free Culture, to engage with a multidisciplinary group of academic peers and practitioners, identify the most important research opportunities and challenges, and attempt to chart the future of Free Culture." The call for papers is open until June 7, 2010.
Upcoming Events
Akademy 2010 Conference Program Available (KDE.News)
KDE.News has announced that the conference program for Akademy 2010 is available. Akademy 2010 takes place in Tampere, Finland, July 3-11, 2010. "Akademy is the annual conference of the KDE community and open to all who share an interest in what we do and have accomplished. This conference brings together artists, designers, programmers, translators, users, writers and other contributors to celebrate the achievements of the past year. Moreover, at Akademy visions for the next year are defined in mutual discussions between the participants. Of course we will also work on our technology, enjoy good company and entertainment and check out Tampere. The conference provides a great opportunity to meet likeminded people, to discuss and share ideas, learn about the latest desktop technology and be excited about what is coming."
Libre Graphics Meeting community fundraiser
The Libre Graphics Meeting (LGM), which will be held May 27-30 in Brussels, Belgium, is trying to raise $10,000 from the community to help defray the costs of the free conference as well as for travel assistance to attendees. Donations can be made here and so far LGM has gotten over $5,000 from the community. It is part of the "10by10by10" campaign to raise $10K each from grants, corporations, and the community. "Developers from GIMP, Inkscape, Blender, Krita, Scribus, Hugin, the Open Clipart Library, the Open Font Library, and other open source projects are scheduled to appear. Technical talks will showcase new work in digital asset management, natural-media simulation, and internationalized font design. The program will also emphasize real-world usage of open source graphics software in professional publishing houses, multimedia production, and both the secondary education and art school classroom." Click below for more information about LGM and the fundraising drive.
LinuxCon Keynotes Announced
The Linux Foundation has announced the keynotes for this year's LinuxCon, taking place in Boston, MA, August 10-12, 2010.
Events: May 20, 2010 to July 19, 2010
The following event listing is taken from the LWN.net Calendar.
| Date(s) | Event | Location |
|---|---|---|
| May 17-May 21 | Fourth African Conference on FOSS and the Digital Commons | Accra, Ghana |
| May 18-May 21 | PostgreSQL Conference for Users and Developers | Ottawa, Ontario, Canada |
| May 24-May 25 | Netbook Summit | San Francisco, CA, USA |
| May 24-May 30 | Plone Symposium East 2010 | State College, PA, USA |
| May 24-May 26 | DjangoCon Europe | Berlin, Germany |
| May 27-May 30 | Libre Graphics Meeting | Brussels, Belgium |
| June 1-June 4 | Open Source Bridge | Portland, Oregon, USA |
| June 3-June 4 | Athens IT Security Conference | Athens, Greece |
| June 7-June 10 | RailsConf 2010 | Baltimore, MD, USA |
| June 7-June 9 | German Perl Workshop 2010 | Schorndorf, Germany |
| June 9-June 12 | LinuxTag | Berlin, Germany |
| June 9-June 11 | PyCon Asia Pacific 2010 | Singapore, Singapore |
| June 10-June 11 | Mini-DebConf at LinuxTag 2010 | Berlin, Germany |
| June 12-June 13 | SouthEast Linux Fest | Spartanburg, SC, USA |
| June 15-June 16 | Middle East and Africa Open Source Software Technology Forum | Cairo, Egypt |
| June 19 | FOSSCon | Rochester, New York, USA |
| June 21-June 25 | Semantic Technology Conference 2010 | San Francisco, CA, USA |
| June 22-June 25 | Red Hat Summit | Boston, USA |
| June 23-June 24 | Open Source Data Center Conference 2010 | Nuremberg, Germany |
| June 26-June 27 | PyCon Australia | Sydney, Australia |
| June 28-July 3 | SciPy 2010 | Austin, TX, USA |
| July 1-July 4 | Linux Vacation / Eastern Europe | Grodno, Belarus |
| July 3-July 10 | Akademy | Tampere, Finland |
| July 6-July 9 | Euromicro Conference on Real-Time Systems | Brussels, Belgium |
| July 6-July 11 | 11th Libre Software Meeting / Rencontres Mondiales du Logiciel Libre | Bordeaux, France |
| July 9-July 11 | State Of The Map 2010 | Girona, Spain |
| July 12-July 16 | Ottawa Linux Symposium | Ottawa, Canada |
| July 15-July 17 | FUDCon | Santiago, Chile |
| July 17-July 24 | EuroPython 2010: The European Python Conference | Birmingham, United Kingdom |
| July 17-July 18 | Community Leadership Summit 2010 | Portland, OR, USA |
If your event does not appear here, please tell us about it.
Web sites
LinuxExchange.org
LinuxExchange.org is a new web forum for Linux and Open Source. It is built on the StackExchange platform, which allows for collaborative editing of questions and answers.
TodaysLinux launches
Rob Kennedy has returned to the Linux community with a new forum site called TodaysLinux. "Today's Linux is different than yesterday's.. the users span from those same power users to parents and kids. Most people could use Linux without any questions or issues. The auto dealership I work for has 6 Ubuntu desktop installations set up and there's all different types of people on them throughout the day not even aware that they're using Linux. That didn't happen though without people helping out - TodaysLinux.com is a place where you can help those that need it - and ask questions about things you don't know about yet."
Audio and Video programs
Video recordings from the 2010 Linux Audio Conference
One day after the completion of the 2010 Linux Audio Conference, a set of videos from the talks has been released. Included are talks from Fons Adriaensen, Lennart Poettering, John Kacur, and more - all in Theora format, naturally.
Page editor: Rebecca Sobol
