The very first LinuxConf Europe event was held in Cambridge, UK, in the first week of September. This conference is the result of a cooperation between the UK Unix User Group and the German Unix User Group; it is, in a sense, a combination of the UKUUG and Linux-Kongress events held in previous years. Talks by Dirk Hohndel and Michael Kerrisk were published last week. Here is a summary of some other LCE events.
Power management remains the focus of a great deal of attention. Arjan van de Ven started off a set of power-related talks with an overview of where the problems are. His biggest point is that software is a critical part of the power consumption picture; contemporary hardware provides a number of power-saving features, but software has a tendency to defeat them. Many of the ways in which this happens have been covered here before, so there is no need to repeat them. The core lesson here is that transitions between power states are expensive, so it is important that hardware components, once put into a power-saving state, be allowed to stay there for some time.
In the case of the CPU, idle periods of 20ms to 50ms are needed for effective power savings. Past kernels have rather defeated that goal, though, by receiving a clock interrupt every 1-10ms. The dynamic tick patches have finally fixed that problem, making it possible for longer sleeps to happen. But then user space comes along and ruins things. Since the advent of PowerTop, though, improvements have been coming quickly. Many distributions now consume at least 30% less power in typical laptop use.
Things may be getting better, but Matthew Garrett started the following session by noting that Linux still sucks - at least, it sucks power. This is a problem, he says, because getting half the battery life that Windows gets on the same hardware is really embarrassing. Systems are still waking up far too much; the problems exist in both kernel and user space.
On the kernel side, the usual culprits - device drivers - are a big part of the problem. There are quite a few drivers which poll their hardware - sometimes up to 100 times every second. In some cases this cannot be avoided; the hardware may be broken in a way which requires this kind of polling. But in other cases the polling can be made smarter - such as turning it off when the device is not in use. There is still work to be done in this area.
User-space applications remain a problem. People tracking down wakeups often blame the X server, but the real trouble is usually the applications which are causing X to wake up. There is a tool in the works which will identify the real source of X wakeups; this is a good thing: once problems are identified they are usually fixed pretty quickly. Polling for vertical retrace periods (so that the display can be updated without artifacts) seems to be a particular problem; some API work is being done to make it easier to avoid this polling. Evidently there are also some applications which repeatedly ask the server if a particular extension is available; since the set of extensions does not change while the server is running, there is little point in doing this.
There are some interesting things which can be done to better use the power-saving features of the hardware. For example, some framebuffers can compress the video data into a dedicated memory area, then drive the video from the compressed data. This technique reduces video memory bandwidth, saving power (up to half a watt) in the process. An interesting consequence is that the amount of power saved is dependent on how well the screen's contents compress - a user's choice of background wallpaper will affect their power usage.
Finally, there is a lot to be gained if device drivers can communicate more information to user space, making polling unnecessary. Applications which poll for changes to the audio volume are an example here; if the sound system simply told them that the volume had been adjusted, they could update their displays and go back to sleep.
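The difference between polling and notification can be sketched in a few lines of Python. This is only an illustration, not code from any real sound system: the volume samples, the function names, and the use of a queue as the notification channel are all assumptions made for the example.

```python
import queue
import threading


def poll_for_changes(samples):
    """Polling model: wake up once per sample, whether or not anything changed."""
    wakeups = 0
    changes = []
    last = samples[0]
    for level in samples[1:]:
        wakeups += 1                 # the CPU would leave its idle state here
        if level != last:
            changes.append(level)
            last = level
    return wakeups, changes


def notify_changes(samples, events):
    """Hypothetical sound system: post a message only when the volume changes."""
    last = samples[0]
    for level in samples[1:]:
        if level != last:
            events.put(level)
            last = level
    events.put(None)                 # sentinel: no more changes


def wait_for_changes(events):
    """Event model: block in get() and count only wakeups that carry a change."""
    wakeups = 0
    changes = []
    while True:
        level = events.get()         # sleeps until something is posted
        if level is None:
            break
        wakeups += 1
        changes.append(level)
    return wakeups, changes


# 100 volume samples containing only two actual changes
samples = [50] * 40 + [60] * 30 + [55] * 30

polled = poll_for_changes(samples)

events = queue.Queue()
producer = threading.Thread(target=notify_changes, args=(samples, events))
producer.start()
notified = wait_for_changes(events)
producer.join()
```

In this run the polling watcher wakes 99 times to observe two changes, while the notified watcher wakes only for the two changes (plus the final sentinel).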
Jörn Engel gave a talk on the death of hard disks. His core point is that flash-based storage is faster, requires less power, makes less noise, and is more robust than rotating storage. It is also more expensive, for now, but flash is getting cheaper much more quickly. Jörn projects that flash-based drives will become more economical than hard drives between 2012 and 2019, depending on which drives one looks at.
Flash makes life easier in a number of ways; the lack of seek delays, for example, means that much of the trouble the kernel goes to in scheduling of block I/O operations can be eliminated. On the other hand, flash has challenges of its own: it is not quite the random-access array of blocks that one would like. In particular, writing to flash requires dealing with wear-leveling issues, erase operations, and more.
Manufacturers have done their best to paper over these issues through the use of translation layers which make a flash array look like a simple disk drive. These layers make it easier to use flash with existing software, but there are problems: performance is not always what one would like, and there can be hidden caches which delay the persistent storage of data. So Jörn has a request to the flash manufacturers: give us direct access to the flash array, without translation layers, and let us figure out how to best support it.
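As a rough illustration of what such a translation layer has to do, here is a toy sketch in Python. It is not how any real device or Linux driver works; the class name, the block-level granularity, and the "least-worn free block" policy are assumptions chosen to show the remapping, erase, and wear-leveling steps in the smallest possible form.

```python
class ToyFlashTranslationLayer:
    """Toy model: logical blocks are remapped to physical erase blocks."""

    def __init__(self, physical_blocks):
        self.erase_counts = [0] * physical_blocks   # wear per physical block
        self.data = [None] * physical_blocks
        self.mapping = {}                            # logical -> physical
        self.free = set(range(physical_blocks))

    def write(self, logical, payload):
        if not self.free:
            raise RuntimeError("no free physical blocks")
        # Flash cannot be overwritten in place, so every write goes to a
        # fresh block; picking the least-worn one spreads the wear around.
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        self.data[target] = payload
        old = self.mapping.get(logical)
        self.mapping[logical] = target
        if old is not None:
            # The stale copy must be erased before the block can be reused.
            self.data[old] = None
            self.erase_counts[old] += 1
            self.free.add(old)

    def read(self, logical):
        return self.data[self.mapping[logical]]


ftl = ToyFlashTranslationLayer(physical_blocks=4)
for version in range(100):
    ftl.write(0, "version %d" % version)   # rewrite the same logical block
```

Even though the caller rewrites a single logical block, the erases end up spread almost evenly over all four physical blocks. A layer like this also shows where hidden state can live: the mapping table is exactly the kind of thing the host cannot see behind a disk-like interface.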
Chris Mason is not waiting for flash to take over; instead, he is working on the next-generation Linux filesystem for rotating disks. The result, Btrfs, was the subject of Chris's talk at LCE. LWN covered Btrfs last June.
Chris's motivation is the fact that disks are, for all practical purposes, getting slower - the time required to read an entire disk is growing. Most systems still store large numbers of small files, leading to a lot of wasted space. Btrfs tries to address these issues and provide a number of interesting features as well. It is extent-based, resulting in more efficient storage of larger files. Small files are packed into the filesystem tree itself, eliminating the internal fragmentation experienced by a number of other filesystems. It has indexed directories, data and metadata checksums, efficient snapshots, sequence numbers in objects (facilitating quick and easy incremental backups), an online filesystem checker in the works, and more.
The directories are actually indexed twice. One index is there for fast filename lookup; the other one, instead, lets the readdir() system call return files in inode-number order, speeding filesystem traversals. Extended attributes are stored as directory entries. Every file has a backpointer to its containing directory - and, yes, multiply-linked files have backpointers to all of the directories in which they are found.
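The benefit of inode-number ordering can be approximated from user space on filesystems that do not provide it. The following Python sketch sorts directory entries by inode number before touching the files; the function names are made up for the example, and it says nothing about how Btrfs implements its own index.

```python
import os


def entries_in_inode_order(path):
    """Return the names found in `path`, sorted by inode number."""
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda entry: entry.inode())
    return [entry.name for entry in entries]


def total_size(path):
    """Stat every entry in inode order, which tends to reduce seeking."""
    return sum(
        os.stat(os.path.join(path, name)).st_size
        for name in entries_in_inode_order(path)
    )
```

A filesystem that returns readdir() results in this order already spares applications the sorting step.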
Perhaps the most fun part of the talk was the plots Chris has generated from various benchmark runs. The limiting factor on filesystem performance is generally disk seeks; it is important to minimize disk head movement. In general, ext3 tends to move the disk head all over the platter during benchmark runs while Btrfs and XFS do better. Chris noted that better writeback clustering in the virtual memory subsystem would help ext3.
More benchmark plots (some animated) can be found in the Btrfs benchmark and Seekwatcher pages. Toward the end, Chris was asked whether performance slows down when the disk gets full. The answer was "no" because the system crashes instead. That's a good reminder that Btrfs remains an early-stage development; the on-disk format has not even been finalized yet. But the production version of Btrfs is certainly something to look forward to.
Back in 2000, the British Computer Society awarded its Lovelace Medal to Linus Torvalds. In 2007, the society finally caught up with him to deliver the medal - though, as speaker Dr. David Hartley noted, they probably were almost as quick as the post office would have been. As is typically the case, Linus seemed somewhat embarrassed by the attention.
LinuxConf Europe intends to be a conference on a truly European scale. To that end, next year's event will likely move to Germany; the details were not yet finalized to the point that the location could be announced at this year's conference, though. LCE, helped by the kernel summit, has gotten this institution off to a good start; your editor is looking forward to next year's edition.
With its first alpha just released, Python 3.0 (aka Python 3000 or Py3k) is making progress, though a final release is still a year off. Py3k overhauls the language core, removing inconsistencies and other "warts", without maintaining compatibility with the 2.x version. Various standard Python idioms go by the wayside and it will take some getting used to.
One of the driving forces for Py3k is to handle Unicode strings in a uniform way. In the 2.x series, Unicode handling has bugs, especially when mixing encoded and unencoded text. The Py3k solution is to split strings into two distinct types: str, which holds decoded text, and bytes, which holds binary data. Those types cannot be combined without converting one to the other via the encode() and decode() methods. The drawback to this change is explained in the What's New in Python 3.0 document:
This also leads to a distinction that needs to be made when handling files. Files are either binary or text files, with text files requiring an encoding to be specified when they are opened. If the wrong type or encoding is given, I/O to the file may fail.
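A short sketch of the str/bytes split and the text/binary file distinction follows. It reflects the behavior of released Python 3 versions, which may differ in detail from the alpha discussed here; the file name and sample text are arbitrary.

```python
import os
import tempfile

text = "J\u00f6rn"                  # str: decoded Unicode text, 4 characters
data = text.encode("utf-8")         # bytes: encoded binary data, 5 bytes

try:
    text + data                     # the two types cannot be mixed
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

round_trip = data.decode("utf-8")   # explicit conversion back to str

path = os.path.join(tempfile.mkdtemp(), "name.txt")
with open(path, "w", encoding="utf-8") as f:    # text file: encoding given
    f.write(text)
with open(path, "rb") as f:                     # binary file: raw bytes
    raw = f.read()
```

Reading the file back in binary mode yields the same bytes that encode() produced, which is the point of the distinction: the encoding step is explicit rather than implied.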
One very visible change – perhaps the most controversial – is the elimination of the print statement, which becomes a function. The change is being made mostly for consistency, as there is no other language statement like print, but it also adds features. One can now specify a separator, line ending, and output file directly; there is no need for the print >>sys.stderr, "error" syntax, which becomes print("error", file=sys.stderr) instead. As the "What's new" document points out:
Another area that has changed significantly is the dict methods. The keys(), items(), and values() methods no longer return lists, so code that treats them that way will fail. They now return something called a "view" that references the dict directly, producing values as they are needed, much like an iterator. In addition, the has_key() boolean method has been removed; the in operator should be used instead.
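The dict changes can be seen in a small example, written against released Python 3; the dictionary contents are just sample data.

```python
ages = {"ada": 36, "linus": 37}

keys = ages.keys()
is_list = isinstance(keys, list)        # views are not lists

try:
    keys[0]                             # so list-style indexing fails
    indexable = True
except TypeError:
    indexable = False

ages["guido"] = 51
sees_new_key = "guido" in keys          # the view tracks the dict

snapshot = sorted(ages.keys())          # build a real list when one is needed

has_ada = "ada" in ages                 # replaces ages.has_key("ada")
still_has_has_key = hasattr(ages, "has_key")
```

Code that really needs a list can wrap the view in list() or sorted(), as the snapshot line does.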
There are lots of smaller changes that will catch the unwary. Many of the features removed have been deprecated for some time, but they may still surprise programmers who don't follow Python language development closely. The raise statement has a different syntax; integer division no longer truncates, returning a float instead (with // used to get the old behavior); xrange() has been removed; and so on. It adds up to a substantial pile of things to deal with when moving existing code to Python 3.
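A few of the changes mentioned above, including the print() function, look like this in released Python 3. The snippet writes to an in-memory buffer rather than sys.stderr so that the result can be inspected; the messages are arbitrary.

```python
import io

# print() as a function, with separator, line ending, and file arguments
buf = io.StringIO()
print("error", "code", 42, sep=": ", end="!\n", file=buf)
printed = buf.getvalue()

# Division: / always returns a float, // gives the old truncating behavior
true_quotient = 7 / 2
floor_quotient = 7 // 2

# xrange() is gone; range() now produces its values lazily
try:
    xrange
    has_xrange = True
except NameError:
    has_xrange = False
first_three = list(range(3))

# raise takes an exception instance; the 2.x form
#     raise ValueError, "bad value"
# is a syntax error in Python 3
try:
    raise ValueError("bad value")
except ValueError as exc:
    message = str(exc)
```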
The migration from 2.x is being assisted by the development of Python 2.6, which is slated for release in April 2008. It will provide a Py3k warnings mode that complains at runtime when a feature is being used in a way that is incompatible with 3.0. It will also have many of the new features enabled, either as __future__ imports or just added into the language where they do not conflict with 2.x syntax. The 2to3 tool is also being developed to translate 2.6 constructs into their 3.0 equivalents. The Python Enhancement Proposal (PEP) governing the Py3k plan (PEP 3000) gives an overview of how code can be maintained to run on both 2.6 and 3.0. It sounds somewhat painful, but incompatible language changes are never easy.
There is still plenty of work to be done; the final release of 3.0 is currently scheduled for August 2008. One of the bigger remaining chunks is a reorganization of the standard library namespace. PEP 3108 lays out the changes to be made, including removing older, unsupported, or rarely used modules; renaming modules to conform to the naming standard; and merging the C and Python implementations of modules (cPickle, for example, goes away and is replaced with pickle). It cleans up what had become a bit of a mess over time.
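For code that has to cope with both module layouts, one common idiom is a guarded import. The sketch below assumes only that cPickle exists on 2.x and that pickle is the single module on 3.0; the payload is sample data.

```python
try:
    import cPickle as pickle    # Python 2.x: the separate C implementation
except ImportError:
    import pickle               # Python 3: one merged module

payload = {"release": "3.0", "peps": [3000, 3108]}
restored = pickle.loads(pickle.dumps(payload))
```

The rest of the program refers only to the name pickle, so it does not need to know which implementation was found.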
These changes have not come about without objections, both from those who think another incompatible "upgrade" is not warranted and from those who think Py3k doesn't go far enough. One area that is not being changed, but is a source of frustration for some, is the "global interpreter lock" (GIL), which only allows one thread at a time to operate on any Python objects or call out to C language extensions. Especially with the advent of multi-core and multi-CPU systems, the lock is very restrictive, serializing most of the core language processing.
Guido van Rossum, Benevolent Dictator for Life (BDFL) of the Python language, has been very open about addressing these concerns on his All Things Pythonic weblog. That doesn't mean he plans to change things, especially with regard to the GIL, but he puts together a well-reasoned defense, mostly concerning the performance of the language with finer-grained locks. He is clearly not much of a fan of multi-threaded programming with its attendant race conditions, deadlocks, and other issues, but he is not opposed to efforts to remove the GIL either. As he points out, the GIL is not inherent in the Python language, but is an attribute of the current language implementation; other implementations (Jython, IronPython) do not have it.
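The effect of the GIL on CPU-bound code can be explored with a small experiment like the one below. It is a sketch only: the function names and workload are arbitrary, and the timings depend entirely on the interpreter and machine, so no particular result is claimed here.

```python
import threading
import time


def count_down(n):
    """Pure-Python, CPU-bound busy work."""
    while n > 0:
        n -= 1


def run_sequential(n, workers):
    """Run the workload `workers` times, one after another."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, workers):
    """Run the same workload in `workers` threads at once."""
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(workers)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start
```

On an interpreter with a global lock, calling run_threaded(5000000, 2) would be expected to take about as long as run_sequential(5000000, 2) even on a multi-core system, since only one thread executes Python code at a time.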
There are fundamental changes in Python 3; it will be interesting to see how quickly it is adopted after being released. People learning Python won't need to learn Py3k for another two years or so, according to van Rossum, and should, instead, concentrate on 2.x (which means 2.5 until April). The Unicode handling rework will probably be enough to get the increasing number of localized programs updated, but the rest of the changes are not terribly compelling. It is likely that there will be Python 2.x programs around for a long time to come.
Purpose-built Fedora distributions, called "spins", are a recent addition to that community in an attempt to reach additional users. The idea is to use tools like Revisor to create a custom collection of software that works well together for a particular set of tasks. This collection can then be installed or run from a live CD, providing an easy means to have the right collection of tools immediately, rather than after a lengthy yum install pass.
The concept itself is not new; there are many distributions targeted at a particular subset of users. Typically, other popular distributions (Debian and Ubuntu in particular) have been used as the basis for them. The Fedora project is embracing the idea, pulling together a list of the spins and elevating at least two to the status of "official spins". The idea is to appeal to those who don't want to be bothered with tracking down, installing, and configuring the tools needed for their task; instead it is all packaged for them.
Starting with Fedora 7, two official releases of the distribution are available, one for each of the dominant desktops. For Fedora 8, there will also be a developer spin, which has the explicit goal of attracting more Fedora developers. It will include Eclipse, perhaps other integrated development environments (IDEs), gcc and friends, emacs, SystemTap, and other developer tools. Other ideas, such as a working Xen virtual machine and targeting web developers, have been discussed as well.
The other official spin for Fedora 8 is the Fedora Electronic Lab (FEL). This project pulls together the tools for electronic design and configures them to work well together. A wide variety of software for circuit simulation, hardware development in VHDL and Verilog, Very Large Scale Integration (VLSI) design, and embedded systems development is included. Universities are high on the list of target audiences, with the FEL website claiming 250 universities already using Fedora; attracting more is one of the goals.
Several other spins are being worked on as well, not "officially", but there does seem to be some serious work going into them. The Security LiveCD is a Fedora 7 based spin for security auditing and testing. It contains all of the tools that an administrator or security researcher might need to do forensic analysis of a rooted machine, check a network for vulnerable hosts, or do penetration testing. Since it can be booted directly from a read-only device, risks of infection from any malware are eliminated. Any machine can be quickly turned into a security workstation by using a distribution like this.
Another ambitious project is the Fedora Art Studio. This spin not only collects the tools into one package, it also pulls in content likely to be useful to artists, desktop publishers, animators, and other creative folks. There are collections of clip art, fonts, textures, brushes, and so on, all with free licenses. There are also tutorials included to get people up to speed on the various packages. Plans are to include default Firefox bookmarks for useful sites as well.
Other spins are listed on the site, ranging from the Creative Commons LiveContent spin (covered by LWN here) to a SystemTap live CD. The Fedora wiki has various Howtos on remixing and rebranding Fedora, as well as using the Live CD tools. Most people who want to build a custom spin will start by using the Revisor GUI tool, which provides options for building installation, live, or virtualization (for Xen or KVM virtual machines) media for CDs, DVDs, USB thumb drives, and more. The project has clearly put a lot of time and effort into making it as easy as possible to create new spins from the large repository of Fedora software.
It remains to be seen if any of these spins become popular, but it may be a good way to introduce new users to Fedora. It is unlikely that power users will find a spin that covers all of what they use, but they just might find one that serves as a good starting point. They can either customize their own spin from there or use the usual repository tools to grab whatever extras they need. For a distribution that, until recently, had a reputation for not working with the community, this effort may go a long way towards erasing that history.
LWN recently tried a new (for us) form of advertising, known as "in-text" advertising – ads that pop up from highlighted keywords in an article. When we announced the change, it was obvious from the comments that it was a tad unpopular. Truth to tell, the ads started getting on our nerves more as time went on; they didn't seem quite so annoying when we ran them on our development systems. We have discontinued the ads; they will not be coming back.
A lot of good points were made in the comments; we appreciate the time you took to make them. Our readers are (obviously) very important to us; your opinions on what works and what doesn't are always carefully considered. There were also several interesting suggestions made; we will be pondering those as we make plans.
We do want to dispel one concern that we heard. We are not under an imminent threat of going under. We are proceeding with the plan we laid out in May: working on the revenue side of the business while producing the same quality of content you have come to expect. There will be other experiments along the way; some will fail, hopefully some will succeed as well.
Page editor: Jonathan Corbet
Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds