LWN.net Logo

LWN.net Weekly Edition for August 7, 2008

Kernel Hacker's Bookshelf: The Practice of Programming

August 6, 2008

This article was contributed by Valerie Henson

In The Mythical Man-Month, Fred Brooks observes that the productivity of experienced programmers frequently varies by a factor of 10 or more. What makes the 10x programmers so much better? Undoubtedly some of the difference is due to native facility with language or logic. But even with these advantages, no one is born writing beautiful, elegant, maintainable code; everyone goes through a learning process.

How do we learn to be good programmers? In many ways, the art of computer programming is still stuck in the era of the master-apprentice system. Some of us are lucky enough to learn to program in something like "the UNIX room" at Bell Labs, where you could shoulder-surf the likes of Ken Thompson and Dennis Ritchie. Occasionally someone practices pair-programming instead of just arguing passionately about it, and once in a very long while, a 10x programmer will actually teach another person how to program. Unfortunately, formal university education rarely teaches students about the practical aspects of programming, as any holder of a computer science degree will readily attest, and few programmers have the time, interest, or ability to write accessible books about programming. As a result, most programmers are doomed to a decade of re-inventing wheels by trial and error.

Brian Kernighan and Rob Pike are two 10x programmers who do have the time, interest, and ability to write a book about software engineering best practices. The Practice of Programming aims to fill the gaps in the training of most computer programmers. From the book:

Topics like testing, debugging, portability, performance, design alternatives, and style - the practice of programming - are not usually the focus of computer science or programming courses. Most programmers learn them haphazardly as their experience grows, and a few never learn them at all.

This book probably won't make you ten times more productive, but it can easily make you twice as productive (and half as frustrated). If I could send one book to a programmer trapped on a desert island, this would be the book - and I'd send the same book to the new programmer who just joined my development team.

Overview

The Practice of Programming differs from most programming books in several enjoyable ways. Rather than promoting a particular new programming philosophy, Kernighan and Pike focus on three principles: simplicity, clarity, and generality. As you might guess from the title, the book is short on theory and long on practice. About one third of the ~250 page book is taken up by actual real-world example code, starting with the original dodgy code and showing the step-by-step evolution to better code. Most examples are in C, but the principles illustrated readily translate to other languages.

The writing style of this book is refreshingly practical and down-to-earth, without losing generality. The authors avoid stark black-and-white pronouncements, preferring to discuss why different techniques are useful under different conditions. Clarity is another hallmark of their style; they use as few words as possible to clearly state each point, and dismiss trivialities and side issues quickly and cleanly. A typical example of this approach is their advice on brace and indentation style: "The specific style is less important than its consistent application. Pick one style, preferably ours, use it consistently, and don't waste time arguing."

The book is organized into nine chapters, each covering a topic such as testing or debugging that usually requires an entire book on its own. The table of contents includes headings like "Test as You Write the Code," "Consistency and Idioms," "Strategies for Speed," "Other People's Bugs," and "Programs that Write Programs." I can't cover the whole book in this review, but I'll go into detail on two of my favorite chapters, "Performance" and "Notation."

Performance

The introduction of this chapter gives some very direct advice: "The first principle of optimization is don't." Computers are fast - go run lmbench on your desktop to update your sense of just how fast. For example, some system calls are now in the sub-microsecond range under Linux on modern hardware. Armchair optimization - the practice of making small theoretical optimizations as you code, at the expense of readability, portability, or correctness - is especially foolish in light of Donald Knuth's observation that 4% of the code typically accounts for more than half of the run-time of the program. Kernighan and Pike's first piece of advice is to write simple, clear, concise code, and optimize only when you have some tangible reason to do so.

The chapter begins with a real-world optimization problem: a spam-filter that worked well enough in testing but bogged down in production. The tangible reason for optimizing this program is that the mail queues were filling up with undelivered mail - a clear justification for optimization if there ever was one. The authors show the process they went through to optimize the spam-filter, step-by-step: profiling, analysis, a first attempt at optimization, re-factoring the problem, addition of pre-computation, and measurement of the results. This overview is welcome not only as a good programming war story but also because the overall flow of code optimization is non-obvious (otherwise, "How would you go about optimizing a program?" would not be such a common interview question).

The rest of the chapter talks about best practices for each step of optimization. The first topic is timing and profiling, as it should be. All too often, even good programmers measure performance by "feel" - if you don't believe me, search LKML. Sometimes no easy tool exists to measure what is being optimized, but it's still better to write some kind of measurement tool, no matter how clunky or approximate. Human perception and judgment are heavily influenced by preconceptions and the vast majority of theoretical optimizations have negligible effects on performance. A more subtle piece of advice is to turn performance results into pictures or graphs. Chris Mason's seekwatcher is an excellent example; it turns block traces into graphs - and even movies!

The authors cram a surprisingly complete demonstration of profiling into less than two pages, using prof on their spam-filter as the example. They show how to identify hot spots and do basic sanity checking on the results - e.g., match up the number of times a function call shows up in the profile with the number of iterations of the main loop. While they include some caveats on trusting profiling results, I wish they had spent some time on the design of profiling tools to show the kinds of biases and errors that so often make profiling results misleading. Perhaps it's because I work on systems software, but I've found that I really have to know the details of whether the profiler is using a periodic timer, hardware counters, includes time spent sleeping for IO in the kernel, how many events are dropped or missed, etc. A useful technique to demonstrate, and one in keeping with their minimalist, do-it-yourself philosophy, would be manually bisecting the code with timers to find hot spots when normal profiling tools fail.

The discussion on rewriting code goes beyond "find the top function and optimize it" - it also addresses eliminating calls to hot functions entirely and doing modest amounts of pre-computation. A fair portion of the section on code tuning has been superseded by improved compilers which can do, e.g., loop-unrolling automatically, but it still teaches valuable lessons about how to read code and understand its true cost and complexity.

Notation

The chapter on notation unfolds elegant, beautiful solutions one by one, turning normally painful problems into fun coding exercises. Each technique - little languages, special-purpose notation, programs that write programs, virtual machines - is accompanied by a concrete demonstration of how to implement the bare minimum of the technique to get the job done. The suggestion to "write a new language" seems absurd in the face of most day-to-day programming problems, but writing a very small, very specialized language can save the programmer much time and many bugs, even when replacing only a few hundred lines of conventional code. Their first example, after printf() format specifiers, is a notation for packing and unpacking network packets. I recently implemented this technique and can report that it worked beautifully, repaying the time I invested in it within days of completion.

Another exercise in minimalism is their demonstration of how to write a basic grep in around 100 lines of C, without relying on external libraries. Most of us will never need to re-implement regular expressions from scratch, but we may encounter a problem best solved by writing a small general purpose pattern matcher.

Another example demonstrates the power (and danger) of keeping a variety of scripting languages and data processing tools at your fingertips. The authors implement a crude text-only web browser with about 50 lines of Awk, Tcl, and Perl, again using only built-in language support and no external libraries or modules. Here as elsewhere, Kernighan and Pike refuse to make hard and fast assertions about the One True Scripting Language; they'd rather you used the right language for the right job. From the book:

These languages together are more powerful than any one of them in isolation. It's worth breaking the job into pieces if it enables you to profit from the right notation.

It can be argued that this approach is less justified now, given the modern plethora of scripting languages written specifically to address the limitations of earlier scripting languages. However, their argument still rings true for me, as someone who has never settled down into one scripting language. I have a decade of experience using a hodge-podge of random scripting languages, and when I do write in one scripting language, I end up spending a lot of time contorting language features to fit situations they were not designed for.

The section on virtual machines shows how to implement a minimal special purpose virtual machine (the Z-machine for Zork comes to mind immediately). The remaining sections cover programs that write programs, using macros to generate code (a common technique in Linux header files), and just a little taste of run-time code generation.

Summary

The Practice of Programming embodies its own principles: simplicity, clarity, generality. First published in 1999, it has aged well due to its focus on general principles of good programming rather than language-specific tricks and tips. The book has something to offer to programmers at all levels of experience; beginners will benefit most but experienced developers will appreciate the more advanced and subtle techniques in the later chapters. Of all the books on the Kernel Hacker's Bookshelf, this one should never be missing.

Comments (29 posted)

Firefox to support Theora video

By Jake Edge
August 6, 2008

Video in the browser, at least for Linux, has always resorted to somewhat clunky solutions—Flash plug-ins or external programs—but that is likely to change in Firefox 3.1. Recent commits to the Firefox development tree have added support for the HTML 5 <video> and <audio> tags as well as native Ogg Vorbis and Theora support. Providing multimedia support directly in a free browser, with no plug-in required, is a huge step forward both for Linux and for the royalty-free codecs.

The battle over video and audio formats is an ugly one, largely because they are patent minefields. The "mainstream" formats, MPEG-4 for video and MP3 for audio, are licensed on a royalty basis to companies that want to implement playback. Obviously, Mozilla is not in a position to pay a per-installation royalty, so that leaves various ad hoc methods using Javascript and plug-ins—that users have to track down—to make audio-video playback work in its browser.

[Firefox showing video]

Trying the new feature (seen at left) on one of the recent nightly Firefox builds seemed to work pretty well given that it is still under development. The video played smoothly, but the audio was not functional, only producing a rumbling, clicking soundtrack. The Wikimedia Commons video collection was used to test as it is a nice collection of Theora videos.

Some have seen the lack of Theora content currently on the web as a reason to downplay Firefox's support for the format, which is unfortunate, as Mozilla hacker Robert O'Callahan was quick to point out. Unlike the current situation, once a Firefox with video support is released, there will be one format that all content producers can be sure will be available for Firefox. Depending on whose numbers you believe that means that somewhere between 10 and 25% of web surfers (or more than 100 million people) will be using it.

Even with the dominance of Internet Explorer, the plethora of codec plug-ins has made it somewhat difficult for content providers to decide upon which video formats to support. With a substantial fraction of browsers supporting a particular free format, that situation may change. Wikimedia will certainly help by providing reasons for those not using Firefox to demand Theora plug-ins—if not integrated Theora support—for their browsers. As more content is available in that format, the pressure will build on Microsoft and Apple. As we mentioned in an article on web video formats last December, more content is the key to Theora support.

Some have argued that Vorbis and Theora are just as likely to be patent-encumbered as the more mainstream codecs, but so far that is unproven. There is no licensing authority that claims to have patents covering those codecs. Though Mozilla has some depth to its pockets—largely due to its deal with Google—patent holders might be loathe to attack a free software browser. In many ways, patent holders risk upsetting their entire apple cart if their attacks rise too high into the public consciousness. Though, clearly, Mozilla will be taking on some amount of risk with this move.

There have also been arguments that the Theora codec produces inferior video compared to those used by MPEG-4 and others. There is certainly truth to that assertion, but there is ongoing work to bring Theora more in line with the quality of its competitors. Due to the fact that it isn't controlled by a licensing authority with little or no interest in improving it, there is hope that Theora, or some descendant of it, could produce superior results some day.

Dirac—also known by the name of its C language implementation Schrödinger—is another royalty-free codec that is being looked at for inclusion into Firefox. There are currently some performance issues with decoding, but if those get resolved, there might be two free choices for video codecs in Firefox.

There are lots of entrenched interests that would like to see Theora, Vorbis, Dirac, and others like them disappear. They are quite happy with the current state of affairs. For the most part, though, users are not. Even on "well supported" platforms, video—and to a lesser extent audio—is a confusing jumble of plug-ins and formats that make it somewhat painful to use. Flash and Silverlight are supposed to "solve" these problems, but they do it in a not-quite-free way that still requires plug-ins. If web users start to find it easier to use the video formats embedded in their browser, and content producers take notice, it could completely change video on the web.

Comments (24 posted)

Building custom appliance distributions with rBuilder

By Jonathan Corbet
August 5, 2008
Linux distributions can be a pain. Users have to go through the whole process of installation, configuration, and updates, and, often, all they really want to do is to run a single application. The vendors of that application, meanwhile, feel the need to support as many distributions as possible, even though the actual system running underneath their code is nearly irrelevant. Wouldn't it be nice if users could simply get their desired application as an "appliance" which comes with all the necessary component parts nicely hidden inside?

As it happens, rPath has been in the appliance business for a little while now. Recently, the company has made its appliance-building infrastructure available to free-software products in the form of rBuilder Online. In essence, rBuilder can be used to create and maintain a custom distribution oriented around the delivery of a specific application. The result is a "software appliance" which, in theory, makes the given application available in a self-contained, standalone distribution.

There are a number of example appliances available on the site. They include:

  • Bongo, an attempt to revitalize work on the Hula mail client

  • Gallery, a standalone photo album

  • LochDNS, a DNS server

  • Openfiler, a storage management system

There are several others oriented around content management systems, telephony applications, database servers, and more. All told, quite a few projects have shown interest in creating software appliances for their applications.

Your editor grabbed a copy of the Openfiler appliance and installed it onto a spare box which had been cluttering up the office. Appliances from rBuilder start out looking like a Fedora system; they use the same Anaconda installer. The installed system also shows a lot of Red Hat heritage, such as /etc/sysconfig, various system-config-* commands, an /etc/inittab file which credits Mark Ewing and Donnie Barnes, etc. But there is a crucial difference: there is no rpm command. Instead, these appliances are based on rPath's Conary package management system, which takes a very different approach to the software management problem. But there are still similarities with Fedora: your editor attempted a conary updateall operation on the LochDNS appliance, only to see it fail with a set of file conflict errors; it was almost like running Rawhide again.

[Openfiler admin screen] Appliance users are not supposed to have to dirty their fingertips with command-line administrative operations, though. To help them avoid this fate, rBuilder-based appliances come with the rPath Appliance Platform Agent, otherwise known as a web-based administration interface. Once the user gets past the usual set of obnoxious Firefox dialogs ("this site has an SSL certificate which is not only unknown, but is almost certainly hostile and is ugly besides"), this interface provides a set of administrative screens for standard tasks (networking, updating the system, etc.) along with some specific to the Openfiler application.

In theory, it should be possible to manage one of the appliances without ever going to the command line - or even knowing that the command line exists. In practice, how well that works depends a lot on how the administration screens are designed. In the Openfiler case, quite a bit of clicking around in circles was required, but your editor did finally succeed in setting up a volume based on a USB key, perform a software update, and shut down the system at the end.

The creation of appliances would appear to be relatively straightforward; details can be found in this document. One creates an account in the rBuilder system, then puts together a file describing which components (packages) are necessary in the final system. Those components will presumably include at least one application provided by the appliance builder - that application being the reason for the creation of the software appliance in the first place. The "rMake" system will then pull all of the pieces together, bring in any needed dependencies, and wrap it all up inside a minimal distribution; the resulting system image seems to run at about 300MB.

There are several possible output formats, including the Anaconda-based installation CD image; the rPath folks would appear to have put a lot of effort into making appliances work on a number of virtualization platforms as well. Appliances can be built for VMWare, various forms of Xen, VirtualIron, and Microsoft VHD. Notably absent is anything based on Lguest or KVM. Even more notably absent is any kind of live CD appliance; anything not running in a virtual machine must be installed onto the host system's disks.

rPath's Conary servers seem to be set up to handle software updates. It is also possible to obtain source for the packages found in an appliance through the rBuilder site, though one must do a little digging first. Both of these features are important: anybody creating a distribution-based appliance has to arrange for updates and source availability somehow. One assumes that most appliance creators have no real desire to get into the broader distribution business, so it's nice for them to be able to offload these tasks. Anybody distributing these appliance images should note that rPath does not appear to have undertaken any obligation to continue to provide these services in the future. Should rPath decide to stop, some interesting questions on who is ultimately responsible for satisfying the source-availability provisions of the GPL could come up.

Naturally enough, rPath offers commercial services for those who would like stronger guarantees about long-term support, or who want to include proprietary software in their appliances.

For the time being, this approach to software distribution would seem to be most useful for companies which are in the business of building real, hardware-based appliances. Distributing software in virtual machines has the look of a new and truly impressive form of bloat; even "just enough operating system" is a lot of baggage for an application to drag around. For situations where one wants to try out a complex system, appliance distribution may be worth its cost, but one would probably not want to get every application this way.

There may be value, though, in software distributions which can run almost anywhere, and which can be nicely isolated from the outside world. Locking network-exposed applications - server processes or web browsers - into their own little world could help to avoid a lot of security problems in a way which seems more straightforward than SELinux or containers.

But, perhaps most interestingly, the appliance approach could eliminate a number of distribution-compatibility issues by putting many more people into the distribution business. Now anybody can throw together a special-purpose distribution without having to deal with all of the plumbing that makes the whole thing actually work. Something interesting will certainly come of this idea, even if it's hard to say just what that might be at the moment.

Comments (21 posted)

Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Security: OLS: Smack for embedded devices; New vulnerabilities in java, python, RealPlayer, trac,...
  • Kernel: Can user-space bugs be kernel regressions?; A kernel message catalog; The TALPA molehill.
  • Distributions: Looking forward to Fedora 10.
  • Development: The GNOME 2.24 module proposals, Django 1.0 schedule, liboggplay debut, new versions of Dirmngr, Web Submission and Review, ZK, matplotlib, GnuCash, SQL-Ledger, Player Project, pygame, a2jmidid, guitarix, pyspread, Pydev, Marathon.
  • Press: LinuxWorld's OSCON report, LTSP Hackfest coverage, OLS coverage, Intuit's Linux moves, Bruce Perens on Microsoft/Apache, open source in pre-schools, Jim Zemlin on LSB 4.0, Miguel de Icaza on Mono and Moonlight, Guido van Rossum interview, improving free software usability.
  • Announcements: EFF's Coders' Rights Project, EFF releases ISP testing tool, FSF demonstrates iPhone's GPLv3 incompatibility, IBM, Canonical, Novell, and Red Hat announced desktop collaboration, Git Magic, OSCON Proceedings, O'Reilly Tools of Change cfp, LF End User Summit, OO.o conf registration.
Next page: Security>>

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds