August 6, 2008
This article was contributed by Valerie Henson
In
The
Mythical Man-Month, Fred Brooks observes that the productivity of
experienced programmers frequently varies by a factor of 10 or more.
What makes the 10x programmers so much better? Undoubtedly some of
the difference is due to native facility with language or logic. But
even with these advantages, no one is born writing beautiful, elegant,
maintainable code; everyone goes through a learning process.
How do we learn to be good programmers? In many ways, the art of
computer programming is still stuck in the era of the
master-apprentice system. Some of us are lucky enough to learn to
program in something like
"the
UNIX room" at Bell Labs, where you could shoulder-surf the likes
of Ken Thompson and Dennis Ritchie. Occasionally someone practices
pair-programming instead of just arguing passionately about it, and
once in a very long while, a 10x programmer will actually teach
another person how to program. Unfortunately, formal university
education rarely teaches students about the practical aspects of
programming, as any holder of a computer science degree will readily
attest, and few programmers have the time, interest, or ability to
write accessible books about programming. As a result, most
programmers are doomed to a decade of re-inventing wheels by trial and
error.
Brian Kernighan and Rob Pike are two 10x programmers who do have the
time, interest, and ability to write a book about software engineering
best practices.
The
Practice of Programming aims to fill the gaps in the training of
most computer programmers. From the book:
Topics like
testing, debugging, portability, performance, design alternatives, and
style - the practice of programming - are not usually the focus of
computer science or programming courses. Most programmers learn them
haphazardly as their experience grows, and a few never learn them at
all.
This book probably won't make you ten times more productive,
but it can easily make you twice as productive (and half as
frustrated). If I could send one book to a programmer trapped on a
desert island, this would be the book - and I'd send the same book to
the new programmer who just joined my development team.
Overview
The Practice of Programming differs from most programming books in
several enjoyable ways. Rather than promoting a particular new
programming philosophy, Kernighan and Pike focus on three principles:
simplicity, clarity, and generality. As you might guess from the
title, the book is short on theory and long on practice. About one
third of the ~250 page book is taken up by actual real-world example
code, starting with the original dodgy code and showing the
step-by-step evolution to better code. Most examples are in C, but
the principles illustrated readily translate to other languages.
The writing style of this book is refreshingly practical and
down-to-earth, without losing generality. The authors avoid stark
black-and-white pronouncements, preferring to discuss why different
techniques are useful under different conditions. Clarity is another
hallmark of their style; they use as few words as possible to clearly
state each point, and dismiss trivialities and side issues quickly and
cleanly. A typical example of this approach is their advice on brace
and indentation style: "The specific style is less important than its
consistent application. Pick one style, preferably ours, use it
consistently, and don't waste time arguing."
The book is organized into nine chapters, each covering a topic such
as testing or debugging that usually requires an entire book on its
own. The table of contents includes headings like "Test as You Write
the Code," "Consistency and Idioms," "Strategies for Speed," "Other
People's Bugs," and "Programs that Write Programs." I can't cover the
whole book in this review, but I'll go into detail on two of my
favorite chapters, "Performance" and "Notation."
Performance
The introduction of this chapter gives some very direct advice: "The
first principle of optimization is don't." Computers
are fast - go run
lmbench on your desktop
to update your sense of just how fast. For example, some system calls
are now in the sub-microsecond range under Linux on modern hardware.
Armchair optimization - the practice of making small theoretical
optimizations as you code, at the expense of readability, portability,
or correctness - is especially foolish in light of Donald Knuth's
observation that 4% of the code typically accounts for more than half
of the run-time of the program. Kernighan and Pike's first piece of
advice is to write simple, clear, concise code, and optimize only when
you have some tangible reason to do so.
The chapter begins with a real-world optimization problem: a
spam-filter that worked well enough in testing but bogged down in
production. The tangible reason for optimizing this program is that
the mail queues were filling up with undelivered mail - a clear
justification for optimization if there ever was one. The authors
show the process they went through to optimize the spam-filter,
step-by-step: profiling, analysis, a first attempt at optimization,
re-factoring the problem, addition of pre-computation, and measurement
of the results. This overview is welcome not only as a good
programming war story but also because the overall flow of code
optimization is non-obvious (otherwise, "How would you go about
optimizing a program?" would not be such a common interview question).
The rest of the chapter talks about best practices for each step of
optimization. The first topic is timing and profiling, as it should
be. All too often, even good programmers measure performance by
"feel" - if you don't believe me, search LKML. Sometimes no easy tool
exists to measure what is being optimized, but it's still better to
write some kind of measurement tool, no matter how clunky or
approximate. Human perception and judgment are heavily influenced by
preconceptions and the vast majority of theoretical optimizations have
negligible effects on performance. A more subtle piece of advice is
to turn performance results into pictures or graphs. Chris
Mason's seekwatcher
is an excellent example; it turns block traces into graphs - and even
movies!
The authors cram a surprisingly complete demonstration of profiling
into less than two pages, using prof on their spam-filter as
the example. They show how to identify hot spots and do basic sanity
checking on the results - e.g., match up the number of times a
function call shows up in the profile with the number of iterations of
the main loop. While they include some caveats on trusting profiling
results, I wish they had spent some time on the design of profiling
tools to show the kinds of biases and errors that so often make
profiling results misleading. Perhaps it's because I work on systems
software, but I've found that I really have to know the details of
whether the profiler is using a periodic timer, hardware counters,
includes time spent sleeping for IO in the kernel, how many events are
dropped or missed, etc. A useful technique to demonstrate, and one in
keeping with their minimalist, do-it-yourself philosophy, would be
manually bisecting the code with timers to find hot spots when normal
profiling tools fail.
The discussion on rewriting code goes beyond "find the top function
and optimize it" - it also addresses eliminating calls to hot
functions entirely and doing modest amounts of pre-computation. A
fair portion of the section on code tuning has been superseded by
improved compilers which can do, e.g., loop-unrolling automatically,
but it still teaches valuable lessons about how to read code and
understand its true cost and complexity.
Notation
The chapter on notation unfolds elegant, beautiful solutions one by
one, turning normally painful problems into fun coding exercises.
Each technique - little languages, special-purpose notation,
programs that write programs, virtual machines - is accompanied by a
concrete demonstration of how to implement the bare minimum of the
technique to get the job done. The suggestion to "write a new
language" seems absurd in the face of most day-to-day programming
problems, but writing a very small, very specialized language can save
the programmer much time and many bugs, even when replacing only a few
hundred lines of conventional code. Their first example,
after printf() format specifiers, is a notation for packing
and unpacking network packets. I recently implemented this technique
and can report that it worked beautifully, repaying the time I
invested in it within days of completion.
Another exercise in minimalism is their demonstration of how to write
a basic grep in around 100 lines of C, without relying on
external libraries. Most of us will never need to re-implement
regular expressions from scratch, but we may encounter a problem best
solved by writing a small general purpose pattern matcher.
Another example demonstrates the power (and danger) of keeping a
variety of scripting languages and data processing tools at your
fingertips. The authors implement a crude text-only web browser with
about 50 lines of Awk, Tcl, and Perl, again using only built-in
language support and no external libraries or modules. Here as
elsewhere, Kernighan and Pike refuse to make hard and fast assertions
about the One True Scripting Language; they'd rather you used the
right language for the right job. From the book:
These languages
together are more powerful than any one of them in isolation. It's
worth breaking the job into pieces if it enables you to profit from
the right notation.
It can be argued that this approach is less
justified now, given the modern plethora of scripting languages
written specifically to address the limitations of earlier scripting
languages. However, their argument still rings true for me, as
someone who has never settled down into one scripting language. I
have a decade of experience using a hodge-podge of random scripting
languages, and when I do write in one scripting language, I end up
spending a lot of time contorting language features to fit situations
they were not designed for.
The section on virtual machines shows how to implement a minimal
special purpose virtual machine
(the Z-machine
for Zork comes to mind
immediately). The remaining sections cover programs that write
programs, using macros to generate code (a common technique in Linux
header files), and just a little taste of run-time code generation.
Summary
The
Practice of Programming embodies its own principles: simplicity,
clarity, generality. First published in 1999, it has aged well due to
its focus on general principles of good programming rather than
language-specific tricks and tips. The book has something to offer to
programmers at all levels of experience; beginners will benefit most
but experienced developers will appreciate the more advanced and
subtle techniques in the later chapters. Of all the books on the
Kernel Hacker's Bookshelf, this one should never be missing.
Comments (29 posted)
By Jake Edge
August 6, 2008
Video in the browser, at least for Linux, has always resorted to somewhat
clunky solutions—Flash plug-ins or external programs—but that is
likely to change in Firefox 3.1. Recent commits to the
Firefox development
tree
have added support for the HTML 5 <video> and <audio> tags as
well as native Ogg Vorbis and Theora support. Providing multimedia
support directly in a free browser, with no plug-in required, is a huge step
forward both for Linux and for the royalty-free codecs.
The battle over video and audio formats is an ugly one, largely because
they are patent minefields. The "mainstream" formats, MPEG-4 for video and
MP3 for audio, are licensed on a royalty basis to companies that want to
implement playback. Obviously, Mozilla is not in a position to pay a
per-installation royalty, so that leaves various ad hoc methods using
Javascript and plug-ins—that users have to track down—to make audio-video
playback work in its browser.
Trying the new feature (seen at left) on one of the recent nightly Firefox
builds seemed
to work pretty well given that it is still under development. The video played
smoothly, but the audio was not functional, only producing a rumbling,
clicking soundtrack. The Wikimedia
Commons video collection was used to test as it is a nice collection of
Theora videos.
Some have seen the lack of Theora content currently on the web as a reason
to downplay
Firefox's support for the format, which is unfortunate, as Mozilla
hacker Robert O'Callahan was quick to point
out. Unlike the
current situation, once a Firefox with video support is released, there
will be one format that all content producers can be sure will be available
for Firefox. Depending on whose numbers you believe that means that somewhere
between 10 and 25% of web surfers (or more than 100 million people) will be
using it.
Even with the dominance of Internet Explorer, the plethora of codec
plug-ins has made it somewhat difficult for content providers to decide
upon which video formats to support. With a substantial fraction of browsers
supporting a particular free format, that situation may change. Wikimedia
will certainly help by providing reasons for those not using
Firefox to demand Theora plug-ins—if not integrated Theora
support—for their browsers. As more content is available in that
format, the pressure will build on Microsoft and Apple. As we mentioned in an
article on web video formats
last December,
more content is the key to Theora support.
Some have argued that Vorbis and Theora are just as likely to be
patent-encumbered as the more mainstream codecs, but so far that is
unproven. There is no licensing authority that claims to have patents
covering those codecs. Though Mozilla has some depth to its
pockets—largely due to its deal with Google—patent holders
might be loathe to attack a free software browser. In many ways, patent
holders risk upsetting their entire apple cart if their attacks rise too high
into the public consciousness. Though, clearly, Mozilla will be taking on
some amount of risk with this move.
There have also been arguments that the Theora codec produces
inferior video compared to those used by MPEG-4 and others. There is
certainly truth to
that assertion, but there is ongoing work to bring Theora more in line with
the quality of its competitors. Due to the fact that it isn't controlled
by a licensing authority with little or no interest in improving it, there is
hope that Theora, or some descendant of it, could produce superior results
some day.
Dirac—also known by the name of
its C language implementation Schrödinger—is another royalty-free codec
that is being looked at for inclusion into Firefox. There are currently
some performance issues with decoding, but if those get resolved, there
might be two free choices for video codecs in Firefox.
There are lots of entrenched interests that would like to see Theora,
Vorbis, Dirac, and others like them disappear. They are quite happy with
the current state of affairs. For the most part, though, users are not.
Even on "well supported" platforms, video—and to a lesser extent
audio—is a confusing jumble of plug-ins and formats that make it
somewhat painful to use. Flash and Silverlight are supposed to "solve"
these problems, but they do it in a not-quite-free way that still requires
plug-ins. If web users start
to find it easier to use the video formats embedded in their browser, and
content producers take notice, it
could completely change video on the web.
Comments (24 posted)
By Jonathan Corbet
August 5, 2008
Linux distributions can be a pain. Users have to go through the whole
process of installation, configuration, and updates, and, often, all they
really want to do is to run a single application. The vendors of that
application, meanwhile, feel the need to support as many distributions as
possible, even though the actual system running underneath their code is
nearly irrelevant. Wouldn't it be nice if users could simply get their
desired application as an "appliance" which comes with all the necessary
component parts nicely hidden inside?
As it happens, rPath has been in the appliance business for a little while
now. Recently, the company has made its appliance-building infrastructure
available to free-software products in the form of rBuilder Online. In essence,
rBuilder can be used to create and maintain a custom distribution oriented
around the delivery of a specific application. The result is a
"software appliance" which, in theory, makes the given application
available in a self-contained, standalone distribution.
There are a number of example appliances available on the site. They
include:
- Bongo, an attempt to
revitalize work on the Hula mail client
- Gallery, a standalone photo
album
- LochDNS, a DNS
server
- Openfiler, a storage
management system
There are several others oriented around content
management systems, telephony applications, database servers, and more.
All told, quite a few projects have shown interest in creating software
appliances for their applications.
Your editor grabbed a copy of the Openfiler appliance and installed it onto
a spare box which had been cluttering up the office. Appliances from
rBuilder start out looking like a Fedora system; they use the same Anaconda
installer. The installed system also shows a lot of Red Hat heritage, such
as /etc/sysconfig, various system-config-* commands, an
/etc/inittab file which credits Mark Ewing and Donnie Barnes,
etc. But there is a crucial difference: there is no rpm command. Instead,
these appliances are based on rPath's Conary package management
system, which takes a very different approach to the software management
problem. But there are still similarities with Fedora: your editor
attempted a conary updateall operation on
the LochDNS appliance, only to see it fail with a set of file conflict
errors; it was almost like running Rawhide again.
Appliance users are not supposed to have to dirty their fingertips with
command-line administrative operations, though. To help them avoid this
fate, rBuilder-based appliances come with the rPath
Appliance Platform Agent, otherwise known as a web-based administration
interface. Once the user gets past the usual set of obnoxious Firefox
dialogs ("this site has an SSL certificate which is not only unknown, but
is almost
certainly hostile and is ugly besides"), this interface provides a
set of administrative screens for standard tasks (networking, updating the
system, etc.) along with some specific to the Openfiler application.
In theory, it should be possible to manage one of the appliances without
ever going to the command line - or even knowing that the command line
exists. In practice, how well that works depends a lot on how the
administration screens are designed. In the Openfiler case, quite a bit of
clicking around in circles was required, but your editor did finally
succeed in setting up a volume based on a USB key, perform a software
update, and shut down the system at the end.
The creation of appliances would appear to be relatively straightforward;
details can be found in this
document. One creates an account in the rBuilder system, then puts
together a file describing which components (packages) are necessary in the
final system. Those components will presumably include at least one
application provided by the appliance builder - that application being the
reason for the creation of the software appliance in the first place. The
"rMake" system will then pull all of the pieces together, bring in any
needed dependencies, and wrap it all up inside a
minimal distribution; the resulting system image seems to run at about 300MB.
There are several possible output formats, including the Anaconda-based
installation CD image; the rPath folks would appear to have put a lot of
effort into making appliances work on a number of virtualization platforms
as well. Appliances can be built for VMWare, various forms of Xen,
VirtualIron, and Microsoft VHD. Notably absent is anything based on Lguest
or KVM. Even more notably absent is any kind of live CD appliance;
anything not running in a virtual machine must be installed onto the host
system's disks.
rPath's Conary servers seem to be set up to handle software updates. It is
also possible to obtain source for the packages found in an appliance
through the rBuilder site, though one must do a little digging first.
Both of these features are important: anybody creating a distribution-based
appliance has to arrange for updates and source availability somehow. One
assumes that most appliance creators have no real desire to get into the
broader distribution business, so it's nice for them to be able to offload
these tasks. Anybody distributing these appliance images should note that
rPath does not appear to have undertaken any obligation to continue to
provide these services in the future. Should rPath decide to stop, some
interesting questions on who is ultimately responsible for satisfying the
source-availability provisions of the GPL could come up.
Naturally enough, rPath offers commercial services for those who would like
stronger guarantees about long-term support, or who want to include
proprietary software in their appliances.
For the time being, this approach to software distribution would seem to be
most useful for companies which are in the business of building real,
hardware-based appliances. Distributing software in virtual machines has
the look of a new and truly impressive form of bloat; even "just enough
operating system" is a lot of baggage for an application to drag around.
For situations where one wants to try out a complex system, appliance
distribution may be worth its cost, but one would probably not want to get
every application this way.
There may be value, though, in software distributions which can run almost
anywhere, and which can be nicely isolated from the outside world. Locking
network-exposed applications - server processes or web browsers - into
their own little world could help to avoid a lot of security problems in a
way which seems more straightforward than SELinux or containers.
But, perhaps most interestingly, the appliance approach could eliminate a
number of distribution-compatibility issues by putting many more people
into the distribution business. Now anybody can throw together a
special-purpose distribution without having to deal with all of the
plumbing that makes the whole thing actually work. Something interesting
will certainly come of this idea, even if it's hard to say just what that
might be at the moment.
Comments (21 posted)
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Security: OLS: Smack for embedded devices; New vulnerabilities in java, python, RealPlayer, trac,...
- Kernel: Can user-space bugs be kernel regressions?; A kernel message catalog; The TALPA molehill.
- Distributions: Looking forward to Fedora 10.
- Development: The GNOME 2.24 module proposals, Django 1.0 schedule, liboggplay debut, new versions of Dirmngr, Web Submission and Review, ZK, matplotlib, GnuCash, SQL-Ledger, Player Project, pygame, a2jmidid, guitarix, pyspread, Pydev, Marathon.
- Press: LinuxWorld's OSCON report, LTSP Hackfest coverage, OLS coverage, Intuit's Linux moves, Bruce Perens on Microsoft/Apache, open source in pre-schools, Jim Zemlin on LSB 4.0, Miguel de Icaza on Mono and Moonlight, Guido van Rossum interview, improving free software usability.
- Announcements: EFF's Coders' Rights Project, EFF releases ISP testing tool, FSF demonstrates iPhone's GPLv3 incompatibility, IBM, Canonical, Novell, and Red Hat announced desktop collaboration, Git Magic, OSCON Proceedings, O'Reilly Tools of Change cfp, LF End User Summit, OO.o conf registration.
Next page:
Security>>