August 6, 2008
This article was contributed by Valerie Henson
In The Mythical Man-Month, Fred Brooks observes that the productivity
of experienced programmers frequently varies by a factor of 10 or more.
What makes the 10x programmers so much better? Undoubtedly some of
the difference is due to native facility with language or logic. But
even with these advantages, no one is born writing beautiful, elegant,
maintainable code; everyone goes through a learning process.
How do we learn to be good programmers? In many ways, the art of
computer programming is still stuck in the era of the
master-apprentice system. Some of us are lucky enough to learn to
program in something like "the UNIX room" at Bell Labs, where you
could shoulder-surf the likes
of Ken Thompson and Dennis Ritchie. Occasionally someone practices
pair-programming instead of just arguing passionately about it, and
once in a very long while, a 10x programmer will actually teach
another person how to program. Unfortunately, formal university
education rarely teaches students about the practical aspects of
programming, as any holder of a computer science degree will readily
attest, and few programmers have the time, interest, or ability to
write accessible books about programming. As a result, most
programmers are doomed to a decade of re-inventing wheels by trial and
error.
Brian Kernighan and Rob Pike are two 10x programmers who do have the
time, interest, and ability to write a book about software engineering
best practices.
The Practice of Programming aims to fill the gaps in the training of
most computer programmers. From the book:
Topics like
testing, debugging, portability, performance, design alternatives, and
style - the practice of programming - are not usually the focus of
computer science or programming courses. Most programmers learn them
haphazardly as their experience grows, and a few never learn them at
all.
This book probably won't make you ten times more productive,
but it can easily make you twice as productive (and half as
frustrated). If I could send one book to a programmer trapped on a
desert island, this would be the book - and I'd send the same book to
the new programmer who just joined my development team.
Overview
The Practice of Programming differs from most programming books in
several enjoyable ways. Rather than promoting a particular new
programming philosophy, Kernighan and Pike focus on three principles:
simplicity, clarity, and generality. As you might guess from the
title, the book is short on theory and long on practice. About one
third of the ~250 page book is taken up by actual real-world example
code, starting with the original dodgy code and showing the
step-by-step evolution to better code. Most examples are in C, but
the principles illustrated readily translate to other languages.
The writing style of this book is refreshingly practical and
down-to-earth, without losing generality. The authors avoid stark
black-and-white pronouncements, preferring to discuss why different
techniques are useful under different conditions. Clarity is another
hallmark of their style; they use as few words as possible to clearly
state each point, and dismiss trivialities and side issues quickly and
cleanly. A typical example of this approach is their advice on brace
and indentation style: "The specific style is less important than its
consistent application. Pick one style, preferably ours, use it
consistently, and don't waste time arguing."
The book is organized into nine chapters, each covering a topic such
as testing or debugging that usually requires an entire book on its
own. The table of contents includes headings like "Test as You Write
the Code," "Consistency and Idioms," "Strategies for Speed," "Other
People's Bugs," and "Programs that Write Programs." I can't cover the
whole book in this review, but I'll go into detail on two of my
favorite chapters, "Performance" and "Notation."
Performance
The introduction of this chapter gives some very direct advice: "The
first principle of optimization is don't." Computers
are fast - go run lmbench on your desktop to update your sense of
just how fast. For example, some system calls
are now in the sub-microsecond range under Linux on modern hardware.
Armchair optimization - the practice of making small theoretical
optimizations as you code, at the expense of readability, portability,
or correctness - is especially foolish in light of Donald Knuth's
observation that 4% of the code typically accounts for more than half
of the run-time of the program. Kernighan and Pike's first piece of
advice is to write simple, clear, concise code, and optimize only when
you have some tangible reason to do so.
The chapter begins with a real-world optimization problem: a
spam-filter that worked well enough in testing but bogged down in
production. The tangible reason for optimizing this program is that
the mail queues were filling up with undelivered mail - a clear
justification for optimization if there ever was one. The authors
show the process they went through to optimize the spam-filter,
step-by-step: profiling, analysis, a first attempt at optimization,
re-factoring the problem, addition of pre-computation, and measurement
of the results. This overview is welcome not only as a good
programming war story but also because the overall flow of code
optimization is non-obvious (otherwise, "How would you go about
optimizing a program?" would not be such a common interview question).
The rest of the chapter talks about best practices for each step of
optimization. The first topic is timing and profiling, as it should
be. All too often, even good programmers measure performance by
"feel" - if you don't believe me, search LKML. Sometimes no easy tool
exists to measure what is being optimized, but it's still better to
write some kind of measurement tool, no matter how clunky or
approximate. Human perception and judgment are heavily influenced by
preconceptions and the vast majority of theoretical optimizations have
negligible effects on performance. A more subtle piece of advice is
to turn performance results into pictures or graphs. Chris Mason's
seekwatcher is an excellent example; it turns block traces into
graphs - and even
movies!
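
As a concrete illustration of the "clunky but approximate" measurement
tool the authors recommend, here is a minimal timing harness in C; it
is my own sketch, not code from the book, and run_workload() is a
hypothetical stand-in for whatever operation you are trying to
measure. The plain two-column output drops straight into gnuplot or a
spreadsheet, in keeping with the advice to turn results into pictures.

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical stand-in for the code under test. */
    static void run_workload(void)
    {
        volatile double x = 0;
        for (int i = 0; i < 1000000; i++)
            x += i * 0.5;
    }

    static double now_seconds(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        /* Repeat the measurement; a single number is never enough. */
        for (int run = 0; run < 10; run++) {
            double start = now_seconds();
            run_workload();
            /* Plain columns: run number, elapsed seconds. */
            printf("%d %.6f\n", run, now_seconds() - start);
        }
        return 0;
    }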
The authors cram a surprisingly complete demonstration of profiling
into less than two pages, using prof on their spam-filter as
the example. They show how to identify hot spots and do basic sanity
checking on the results - e.g., match up the number of times a
function call shows up in the profile with the number of iterations of
the main loop. While they include some caveats on trusting profiling
results, I wish they had spent some time on the design of profiling
tools to show the kinds of biases and errors that so often make
profiling results misleading. Perhaps it's because I work on systems
software, but I've found that I really have to know whether the
profiler uses a periodic timer or hardware counters, whether it counts
time spent sleeping for I/O in the kernel, how many events get dropped
or missed, and so on. A useful technique to demonstrate, and one in
keeping with their minimalist, do-it-yourself philosophy, would be
manually bisecting the code with timers to find hot spots when normal
profiling tools fail.
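
To make that last suggestion concrete, here is one way to bisect by
hand - again my own sketch rather than the book's code, with
step_one() and step_two() standing in for hypothetical halves of the
suspect path. Wrap each half in a crude accumulating timer, see which
one dominates, then split that half again and repeat.

    #include <stdio.h>
    #include <time.h>

    static double t_first, t_second;   /* accumulated seconds per half */
    static volatile double sink;

    static double now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    /* Hypothetical first and second halves of the hot path. */
    static void step_one(void)
    {
        for (int i = 0; i < 1000; i++)
            sink += i;
    }

    static void step_two(void)
    {
        for (int i = 0; i < 5000; i++)
            sink += i * 0.5;
    }

    static void suspect_function(void)
    {
        double t = now();
        step_one();
        t_first += now() - t;

        t = now();
        step_two();
        t_second += now() - t;
    }

    int main(void)
    {
        for (int i = 0; i < 100000; i++)
            suspect_function();
        /* Whichever half dominates gets split and timed again. */
        printf("first half:  %.3f s\n", t_first);
        printf("second half: %.3f s\n", t_second);
        return 0;
    }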
The discussion on rewriting code goes beyond "find the top function
and optimize it" - it also addresses eliminating calls to hot
functions entirely and doing modest amounts of pre-computation. A
fair portion of the section on code tuning has been superseded by
improved compilers which can do, e.g., loop-unrolling automatically,
but it still teaches valuable lessons about how to read code and
understand its true cost and complexity.
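
As an illustration of the pre-computation idea - my own example, not
one from the book - consider a hot loop that lowercases every byte it
compares. Building a 256-entry table once replaces a library call per
byte with an array index:

    #include <ctype.h>
    #include <stdio.h>

    static unsigned char lower[256];   /* precomputed tolower() results */

    static void init_lower(void)
    {
        for (int c = 0; c < 256; c++)
            lower[c] = (unsigned char)tolower(c);
    }

    /* Case-insensitive string comparison using the table. */
    static int ci_equal(const char *a, const char *b)
    {
        while (*a && *b) {
            if (lower[(unsigned char)*a++] != lower[(unsigned char)*b++])
                return 0;
        }
        return *a == *b;
    }

    int main(void)
    {
        init_lower();
        printf("%d\n", ci_equal("Subject: HELLO", "subject: hello"));  /* 1 */
        return 0;
    }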
Notation
The chapter on notation unfolds elegant, beautiful solutions one by
one, turning normally painful problems into fun coding exercises.
Each technique - little languages, special-purpose notation,
programs that write programs, virtual machines - is accompanied by a
concrete demonstration of how to implement the bare minimum of the
technique to get the job done. The suggestion to "write a new
language" seems absurd in the face of most day-to-day programming
problems, but writing a very small, very specialized language can save
the programmer much time and many bugs, even when replacing only a few
hundred lines of conventional code. Their first example,
after printf() format specifiers, is a notation for packing
and unpacking network packets. I recently implemented this technique
and can report that it worked beautifully, repaying the time I
invested in it within days of completion.
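
Here is a minimal sketch of that packing notation, reconstructed in
the spirit of the book's pack() routine rather than copied from it;
the format letters (c for 8 bits, s for 16, l for 32, all big-endian)
and the sample packet layout are assumptions for the sake of
illustration:

    #include <stdarg.h>
    #include <stdio.h>

    /* pack: store values into buf according to fmt, return bytes written */
    static int pack(unsigned char *buf, const char *fmt, ...)
    {
        unsigned char *bp = buf;
        unsigned short s;
        unsigned long l;
        va_list args;

        va_start(args, fmt);
        for (const char *p = fmt; *p != '\0'; p++) {
            switch (*p) {
            case 'c':                    /* 8-bit value */
                *bp++ = (unsigned char)va_arg(args, int);
                break;
            case 's':                    /* 16-bit value, big-endian */
                s = (unsigned short)va_arg(args, int);
                *bp++ = s >> 8;
                *bp++ = s;
                break;
            case 'l':                    /* 32-bit value, big-endian */
                l = va_arg(args, unsigned long);
                *bp++ = l >> 24;
                *bp++ = l >> 16;
                *bp++ = l >> 8;
                *bp++ = l;
                break;
            default:                     /* ignore anything unrecognized */
                break;
            }
        }
        va_end(args);
        return bp - buf;
    }

    int main(void)
    {
        unsigned char buf[16];
        /* Hypothetical packet: type (8 bits), length (16), sequence (32). */
        int n = pack(buf, "csl", 0x01, 0x0010, 0xdeadbeefUL);
        printf("packed %d bytes\n", n);  /* prints "packed 7 bytes" */
        return 0;
    }

A matching unpack() walks the same format string at the receiving end,
so the wire layout is written down in exactly one place.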
Another exercise in minimalism is their demonstration of how to write
a basic grep in around 100 lines of C, without relying on
external libraries. Most of us will never need to re-implement
regular expressions from scratch, but we may encounter a problem best
solved by writing a small general purpose pattern matcher.
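
The heart of such a matcher is surprisingly small. The sketch below
supports ^, $, . and * and follows the approach the book takes; it is
condensed from memory rather than quoted, so treat it as an outline of
the technique:

    #include <stdio.h>

    static int matchhere(const char *regexp, const char *text);

    /* matchstar: search for c*regexp at beginning of text */
    static int matchstar(int c, const char *regexp, const char *text)
    {
        do {                  /* a * matches zero or more instances */
            if (matchhere(regexp, text))
                return 1;
        } while (*text != '\0' && (*text++ == c || c == '.'));
        return 0;
    }

    /* matchhere: search for regexp at beginning of text */
    static int matchhere(const char *regexp, const char *text)
    {
        if (regexp[0] == '\0')
            return 1;
        if (regexp[1] == '*')
            return matchstar(regexp[0], regexp + 2, text);
        if (regexp[0] == '$' && regexp[1] == '\0')
            return *text == '\0';
        if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
            return matchhere(regexp + 1, text + 1);
        return 0;
    }

    /* match: search for regexp anywhere in text */
    static int match(const char *regexp, const char *text)
    {
        if (regexp[0] == '^')
            return matchhere(regexp + 1, text);
        do {                  /* must look even at empty string */
            if (matchhere(regexp, text))
                return 1;
        } while (*text++ != '\0');
        return 0;
    }

    int main(void)
    {
        printf("%d\n", match("^a.*z$", "abcz"));   /* prints 1 */
        printf("%d\n", match("^a.*z$", "abc"));    /* prints 0 */
        return 0;
    }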
Another example demonstrates the power (and danger) of keeping a
variety of scripting languages and data processing tools at your
fingertips. The authors implement a crude text-only web browser with
about 50 lines of Awk, Tcl, and Perl, again using only built-in
language support and no external libraries or modules. Here as
elsewhere, Kernighan and Pike refuse to make hard and fast assertions
about the One True Scripting Language; they'd rather you used the
right language for the right job. From the book:
These languages
together are more powerful than any one of them in isolation. It's
worth breaking the job into pieces if it enables you to profit from
the right notation.
It can be argued that this approach is less
justified now, given the modern plethora of scripting languages
written specifically to address the limitations of earlier scripting
languages. However, their argument still rings true for me, as
someone who has never settled down into one scripting language. I
have a decade of experience using a hodge-podge of random scripting
languages, and when I do write in one scripting language, I end up
spending a lot of time contorting language features to fit situations
they were not designed for.
The section on virtual machines shows how to implement a minimal
special purpose virtual machine (the Z-machine for Zork comes to mind
immediately). The remaining sections cover programs that write
programs, using macros to generate code (a common technique in Linux
header files), and just a little taste of run-time code generation.
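
The macro technique deserves a quick illustration (the names below are
hypothetical, taken neither from the kernel nor from the book): write
a list once, then expand it twice, so an enum and its matching string
table can never drift out of sync.

    #include <stdio.h>

    /* The list of errors is written exactly once. */
    #define ERROR_LIST(X) \
        X(ERR_NONE,  "no error") \
        X(ERR_NOMEM, "out of memory") \
        X(ERR_IO,    "I/O error")

    /* First expansion: the enum constants. */
    #define AS_ENUM(name, text) name,
    enum error { ERROR_LIST(AS_ENUM) ERR_COUNT };

    /* Second expansion: human-readable strings, in the same order. */
    #define AS_STRING(name, text) text,
    static const char *error_text[] = { ERROR_LIST(AS_STRING) };

    int main(void)
    {
        printf("%s\n", error_text[ERR_IO]);   /* prints "I/O error" */
        return 0;
    }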
Summary
The Practice of Programming embodies its own principles: simplicity,
clarity, generality. First published in 1999, it has aged well due to
its focus on general principles of good programming rather than
language-specific tricks and tips. The book has something to offer to
programmers at all levels of experience; beginners will benefit most
but experienced developers will appreciate the more advanced and
subtle techniques in the later chapters. Of all the books on the
Kernel Hacker's Bookshelf, this one should never be missing.