Eric Raymond first announced this project back in 1999: The Art of Unix Programming was to be a new book, written with help from the community, that would "attempt to explain the Zenlike 'special transmission, outside the scriptures' that distinguishes Unix gurus from ordinary mortals." More than three years later, a draft of the book is available for review.
The Art of Unix Programming is certainly not a beginner's
programming manual. It assumes, instead, that the reader is already a
competent hacker and is looking to learn more about the Unix way of doing
things. So there is a lot of talk about philosophy and history, and a
wealth of case studies. There is a lot of language like:
As with Zen art, the simplicity of good Unix code depends on
exacting self-discipline and a high level of craft, neither of
which are necessarily apparent on casual inspection. Transparency
is hard work, but worth the effort for more than merely artistic
reasons. Unlike Zen art, software requires debugging - and usually
needs continuing maintenance, forward-porting, and adaptation
throughout its lifetime. Transparency is therefore more than an
esthetic triumph, but a victory that will be reflected in lower
costs throughout the software's lifecycle.
Eric would, seemingly, like his book to be seen as a successor to the
Kernighan and Plauger classics The Elements of Programming Style and
Software Tools. This book shows some of the classic Raymond traits:
no fewer than six case studies feature fetchmail (which he wrote), and the
examples demonstrating the fortune file format are all about the evils of
But there is some good stuff in there that has not necessarily been written down before. Eric is a good writer, and he has experience in the realm he is writing about. The Art of Unix Programming is worth a read.
We asked Eric a few questions about the draft release; here are his answers.
LWN: If you could characterize the art of programming in/for Unix as
described in your book, in a single paragraph, how would you do it?
I'll do better: I'll boil it down to a single phrase. Keep it simple, stupid!
The true art of programming -- and this is something Unix guys were
arguably the first to figure out and the most consistent at applying --
is minimizing global complexity. Most of the rest of the Unix philosophy
pretty much falls out of that.
The draft as posted does not include any sort of licensing; will the final
version be available under a free license?
Yes, but I haven't decided which one. There will be some restrictions
on print reproduction, but none on electronic.
When you first announced the book project, it seemed you were planning to
put the chapters out gradually and make use of a lot of community input.
After chapter four, however (released almost exactly two years ago), things
went quiet, and the rest of the book, seemingly, was done in a "cathedral"
mode. Why is that? Did the more open approach not work out?
No, it's just that I stalled out for a long time and then gave it six
weeks of intense work. This happened after an acquisitions editor at
Addison-Wesley called me and said "Uh. Apparently you had an
agreement to do a book with my predecessor, but I can't find a
contract." There wasn't one; I have a twitch that way, I don't sign a
contract until the book is essentially complete. He successfully
nudged me into working on it again.
The book talks little about the programming of complex graphical
applications, and avoids the GNOME/KDE issue altogether. Yet one could
argue that complex applications are a big part of the future of Unix-like
systems. There is often, however, a sort of impedance mismatch between
fancy applications (think StarOffice 5) and the Unix way of doing things.
What suggestions do you have for authors of graphical applications to help
them carry forward the Unix tradition in the graphical world?
Separate policy from mechanism, because policy ages much faster than
mechanism. Separate engines from interfaces, because tangling the two
together tends to lead to unmaintainable messes. Don't give it a GUI
if it doesn't need one.
Policy-mechanism separation is a major theme in the book. It's
usually thought of in connection with X, but it can be applied a lot
more widely -- and, in fact, Unix programmers *do* apply it a lot more
widely without being really aware of the principle consciously.
(Yes, that's right, I'm doing yet another book that's
basically about conscious expression of unconscious folk practices.
This would be #3. Is there anybody left who still finds this
surprising? No? I thought not... :-))
One of the insights I got, one that's especially applicable to big
gnarly GUI applications, is that Unix programmers divide all Gaul into
three parts -- policy, mechanism, and glue. Mechanism is code that
tells how to do things, policy is code that tells what to do -- and glue
is the stuff that binds policy and mechanism together.
The punch line: glue is evil and must be destroyed, or at least minimized.
Your typical huge honkin' C++ application with classes stacked twelve
deep is an unmaintainable mess because the top two layers are policy,
the bottom two are mechanism, and the middle eight are glue. And the
trouble with glue is that it's opaque -- it impedes your ability to see
clear down through the system from the top, or clear up from the bottom.
You can't debug what you can't see through, because you can't form an
adequate mental model of its behavior.
So my advice to GUI programmers is this: Decide what's policy and
what's mechanism. Separate them cleanly -- ideally, have the GUI and
engine running in separate processes, like gv and ghostscript or
xcdroast and cdrecord. Then *ruthlessly eliminate all glue*. Or
as much of it as you can, anyway.
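To make that split concrete, here is a minimal, contrived C front end in the spirit of xcdroast driving cdrecord: all mechanism lives in a separate engine process, and the front end holds only policy plus a single pipe's worth of glue. The "render-engine" command and its -q flag are invented for this sketch.

    /* Front end holding policy only; all mechanism lives in a separate
     * engine process driven over a pipe.  "render-engine" and its -q
     * flag are hypothetical, standing in for something like cdrecord. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Policy: decide what to ask the engine to do. */
        const char *command = "render-engine -q";

        /* The only glue in the program: one pipe to the engine. */
        FILE *engine = popen(command, "r");
        if (engine == NULL) {
            perror("popen");
            return EXIT_FAILURE;
        }

        /* Present the engine's output to the user. */
        char line[256];
        while (fgets(line, sizeof(line), engine) != NULL)
            fputs(line, stdout);

        return pclose(engine) == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
    }

Because the engine is a freestanding process, the same mechanism can be driven by a GUI, a script, or a test harness, with no glue layers to see through when debugging.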
There is very little treatment of security in the book. Why is that? Is,
in your mind, security peripheral to the main art of Unix programming, or
is something else going on?
It's peripheral. This is not a book about system administration, it's
about how to design well. There's an aspect of that that has to do with
security, of course, but most of the things that make for good security
(like minimizing code that has to be trusted) are just good engineering
practice. That I *do* talk about a lot.
Unix has had a long run in the computing world, and, by all indications, it
has a while to go yet. All good things come to an end eventually,
however. What do you think might bring about the end of the Unix era, and
what might replace Unix in the future?
My money is on capability-based persistent-object systems like EROS.
But prophecy is difficult, especially about the future.
[This article was contributed by Joe 'Zonker' Brockmeier]
On Tuesday, a company called Reasoning released a study that seems to prove what Open Source developers have been saying for years: open code, and the inspection that it allows, produces a better product. Specifically, the company compared the Linux TCP/IP stack against a number of commercial TCP/IP stacks and found that the Linux implementation had fewer defects than any of the proprietary implementations.
The paper, "How Open-Source and Commercial Software Compare" is
available from Reasoning by request, so we decided to take a look at it
to see how they had reached their conclusions.
Specifically, Reasoning lined up the Linux TCP/IP implementation from
the 2.4.19 Linux kernel against five commercial implementations. In
total, out of 81,852 lines of code, Reasoning found only 8 defects in
the Linux TCP/IP code. All but one of the five commercial implementations were at least ten years old; the other was about three years old. The company did not name the specific operating
systems, but Reasoning's CEO Scott Trappe confirmed that two were
commercial Unix systems, one was "not Unix but in very broad use," and
the embedded implementations were by "major vendors of networking
equipment." Trappe said that Reasoning couldn't name companies
specifically, but the companies had agreed to let Reasoning use the
As always, it helps to understand the company doing the research, and
the context of the research, before taking the results too seriously. We
spoke with Trappe, to clarify some information not in the white paper
and to get a feel for Reasoning's background. Reasoning is a company
that specializes in automated testing of software written in C/C++,
which it has been doing since 2001. Prior to that, the company had
specialized in Y2K testing. The company plans to add testing of Java
software to its services later this year.
The study was not commissioned by any of the Linux vendors or companies
who might be competing with Linux. Instead, Trappe said that the company
had performed the study primarily to highlight its services. Unlike with the other projects Reasoning works on, the company was free to release its results along with specific code examples from the Linux TCP/IP stack. Trappe also said that the company was looking to prove that
inspection itself was important in providing quality software and that
"testing alone can never uncover all the defects in software."
The company chose the TCP/IP stack because it provided a good point of
comparison. Trappe admitted that it might be a stretch to draw too many conclusions from the study of a single piece of software, but said that the study "does support some claims that it can rival commercial quality."
Trappe also mentioned that the company may do further studies in the
future comparing Open Source software to commercial software.
The company looks for five kinds of defects in code: Memory leaks, null
pointer dereferences, bad deallocations, out-of-bounds array accesses, and
uninitialized variables. According to Trappe, none of the errors found
in the Linux TCP/IP stack were security issues. At least one of the
issues, a memory leak, was fixed in the 2.4.20 kernel before Reasoning
notified the kernel team of the defects. Four of the problems found (an
uninitialized variable and some out-of-bounds errors) are not
truly defects, since they do not cause the code to behave incorrectly.
So, of eight defects reported, four are not real, three are
debatable and one has been fixed.
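For the curious, here is what those defect classes look like in practice. The fragment below is contrived for illustration (it is not kernel code, and is meant to be read, not run), with one instance of each of the five problems a scan of this kind hunts for.

    /* Contrived examples of the five defect classes; do not execute. */
    #include <stdlib.h>
    #include <string.h>

    void defect_zoo(int n)
    {
        char *buf = malloc(64);
        strcpy(buf, "oops");    /* null pointer dereference if malloc() failed */

        int table[4];
        table[4] = n;           /* out-of-bounds access: valid indices are 0..3 */

        int uninit;
        n += uninit;            /* read of an uninitialized variable */

        char local[8];
        free(local);            /* bad deallocation: local was never malloc()ed */

        /* buf is never freed on return: memory leak */
    }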
With the revised accounting, the Linux TCP/IP stack has a defect density of 0.013 per 1,000 lines of code. The implementation with the fewest defects after Linux is one of the embedded stacks, with 0.08 defects per 1,000 lines of code. One implementation, one of the commercial operating systems, had 183 defects in about 269,100 lines of code, or roughly 0.7 per thousand.
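The arithmetic behind those densities is simple; the Linux figure is consistent with counting a single surviving defect (any small differences from the quoted numbers are presumably rounding in the white paper):

    1 defect / 81,852 lines x 1,000 ≈ 0.012 defects per 1,000 lines
    183 defects / 269,100 lines x 1,000 ≈ 0.68 defects per 1,000 lines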
To be sure, the Reasoning study raises some interesting points, though
there's not enough data to say conclusively that Open Source software is
always of higher quality than its proprietary counterparts. The study
looked only at one small piece of the Linux kernel, and only considered
a small set of information. The Linux kernel has also been extensively
checked for this sort of error by the Stanford checker and the new "smatch"
program, so it should be relatively clean.
Reasoning's study says nothing about
performance or features, and it does not address the
functionality of the code. However, it does supply some data in favor of
the argument that open code leads to higher quality -- at least in terms
of specific defects.
We'll be interested to see what kinds of studies Reasoning does in the
future, and how other Open Source projects compare to commercial code.
The Free Software Foundation has announced
that this year's winner of its Award for the Advancement of Free Software
is Lawrence Lessig - a fine choice. "FSF President and founder,
Richard Stallman, presented the award
to Professor Lawrence Lessig for promoting understanding of the
political dimension of free software, including the idea that 'code is law.'"