LWN.net Weekly Edition for February 13, 2003
The Art of Unix Programming
Eric Raymond first announced this project back in 1999: The Art of Unix Programming was to be a new book, written with help from the community, that would "attempt to explain the Zenlike 'special transmission, outside the scriptures' that distinguishes Unix gurus from ordinary mortals." More than three years later, a draft of the book is available for review.
The Art of Unix Programming is certainly not a beginner's programming manual. It assumes, instead, that the reader is already a competent hacker who is looking to learn more about the Unix way of doing things. So there is a lot of talk about philosophy and history, and a wealth of case studies.
Eric would, seemingly, like his book to be seen as a successor to the Kernighan and Plauger classics The Elements of Programming Style and Software Tools. This book shows some of the classic Raymond traits: no less than six case studies feature fetchmail (which he wrote), and the examples demonstrating the fortune file format are all about the evils of gun control. But there is some good stuff in there which has not necessarily been written down before. Eric is a good writer, and he has experience in the realm he is writing about. The Art of Unix Programming is worth a look.
We asked Eric a few questions about the draft release; here are his answers.
LWN: If you could characterize the art of programming in/for Unix as described in your book, in a single paragraph, how would you do it?
ESR: I'll do better, I'll boil it down to a single phrase. Keep it simple, stupid!
The true art of programming -- and this is something Unix guys were arguably the first to figure out and the most consistent at applying -- is minimizing global complexity. Most of the rest of the Unix philosophy pretty much falls out of that.
LWN: The draft as posted does not include any sort of licensing; will the final version be available under a free license?
ESR: Yes, but I haven't decided which one. There will be some restrictions on print reproduction, but none on electronic.
LWN: When you first announced the book project, it seemed you were planning to put the chapters out gradually and make use of a lot of community input. After chapter four, however (released almost exactly two years ago), things went quiet, and the rest of the book, seemingly, was done in a "cathedral" mode. Why is that? Did the more open approach not work out?
ESR: No, it's just that I stalled out for a long time and then gave it six weeks of intense work. This happened after an acquisitions editor at Addison-Wesley called me and said "Uh. Apparently you had an agreement to do a book with my predecessor, but I can't find a contract." There wasn't one; I have a twitch that way: I don't sign a contract until the book is essentially complete. He successfully nudged me into working on it again.
LWN: The book talks little about the programming of complex graphical applications, and avoids the GNOME/KDE issue altogether. Yet one could argue that complex applications are a big part of the future of Unix-like systems. There is often, however, a sort of impedance mismatch between fancy applications (think StarOffice 5) and the Unix way of doing things. What suggestions do you have for authors of graphical applications to help them carry forward the Unix tradition in the graphical world?
ESR: Separate policy from mechanism, because policy ages much faster than mechanism. Separate engines from interfaces, because tangling the two together tends to lead to unmaintainable messes. Don't give it a GUI if it doesn't need one.
Policy-mechanism separation is a major theme in the book. It's usually thought of in connection with X, but it can be applied a lot more widely -- and, in fact, Unix programmers *do* apply it a lot more widely without being consciously aware of the principle.
(Yes, that's right, I'm doing yet another book that's basically about conscious expression of unconscious folk practices. This would be #3. Is there anybody left who still finds this surprising? No? I thought not... :-))
One of the insights I got, one that's especially applicable to big gnarly GUI applications, is that Unix programmers divide all Gaul into three parts -- policy, mechanism, and glue. Mechanism is code that tells how to do things, policy is code that tells what to do -- and glue is the stuff that binds policy and mechanism together.
The punch line: glue is evil and must be destroyed, or at least minimized. Your typical huge honkin' C++ application with classes stacked twelve deep is an unmaintainable mess because the top two layers are policy, the bottom two are mechanism, and the middle eight are glue. And the trouble with glue is that it's opaque -- it impedes your ability to see clear down through the system from the top, or clear up from the bottom. You can't debug what you can't see through, because you can't form an adequate mental model of its behavior.
So my advice to GUI programmers is this: Decide what's policy and what's mechanism. Separate them cleanly -- ideally, have the GUI and engine running in separate processes, like gv and ghostscript or xcdroast and cdrecord. Then *ruthlessly eliminate all glue*. Or as much of it as you can, anyway.
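ESR's advice can be sketched in a few lines. In this hypothetical example (the names are our own, not from the book), the frontend's only job is policy, the engine runs as a separate process -- here, sort(1) standing in for a real backend like cdrecord -- and the glue between them is reduced to a single thin pipe.

```python
# A minimal sketch of the frontend/engine split ESR describes: policy and
# mechanism in separate processes, joined by the thinnest possible glue.
import subprocess

def engine_sort(lines):
    """Mechanism: delegate the actual work to sort(1) in its own process."""
    proc = subprocess.run(
        ["sort"],
        input="\n".join(lines) + "\n",
        capture_output=True, text=True, check=True,
    )
    return proc.stdout.splitlines()

def frontend():
    """Policy: decide *what* to sort; no knowledge of how sorting is done."""
    return engine_sort(["pear", "apple", "banana"])

print(frontend())  # -> ['apple', 'banana', 'pear']
```

Because the engine is a separate process with a plain-text interface, the frontend can be replaced (GUI, curses, script) without touching the mechanism -- the gv/ghostscript pattern in miniature.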
LWN: There is very little treatment of security in the book. Why is that? Is, in your mind, security peripheral to the main art of Unix programming, or is something else going on?
ESR: It's peripheral. This is not a book about system administration, it's about how to design well. There's an aspect of that which has to do with security, of course, but most of the things that make for good security (like minimizing code that has to be trusted) are just good engineering practice. That I *do* talk about a lot.
LWN: Unix has had a long run in the computing world, and, by all indications, it has a while to go yet. All good things come to an end eventually, however. What do you think might bring about the end of the Unix era, and what might replace Unix in the future?
ESR: My money is on capability-based persistent-object systems like EROS. But prophecy is difficult, especially about the future.
Comparing free and proprietary software defect rates
[This article was contributed by Joe 'Zonker' Brockmeier]
Tuesday a company called Reasoning, Inc. released a study that seems to prove what Open Source developers have been saying for years: open code, and the inspection that it allows, produces a better product. Specifically, the company compared the Linux TCP/IP stack against a number of commercial TCP/IP stacks and found that the Linux implementation had fewer defects than the proprietary implementations. The paper, "How Open-Source and Commercial Software Compare," is available from Reasoning by request, so we decided to take a look at it to see how they had reached their conclusions.
Reasoning lined up the Linux TCP/IP implementation from the 2.4.19 Linux kernel against five commercial implementations. In total, out of 81,852 lines of code, Reasoning found only 8 defects in the Linux TCP/IP code. All but one of the five commercial implementations were at least ten years old; the remaining one was about three years old. The company did not name the specific operating systems, but Reasoning's CEO Scott Trappe confirmed that two were commercial Unix systems, one was "not Unix but in very broad use," and the embedded implementations were by "major vendors of networking equipment." Trappe said that Reasoning couldn't name companies specifically, but the companies had agreed to let Reasoning use the aggregate data.
As always, it helps to understand the company doing the research, and the context of the research, before taking the results too seriously. We spoke with Trappe, to clarify some information not in the white paper and to get a feel for Reasoning's background. Reasoning is a company that specializes in automated testing of software written in C/C++, which it has been doing since 2001. Prior to that, the company had specialized in Y2K testing. The company plans to add testing of Java software to its services later this year.
The study was not commissioned by any of the Linux vendors or companies who might be competing with Linux. Instead, Trappe said that the company had performed the study primarily to highlight its services. Unlike its other projects, Reasoning was free to release its results along with specific code examples from the Linux TCP/IP stack. Trappe also said that the company was looking to prove that inspection itself was important in producing quality software and that "testing alone can never uncover all the defects in software."
The company chose the TCP/IP stack because it provided a good point of comparison. Trappe admitted that it might be a stretch to draw broad conclusions from the study of one piece of software, but said that their study "does support some claims that it can rival commercial quality." Trappe also mentioned that the company may do further studies in the future comparing Open Source software to commercial software.
The company looks for five kinds of defects in code: memory leaks, null pointer dereferences, bad deallocations, out-of-bounds array accesses and uninitialized variables. According to Trappe, none of the errors found in the Linux TCP/IP stack were security issues. At least one of the issues, a memory leak, was fixed in the 2.4.20 kernel before Reasoning notified the kernel team of the defects. Four of the problems found (an uninitialized variable and some out-of-bounds errors) are not truly defects, since they do not cause the code to behave incorrectly. So, of eight defects reported, four are not real, three are debatable and one has been fixed.
Taking the revised information into account, the Linux TCP/IP stack has a defect density of 0.013 per 1,000 lines of code. The implementation with the fewest defects after Linux is one of the embedded stacks, with 0.08 defects per 1,000 lines of code. One implementation, one of the commercial OSes, had 183 defects out of about 269,100 lines of code -- 0.7 per 1,000.
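The arithmetic behind these figures is easy to check. A quick sketch using the article's numbers (counting only the one confirmed Linux defect; the study's own 0.013 figure presumably reflects slightly different counting):

```python
# Back-of-the-envelope check of the quoted defect densities,
# expressed as defects per 1,000 lines of code.
def defects_per_kloc(defects, lines):
    return defects / (lines / 1000)

# Linux TCP/IP stack: one confirmed defect in 81,852 lines.
linux = defects_per_kloc(1, 81_852)

# Worst performer quoted: a commercial OS with 183 defects in ~269,100 lines.
worst = defects_per_kloc(183, 269_100)

print(f"{linux:.3f}")  # ~0.012, close to the article's 0.013
print(f"{worst:.1f}")  # -> 0.7
```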
To be sure, the Reasoning study raises some interesting points, though there is not enough data to say conclusively that Open Source software is always of higher quality than its proprietary counterparts. The study looked at only one small piece of the Linux kernel, and only at a narrow set of defect types. The Linux kernel has also been extensively checked for this sort of error by the Stanford checker and the new "smatch" program, so it should be relatively clean. Reasoning's study says nothing about performance or features, and it does not address the functionality of the code. However, it does supply some data in favor of the argument that open code leads to higher quality -- at least in terms of specific defects.
We'll be interested to see what kinds of studies Reasoning does in the future, and how other Open Source projects compare to commercial code.
Lawrence Lessig wins FSF Award
The Free Software Foundation has announced that this year's winner of its Award for the Advancement of Free Software is Lawrence Lessig - a fine choice. "FSF President and founder, Richard Stallman, presented the award to Professor Lawrence Lessig for promoting understanding of the political dimension of free software, including the idea that 'code is law'."
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Security: Disk encryption; new vulnerabilities in hypermail, PostgreSQL, w3m...
- Kernel: Two new I/O schedulers; driver porting; accessing BitKeeper repositories without bk
- Distributions: News from Debian, Gentoo, Mandrake, Red Hat and Slackware - Plus Shabdix
- Development: TownPortal 0.1, JACK 0.50.0, PostgreSQL 7.3.2, Mailman 2.1.1, ESP Ghostscript 7.05.6, mod_security 1.4.2, xKit 1.6.1, POE 0.25, Sweep 0.8.1, Mozilla 1.3 beta, Kopete 0.6, GSview 4.31 beta, Samba 2.2.8pre1, GnuCash 1.8.1, LyX 1.3.0, OpenOffice.org 1.0.2 Beta SDK, GNU Midnight Commander 4.6.0.
- Press: MS .Net patent threat, Banks using Linux, Sam's Club $300 Linux PC, Pixar goes to Linux, Dennis Ritchie interview, SPI elections.
- Announcements: Debian joins Desktop Linux Consortium, OpenOffice.org Conference 2003, GUADEC 2003, FSF meeting, UCITA withdrawn from the ABA.
- Letters: TRACE flaw