By Jonathan Corbet
February 2, 2011
The Sendmail mail transfer agent tends to be one of those programs that one
either loves or hates. Both its supporters and its detractors will agree,
though, that Sendmail played a crucial role in the development of
electronic mail before, during, and after the explosion of the Internet.
Sendmail creator Eric Allman took a trip to Brisbane to talk to the LCA
2011 audience about the history of this project. Sendmail is, he said, 30 years old
now; in those three decades it has thrived without corporate support,
changed the world, and adapted to a world which was changing rapidly around
it.
The history
Sendmail had its start at the University of California, Berkeley, in 1980;
it was initially something Eric did while he was supposed to be working on
the Ingres relational database management system. In those days, the
Computer Science department had a dozen machines, but the main system was
"Ernie CoVAX," which was accessed via ASCII terminals. There was a limited
number of ports, so users had to connect via a patch panel in the mail
room; contention for available ports was often intense.
Things got more interesting when the Ingres project got an ARPAnet
connection; a single PDP-11 machine, with two ports, was the only way to
access the net at that time. There was no way the entire department was
going to share those two ports without somebody getting hurt, so another
solution was required. Eric looked at the problem, concluded that what
everybody really wanted was the ability to send mail through the gateway
machine, and decided that he would make a way to access email from other
machines on campus. From this beginning delivermail was born.
There was a set of design principles that Eric adopted at that time. There
was only one of him, so programming time was a truly finite resource.
Redesigning user agents and mail stores was out of the question.
Delivermail had to adapt to the world around it, not the other way around.
The resulting program worked, but was not without its problems. The
compiled-in configuration lacked flexibility, there was no address
translation as messages moved between networks, and the parsing was simple
and opaque. But it succeeded in moving mail around and giving the entire
department access to the net.
Then the department got the BSD contract. Bill Joy needed a mail transfer
agent to connect to the network, so he talked Eric into taking on the job.
After all, how hard could it be? Among other things, the new MTA needed to
support the SMTP mail protocol - which wasn't specified yet. Supporting
SMTP also forced the addition of a mail queue, a job which turned out to be
much harder than it looked. Eric hacked away, and Sendmail was shipped
with 4.1BSD in 1982 with support for SMTP, header rewriting, queueing, and
runtime configuration.
After that, Eric left Berkeley for a "lucrative" (heavy on the quotes)
career in industry. Sendmail, meanwhile, was picked up by the Unix
vendors. The Unix wars were in full force at that time; the
inevitable result was a proliferation of different versions of Sendmail.
The program became balkanized and incompatible across systems.
Eric returned to Berkeley in 1989 and started hacking on Sendmail again;
the immediate need was support for the ".cs" subdomain at the university.
That work snowballed into a major rewrite culminating in Sendmail 8;
this version integrated a great deal of code from both the industry and the
community. It added support for ESMTP, a number of new protocols, delivery
status notifications, LDAP integration, eight-bit mail, and a new
configuration package. Uptake increased after the Sendmail 8 release
as a result of these features, but also as the result of the publication of
the O'Reilly "bat" book. Documentation, it turns out, really matters.
Sendmail Inc. was created in 1998 with the fantasy that it would let Eric
get back to coding. In reality, starting a company is more about
marketing, sales, and money than about technology - a lesson many of us
have learned. It was one of the first companies trying to mix open source
and proprietary offerings; in those days, the prevailing wisdom was that a
company needed proprietary lock-in to have any chance of success. Over
time, though, functionality migrated to the free version; thus Sendmail
gained support for encryption, authentication, milters (mail filters),
virtual hosting, spam filtering, and more. And that's where things stand
today.
Lessons learned
As one might expect, 30 years of experience have led to a number of lessons
worth passing on. Eric shared a few of them.
One is that requirements change all the time. The original delivermail
program had reliability as its primary focus - few things are more
hazardous to one's academic career than losing a professor's grant
proposal. Over time, the requirements shifted toward functionality and
performance; Sendmail had to scale up in speed and features as the Internet
took off. Then users were demanding protection from spam and malware; that
shifted Sendmail development toward keeping mail out. We have, Eric noted,
gone full circle toward unreliable mail service. After that came
requirements around legal and regulatory compliance - that is where a great
deal of Sendmail Inc.'s business lies. There is currently an increasing
focus on controlling costs, mobility, and social network integration.
Without the ability to adapt to meet these shifting requirements, Sendmail
would not have thrived through all these years.
With regard to Sendmail's design decisions, Eric said that some turned out
to be right, some were wrong, and some were right at the time but are wrong
now. One criticism that has been made is that Sendmail is an overly
general solution; it can route and rewrite messages in ways which are
generally unneeded in these days of Internet monoculture. Eric defended
that generality by saying that the world was in great flux when Sendmail
was designed; there was no way to really know how things were going to turn
out. And, he said, he would do it again: "the world is still ugly."
Rewriting rules for addresses are a part of that generality; even at the
time, it seemed like overkill, but he couldn't come up with anything
better. It was, he said, probably the right thing to do. That said, the
decision to use tabs as active characters was the stupidest thing he has
ever done. That's how makefiles did it, and it seemed cool at the time.
As a whole, he said, the concept was right, but the syntax and flow control
could have been a lot better. Even so, he's glad he did matching based on
tokens; basing Sendmail configuration around regular expressions would have
been far worse.
If he were doing the configuration system now, it would look a lot more
like the Apache scheme.
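To make the token-versus-regular-expression point concrete, here is a minimal sketch of token-based address matching. It is a toy, not Sendmail's implementation: the operator set and the function names tokenize() and match() are assumptions made up for this example, and only the general idea of the $-, $+, and $* wildcards (exactly one token, one or more, zero or more) is borrowed from Sendmail's rule syntax.

```python
import re

# Characters treated as separators ("operators"); each becomes its own token.
# This particular set is an assumption chosen for the example.
OPERATORS = "@.!%:<>"


def tokenize(address):
    """Split an address into tokens, keeping each operator as a token."""
    pattern = "([" + re.escape(OPERATORS) + "])"
    return [tok for tok in re.split(pattern, address) if tok]


def match(pattern, tokens):
    """Match a list of pattern elements against a list of tokens.

    "$-" matches exactly one token, "$+" one or more, "$*" zero or more;
    any other element must equal the next token literally. Returns the
    token groups captured by the wildcards, or None if nothing matches.
    """
    if not pattern:
        return [] if not tokens else None
    head, rest = pattern[0], pattern[1:]
    if head in ("$-", "$+", "$*"):
        shortest = 0 if head == "$*" else 1
        longest = min(1, len(tokens)) if head == "$-" else len(tokens)
        # Try the shortest capture first and back off to longer ones.
        for n in range(shortest, longest + 1):
            tail = match(rest, tokens[n:])
            if tail is not None:
                return [tokens[:n]] + tail
        return None
    if tokens and tokens[0] == head:
        return match(rest, tokens[1:])
    return None


if __name__ == "__main__":
    print(match(["$+", "@", "$+"], tokenize("eric@cs.berkeley.edu")))
    print(match(["$-", "!", "$+"], tokenize("ucbvax!eric")))
```

Because the pattern works on whole tokens, a rule like "$+ @ $+" never has to worry about escaping dots or anchoring the expression, which is arguably where a regular-expression-based configuration would have become painful.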
The message munging feature was needed for the rewriting of headers; it
facilitated interoperability between different networks. It is still used
a lot, he said, though it's arguably not necessary. Sendmail could benefit
from a pass-through mode which shorts out the message munging, but that
leaves open the question of what should be done with non-compliant
messages. Should they be fixed, rejected, or just dropped? There is, he
said, no obvious answer.
The embedding of SMTP and queueing in the mail daemon was the right thing
to do; he does not agree with the Postfix approach of proliferating lots of
small daemons. The queue structure itself involves two files for every
message: one with the envelope, and one with the body. That forces the
system to scan large numbers of small files on a busy system, which is not
always optimal. At the time it was the right way to go; now he would
probably use some sort of database for the envelopes. The decision to use
plain text for all internal files was right, though; it makes debugging
much easier.
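The two-files-per-message layout is easy to picture with a small sketch. This is an illustration under stated assumptions, not Sendmail's queue format: the "env." and "body." file names, the "sender"/"rcpt" line format, and the enqueue() and scan_queue() helpers are all hypothetical.

```python
import os
import tempfile


def enqueue(queue_dir, msg_id, sender, recipients, body):
    """Store one message as two plain-text files: an envelope and a body."""
    env_path = os.path.join(queue_dir, "env." + msg_id)
    body_path = os.path.join(queue_dir, "body." + msg_id)
    # Write the body first so that an envelope never exists without its body.
    with open(body_path, "w") as f:
        f.write(body)
    with open(env_path, "w") as f:
        f.write("sender " + sender + "\n")
        for rcpt in recipients:
            f.write("rcpt " + rcpt + "\n")


def scan_queue(queue_dir):
    """Read every envelope file; one open() per queued message."""
    envelopes = {}
    for name in sorted(os.listdir(queue_dir)):
        if not name.startswith("env."):
            continue
        sender, recipients = None, []
        with open(os.path.join(queue_dir, name)) as f:
            for line in f:
                key, _, value = line.rstrip("\n").partition(" ")
                if key == "sender":
                    sender = value
                elif key == "rcpt":
                    recipients.append(value)
        envelopes[name[len("env."):]] = (sender, recipients)
    return envelopes


if __name__ == "__main__":
    queue = tempfile.mkdtemp()
    enqueue(queue, "0001", "a@example.org", ["b@example.org"], "hello\n")
    print(scan_queue(queue))
```

The sketch shows both sides of the tradeoff: the envelope can be inspected with any text tool, but scan_queue() has to open one small file per message, which is the cost that a database for the envelopes would avoid.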
With regard to the use of the m4 macro preprocessor for configuration, Eric
admitted that the syntax is painful. But he needed a macro facility and
didn't want to reinvent the wheel. The "damned dnl lines" for comments
were a mistake, though, and completely unnecessary. In summary, some sort
of tool was needed; m4 might not have been the best choice, but it's not
clear what would have been.
With regard to extending or changing features: Sendmail has tended toward
extending features and maintaining compatibility, and that has not always
been the right thing to do. The hostname masquerading facility was one
example; that feature was simply done wrong the first time around. Rather
than fixing it, though, Eric papered over the problems with new
features. It would have been better to inflict some short-term pain on
users, perhaps aided by a migration tool, and be done with it. The
unwillingness to replace mistaken features has a lot to do with why
Sendmail is difficult to configure.
Sendmail goes out of its way to accept and fix bogus input; that was in
compliance with the robustness principle ("be conservative in what you send
but liberal in what you accept") that was widely accepted at the time. It
increases interoperability, but at the cost of allowing broken software to
persist indefinitely, leading to large costs down the road. Nonetheless,
it was the right idea at the time for the simple reason that
everything was broken then. But he should have tightened things up
later on.
What would he have done differently? At the top of the list is trying to
fix problems as soon as possible. These include tabs in the configuration
file and the V7 mailbox format. He's really tired of seeing
">From" in messages; he said he could have fixed it and
expressed his apologies for not having taken the opportunity. He would
make more use of modern tools; Sendmail has its own build script, which is
not something he would do today. He would use more privilege separation,
though he would not go as far as Postfix. He would have made a proper
string abstraction; strings are by far the weakest part of the C language.
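For readers who have not run into the ">From" problem: in the traditional mailbox format, each message begins with a separator line starting with "From ", so a body line that starts the same way has to be altered before the message is appended. A minimal sketch of that quoting follows; the function name escape_from_lines() is made up for this example, and it shows the classic quoting convention rather than Sendmail's own code.

```python
def escape_from_lines(body):
    """Prefix '>' to any body line that starts with 'From '."""
    out = []
    for line in body.split("\n"):
        if line.startswith("From "):
            line = ">" + line
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(escape_from_lines("Hi,\nFrom here on it gets odd.\n"))
```

The quoting cannot be undone reliably, since a reader of the mailbox has no way to tell whether a ">From " line was quoted on delivery or written that way by the sender. That is why the mangled lines tend to stay in messages forever.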
There are also a number of things he would do the same, starting with the
use of C as the implementation language. It is, he said, a dangerous
language, but the programmer always knows what is going on.
Object-oriented programming, he said, is a mistake; it hides too much.
Beyond that, he would continue to do things in small chunks. The
creation of syslog (initially as a way of getting debugging
information out) was obviously the right thing to do; he was surprised that
there was no centralized way of dealing with logging data on Unix systems.
He would still implement rewriting rules, albeit with a different syntax.
And he would continue not to rely too heavily on outside tools. There is a cost to
adding dependencies on tools; sometimes it's better to just build what you
need. There are, he said, projects using lex when all they really
need is strtok().
There were a number of "takeaways" to summarize the talk:
- The KISS (keep it simple, stupid) principle works.
- If you don't know what you are doing, advance designs will
not help.
- The world is messy; just plan on it.
- Flexibility trumps performance when the world changes every day.
- Fix things early; your installed base will only get larger if
you succeed, and the pain of not fixing things will only get worse.
- Use plain text for internal files and protocols.
- Good documentation is the key to broad acceptance; most projects, he
said, have not yet figured this out.
The talk was evidently based on a chapter from an upcoming book on the
architecture of open-source applications.
One member of the audience asked Eric which MTA he would recommend for new
installations today. His possibly surprising answer was Postfix. He talked
a lot with Postfix author Wietse Venema during its creation, and was
impressed. Postfix is, he said, nice work, even if he doesn't agree with
all of the design decisions that were made.