
Sendmail X

Sendmail has a difficult reputation. It is the canonical example of how large, complex programs are subject to security problems. It has a configuration file format which makes obfuscated Perl code seem highly readable by comparison. Its performance when dealing with large amounts of mail is held to be inferior. One could, of course, point out that sendmail's security problems appear to be mostly behind it, that few people ever have to look at the raw configuration file, and that sendmail was a cherished gift, once upon a time, to anybody who had ever tried to convince delivermail to route a message along a uucp bang path, by way of the Arpanet, from a CSNet node. For all of its blemishes, sendmail has been a crucial and valuable part of the network's infrastructure for many years.

After all those years, however, sendmail may just be due for a major upgrade. As it turns out, work on the next generation of sendmail, called sendmail X, has been under way for some time. Some early code has been made available; sendmail X 0.0.16 is available from this page. Do note that it is billed as "pre-alpha" code; using it on a server which handles real mail is probably not a good idea.

A lengthy design document for sendmail X is available; it gives some insight into what the next version of sendmail will look like. The first impression that comes out is that sendmail X will be so different that one wonders why the "sendmail" name is being used at all. Sendmail X is a completely new mail transfer agent, redesigned and rewritten from the beginning.

As is the norm for contemporary MTA design, sendmail X is implemented as a set of (relatively) small, cooperating processes. The system is divided in this way:

  • The queue manager is the core of sendmail X; its job is to manage messages as they move through the system, make delivery decisions, etc.

  • The SMTP server accepts incoming mail from the net and passes it to the queue manager. Actually, the queue manager is involved throughout the SMTP conversation; it is consulted on whether to accept the connection in the first place, and it may have actually delivered the mail before the text is acknowledged.

  • The SMTP client passes mail on to other systems for delivery.

  • The address resolver is charged with understanding - and rewriting - recipient addresses. This process also handles DNS blacklisting and other types of address-based filtering.

  • The master control program gets all of the other processes going and handles termination, restarts, and crash recovery. This program is actually derived from the BSD inetd source.
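The address resolver's DNS blacklisting relies on a simple, widely used convention: to check an IPv4 client, the octets of its address are reversed and prepended to the blacklist's zone, and the resulting name is looked up. The C sketch below builds that query name. It is not taken from sendmail X; the function name dnsbl_qname and the zone dnsbl.example are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/*
 * Build the DNS name used to check an IPv4 client against a DNS
 * blacklist: the address octets in reverse order, then the zone.
 * Hypothetical helper, not sendmail X code.
 * Returns 0 on success, -1 for a malformed address or a too-small buffer.
 */
int dnsbl_qname(const char *ipv4, const char *zone, char *out, size_t outlen)
{
    struct in_addr addr;
    unsigned char o[4];
    int n;

    if (inet_pton(AF_INET, ipv4, &addr) != 1)
        return -1;                      /* not a dotted-quad IPv4 address */
    memcpy(o, &addr.s_addr, 4);         /* network byte order: o[0] is the first octet */

    n = snprintf(out, outlen, "%u.%u.%u.%u.%s",
                 (unsigned)o[3], (unsigned)o[2], (unsigned)o[1], (unsigned)o[0], zone);
    if (n < 0 || (size_t)n >= outlen)
        return -1;                      /* result would have been truncated */
    return 0;
}
```

For 192.0.2.1 and the zone dnsbl.example this yields 1.2.0.192.dnsbl.example. A resolver would then look up an A record for that name; by the usual convention, an answer in 127.0.0.0/8 means the client is listed.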

In addition, there will be a collection of local delivery agents, mail filter processes, etc.
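Because the queue manager is consulted at each step of the SMTP conversation, the SMTP server and the queue manager must constantly exchange small requests and verdicts. Below is a minimal sketch of such an exchange over a Unix-domain socket pair, assuming a fixed-size binary message. The article does not describe sendmail X's actual protocol, so the message layout, the names qmgr_channel, smtpd_send, smtpd_verdict and qmgr_serve_one, and the toy rejection policy are all hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical message format, not sendmail X's real protocol. */
enum { QMGR_CONNECT = 1, QMGR_RCPT = 2 };    /* request types */
enum { QMGR_ACCEPT = 0, QMGR_REJECT = 1 };   /* verdicts */

struct qmgr_req {
    uint32_t type;      /* QMGR_CONNECT or QMGR_RCPT */
    char     arg[64];   /* client address or recipient, NUL-terminated */
};

/* Loop until exactly len bytes arrive; stream sockets may split messages. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;          /* error, or the peer closed the channel */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

static int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* A control process would create this pair before starting both sides. */
int qmgr_channel(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
}

/* SMTP server side: describe one event to the queue manager. */
int smtpd_send(int fd, uint32_t type, const char *arg)
{
    struct qmgr_req req;
    memset(&req, 0, sizeof req);
    req.type = type;
    strncpy(req.arg, arg, sizeof req.arg - 1);   /* last byte stays NUL */
    return write_full(fd, &req, sizeof req);
}

/* SMTP server side: block until the verdict arrives; returns it, or -1. */
int smtpd_verdict(int fd)
{
    uint32_t verdict;
    return read_full(fd, &verdict, sizeof verdict) == 0 ? (int)verdict : -1;
}

/* Queue manager side: answer one request using a toy policy. */
int qmgr_serve_one(int fd)
{
    struct qmgr_req req;
    uint32_t verdict = QMGR_ACCEPT;

    if (read_full(fd, &req, sizeof req) != 0)
        return -1;
    req.arg[sizeof req.arg - 1] = '\0';          /* never trust the peer's terminator */
    if (req.type == QMGR_CONNECT && strcmp(req.arg, "192.0.2.66") == 0)
        verdict = QMGR_REJECT;                   /* hypothetical blocked client */
    return write_full(fd, &verdict, sizeof verdict);
}
```

In a real deployment the two ends would live in separate processes; the blocking read in smtpd_verdict is what lets the queue manager hold up a connection until it has decided.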

Much thought has been given to performance, to the point that some may wonder whether premature optimization is going on. For example, the SMTP server has been designed to use an Apache-style model, in which multiple processes exist, each of which runs several server threads. This design will certainly add complexity to the server, but few sites are likely to benefit from the associated performance increase.
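The Apache-style layout described above can be sketched as a parent that forks several worker processes, each of which starts several threads. This is a generic illustration of the pattern, not sendmail X code; the function names are hypothetical, and the thread body is a stub where a real server would loop on accept() over a listening socket inherited from the parent. On older glibc versions it must be compiled with -pthread.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_THREADS 64

/* Stub for a server thread; a real one would loop on accept(). */
static void *server_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/* Runs in each child: start up to nthreads threads, return how many ran. */
static int run_child(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    int started = 0, i;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, server_thread, NULL) == 0)
            started++;
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started;
}

/*
 * Fork nprocs worker processes, each running nthreads threads.
 * Returns the total number of threads that ran across all workers.
 */
int run_workers(int nprocs, int nthreads)
{
    int spawned = 0, total = 0, i, status;

    for (i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                       /* stop forking; still reap the rest */
        if (pid == 0)
            _exit(run_child(nthreads));  /* exit status carries the count (0..64) */
        spawned++;
    }
    /* Reap every child and add up the per-process thread counts. */
    for (i = 0; i < spawned; i++)
        if (wait(&status) > 0 && WIFEXITED(status))
            total += WEXITSTATUS(status);
    return total;
}
```

The mix is a trade-off: a crash takes down only one process's threads, while threads within a process are cheaper than a process per connection. The price is the extra complexity the article mentions.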

System administrators will be glad to know that the sendmail.cf configuration file is gone. Sendmail X will use a C-like configuration syntax, similar to that used by BIND. Configuration of real-world mail systems will, perhaps, never be an entirely simple task, but sendmail X should be easier to set up than its predecessor.

Unsurprisingly, security is said to be a core design goal. The multi-process design is clearly motivated by security concerns, though the relatively high level of interaction between these processes may complicate things. The qmail design, for example, has a far lower level of interaction and trust between its components - though that approach leads to problems of its own. There are no setuid programs in sendmail X. It is necessary to run the master control program as root; it then handles any privileged tasks that it can before starting the subsidiary processes under a different user ID. Thus, for example, it binds to the SMTP port before starting the SMTP server. Since the master control program does not actually handle mail or communicate with the outside, it should be relatively hard to compromise.
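The privilege handling described above follows a common pattern: acquire the privileged resource (here, a listening socket) as root, then permanently switch to an unprivileged user before touching any network input. The article has the master control program bind the port and then start the SMTP server under a different user ID; this minimal C sketch collapses both steps into one hypothetical function, bind_then_drop, and assumes the target uid and gid belong to a dedicated account chosen by the administrator.

```c
#define _DEFAULT_SOURCE          /* for setgroups() on glibc */
#include <grp.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/*
 * Bind a listening TCP socket while still privileged, then permanently
 * drop to (uid, gid). Returns the listening fd, or -1 on failure.
 * Groups are dropped before the uid: once the uid is no longer 0,
 * the process can no longer change its groups.
 */
int bind_then_drop(unsigned short port, uid_t uid, gid_t gid)
{
    struct sockaddr_in sa;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);  /* best effort */

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    sa.sin_port = htons(port);          /* 25 for SMTP, which requires root */
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(fd, 128) < 0)
        goto fail;

    if (geteuid() == 0 && setgroups(0, NULL) < 0)   /* only root may do this */
        goto fail;
    if (setgid(gid) < 0 || setuid(uid) < 0)
        goto fail;

    /* Paranoia: after dropping to a non-root uid, regaining root must fail. */
    if (uid != 0 && setuid(0) == 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

Passing port 0 lets the kernel choose an ephemeral port, so the function can be exercised without root; with port 25 it must be called as root.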

The code consists of almost 600 C files. In some ways it resembles the qmail code; it has many short files with reimplementations of many functions normally found in the C library. A special string type is used to avoid buffer overruns. A casual look suggests that the code really is being written with security in mind. That much new code is sure to have a surprise or two in it somewhere, however.
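A length-counted string type of the kind mentioned above can be sketched in a few lines of C. This is a generic illustration, not sendmail X's actual string type; struct bstr and its functions are hypothetical names. The key property is that every write checks the remaining capacity, so oversized input is rejected rather than written past the end of the buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Length-counted string with a fixed capacity (hypothetical type). */
struct bstr {
    size_t len;     /* bytes in use, not counting the trailing NUL */
    size_t cap;     /* usable bytes, not counting the trailing NUL */
    char  *data;    /* always NUL-terminated, so it can be passed to C APIs */
};

struct bstr *bstr_new(size_t cap)
{
    struct bstr *s;

    if (cap == (size_t)-1)
        return NULL;                /* cap + 1 would overflow */
    s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->data = malloc(cap + 1);
    if (s->data == NULL) {
        free(s);
        return NULL;
    }
    s->len = 0;
    s->cap = cap;
    s->data[0] = '\0';
    return s;
}

/* Append n bytes; returns 0, or -1 (string unchanged) if they do not fit. */
int bstr_append(struct bstr *s, const char *src, size_t n)
{
    if (n > s->cap - s->len)        /* cannot underflow: len <= cap always */
        return -1;
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}

int bstr_append_cstr(struct bstr *s, const char *src)
{
    return bstr_append(s, src, strlen(src));
}

void bstr_free(struct bstr *s)
{
    if (s != NULL) {
        free(s->data);
        free(s);
    }
}
```

Callers still have to handle the -1 return, but a forgotten check now loses data instead of corrupting memory.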

The author of sendmail X (Claus Aßmann) claims to have been running it since the beginning of the year without losing any mail. Even so, it will probably be some time before it is put forward as a viable option for production sites. What happens then will be interesting. Sendmail X will be jumping into an environment where several other options exist and are in wide use. The MTA ecosystem has, over the years, gone from being a single-program monoculture to a diverse field with several alternatives. Sendmail X will have to be significantly better than those alternatives, and much better than sendmail 8, to be widely successful in that environment.

(Thanks to Xose Vazquez Perez for drawing our attention to this project).



Sendmail X

Posted Nov 11, 2004 6:11 UTC (Thu) by aspa (guest, #4299)

The high-level architecture resembles Postfix. Well, if it wasn't invented here and you like reinventing the wheel, then I guess it makes a lot of sense.

Sendmail X

Posted Nov 11, 2004 18:03 UTC (Thu) by alq666 (guest, #11220)

Indeed, it's very odd that someone would consider rewriting sendmail a good 5 years after Postfix became available.

The Postmaster project in Haskell looks much more fun to hack.

Sendmail X

Posted Nov 11, 2004 11:34 UTC (Thu) by minichaz (subscriber, #630)

Why write something like this in C?

Don't get me wrong: C is great for low-level stuff but I don't see why a mail server would need to be written in such a potentially dangerous and difficult language. Perhaps a few of the performance critical parts but not the whole thing.

(This isn't a flame, really.) :)

Charlie

Try Postmaster, an MTA written in Haskell?

Posted Nov 11, 2004 14:48 UTC (Thu) by shapr (subscriber, #9077)

Postmaster is an MTA written in Haskell - http://postmaster.cryp.to/
If you're familiar with Haskell, Postmaster is quite a powerful tool.

It's fun to compare a Haskell implementation with a C implementation.
Postmaster's included libraries are RFCs 2234, 2821, and 2822, DNS, syslog, and openssl. Other than that, Postmaster is a single 28k file, 867 lines with whitespace and comments. Minus lines that are only comment or whitespace, Postmaster.hs is only 532 lines!
I wonder how other MTAs compare in terms of source code size.
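Counts like the ones quoted above can be reproduced roughly with a few lines of C that skip blank lines and lines consisting only of a Haskell line comment. This sketch is an approximation: it does not recognise {- ... -} block comments, so it may not match the poster's figures exactly, and count_lines is a hypothetical name.

```c
#include <ctype.h>
#include <string.h>

/*
 * Count all lines in text, and separately the lines that are neither
 * blank nor only a Haskell "--" line comment. Block comments are ignored.
 */
void count_lines(const char *text, long *total, long *code)
{
    const char *p = text;
    *total = *code = 0;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        const char *end = eol ? eol : p + strlen(p);
        const char *q = p;

        while (q < end && isspace((unsigned char)*q))
            q++;                                  /* skip leading whitespace */
        (*total)++;
        if (q < end && !(end - q >= 2 && q[0] == '-' && q[1] == '-'))
            (*code)++;                            /* line has non-comment content */
        p = eol ? eol + 1 : end;
    }
}
```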

Try Postmaster, an MTA written in Haskell?

Posted Nov 11, 2004 17:41 UTC (Thu) by bronson (subscriber, #4806)

I wonder how Postmaster compares to other MTAs in terms of functionality... Not well it would appear. And that config file looks just painful to administer in the real world.

Try Postmaster, an MTA written in Haskell?

Posted Nov 11, 2004 18:26 UTC (Thu) by dcoutts (subscriber, #5387)

> And that config file looks just painful to administer in the real world.

That's not a configuration file format, that's a programming language! It's a full-featured, mature functional programming language.

I guess the pain factor depends on how complicated your setup is. For a simple setup a programming language as a config file is probably overkill, but for a highly customised setup it's probably much simpler than some ordinary, inexpressive config description format. I hear that sendmail is pretty configurable and its config file format can get pretty hairy. :-)

Try Postmaster, an MTA written in Haskell?

Posted Nov 11, 2004 22:51 UTC (Thu) by cdmiller (subscriber, #2813)

Our Qmail config files total only 11 lines...

Why C?

Posted Nov 11, 2004 19:29 UTC (Thu) by ncm (subscriber, #165)

Good question. It's easy to write code that is proof against buffer overruns in C++, with no cost to performance. Furthermore, the standard library probably has the data structures you need, implemented with much greater care than you could afford to do yourself.

It's hard to imagine why anything other than a kernel module is written in raw C any more. It's easy, though, to see why someone wouldn't want a long-running daemon with a garbage collector.

Why not Garbage Collection?

Posted Nov 11, 2004 19:50 UTC (Thu) by shapr (subscriber, #9077)

There are many incremental garbage collection algorithms; why would they be unsuited for a long-running daemon?

I've used quite a few long-running processes (like webservers) that use generational garbage collection (Haskell, GHC). They work fine.
Java uses GC, and it has some success with long-running daemons also.
Python uses reference-counting, which I think counts as GC, and it also has some success with long-running daemons.

Does GC tend to leak memory or something? What disadvantages do you know about that I might not?
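For comparison with tracing collectors, plain reference counting of the kind mentioned above can be sketched in C: each holder increments a count when it keeps a pointer and decrements it when done, and the last release frees the object. The names are hypothetical, and the sketch is not thread-safe (a shared object would need atomic updates).

```c
#include <stdlib.h>

struct obj {
    unsigned refs;
    char payload[32];   /* stand-in for real data */
};

struct obj *obj_new(void)
{
    struct obj *o = calloc(1, sizeof *o);
    if (o != NULL)
        o->refs = 1;    /* the creator holds the first reference */
    return o;
}

/* Record one more holder and return the same pointer for convenience. */
struct obj *obj_retain(struct obj *o)
{
    o->refs++;
    return o;
}

/* Drop one reference; returns the remaining count (0 means freed). */
unsigned obj_release(struct obj *o)
{
    unsigned left = --o->refs;
    if (left == 0)
        free(o);
    return left;
}
```

Plain reference counting never frees objects that refer to each other in a cycle; CPython adds a separate cycle collector for that case.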

Why C?

Posted Nov 12, 2004 15:44 UTC (Fri) by dps (subscriber, #5725)

It is claimed that

>It's easy to write code that is proof against buffer overruns in C++, with
>no cost to performance. Furthermore, the standard library probably has the
>data structures you need, implemented with much greater care than you
>could afford to do yourself.

This is *completely* wrong in my experience. I cannot write C++ code with the same efficiency as C code. In particular, C++ std::string wastes cycles on copies or reference counts. In my experience C++ requires things like reference counts if you want to avoid nasty surprises (something goes out of scope somewhere and disappears prematurely).

You can get smaller and faster code by making sure every persistent object has an owner, who is responsible for freeing it or passing it on to someone else. This is not much more complex either, unless your design is broken.

I avoid the C++ algorithms and data structures because how they are implemented is too vague for my taste: I choose algorithms and data structures based on the data and how my program uses it. The implementation really does matter, despite what the OO people claim.
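The single-owner discipline described above can be sketched in C with an explicit transfer step: putting a message into a queue hands ownership to the queue, and taking it out clears the queue's slot and hands ownership back to the caller, who must then free it. The struct and function names are hypothetical, chosen only for illustration.

```c
#include <stdlib.h>
#include <string.h>

struct msg {
    char id[16];
};

struct queue {
    struct msg *slot[8];    /* the queue owns whatever is in its slots */
};

#define NSLOTS (sizeof ((struct queue *)0)->slot / sizeof ((struct queue *)0)->slot[0])

/* Transfer ownership of m into the queue; -1 if full (caller keeps m). */
int queue_put(struct queue *q, struct msg *m)
{
    for (size_t i = 0; i < NSLOTS; i++) {
        if (q->slot[i] == NULL) {
            q->slot[i] = m;
            return 0;
        }
    }
    return -1;
}

/* Transfer ownership out of the queue; the caller must free the result. */
struct msg *queue_take(struct queue *q, const char *id)
{
    for (size_t i = 0; i < NSLOTS; i++) {
        struct msg *m = q->slot[i];
        if (m != NULL && strcmp(m->id, id) == 0) {
            q->slot[i] = NULL;      /* the queue no longer owns it */
            return m;
        }
    }
    return NULL;
}

/* The queue frees only what it still owns. */
void queue_destroy(struct queue *q)
{
    for (size_t i = 0; i < NSLOTS; i++) {
        free(q->slot[i]);
        q->slot[i] = NULL;
    }
}
```

No counts are kept: at any moment exactly one party holds each pointer, so there is never a question of who frees it.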

Sendmail X

Posted Nov 11, 2004 20:34 UTC (Thu) by einstein (subscriber, #2052)

I switched to postfix a year or so ago, and haven't looked back. However
this does look interesting. Perhaps the end result will be something like
postfix, but with a functional "sendmail -bv" command?

One can hope.

Sendmail X

Posted Nov 12, 2004 22:11 UTC (Fri) by hildeb (subscriber, #6532)

Postfix has this. Try sendmail -bt.

Sendmail X

Posted Nov 12, 2004 16:45 UTC (Fri) by salex (subscriber, #4814)

> sendmail was a cherished gift, once upon a time, to anybody who had
> ever tried to convince delivermail to route a message along a uucp
> bang path, by way of the Arpanet, from a CSNet node.

Cherished gift is a bit strong. I hacked delivermail to add new addressing formats, and it certainly could have used a few more comments. Nonetheless, sendmail was like a general fighting the previous war. Within a year of its introduction in a BSD release, techniques like UUCP routing gateways had gotten rid of most of the ugliness in addressing. The % address hack tended to take care of the rest. Yet we were left (for the next 20 years so far) with a configuration file format that maintained such generality that many administrators had only two options: find someone else's configuration file that was close enough (reducing the benefits of generality) or risk losing mail.

Even when it was released, my recollection is that it was met more with questions about why we needed such a bloated, difficult system than welcomed as a cherished gift.

Best,
Scott

Sendmail X

Posted Nov 15, 2004 21:57 UTC (Mon) by Ross (subscriber, #4065)

Not to mention the configuration syntax: intentionally terse because it was
also the internal representation which was reparsed constantly. That's a bad
design decision. You could argue that RAM was expensive in those days but
even so a real configuration syntax could have been pre-processed into
another on-disk format with similar density. They kind of retrofitted this
later with the m4 templates but they aren't that great.

Sendmail X

Posted Nov 26, 2004 10:26 UTC (Fri) by BlueBird (guest, #26263)

And it does not look like it is going to improve. How could "something similar to C" be seen as a good configuration language?

History shows us that csh, for example, has been far less successful than bash and sh. Or take emacs: its configuration file is in Lisp, it is very difficult to customise, and it is common practice to "steal" someone else's .emacs in order to get a feature working.

In my opinion, there are only two choices possible for a configuration file:
- configuration is quite simple: then you should go for key/value pairs, like in .desktop files, like in .ini files (it was not invented by Microsoft, don't worry), like in lilo.conf, etc. This is simple to parse, simple to modify, simple to generate, simple to analyse. If you need a more advanced structure, you can add sections to separate entries. It scales very well (but should a configuration file really scale?)

- configuration is really complex: web servers and mail servers come to my mind. In that case, you should go for xml. It is easy to modify, quite easy to generate, verify, analyse, parse and it can support complexity.
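The key/value approach suggested above is easy to parse, which is part of its appeal. Below is a minimal C sketch of a line parser for an .ini-style format with [section] headers and # or ; comments; the format details and the names trim and kv_parse_line are assumptions for illustration, not any particular program's syntax.

```c
#include <ctype.h>
#include <string.h>

/* Strip leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    char *end;
    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return s;
}

/*
 * Parse one line in place. Returns 0 for a blank or comment line,
 * 1 for "[section]" (name in *key), 2 for "key = value" (*key and
 * *value set), and -1 for anything else. Separators become NULs.
 */
int kv_parse_line(char *line, char **key, char **value)
{
    char *s = trim(line);
    char *eq;

    if (*s == '\0' || *s == '#' || *s == ';')
        return 0;                       /* blank or comment */
    if (*s == '[') {
        char *close = strchr(s, ']');
        if (close == NULL || close[1] != '\0')
            return -1;                  /* malformed section header */
        *close = '\0';
        *key = trim(s + 1);
        return 1;
    }
    eq = strchr(s, '=');
    if (eq == NULL || eq == s)
        return -1;                      /* no '=' or empty key */
    *eq = '\0';
    *key = trim(s);
    *value = trim(eq + 1);              /* later '=' characters stay in the value */
    return 2;
}
```

For example, a line "listen = 0.0.0.0:25" in a hypothetical [smtpd] section yields the key listen and the value 0.0.0.0:25.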

Anything else is reinventing the wheel and creating a situation where fewer programs and fewer people will be able to deal with your configuration files.

Sendmail X

Posted Nov 29, 2004 14:07 UTC (Mon) by mwilck (guest, #1966)

Isn't XML just an obfuscated LISP dialect? Basic LISP constructs should be understandable to anyone who understands XML.

.emacs isn't hard because the format is difficult, but because it is so powerful, and because the available options are poorly documented. If you rewrote .emacs in XML with the same functionality and documentation, I bet people would still rather "steal" it than rewrite it from scratch.

Sendmail X or Postfix ?

Posted Nov 24, 2004 22:04 UTC (Wed) by adulau (guest, #1131)

The design detailed is clearly very similar to Postfix.

http://ftp.easynet.be/postfix/OVERVIEW.html

If you have time, take a look at the source code of Postfix and you will
see how we should write software. Comments are good and well-balanced, structure is clean and simple, every exception is detailed...

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds