LCA: Lessons from 30 years of Sendmail

By Jonathan Corbet
February 2, 2011
The Sendmail mail transfer agent tends to be one of those programs that one either loves or hates. Both its supporters and its detractors will agree, though, that Sendmail played a crucial role in the development of electronic mail before, during, and after the explosion of the Internet. Sendmail creator Eric Allman took a trip to Brisbane to talk to the LCA 2011 audience about the history of this project. Sendmail is, he said, 30 years old now; in those three decades it has survived without corporate support, changed the world, and thrived in a world which was changing rapidly around it.

The history

Sendmail had its start at the University of California, Berkeley, in 1980; it was initially something Eric did while he was supposed to be working on the Ingres relational database management system. In those days, the Computer Science department had a dozen machines, but the main system was "Ernie CoVAX," which was accessed via ASCII terminals. There was a limited number of ports, so users had to connect via a patch panel in the mail room; contention for available ports was often intense.

Things got more interesting when the Ingres project got an ARPAnet connection; a single PDP-11 machine, with two ports, was the only way to access the net at that time. There was no way the entire department was going to share those two ports without somebody getting hurt, so another solution was required. Eric looked at the problem, concluded that what everybody really wanted was the ability to send mail through the gateway machine, and decided that he would make a way to access email from other machines on campus. From this beginning delivermail was born.

There was a set of design principles that Eric adopted at that time. There was only one of him, so programming time was a truly finite resource. Redesigning user agents and mail stores was out of the question. Delivermail had to adapt to the world around it, not the other way around. The resulting program worked, but was not without its problems. The compiled-in configuration lacked flexibility, there was no address translation as messages moved between networks, and the parsing was simple and opaque. But it succeeded in moving mail around and giving the entire department access to the net.

Then the department got the BSD contract. Bill Joy needed a mail transfer agent to connect to the network, so he talked Eric into taking on the job. After all, how hard could it be? Among other things, the new MTA needed to support the SMTP mail protocol - which wasn't specified yet. Supporting SMTP also forced the addition of a mail queue, a job which turned out to be much harder than it looked. Eric hacked away, and Sendmail was shipped with 4.1BSD in 1982 with support for SMTP, header rewriting, queueing, and runtime configuration.

After that, Eric left Berkeley for a "lucrative" (heavy on the quotes) career in industry. Sendmail, meanwhile, was picked up by the Unix vendors. The Unix wars were in full force at that time; the inevitable result was a proliferation of different versions of Sendmail. The program became balkanized and incompatible across systems.

Eric returned to Berkeley in 1989 and started hacking on Sendmail again; the immediate need was support for the ".cs" subdomain at the university. That work snowballed into a major rewrite culminating in Sendmail 8; this version integrated a great deal of code from both the industry and the community. It added support for ESMTP, a number of new protocols, delivery status notifications, LDAP integration, eight-bit mail, and a new configuration package. Uptake increased after the Sendmail 8 release as a result of these features, but also as the result of the publication of the O'Reilly "bat" book. Documentation, it turns out, really matters.

Sendmail Inc. was created in 1998 with the fantasy that it would let Eric get back to coding. In reality, starting a company is more about marketing, sales, and money than about technology - a lesson many of us have learned. It was one of the first companies trying to mix open source and proprietary offerings; in those days, the prevailing wisdom was that a company needed proprietary lock-in to have any chance of success. Over time, though, functionality migrated to the free version; thus Sendmail gained support for encryption, authentication, milters (mail filters), virtual hosting, spam filtering, and more. And that's where things stand today.

Lessons learned

As one might expect, 30 years of experience have led to a number of lessons worth passing on. Eric shared a few of them.

One is that requirements change all the time. The original delivermail program had reliability as its primary focus - few things are more hazardous to one's academic career than losing a professor's grant proposal. Over time, the requirements shifted toward functionality and performance; Sendmail had to scale up in speed and features as the Internet took off. Then users were demanding protection from spam and malware; that shifted Sendmail development toward keeping mail out. We have, Eric noted, gone full circle toward unreliable mail service. After that came requirements around legal and regulatory compliance - that is where a great deal of Sendmail Inc.'s business lies. There is currently an increasing focus on controlling costs, mobility, and social network integration. Without the ability to adapt to meet these shifting requirements, Sendmail would not have thrived through all these years.

With regard to Sendmail's design decisions, Eric said that some turned out to be right, some were wrong, and some were right at the time but are wrong now. One criticism that has been made is that Sendmail is an overly general solution; it can route and rewrite messages in ways which are generally unneeded in these days of Internet monoculture. Eric defended that generality by saying that the world was in great flux when Sendmail was designed; there was no way to really know how things were going to turn out. And, he said, he would do it again: "the world is still ugly."

Rewriting rules for addresses are a part of that generality; even at the time, they seemed like overkill, but he couldn't come up with anything better. They were, he said, probably the right thing to do. That said, the decision to use tabs as active characters was the stupidest thing he has ever done. That's how makefiles did it, and it seemed cool at the time. As a whole, he said, the concept was right, but the syntax and flow control could have been a lot better. Even so, he's glad he did matching based on tokens; basing Sendmail configuration around regular expressions would have been far worse.

If he were doing the configuration system now, it would look a lot more like the Apache scheme.

The message munging feature was needed for the rewriting of headers; it facilitated interoperability between different networks. It is still used a lot, he said, though it's arguably not necessary. Sendmail could benefit from a pass-through mode which shorts out the message munging, but that leaves open the question of what should be done with non-compliant messages. Should they be fixed, rejected, or just dropped? There is, he said, no obvious answer.

The embedding of SMTP and queueing in the mail daemon was the right thing to do; he does not agree with the Postfix approach of proliferating lots of small daemons. The queue structure itself involves two files for every message: one with the envelope, and one with the body. That forces the system to scan large numbers of small files on a busy system, which is not always optimal. At the time it was the right way to go; now he would probably use some sort of database for the envelopes. The decision to use plain text for all internal files was right, though; it makes debugging much easier.
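
As a rough sketch of that two-file layout (the directory, the file-name prefixes, and the envelope fields below are invented for illustration; they are not Sendmail's actual queue-file conventions), queueing a message in this style amounts to writing a small plain-text envelope file alongside a body file:

/* Illustrative sketch only, not Sendmail code: one plain-text envelope
 * file and one body file per queued message.  The directory and the
 * "env."/"body." prefixes are invented for this example. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

static int queue_message(const char *dir, const char *id,
                         const char *sender, const char *recipient,
                         const char *body)
{
    char path[1024];
    FILE *fp;

    mkdir(dir, 0700);   /* harmless if the directory already exists */

    /* Envelope file: plain text, so it can be inspected with a text editor. */
    snprintf(path, sizeof(path), "%s/env.%s", dir, id);
    if ((fp = fopen(path, "w")) == NULL)
        return -1;
    fprintf(fp, "sender: %s\nrecipient: %s\n", sender, recipient);
    fclose(fp);

    /* Body file: the message itself, kept separate from the envelope. */
    snprintf(path, sizeof(path), "%s/body.%s", dir, id);
    if ((fp = fopen(path, "w")) == NULL)
        return -1;
    fputs(body, fp);
    fclose(fp);

    return 0;
}

int main(void)
{
    if (queue_message("/tmp/mqueue-demo", "AA001",
                      "alice@example.org", "bob@example.com",
                      "Subject: hi\n\nHello, world.\n") != 0) {
        perror("queue_message");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

A busy queue in such a layout means a directory full of small files that must be opened and scanned on every queue run, which is exactly the cost described above; the plain-text format, on the other hand, means any of those files can be read or repaired with nothing more than a text editor.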

With regard to the use of the m4 macro preprocessor for configuration, Eric admitted that the syntax is painful. But he needed a macro facility and didn't want to reinvent the wheel. The "damned dnl lines" for comments were a mistake, though, and completely unnecessary. In summary, some sort of tool was needed; m4 might not have been the best choice, but it's not clear what would have been.

With regard to extending or changing features: Sendmail has tended toward extending features and maintaining compatibility, and that has not always been the right thing to do. The hostname masquerading facility was one example; that feature was simply done wrong the first time around. Rather than fixing it, though, Eric papered over the problems with new features. It would have been better to inflict some short-term pain on users, perhaps aided by a migration tool, and be done with it. The unwillingness to replace mistaken features has a lot to do with why Sendmail is difficult to configure.

Sendmail goes out of its way to accept and fix bogus input; that was in compliance with the robustness principle ("be conservative in what you send but liberal in what you accept") that was widely accepted at the time. It increases interoperability, but at the cost of allowing broken software to persist indefinitely, leading to large costs down the road. Nonetheless, it was the right idea at the time for the simple reason that everything was broken then. But he should have tightened things up later on.

What would he have done differently? At the top of the list is trying to fix problems as soon as possible. These include tabs in the configuration file and the V7 mailbox format. He's really tired of seeing ">From" in messages; he said he could have fixed it and expressed his apologies for not having taken the opportunity. He would make more use of modern tools; Sendmail has its own build script, which is not something he would do today. He would use more privilege separation, though he would not go as far as Postfix. He would have made a proper string abstraction; strings are by far the weakest part of the C language.

There are also a number of things he would do the same, starting with the use of C as the implementation language. It is, he said, a dangerous language, but the programmer always knows what is going on. Object-oriented programming, he said, is a mistake; it hides too much. Beyond that, he would continue to do things in small chunks. The creation of syslog (initially as a way of getting debugging information out) was obviously the right thing to do; he was surprised that there was no centralized way of dealing with logging data on Unix systems. He would still implement rewriting rules, albeit with a different syntax. And he would continue not to rely too heavily on outside tools. There is a cost to adding dependencies on tools; sometimes it's better to just build what you need. There are, he said, projects using lex when all they really need is strtok().
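
As a small illustration of that last point (this is not Sendmail code, just the kind of job where strtok() is enough and a parser generator would be overkill), splitting a command-like line into tokens takes only a few lines of C:

/* Illustrative only: tokenizing a line with strtok() instead of
 * reaching for lex.  Not taken from Sendmail. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* strtok() writes NUL bytes into its argument, so it needs a
     * writable buffer rather than a string literal. */
    char line[] = "MAIL FROM:<alice@example.org> SIZE=1024";
    const char *separators = " :<>=";

    for (char *tok = strtok(line, separators);
         tok != NULL;
         tok = strtok(NULL, separators)) {
        printf("token: %s\n", tok);
    }
    return 0;
}

The in-place modification is the usual caveat with strtok(); for many small configuration and protocol lines, though, that is the whole cost, and no extra tool or grammar is needed.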

There were a number of "takeaways" to summarize the talk:

  • The KISS (keep it simple, stupid) principle works.
  • If you don't know what you are doing, advance designs will not help.
  • The world is messy, just plan on it.
  • Flexibility trumps performance when the world changes every day.
  • Fix things early; your installed base will only get larger if you succeed, and the pain of not fixing things will only get worse.
  • Use plain text for internal files and protocols.
  • Good documentation is the key to broad acceptance; most projects, he said, have not yet figured this out.

The talk was evidently based on a chapter from an upcoming book on the architecture of open-source applications.

One member of the audience asked Eric which MTA he would recommend for new installations today. His possibly surprising answer was Postfix. He talked a lot with Postfix author Wietse Venema during its creation, and was impressed. Postfix is, he said, nice work, even if he doesn't agree with all of the design decisions that were made.



LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 16:48 UTC (Wed) by fuhchee (subscriber, #40059) [Link]

Did Eric mention the security problems in sendmail history?

Security

Posted Feb 2, 2011 17:17 UTC (Wed) by corbet (editor, #1) [Link]

Nope, no mention of that. It occurred to me to ask a security-related question, but, by then, it was too late.

LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 21:20 UTC (Wed) by dlang (✭ supporter ✭, #313) [Link]

it is worth noting that the vast majority of those problems were prior to the rewrite for version 8.

LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 21:34 UTC (Wed) by eisenbud (subscriber, #13153) [Link]

Seriously? Anyone who ran sendmail in the mid to late '90s can't help but remember the "Sendmail bug of the month club." Remote root exploits all the time. This was well into the sendmail 8 days. This was a big incentive for things like qmail and postfix to start from scratch with a much better security architecture.

Any look at "lessons from 30 years of sendmail" that doesn't include the terrible security story isn't serious at all.

LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 21:46 UTC (Wed) by dlang (✭ supporter ✭, #313) [Link]

I started using sendmail in the mid 90's and by that time it wasn't significantly worse than all the other software on the systems (linux and AIX) that I was running.

LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 22:13 UTC (Wed) by nix (subscriber, #2304) [Link]

To be blunt its security history is not terrible for a C program, particularly not for an old one. There've been only two or three actually exploitable sendmail bugs in the thirteen years I've been running it. In comparison there have been hundreds and hundreds of wireshark bugs, kernel bugs, bind bugs, you-name-it bugs. Even exim seems to have had more holes recently than sendmail (and it hasn't had many). BIND's security history is much, much worse, despite a similar rewrite-for-security, and BIND is much more critical to Internet function and much more exposed to the whole wide world than sendmail is.

LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 22:54 UTC (Wed) by HelloWorld (subscriber, #56129) [Link]

> To be blunt its security history is not terrible for a C program
Yet another reason not to use C. For anything.

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 0:32 UTC (Thu) by dskoll (subscriber, #1630) [Link]

Yet another reason not to use C. For anything.

OK, sure. Avoiding C magically fixes security problems. Not.

Avoiding C greatly reduces the risk of certain security problems (buffer-overflow, stack smashing) assuming the non-C language is implemented securely. It does nothing about other security problems like race conditions, unsafe /tmp files, incorrect input sanitization (eg, SQL injection problems), etc, etc....

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 1:09 UTC (Thu) by HelloWorld (subscriber, #56129) [Link]

> OK, sure. Avoiding C magically fixes security problems.
Did I claim this was the case? No, I didn't. But it does help, and not just because more modern languages are memory-safe.

> Avoiding C greatly reduces the risk of certain security problems (buffer-overflow, stack smashing) assuming the non-C language is implemented securely. It does nothing about other security problems like race conditions, unsafe /tmp files, incorrect input sanitization (eg, SQL injection problems), etc, etc....
Yeah, except that it does. Modern languages actually do help with these problems. A sanitized string is basically a subtype of an unsanitized string: you can use it everywhere where an unsanitized string can be used, but you can't use an unsanitized string where a sanitized string is required. Too bad C's type system doesn't support subtyping. Similarly, race conditions are much less likely in a language that supports threading in a sensible way. The Rust language for example forces all inter-thread communication to be explicit; they use a concept named "channel" for this.
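
A rough sketch of the idea described in the comment above, written in C only because C is what this thread is about: C has no subtyping, so two distinct wrapper types stand in for it, and the type names and the toy escaping rule are invented for illustration. The point is just that the compiler refuses to pass raw input where sanitized input is required.

/* Crude illustration of carrying "sanitized" in the type: the compiler
 * rejects passing a RawString where a SanitizedString is expected.
 * C has no subtyping, so these are just distinct wrapper types, and
 * both the names and the filtering rule are invented for this example. */
#include <stdio.h>
#include <stddef.h>

typedef struct { const char *s; } RawString;       /* untrusted input   */
typedef struct { const char *s; } SanitizedString; /* validated/escaped */

/* The only way to obtain a SanitizedString is through this function. */
static SanitizedString sanitize(RawString in, char *buf, size_t len)
{
    size_t j = 0;
    for (size_t i = 0; in.s[i] != '\0' && j + 1 < len; i++) {
        if (in.s[i] != '\'' && in.s[i] != ';')   /* toy filtering rule */
            buf[j++] = in.s[i];
    }
    buf[j] = '\0';
    SanitizedString out = { buf };
    return out;
}

static void run_query(SanitizedString arg)
{
    printf("SELECT * FROM users WHERE name = '%s'\n", arg.s);
}

int main(void)
{
    RawString user_input = { "alice'; DROP TABLE users; --" };
    char buf[128];

    /* run_query(user_input);  would not compile: wrong type */
    run_query(sanitize(user_input, buf, sizeof buf));
    return 0;
}

A language with real subtyping (or cheap newtype-style wrappers) makes this pattern much lighter; the C version above only approximates it.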

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 4:44 UTC (Thu) by wahern (subscriber, #37304) [Link]

In C there's a concept called `FILE' for this. In Unix (C, shell) it's called `pipe'. `Threads' are processes where all `inter-thread' communication must be done through one of those devices. Shared-memory parallel programming with manual synchronization was popularized by Windows and C++ coders. Why? Because `processes' were too heavy weight, too slow, and too cumbersome... kinda like the excuses people use when they don't want to use a scripting language.

C doesn't have a sophisticated typing system, but that has little to do with the issue of not being able to write a "sanitized string" type. Just the idea of sanitized string sounds wrong to me. If it's not an ASCII, NUL-terminated array of characters, then it's not properly called a string in C terminology.

Type abstraction is often the root cause of security bugs. For example, you could treat a password as a sub-type of string. But strings as commonly understood almost universally support the concept of truncation. But if you truncate a password horrible security repercussions result. So why would you want to treat it like a string at all?

Strings also usually support the notion of concatenation. So take HTML. You could keep an HTML document as a string, but HTML has a hierarchical structure, and prepending or appending data to a complete document results in garbage.

Just exclude those methods, one says. Well (1) the fact that you must exclude already exposes you to mishaps and bugs the same as forgetting to bounds check an operation in C, and (2) what are you left with when you exclude all the unnecessary and possibly dangerous operations? Not much.

Buffer overflows in C usually are the result of people trying to treat everything like a string; they want to slurp data into a string or array, and then manipulate it in that form. That's a horrible way to write software, whether in C or any other language. It just so happens that if you do it in C you're susceptible to more attacks than if you do it in, say, Java; but the solution isn't to write the bad code in Java; it's to stop writing that kind of code at all.

Using the best language for the particular job also helps. Writing parsers has always been easier for me to do in C because of pointers, and the ability to write very concise state machines. A parser is really a way to consume a string, character by character, and transform it into some other structure. So when people tell me that handling strings in C is more difficult, I don't know how to respond. It's certainly more difficult to juggle and manipulate strings in C. But if that's how you're processing string input in any language, you're probably doing it wrong. If I'm parsing an e-mail message, I'll construct a tree of objects by consuming a stream of characters. I may store the message, or parts of it, as a character "string", but only as an immutable object that I never need to manipulate; outputting it later, if necessary. So I rarely care about the difficulty of manipulating strings in C, because I rarely need to do that.

I tend to use general purpose scripting languages for things _other_ than string processing, like executing complex rules or transformations of structures of objects. For still other things, domain specific languages are preferable.

Of course, if all you want to do is hack out a script to process some data (as Perl is popular for), then have at it. But don't fool yourself that your script is any more secure than if written in C. There are probably several times more remote execution bugs in scripting language built applications than C applications, just because of improper use of strings.
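
As a toy example of the character-at-a-time, state-machine style described in the comment above (it is not taken from any real mail software), the following C fragment streams "Name: value" header lines from stdin and splits them without ever accumulating and slicing large strings:

/* Toy state machine in the consume-a-character-stream style described
 * above; not taken from any real MTA.  It echoes header lines from
 * stdin with a tab between the field name and its value. */
#include <stdio.h>

enum state { IN_NAME, IN_VALUE };

int main(void)
{
    enum state st = IN_NAME;
    int c;

    while ((c = getchar()) != EOF) {
        switch (st) {
        case IN_NAME:
            if (c == ':') {
                putchar('\t');          /* name/value boundary */
                st = IN_VALUE;
            } else {
                putchar(c);
            }
            break;
        case IN_VALUE:
            putchar(c);
            if (c == '\n')              /* end of the header line */
                st = IN_NAME;
            break;
        }
    }
    return 0;
}

The input never has to be held in memory as one big string to be manipulated, which is the point being made.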

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 10:00 UTC (Thu) by Thomas (subscriber, #39963) [Link]

"For example, you could treat a password as a sub-type of string. But strings as commonly understood almost universally support the concept of truncation. But if you truncate a password horrible security repercussions result. So why would you want to treat it like a string at all?"

You got it the wrong way round. Strings [a concatenation of bytes] can support truncation but don't have to.

Cheers,
T.

String manipulation bugs

Posted Feb 6, 2011 19:08 UTC (Sun) by man_ls (guest, #15091) [Link]

(2) what are you left with when you exclude all the unnecessary and possibly dangerous operations? Not much.
Immutable strings. As seen in Java, Python or Lua. Safe, flexible, and only occasionally slow enough to use other options. If you remove the main cause for the most common security bug, and nobody complains, then in my book that is a good decision.
There are probably several times more remote execution bugs in scripting language built applications than C applications, just because of improper use of strings.
String manipulation bugs I can live with. Security holes are unacceptable. A language where every bug must be considered a security bug is too hard for me.

LCA: Lessons from 30 years of Sendmail

Posted Feb 12, 2011 23:07 UTC (Sat) by ofranja (subscriber, #11084) [Link]

"Type abstraction is often the root cause of security bugs. For example, you could treat a password as a sub-type of string. But strings as commonly understood almost universally support the concept of truncation. But if you truncate a password [...]"

I think you wanted to say LACK of abstraction.

If password is not exactly a string, you should have created a "password" type with proper operations and associated semantics.

Do not ever consider "C" as an example of a "complete type system", unless you also consider a Ford T a modern vehicle.

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 21:05 UTC (Thu) by dskoll (subscriber, #1630) [Link]

Did I claim this was the case? No, I didn't. But it does help, and not just because more modern languages are memory-safe.

They also tend to be slower (often a lot slower) and more memory-hungry. Unfortunately, if you want the ultimate in performance on a UNIX-like system, you're stuck with assembly, C or C++. Assembly is clearly the wrong tool for most applications, and IMO C++ is a monstrosity that adds a ton of garbage to C without really making it any safer.

I can see writing most things in safer languages. But your MTA has to be fast and efficient, which is why we don't see many (any?) widely-used MTAs written in anything but C or C++.

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 21:34 UTC (Thu) by foom (subscriber, #14868) [Link]

> But your MTA has to be fast and efficient

I'd dispute that: I bet you could write an MTA in *Ruby* and it would be perfectly good on today's computers for all but the largest sites. There's not that much mail volume, and computers have gotten a lot faster...

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 21:44 UTC (Thu) by dskoll (subscriber, #1630) [Link]

I bet you could write an MTA in *Ruby* and it would be perfectly good on today's computers for all but the largest sites.

Go for it. Report back when done.

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 22:06 UTC (Thu) by foom (subscriber, #14868) [Link]

No thanks. I have no interest in writing a new MTA, nor in writing any Ruby code. :)

But I will point out that many large mailing list installations run on mailman, which is written completely in Python. That seems to be performant enough.

And spam filtering of email content is frequently done with spamassassin -- written in Perl.

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 22:40 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

Mailman is not an MTA.

it manages the list of users that the MTA will send mail to (a low overhead activity)

it validates messages sent to the list (it handles each message once, before it gets multiplied by potentially several orders of magnitude as it's sent out)

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 11:08 UTC (Fri) by paravoid (subscriber, #32869) [Link]

Have a look at Lamson, then.

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 19:42 UTC (Fri) by dskoll (subscriber, #1630) [Link]

From the "About Lamson" page:

However, as great as Lamson is for processing email intelligently, it isn’t the best solution for delivering mail. There is 30+ years of SMTP lore and myth stored in the code of mail servers such as Postfix and Exim that would take years to replicate and make efficient. Being a practical project, Lamson defers to much more capable SMTP servers for the grunt work of getting the mail to the final recipient.

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 0:17 UTC (Sat) by dskoll (subscriber, #1630) [Link]

And spam filtering of email content is frequently done with spamassassin -- written in Perl.

Indeed so. We use Perl (including SpamAssassin) to filter our mail.

In terms of memory size, SpamAssassin on our system is about 70MB per instance vs. about 9MB for Sendmail. (To be fair, we use a lot of other Perl modules apart from SpamAssassin in our filter.) And when it comes to performance, SpamAssassin is so slow relative to Sendmail that Sendmail becomes completely negligible. Bolting anything in Perl onto Sendmail is like towing a five-ton truck with a motorcycle. :)

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 0:43 UTC (Sat) by foom (subscriber, #14868) [Link]

Yes. Perl is an order of magnitude slower than C. Furthermore, spam filtering is inherently a harder job than mail routing. Yet, Spamassassin is *still* fast enough! Thus my claim: MTAs don't need to be written in C.

But writing a new MTA from scratch now is pretty pointless, no matter what language it's in.

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 17:13 UTC (Sat) by dskoll (subscriber, #1630) [Link]

Thus my claim: MTAs don't need to be written in C.

That's probably true, 99% of the time. But again, MTA authors tend to worry a lot about performance and tend to write their software to cope with huge amounts of mail. I believe that's the correct approach because even a small site can suddenly get a huge spike in traffic for various reasons (eg, a spammer does a massive joe-job spam run.) You don't really want your email to fall over.

LCA: Lessons from 30 years of Sendmail

Posted Feb 26, 2011 13:07 UTC (Sat) by job (guest, #670) [Link]

Spam filtering is very different from general mail routing. You might find a better example in qpsmtpd.

It is (almost) pure Perl and in use at quite a few large installations including the Apache project.

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 21:58 UTC (Thu) by HelloWorld (subscriber, #56129) [Link]

Assembly is clearly the wrong tool for most applications [...]But your MTA has to be fast and efficient, which is why we don't see many (any?) widely-used MTAs written in anything but C or C++.
When C was invented, compilers were primitive, and it was trivial to produce assembly code that would run faster than the equivalent C code. Yet, people decided in the 1970s to port the single most performance-critical piece of code - the UNIX kernel - to C, because that made it portable and generally easier and more pleasant to program. Nowadays, machines are faster by orders of magnitude, and there are actually quite a few safe programming languages that allow one to produce efficient code: Go, D, ATS and many others. More are being developed, such as Rust. Yet, people nowadays refuse to accept a negligible overhead over C (say, 20%, which is the goal set by the Go language), and this seems just bizarre to me. I believe that the main reason for sticking with C is simply inertia. "We've always done it that way!"

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 23:38 UTC (Thu) by dskoll (subscriber, #1630) [Link]

Yet, people nowadays refuse to accept a negligible overhead over C (say, 20%, which is the goal set by the Go language), and this seems just bizarre to me. I believe that the main reason for sticking with C is simply inertia. "We've always done it that way!"

I don't think that's the reason for sticking with C (it's not my reason, anyway.) Here are my reasons:

  • I have a lot of experience with C and I like it. There's certainly something to be said for familiarity.
  • C is old and well-tested. The C standardization committee has done a superb job of keeping C true to the original C spirit while adding useful improvements.
  • There are many C libraries and tools available, far more than for newer languages.
  • In some cases, a 20% performance hit isn't worth it. An MTA on a busy mail system is one of those cases. While it's true that most mail systems are not that busy, MTA authors rightly attempt to make their MTA as fast and reliable as possible.
  • MTAs in particular are an old and mostly solved problem. There's really no incentive for Yet Another MTA. Witness the extremely slow progress on Meta1. (Yeah, I knew you'd never heard of it. :))

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 2:51 UTC (Fri) by HelloWorld (subscriber, #56129) [Link]

I have a lot of experience with C and I like it. There's certainly something to be said for familiarity.
I don't mean to insult you, but this is exactly the mindset I mean. It's just another way of saying "because I always did it that way", really.
C is old and well-tested.
Being well-tested helps against bugs. It doesn't help one bit against design mistakes (except if you change the design, but that never happened to a meaningful extent for the C language (I guess that's what you meant with "keeping C true to the original C spirit"))
There are many C libraries and tools available, far more than for newer languages.
Any decent language has a foreign function interface for calling C functions. And at least some of them, like D, make it absolutely trivial to create bindings.
In some cases, a 20% performance hit isn't worth it. An MTA on a busy mail system is one of those cases. While it's true that most mail systems are not that busy, MTA authors rightly attempt to make their MTA as fast and reliable as possible.
No, they don't. C won't give you the fastest possible result, assembly will (portability is not an excuse, use a portable assembly language like LLVM assembly). Yet, people seem to think that C is somehow the be-all and end-all of programming languages when it comes to performance - it's preposterous!

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 3:19 UTC (Fri) by dskoll (subscriber, #1630) [Link]

I don't mean to insult you, but this is exactly the mindset I mean.

Meh... everyone tends to like tools he/she is familiar with. That's just human nature. Maybe some Perl or Ruby fanatics will eventually write an MTA in their language...

Any decent language has a foreign function interface for calling C functions.

But... but... then you're back in dangerous territory, no?

No, they don't. C won't give you the fastest possible result, assembly will (portability is not an excuse, use a portable assembly language like LLVM assembly)

OK, now I know you're just arguing for the sake of argument. :) That's totally ridiculous and you know it.

Anyway... go ahead. Write a UNIX MTA in something other than C or C++ and see how it fares. The proof is in the pudding.

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 12:59 UTC (Fri) by james (subscriber, #1325) [Link]

Using qpsmtpd for traps.spamassassin.org describes how SpamAssassin had been using Postfix on a donated server for their spamtraps.

Unsurprisingly, they were getting a lot of spam on that server, and postfix was having trouble keeping up. So they switched to qpsmtpd, "a flexible smtpd daemon written in Perl" which uses "Danga Interactive / Six Apart’s insanely scalable event-driven asynchronous socket class".

Justin reports: "results have been great… we now have a pure-perl system handling heavy volumes without breaking a sweat, certainly compared to the previous system."

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 16:07 UTC (Fri) by dskoll (subscriber, #1630) [Link]

I read that article. The problem with the Postfix solution was not Postfix, but the fact that they were filtering from a procmail-invoked Perl script. That's a huge fail.

We run our Perl spam-scanner in conjunction with Sendmail, but we use the Milter interface to keep persistent Perl scanners running. The bottleneck in this system is by far the spam-scanning; Sendmail doesn't even show up on the radar.

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 1:23 UTC (Sat) by HelloWorld (subscriber, #56129) [Link]

Meh... everyone tends to like tools he/she is familiar with. That's just human nature.
This is another way of saying "I do it that way because I always did, but that's OK because everybody else does too.".
But... but... then you're back in dangerous territory, no?
Using a C library and writing the rest of the program in another language is still better than writing it all in C.
OK, now I know you're just arguing for the sake of argument. :) That's totally ridiculous and you know it
It's not, you just didn't understand my point. You said that MTA authors attempt to make their software as efficient as possible, which, together with the fact that most MTAs are written in C, implies that C is the language you can write the most efficient software in. But this simply isn't the case: C is just one spot among many, many, many others on the efficiency-vs.-comfort curve. More specifically, it is not on the performance maximum of that curve, since that's where assembly language is. Yet, many (most?) people seem to believe that C somehow hit the perfect spot for "systems" programming. Even if that was true in 1970, people should start getting comfortable with the idea that it's not any longer.

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 1:51 UTC (Sat) by mjg59 (subscriber, #23239) [Link]

Yet, despite C++ having been available for so long, there's still a distinct lack of widely-used operating systems written in it. Good system programmers tend to be the ones familiar with the kernel underlying their code, and that means having to have a good grasp of C regardless of what you'd prefer to code in. There's a natural selection pressure in favour of C even if there are arguably better choices.

(The first significant codebase I worked on was C++, and I've probably still written more Perl than I have C)

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 17:10 UTC (Sat) by dskoll (subscriber, #1630) [Link]

But this simply isn't the case: C is just one spot among many, many, many others on the efficiency-vs.-comfort curve. More specifically, it is not on the performance maximum of that curve, since that's where assembly language is.

It's not a smooth curve. The transition from C to assembly involves a *huge* increase in the difficulty curve with a relatively modest increase in the efficiency curve. C is a just-high-enough/just-low-enough level language to hit a sweet spot in efficiency-vs-difficulty.

But you already know this and are just arguing for the sake of it.

LCA: Lessons from 30 years of Sendmail

Posted Feb 6, 2011 1:37 UTC (Sun) by HelloWorld (subscriber, #56129) [Link]

> But you already know this and are just arguing for the sake of it.
No, I merely disagree with you. C doesn't sit in a sweet spot, as there are many, many useful features that could be added without compromising efficiency, like templates, modules, proper macros and many more. Anyway, discussing this with you is obviously pointless by now.

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 7:34 UTC (Fri) by yoe (subscriber, #25743) [Link]

I find it interesting that you detract from C as an unsafe language by advocating for assembler.

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 23:45 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

I find it interesting that you detract from C as an unsafe language by advocating for assembler.

He didn't advocate for assembler. It was reductio ad absurdum -- if fast is the only consideration, you would use assembler. Since you don't use assembler, fast is not the only consideration.

Since fast is not the only consideration, maybe you should consider safety. Or lower development cost (with which you can probably pay for the extra hardware you need compared to the C or assembler implementation at the same speed).

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 2:07 UTC (Sat) by nybble41 (subscriber, #55106) [Link]

Actually, even if speed is your only consideration you are still likely to be better off using C rather than assembler where possible. A decent compiler can generate better-optimized code from C source than all but the best assembler programmers could write by hand.

That is not to say that there are no other relevant considerations, but other languages are not necessarily any safer or easier to develop in than C; it varies depending on the project. Sometimes there really is no other viable alternative. For example, the only other realistic systems programming language I know of is D, which (while otherwise a great language) is not as portable as C (yet), nor nearly as well established so far as libraries and tool support are concerned.

LCA: Lessons from 30 years of Sendmail

Posted Feb 26, 2011 13:16 UTC (Sat) by job (guest, #670) [Link]

You are being inflammatory. The choice of programming language is clearly not the most important part of secure coding. Otherwise I'd expect you to run a Python based sshd on your system, no?

C is good because it is trivial and easy to read when done properly. Token expansion or resource exhaustion won't creep up on you from the underlying libraries.

There are of course other aspects to it. Given the choice between a C daemon and a Python one, without knowing the program I too would probably pick the latter one, but it all boils down to prejudice.

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 13:05 UTC (Thu) by nix (subscriber, #2304) [Link]

Of course buffer overflows are most of the security holes we've had in sendmail. BIND seems to be about evenly divided between that and DoS attacks (and I defy you to find a language that can avoid *those*.)

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 2:15 UTC (Thu) by bronson (subscriber, #4806) [Link]

Not this again. Do you never tire of pointless language arguments?

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 3:00 UTC (Thu) by djm (subscriber, #11651) [Link]

> Yet another reason not to use C. For anything.

Enjoy your world without C, due to arrive right after hydrogen-powered flying cars and free robot concubines.

C for anything

Posted Feb 3, 2011 4:10 UTC (Thu) by ncm (subscriber, #165) [Link]

No, he's right. Wright Flyers are grounded, trucks have entirely supplanted horse-drawn wagons, LCDs drove out CRTs, and does anybody use the x86 instruction set any... um.

C++ is clearly the better choice. You have to keep the Java monkeys away, though, or you'll spend your life writing Get and Set functions, and calls to them. Fortunately Java monkeys are more attracted to Ruby and other four-letter languages.

(This is not meant to suggest that all Java programmers are monkeys. You, in particular, are the exception.)

C for anything

Posted Feb 6, 2011 15:22 UTC (Sun) by malor (subscriber, #2973) [Link]

Trust me on this: no, I'm not. :-)

Getters and setters

Posted Feb 6, 2011 23:30 UTC (Sun) by man_ls (guest, #15091) [Link]

You have to keep the Java monkeys away, though, or you'll spend your life writing Get and Set functions, and calls to them.
That comment is a bit cruel, but it's right on target. I have been wondering for the last 5 years what all those getters and setters were buying us. On one hand you were supposed to use them instead of public attributes because you could change the way to access the particular attribute later on. On the other, you were not supposed to do weird things in getters or setters since it might be considered as a bad practice. Even simple encapsulation of another object can be frowned upon in certain circles, and it certainly does not help understand the code.

I turned to public attributes a few years ago and never regretted it. You know, I'm not against the occasional access method, and separating methods with side effects from methods that return values is an excellent practice. But mandating a whole level of indirection just in case, just for the sake of it? Why?

Then I found out that not everything is an object, and I stopped wondering. My Python skills have improved since I understood that simple truth.

Getters and setters

Posted Feb 11, 2011 12:26 UTC (Fri) by mattthecat (guest, #72858) [Link]

The monkeys are those who use Java that write get and set functions.
The few of us that recognize Java without the added c**p are not monkeys.

"Don't got to use no stinkin' getters or setters"

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 6:30 UTC (Thu) by cmccabe (guest, #60281) [Link]

There's nothing wrong with C. On the other hand, there's a lot of things wrong with writing a daemon that doesn't have proper privilege separation.

P.S. Higher level languages are vulnerable to a variety of attacks that C isn't. For example, eval-based attacks or SQL injection attacks. The solution to these problems is the same: validate user inputs carefully, and structure your application into different components that communicate by message passing, rather than a single giant blob.
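
A minimal sketch of the privilege-separation point, assuming a Unix-like system; the port number and the "nobody" account are arbitrary examples, and a real daemon would go on to accept connections and hand them to per-connection workers:

/* Minimal privilege-separation sketch: keep root only long enough to
 * bind a privileged port, then drop to an unprivileged user before
 * touching any untrusted input.  Port and user name are arbitrary. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pwd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(25);                 /* needs root to bind */

    if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        perror("bind/listen");
        return EXIT_FAILURE;
    }

    /* Drop privileges before doing anything with untrusted data. */
    struct passwd *pw = getpwnam("nobody");
    if (pw == NULL || setgid(pw->pw_gid) != 0 || setuid(pw->pw_uid) != 0) {
        perror("privilege drop failed, refusing to continue");
        return EXIT_FAILURE;
    }

    /* From here on, accept() and parsing run as an ordinary user, so a
     * parsing bug no longer hands out root. */
    printf("listening on port 25 as uid %d\n", (int)getuid());
    return 0;
}

The property that matters is that the code which parses untrusted input never runs with the privileges that were needed for the bind().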

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 20:59 UTC (Sat) by leoc (subscriber, #39773) [Link]

IMHO the only decent solution to poor programs is to build better programmers who are not ignorant and don't take shortcuts. But of course good programmers and good code cost more time (and money) which is why the market does not select for those traits.

LCA: Lessons from 30 years of Sendmail

Posted Feb 10, 2011 8:22 UTC (Thu) by eduperez (guest, #11232) [Link]

> Yet another reason not to use C. For anything.

Interesting comment... on a site devoted to a kernel / operating system written mostly in C; other than trolling, I cannot find another reason for your presence here.

LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 16:58 UTC (Wed) by xav (guest, #18536) [Link]

I once had to tweak a sendmail.cf. I remember thinking about the man inventing its "syntax" and wishing him a death with great suffering.

LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 19:21 UTC (Wed) by copsewood (subscriber, #199) [Link]

Been there and done that. And that's why the m4 macros took over, you have not needed to hand edit .cf for more than a decade. But there are still far too many ways to masquerade an address and much if not most of the Bat book still seems to make little sense. Someday I will migrate away to something more inherently sane, but it will probably take a week or 2 out of my life when I do.

LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 19:22 UTC (Wed) by drag (subscriber, #31333) [Link]

> m4 macros took over

This didn't help much, unfortunately.

LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 20:28 UTC (Wed) by rfunk (subscriber, #4054) [Link]

In fact it made things worse; it means you need to understand the m4 syntax, plus know what standard configurations it can pull in, plus know how those standard configurations work, plus you still need to know the original config language so you can fix the standard configurations to work properly.

I actually understand the .cf language better than the m4 layer on top of it. A big part of the problem is that the standard configurations available to m4 have lagged too far behind common situations, like "send from our domain, with no local mailboxes". Here's part of the file I saved from my last sendmail excursion... note that it still needs a .cf-syntax line.

dnl We want a nullclient configuration,
dnl except that we want /etc/aliases respected.
dnl So we selectively pull from nullclient.m4 and modify.
dnl http://brandonhutchinson.com/wiki/Nullclient_with_alias_p...
define(`confFALLBACK_MX', `mail.mydomain.com')dnl
define(`SMART_HOST', `mail.mydomain.com')dnl
define(`confFORWARD_PATH', `')dnl
ifdef(`confFROM_HEADER',, `define(`confFROM_HEADER', `<$g>')')dnl
define(`_DEF_LOCAL_MAILER_FLAGS', `lsADFM5q')dnl
MASQUERADE_AS(`mydomain.com')dnl
FEATURE(`allmasquerade')dnl
FEATURE(`masquerade_envelope')dnl
MAILER(`local')dnl
MAILER(`smtp')dnl
dnl Apparently the only way to force Sendmail to send outside this machine
dnl if the recipient address has any way to match the local machine.
dnl http://www.technoids.org/sendmail/removew.html
LOCAL_RULESETS
LOCAL_RULE_0
R$* < @mydomain.com. > $* $#esmtp $@ mail.mydomain.com $: $1<@mydomain.com.>$2

I like that Allman suggested Postfix for new installs, though of course I think he's wrong about his preference for combining so many functions into one binary.

LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 20:34 UTC (Wed) by rfunk (subscriber, #4054) [Link]

Oops, that second link in the code should be:
http://weldon.whipple.org/sendmail/removew.html

LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 22:20 UTC (Wed) by jeleinweber (subscriber, #8326) [Link]

If you watch the video, the actual question wasn't directed at "which mailer should we install", it was more along the lines of "ignoring sendmail, what would be your second choice?".

Other than that minor nit, a great summary.

Ignoring sendmail

Posted Feb 3, 2011 9:12 UTC (Thu) by ncm (subscriber, #165) [Link]

A distinction without a difference?

Sendmail configuration

Posted Feb 2, 2011 23:03 UTC (Wed) by rfunk (subscriber, #4054) [Link]

Thinking about this more...

The original problem was that sendmail.cf was considered to be too low-level and overly-flexible. It's like an assembly language for address rewriting.

The "solution" was to put a simple macro language on top of it, combined with a set of boilerplate configurations.

The real problem with that solution, other than m4's syntax rivaling sendmail.cf's syntax in ugliness, is that the person trying to avoid the assembly language of sendmail.cf is limited to the few boilerplate configurations that someone else has provided (and if we're lucky, documented). It's like we got something like a macro assembler for address rewriting, when what was needed was something much higher-level.

And that higher level is what we have with pretty much every other major MTA around today. They don't have the total flexibility of sendmail.cf, but they're both more flexible and easier to deal with than sendmail.mc (which is intended to be easier to deal with but less flexible than sendmail.cf).

Sendmail configuration

Posted Feb 3, 2011 1:20 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

>And that higher level is what we have with pretty much every other major MTA around today. They don't have the total flexibility of sendmail.cf

They do. They just use external modules for that, which is a GOOD thing. I have a set of nice Python scripts to process mail from automatic monitoring systems, and they work just fine with Postfix.

Sendmail configuration

Posted Feb 3, 2011 13:12 UTC (Thu) by nix (subscriber, #2304) [Link]

They don't. Virtually all currently-live MTAs other than sendmail are constrained to RFC822-format email addresses, for instance. sendmail is not.
(These days, of course, that is a completely useless 'feature', but there's no denying that supporting it is a kind of flexibility that sendmail possesses that other MTAs do not.)

Sendmail configuration

Posted Feb 3, 2011 23:35 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

I think that can be done in qmail (well, everything can be done in qmail) and I think it can be done in Postfix.

Sendmail configuration

Posted Feb 3, 2011 23:11 UTC (Thu) by brianomahoney (subscriber, #6206) [Link]

There is nothing wrong with the 'sendmail.cf' syntax except that you need a programmer, not an admin, to make major changes. The syntax and semantics of sendmail rules are one of the best documented of all configs.

Sendmail configuration

Posted Feb 3, 2011 23:41 UTC (Thu) by rfunk (subscriber, #4054) [Link]

Yes, sendmail.cf is well-documented and logical, but (at least for modern times) it's too terse and low-level.

It's more procedural than declarative, which is why you need a programmer to change it.

There's really no excuse for its alphabet soup of mailer flags.

And yet I'll happily take sendmail.cf over sendmail.mc.

LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 19:09 UTC (Wed) by smoogen (subscriber, #97) [Link]

Sendmail was my first real project as I needed to get Sendmail 4 working on a LynxOS system (a real time Unix running on 386 hardware). Due to the lack of needed software libraries, I ended up having to turn on all these features listed as deprecated and work through how they worked. All the while getting email and encouragement ("I think it still works but man I really don't want to go there again") from an Eric Allman. I think it was after I had finished reading the Bat Book of 1990 that I realized they were one and the same.

Thanks Eric, for writing software that would allow me to do all kinds of crazy things (and for giving my professor a project of what compilers would compile to (.cf)).

LCA: Lessons from 30 years of Sendmail

Posted Feb 2, 2011 21:43 UTC (Wed) by paracyde (guest, #72492) [Link]

Jonathan: Do you think you can convince Linus to give a "Lessons from 20 years of Linux" talk, when Linux turns 20 later this year?

Plain Text

Posted Feb 2, 2011 22:29 UTC (Wed) by jeremiah (subscriber, #1221) [Link]

I find his comment on using Plain Text interesting. I've got a project that would benefit greatly from storing its files in a db as opposed to the file system. But I continue to resist the urge because damnit! Sometimes all you have is vi. I've toyed with the idea of a simple way of importing and exporting the data to and from a db, but again, who knows how things can break. I made the same decision with the protocols: tokenized, human-readable text. So it's nice to see someone think that that decision was still worth it 30 years later.

Now I just need to get that O'Reilly book written. :)

Plain Text

Posted Feb 2, 2011 22:49 UTC (Wed) by HelloWorld (subscriber, #56129) [Link]

But I continue to resist the urge because damnit! Sometimes all you have is vi.
Yes, sometimes. And in the remaining 95% of all cases, you have all the necessary tools at your disposal and cripple yourself by not using them. Sounds like a wonderful tradeoff to me.

Plain Text

Posted Feb 2, 2011 23:13 UTC (Wed) by jeremiah (subscriber, #1221) [Link]

In this case, we're talking remote Linux boxes on non-broad-band connections. So seriously, vi is all we have sometimes. Doesn't mean I don't also have a nice GUI front end to things, when it's an option. Things are also written so that the data can be stored in a DB if the admin chooses. But the default is to assume we are in a worst case scenario, until proven otherwise. Also makes it lighter weight. Of course there is Berkeley DB which is pretty light. Ah, who knows. There's never a single correct solution for everyone, all you can do is make a lot of things optional, and spend a lot of time thinking about what the best defaults are for the majority of people.

Plain Text

Posted Feb 3, 2011 17:34 UTC (Thu) by iabervon (subscriber, #722) [Link]

psql is probably even less latency-sensitive than vi (being based on readline and stdout), and is at least as good at making controlled modifications to structured data. It's not so good at changing one character in a long string in a single field, but it's great when the granularity of the database matches the granularity of the data.

Plain Text

Posted Feb 3, 2011 5:06 UTC (Thu) by wahern (subscriber, #37304) [Link]

What tools? Are you going to go the Mailman route, creating a hundred useless command line tools alongside a custom web-based configuration engine, sitting atop a complex database schema, possibly using a relational database backend? Almost every other option effectively boils down to that, and it's horrible.

The limitations of an ASCII configuration file are its virtue! Often times they turn into domain specific languages. In which case, use sendmail.cf as your north star and head south.

Plain Text

Posted Feb 3, 2011 13:14 UTC (Thu) by nix (subscriber, #2304) [Link]

If you have to put things in a DB, at least make it sqlite. The sqlite command-line tool can be used to manipulate that in all sorts of ways if your tool is broken.

Plain Text

Posted Feb 5, 2011 15:07 UTC (Sat) by tzafrir (subscriber, #11501) [Link]

Though the sqlite command-line interface is nowhere close in friendliness to that of pgsql (or even the mysql one).

Plain Text

Posted Feb 7, 2011 16:49 UTC (Mon) by nix (subscriber, #2304) [Link]

Yeah, agreed. I was assuming 'zero dependencies', i.e. the sort of thing which might reasonably break in a situation in which 'all you have is vi'.

Plain Text

Posted Feb 3, 2011 16:02 UTC (Thu) by dgm (subscriber, #49227) [Link]

It's not only about changing data with vi. Sometimes it's about being able to recover critical data from a corrupt file with vi, rather than giving it up because the binary db file is f___ up, and someone forgot to change the backup tapes for the last two months, and all you have is last night's backup of an already corrupt file.

Are you really sure a db will buy you something? Do you need ACID? Do you _really_ need an index to find your data? Be sure a database will buy you something before giving up reliability. For instance, have you thought that reading a 80 MB text file takes just a couple of seconds with any modern disk? That's a good million 80-byte records.

Plain Text

Posted Feb 3, 2011 21:38 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

also keep in mind the problems that firefox has where it can stall for significant amounts of time while it is doing an ACID update to its bookmark file.

does that file _really_ need ACID protection? I don't think so, I think that just having a fairly recent backup is good enough.

Plain Text

Posted Feb 4, 2011 1:53 UTC (Fri) by zlynx (subscriber, #2285) [Link]

The long lag is really a filesystem bug. Newer filesystems have fixed the problem. Firefox shouldn't lag on ext4, xfs or btrfs.

Plain Text

Posted Feb 4, 2011 1:59 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

the really long lag was a combination of an Ext3 bug and the fact that the browser was trying to get ACID protection of the file.

I still don't think that's appropriate.

Plain Text

Posted Feb 26, 2011 13:24 UTC (Sat) by job (guest, #670) [Link]

True. For small projects (less than a couple of hundred thousand records perhaps), I just keep data in RAM and serialize to plain text periodically. RAM is cheap, even 100 MB of data structures is practically free, and crazy fast. I like to think lesser external dependencies make deployment easier and crashes won't hurt you all that bad.

Tried to learn Sendmail in 1995

Posted Feb 3, 2011 2:16 UTC (Thu) by brouhaha (subscriber, #1698) [Link]

I tried to learn Sendmail when I registered my own domain and set up a server in 1995. My requirements weren't that complicated, but I never could get it to work the way I wanted, even with help from the guy that maintained the Sendmail configuration at my day job. I went looking for a "Sendmail for Dummies" book, but of course there isn't one, and I later realized that expecting one is about like expecting to find "Quantum Chromodynamics for Dummies".

I switched to Qmail, and had no trouble getting it to do what I needed. More recently I switched from Qmail to Postfix.

Tried to learn Sendmail in 1995

Posted Feb 6, 2011 18:51 UTC (Sun) by njs (guest, #40338) [Link]

Quantum Chromodynamics for Dummies
That's actually more easily available.

Tried to learn Sendmail in 1995

Posted Feb 6, 2011 18:51 UTC (Sun) by njs (guest, #40338) [Link]

Doh, I mixed up chromodynamics and electrodynamics. How embarrassing!

Tried to learn Sendmail in 1995

Posted Feb 6, 2011 20:11 UTC (Sun) by Trelane (guest, #56877) [Link]

Understandable. Exposure to QCD, I'd expect, would tend to color your thinking.

*cough*

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 10:09 UTC (Thu) by Thomas (subscriber, #39963) [Link]

"There is currently an increasing focus on controlling costs, mobility, and social network integration."

"mobility, and social network integration"??? He is kidding, isn't he?

Cheers,
T.

mobility and social network integration

Posted Feb 3, 2011 18:25 UTC (Thu) by rfunk (subscriber, #4054) [Link]

Mobility: Everyone has a cell phone these days, and those are increasingly smartphones that are used to deal with email. Thanks to the iPad and its imitators, the tablet market is growing quickly. Not to mention how much SMS messaging is being used, especially by people who rarely use email.

Social Networks: Something like a third of the global population is on Facebook, and a huge number are on Twitter. Increasingly people are using those for communication where they would have used email before. (Due to spam filtering they're also more reliable than email.) And much of this is happening with mobile devices that work less well with email.

I can very much see why Sendmail Inc would want to be involved in these things. As a company, their survival depends on it.

LCA: Lessons from 30 years of Sendmail

Posted Feb 3, 2011 19:21 UTC (Thu) by lieb (subscriber, #42749) [Link]

Ahh, sendmail. The article was enlightening and, as I can see by some of the comments, a lot of why it did things the way it did may not make sense in a post-RFC 733 world. I worked on its predecessor MMDF while at SRI, and the comments around separate processes had a different context in those days. MMDF was a set of separate processes "similar" to postfix, but for a different reason: we only had 40+ K, and something like sendmail would have been too large for the address space. Sendmail had the luxury of a 32-bit VM to play in, so... Production rules were nice in comparison to the dance we had to do before that. Imagine mapping uucp <-> rfc733 <-> decnet <-> all-in-one <-> (can't remember what the other ones were anymore) and sending from one end of that chain to the other and getting a reply back that actually worked.

As for the comments about m4, C, .cf files, and buffer overflows... As in Vint Cerf's comments about IPv4 etc. at LCA, it seemed like a good idea at the time and, hey, much to our amazement, the world jumped aboard. And they all actually (still) work. What may appear to be sins 30 years on were the best ideas we could come up with at the time. When I added dbm to manage alias lists in MMDF it was a great idea given what was there before (flat files), but in light of today and things like sqlite, not so hot. Then again, without dbm going beyond its sell-by date, how would we know that sqlite would be such a good idea? Imagine a network that is a "failure" because a 32-bit address space is too small! ;) In both cases, such "failure" is because they worked more than well enough to push us to the next step.

Thank you Eric.

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 16:08 UTC (Fri) by daglwn (subscriber, #65432) [Link]

> There are also a number of things he would do the same, starting with the
> use of C as the implementation language. It is, he said, a dangerous
> language, but the programmer always knows what is going on. Object-
> oriented programming, he said, is a mistake; it hides too much.

Ah, that old canard. When will these "gurus" stop making themselves look like fools? No language can save bad design, and if Eric thinks the object-oriented paradigm hides too much, he's doing it wrong.

There are many, many, many reasons to consider a language other than C and a programming paradigm other than procedural. I continue to be amazed by people who refuse to even consider C++ when it can do everything C can do, as fast and more safely.

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 18:22 UTC (Fri) by chad.netzer (subscriber, #4257) [Link]

> people who refuse to even consider C++ when it can do everything C can do, as fast and more safely.

Really... Can it compile as fast and more bug-free than C?

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 20:03 UTC (Fri) by daglwn (subscriber, #65432) [Link]

Yes, it can.

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 23:27 UTC (Fri) by chad.netzer (subscriber, #4257) [Link]

"can"...? So, it holds true for a least one case? Or do you mean that its likely to be the norm?

Are you willing to say "C programmers who switch to C++ won't experience longer compilation times (and certainly not *significantly* longer compilation times), and won't discover bugs introduced by the compiler at a higher rate than they did in the C world."?

Which subset of C++ do *you* recommend to make the above statement true? Or is it true if I use (nearly) all the features of C++98?

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 23:49 UTC (Fri) by daglwn (subscriber, #65432) [Link]

I answered the question you asked which, frankly, is a poor question. "Can" is easy to answer and "will" is a ridiculous question.

The more useful question is, "Does the language provide tools to make the proper tradeoffs in various situations?" The answer to that question, for C++, is "yes," and "certainly better than C does."

I am willing to say that C-style code will compile and run just as fast with a C++ compiler as with a C compiler, and that the C++ compiler will catch more semantic errors. I will also contend that the code generation is of just as high a quality as for the C compiler, because the optimizer and code generator are likely exactly the same. Anything beyond that involves analyzing tradeoffs.

For example, C++ templates introduce better type safety at the cost of potentially longer compile times. Just how much longer is a function of how type-safe and performant you want the code to be.

Take a generic data structure. In C one might code this with void *, which results in compact and unsafe code that compiles in time X. With C++ one can put a template wrapper around that void * code and get type safety with negligible compile time impact (I would say zero).

A fully-typed template implementation (getting rid of void *) will provide even better type safety, and often better performance than even the C version, at the cost of higher compile time. For me, programmer time and user time are much more expensive than computer time, so safety+performance is always a winner over compile time.
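
A compressed sketch of the contrast being described here, with invented types and names: an untyped C-style list, plus a thin template wrapper that keeps the casts in one place.

// The C-style core: compact, but every caller has to cast, and nothing
// stops you from pushing one type and popping another.
struct node {
    void *data;
    node *next;
};

inline void push_raw(node **head, void *p)
{
    *head = new node{p, *head};
}

inline void *pop_raw(node **head)
{
    node *n = *head;
    if (!n)
        return nullptr;
    *head = n->next;
    void *p = n->data;
    delete n;
    return p;
}

// A thin template wrapper over the same untyped code: the casts live in
// one place and the compiler checks what callers push and pop.
template <typename T>
class typed_list {
public:
    void push(T *p) { push_raw(&head_, p); }
    T *pop()        { return static_cast<T *>(pop_raw(&head_)); }
private:
    node *head_ = nullptr;
};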

I cannot answer which subset of C++ is best for anyone because it depends on the nature of the problem. I have used all of C++'s paradigms in my projects and the developer time savings have been terrific. I don't much care about compile time, but that reflects my needs. Someone else likely has different needs.

But what I do know is that if one has "C-style" needs, then one can write "C-style" in C++ and take advantage of better safety and performance. Note that by "C-style" I don't mean simply porting C code to compile with a C++ compiler. I mean making use of things like boost::array and rvalue references where appropriate to eliminate unsafe and underperforming C constructs.
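
As one illustration of that kind of "C-style" C++, a small sketch using boost::array in place of a raw int[4] (the surrounding program is invented):

#include <boost/array.hpp>
#include <iostream>
#include <stdexcept>

int main()
{
    boost::array<int, 4> counts = {{ 0, 0, 0, 0 }};

    counts[2] = 42;                        // same syntax and cost as int counts[4]
    std::cout << counts.size() << '\n';    // the size travels with the object

    try {
        counts.at(7) = 1;                  // out of range: throws instead of scribbling
    } catch (const std::out_of_range &e) {
        std::cerr << e.what() << '\n';
    }
    return 0;
}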

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 0:33 UTC (Sat) by Trelane (guest, #56877) [Link]

> The answer to that question, for C++, is "yes," and "certainly better than C does."

This isn't an unconditional 'yes'; there are some very nice features in C (I'm looking at *you*, variable-length automatic arrays) that aren't in C++.

But excepting those, you're right.

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 2:16 UTC (Sat) by cmccabe (guest, #60281) [Link]

> I answered the question you asked which, frankly, is a poor question.
> "Can" is easy to answer and "will" is a ridiculous question.

Every large C++ project I've ever worked on has had long compilation times. It's a consequence of the design of the language. Every file in a C++ project must do the same work over and over for #include directives. A single #define could change the meaning of everything. This means that compilation times for C++ projects tend to be O(n^n), where n = number of files.

It's not a ridiculous question to ask "will my compile times be longer if I port project X to C++." The answer is almost certainly yes. Does it matter? Depends on what project X is.

C's limitations are its virtue. It enforces a consistent low-level, performance-sensitive style. Above all, C tends to encourage simplicity, the programmer's best friend.

My biggest problem with C++ is a philosophical one. It blurs the lines between high-level components and low-level ones. You often see this confusion in the minds of C++ advocates. They ask why C isn't good for implementing user interfaces, or parsing text files. The answer is that it isn't supposed to be good for those things. Those are the things that you write another component for-- a cleanly separated component that uses a nice API to talk to the rest of the system.

This is the future. You can see it in all the newest software-- the web sites that have a Javascript front end, talking to a Java or Ruby backend, talking to a webserver and kernel that are written in C. Ask anyone writing an Android app or a website backend what a vtable or a buffer overflow is. Blank stare. It's like asking them what a PNP transistor is.

There is no room for C++ in this world because C++ itself is a layering violation, a hack. When performance was super-duper critical it was ok to have this layering violation. But in the coming years, performance is going to be less and less of an issue and people will move to more cleanly structured systems.

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 5:43 UTC (Sat) by daglwn (subscriber, #65432) [Link]

> It's not a ridiculous question to ask "will my compile times be longer if
> I port project X to C++." The answer is almost certainly yes.

That's simply not credible. If it were a port from C, there is nothing a C++ compiler would do differently from a C compiler that would greatly increase compile time.

> Every large C++ project I've ever worked on has had long compilation
> times.

That's a valid observation but it doesn't indicate any general statements can be made.

> It's a consequence of the design of the language.

That does not follow.

> Every file in a C++ project must do the same work over and over for
> #include directives. A single #define could change the meaning of
> everything.

No, that would be a violation of the ODR. Structures and definitions cannot change after they've been used.

> There is no room for C++ in this world because C++ itself is a layering
> violation.

Your entire argument makes no sense. There is nothing in C or C++ that prevents or encourages any particular design. One can write well defined modules and interfaces in both languages. One can write poorly structured code in both languages.

But C++ provides safety mechanisms that are simply not available in C. RAII is an example.
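
For readers who haven't run into the term, a minimal RAII sketch (names invented): the destructor releases the resource, so every exit path cleans up.

#include <cstdio>

class File {
public:
    explicit File(const char *path) : fp_(std::fopen(path, "r")) {}
    ~File() { if (fp_) std::fclose(fp_); }
    std::FILE *get() const { return fp_; }
private:
    std::FILE *fp_;
    File(const File &);             // copying disabled to keep the sketch simple
    File &operator=(const File &);
};

void parse_config(const char *path)
{
    File f(path);
    if (!f.get())
        return;                     // no leak: ~File runs here
    // ... read from f.get() ...
}                                   // ... and here, and if an exception is thrown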

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 21:27 UTC (Sat) by cmccabe (guest, #60281) [Link]

> No, that would be a violation of the ODR. Structures and definitions
> cannot change after they've been used.

It's only a violation of the ODR if the objects that are defined differently have global linkage and all of them are not weak symbols.

There are actually a lot of macros that change the behavior of standard headers. _GNU_SOURCE, _BSD_SOURCE, and _SVID_SOURCE are three popular ones.
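
A tiny illustration of that effect, assuming glibc (the details depend on the C library in use):

/* The same standard header declares different things depending on macros
   defined before it is included. */
#define _GNU_SOURCE              /* must precede every #include              */
#include <string.h>              /* with glibc, now also declares            */
                                 /* strcasestr(), memmem(), and friends      */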

You seem to be confused about how include files work in C and C++. The way they work is that each translation unit (that's .cpp file to you) has to scan through all the files included by that unit, recursively. There are no shortcuts and the compiler cannot cache this work.

The reason why I said it was O(n^n) is because n^n is the upper bound on the time complexity. Remember that you can include .c or .cpp files. In reality, most projects' compilation times will grow slower than this. However, it's still exponential in the number of files, and the compile times seen by real-world projects like WebKit reflect this.

> One can write well defined
> modules and interfaces in both languages. One can write poorly structured
> code in both languages.

I agree. A good programmer can write good code in any language. A bad one can write Vogon poetry in any language.

There are a lot of projects I like and respect that use C++: LLVM, OpenCV, Ceph, WebKit, and a lot of others. C++ will be around for a long time. For new projects, however, I would encourage people to look at newer languages like Google Go. Progress hasn't stood still and we have learned some things since the early nineties. I swear!

LCA: Lessons from 30 years of Sendmail

Posted Feb 7, 2011 19:13 UTC (Mon) by nix (subscriber, #2304) [Link]

> You seem to be confused about how include files work in C and C++. The way they work is that each translation unit (that's .cpp file to you) has to scan through all the files included by that unit, recursively. There are no shortcuts and the compiler cannot cache this work.

Except that there are shortcuts and GCC does cache this work, and has for more than fifteen years. (e.g. you can skip even opening files more than once if they are entirely contained in include guards and the guards are not #undefed.)
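
Concretely, the guard pattern that optimization relies on looks like this (FOO_H is an invented name); once the preprocessor has seen that the whole file is wrapped in a single guard, it can skip re-reading the file while the guard macro remains defined:

#ifndef FOO_H
#define FOO_H

struct foo {
    int refcount;
};

#endif /* FOO_H */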

LCA: Lessons from 30 years of Sendmail

Posted Feb 8, 2011 0:26 UTC (Tue) by cmccabe (guest, #60281) [Link]

Sigh. I knew I was going to get some grief when I said "there are no shortcuts." :)

It depends on what you call a shortcut I guess. The header guard optimization is good, but the process as a whole is still O(n^n). Doing slightly more efficient things with file descriptors can't change that.

LCA: Lessons from 30 years of Sendmail

Posted Feb 8, 2011 18:27 UTC (Tue) by nix (subscriber, #2304) [Link]

Um, for 'slightly more efficient things with file descriptors' substitute 'almost always avoid parsing the vast majority of headers more than once'.

The exponential explosion you refer to simply does not happen with real code. And if header parsing is slow, GCC supports precompiled headers on common platforms to speed things up. (Yes, you may have to restructure your headers a bit to use them, but if you're compiling slow enough that you need this feature, that's a small cost.)

LCA: Lessons from 30 years of Sendmail

Posted Feb 11, 2011 9:33 UTC (Fri) by cmccabe (guest, #60281) [Link]

See my comment below. Basically, the headers for a class's private data members also have to be #included in that class's header file. So you *cannot* "avoid parsing the vast majority of headers more than once."

Under ideal conditions, C++ compilation is slow. If you add even a few non-ideal conditions, like programmers who love to define functions in header files "for performance", extensive use of templates, auto-generated anything, or unnecessary cross-module dependencies, it becomes positively glacial.

Unfortunately real-world projects tend to have some or all of these conditions. I'm too lazy to find the reference now, but Google's C++ compile times are said to be measured in hours. And those guys read Effective C++ and know their stuff.

Precompiled headers sound helpful, but only for headers you are including from external libraries. Maybe they would be useful for something like Qt? I haven't used precompiled headers.

LCA: Lessons from 30 years of Sendmail

Posted Feb 19, 2011 0:24 UTC (Sat) by nix (subscriber, #2304) [Link]

Er, when I said 'more than once' I meant 'more than once per translation unit', and this is almost universal. This reduces your claimed O(n^2) to, uh, O(nm), where n is the number of translation units and m is the number of headers.

Precompiled headers are useful in any project where you have one great big header that #includes a lot of stuff. This is extremely common.

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 13:55 UTC (Sat) by mpr22 (subscriber, #60784) [Link]

> This means that compilation times for C++ projects tend to be O(n^n), where n = number of files.


Nice piece of hilariously ridiculous hyperbole there. A 1000-file C++ project - even one composed by the Stupidest Imaginable Programmer - does not take 1000^1000 times as long to compile as a one-file project.

O(m * n), where m = number of source files and n = number of ubiquitously included header files, is rather closer to the mark.

LCA: Lessons from 30 years of Sendmail

Posted Feb 8, 2011 0:37 UTC (Tue) by cmccabe (guest, #60281) [Link]

First of all, big-O is the worst-case upper bound, not the average case.

> O(m * n), where m = number of source files and n = number of ubiquitously
> included header files, is rather closer to the mark.

Header files tend to include other header files. This is a consequence of one of the other design features of C++, the fact that the definition of a class cannot be spread across multiple files.

So you often see stuff like this:
> #include "private_helper.h"
>
> class MyClass {
> ...
> private:
> PrivateHelper myPrivateHelper;
> };

PrivateHelper is not part of the public API of the class (hopefully), but even so, it will require you to include private_helper.h. That class itself might have its own private helpers... and so it goes, on and on.

What's that, you say? I can use the pImpl idiom? Sure, if I can live with reduced performance and more boilerplate code to initialize and destroy the pImpl. Sigh.
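
For comparison, a sketch of the pImpl version of the snippet above, reusing the same invented names: the header no longer drags in private_helper.h, at the price of an extra allocation, an indirection, and the constructor/destructor boilerplate (copying is ignored here).

// --- my_class.h -------------------------------------------------------
class MyClassImpl;                  // forward declaration is enough here

class MyClass {
public:
    MyClass();
    ~MyClass();
private:
    MyClassImpl *pimpl_;            // all private details hide behind this
};

// --- my_class.cpp -----------------------------------------------------
#include "private_helper.h"         // only this translation unit pays for it

class MyClassImpl {
public:
    PrivateHelper myPrivateHelper;
};

MyClass::MyClass()  : pimpl_(new MyClassImpl) {}
MyClass::~MyClass() { delete pimpl_; }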

It would be interesting to graph, say, WebKit compilation times as a function of the number of files. I really doubt it's anywhere close to linear.

LCA: Lessons from 30 years of Sendmail

Posted Feb 8, 2011 9:13 UTC (Tue) by mpr22 (subscriber, #60784) [Link]

A worst-case upper bound that will never be hit even by insane usage is of purely academic interest - particularly in this discussion, since the feature that permits the O(n^n) worst case in C++ is also present in C.

Now, I'll happily admit that real-world incremental recompilation times are worse for C++ than C - but the worst-case upper bound is a red herring there.

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 3:17 UTC (Sat) by chad.netzer (subscriber, #4257) [Link]

> I answered the question you asked which, frankly, is a poor question.

Because it was a tongue-in-cheek response to a statement that was, frankly, pretentious.

C++ can be amazingly elegant. I actually like it. And yet, we still live in an age where major C++ projects refuse to use the C++ standard library. That's an embarrassment. And it's just one of several very telling reasons why C++ has not replaced C after nearly 30 years. For the C++ professionals out there who like the language (like me!), how many C++ books are on your shelf? I only ever needed two C books (K&R, Harbison/Steele).

Btw, it's amusing that you call out void * pointers as an unsafe C construct, when it's quite straightforward to employ void * safely. And yet, can you tell me why new[] and delete[] haven't been deprecated from the C++ language as the ghastly abominations that they are? You *need* to use boost::array precisely because C++ got this so monumentally wrong, and once you start using boost, you are no longer writing "C-style" C++. The complications of function overloading alone are enough to accidentally trip up the minimalist C-style programmer, if they are not careful.
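
A small sketch of the new[]/delete[] complaint, with boost::array alongside for contrast (illustrative only):

#include <boost/array.hpp>

void demo()
{
    int *many = new int[16];
    // delete many;                 // compiles silently, but is undefined behaviour
    delete[] many;                  // the human has to remember which form to use

    boost::array<int, 16> a;        // nothing to new, nothing to delete,
    a[0] = 1;                       // and the size is part of the type
}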

Anyway, I'll let you have the last say, if you wish. Language war threads are probably better waged on other sites.

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 5:53 UTC (Sat) by daglwn (subscriber, #65432) [Link]

> Because it was a tongue-in-cheek response to a statement that was,
> frankly, pretentious.

No, not pretentious. Factual.

> And yet, we still live in an age where major C++ projects refuse to use
> the C++ standard library. That's an embarrassment.

An embarrassment for those projects, yes. I'm not claiming C++ is perfect. It's far from it. I'm claiming it's more useful than C, which is true. There are parts of the C standard library people don't touch.

> For the C++ professionals out there who like the language (like me!),
> how many C++ books are on your shelf?

This isn't a valid way to measure the productivity aspects of a language, but to answer your question, I have one that I have referenced over the last two years: Josuttis' Standard Library book.

> when it's quite straightforward to employ void * safely.

Sure, one can employ void * safely. Just not as easily as one can in C++. One can't get the performance out of void * that one can get out of templates.

> And yet, can tell me why new[] and delete[] haven't been deprecated from
> the C++ language as the ghastly abominations that they are?

That's a bit strong, wouldn't you say? What's wrong with them, exactly?

> You *need* to use boost::array

No, actually I've never used it.

> once you start using boost, you no longer are writing "C-style" C++

You're mixing design styles. No one who uses Boost extensively *wants* to write "C-style." But I certainly would consider boost::array to be "C-style."

> The complications of function overloading alone are enough to
> accidentally trip up the minimalist C-style programmer, if they are not
> careful.

What's difficult about function overloading? The member hiding rules are surprising for the newcomer but so are pointers and recursion.

> Language war threads are probably better waged on other sites.

I don't consider this a war. I consider it education.

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 6:44 UTC (Sat) by chad.netzer (subscriber, #4257) [Link]

I don't want to get into it with you. I'm reading your other responses, and I just can't see how a discussion with you could provide any value to this site, or me. I don't disagree with the gist of your opinion, just the strident tenacity with which you state it. Furthermore, I'm annoyed that you mentioned and promoted boost::array in the thread, and then when I responded about it, you claim not to have used it and discount its relevance. I consider that a bait-and-switch tactic, and an annoyance.

In any case, everything that has ever needed to be said or not said about "C vs. C++" has been said elsewhere, and since LWN doesn't allow thread promotion/demotion by voting, I think I'd like to stop. I certainly think good C++ code is often a positive improvement over good C code, but I disagree with you only in the (implied?) claim that C++ should completely supplant C. It hasn't, because C++ still has problematic issues. The fact that there has been an explosion of languages since 1991 (when I learned C++), and that *many* of them interface easily with C, but almost *none* of them interface easily with C++, perfectly illustrates the issue. History will show that C++ tried hard, but failed, to be a "better C" in *all* cases (as you seem to claim).

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 21:49 UTC (Fri) by Tet (subscriber, #5433) [Link]

When will these "gurus" stop making themselves look like fools?

Hint: they're called gurus because they know what they're talking about. In this case, he's right, and you're the one looking like a fool. Object orientation is useful in some cases. But it's not a panacea, and isn't right for everything.

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 23:53 UTC (Fri) by daglwn (subscriber, #65432) [Link]

Where did I say it is always the right tool? On the contrary, Eric clearly says it is never the right tool. "Always" and "never" are the sorts of words that make one look foolish.

"Object-oriented" is also a particularly overloaded term. I tend to think of it as implying class hierarchies while others tend to think of it as data encapsulation. Both are valid views so we need to be precise about what we're describing.

I *think* Eric is talking about class hierarchies because no sane person could possibly object to data encapsulation. :)

LCA: Lessons from 30 years of Sendmail

Posted Feb 4, 2011 18:12 UTC (Fri) by chmouel (guest, #6335) [Link]

I like this guy's honesty about which MTA he would advise people to use.

Sendmail versus Postfix

Posted Feb 12, 2011 6:18 UTC (Sat) by eric_allman (guest, #72865) [Link]

I've had a number of people react with surprise and even shock that I seem to have recommended Postfix over sendmail in my LCA keynote.

The actual question, at least as I heard it, wasn't "which of the existing MTAs would I recommend", but rather "if sendmail did not exist, what would you run?" Postfix would be my recommended second choice for some of the reasons I discussed in my talk. I still run sendmail, and when I set up new machines I still install and use sendmail.

The real fact is that the world has moved beyond simple MTAs, which in today's world only address a small portion of the problem. If I were a large enterprise responsible for messaging architecture today, I would not choose any MTA, sendmail or Postfix; I would be looking at fully integrated Message Processing Platforms which are designed to handle the complex messaging needs of today's large companies.

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds