LCA: Lessons from 30 years of Sendmail
Posted Feb 2, 2011 21:34 UTC (Wed) by eisenbud (subscriber, #13153)
Any look at "lessons from 30 years of sendmail" that doesn't include the terrible security story isn't serious at all.
Posted Feb 2, 2011 21:46 UTC (Wed) by dlang (✭ supporter ✭, #313)
Posted Feb 2, 2011 22:13 UTC (Wed) by nix (subscriber, #2304)
Posted Feb 2, 2011 22:54 UTC (Wed) by HelloWorld (guest, #56129)
Posted Feb 3, 2011 0:32 UTC (Thu) by dskoll (subscriber, #1630)
> Yet another reason not to use C. For anything.
OK, sure. Avoiding C magically fixes security problems. Not.
Avoiding C greatly reduces the risk of certain security problems (buffer-overflow, stack smashing) assuming the non-C language is implemented securely. It does nothing about other security problems like race conditions, unsafe /tmp files, incorrect input sanitization (eg, SQL injection problems), etc, etc....
Posted Feb 3, 2011 1:09 UTC (Thu) by HelloWorld (guest, #56129)
> Avoiding C greatly reduces the risk of certain security problems (buffer-overflow, stack smashing) assuming the non-C language is implemented securely. It does nothing about other security problems like race conditions, unsafe /tmp files, incorrect input sanitization (eg, SQL injection problems), etc, etc....
Yeah, except that it does. Modern languages actually do help with these problems. A sanitized string is basically a subtype of an unsanitized string: you can use it everywhere an unsanitized string can be used, but you can't use an unsanitized string where a sanitized string is required. Too bad C's type system doesn't support subtyping. Similarly, race conditions are much less likely in a language that supports threading in a sensible way. The Rust language, for example, forces all inter-thread communication to be explicit, using a concept named "channel" for this.
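As a rough illustration of the "sanitized string as a subtype" idea above, here is a minimal Python sketch. The names `SanitizedStr`, `sanitize` and `render_comment` are hypothetical and not taken from any library; only `html.escape` is a real standard-library call. It shows one way the idea could look, not how any particular language or framework does it.

```python
import html


class SanitizedStr(str):
    """Marker subtype: a str whose content has already been HTML-escaped."""


def sanitize(raw: str) -> SanitizedStr:
    # html.escape replaces &, <, > and quote characters with HTML entities.
    return SanitizedStr(html.escape(raw))


def render_comment(body: SanitizedStr) -> str:
    # Refuse a plain str at run time; a static type checker could reject
    # the same call before the program ever runs.
    if not isinstance(body, SanitizedStr):
        raise TypeError("render_comment() requires a SanitizedStr")
    return "<p>" + body + "</p>"


safe = sanitize("<script>alert(1)</script>")
page = render_comment(safe)   # accepted: the argument is a SanitizedStr
length = len(safe)            # a SanitizedStr works wherever a str does
```

Passing a plain `str` to `render_comment` raises `TypeError`. Note that concatenating onto a `SanitizedStr` yields a plain `str` again, so in this sketch the "sanitized" status is dropped as soon as the value is modified.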
Posted Feb 3, 2011 4:44 UTC (Thu) by wahern (subscriber, #37304)
C doesn't have a sophisticated typing system, but that has little to do with the issue of not being able to write a "sanitized string" type. Just the idea of sanitized string sounds wrong to me. If it's not an ASCII, NUL-terminated array of characters, then it's not properly called a string in C terminology.
Type abstraction is often the root cause of security bugs. For example, you could treat a password as a sub-type of string. But strings as commonly understood almost universally support the concept of truncation, and if you truncate a password, horrible security repercussions result. So why would you want to treat it like a string at all?
Strings also usually support the notion of concatenation. So take HTML. You could keep an HTML document as a string, but HTML has a hierarchical structure, and prepending or appending data to a complete document results in garbage.
Just exclude those methods, one says. Well (1) the fact that you must exclude already exposes you to mishaps and bugs the same as forgetting to bounds check an operation in C, and (2) what are you left with when you exclude all the unnecessary and possibly dangerous operations? Not much.
Buffer overflows in C usually are the result of people trying to treat everything like a string; they want to slurp data into a string or array, and then manipulate it in that form. That's a horrible way to write software, whether in C or any other language. It just so happens that if you do it in C you're susceptible to more attacks than if you do it in, say, Java; but the solution isn't to write the bad code in Java; it's to stop writing that kind of code at all.
Using the best language for the particular job also helps. Writing parsers has always been easier for me to do in C because of pointers, and the ability to write very concise state machines. A parser is really a way to consume a string, character by character, and transform it into some other structure. So when people tell me that handling strings in C is more difficult, I don't know how to respond. It's certainly more difficult to juggle and manipulate strings in C. But if that's how you're processing string input in any language, you're probably doing it wrong. If I'm parsing an e-mail message, I'll construct a tree of objects by consuming a stream of characters. I may store the message, or parts of it, as a character "string", but only as an immutable object that I never need to manipulate; outputting it later, if necessary. So I rarely care about the difficulty of manipulating strings in C, because I rarely need to do that.
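The "consume a stream of characters and build a structure" approach described above can be sketched as a small state machine. The sketch is in Python rather than C, and it parses a deliberately simplified `Name: value` line format (no folded lines, no RFC-conformant header syntax); the `State` enum and `parse_headers` function are hypothetical names chosen for illustration.

```python
from enum import Enum, auto


class State(Enum):
    NAME = auto()        # reading a header name, up to ':'
    SKIP_SPACE = auto()  # skipping blanks between ':' and the value
    VALUE = auto()       # reading the value, up to end of line


def parse_headers(text: str) -> list:
    """Consume text one character at a time; return (name, value) pairs."""
    headers = []
    state = State.NAME
    name, value = [], []
    for ch in text:
        if state is State.NAME:
            if ch == ":":
                state = State.SKIP_SPACE
            elif ch == "\n":
                name.clear()              # line without ':'; drop it
            else:
                name.append(ch)
        elif state is State.SKIP_SPACE:
            if ch in " \t":
                continue                  # still skipping blanks
            if ch == "\n":                # header with an empty value
                headers.append(("".join(name), ""))
                name.clear()
                state = State.NAME
            else:
                value.append(ch)
                state = State.VALUE
        else:                             # State.VALUE
            if ch == "\n":
                headers.append(("".join(name), "".join(value)))
                name.clear()
                value.clear()
                state = State.NAME
            else:
                value.append(ch)
    if state is State.VALUE:              # input ended without a newline
        headers.append(("".join(name), "".join(value)))
    return headers
```

The input is never sliced, concatenated or otherwise manipulated as a string; each character is looked at once and the result is a list of pairs.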
I tend to use general-purpose scripting languages for things _other_ than string processing, like executing complex rules or transformations of structures of objects. For still other things, domain-specific languages are preferable.
Of course, if all you want to do is hack out a script to process some data (as Perl is popular for), then have at it. But don't fool yourself that your script is any more secure than if written in C. There are probably several times more remote execution bugs in scripting language built applications than C applications, just because of improper use of strings.
Posted Feb 3, 2011 10:00 UTC (Thu) by Thomas (subscriber, #39963)
You got it the wrong way round. Strings [a concatenation of bytes] can support truncation but don't have to.
String manipulation bugs
Posted Feb 6, 2011 19:08 UTC (Sun) by man_ls (subscriber, #15091)
> (2) what are you left with when you exclude all the unnecessary and possibly dangerous operations? Not much.
> There are probably several times more remote execution bugs in scripting language built applications than C applications, just because of improper use of strings.
Posted Feb 12, 2011 23:07 UTC (Sat) by ofranja (subscriber, #11084)
I think you wanted to say LACK of abstraction.
If password is not exactly a string, you should have created a "password" type with proper operations and associated semantics.
Do not ever consider "C" as an example of a "complete type system", unless you also consider a Ford Model T a modern vehicle.
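A dedicated "password" type with its own operations, as suggested above, might look roughly like the following Python sketch. The `Password` class is hypothetical; `hashlib.pbkdf2_hmac`, `hmac.compare_digest` and `os.urandom` are real standard-library calls, and the iteration count is an illustrative assumption rather than a recommendation.

```python
import hashlib
import hmac
import os


class Password:
    """A secret that is deliberately not a str: no slicing, no concatenation."""

    __slots__ = ("_salt", "_digest")

    def __init__(self, plaintext: str):
        self._salt = os.urandom(16)
        self._digest = self._derive(plaintext, self._salt)

    @staticmethod
    def _derive(plaintext: str, salt: bytes) -> bytes:
        # PBKDF2-HMAC-SHA256; the iteration count is illustrative only.
        return hashlib.pbkdf2_hmac(
            "sha256", plaintext.encode("utf-8"), salt, 100_000
        )

    def matches(self, candidate: str) -> bool:
        # Compare derived digests in constant time.
        return hmac.compare_digest(
            self._digest, self._derive(candidate, self._salt)
        )

    def __repr__(self) -> str:
        return "Password(<hidden>)"
```

Because the class does not inherit from `str`, truncation (`pw[:4]`) and concatenation (`pw + "x"`) raise `TypeError` instead of silently producing a weaker secret; the only supported operation is `matches`.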
Posted Feb 3, 2011 21:05 UTC (Thu) by dskoll (subscriber, #1630)
> Did I claim this was the case? No, I didn't. But it does help, and not just because more modern languages are memory-safe.
They also tend to be slower (often a lot slower) and more memory-hungry. Unfortunately, if you want the ultimate in performance on a UNIX-like system, you're stuck with assembly, C or C++. Assembly is clearly the wrong tool for most applications, and IMO C++ is a monstrosity that adds a ton of garbage to C without really making it any safer.
I can see writing most things in safer languages. But your MTA has to be fast and efficient, which is why we don't see many (any?) widely-used MTAs written in anything but C or C++.
Posted Feb 3, 2011 21:34 UTC (Thu) by foom (subscriber, #14868)
I'd dispute that: I bet you could write an MTA in *Ruby* and it would be perfectly good on today's computers for all but the largest sites. There's not that much mail volume, and computers have gotten a lot faster...
Posted Feb 3, 2011 21:44 UTC (Thu) by dskoll (subscriber, #1630)
> I bet you could write an MTA in *Ruby* and it would be perfectly good on today's computers for all but the largest sites.
Go for it. Report back when done.
Posted Feb 3, 2011 22:06 UTC (Thu) by foom (subscriber, #14868)
But I will point out that many large mailing list installations run on mailman, which is written completely in Python. That seems to be performant enough.
And spam filtering of email content is frequently done with spamassassin -- written in Perl.
Posted Feb 3, 2011 22:40 UTC (Thu) by dlang (✭ supporter ✭, #313)
it manages the list of users that the MTA will send mail to (a low overhead activity)
it validates messages sent to the list (it handles each message once, before it gets multiplied by potentially several orders of magnitude as it's sent out)
Posted Feb 4, 2011 11:08 UTC (Fri) by paravoid (subscriber, #32869)
Posted Feb 4, 2011 19:42 UTC (Fri) by dskoll (subscriber, #1630)
From the "About Lamson" page:
> However, as great as Lamson is for processing email intelligently, it isn't the best solution for delivering mail. There is 30+ years of SMTP lore and myth stored in the code of mail servers such as Postfix and Exim that would take years to replicate and make efficient. Being a practical project, Lamson defers to much more capable SMTP servers for the grunt work of getting the mail to the final recipient.
Posted Feb 5, 2011 0:17 UTC (Sat) by dskoll (subscriber, #1630)
> And spam filtering of email content is frequently done with spamassassin -- written in Perl.
Indeed so. We use Perl (including SpamAssassin) to filter our mail.
In terms of memory size, SpamAssassin on our system is about 70MB per instance vs. about 9MB for Sendmail. (To be fair, we use a lot of other Perl modules apart from SpamAssassin in our filter.) And when it comes to performance, SpamAssassin is so slow relative to Sendmail that Sendmail becomes completely negligible. Bolting anything in Perl onto Sendmail is like towing a five-ton truck with a motorcycle. :)
Posted Feb 5, 2011 0:43 UTC (Sat) by foom (subscriber, #14868)
But writing a new MTA from scratch now is pretty pointless, no matter what language it's in.
Posted Feb 5, 2011 17:13 UTC (Sat) by dskoll (subscriber, #1630)
> Thus my claim: MTAs don't need to be written in C.
That's probably true, 99% of the time. But again, MTA authors tend to worry a lot about performance and tend to write their software to cope with huge amounts of mail. I believe that's the correct approach because even a small site can suddenly get a huge spike in traffic for various reasons (eg, a spammer does a massive joe-job spam run.) You don't really want your email to fall over.
Posted Feb 26, 2011 13:07 UTC (Sat) by job (guest, #670)
It is (almost) pure Perl and in use at quite a few large installations including the Apache project.
Posted Feb 3, 2011 21:58 UTC (Thu) by HelloWorld (guest, #56129)
> Assembly is clearly the wrong tool for most applications [...] But your MTA has to be fast and efficient, which is why we don't see many (any?) widely-used MTAs written in anything but C or C++.
Posted Feb 3, 2011 23:38 UTC (Thu) by dskoll (subscriber, #1630)
> Yet, people nowadays refuse to accept a negligible overhead over C (say, 20%, which is the goal set by the Go language), and this seems just bizarre to me. I believe that the main reason for sticking with C is simply inertia. "We've always done it that way!"
I don't think that's the reason for sticking with C (it's not my reason, anyway.) Here are my reasons:
Posted Feb 4, 2011 2:51 UTC (Fri) by HelloWorld (guest, #56129)
> I have a lot of experience with C and I like it. There's certainly something to be said for familiarity.
> C is old and well-tested.
> There are many C libraries and tools available, far more than for newer languages.
> In some cases, a 20% performance hit isn't worth it. An MTA on a busy mail system is one of those cases. While it's true that most mail systems are not that busy, MTA authors rightly attempt to make their MTA as fast and reliable as possible.
Posted Feb 4, 2011 3:19 UTC (Fri) by dskoll (subscriber, #1630)
> I don't mean to insult you, but this is exactly the mindset I mean.
Meh... everyone tends to like tools he/she is familiar with. That's just human nature. Maybe some Perl or Ruby fanatics will eventually write an MTA in their language...
> Any decent language has a foreign function interface for calling C functions.
But... but... then you're back in dangerous territory, no?
> No, they don't. C won't give you the fastest possible result, assembly will (portability is not an excuse, use a portable assembly language like LLVM assembly).
OK, now I know you're just arguing for the sake of argument. :) That's totally ridiculous and you know it.
Anyway... go ahead. Write a UNIX MTA in something other than C or C++ and see how it fares. The proof is in the pudding.
Posted Feb 4, 2011 12:59 UTC (Fri) by james (subscriber, #1325)
Unsurprisingly, they were getting a lot of spam on that server, and postfix was having trouble keeping up. So they switched to qpsmtpd, "a flexible smtpd daemon written in Perl" which uses "Danga Interactive / Six Apart's insanely scalable event-driven asynchronous socket class".
Justin reports: "results have been great
we now have a pure-perl system handling heavy volumes without breaking a sweat, certainly compared to the previous system."
Posted Feb 4, 2011 16:07 UTC (Fri) by dskoll (subscriber, #1630)
I read that article. The problem with the Postfix solution was not Postfix, but the fact that they were filtering from a procmail-invoked Perl script. That's a huge fail.
We run our Perl spam-scanner in conjunction with Sendmail, but we use the Milter interface to keep persistent Perl scanners running. The bottleneck in this system is by far the spam-scanning; Sendmail doesn't even show up on the radar.
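The difference between a filter started once per message (the procmail arrangement) and a scanner kept running across messages (the Milter arrangement) can be illustrated with a toy sketch. Nothing here is the real Milter API or SpamAssassin; `Scanner`, `RULES` and both `scan_*` functions are hypothetical stand-ins, and the "expensive start-up" is simulated by compiling a few patterns and counting how often that happens.

```python
import re

# Hypothetical rule set standing in for a real spam filter's rules.
RULES = [r"viagra", r"free money", r"click here"]


class Scanner:
    startups = 0  # counts how often the expensive start-up work ran

    def __init__(self):
        # Stand-in for loading an interpreter, modules and rule files.
        Scanner.startups += 1
        self.rules = [re.compile(p, re.IGNORECASE) for p in RULES]

    def score(self, message: str) -> int:
        # Number of rules that match the message.
        return sum(1 for rule in self.rules if rule.search(message))


def scan_per_message(messages):
    # procmail style: every message pays the full start-up cost.
    return [Scanner().score(m) for m in messages]


def scan_persistent(messages):
    # Milter style: start once, then handle messages as they arrive.
    scanner = Scanner()
    return [scanner.score(m) for m in messages]
```

Both functions return the same scores; the only difference is that `scan_per_message` increments `Scanner.startups` once per message while `scan_persistent` increments it once in total.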
Posted Feb 5, 2011 1:23 UTC (Sat) by HelloWorld (guest, #56129)
> Meh... everyone tends to like tools he/she is familiar with. That's just human nature.
> But... but... then you're back in dangerous territory, no?
> OK, now I know you're just arguing for the sake of argument. :) That's totally ridiculous and you know it.
Posted Feb 5, 2011 1:51 UTC (Sat) by mjg59 (subscriber, #23239)
(The first significant codebase I worked on was C++, and I've probably still written more Perl than I have C)
Posted Feb 5, 2011 17:10 UTC (Sat) by dskoll (subscriber, #1630)
> But this simply isn't the case: C is just one spot among many, many, many others on the efficiency-vs.-comfort curve. More specifically, it is not on the performance maximum of that curve, since that's where assembly language is.
It's not a smooth curve. The transition from C to assembly involves a *huge* increase in the difficulty curve with a relatively modest increase in the efficiency curve. C is a just-high-enough/just-low-enough level language to hit a sweet spot in efficiency-vs-difficulty.
But you already know this and are just arguing for the sake of it.
Posted Feb 6, 2011 1:37 UTC (Sun) by HelloWorld (guest, #56129)
Posted Feb 4, 2011 7:34 UTC (Fri) by yoe (subscriber, #25743)
Posted Feb 4, 2011 23:45 UTC (Fri) by giraffedata (subscriber, #1954)
> I find it interesting that you detract from C as an unsafe language by advocating for assembler.
He didn't advocate for assembler. It was reductio ad absurdum -- if fast is the only consideration, you would use assembler. Since you don't use assembler, fast is not the only consideration.
Since fast is not the only consideration, maybe you should consider safety. Or lower development cost (with which you can probably pay for the extra hardware you need compared to the C or assembler implementation at the same speed).
Posted Feb 5, 2011 2:07 UTC (Sat) by nybble41 (subscriber, #55106)
That is not to say that there are no other relevant considerations, but other languages are not necessarily any safer or easier to develop in than C; it varies depending on the project. Sometimes there really is no other viable alternative. For example, the only other realistic systems programming language I know of is D, which (while otherwise a great language) is not as portable as C (yet), nor nearly as well established so far as libraries and tool support are concerned.
Posted Feb 26, 2011 13:16 UTC (Sat) by job (guest, #670)
C is good because it is trivial and easy to read when done properly. Token expansion or resource exhaustion won't creep up on you from the underlying libraries.
There are of course other aspects to it. Given the choice between a C daemon and a Python one, without knowing the program I too would probably pick the latter one, but it all boils down to prejudice.
Posted Feb 3, 2011 13:05 UTC (Thu) by nix (subscriber, #2304)
Posted Feb 3, 2011 2:15 UTC (Thu) by bronson (subscriber, #4806)
Posted Feb 3, 2011 3:00 UTC (Thu) by djm (subscriber, #11651)
Enjoy your world without C, due to arrive right after hydrogen-powered flying cars and free robot concubines.
C for anything
Posted Feb 3, 2011 4:10 UTC (Thu) by ncm (subscriber, #165)
C++ is clearly the better choice. You have to keep the Java monkeys away, though, or you'll spend your life writing Get and Set functions, and calls to them. Fortunately Java monkeys are more attracted to Ruby and other four-letter languages.
(This is not meant to suggest that all Java programmers are monkeys. You, in particular, are the exception.)
Posted Feb 6, 2011 15:22 UTC (Sun) by malor (subscriber, #2973)
Getters and setters
Posted Feb 6, 2011 23:30 UTC (Sun) by man_ls (subscriber, #15091)
> You have to keep the Java monkeys away, though, or you'll spend your life writing Get and Set functions, and calls to them.
I turned to public attributes a few years ago and never regretted it. You know, I'm not against the occasional access method, and separating methods with side effects from methods that return values is an excellent practice. But mandating a whole level of indirection just in case, just for the sake of it? Why?
Then I found out that not everything is an object, and I stopped wondering. My Python skills have improved since I understood that simple truth.
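In Python specifically, starting with a public attribute costs little, because an accessor can be added later without touching callers. The sketch below uses the built-in `property` decorator; the class names `PointV1` and `PointV2` and the `shift` helper are hypothetical, invented only to show the before and after.

```python
class PointV1:
    """First version: a plain public attribute, no accessors."""

    def __init__(self, x):
        self.x = x


class PointV2:
    """Later version: validation added without changing any caller."""

    def __init__(self, x):
        self.x = x  # routed through the property setter below

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        if not isinstance(value, (int, float)):
            raise TypeError("x must be a number")
        self._x = value


def shift(point, dx):
    # Caller code is identical for both versions.
    point.x = point.x + dx
    return point.x
```

`shift` works unchanged on either class, so the level of indirection only has to be written on the day it is actually needed.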
Posted Feb 11, 2011 12:26 UTC (Fri) by mattthecat (guest, #72858)
"Don't got to use no stinkin' getters or setters"
Posted Feb 3, 2011 6:30 UTC (Thu) by cmccabe (guest, #60281)
P.S. Higher level languages are vulnerable to a variety of attacks that C isn't. For example, eval-based attacks or SQL injection attacks. The solution to these problems is the same: validate user inputs carefully, and structure your application into different components that communicate by message passing, rather than a single giant blob.
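For the SQL injection case mentioned above, one widely used complement to input validation is to pass user input as a bound parameter instead of pasting it into the query text. That is a different technique from the validation the comment recommends, shown here only as an illustration. The sketch uses Python's standard `sqlite3` module with an in-memory database; the table, the rows and the hostile input are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

hostile = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and changes its meaning.
unsafe_sql = "SELECT name FROM users WHERE name = '" + hostile + "'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safer: the '?' placeholder passes the input as data, never as SQL.
matched = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
```

The string-built query returns every row in the table, while the parameterized query returns none, because no user is literally named `nobody' OR '1'='1`.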
Posted Feb 5, 2011 20:59 UTC (Sat) by leoc (subscriber, #39773)
Posted Feb 10, 2011 8:22 UTC (Thu) by eduperez (guest, #11232)
Interesting comment... on a site devoted to a kernel / operating system written mostly in C; other than trolling, I cannot find another reason for your presence here.
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds