
Date:	Tue, 30 Jun 1998 10:27:51 -0700 (PDT)
From:	Linus Torvalds <torvalds@transmeta.com>
To:	"Jeffrey B. Siegal" <jbs@quiotix.com>
Subject: Re: Thread implementations...



On Tue, 30 Jun 1998, Jeffrey B. Siegal wrote:
> 
> One thing that bothers me about this is that sending out full-sized packets
> seems like it would involve copying the data in order to build these full-sized
> packets.  If memory is slow and the network card is smart and efficient, it might
> be cheaper to just send smaller packets.  Thoughts?

Note that regardless of how fast and smart the network card is, if your
actual _network_ is faster than your memory copy speed it is time to
either throw away your computer and start over, or just admit to yourself
that whatever you do your computer is never going to be very good at doing
web-serving.

What I'm trying to say is that "cheaper" is not immediately obvious. Yes,
you may spend fewer CPU cycles on it. But if the network performance
suffers, then you just lost something, and the "cheaper in CPU" approach
actually became "more expensive in real life". 

I personally don't actually think that web-serving should ever be
CPU-bound when it comes to the actual networking part. CGI, yes. But if
your web-server is so CPU-bound by just trying to keep up with the network
and disk that one system call per transfer makes a difference, then
something is seriously wrong with your setup. 

Note: this is not denouncing the use of scatter-gather on the network card
etc., or attempts to make fewer copies. sendfile() is actually the much
nicer interface for that, simply because all the problems that zero-copy
delayed writes had with the UNIX semantics of "write()" suddenly no
longer exist - because sendfile() doesn't have to obey the UNIX semantics
of writes.
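A minimal sketch of the two interfaces side by side, assuming a Linux system and the sendfile() signature as it later shipped in Linux 2.2 (ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count)); at the time of this message the call was still only being discussed. The helper names copy_with_read_write() and copy_with_sendfile() are hypothetical, and out_fd is assumed to be a connected socket. The first helper pushes every chunk through a user buffer (read() copies it out of the kernel, write() copies it back in); the second asks the kernel to send the file itself, so no user buffer exists that the caller could scribble on. Whether the kernel then avoids further copies depends on the kernel version and the card.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>   /* out_fd is normally a connected socket */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: the traditional path. Each chunk is copied from
   the kernel into buf by read(), then back into the kernel by write():
   at least two system calls and two copies per chunk. */
static ssize_t copy_with_read_write(int out_fd, int in_fd)
{
    char buf[4096];
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) return total;              /* end of file */
        ssize_t off = 0;
        while (off < n) {                      /* write() may be partial */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR) continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}

/* Hypothetical helper: the sendfile() path. We ask the kernel to send
   the file itself; no user-space buffer is involved. */
static ssize_t copy_with_sendfile(int out_fd, int in_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0) return -1;
    off_t offset = 0;                          /* advanced by the kernel */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) break;                     /* file shrank underneath us */
    }
    return (ssize_t)offset;
}
```

The loop around sendfile() is still needed: like write(), it may transfer fewer bytes than requested. Passing an explicit offset also leaves the file offset of in_fd untouched.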

[ For those of you who haven't been in on that particular discussion: in
  many cases you'd like to just give the network card a series of physical
  addresses, and tell it "take these, send them out as a TCP packet, and
  tell me when you're done". And then go on to serving the next packet,
  knowing that the network card will do the actual work in the background.

  This is really hard to do with "write()", because if you return from the
  write() system call before the network card has finished everything
  (which is what you'd like to do in order to overlap calculations with
  communication) then you suddenly have lots of problems with coherency
  and making sure the user doesn't modify the buffer until everything has
  been sent from the old buffer. That's what the UNIX semantics require
  for "write()".

  However, for "sendfile()" it just makes perfect sense to allow this. The
  "sendfile()" thing in effect asks the system to send out the file - it
  doesn't ask the system to send out some specific buffer that people can
  scribble on at will. Suddenly the OS has much more control, and at the
  same time we have fewer rules too (we might for example say that "hey,
  if somebody is in the middle of modifying this file, then who knows
  whether we'll send out old or new data?" - we don't have to keep the
  thing coherent to the same degree we have to keep a user buffer
  coherent).

  As such, it suddenly becomes much easier to do clever tricks like
  background sending and letting a network card really shine. I'm not
  claiming it is easy, but I'm claiming it is easiER. ]
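The write() rule that makes background sending hard can be observed from user space. In this sketch (Linux/POSIX, with a hypothetical helper write_snapshots_buffer() and an AF_UNIX socketpair standing in for a network connection), the buffer is overwritten immediately after write() returns, yet the peer still reads the old bytes: write() must behave as if it copied the buffer at call time. A zero-copy write() that handed the user's pages to the card and returned early would have to preserve exactly this behaviour, which is the coherency problem described above; sendfile() has no user buffer to protect.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if the peer received the bytes that were
   in buf when write() was called, even though buf was overwritten before
   the peer read anything; returns -1 on setup failure, 0 otherwise. */
static int write_snapshots_buffer(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;

    char buf[8];
    memcpy(buf, "old data", 8);
    if (write(sv[0], buf, 8) != 8) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Once write() has returned, the buffer belongs to the caller again. */
    memcpy(buf, "new data", 8);

    char got[8];
    ssize_t n = read(sv[1], got, sizeof got);
    close(sv[0]);
    close(sv[1]);
    return n == 8 && memcmp(got, "old data", 8) == 0;
}
```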

I understand that people are nervous about adding new system calls, and
especially something that is best known from the NT community. But
we've shamelessly stolen from others - clone() was very much influenced by
Plan 9, as was the /proc filesystem. Let's not be picky about where the
stolen ideas come from.

		Linus

