Linux kernel design patterns - part 3

Posted Jun 23, 2009 8:03 UTC (Tue) by butlerm (guest, #13312)
In reply to: Linux kernel design patterns - part 3 by marcH
Parent article: Linux kernel design patterns - part 3

IPv4 is the perfect example of an unnecessarily constraining mid-layer. The
designers arbitrarily decided that 32 bit fixed length addresses would be
good enough indefinitely, when in practice both the fixed length constraint
and the 32 bit constraint were causing serious problems within less than
fifteen years.

So to fix this enormous mess, the IETF goes out and designs another
protocol, IPv6, a protocol which is incompatible in almost every way
imaginable with the protocol it is trying to replace, to the point that
many think widespread migration is never going to happen. And IPv6 is
showing signs of premature obsolescence already, in considerable part due
to the effective waste of 64 bits of its 128 bit addresses.

To say nothing of the standard BSD socket interface, which makes it more or
less impossible to write a layer 3 transparent application program, i.e.
one that would work with a layer 3 protocol that hasn't been invented yet.
And so on...



Linux kernel design patterns - part 3

Posted Jun 23, 2009 8:54 UTC (Tue) by marcH (subscriber, #57642) [Link]

> The designers arbitrarily decided that 32 bit fixed length addresses would be good enough indefinitely, when in practice both the fixed length constraint and the 32 bit constraint were causing serious problems within less than fifteen years.

Please do not throw out the baby with the bath water: the fact that addresses are 32 bits wide is not a core design principle of IP! It is just a minor implementation detail gone really wrong.

It is much more difficult to have no mid-layer at all when designing communication protocols than when designing filesystems or other kernel subsystems. Simply because you are not alone. Upgrading your kernel is easy. Having everyone upgrade their kernel is obviously much harder. Already today you CAN avoid TCP/IP entirely and send raw packets on the wire! But they will obviously not go any further than your network neighbours. To go any further you MUST agree on a minimum set of conventions (that is: a protocol), including a fixed format for addresses. Else please explain how to route hundreds of gigabits per second with free-form addresses.

I am definitely not pretending that IP is the perfect layer 3, far from it. But it is a *minimal* one, really. This is actually both its strength and weakness.

Please name a layer 3 that is lighter and has fewer constraints than IP (I did not say "better").

> So to fix this enormous mess, the IETF goes out and designs another protocol, IPv6, a protocol which is incompatible in almost every way imaginable with the protocol it is trying to replace,

This is only because of IPv4's original sin: it was never designed to be "upgradable". Easy to blame 40 years later. Since you have to give up on compatibility anyway, better to start from scratch and not copy/paste IPv4's past mistakes. And by the way, IPv6 is still a "minimal" layer 3.

> to the point that many think widespread migration is never going to happen.

Every operating system already supports IPv6, and a number of consumer ISPs are already offering IPv6. Many people are already using it (I do), it already works. Except in the US maybe. What is holding back IPv6 deployment is not incompatibility but laziness and an abundance of IPv4 addresses (and only in some countries).

> To say nothing of the standard BSD socket interface, which makes it more or less impossible to write a layer 3 transparent application program,

It is not pretty but possible: please look at getaddrinfo() examples. Anyway I fully agree that the BSD socket API sucks, but this is a different topic.
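
As a minimal sketch of the getaddrinfo() approach, here written in Python (whose socket module wraps the same call): the client never names an address family and simply tries whatever the resolver returns. The helper name connect_any is hypothetical, introduced only for illustration.

```python
import socket

def connect_any(host, service):
    """Connect over whichever address family resolves and accepts the connection.

    connect_any is a hypothetical helper name, not part of any library.
    """
    last_error = None
    # AF_UNSPEC asks the resolver for every family it knows about (IPv4, IPv6, ...).
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, service, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as err:
            # This family is not supported locally; try the next candidate.
            last_error = err
            continue
        try:
            # sockaddr is treated as opaque: its shape differs per family.
            sock.connect(sockaddr)
            return sock
        except OSError as err:
            last_error = err
            sock.close()
    raise last_error or OSError("no usable address for %r" % (host,))
```

The caller only ever passes a name and a service, so nothing in it depends on the address format. It is still tied to stream semantics, which is part of why the result is "not pretty". Python's own socket.create_connection() follows the same pattern.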

Linux kernel design patterns - part 3

Posted Jun 23, 2009 9:08 UTC (Tue) by johill (subscriber, #25196) [Link]

> It is much more difficult to have no mid-layer at all when designing communication protocols than when designing filesystems or other kernel subsystems. Simply because you are not alone. Upgrading your kernel is easy. Having everyone upgrade their kernel is obviously much harder.

I think you're confusing implementation and specification. Nobody, not even the original article, argued that there shouldn't be a "specification midlayer". However, the original article argued that the implementation should not make the "midlayer mistake", which I contested. So far your rebuttal has been on a "specification midlayer" basis. TBH, I haven't even figured out whether you were trying a rebuttal or not, nor what the TCP/IP specification has to do with the original thesis.

Also, you can ignore the fact that I mentioned TCP, and my point still stands with just plain IPv4, it's implemented as a midlayer.

In any case, it would be near impossible to implement networking as a library approach since afaict that would mean you'd have sockets tied to NICs and would have to provide migration for that, or something like that.

Linux kernel design patterns - part 3

Posted Jun 23, 2009 10:17 UTC (Tue) by hppnq (guest, #14462) [Link]

> Also, you can ignore the fact that I mentioned TCP, and my point still stands with just plain IPv4, it's implemented as a midlayer.

It's not. There is no abstraction that typifies the layer. You are confusing the typical network protocol's layered design with an OS kernel design pattern.

> In any case, it would be near impossible to implement networking as a library approach since afaict that would mean you'd have sockets tied to NICs and would have to provide migration for that, or something like that.

Sockets are of course bound to an interface, where appropriate. "Sockets" are a library. I must admit I am not sure what you are trying to say here.

What could be considered a networking midlayer is the integration of network and Unix domain sockets. And that, indeed, is perhaps not such a great idea (see X11), but YMMV (see X11).

Linux kernel design patterns - part 3

Posted Jun 23, 2009 10:26 UTC (Tue) by johill (subscriber, #25196) [Link]

> It's not. There is no abstraction that typifies the layer. You are confusing the typical network protocol's layered design with an OS kernel design pattern.

?
No, the design is layered, but the implementation is layered as well, in Linux. It may not be layered as much, though.

> Sockets are of course bound to an interface, where appropriate. "Sockets" are a library. I must admit I am not sure what you are trying to say here.

Well, taking a page out of the "library approach" book, you'd have to implement IP sockets in the NIC driver, by calling some functions out of the "socket" library. The NIC driver would get a socket, and then whenever something happens to the socket, call library functions to get 802.3 framed packets. Instead, however, all socket ioctls are handled directly in a layer above the NIC driver, and the NIC driver never sees the socket, but only the 802.3 frames.
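
The contrast between the two arrangements can be sketched roughly as follows. This is a toy model in Python, not Linux code: every class and function name is hypothetical, and the "frame" is a simplified Ethernet-style header with no padding or checksum.

```python
ETHERTYPE_IPV4 = 0x0800

def build_frame(dst_mac, src_mac, payload):
    """Library routine: prepend a simplified Ethernet-style header to the payload."""
    return dst_mac + src_mac + ETHERTYPE_IPV4.to_bytes(2, "big") + payload

class FrameOnlyDriver:
    """Midlayer style: the driver never sees a socket, only finished frames."""
    def __init__(self, mac):
        self.mac = mac
        self.wire = []          # stands in for the physical link

    def transmit(self, frame):
        self.wire.append(frame)

class SocketLayer:
    """The layer above the driver, which handles every socket operation itself."""
    def __init__(self, driver):
        self.driver = driver

    def send(self, sock, payload):
        frame = build_frame(sock["dst_mac"], self.driver.mac, payload)
        self.driver.transmit(frame)

class SocketAwareDriver:
    """Library style: the driver is handed the socket and calls the library itself."""
    def __init__(self, mac):
        self.mac = mac
        self.wire = []

    def socket_send(self, sock, payload):
        self.wire.append(build_frame(sock["dst_mac"], self.mac, payload))
```

Both paths put identical bytes on the wire; what differs is who owns the socket. In the second arrangement the socket belongs to one particular driver, which is where the migration problem comes from.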

Linux kernel design patterns - part 3

Posted Jun 23, 2009 12:33 UTC (Tue) by hppnq (guest, #14462) [Link]

> No, the design is layered, but the implementation is layered as well, in Linux. It may not be layered as much, though.

Sorry, I missed you were mentioning the implementation specifically. The confusion was mine. ;-)

I mentioned "sockets" are actually a library, because well, they actually were, and were perceived as such especially before they became the industry standard for inter-NIC communication. In the library book, you would have to find a way to communicate through your NIC to another NIC, and sockets provide just one way to do that.

Correct me if I'm wrong: the Linux network drivers deal with socket buffers (and frames on the wire of course, in the case of ethernet), and the buffers are associated with sockets. One obvious reason for this particular aspect of the implementation is the asynchronous nature of network IO; the driver implementation cannot really in general afford to call library functions whenever "something happens to the socket".

Linux kernel design patterns - part 3

Posted Jun 23, 2009 20:29 UTC (Tue) by johill (subscriber, #25196) [Link]

Yes, data packets are called socket buffers (sk_buff) but the fact that there may or may not be a socket attached to them (sk_buff->sk) is mostly irrelevant to NIC drivers.

Linux kernel design patterns - part 3

Posted Jun 23, 2009 10:39 UTC (Tue) by marcH (subscriber, #57642) [Link]

> I haven't even figured out whether you were trying a rebuttal or not, nor what the TCP/IP specification has to do with the original thesis.

Well, since I am not sure either what you are trying to say, I guess we are even ;-)

So let me rephrase and summarize my point: TCP/IP is incredibly successful. Does this prove or invalidate the midlayer anti-pattern?

I think TCP/IP's success proves that the midlayer is an anti-pattern, because:
- TCP is not a midlayer but an (optional) library;
- IP has been shrunk to the smallest possible network midlayer 3. Unlike for other subsystems, it is unfortunately practically impossible to shrink a network midlayer 3 down to zero. You need a minimum set of conventions, and IP is good at reaching this minimum.
- the BSD socket API sucks but it is not really relevant to this question.

What I am NOT saying: IP is the best network layer 3. There are other aspects than this midlayer question.

Linux kernel design patterns - part 3

Posted Jun 23, 2009 10:59 UTC (Tue) by johill (subscriber, #25196) [Link]

Right, so I guess we're just talking about different things. I think IP or TCP as implemented in Linux disprove the "midlayer mistake" antipattern, while you're saying that to the network, TCP or IP are more libraries than layers. I don't think there's any agreement or disagreement; unless I'm misunderstanding you (again), you're talking about the network, while I'm talking about the implementation.

Linux kernel design patterns - part 3

Posted Jun 23, 2009 12:09 UTC (Tue) by marcH (subscriber, #57642) [Link]

I was not thinking about any Linux-specifics at all. That probably explains our misunderstanding to a large extent.

Linux kernel design patterns - part 3

Posted Jun 28, 2009 19:30 UTC (Sun) by marcH (subscriber, #57642) [Link]

> In any case, it would be near impossible to implement networking as a library approach

I think Van Jacobson did that; check his "network channels": http://lwn.net/Articles/169961/ . The motivation was performance.

(thanks to hppnq for pointing this out)

Linux kernel design patterns - part 3

Posted Jul 5, 2009 7:42 UTC (Sun) by neilbrown (subscriber, #359) [Link]

> In any case, it would be near impossible to implement networking as a library approach since afaict that would mean you'd have sockets tied to NICs and would have to provide migration for that, or something like that.

Yes, tying a socket to a NIC would not work. You still need some degree of layering. Routing clearly needs to be done in a layer well above the individual NICs. However that doesn't mean that a NIC should be treated simply as a device that can send and receive packets. I think it is possible to find a richer abstraction that it is still reasonable to tie to the NIC.

I risk exposing my ignorance here, but I believe the networking code has a concept called a 'flow'. It is a lower level concept than a socket or a TCP connection, but it is higher level than individual packets. A flow essentially embodies a routing decision - rather than apply the routing rules to each packet, you apply them once to the source/destination of a socket to create a flow, then keep using that flow until it stops working or needs to be revised.
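
A flow in this sense can be sketched as a cached routing decision. The following is a toy illustration in Python, not a description of how the Linux networking code implements it; FlowTable and its methods are hypothetical names, and the interface names are placeholders.

```python
import ipaddress

class FlowTable:
    """Caches one routing decision per (source, destination) pair."""

    def __init__(self, routes):
        # routes: list of (prefix, interface name), e.g. ("10.0.0.0/8", "eth1")
        self.routes = [(ipaddress.ip_network(p), dev) for p, dev in routes]
        self.flows = {}
        self.lookups = 0        # counts how often the routing rules were applied

    def _route(self, dst):
        """Apply the routing rules: longest matching prefix wins."""
        self.lookups += 1
        addr = ipaddress.ip_address(dst)
        matches = [(net.prefixlen, dev) for net, dev in self.routes if addr in net]
        if not matches:
            raise LookupError("no route to %s" % dst)
        return max(matches, key=lambda m: m[0])[1]

    def flow_for(self, src, dst):
        """Return the cached decision, running the routing rules only on a miss."""
        key = (src, dst)
        if key not in self.flows:
            self.flows[key] = self._route(dst)
        return self.flows[key]

    def invalidate(self, src, dst):
        """Drop a flow (timeout, routing change); the next use re-runs routing."""
        self.flows.pop((src, dst), None)
```

Every packet after the first one between the same pair of addresses reuses the stored decision, and invalidate() models the flow being aborted and then re-created.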

I imagine that the networking layer could create a flow and connect it to a NIC. Then the NIC driver sees the stream of data heading out, and provides a stream of data coming in. It might use library routines to convert between stream and packets, or it might off load an arbitrary amount of this work to the NIC itself. Either the NIC or the networking layer can abort the flow (due e.g. to timeouts or routing changes) and the network layer responds by re-running the routing algorithm and creating a new flow.

So yes, there must still be a networking layer. The 'mistake' is to put too much control in the layer and not to give enough freedom to the underlying drivers. Choosing the right level of abstraction is always hard, and often we only see the mistakes in hindsight.

An interesting related issue comes up when you consider "object based storage" devices. These things are disk drives which are not linearly addressable, but rather present the storage as a number of variable-sized objects. One could reasonably think of each object as a file.

To put a Linux filesystem on one of these you wouldn't need to worry about allocation policy or free space bitmaps. You would only need to manage metadata like ownership and timestamps, and the directory structure.

So we could imagine a file system interface which passed the requests all the way down to the device. That device might provide regular block-based semantics, so the driver would call in to filesystem libraries to manage space allocation and different libraries to manage the directory tree and metadata. A different device might be an "object-based storage" device so the driver uses the native object abstraction for space allocation and uses the library for directory management. A third device might be a network connection to an NFS server, so neither the space allocation nor the directory allocation libraries would be needed. A local cache for some files would still be used though.

Now I'm not really advocating that design, and definitely wouldn't expect anyone to come up with it before being presented with the reality of object based storage. I'm presenting it mainly as an example of how a midlayer can be limiting, and how factoring code into a library style can provide more flexibility. It would certainly make it easier to experiment if we had different libraries for different directory management strategies, and different block allocation strategies etc, and could mix-and-match...
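
The mix-and-match idea might look roughly like this. It is a deliberately tiny Python sketch with hypothetical names throughout, not a proposal for a real interface: each driver picks only the libraries its device actually needs.

```python
class BitmapAllocator:
    """Space-allocation library for linearly addressed block devices."""
    def __init__(self, nblocks):
        self.free = [True] * nblocks

    def allocate(self):
        block = self.free.index(True)   # raises ValueError when the device is full
        self.free[block] = False
        return block

class DirectoryTree:
    """Directory-management library: maps names to storage handles."""
    def __init__(self):
        self.entries = {}

    def link(self, name, handle):
        self.entries[name] = handle

    def lookup(self, name):
        return self.entries[name]

class BlockDriver:
    """Block device: uses both the allocation and the directory library."""
    def __init__(self, nblocks):
        self.alloc = BitmapAllocator(nblocks)
        self.tree = DirectoryTree()

    def create(self, name):
        handle = self.alloc.allocate()
        self.tree.link(name, handle)
        return handle

class ObjectDriver:
    """Object-based device: allocation is native, only directories use a library."""
    def __init__(self):
        self.tree = DirectoryTree()
        self.next_object = 0    # stands in for the device handing out object IDs

    def create(self, name):
        handle = self.next_object
        self.next_object += 1
        self.tree.link(name, handle)
        return handle
```

Swapping in a different directory strategy would mean replacing DirectoryTree in one driver without touching the other, which is the kind of experiment a midlayer makes harder.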

Linux kernel design patterns - part 3

Posted Jul 5, 2009 9:43 UTC (Sun) by johill (subscriber, #25196) [Link]

Re: object storage, I think it already exists -- exofs?

Will have to do some more digging to respond to your other points.

Linux kernel design patterns - part 3

Posted Jul 6, 2009 3:56 UTC (Mon) by dlang (subscriber, #313) [Link]

sometimes you don't route all traffic from one source to one destination through the same NIC (although that is the most common case)

you may have bonded NICs to give you more bandwidth, in which case your traffic may need to go through different NICs

you may have bridged NICs, and connectivity to the remote host changes, causing you to need to go out a different NIC to get to it.

you may be prioritizing traffic, sending 'important' traffic out a high-bandwidth WAN link, while sending 'secondary' traffic out a lower-bandwidth WAN link.

Linux kernel design patterns - part 3

Posted Jul 9, 2009 2:36 UTC (Thu) by pabs (subscriber, #43278) [Link]

Now that exofs exists, it would be awesome for KVM/qemu to be able to pass a directory tree on a host to a Linux guest that would then mount it using exofs.

Linux kernel design patterns - part 3

Posted Jul 9, 2009 11:04 UTC (Thu) by johill (subscriber, #25196) [Link]

I thought about that a couple of days back, and it's probably not very hard, but it would also be somewhat stupid.

Remember that exofs actually keeps a "filesystem" in the object storage. So for example for a directory, it kinda stores this file:

/-----
|dir: foo
|files:
| * bar: 12
| * baz: 13
\-----

and 12/13 are handles to other objects. So to write a host filesystem, you'd have to write an OSD responder that creates those "directory files" on the fly based on the filesystem. Then you get races if the guest and host both modify a directory at the same time: you'd have to cache what you last told the guest, then see what modifications it made, and apply those modifications to the filesystem.

All in all, I think a different protocol would be much easier.
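
To make the bookkeeping concrete, here is a rough Python sketch of what such a responder would have to do. The dictionary layout only mirrors the informal picture above and is not the real exofs on-disk format; the function names and the starting object ID are hypothetical.

```python
import itertools

def build_directory_object(name, host_entries, handles, next_id):
    """Synthesize a "directory file" from a host directory listing.

    handles maps file name -> object ID and persists across calls, so the
    guest keeps seeing the same ID for the same file.
    """
    for entry in sorted(host_entries):
        if entry not in handles:
            handles[entry] = next(next_id)
    return {"dir": name, "files": {e: handles[e] for e in sorted(host_entries)}}

def diff_directory(last_sent, written_back):
    """Compare what the guest was last told with what it wrote back."""
    old, new = last_sent["files"], written_back["files"]
    added = {n: new[n] for n in new if n not in old}
    removed = sorted(n for n in old if n not in new)
    return added, removed

# Example: the host directory "foo" contains bar and baz.
handles, next_id = {}, itertools.count(12)   # 12 is an arbitrary first object ID
sent = build_directory_object("foo", ["bar", "baz"], handles, next_id)
```

The responder must keep "sent" around for every directory it has handed out. If the host directory changes before the guest writes its copy back, the diff is computed against stale data, which is the race described above.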

Linux kernel design patterns - part 3

Posted Jun 23, 2009 9:13 UTC (Tue) by hppnq (guest, #14462) [Link]

Have you any idea what the impact on internet traffic would have been (all those years ago!) if, say, IP packets had used 64-bit (or worse, arbitrary-length) addressing instead of the 32 bits you so easily dismiss as "unnecessarily constraining"?
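
A back-of-the-envelope sketch of the per-packet cost, in Python. It assumes the minimum IPv4 header (20 bytes, of which two 4-byte fields are addresses) and the minimum TCP header (20 bytes). The 64-bit variant is purely hypothetical: the same header with only the two address fields widened.

```python
IPV4_MIN_HEADER = 20     # bytes, no options
IPV4_ADDRESS = 4         # bytes per address; the header carries source and destination
TCP_MIN_HEADER = 20      # bytes, no options

def header_bytes(address_bits):
    """IPv4-style header size if only the two address fields changed width."""
    return IPV4_MIN_HEADER - 2 * IPV4_ADDRESS + 2 * (address_bits // 8)

def packet_bytes(address_bits, payload_bytes):
    """Total size of a TCP segment carried in such a packet."""
    return header_bytes(address_bits) + TCP_MIN_HEADER + payload_bytes

def growth(address_bits, payload_bytes):
    """Size relative to the same packet with 32-bit addresses."""
    return packet_bytes(address_bits, payload_bytes) / packet_bytes(32, payload_bytes)
```

Under these assumptions a payload-free TCP segment, such as a bare acknowledgement, grows from 40 to 48 bytes, a 20% increase. The relative cost shrinks as payloads get larger, and variable-length addresses would add parsing cost on top of the extra bytes.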


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds