
Linux kernel design patterns - part 3


Posted Jul 5, 2009 7:42 UTC (Sun) by neilbrown (subscriber, #359)
In reply to: Linux kernel design patterns - part 3 by johill
Parent article: Linux kernel design patterns - part 3

> In any case, it would be near impossible to implement networking as a library approach since afaict that would mean you'd have sockets tied to NICs and would have to provide migration for that, or something like that.

Yes, tying a socket to a NIC would not work. You still need some degree of layering. Routing clearly needs to be done in a layer well above the individual NICs. However, that doesn't mean that a NIC should be treated simply as a device that can send and receive packets. I think it is possible to find a richer abstraction that is still reasonable to tie to the NIC.

I risk exposing my ignorance here, but I believe the networking code has a concept called a 'flow'. It is a lower level concept than a socket or a TCP connection, but it is higher level than individual packets. A flow essentially embodies a routing decision - rather than applying the routing rules to each packet, you apply them once to the source/destination of a socket to create a flow, then keep using that flow until it stops working or needs to be revised.

I imagine that the networking layer could create a flow and connect it to a NIC. Then the NIC driver sees the stream of data heading out, and provides a stream of data coming in. It might use library routines to convert between stream and packets, or it might off load an arbitrary amount of this work to the NIC itself. Either the NIC or the networking layer can abort the flow (due e.g. to timeouts or routing changes) and the network layer responds by re-running the routing algorithm and creating a new flow.
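To make that concrete, here is a toy sketch of the "flow = cached routing decision" idea. All of the names (`struct flow`, `flow_create`, `flow_abort`, etc.) are invented for illustration; this is not the real kernel flow API, just a minimal model of running routing once, binding the result to a NIC, and re-running it only when the flow is aborted:

```c
#include <assert.h>

/* Hypothetical names -- a minimal model of "a flow embodies a
 * routing decision", not the actual kernel data structures. */
struct flow {
	unsigned int src, dst;	/* endpoints, simplified to integers */
	int nic;		/* NIC chosen by the routing decision */
	int valid;		/* cleared when the route must be redone */
};

/* Stand-in routing rule: pick one of two toy NICs by destination. */
static int route_lookup(unsigned int dst)
{
	return dst % 2;
}

/* Run the routing rules once and bind the result to a NIC. */
static void flow_create(struct flow *f, unsigned int src, unsigned int dst)
{
	f->src = src;
	f->dst = dst;
	f->nic = route_lookup(dst);
	f->valid = 1;
}

/* Either side can abort the flow (timeout, routing change, ...). */
static void flow_abort(struct flow *f)
{
	f->valid = 0;
}

/* Per-packet fast path: reuse the cached decision; only re-run
 * the routing algorithm if the flow was aborted. */
static int flow_output_nic(struct flow *f)
{
	if (!f->valid)
		flow_create(f, f->src, f->dst);
	return f->nic;
}
```

The point of the sketch is just the shape of the control flow: the expensive decision lives in `flow_create()`, and the per-packet path in `flow_output_nic()` never consults the routing rules unless something invalidated the flow.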

So yes, there must still be a networking layer. The 'mistake' is to put too much control in the layer and not to give enough freedom to the underlying drivers. Choosing the right level of abstraction is always hard, and often we only see the mistakes in hindsight.

An interesting related issue comes up when you consider "object based storage" devices. These things are disk drives which are not linearly addressable, but rather present the storage as a number of variable-sized objects. One could reasonably think of each object as a file.

To put a Linux filesystem on one of these you wouldn't need to worry about allocation policy or free space bitmaps. You would only need to manage metadata like ownership and timestamps, and the directory structure.

So we could imagine a file system interface which passed the requests all the way down to the device. That device might provide regular block-based semantics, so the driver would call in to filesystem libraries to manage space allocation and different libraries to manage the directory tree and metadata. A different device might be an "object-based storage" device, so the driver uses the native object abstraction for space allocation and uses the library for directory management. A third device might be a network connection to an NFS server, so neither the space allocation nor the directory management libraries would be needed. A local cache for some files would still be used though.
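A rough sketch of that library-style factoring (all names are hypothetical, chosen only to mirror the three devices above): each driver wires in just the libraries it actually needs, with NULL meaning "the device or server handles this itself":

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "filesystem libraries" a driver can opt into. */
struct alloc_lib {		/* space-allocation library */
	long (*alloc)(long len);
};
struct dir_lib {		/* directory-management library */
	int (*lookup)(const char *name);
};

/* Toy library implementations, standing in for real ones. */
static long bitmap_alloc(long len)       { return 1000 + len; }
static int  tree_lookup(const char *name){ return name[0]; }

static struct alloc_lib bitmap_lib = { .alloc  = bitmap_alloc };
static struct dir_lib   tree_lib   = { .lookup = tree_lookup };

/* A driver composes only what it needs; NULL means the device
 * (or remote server) provides that function natively. */
struct fs_driver {
	const char *kind;
	struct alloc_lib *alloc;
	struct dir_lib *dir;
};

static struct fs_driver block_disk = { "block", &bitmap_lib, &tree_lib };
static struct fs_driver osd_disk   = { "osd",   NULL,        &tree_lib };
static struct fs_driver nfs_mount  = { "nfs",   NULL,        NULL };
```

The mix-and-match experimentation then falls out naturally: swapping a different allocation strategy into `block_disk` is just pointing `alloc` at a different library, with no midlayer to fight.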

Now I'm not really advocating that design, and definitely wouldn't expect anyone to come up with it before being presented with the reality of object based storage. I'm presenting it mainly as an example of how a midlayer can be limiting, and how factoring code into a library style can provide more flexibility. It would certainly make it easier to experiment if we had different libraries for different directory management strategies, and different block allocation strategies etc, and could mix-and-match...


Linux kernel design patterns - part 3

Posted Jul 5, 2009 9:43 UTC (Sun) by johill (subscriber, #25196)

Re: object storage, I think it already exists -- exofs?

Will have to do some more digging to respond to your other points.

Linux kernel design patterns - part 3

Posted Jul 6, 2009 3:56 UTC (Mon) by dlang (subscriber, #313)

sometimes you don't route all traffic from one source to one destination through the same NIC (although that is the most common case)

you may have bonded NICs to give you more bandwidth, in which case your traffic may need to go through different NICs

you may have bridged NICs, and connectivity to the remote host changes, causing you to need to go out a different NIC to get to it.

you may be prioritizing traffic, sending 'important' traffic out a high-bandwidth WAN link, while sending 'secondary' traffic out a lower-bandwidth WAN link.

Linux kernel design patterns - part 3

Posted Jul 9, 2009 2:36 UTC (Thu) by pabs (subscriber, #43278)

Now that exofs exists, it would be awesome for KVM/qemu to be able to pass a directory tree on a host to a Linux guest that would then mount it using exofs.

Linux kernel design patterns - part 3

Posted Jul 9, 2009 11:04 UTC (Thu) by johill (subscriber, #25196)

I thought about that a couple of days back, and it's probably not very hard, but it would also be somewhat stupid.

Remember that exofs actually keeps a "filesystem" in the object storage. So for example for a directory, it kinda stores this file:

|dir: foo
| * bar: 12
| * baz: 13

and 12/13 are handles to other objects. So to write a host filesystem, you'd have to write an OSD responder that creates those "directory files" on the fly based on the filesystem. Then you get races if the guest and host both modify a directory at the same time: you'd have to cache what you last told the guest, see what modifications it made, and apply those modifications to the filesystem.
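The "creates those directory files on the fly" step might look something like this sketch. It only mirrors the example text above; the real exofs on-disk format differs, and all names here (`host_entry`, `render_dir`) are invented:

```c
#include <stdio.h>
#include <string.h>

/* A host-side directory entry the responder would read from the
 * real filesystem: a name plus the object handle it maps to. */
struct host_entry {
	const char *name;
	int object_id;
};

/* Render entries into a "directory file" in the toy format shown
 * above, so the guest can read it as an ordinary object. */
static void render_dir(char *buf, size_t len, const char *dirname,
		       const struct host_entry *ents, int n)
{
	size_t off = (size_t)snprintf(buf, len, "dir: %s\n", dirname);
	for (int i = 0; i < n; i++)
		off += (size_t)snprintf(buf + off, len - off,
					" * %s: %d\n",
					ents[i].name, ents[i].object_id);
}
```

The race problem is then visible in the design: this buffer is a snapshot, so the responder would have to remember which snapshot it last handed to the guest in order to diff the guest's later writes against it.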

All in all, I think a different protocol would be much easier.
