Brief items
The current development kernel prepatch was released on February 7. "There's nothing much that stands out here. Some arch updates (arm and powerpc), the usual driver updates: dri (radeon/i915), network cards, sound, media, scisi, some filesystem updates (cifs, btrfs), and some random stuff to round it all out (networking, watchpoints, tracepoints, etc)." The short-form changelog is in the announcement, or see the full changelog for all the details.
Stable updates: the 22.214.171.124 long-term update was released on February 7 with a long list of important fixes.
The 126.96.36.199 update was released on February 9. This one contains a couple dozen important fixes.
That window has been capped by RFC 3390 at four segments (just over 4KB) for the better part of a decade. In the meantime, connection speeds have increased and the amount of data sent over a given connection has grown despite the fact that connections live for shorter periods of time. As a result, many connections never ramp up to their full speed before they are closed, so the four-segment limit is now seen as a bottleneck which increases the latency of a typical connection considerably. That is one reason why contemporary browsers use many connections in parallel, despite the fact that the HTTP specification says that a maximum of two connections should be used.
Some developers at Google have been agitating for an increase in the initial congestion window for a while; in July 2010 they posted an IETF draft pushing for this change and describing the motivation behind it. Evidently Google has run some large-scale tests and found that, by increasing the initial congestion window, user-visible latencies can be reduced by 10% without creating congestion problems on the net. They thus recommend that the window be increased to 10 segments; the draft suggests that 16 might actually be a better value, but more testing is required.
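The effect of the initial window on short connections can be estimated with a little arithmetic. What follows is a minimal sketch rather than a model of any real TCP stack: it assumes an idealized slow start with no packet loss, no receive-window limit, a window that doubles on every round trip, and a segment size of 1460 bytes (the `MSS` value and the function name are assumptions made for illustration).

```python
import math

MSS = 1460  # assumed segment payload in bytes; not taken from the article


def round_trips(total_bytes, initial_window, mss=MSS):
    """Count round trips needed to deliver total_bytes in idealized slow start.

    Assumes no loss, no receive-window limit, and a congestion window
    (counted in segments) that doubles after every round trip.
    """
    segments_left = math.ceil(total_bytes / mss)
    cwnd = initial_window
    trips = 0
    while segments_left > 0:
        segments_left -= cwnd  # send one full window per round trip
        cwnd *= 2              # slow start: the window doubles
        trips += 1
    return trips


# Compare the four-segment limit with the proposed ten-segment window.
for size_kb in (4, 14, 60, 250):
    size = size_kb * 1024
    print(f"{size_kb:>4} KB: IW=4 -> {round_trips(size, 4)} RTTs, "
          f"IW=10 -> {round_trips(size, 10)} RTTs")
```

Under these assumptions, the larger window saves one round trip for most transfers of more than a few segments; for a short-lived connection, that can be a sizable fraction of the total transfer time.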
David Miller has posted a patch increasing the window to 10; that patch has not been merged into the mainline, so one assumes it's meant for 2.6.39.
Interestingly, Google's tests worked with a number of operating systems, but not with Linux, which uses a relatively small initial receive window of 6KB. Most other systems, it seems, use 64KB instead. Without a receive window at least as large as the congestion window, a larger initial congestion window will have little effect. That problem will be fixed in 2.6.38, thanks to a patch from Nandita Dukkipati raising the initial receive window to 10 segments.
Some kernel developers are annoyed by HTC's policy of delaying source code releases for up to 120 days after a given handset ships. In response, Matthew Garrett has suggested an addition to the top-level COPYING file in the kernel source:
About the only response so far has been from Alan Cox, who has suggested that getting a lawyer's opinion on the matter might be useful. Linus, over whose name the new text would appear, has not commented on it. So it's not clear if the change will go in or whether it will inspire any changes in vendor behavior if it is merged. But it does, at least, make the developers' feelings on the matter known.
Kernel development news
The question, as it turns out, came sooner - February 3, to be exact - when Jan Kara suggested that removing ext2 and ext3 could be discussed at the upcoming storage, filesystems, and memory management summit. Jan asked:
One might protest that there will be existing filesystems in the ext3 (and even ext2) formats for the indefinite future. Removing support for those formats is clearly not something that can be done. But removing the ext2 and/or ext3 code is not the same as removing support: ext4 has been very carefully written to be able to work with the older formats without breaking compatibility. One can mount an ext3 filesystem using the ext4 code and make changes; it will still be possible to mount that filesystem with the ext3 code in the future.
So it is possible to remove ext2 and ext3 without breaking existing users or preventing them from going back to older implementations. Beyond that, mounting an ext2/3 filesystem under ext4 allows the system to use a number of performance enhancing techniques - like delayed allocation - which do not exist in the older implementations. In other words, ext4 can replace ext2 and ext3, maintain compatibility, and make things faster at the same time. Given that, one might wonder why removing the older code even requires discussion.
There appear to be a couple of reasons not to hurry into this change, both of which have to do with testing. As Eric Sandeen noted, some of the more ext3-like options are not tested as heavily as the native modes of operation:
There is also concern that ext4, which is still seeing much more change than its predecessors, is more likely to introduce instabilities. That's a bit of a disturbing idea; there are enough production users of ext4 now that the introduction of serious bugs would not be pleasant. But, again, the backward-compatible operating modes of ext4 may not be as heavily tested as the native mode, so one might argue that operation with older filesystems is more likely to break regardless of how careful the developers are.
So, clearly, any move to get rid of ext2 and ext3 would have to be preceded by the introduction of better testing for the less-exercised corners of ext4. The developers involved understand that clearly, so there is no need to be worried that the older code could be removed too quickly.
Meanwhile, there are also concerns that the older code, which is not seeing much developer attention, could give birth to bugs of its own. As Jan put it:
Developers have also expressed concern that new filesystem authors might copy code from ext2, which, at this point, does not serve as a good example for how Linux filesystems should be written.
The end result is that, once the testing concerns have been addressed, everybody involved might be made better off by the removal of ext2 and ext3. Users with older filesystems would get better performance and a code base which is seeing more active development and maintenance. Developers would be able to shed an older maintenance burden and focus their efforts on a single filesystem going forward. Thanks to the careful compatibility work which has been done over the years, it may be possible to safely make this move in the relatively near future.
With some regularity, the topic of allowing multiple Linux Security Modules (LSMs) to be active at the same time comes up in the Linux kernel community. There have been some attempts at "stacking" or "chaining" LSMs in the past, but nothing has ever made it into the mainline. Yet every time a developer comes up with some kind of security hardening patch for the kernel, they are generally directed toward the LSM interface. Because the "monolithic" security solutions (like SELinux, AppArmor, and others) tend to have already taken the single existing LSM slot in many distributions, these simpler, more targeted LSMs generally cannot be used. But a discussion on the linux-security-module mailing list suggests that work is being done that just might solve this problem.
The existing implementation of LSMs uses a single set of function pointers in a struct security_operations for the "hooks" that get called when access decisions need to be made. Once a security module gets registered (typically at boot time using the security= flag), its implementation is stored in the structure and any other LSM is out of luck. The idea behind LSM stacking would be to keep multiple versions of the security_operations structure around and to call each registered LSM's hooks for an access decision. While that sounds fairly straightforward, there are some subtleties that need to be addressed, especially if different LSMs give different answers for a particular access.
This problem with the semantics of "composing" two (or more) LSMs has been discussed at various points, without any real global solution for composing arbitrary LSMs. As Serge E. Hallyn warned over a year ago:
There is one example of stacking LSMs as Hallyn describes in the kernel already; the capabilities LSM is called directly from other LSMs where necessary. That particular approach is not very general, of course, as LSM maintainers are likely to lose patience with adding calls for every other possible LSM. A more easily expandable solution is required.
David Howells posted a set of patches that would add that expansion mechanism. The patches do that by allowing multiple calls to the register_security() initialization function, each with its own set of security_operations. Instead of the current situation, where each LSM manages its own data for each kind of object (credentials, keys, files, inodes, superblocks, IPC, and sockets), Howells's security framework will allocate and manage that data for the LSMs.
The security_operations structure gets new *_data_size and *_data_offset fields for each kind of object, with the former filled in by the LSM before calling register_security() and the latter being managed by the framework. The data size field tells the framework how much space is needed for the LSM-specific data for that type of object, and the offset is used by the framework to find each LSM's private data. For struct cred, struct key, struct file, and struct super_block, the extra data for each registered LSM is tacked onto the end of the structure rather than going through an intermediate pointer (as is required for the others). Wrappers are defined that will allow an LSM to extract its data from an object based on the new fields in the operations table.
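The size and offset bookkeeping described above can be sketched in a few lines. This is an illustration only, not Howells's code: it models a single kind of object (files), ignores alignment, and the names `SecurityOps`, `Framework`, `alloc_file_blob()`, and `lsm_file_data()` are hypothetical stand-ins for the kernel structures and wrappers.

```python
from dataclasses import dataclass


@dataclass
class SecurityOps:
    """Hypothetical stand-in for struct security_operations (files only)."""
    name: str
    file_data_size: int          # filled in by the LSM before registering
    file_data_offset: int = -1   # filled in by the framework


class Framework:
    """Hypothetical framework that lays out per-LSM data back to back."""

    def __init__(self):
        self.lsms = []
        self.file_blob_size = 0  # total per-file space needed by all LSMs

    def register_security(self, ops):
        # Each LSM's data starts where the previous LSM's data ended.
        ops.file_data_offset = self.file_blob_size
        self.file_blob_size += ops.file_data_size
        self.lsms.append(ops)

    def alloc_file_blob(self):
        # One allocation holds the private data of every registered LSM.
        return bytearray(self.file_blob_size)


def lsm_file_data(ops, blob):
    """Wrapper returning the writable slice of the blob owned by one LSM."""
    start = ops.file_data_offset
    return memoryview(blob)[start:start + ops.file_data_size]


fw = Framework()
lsm_a = SecurityOps("lsm_a", file_data_size=8)
lsm_b = SecurityOps("lsm_b", file_data_size=4)
fw.register_security(lsm_a)
fw.register_security(lsm_b)

blob = fw.alloc_file_blob()
lsm_file_data(lsm_b, blob)[:] = b"\x01\x02\x03\x04"  # lsm_b writes only its own slice
```

Each LSM sees only its own region of the shared allocation, which is what lets the framework, rather than the individual modules, own the memory.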
The framework then maintains a list of registered LSMs and puts the capabilities LSM in the first slot of the list. When one of the security hooks is called, the framework iterates over the list and calls the corresponding hook for each registered LSM. Depending on the specific hook, different kinds of iterators are used, but the usual iterator looks for a non-zero response from an LSM's hook, which would indicate a denial of some kind, and returns that to the framework. The other iterators are used for specialized calls, for example when there is no return value or when only the first hook found should be called. The upshot is that the hooks for registered LSMs get called in order (with capabilities coming first), and the first to deny the access "wins". Because the capabilities calls are pulled out separately, that also means that the other LSMs no longer have to make those calls themselves; instead the framework will handle it for them.
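The "usual iterator" described above might look something like the following sketch. It is not taken from the patches: the `Lsm` class, the `call_int_hook()` name, the `make_hook()` helper, and the module names `lsm_a` and `lsm_b` are all assumptions made for illustration, and only the common integer-returning case is modeled.

```python
EACCES = 13  # errno value used for "permission denied"


class Lsm:
    """Hypothetical registered LSM: a name plus a table of hook functions."""

    def __init__(self, name, hooks):
        self.name = name
        self.hooks = hooks  # hook name -> callable returning 0 or -errno


def call_int_hook(lsm_list, hook, *args):
    """Call `hook` in each LSM in list order; the first non-zero result wins."""
    for lsm in lsm_list:
        fn = lsm.hooks.get(hook)
        if fn is None:
            continue        # this LSM does not implement the hook
        rc = fn(*args)
        if rc != 0:
            return rc       # denial: later LSMs are never consulted
    return 0


calls = []  # records which LSMs were actually asked


def make_hook(name, rc):
    def hook(path):
        calls.append(name)
        return rc
    return hook


# Capabilities always occupies the first slot of the list.
lsm_list = [
    Lsm("capability", {"file_permission": make_hook("capability", 0)}),
    Lsm("lsm_a", {"file_permission": make_hook("lsm_a", -EACCES)}),
    Lsm("lsm_b", {"file_permission": make_hook("lsm_b", 0)}),
]

result = call_int_hook(lsm_list, "file_permission", "/some/file")
print(result, calls)
```

In this run the second module denies the access, so the third module's hook is never called.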
But there are a handful of hooks that do not work very well in a multi-LSM environment, in particular the secid (an LSM-specific security label ID) handling routines (e.g. secid_to_secctx(), task_getsecid(), etc.). Howells's current implementation just calls the hook of the first LSM it finds that implements it, which is not going to make it possible to use multiple LSMs that all implement those hooks (currently just SELinux and Smack). Howells's solution is to explicitly ban that particular combination:
But Smack developer Casey Schaufler isn't convinced that is the right course: "That kind of takes the wind out of the sails, doesn't it?" He would rather see a more general solution that allows multiple secids, and the related secctxs (security contexts), to be handled by the framework:
Another interesting part of Schaufler's message is that he has been working on an "alternative approach" to the multi-LSM problem that he calls "Glass". The code is, as yet, unreleased, but Schaufler describes Glass as an LSM that composes other LSMs:
Unlike Howells's proposal, Glass would leave the calls to the capabilities LSM (aka commoncap) in the existing LSMs, and only call commoncap if no LSM implemented a given hook. The idea is that the LSMs already handle the capabilities calls in their hooks as needed, so a call into commoncap is required only when none of those hooks gets called. In addition, Glass leaves the allocation and management of the security "blobs" (LSM-specific data for objects) to the LSMs rather than centralizing them in the framework as Howells's patches do.
In addition to various other differences, there is a more fundamental difference in the way that the two solutions handle multiple LSMs that all have hooks for a particular security operation. Glass purposely calls each hook in each registered LSM, whereas Howells's proposal typically short-circuits the chain of hooks once one of them has denied the access. Schaufler's idea is that an LSM should be able to maintain state, which means that skipping its hooks could potentially skew the access decision:
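A small sketch can show why the two strategies differ for an LSM that keeps state. Since Glass has not been released, nothing here reflects its actual code: the two toy modules and the assumption that the call-everything strategy reports the first denial it saw are purely illustrative.

```python
class DenyLsm:
    """Hypothetical LSM that denies every request."""

    def hook(self, request):
        return -13  # -EACCES


class CountingLsm:
    """Hypothetical stateful LSM: it counts every access attempt it sees."""

    def __init__(self):
        self.attempts = 0

    def hook(self, request):
        self.attempts += 1
        return 0


def short_circuit(lsms, request):
    """Stop at the first denial, as the usual iterator in Howells's patches does."""
    for lsm in lsms:
        rc = lsm.hook(request)
        if rc != 0:
            return rc
    return 0


def call_all(lsms, request):
    """Call every hook; report the first denial seen (an assumed policy)."""
    result = 0
    for lsm in lsms:
        rc = lsm.hook(request)
        if rc != 0 and result == 0:
            result = rc  # remember the denial, but keep calling
    return result


counter_a = CountingLsm()
rc_a = short_circuit([DenyLsm(), counter_a], "open")

counter_b = CountingLsm()
rc_b = call_all([DenyLsm(), counter_b], "open")

print(rc_a, counter_a.attempts)
print(rc_b, counter_b.attempts)
```

Both strategies deny the request, but only the second lets the counting module observe the attempt; a module that bases later decisions on what it has seen could therefore behave differently under the two schemes.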
There are plenty of other issues to resolve, including the handling of /proc/self/attr/current (which contains the security ID for the current process): various user-space programs already parse the contents of that file, even though its format depends on which LSM is active. A standardized format for that file, which takes multiple LSMs into account, might be better, but it would break the kernel ABI and is thus not likely to pass muster. Overall, though, Howells and Schaufler are making some good progress on defining the requirements for supporting multiple LSMs. Schaufler is optimistic that the collaboration will bear fruit: "I think that we may be able to get past the problems that have held multiple LSMs back this time around."
So far, there is only the code from Howells to look at, but Schaufler has promised to make Glass available soon. With luck, that will lead to a multi-LSM solution that the LSM developers can coalesce behind, whether it comes from Howells, Schaufler, or a collaboration between them. There may still be a fair amount of resistance from Linus Torvalds and other kernel hackers, but the lack of any way to combine LSMs comes up too often for it to be ignored forever.
Mesh networking, as its name implies, is meant to work via a large number of short-haul connections without any sort of centralized control. A proper mesh network should configure itself dynamically, responding to the addition and removal of nodes and changes in connectivity. In a well-functioning mesh, networking "just happens" without high-level coordination; such a net should be quite hard to disrupt. What the kernel offers now falls somewhat short of that ideal, but it is a good demonstration of how hard mesh networking can be.
The "Better Approach To Mobile Ad-hoc Networking" (BATMAN) protocol is described in this draft RFC. A BATMAN mesh is made up of a set of "originators" which communicate via network interfaces - normal wireless interfaces, for example. Every so often, each originator sends out an "originator message" (OGM) as a broadcast to all of its neighbors to tell the world that it exists. Each neighbor is supposed to note the presence of the originator and forward the message onward via a broadcast of its own. Thus, over time, all nodes in the mesh should see the OGM, possibly via multiple paths, and thus each node will know (1) that it can reach the originator, and (2) which of its neighbors has the best path to that originator. Each node maintains a routing table listing every other node it has ever heard of and the best neighbor by which to reach each one.
This protocol has the advantage of building and maintaining the routing tables on the fly; no central coordination is needed. It should also find near-optimal routes to each node. If a node goes away, the routing tables will reconfigure themselves to function in its absence. There is also no node in the network which has a complete view of how the mesh is built; nodes only know who is out there and the best next hop. This lack of knowledge should add to the security and robustness of the mesh.
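The way flooded OGMs turn into per-node routing tables can be illustrated with a toy simulation. This is a deliberately simplified sketch, not the protocol from the draft: it assumes lossless, symmetric links, treats the neighbor that delivers an OGM first as the best one, and does not reproduce the protocol's own ranking of neighbors. The function names and the four-node mesh are made up for the example.

```python
from collections import deque


def flood_ogm(links, originator):
    """Flood one OGM; return {node: neighbor to use to reach the originator}.

    `links` maps each node to the set of neighbors within radio range.
    """
    next_hop = {}
    seen = {originator}
    queue = deque([originator])
    while queue:
        sender = queue.popleft()
        for neighbor in sorted(links[sender]):
            if neighbor in seen:
                continue                 # duplicate OGM; an earlier copy already arrived
            seen.add(neighbor)
            next_hop[neighbor] = sender  # the route back toward the originator
            queue.append(neighbor)       # the neighbor rebroadcasts the OGM
    return next_hop


def build_routing_tables(links):
    """Let every node originate one OGM and collect each node's table."""
    tables = {node: {} for node in links}
    for originator in links:
        for node, hop in flood_ogm(links, originator).items():
            tables[node][originator] = hop
    return tables


# A four-node chain: A - B - C - D
links = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}

tables = build_routing_tables(links)
for node in sorted(tables):
    print(node, tables[node])
```

Each table holds only a next hop per destination; no node ever learns the full shape of the mesh.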
Nodes with a connection to the regular Internet can set a bit in their OGMs to advertise that fact; that allows others without such a connection to route packets to and from the rest of the world.
The original BATMAN protocol uses UDP for the OGM messages. That design allows routing to be handled with the normal kernel routing tables, but it also imposes a couple of unfortunate constraints: nodes must obtain an IP address from somewhere before joining the mesh, and the protocol is tied to IPv4. The BATMAN-adv protocol found in the Linux kernel has changed a few things to get around these problems, making it a rather more flexible solution. BATMAN-adv works entirely at the link layer, exchanging non-UDP OGMs directly with neighboring nodes. The routing table is maintained within a special virtual network device, which makes all nodes on the mesh appear to be directly connected via that virtual interface. Thus the system can join the mesh before it has a network address, and any protocol can be run over the mesh.
BATMAN-adv removes some of the limitations found in BATMAN, but readers who have gotten this far will likely be thinking of the limitations that remain. The flooding of broadcast OGMs through the net can only scale so far before a significant amount of bandwidth is consumed by network overhead. The protocol trims OGMs which are obviously not of interest - those which describe a route which is known to be worse than others, for example - but the OGM traffic will still be significant if the mesh gets large. The routing tables will also grow, since every node must keep track of every other node in existence. The overhead for these tables is probably manageable for a mesh of 1,000 nodes; it is probably hopeless for 1,000,000 nodes. Mobile devices - which are targeted by this protocol - are especially likely to suffer as the table gets larger.
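A back-of-the-envelope calculation gives a feel for how this overhead grows. The sketch below assumes the worst case, in which no OGMs are trimmed and every node rebroadcasts every other originator's OGM once per interval; the OGM size, the interval, and the size of a routing table entry are placeholder values chosen for illustration, not figures from the protocol.

```python
def mesh_overhead(nodes, ogm_bytes=50, interval_s=1.0, entry_bytes=100):
    """Estimate worst-case per-node overhead in a mesh of `nodes` originators.

    All default sizes are illustrative assumptions, not protocol constants.
    """
    # Each node sends its own OGM plus one rebroadcast per other originator.
    tx_per_node = nodes
    return {
        "ogm_tx_per_node_per_s": tx_per_node / interval_s,
        "ogm_bytes_per_node_per_s": tx_per_node * ogm_bytes / interval_s,
        "table_entries_per_node": nodes - 1,
        "table_bytes_per_node": (nodes - 1) * entry_bytes,
    }


for n in (1_000, 1_000_000):
    stats = mesh_overhead(n)
    print(n, stats)
```

Whatever the real constants turn out to be, both the OGM traffic each node must carry and the size of its table grow linearly with the number of nodes in the mesh, so the total OGM traffic in the mesh grows quadratically.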
Security is also a concern in this kind of network. Simple bandwidth-consuming denial of service attacks would seem relatively straightforward. Sending bogus OGMs could cause the size of routing tables to explode or disrupt the routing within the mesh. A more clever attack could force traffic to route through a hostile node, enabling man-in-the-middle exploits. And so on. The draft RFC quickly mentions some of these issues, but it seems clear that security has not been a major design goal.
So it would seem clear that BATMAN-adv, while interesting, is not the solution to the problem of an overly-centralized network. It could be a useful way to extend connectivity through a building or small neighborhood, but it is not meant to operate on a large scale or in an overtly hostile environment. The bigger problem is a hard one to solve, to say the least. The experience gained with protocols like BATMAN-adv may well prove valuable in the search for that solution, but there is clearly some work to be done still.
Page editor: Jonathan Corbet
Copyright © 2011, Eklektix, Inc.