In Brief
What Tux3 needs, it seems, is some new development energy. It could be an interesting project for developers looking to get started in filesystem development.
Resource counters. The resource counter mechanism is built into control groups; it is intended for use by tools like the memory use controller. These counters contain, at their core, a (believe it or not) counter value which tracks the current usage of a resource by a given control group. This counter has run into the same problem which afflicts any frequently-changed global variable: it scales poorly due to cache line bouncing. The usage of some resources (pages of memory, for example) can change frequently, causing the associated counter to be a drag on the system as a whole.
Balbir Singh's scalable resource counters patch aims to fix that situation. With this patch, the single "usage" counter becomes an array of per-CPU counters. Since each processor works with its own copy of the counter, there is no more cache line bouncing and things run faster. The down side is that the count becomes approximate. The per-CPU counters are summed occasionally to keep everything roughly in sync, but keeping exact counts would take away much of the scalability that this patch was meant to provide. The good news is that exact counts are not really needed anyway; as long as the counter reflects something close enough to reality, the system will work essentially as it did before - only a little more quickly.
Inline spinlocks. Once upon a time, spinlocks were implemented with a series of inline functions, on the notion that such a performance-critical primitive would need to be as fast as possible. That changed in 2004, when spinlocks were turned into normal functions. The function call overhead hurt a bit, but moving spinlocks out-of-line made the kernel considerably smaller, which has performance benefits of its own. And that's how spinlocks have been ever since.
The pendulum may be about to swing the other way again, though, at least for the S390 architecture. Heiko Carstens noted that function calls on this architecture are quite expensive. He put together an inline spinlocks patch and measured performance improvements of 1-5%. So he would like to put this patch into the mainline, along with a configuration option allowing each architecture to choose the best way to implement spinlocks. So far, there has been little commentary for or against this idea.
Const seq_operations. James Morris has posted a patch making seq_operations structures constant throughout the kernel. These structures are almost always populated at compile time and never need to change; allowing the function pointers therein to be overwritten can only be useful to those who would like to subvert the kernel. A number of core VFS operations structures have been made const over the years, but seq_operations has not been addressed until now. James says: "This is derived from the grsecurity patch, although generated from scratch because it's simpler than extracting the changes from there."
data=guarded. Back in the middle of the discussion of crash robustness and latency in the ext3 filesystem, Chris Mason came forward with a proposal for a data=guarded mode, which would delay metadata updates when files change size to prevent the disclosure of unrelated information. Since then, the data=guarded patch has disappeared from view. In response to a query from Frans Pop, Chris confirmed that he is still working on that code, and that he plans to get it merged for 2.6.32.
Among those welcoming the news was Andi Kleen, who remarked: "data=writeback already cost me a few files after crashes here." The data=guarded mode may not help with that particular problem, though: it is really meant to combine the security benefits of data=ordered (not disclosing random data, in particular) with the performance benefits of data=writeback. The worst data-loss problems should have already been addressed by the robustness fixes that went into ext3 for 2.6.30.
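For reference, ext3's existing journaling modes are selected at mount time; presumably data=guarded would be chosen the same way once merged, though the option name remains hypothetical until the patch lands. A configuration sketch:

```
# Existing ext3 data journaling modes, chosen at mount time:
mount -o data=ordered   /dev/sda2 /mnt   # default: file data written before metadata commit
mount -o data=writeback /dev/sda2 /mnt   # fastest, but can expose stale data after a crash
# The proposed mode would presumably be selected the same way:
#   mount -o data=guarded /dev/sda2 /mnt
```

The guarded mode aims to make the writeback-like fast path safe by delaying the size-changing metadata updates until the data itself is on disk.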
