
In Brief

By Jonathan Corbet
August 12, 2009
Tux3. The once-noisy Tux3 development community has gone rather quiet in recent months. An inquiry into the status of the project led to one of last week's quotes of the week, wherein developer Daniel Phillips pled a lack of time and expressed regrets at not having merged the code into the mainline months ago. When asked (by Ted Ts'o) for a description of what makes Tux3 interesting, Daniel responded this way:

I think Tux3 fills an empty niche in our filesystem ecology where a simple, clean and modern general purpose filesystem should exist and there is none. In concrete terms, Tux3 implements a single-pointer-per-extent model that Btrfs and ZFS do not. This allows a very simple *physical* design, with much complexity pushed to the *logical* level where things generally behave better. A simple physical design offers many benefits, including making it easier to take a run at that holiest of holy grails, online check and repair.

What Tux3 needs, it seems, is some new development energy. It could be an interesting project for developers who want to get started in filesystem development.

Resource counters. The resource counter mechanism is built into control groups; it is intended for use by tools like the memory use controller. These counters contain, at their core, a (believe it or not) counter value which tracks the current usage of a resource by a given control group. This counter has run into the same problem which afflicts any frequently-changed global variable: it scales poorly due to cache line bouncing. The usage of some resources (pages of memory, for example) can change frequently, causing the associated counter to be a drag on the system as a whole.

Balbir Singh's scalable resource counters patch aims to fix that situation. With this patch, the single "usage" counter becomes an array of per-CPU counters. Since each processor works with its own copy of the counter, there is no more cache line bouncing and things run faster. The downside is that the count becomes approximate. The per-CPU counters are summed occasionally to keep everything roughly in sync, but keeping exact counts would take away much of the scalability that this patch was meant to provide. The good news is that exact counts are not really needed anyway; as long as the counter reflects something close enough to reality, the system will work essentially as it did before, only a little more quickly.
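The trade-off between speed and exactness can be illustrated with a small user-space sketch. This is not the code from the patch: the names (approx_counter, NR_DEMO_CPUS, DEMO_BATCH) are hypothetical, the "CPUs" are just array slots, and the policy of folding a local count into the shared total once it crosses a threshold is an assumption made for illustration. There is no locking or cache-line padding here; the point is only that the cheap read can lag behind the exact sum.

```c
#include <assert.h>
#include <stddef.h>

#define NR_DEMO_CPUS 4   /* hypothetical CPU count for this sketch */
#define DEMO_BATCH   32  /* hypothetical per-CPU drift allowed before folding */

struct approx_counter {
    long global;                /* shared total, touched only when folding */
    long percpu[NR_DEMO_CPUS];  /* local deltas, one slot per "CPU" */
};

/* Charge or uncharge 'delta' on behalf of 'cpu'. The shared total is
 * only updated when the local delta has drifted by DEMO_BATCH or more. */
static void approx_counter_add(struct approx_counter *c, size_t cpu, long delta)
{
    assert(cpu < NR_DEMO_CPUS);
    c->percpu[cpu] += delta;
    if (c->percpu[cpu] >= DEMO_BATCH || c->percpu[cpu] <= -DEMO_BATCH) {
        c->global += c->percpu[cpu];
        c->percpu[cpu] = 0;
    }
}

/* Cheap read: may be off by up to NR_DEMO_CPUS * (DEMO_BATCH - 1). */
static long approx_counter_read(const struct approx_counter *c)
{
    return c->global;
}

/* Expensive read: walks every per-CPU slot to get the exact value. */
static long approx_counter_sum(const struct approx_counter *c)
{
    long sum = c->global;
    for (size_t i = 0; i < NR_DEMO_CPUS; i++)
        sum += c->percpu[i];
    return sum;
}
```

A caller that only needs a rough figure would use approx_counter_read(); one that must enforce a hard limit near the boundary could fall back to approx_counter_sum() and pay for touching every slot.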

Inline spinlocks. Once upon a time, spinlocks were implemented with a series of inline functions, on the notion that such a performance-critical primitive would need to be as fast as possible. That changed in 2004, when spinlocks were turned into normal functions. The function call overhead hurt a bit, but moving spinlocks out-of-line made the kernel considerably smaller, which has performance benefits of its own. And that's how spinlocks have been ever since.

The pendulum may be about to swing the other way again, though, at least for the S390 architecture. Heiko Carstens noted that function calls on this architecture are quite expensive. He put together an inline spinlocks patch and measured performance improvements of 1-5%. So he would like to put this patch into the mainline, along with a configuration option allowing each architecture to choose the best way to implement spinlocks. So far, there has been little commentary for or against this idea.
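As a rough illustration of what such a choice might look like, here is a user-space sketch of a test-and-set lock built on C11 atomics, with a build-time macro deciding whether the lock functions are marked inline. None of this comes from the patch itself: DEMO_INLINE_SPINLOCKS, DEMO_LOCK_FN, and the demo_spin_* names are hypothetical, and a real kernel lock involves far more (preemption control, debugging hooks, architecture-specific instructions). In the kernel the out-of-line variant would live in its own object file; here the macro only changes the inline hint.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build-time switch, standing in for a per-architecture
 * configuration option. */
#ifdef DEMO_INLINE_SPINLOCKS
#define DEMO_LOCK_FN static inline
#else
#define DEMO_LOCK_FN static
#endif

struct demo_spinlock {
    atomic_flag locked;
};

/* Try once to take the lock; returns true on success. */
DEMO_LOCK_FN bool demo_spin_trylock(struct demo_spinlock *l)
{
    assert(l != NULL);
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Spin until the lock is acquired. */
DEMO_LOCK_FN void demo_spin_lock(struct demo_spinlock *l)
{
    while (!demo_spin_trylock(l))
        ; /* busy-wait */
}

/* Release the lock so that another caller can take it. */
DEMO_LOCK_FN void demo_spin_unlock(struct demo_spinlock *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Building with -DDEMO_INLINE_SPINLOCKS selects the inline form. Whether that is faster depends on how costly a function call is on the target and how much the extra code size hurts, which is the trade-off described above.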

Const seq_operations. James Morris has posted a patch making seq_operations structures constant throughout the kernel. These structures are almost always populated at compile time and never need to change; allowing the function pointers therein to be overwritten can only be useful to those who would like to subvert the kernel. A number of core VFS operations structures have been made const over the years, but seq_operations has not been addressed until now. James says: "This is derived from the grsecurity patch, although generated from scratch because it's simpler than extracting the changes from there."
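The idea can be sketched in plain C. The structure below is a simplified, hypothetical stand-in (demo_seq_ops) and does not use the real seq_operations signatures; it only shows a table of function pointers that is filled in at compile time and declared const, so that the compiler rejects any later assignment to its members and the toolchain is free to place it in read-only memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an operations structure. */
struct demo_seq_ops {
    const char *(*start)(size_t pos);  /* item at pos, or NULL */
    const char *(*next)(size_t pos);   /* item after pos, or NULL */
};

static const char *const demo_items[] = { "alpha", "beta", "gamma" };
#define DEMO_N_ITEMS (sizeof(demo_items) / sizeof(demo_items[0]))

static const char *demo_start(size_t pos)
{
    return pos < DEMO_N_ITEMS ? demo_items[pos] : NULL;
}

static const char *demo_next(size_t pos)
{
    return pos + 1 < DEMO_N_ITEMS ? demo_items[pos + 1] : NULL;
}

/* Populated at compile time and never changed afterward. Writing
 * "demo_ops.start = something_else;" would fail to compile. */
static const struct demo_seq_ops demo_ops = {
    .start = demo_start,
    .next  = demo_next,
};

/* Users take a pointer to const, so they cannot modify the table either. */
static size_t demo_count(const struct demo_seq_ops *ops)
{
    size_t n = 0, pos = 0;
    const char *p;

    assert(ops && ops->start && ops->next);
    p = ops->start(pos);
    while (p) {
        n++;
        p = ops->next(pos);
        pos++;
    }
    return n;
}
```

Code that really does need to patch a pointer at run time cannot use this pattern unchanged, so a few structures may have to stay writable.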

data=guarded. Back in the middle of the discussion of crash robustness and latency in the ext3 filesystem, Chris Mason came forward with a proposal for a data=guarded mode, which would delay metadata updates when files change size to prevent the disclosure of unrelated information. Since then, the data=guarded patch has disappeared from view. In response to a query from Frans Pop, Chris confirmed that he is still working on that code, and that he plans to get it merged for 2.6.32.

Among those welcoming the news was Andi Kleen, who remarked: "data=writeback already cost me a few files after crashes here." The data=guarded mode may not help with that particular problem, though: it is really meant to combine the security benefits of data=ordered (not disclosing random data, in particular) with the performance benefits of data=writeback. The worst data-loss problems should have already been addressed by the robustness fixes that went into ext3 for 2.6.30.



const function pointers

Posted Aug 13, 2009 2:06 UTC (Thu) by jamesmrh (guest, #31622) [Link] (3 responses)

FWIW, checkpatch.pl does check some operations for const (file_operations and seq_operations), but this will only help new patches which are actually passed through the script.

There are some other operations to cover, too.

If anyone wants to help, have a wade through the grsecurity patch for hints.

797 files changed, 22449 insertions(+), 3032 deletions(-)

Perhaps make a cup of tea first.

const function pointers

Posted Aug 13, 2009 18:52 UTC (Thu) by spender (guest, #23067) [Link] (2 responses)

Or you can look at:
http://grsecurity.net/~spender/oooo_fancy.diff
http://grsecurity.net/~spender/more_const_fixes.diff
http://grsecurity.net/~spender/grsec_constfixes-2.4.37.4....

I'm pretty sure I made some corrections since then, so don't depend on the above to be 100% correct.

I can upload the simple/ugly script I used to automate fixing them up a bit.

-Brad

const function pointers

Posted Aug 13, 2009 19:02 UTC (Thu) by spender (guest, #23067) [Link]

There were only two things in 2.6.30.4 that needed the writable *_ops (I fixed up file_operations, seq_operations, dentry_operations, inode_operations, address_space_operations, vm_operations, super_operations): drivers/scsi/sg.c and virt/kvm/kvm_main.c

sg.c I fixed trivially, kvm_main.c can't be fixed easily without making the code ugly.

-Brad

const function pointers

Posted Aug 13, 2009 22:50 UTC (Thu) by jamesmrh (guest, #31622) [Link]

Thanks.



Copyright © 2009, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds