Sponsored link Serve your customers, not your servers, with VERIO Linux VPS. Full-access test-drive here. |
Stable kernel updates (security fix)
The 2.6.25.2, 2.6.24.7, and 2.4.36.4 stable kernel updates have been
released. They contain a single fix for a "pretty nasty" security hole in
the filesystem locks code. The curious can see the
full commit with description in Viroese: "fcntl_setlk()/close() race prevention has a subtle hole - we need to
make sure that if we *do* have an fcntl/close race on SMP box, the
access to descriptor table and inode->i_flock won't get reordered."
(Log in to post comments)
Stable kernel updates (security fix) Posted May 7, 2008 14:13 UTC (Wed) by mattdm (subscriber, #18) [Link] So, can someone explain the nastiness in English?
Nastiness explained (sort of) Posted May 7, 2008 15:22 UTC (Wed) by pr1268 (subscriber, #24648) [Link] So, can someone explain the nastiness in English? I'll try: Update your kernel with this patch, -or- There's a 1-in-100 chance that, for around 20 CPU cycles, a race could occur, thereby allowing a root privilege escalation (pulling numbers from a hat, so to speak). Seriously, from what I was able to decipher from Al Viro's e-mail, it appears that code designed to prevent a race condition in the file control (fcntl) section has a "leak", and this patch aims to hold a spinlock for a while longer to cover up the leak.
Nastiness explained (sort of) Posted May 7, 2008 16:04 UTC (Wed) by xorbe (guest, #3165) [Link] "Solution is to hold ->file_lock around fcheck() in there;" -- which sounds much less ambiguous...
Nastiness explained (sort of) Posted May 7, 2008 16:26 UTC (Wed) by MisterIO (subscriber, #36192) [Link] Actually the bug is just for SMP boxes. So if you have a single-processor computer, you won't have any problem.
Nastiness explained (sort of) Posted May 7, 2008 16:37 UTC (Wed) by MisterIO (subscriber, #36192) [Link] Ok, with kernel preemption, that's not true.
Nastiness explained (sort of) Posted May 7, 2008 19:32 UTC (Wed) by viro (subscriber, #7872) [Link] Actually, on UP we see everything in order, regardless of preemption. What happens is that in setlk we update the per-inode list of file_lock (making it non-empty), then check that descriptor still points to the same file. In close we remove file from descriptor table, then check per-inode list. The trouble is, we have no warranty that these will be seen in order on another CPU. IOW, check of per-inode list might see the value prior to update done by setlk and setlk might see descriptor table state prior to update done by close. With both sides deciding that there's no need to evict file_lock - close() doesn't see it, setlk thinks that when close() comes it'll evict the sucker. And file_lock holds a reference to struct file, but doesn't contribute to refcount, with all joy that comes from _that_. It's not supposed to outlive removal of struct file from descriptor table, let alone struct file itself... lock/unlock around accesses to descriptor table forces SMP ordering and solves the problem; modification of file_lock list is guaranteed to be seen before unlock done after the check in setlk and check in close is guranteed to come after lock done before removal from descriptor table. So that spinlock acts as a barrier, preventing the ordering problems. See Documentation/memory-barriers.txt for background in that area. And yes, that's part of the price to be paid for lockless access schemes (such as RCU for descriptor tables and lazy taking of BKL for POSIX locks eviction in filp_close()) - one needs to think of things that normally would be provided automatically by exclusion primitives. Real SMP-only races are not too frequent thing around VFS, but yes, sometimes they do happen...
Much clearer explanation Posted May 7, 2008 19:51 UTC (Wed) by pr1268 (subscriber, #24648) [Link] Thank you, sir! Your explanation makes so much more sense than my silly melodramatic version above. This issue highlights one of the critical issues of multi-processor kernel programming--it's easier to write multi-threaded userspace code, but the buck stops at the kernel with respect to maintaining absolute thread safety.
Nastiness explained (sort of) Posted May 7, 2008 23:29 UTC (Wed) by csamuel (subscriber, #2624) [Link] Al, thanks for that description! Could this be related to a problem we're seeing on Barcelona (B3) based systems where open(2) occasionally fails with ENOENT when creating a file in a directory that most certainly exists ? We're seeing it under 2.6.24 and 2.6.25, on both NFS and local (ext3) filesystems. We did think it was the queueing system occasionally corrupting the mode variable and dropping O_CREAT until a user forwarded us a message showing /bin/cp having an identical problem. :-(
Nastiness explained (sort of) Posted May 8, 2008 7:51 UTC (Thu) by MisterIO (subscriber, #36192) [Link] Thanks for the explanation.
Nastiness explained (sort of) Posted May 7, 2008 16:37 UTC (Wed) by NAR (subscriber, #1313) [Link] However, nowadays even desktops tend to have multicore CPUs, so I think there are a lot more SMP machines out there than 2-3 years ago...
Nastiness explained (sort of) Posted May 7, 2008 18:40 UTC (Wed) by proski (subscriber, #104) [Link] Laptops are going multicore too. It's hard to buy a single-core laptop now, unless it's very low end.
Stable kernel updates (security fix) Posted May 7, 2008 18:27 UTC (Wed) by iabervon (subscriber, #722) [Link] I think that, if you do a lock fcntl() on a file descriptor in one thread and a close() on it in another thread, it's possible that you'll end up with the fcntl starting and figuring out what you're locking, and then the close happens, and then the lock completes. At this point, you've locked the file, but it's closed. These locks are only supposed to last until you close the file (or exit, which automatically closes it). So I think this lets people leak locks. I don't know whether this matters significantly for security, but, since POSIX implies that, if you really want a lock to go away and you're root, you can kill the process that holds it, something might be depending on this actually working.
Stable kernel updates (security fix) Posted May 7, 2008 19:56 UTC (Wed) by viro (subscriber, #7872) [Link] It does matter; file_lock is not supposed to outlive the struct file it refers to. If it does, and if struct file in question is later freed, you can get the page that used to contain it mapped in userland. Then it's not hard to fill it with the right pattern, so that e.g. reading from /proc/locks would read data from arbitrary kernel address and print it for you (look at the place where it prints ->i_ino, for starters). That's already a roothole; looking for more direct ways to escalate is left as an exercise for readers (and yes, it is possible).
Stable kernel updates (security fix) Posted May 7, 2008 21:49 UTC (Wed) by iabervon (subscriber, #722) [Link] Ah, yes, if the kernel itself is assuming that a locked file must be open in how it handles memory, that's a pretty easy escalation.
|
Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds
Powered by Rackspace Managed Hosting.