LWN.net Logo

Advertisement

Advanced thin client solution for Linux, based on Open Source. Mix Windows and Linux, with hardware accelerated OpenGL!

Advertise here

Stable kernel updates (security fix)

The 2.6.25.2, 2.6.24.7, and 2.4.36.4 stable kernel updates have been released. They contain a single fix for a "pretty nasty" security hole in the filesystem locks code. The curious can see the full commit with description in Viroese: "fcntl_setlk()/close() race prevention has a subtle hole - we need to make sure that if we *do* have an fcntl/close race on SMP box, the access to descriptor table and inode->i_flock won't get reordered."
(Log in to post comments)

Stable kernel updates (security fix)

Posted May 7, 2008 14:13 UTC (Wed) by mattdm (subscriber, #18) [Link]

So, can someone explain the nastiness in English?

Nastiness explained (sort of)

Posted May 7, 2008 15:22 UTC (Wed) by pr1268 (subscriber, #24648) [Link]

So, can someone explain the nastiness in English?

I'll try:

Update your kernel with this patch,

-or-

There's a 1-in-100 chance that, for around 20 CPU cycles, a race could occur, thereby allowing a root privilege escalation (pulling numbers from a hat, so to speak).

Seriously, from what I was able to decipher from Al Viro's e-mail, it appears that code designed to prevent a race condition in the file control (fcntl) section has a "leak", and this patch aims to hold a spinlock for a while longer to cover up the leak.

Nastiness explained (sort of)

Posted May 7, 2008 16:04 UTC (Wed) by xorbe (guest, #3165) [Link]

"Solution is to hold ->file_lock around fcheck() in there;" -- which sounds much less
ambiguous...

Nastiness explained (sort of)

Posted May 7, 2008 16:26 UTC (Wed) by MisterIO (subscriber, #36192) [Link]

Actually the bug is just for SMP boxes. So if you have a single-processor computer, you won't
have any problem.

Nastiness explained (sort of)

Posted May 7, 2008 16:37 UTC (Wed) by MisterIO (subscriber, #36192) [Link]

Ok, with kernel preemption, that's not true.

Nastiness explained (sort of)

Posted May 7, 2008 19:32 UTC (Wed) by viro (subscriber, #7872) [Link]

Actually, on UP we see everything in order, regardless of preemption.
What happens is that in setlk we update the per-inode list of file_lock 
(making it non-empty), then check that descriptor still points to the
same file.  In close we remove file from descriptor table, then check
per-inode list.  The trouble is, we have no warranty that these will be
seen in order on another CPU.  IOW, check of per-inode list might see
the value prior to update done by setlk and setlk might see descriptor
table state prior to update done by close.  With both sides deciding
that there's no need to evict file_lock - close() doesn't see it, setlk
thinks that when close() comes it'll evict the sucker.

And file_lock holds a reference to struct file, but doesn't contribute
to refcount, with all joy that comes from _that_.  It's not supposed
to outlive removal of struct file from descriptor table, let alone
struct file itself...

lock/unlock around accesses to descriptor table forces SMP ordering
and solves the problem; modification of file_lock list is guaranteed
to be seen before unlock done after the check in setlk and check in
close is guranteed to come after lock done before removal from descriptor
table.  So that spinlock acts as a barrier, preventing the ordering
problems.

See Documentation/memory-barriers.txt for background in that area.
And yes, that's part of the price to be paid for lockless access
schemes (such as RCU for descriptor tables and lazy taking of BKL
for POSIX locks eviction in filp_close()) - one needs to think of
things that normally would be provided automatically by exclusion
primitives.

Real SMP-only races are not too frequent thing around VFS, but yes,
sometimes they do happen...

Much clearer explanation

Posted May 7, 2008 19:51 UTC (Wed) by pr1268 (subscriber, #24648) [Link]

Thank you, sir! Your explanation makes so much more sense than my silly melodramatic version above.

This issue highlights one of the critical issues of multi-processor kernel programming--it's easier to write multi-threaded userspace code, but the buck stops at the kernel with respect to maintaining absolute thread safety.

Nastiness explained (sort of)

Posted May 7, 2008 23:29 UTC (Wed) by csamuel (subscriber, #2624) [Link]

Al, thanks for that description!

Could this be related to a problem we're seeing on Barcelona (B3) based 
systems where open(2) occasionally fails with ENOENT when creating a file 
in a directory that most certainly exists ?

We're seeing it under 2.6.24 and 2.6.25, on both NFS and local (ext3) 
filesystems.

We did think it was the queueing system occasionally corrupting the mode 
variable and dropping O_CREAT until a user forwarded us a message 
showing /bin/cp having an identical problem. :-(

Nastiness explained (sort of)

Posted May 8, 2008 7:51 UTC (Thu) by MisterIO (subscriber, #36192) [Link]

Thanks for the explanation.

Nastiness explained (sort of)

Posted May 7, 2008 16:37 UTC (Wed) by NAR (subscriber, #1313) [Link]

However, nowadays even desktops tend to have multicore CPUs, so I think there are a lot more
SMP machines out there than 2-3 years ago...

Nastiness explained (sort of)

Posted May 7, 2008 18:40 UTC (Wed) by proski (subscriber, #104) [Link]

Laptops are going multicore too. It's hard to buy a single-core laptop now, unless it's very low end.

Stable kernel updates (security fix)

Posted May 7, 2008 18:27 UTC (Wed) by iabervon (subscriber, #722) [Link]

I think that, if you do a lock fcntl() on a file descriptor in one thread and a close() on it
in another thread, it's possible that you'll end up with the fcntl starting and figuring out
what you're locking, and then the close happens, and then the lock completes. At this point,
you've locked the file, but it's closed. These locks are only supposed to last until you close
the file (or exit, which automatically closes it). So I think this lets people leak locks. I
don't know whether this matters significantly for security, but, since POSIX implies that, if
you really want a lock to go away and you're root, you can kill the process that holds it,
something might be depending on this actually working.

Stable kernel updates (security fix)

Posted May 7, 2008 19:56 UTC (Wed) by viro (subscriber, #7872) [Link]

It does matter; file_lock is not supposed to outlive the struct file
it refers to.  If it does, and if struct file in question is later
freed, you can get the page that used to contain it mapped in userland.
Then it's not hard to fill it with the right pattern, so that e.g.
reading from /proc/locks would read data from arbitrary kernel address
and print it for you (look at the place where it prints ->i_ino,
for starters).  That's already a roothole; looking for more direct
ways to escalate is left as an exercise for readers (and yes, it is
possible).

Stable kernel updates (security fix)

Posted May 7, 2008 21:49 UTC (Wed) by iabervon (subscriber, #722) [Link]

Ah, yes, if the kernel itself is assuming that a locked file must be open in how it handles
memory, that's a pretty easy escalation.

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds