|
Nastiness explained (sort of)Nastiness explained (sort of)Posted May 7, 2008 19:32 UTC (Wed) by viro (subscriber, #7872)In reply to: Nastiness explained (sort of) by MisterIO Parent article: Stable kernel updates (security fix)
Actually, on UP we see everything in order, regardless of preemption. What happens is that in setlk we update the per-inode list of file_lock (making it non-empty), then check that descriptor still points to the same file. In close we remove file from descriptor table, then check per-inode list. The trouble is, we have no warranty that these will be seen in order on another CPU. IOW, check of per-inode list might see the value prior to update done by setlk and setlk might see descriptor table state prior to update done by close. With both sides deciding that there's no need to evict file_lock - close() doesn't see it, setlk thinks that when close() comes it'll evict the sucker. And file_lock holds a reference to struct file, but doesn't contribute to refcount, with all joy that comes from _that_. It's not supposed to outlive removal of struct file from descriptor table, let alone struct file itself... lock/unlock around accesses to descriptor table forces SMP ordering and solves the problem; modification of file_lock list is guaranteed to be seen before unlock done after the check in setlk and check in close is guranteed to come after lock done before removal from descriptor table. So that spinlock acts as a barrier, preventing the ordering problems. See Documentation/memory-barriers.txt for background in that area. And yes, that's part of the price to be paid for lockless access schemes (such as RCU for descriptor tables and lazy taking of BKL for POSIX locks eviction in filp_close()) - one needs to think of things that normally would be provided automatically by exclusion primitives. Real SMP-only races are not too frequent thing around VFS, but yes, sometimes they do happen...
(Log in to post comments)
Much clearer explanation Posted May 7, 2008 19:51 UTC (Wed) by pr1268 (subscriber, #24648) [Link] Thank you, sir! Your explanation makes so much more sense than my silly melodramatic version above. This issue highlights one of the critical issues of multi-processor kernel programming--it's easier to write multi-threaded userspace code, but the buck stops at the kernel with respect to maintaining absolute thread safety.
Nastiness explained (sort of) Posted May 7, 2008 23:29 UTC (Wed) by csamuel (subscriber, #2624) [Link] Al, thanks for that description! Could this be related to a problem we're seeing on Barcelona (B3) based systems where open(2) occasionally fails with ENOENT when creating a file in a directory that most certainly exists ? We're seeing it under 2.6.24 and 2.6.25, on both NFS and local (ext3) filesystems. We did think it was the queueing system occasionally corrupting the mode variable and dropping O_CREAT until a user forwarded us a message showing /bin/cp having an identical problem. :-(
Nastiness explained (sort of) Posted May 8, 2008 7:51 UTC (Thu) by MisterIO (subscriber, #36192) [Link] Thanks for the explanation.
|
Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds
Powered by Rackspace Managed Hosting.