The problem with converting to killable waits is that if you interrupt a filesystem transaction that is waiting on another resource (e.g. a buffer lock) before the transaction can complete, the filesystem has to be able to undo the modifications already made in the transaction to be able to successfully back out and return an error. This is far from simple.
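To make the back-out requirement concrete, here is a minimal user-space sketch in Python. It is not XFS code and mirrors no kernel API; `Transaction`, `wait_killable`, `Interrupted` and `run` are hypothetical names invented for illustration. It only shows the shape of the problem: once a wait can be abandoned, every modification made before that wait needs an undo record that can be replayed in reverse.

```python
import threading

_MISSING = object()  # marks "key did not exist before the transaction"


class Interrupted(Exception):
    """Raised when a wait is abandoned because the task was killed."""


class Transaction:
    """Toy transaction over a dict, with an undo log (illustrative only)."""

    def __init__(self, state):
        self.state = state
        self.undo_log = []  # (key, previous value) pairs, newest last

    def modify(self, key, value):
        # Record the old value first so the change can be backed out.
        self.undo_log.append((key, self.state.get(key, _MISSING)))
        self.state[key] = value

    def wait_killable(self, lock, killed, poll=0.01):
        # Poll for the lock, giving up as soon as the kill flag is set.
        while not lock.acquire(timeout=poll):
            if killed.is_set():
                raise Interrupted()

    def rollback(self):
        # Undo the modifications in reverse order.
        while self.undo_log:
            key, old = self.undo_log.pop()
            if old is _MISSING:
                self.state.pop(key, None)
            else:
                self.state[key] = old


def run(state, lock, killed):
    """Modify state, wait for a second resource, then modify again."""
    tx = Transaction(state)
    try:
        tx.modify("inode_size", 4096)
        tx.wait_killable(lock, killed)  # may raise Interrupted
        try:
            tx.modify("block_count", 8)
        finally:
            lock.release()
        return True
    except Interrupted:
        tx.rollback()  # the new error path that has to exist and be tested
        return False
```

Even in this toy, the interrupted case adds a code path that has to leave `state` exactly as it found it; a real filesystem would need an equivalent for every wait it makes killable.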
I'd estimate implementing such functionality in XFS will touch around 30% of the code base and introduce several hundred new error paths that have to be tested (somehow). It's a fundamental design change - the assumption of being allowed to wait forever when in transaction context makes error handling and test matrices so much simpler.
The only reason I would consider making such a drastic change is if there is some new functionality that requires it, e.g. as the first step towards triggering on-line repair when corruption is detected during a transaction....
Anyway, if you have a hung filesystem, continuing operations after the hang is not going to improve the situation - it'll just get stuck again as the problematic resource is encountered by the next transaction. Being able to run "kill -9" doesn't avoid the issue of needing to correct the problem - that is still likely to require a reboot because you won't be able to unmount the filesystem for repair.
So while the idea of a fully interruptible filesystem is nice, it's far from being a reality....
Posted Aug 11, 2010 6:23 UTC (Wed) by tialaramex (subscriber, #21167)
The killable waits are a matter of pragmatism, so you should approach the situation with that in mind. Perhaps there are, as you suggest, hundreds of places in XFS where a device failure could theoretically hang a process indefinitely, but how many actually trigger?
I'd suggest building an XFS filesystem on an iSCSI disk, and trying two basic scenarios:
1. Run a heavy file layer benchmark to simulate active use of the disk, then pull the Ethernet cable from the iSCSI device
2. Idle the filesystem, pull the Ethernet, then immediately do 'ls' or 'cat /dev/urandom > hugeTestFile'
I suspect that in fact these scenarios will repeatedly get processes stuck in just a handful of waits within XFS. Making just these killable will, we may reasonably guess, help out a lot of administrators for much less work than your proposed "fundamental design change".
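To see which waits actually trigger in experiments like these, one rough approach is to list the tasks in uninterruptible sleep (state `D`) together with the kernel symbol they are sleeping in. The sketch below assumes a Linux `/proc` with per-process `stat` and `wchan` files; whether `wchan` shows a useful symbol depends on the kernel configuration. The function names `parse_stat_line` and `blocked_tasks` are made up for this example.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first " (" and the last ")".
    pid_part, rest = line.split(" (", 1)
    comm, after = rest.rsplit(")", 1)
    state = after.split()[0]
    return int(pid_part), comm, state


def blocked_tasks(proc="/proc"):
    """List (pid, comm, wchan) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
            if state != "D":
                continue
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            continue  # process exited, or the file is not readable
        found.append((pid, comm, wchan))
    return found


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(pid, comm, wchan)
```

Running this a few times after pulling the cable and tallying the `wchan` column would give a first idea of whether the stuck processes really cluster in a handful of places.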
How about it?
The Linux Storage and Filesystem Summit, day 2
Posted Aug 12, 2010 15:03 UTC (Thu) by ebirdie (subscriber, #512)
Excuse me if I'm missing something here, but aren't USB sticks used in practice in such a way that people pull them out, forgetting that a process was still copying onto the device, or that some indexer or hidden helper was operating on the filesystem in the background? Not to mention that people forget to eject the device before physically disconnecting it. There are more and more pluggable devices used as storage today, so ejecting becomes more and more impractical all the time. So a process still writing to an unplugged device should just be killed automatically (the system administrator example is a bad fit for this usage, I think), or die by itself, so that it stops eating resources like power and complicating things like suspend.
Secondly, once IO requests become time shared, as described in the Writeback section, can't a writing process conclude after a decent time has passed that the request will never complete? I'm just a sysadmin, so I may be getting many things wrong here.