
Why allow arbitrary writeback?

Posted Apr 22, 2010 12:06 UTC (Thu) by epa (subscriber, #39769)
Parent article: When writeback goes wrong

It sounds as though the problem happens when writing back dirty pages to a complicated storage device such as network storage, RAID and so on. The simple case of an ext4 filesystem on local disk does not appear to cause a problem. Why not define a safe set of storage devices which are guaranteed not to need lots of stack to do writeback, and require all writes to other devices to be done synchronously? If that causes a performance problem, the device could have its own writeback layer which has a known maximum memory usage, and will not accept new pages for writeback unless it can guarantee it will have the memory to flush them later.
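The "writeback layer with a known maximum memory usage" idea above can be sketched as a toy admission-control model (hypothetical Python; `WritebackPool` and its numbers are illustrative, not any kernel API): pages are only accepted for deferred writeback when the pool can reserve, up front, the memory it will need to flush them later, and anything over budget falls back to a synchronous write.

```python
class WritebackPool:
    """Toy model of a per-device writeback layer with a fixed memory budget.

    A page is accepted for asynchronous writeback only if the memory needed
    to flush it later can be reserved now; otherwise the caller must write
    the page synchronously.
    """

    def __init__(self, budget_bytes, flush_cost_per_page=4096):
        self.budget = budget_bytes      # hard cap on reserved flush memory
        self.cost = flush_cost_per_page # assumed cost to flush one page
        self.reserved = 0
        self.pending = []

    def try_queue(self, page):
        """Return True if the page was queued for async writeback."""
        if self.reserved + self.cost > self.budget:
            return False                # over budget: caller writes synchronously
        self.reserved += self.cost
        self.pending.append(page)
        return True

    def flush(self):
        """Flush all pending pages, releasing their reservations."""
        flushed = list(self.pending)
        self.pending.clear()
        self.reserved = 0
        return flushed


pool = WritebackPool(budget_bytes=8192)   # room for two 4 KiB pages
assert pool.try_queue("page0") is True
assert pool.try_queue("page1") is True
assert pool.try_queue("page2") is False   # budget exhausted: go synchronous
assert pool.flush() == ["page0", "page1"]
assert pool.try_queue("page2") is True    # budget available again after flush
```

The point of the model is that the pool can always honor what it accepted, because the flush cost was reserved at admission time rather than discovered at flush time.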

I know that writeback disk I/O makes a massive performance difference for your desktop PC with a spinning platter hard disk. But is that still the case for other types of device?


Why allow arbitrary writeback?

Posted May 15, 2010 18:57 UTC (Sat) by Duncan (guest, #6647)

Because such "complicated storage devices" are complicated precisely because they can be arbitrarily layered. Your "simple" example was ext4 directly on a local disk. Fine, but nothing prevents that ext4 from sitting on LVM, on md/RAID-0, on md/RAID-1, on network iSCSI: the sort of case mentioned in the article. Each layer eats additional stack space, and it is that very flexibility to stack block devices (and the kernel stack space each layer uses) that makes them so useful in the first place. The ext4 filesystem doesn't particularly care whether it's directly on disk, on LVM, on md/RAID, or whatever, and that flexibility is taken to be a GOOD thing, one which a LOT of folks would object to losing.
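Why arbitrary stacking translates into arbitrary stack usage can be shown with a toy model (Python sketch; the layer names and the per-layer frame cost are illustrative, not real kernel figures): each layer's submit path calls down into the layer beneath it, so the stack consumed by one write is the sum over every layer, and that sum is only known once the stack is assembled at run time.

```python
def submit_write(layers, frame_cost=512, depth=0):
    """Simulate a write request descending through a stack of block layers.

    Each layer adds one stack frame (modelled as frame_cost bytes) before
    handing the request to the layer below it, so total stack use grows
    linearly with however many layers the administrator chose to stack.
    """
    if not layers:
        return depth
    # This layer does its work, then recurses into the layer beneath it.
    return submit_write(layers[1:], frame_cost, depth + frame_cost)


simple  = ["ext4", "disk"]
stacked = ["ext4", "lvm", "md-raid0", "md-raid1", "iscsi", "tcp", "nic"]

assert submit_write(simple)  == 2 * 512
assert submit_write(stacked) == 7 * 512   # stack use grows with every layer
```

No single layer is at fault: each one's frame is modest, but the kernel cannot bound the total without forbidding some stackings.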

OTOH, some of the new generation of filesystem solutions, such as btrfs, have "layering violations": they know about and account for, at least to some degree, what they're actually running on, and can optionally include layers such as RAID directly in the filesystem itself. But these are still very new; btrfs is still labeled experimental and is not yet ready for production use.

Really, getting reclaim onto its own stack would seem to be the only reasonably permanent solution, because that's the only way out of the current situation, in which an arbitrary amount of stack has already been used before reclaim is called, and an arbitrary amount more is needed to guarantee the call will succeed. Anything else is only a temporary fix that patches individual special cases while ignoring the root of the problem.
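The "own stack space" idea can be sketched with a hand-off to a worker thread (Python sketch; in the kernel this would be a dedicated kernel thread or workqueue, not Python threading): the deep writeback call chain runs on the worker's fresh stack instead of on top of whatever the caller has already consumed.

```python
import queue
import threading


def reclaim_worker(jobs, results):
    """Run writeback jobs on this thread's own, freshly allocated stack."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut the worker down
            break
        # However deep the real writeback path is, it starts from the
        # top of this thread's stack, not the caller's.
        results.put(("done", job))


jobs, results = queue.Queue(), queue.Queue()
worker = threading.Thread(target=reclaim_worker, args=(jobs, results))
worker.start()

# Direct reclaim hands the work off instead of calling into writeback
# itself, so the caller's (possibly already deep) stack is never extended.
jobs.put("flush page 42")
assert results.get(timeout=5) == ("done", "flush page 42")

jobs.put(None)
worker.join()
```

The design trade-off is the usual one for such hand-offs: the caller must block or poll for completion, but its stack consumption becomes independent of the storage stack underneath.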

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds