
The Linux Storage and Filesystem Summit, day 1

Posted Aug 11, 2010 2:28 UTC (Wed) by koverstreet (subscriber, #4296)
In reply to: The Linux Storage and Filesystem Summit, day 1 by butlerm
Parent article: The 2010 Linux Storage and Filesystem Summit, day 1

Yeah, the idea I sketched out is roughly equivalent (in behavior) to doing it with threads. I should make it explicit that my idea doesn't do anything filesystems can't do themselves.

Threads are unwieldy when you want to express something more complicated than linear dependencies. As you suggested, if you're just segregating metadata and data that's fine, but in practice the actual dependencies are usually more complex, so, provided you have an easy way of expressing them, it could in theory be a performance gain.
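To make the contrast with linear threads concrete, here is a minimal Python simulation (not kernel code) that treats each I/O as a node with an explicit set of prerequisites and groups the I/Os into "waves" that could be issued concurrently. The function name `dispatch_waves` and the example I/O names are hypothetical, chosen only for illustration.

```python
def dispatch_waves(deps):
    """Group I/Os into waves that can be issued concurrently.

    `deps` maps every I/O name to the set of I/O names it depends on.
    Each I/O lands in the first wave after all of its prerequisites
    have completed (hypothetical model, not a block-layer API).
    """
    remaining = {io: set(pre) for io, pre in deps.items()}
    done = set()
    waves = []
    while remaining:
        # Ready: every prerequisite finished in an earlier wave.
        ready = sorted(io for io, pre in remaining.items() if pre <= done)
        if not ready:
            raise ValueError("unsatisfiable dependencies: %s" % sorted(remaining))
        waves.append(ready)
        done.update(ready)
        for io in ready:
            del remaining[io]
    return waves


# Journal blocks A and B, a commit record that needs both, metadata that
# needs the commit, and an unrelated data write.
deps = {
    "journal_a": set(),
    "journal_b": set(),
    "commit": {"journal_a", "journal_b"},
    "metadata": {"commit"},
    "data": set(),
}
print(dispatch_waves(deps))  # [['data', 'journal_a', 'journal_b'], ['commit'], ['metadata']]
```

The commit record has two predecessors (a fan-in), and the unrelated data write goes out in the first wave instead of queuing behind the journal; a single linear chain can express neither without over-serializing.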

Anyway, if you want to pipeline I/Os all the way down to the SCSI layer, you need a way of expressing dependencies to the block layer, which needs more than threads.

I might have to write an actual patch and see what people think...



The Linux Storage and Filesystem Summit, day 1

Posted Aug 12, 2010 18:21 UTC (Thu) by butlerm (subscriber, #13312)

The advantage of threads is that they are simple, are already implemented by many existing block devices (SCSI ones, at any rate), and allow many common cases to be optimized: for example, a journal write before a metadata write without a round trip (or, worse, a cache flush) in between.

They also make it very convenient for a filesystem to gain notification when a series of block writes have been committed to disk without being too involved with the low level details of how that is known to be the case.
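A toy Python model of the I/O-thread semantics described above: writes queued between barriers may complete in any order, a barrier orders everything before it ahead of everything after it, and the filesystem can register a callback that fires once the writes before a barrier are durable. `IoThread`, `write`, `barrier` and `run_device` are hypothetical names; this is a simulation under those assumptions, not a kernel interface.

```python
class IoThread:
    """Toy I/O thread: writes are grouped into epochs separated by barriers."""

    def __init__(self):
        self.epochs = [[]]    # writes queued in each barrier epoch
        self.callbacks = {}   # epoch index -> callbacks fired when it is durable

    def write(self, block):
        self.epochs[-1].append(block)

    def barrier(self, on_durable=None):
        # Close the current epoch; optionally ask to be told when it is durable.
        if on_durable is not None:
            self.callbacks.setdefault(len(self.epochs) - 1, []).append(on_durable)
        self.epochs.append([])

    def run_device(self):
        """Simulate the device draining the queue.

        Within an epoch the device may reorder writes (here: reverse order),
        but it never starts an epoch before the previous one is complete.
        Returns the order in which writes became durable.
        """
        completed = []
        for i, epoch in enumerate(self.epochs):
            completed.extend(reversed(epoch))
            for cb in self.callbacks.get(i, []):
                cb()
        self.epochs, self.callbacks = [[]], {}
        return completed


notified = []
t = IoThread()
t.write("journal_1")
t.write("journal_2")
t.barrier(on_durable=lambda: notified.append("journal committed"))
t.write("metadata")
print(t.run_device())  # ['journal_2', 'journal_1', 'metadata']
print(notified)        # ['journal committed']
```

The filesystem never waits between the journal and metadata writes; ordering is enforced below it, and it is told once the journal epoch is durable without knowing how the device achieved that.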

On some devices, a write barrier is most efficiently translated into a full cache flush; on others, into waiting for the completion of a series of writes issued with force unit access (FUA) set. If the block interface does not provide I/O threads with write barriers or an equivalent, a filesystem would presumably be forced to choose one approach or the other, which would be highly inefficient in a number of cases.

With a proper threaded interface, the lower-level device driver can choose the most efficient way to implement the write barrier. SATA devices (which seem to be unusually backward in this regard) probably need a full cache flush. On other devices, you can either issue an explicit barrier or efficiently wait for a series of force unit access writes to complete individually. The filesystem shouldn't have to care about what is most efficient for any given device.
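A Python sketch of how a driver might pick the barrier strategy on the filesystem's behalf. `DeviceCaps`, `plan_barrier` and the command names are hypothetical stand-ins: real devices expose things like ordering commands, the FUA bit and cache-flush commands, but this mapping is illustrative only. The caller always asks for the same thing ("these writes must be durable before anything later") and gets back a device-specific command sequence.

```python
from dataclasses import dataclass


@dataclass
class DeviceCaps:
    ordered_tags: bool  # device honors an explicit ordering/barrier command
    fua: bool           # device honors Force Unit Access on individual writes


def plan_barrier(caps, writes):
    """Translate 'writes, then barrier' into a hypothetical command list."""
    if caps.ordered_tags:
        # Let the device enforce ordering itself; no host-side wait needed.
        return [("write", w) for w in writes] + [("barrier",)]
    if caps.fua:
        # Each FUA write is durable when it completes, so waiting for all
        # of them is enough; no full cache flush is required.
        return [("write_fua", w) for w in writes] + [("wait_all",)]
    # Fallback for a device with neither capability: the writes must complete
    # before the flush, because a flush only covers writes already completed.
    return [("write", w) for w in writes] + [("wait_all",), ("flush_cache",)]


plain = DeviceCaps(ordered_tags=False, fua=False)
print(plan_barrier(plain, ["j1"]))  # [('write', 'j1'), ('wait_all',), ('flush_cache',)]
```

The point of the design is that `plan_barrier` lives below the filesystem: the same request yields a flush on one device and a set of FUA writes on another.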

The Linux Storage and Filesystem Summit, day 1

Posted Aug 19, 2010 13:13 UTC (Thu) by cypherpunks (guest, #1288)

I've had a similar idea, which is specifically designed for easy hardware implementation: allow an operation to have a (small integer) tag, then provide a command to "wait for all operations with tag #k to complete".
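The basic primitive can be modelled in Python as a per-tag count of outstanding operations: "wait for tag #k" is satisfied once that count reaches zero. `TaggedQueue` and the tag-space size of 8 are assumptions for illustration, not a real device interface.

```python
from collections import Counter


class TaggedQueue:
    """Tracks outstanding operations per small-integer tag (hypothetical model)."""

    NUM_TAGS = 8  # assumption: a small, fixed tag space as hardware would offer

    def __init__(self):
        self.outstanding = Counter()

    def submit(self, tag):
        if not 0 <= tag < self.NUM_TAGS:
            raise ValueError("tag out of range: %d" % tag)
        self.outstanding[tag] += 1

    def complete(self, tag):
        if self.outstanding[tag] == 0:
            raise ValueError("no outstanding operation with tag %d" % tag)
        self.outstanding[tag] -= 1

    def wait_satisfied(self, tag):
        # "Wait for all operations with tag #k" completes when none remain.
        return self.outstanding[tag] == 0


q = TaggedQueue()
q.submit(3)
q.submit(3)
q.submit(5)
q.complete(3)
print(q.wait_satisfied(3))  # False: one tag-3 operation is still outstanding
q.complete(3)
print(q.wait_satisfied(3))  # True
print(q.wait_satisfied(5))  # False
```

One simplification: this counts every outstanding tag-k operation, including any submitted after the wait was issued.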

More generally, you could give every operation a prerequisite tag whose operations must complete before it starts (you need one reserved tag number, never assigned to any operation, to mark commands with no prerequisites), and make the wait operation a NOP with a prerequisite.

To merge threads, the wait operation can have a tag #n which differs from the #k it is waiting for. After it is issued, waiting for #n effectively waits for both.
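The prerequisite-tag variant and the thread-merging trick can be simulated together in Python. In this sketch (all names hypothetical), each operation is `(name, tag, prereq)` in submission order; an operation may run once every earlier operation carrying tag `prereq` has completed, tag 0 is the reserved "no prerequisite" value, and a wait is just a NOP with a prerequisite. The NOP tagged 2 waits for tag 1, so the final commit, which names only tag 2, ends up waiting for both streams.

```python
NO_PREREQ = 0  # reserved tag: never assigned to an operation


def run(ops):
    """Return the waves in which `ops` could complete.

    `ops` is a list of (name, tag, prereq) in submission order.  An op is
    ready when every earlier op whose tag equals its prereq is done.
    """
    done = set()
    waves = []
    while len(done) < len(ops):
        ready = []
        for i, (_name, _tag, prereq) in enumerate(ops):
            if i in done:
                continue
            blockers = [j for j in range(i) if ops[j][1] == prereq]
            if prereq == NO_PREREQ or all(j in done for j in blockers):
                ready.append(i)
        # The lowest-numbered pending op is always ready, so this terminates.
        done.update(ready)
        waves.append([ops[i][0] for i in ready])
    return waves


ops = [
    ("a1", 1, NO_PREREQ),   # stream A uses tag 1
    ("b1", 2, NO_PREREQ),   # stream B uses tag 2
    ("a2", 1, NO_PREREQ),
    ("merge", 2, 1),        # NOP tagged 2 that waits for tag 1
    ("commit", 3, 2),       # waits for tag 2, hence for both streams
]
print(run(ops))  # [['a1', 'b1', 'a2'], ['merge'], ['commit']]
```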

Now, you can merge independent operation streams by doing address translation on tags. And you can compress tag space (down to simple barriers, in the limiting case) by allowing false sharing.
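Tag translation and compression can be sketched in Python as a table that maps each stream's private tags onto a smaller device tag space, reusing device tags round-robin once they run out. `TagTranslator` and its allocation policy are assumptions for illustration. Sharing a device tag between unrelated logical tags only makes waits more conservative (a wait may cover extra operations), so ordering is never violated; with a single usable device tag every wait covers everything, which is the plain-barrier limiting case.

```python
class TagTranslator:
    """Maps (stream, local_tag) pairs onto a small shared device tag space.

    Device tag 0 stays reserved for "no prerequisite".  When logical tags
    outnumber device tags, later ones reuse device tags round-robin (false
    sharing): waits then cover more operations than strictly needed, which
    costs throughput but never breaks ordering.
    """

    def __init__(self, num_device_tags):
        if num_device_tags < 2:
            raise ValueError("need tag 0 (reserved) plus at least one usable tag")
        self.usable = num_device_tags - 1
        self.table = {}

    def translate(self, stream, local_tag):
        key = (stream, local_tag)
        if key not in self.table:
            self.table[key] = 1 + len(self.table) % self.usable
        return self.table[key]

    def sharers(self, stream, local_tag):
        """All logical tags that a wait on this one will implicitly cover."""
        dev = self.translate(stream, local_tag)
        return sorted(k for k, v in self.table.items() if v == dev)


tt = TagTranslator(num_device_tags=4)  # device tags 1..3 usable
print(tt.translate("fs1", 1))  # 1
print(tt.translate("fs2", 1))  # 2
print(tt.translate("fs1", 2))  # 3
print(tt.translate("fs2", 2))  # 1: falsely shared with ("fs1", 1)
print(tt.sharers("fs2", 2))    # [('fs1', 1), ('fs2', 2)]

barrier_only = TagTranslator(num_device_tags=2)
print({barrier_only.translate(s, t) for s in ("fs1", "fs2") for t in (1, 2)})  # {1}
```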

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds