From: Zach Brown <zach.brown-AT-oracle.com>
To: Linus Torvalds <torvalds-AT-linux-foundation.org>
Subject: Re: Syslets, Threadlets, generic AIO support, v6
Date: Tue, 29 May 2007 15:49:16 -0700
Cc: linux-kernel-AT-vger.kernel.org, Ingo Molnar <mingo-AT-elte.hu>, Arjan van de Ven <arjan-AT-infradead.org>, Christoph Hellwig <hch-AT-infradead.org>, Andrew Morton <akpm-AT-zip.com.au>, Alan Cox <alan-AT-lxorguk.ukuu.org.uk>, Ulrich Drepper <drepper-AT-redhat.com>, Evgeniy Polyakov <johnpol-AT-2ka.mipt.ru>, "David S. Miller" <davem-AT-davemloft.net>, Suparna Bhattacharya <suparna-AT-in.ibm.com>, Davide Libenzi <davidel-AT-xmailserver.org>, Jens Axboe <jens.axboe-AT-oracle.com>, Thomas Gleixner <tglx-AT-linutronix.de>
> .. so don't keep us in suspense. Do you have any numbers for anything
> (like Oracle, to pick a random thing out of thin air ;) that might
> actually indicate whether this actually works or not?

I haven't gotten to running Oracle's database against it. It is going to be Very Cranky if O_DIRECT writes aren't concurrent, and that's going to take a bit of work in fs/direct-io.c.

I've done initial micro-benchmarking runs with fio for basic sanity testing. They haven't wildly regressed; that's about as much as can be said with confidence so far :).

Take a streaming O_DIRECT read: 1MB requests, 64 in flight.

  str: (g=0): rw=read, bs=1M-1M/1M-1M, ioengine=libaio, iodepth=64

  mainline:
    read : io=3,405MiB, bw=97,996KiB/s, iops=93, runt= 36434msec
  aio+syslets:
    read : io=3,452MiB, bw=99,115KiB/s, iops=94, runt= 36520msec

That's on an old gigabit copper FC array with 10 drives behind a, no seriously, qla2100. The real test is the change in memory and CPU consumption, and I haven't modified fio to take reasonably precise measurements of those yet. Once I get O_DIRECT writes concurrent, that'll be the next step.

I was pleased to see my motivation for the patches, avoiding the need to add operation-specific support to fs/aio.c, work out.
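For anyone wanting to reproduce a run like the one above, it maps to a fio job file roughly like the following. This is a sketch: the device path and runtime are placeholders, not taken from the original run.

```ini
; Streaming O_DIRECT read: 1MB requests, 64 in flight via libaio.
; filename and runtime are placeholders -- substitute your own device
; (and be careful: pointing fio at a raw block device is destructive
; only for write workloads, but double-check the device name anyway).
[str]
rw=read
bs=1m
ioengine=libaio
iodepth=64
direct=1
filename=/dev/sdb
runtime=36
```

Run with `fio jobfile`; the `read : io=..., bw=..., iops=...` summary lines quoted above are what fio prints at the end of such a run.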
Take the case of 4k random buffered reads from a block device with a cold cache:

  read: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64

  mainline:
    read : io=16,116KiB, bw=457KiB/s, iops=111, runt= 36047msec
     slat (msec): min= 4, max= 629, avg=563.17, stdev=71.92
     clat (msec): min= 0, max= 0, avg= 0.00, stdev= 0.00
  aio+syslets:
    read : io=125MiB, bw=3,634KiB/s, iops=887, runt= 36147msec
     slat (msec): min= 0, max= 3, avg= 0.00, stdev= 0.08
     clat (msec): min= 2, max= 643, avg=71.59, stdev=74.25
  aio+syslets w/o cfq:
    read : io=208MiB, bw=6,057KiB/s, iops=1,478, runt= 36071msec
     slat (msec): min= 0, max= 15, avg= 0.00, stdev= 0.09
     clat (msec): min= 2, max= 758, avg=42.75, stdev=37.33

Everyone step back and thank Jens for writing a tool that gives us interesting data without us always having to craft some stupid specific test each and every time. Thanks, Jens!

In the mainline numbers fio clearly shows the buffered read submissions being handled synchronously: the submission latency (slat) averages over half a second while completion latency (clat) is zero. The mainline buffered IO path doesn't know how to identify and work with iocbs, so requests are handled in series. In the +syslet numbers we see __async_schedule() catching the blocking buffered read, letting the submission proceed asynchronously, and the latency moves from slat to clat accordingly. We get async behaviour without having to touch any of the buffered IO paths.

Then we turn off cfq and we actually start to saturate the (relatively ancient) drives :). I need to mail Jens about that cfq behaviour, but I'm guessing it's expected behaviour of a sort: each syslet thread gets its own io_context instead of inheriting it from its parent.

- z
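The buffered random-read case corresponds to a job file along these lines. Again a sketch, with the same placeholder caveats as before; the key difference from the streaming job is direct=0, which keeps reads going through the page cache and is what exposes the synchronous-submission behaviour in mainline.

```ini
; 4k random buffered reads, 64 in flight via libaio.
; direct=0 (buffered IO) is the point of this test: mainline fs/aio.c
; has no async path for buffered reads, so submission blocks.
; filename and runtime are placeholders.
[read]
rw=randread
bs=4k
ioengine=libaio
iodepth=64
direct=0
filename=/dev/sdb
runtime=36
```

For a genuinely cold-cache run, drop the page cache first (e.g. `echo 3 > /proc/sys/vm/drop_caches`) or fio's cached numbers will dominate.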
Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds