I/O space write barriers
Some platforms, it seems, have an interesting property: writes to I/O
memory space from multiple processors may be reordered before reaching the
device. Even if the device registers are protected by a lock (pretty much
necessary to keep multiple processors from writing simultaneously and
confusing the device), writes issued by one CPU can arrive before those
from another, even if the second CPU had held the lock and issued its
writes first. The Itanium architecture in particular behaves this way,
though others may as well.
The answer, according to Jesse Barnes, is to add a new type of memory barrier that forces the ordering of writes to the device. Jesse's patch adds a new function, mmiowb(), which implements this barrier, and updates the qla1280 driver to use it. Authors of PCI drivers are accustomed to coding a different sort of barrier: reading from a device register to ensure that all previously posted writes have actually reached the device. mmiowb() is a different, lighter-weight mechanism. After a call to mmiowb(), writes might still not have reached the device. Writes are not forced out; only their ordering with respect to subsequent writes is guaranteed. In many situations, that guarantee is all that is needed.
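To make the difference between the two kinds of barrier concrete, here is a toy C model of the situation described above. It is a hypothetical, single-threaded userspace sketch, not kernel code: the names sim_writel(), sim_mmiowb(), sim_read_back(), and run(), and the idea of a bridge FIFO acting as the ordering point, are assumptions made for illustration. In a real driver the corresponding pattern is a series of writel() calls followed by mmiowb() just before spin_unlock().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical single-threaded model, for illustration only. Posted
 * MMIO writes travel
 *   CPU --interconnect (may reorder across CPUs)--> bridge FIFO --> device
 * The bridge FIFO plays the role of the ordering point. Running the
 * CPUs one after another stands in for holding the device spinlock. */

#define NCPUS 2
#define MAXW  16

enum barrier_kind { NO_BARRIER, MMIOWB, READ_BACK };

struct model {
    int in_flight[NCPUS][MAXW]; /* posted writes still in the interconnect */
    int n_in_flight[NCPUS];
    int bridge[MAXW];           /* ordering point, drained in FIFO order */
    int n_bridge;
    int device[MAXW];           /* order in which the device saw the writes */
    int n_device;
};

/* Like writel(): post the write and return at once. */
static void sim_writel(struct model *m, int cpu, int val)
{
    assert(m->n_in_flight[cpu] < MAXW);
    m->in_flight[cpu][m->n_in_flight[cpu]++] = val;
}

/* The interconnect hands one CPU's writes, in program order, to the bridge. */
static void deliver(struct model *m, int cpu)
{
    for (int i = 0; i < m->n_in_flight[cpu]; i++) {
        assert(m->n_bridge < MAXW);
        m->bridge[m->n_bridge++] = m->in_flight[cpu][i];
    }
    m->n_in_flight[cpu] = 0;
}

/* The bridge passes everything it holds to the device, oldest first. */
static void bridge_flush(struct model *m)
{
    for (int i = 0; i < m->n_bridge; i++) {
        assert(m->n_device < MAXW);
        m->device[m->n_device++] = m->bridge[i];
    }
    m->n_bridge = 0;
}

/* mmiowb()-style barrier: wait until this CPU's writes have reached the
 * ordering point. They may still be queued in the bridge afterwards. */
static void sim_mmiowb(struct model *m, int cpu)
{
    deliver(m, cpu);
}

/* Read-back flush: a PIO read cannot complete until earlier posted
 * writes have reached the device, so they are pushed all the way out. */
static void sim_read_back(struct model *m, int cpu)
{
    deliver(m, cpu);
    bridge_flush(m);
}

/* CPU 0 then CPU 1 each take the lock, write two registers, optionally
 * issue a barrier, and drop the lock. Leftover writes are then delivered
 * in the worst order: CPU 1's before CPU 0's. */
static void run(struct model *m, enum barrier_kind kind)
{
    memset(m, 0, sizeof *m);
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        /* spin_lock(&dev_lock); */
        sim_writel(m, cpu, 2 * cpu + 1);
        sim_writel(m, cpu, 2 * cpu + 2);
        if (kind == MMIOWB)
            sim_mmiowb(m, cpu);
        else if (kind == READ_BACK)
            sim_read_back(m, cpu);
        /* spin_unlock(&dev_lock); */
    }
    for (int cpu = NCPUS - 1; cpu >= 0; cpu--)
        deliver(m, cpu);
    bridge_flush(m);
}
```

With NO_BARRIER the device sees 3, 4, 1, 2: CPU 1's writes overtake CPU 0's even though CPU 0 held the lock first. Both MMIOWB and READ_BACK produce 1, 2, 3, 4, but only READ_BACK has pushed each CPU's writes out to the device before the lock is dropped; the mmiowb() model merely fixes their order at the bridge, which is why it can be cheaper.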
Comment posted Sep 23, 2004 17:35 UTC (Thu) by jsbarnes (subscriber, #4096)
> The Itanium architecture in particular behaves this way, though others
> may as well.
According to Grant Grundler, ia64 platforms other than sn2 don't have this behavior. However, SGI Challenge, Origin, and Altix machines do exhibit posted-write reordering from different CPUs, and I suspect some other large NUMA machines do as well. On those, it's faster to simply issue a write barrier than to do a full I/O read from the target bus.
See http://www.finux.org/Reprints/Reprint-Bryant-OLS2004.pdf, under "I/O changes for Altix", for details and performance numbers, in particular the section "Ordering posted writes efficiently". (This is a paper I co-authored with some other SGI engineers to describe changes we made to Linux for the Altix platform.)
Quick highlights:
  regular PIO read   5940 ns
  relaxed PIO read   2619 ns (also described in the paper)
  sn_mmiob()         1610 ns
(Note that sn_mmiob() was the sn2-specific implementation of mmiowb() and is renamed in the patch I posted.)
These measurements were taken on a small system. On a large system, with a PIO read from a distant device, I'd expect the new barrier to shave far more than 1000 ns off the time.
Jesse
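Working through the small-system figures quoted in the comment shows how much each ordering operation saves when the barrier replaces a read. This only restates the measured numbers; no new measurements are implied.

```latex
\begin{aligned}
t_{\text{PIO read}} - t_{\text{sn\_mmiob}} &= 5940\,\text{ns} - 1610\,\text{ns} = 4330\,\text{ns} \\
t_{\text{relaxed PIO read}} - t_{\text{sn\_mmiob}} &= 2619\,\text{ns} - 1610\,\text{ns} = 1009\,\text{ns}
\end{aligned}
```

Even against the relaxed read, the barrier saves roughly 1 µs per use on a small machine; that is the baseline from which the comment expects larger savings on big systems with distant devices.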
Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.