> The Itanium architecture in particular behaves this way, though others
> may as well.
According to Grant Grundler, ia64 architectures other than sn2 don't have
this behavior. However, SGI Challenge, Origin, and Altix machines do
exhibit posted write reordering from different CPUs, and I suspect some
other large NUMA machines do as well. For those, it's faster to simply
issue a write barrier than it is to do a full I/O read from the target
bus. See http://www.finux.org/Reprints/Reprint-Bryant-OLS2004.pdf, a paper
I co-authored with some other SGI engineers describing the changes we made
to Linux for the Altix platform. Details and performance numbers are under
"I/O changes for Altix", in particular the section on 'Ordering posted
writes efficiently'. Quick highlights:
    regular PIO read    5940 ns
    relaxed PIO read    2619 ns  (also described in the paper)
    sn_mmiob()          1610 ns
(Note that sn_mmiob() was the sn2 specific implementation of mmiowb() and
is renamed in the patch I posted.)
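
For anyone who hasn't seen the pattern, here is a rough userspace sketch of
where the barrier goes relative to the lock. Everything in it is a stand-in
I made up for illustration: mock_lock(), mock_unlock(), mock_writel() and
mock_mmiowb() only record the order of events, and ring_doorbell() is a
hypothetical driver routine. In a real driver these would be the kernel's
spin_lock(), writel(), mmiowb() and spin_unlock().

```c
#include <assert.h>
#include <stddef.h>

/* Events recorded by the mock primitives, so the ordering can be checked. */
enum event { EV_LOCK, EV_WRITE, EV_BARRIER, EV_UNLOCK };

#define MAX_EVENTS 16
enum event trace[MAX_EVENTS];
size_t trace_len;

void record(enum event e)
{
    assert(trace_len < MAX_EVENTS);
    trace[trace_len++] = e;
}

/* Hypothetical stand-ins for spin_lock(), spin_unlock() and mmiowb(). */
void mock_lock(void)   { record(EV_LOCK); }
void mock_unlock(void) { record(EV_UNLOCK); }
void mock_mmiowb(void) { record(EV_BARRIER); }

/* Hypothetical stand-in for writel(): store to the "register" and log it. */
void mock_writel(unsigned int val, volatile unsigned int *reg)
{
    *reg = val;
    record(EV_WRITE);
}

/*
 * Hypothetical driver routine: the write barrier sits between the posted
 * write and the unlock, so another CPU that takes the lock next cannot
 * have its write reach the device first. No PIO read is needed.
 */
void ring_doorbell(volatile unsigned int *reg, unsigned int val)
{
    mock_lock();
    mock_writel(val, reg);
    mock_mmiowb();
    mock_unlock();
}
```

The point is just the ordering: lock, write, barrier, unlock. The older
alternative would put a read from the device where the barrier is, which is
what costs the extra time in the numbers above.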
Those measurements were taken on a small system. On a
large system, with a PIO read from a distant device, I'd expect the new
barrier to shave off far more than 1000 ns from the time.
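
To make the comparison concrete, here is a trivial sketch that works out the
per-flush savings from the three numbers quoted above. The constant names
and saved_ns() are just names I picked for this example; only the latencies
come from the measurements.

```c
#include <assert.h>

/* Latencies quoted above, in nanoseconds, from the small test system. */
enum {
    REGULAR_PIO_READ_NS = 5940,
    RELAXED_PIO_READ_NS = 2619,
    SN_MMIOB_NS         = 1610
};

/* Time saved per flush by issuing the barrier instead of a PIO read. */
int saved_ns(int read_ns, int barrier_ns)
{
    assert(read_ns >= barrier_ns);
    return read_ns - barrier_ns;
}
```

saved_ns(REGULAR_PIO_READ_NS, SN_MMIOB_NS) gives 4330 ns and
saved_ns(RELAXED_PIO_READ_NS, SN_MMIOB_NS) gives 1009 ns, which is where the
"1000 ns" figure for the small system comes from.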