It turns out SCSI only supports ordering within the context of an initiator-target (I_T) nexus, which means any given initiator usually gets only one I/O thread per device. The way around that limitation is to establish a separate connection (I_T nexus) for each I/O thread, but that probably isn't practical in most cases, even on something like iSCSI.
That is not to say that the SCSI folks shouldn't add real I/O thread support, because a write barrier at the device level is, for all practical purposes, not much more useful than a full cache flush.
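To make the scoping argument concrete, here is a toy model in Python. It is only a sketch: it assumes, as described above, that a barrier orders everything issued earlier on the same I_T nexus, and the function name, thread labels, and nexus ids are all hypothetical, not part of any SCSI API.

```python
from collections import defaultdict

def barrier_waits(commands, nexus_of):
    """For each barrier, list the earlier writes it has to wait for.

    commands: (thread, kind) tuples in issue order; kind is "write" or "barrier".
    nexus_of: maps a thread label to a nexus id (hypothetical labels).
    Assumption: a barrier covers every earlier write on its own nexus,
    no matter which thread issued that write.
    """
    pending = defaultdict(list)  # nexus id -> writes not yet covered by a barrier
    waits = {}
    for index, (thread, kind) in enumerate(commands):
        nexus = nexus_of[thread]
        if kind == "barrier":
            waits[index] = list(pending[nexus])
            pending[nexus].clear()
        else:
            pending[nexus].append(index)
    return waits

# Thread A writes once and then issues a barrier; thread B writes twice in between.
commands = [("A", "write"), ("B", "write"), ("B", "write"), ("A", "barrier")]

# Both threads share one nexus: A's barrier also waits for B's writes.
shared = barrier_waits(commands, {"A": 0, "B": 0})

# One nexus per thread: A's barrier waits only for A's own write.
separate = barrier_waits(commands, {"A": 0, "B": 1})
```

With a shared nexus, thread A's barrier is held up by writes it has no interest in, which is why it behaves much like a device-wide flush. With one nexus per thread, the barrier only covers that thread's own writes.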