The advantage of threads is that they are simple, are already implemented by many existing block devices (SCSI ones at any rate), and allow the optimization of many common cases - journal write before metadata write without a round trip (or worse a cache flush) in between, for example.
They also make it very convenient for a filesystem to be notified when a series of block writes has been committed to disk, without being too involved in the low level details of how that is known to be the case.
On some devices any write barrier is most efficiently translated into a full cache flush; on others, into waiting for the completion of a series of writes issued with force unit access. If the block interface does not provide I/O threads with write barriers or the equivalent, presumably a filesystem would be forced to choose one or the other, which would be highly inefficient in a number of cases.
With the proper threaded interface, the lower level device driver can choose how to implement the write barrier most efficiently. SATA devices (which seem to be unusually backward in this regard) probably need a full cache flush. On other devices you can either issue an explicit barrier or efficiently wait for a series of force unit access writes to complete individually. The filesystem shouldn't have to care about what is most efficient for any given device.
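
To make the division of labour concrete, here is a rough sketch in Python of what I have in mind. It is a toy model, not any real kernel or driver interface: BarrierStrategy, BlockDevice, IOThread and journal_then_metadata are all names invented for illustration, the "device" just records the commands it would be sent, and every command is assumed to complete synchronously.

```python
from enum import Enum, auto


class BarrierStrategy(Enum):
    """How a (hypothetical) driver implements a write barrier."""
    CACHE_FLUSH = auto()     # write normally, then flush the whole cache
    NATIVE_BARRIER = auto()  # write normally, then issue an explicit barrier
    FUA_WRITES = auto()      # issue every write with force unit access


class BlockDevice:
    """Toy device: records the commands a driver would issue to it."""

    def __init__(self, strategy):
        self.strategy = strategy
        self.commands = []

    def write(self, block, fua=False):
        self.commands.append(("WRITE_FUA" if fua else "WRITE", block))

    def flush_cache(self):
        self.commands.append(("FLUSH_CACHE", None))

    def barrier(self):
        self.commands.append(("BARRIER", None))


class IOThread:
    """Ordered stream of writes. The filesystem only ever calls
    write() and write_barrier(); the strategy stays below this line."""

    def __init__(self, device):
        self.device = device
        self.pending = []

    def write(self, block):
        self.pending.append(block)

    def write_barrier(self, on_committed=None):
        dev = self.device
        if dev.strategy is BarrierStrategy.FUA_WRITES:
            # Each write is durable once it completes, so no flush is needed.
            for block in self.pending:
                dev.write(block, fua=True)
        else:
            for block in self.pending:
                dev.write(block)
            if dev.strategy is BarrierStrategy.CACHE_FLUSH:
                dev.flush_cache()
            else:
                dev.barrier()
        committed = list(self.pending)
        self.pending.clear()
        # In this synchronous model the writes count as committed here.
        if on_committed is not None:
            on_committed(committed)


def journal_then_metadata(device):
    """Filesystem-side code: identical for every kind of device."""
    thread = IOThread(device)
    thread.write("journal-0")
    thread.write("journal-1")
    thread.write_barrier()       # journal must be stable before metadata
    thread.write("metadata-0")
    thread.write_barrier()
    return device.commands
```

The point is that journal_then_metadata never mentions cache flushes or force unit access. Handing it a BlockDevice built with CACHE_FLUSH produces writes followed by flushes, while one built with FUA_WRITES produces only force unit access writes, and the filesystem code is the same in both cases.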