the problem with your approach is that several layers (including the hard drive itself) will reorder whatever is in their buffers to shorten the total time it takes to get everything in the buffer to disk.
that is why barriers are needed: they tell the device not to reorder writes across the barrier, so everything issued before it reaches the disk before anything issued after it.
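
from an application you can't issue a block-layer barrier directly, but you can get a similar ordering guarantee by flushing between the writes that must not be swapped. a rough sketch in Python, assuming the filesystem and drive actually honor the flush request; the function names, file name and record contents are made up for illustration:

```python
import os
import tempfile


def write_all(fd, buf):
    """Keep calling os.write until every byte of buf has been handed to the kernel."""
    view = memoryview(buf)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def write_with_barrier(path, data, commit_record):
    """Write data, flush it, and only then write the commit record.

    The fsync between the two writes acts as the ordering point: the
    commit record is not issued until the data has been flushed, so the
    lower layers have no chance to reorder one ahead of the other.
    Assumption: the filesystem and the drive honor the flush.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        write_all(fd, data)
        os.fsync(fd)  # everything above must be flushed before we continue
        write_all(fd, commit_record)
        os.fsync(fd)  # flush the commit record as well
    finally:
        os.close(fd)


if __name__ == "__main__":
    # hypothetical journal file in a scratch directory
    scratch = tempfile.mkdtemp()
    journal = os.path.join(scratch, "journal.bin")
    write_with_barrier(journal, b"payload", b"COMMIT")
```

without the first fsync, both writes could sit in the same buffer and be sent to disk in either order, which is exactly the problem described above.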