When fsync() isn't used, a journalling commit is more like an "asynchronous delayed
commit" from a database point of view. It protects the integrity of the filesystem
structure itself; it isn't meant for application-level transactional changes. Sometimes
that weaker kind of commit is fine, and the performance gain is large.
fsync() makes it more like a standard database commit, where the data is supposed to be
durable on stable storage before the call returns.
This is one area where traditional databases can learn from filesystems. There are some
workloads where you don't actually need the database to commit quickly - the commit can
take as long as it needs to batch and optimise I/O. All you need then is consistent
rollback. Databases which hold cached calculations, for example, are like this.
Your point about partial writes on power failure and not using overlapping blocks (will
sectors do?) is valid, and I would like to know more about what the database professionals
have discovered about what exactly is and isn't safe. For example, can failure to write
sector N+1 corrupt sector N written a long time ago? Is the "failure block size" larger than
a single sector when doing O_DIRECT (when that really works)? Is it larger than a
filesystem/blockdev block size when not using O_DIRECT? What's the reason Oracle uses
different journal block sizes on different OSes?
I think the filesystem implementors do know about that effect. Each journal transaction
is terminated by a commit record that is isolated in a block of its own, which the next
transaction does not touch. I think your two/three ping-pong blocks correspond to the
journal's finite wraparound length on a filesystem - do say more if that's not so.