The only way with the current POSIX APIs to get this guarantee is to fsync() the fd before renaming. But this imposes an unnecessary overhead on both the app (generally) and the whole system (with ext3 data=ordered).
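For reference, here is a minimal sketch of that fsync-before-rename pattern. The function name replace_with_fsync and the ".tmp" suffix are made up for illustration; a real tool would pick a unique temp name and probably preserve permissions too.

```python
import os

def replace_with_fsync(path, data):
    """Write data (bytes) to a temp file, fsync it, then rename it over path."""
    tmp = path + ".tmp"  # hypothetical temp name, chosen only for this sketch
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to get the data to disk
    # Only after the data is synced do we make the new file visible under
    # the old name. rename() within one filesystem replaces path atomically.
    os.rename(tmp, path)
```

The os.fsync() call is the overhead being discussed: it blocks until the data has been written out, which is a stronger guarantee than "don't commit the rename before the data".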
Now, what ext4 does is clearly correct according to what is "allowed" by POSIX (actually, this is kinda vague as POSIX allows fsync() to be empty, and doesn't actually specify anything about system crashes.)
However, even if it's "posixly correct", it is imho broken, in the sense that I wouldn't let any of my data near such a filesystem, and I would recommend that everyone who asks me not use it.
Take for example this command:
sed -i s/user1/user2/ foo.conf
This does an in-place update using write-to-temp and rename-over, without fsync. The result is that if your machine locks up within a minute or so of running this command, you lose both versions of foo.conf.
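To make the pattern concrete, here is a rough sketch of what such an in-place edit boils down to. This is not sed's actual code: inplace_substitute is a hypothetical name, and it does plain string replacement of the first match per line rather than regex substitution.

```python
import os
import tempfile

def inplace_substitute(path, old, new):
    """Rewrite path via a temp file and rename, with no fsync anywhere."""
    with open(path, "r") as src:
        lines = src.readlines()
    # Create the temp file in the same directory so rename() stays on one
    # filesystem. mkstemp() is used here only to get a unique name.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as dst:
        for line in lines:
            dst.write(line.replace(old, new, 1))  # first match per line
        # Note what is missing: no os.fsync(dst.fileno()) before the rename.
    os.rename(tmp, path)
```

After the rename returns, the old file is gone from the namespace, but the new file's data may still be sitting in the page cache. If the filesystem commits the rename before writing that data out and the machine dies in between, you are left with an empty or truncated foo.conf.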
Now, is foo.conf important? How the heck is sed to know? Is sed broken? Should it fsync? That's more or less arguing that every app should fsync on close, which on ext4 is the same as the filesystem doing it, but on ext3 is unnecessary and a massive system slowdown.
Or should we try to avoid the performance implications of fsync (due to its guarantees being far more than what we need to solve our requirements)? We could do this by punting this to the users of sed, by having a -important-data argument, and then pushing this further out to any script that uses sed, etc, etc.
Or we could just rely on filesystems to guarantee that this common behaviour works, even if it's not specified by POSIX. (And choose not to use filesystems that don't give us that guarantee, like so many people who have switched from XFS after data losses.)
Ideally of course there would be another syscall, flag or whatever that says "don't write metadata before data is written". That way we could get both efficient and correct apps, but that doesn't exist today.