The goal is to make sure the data for the new file is on disk before (or in the same transaction as) the metadata for the rename.
We have two basic choices to accomplish this:
1) Put the new file into a list of things that must be written before the commit is done. This is pretty much what the proposed ext4 changes do.
2) Write the data before the rename is complete.
The problem with #1 is that it reintroduces the famous ext3 fsync behavior that caused so many problems with Firefox. It does this in a more limited scope, just for files that have been renamed, but it has very large side effects when you want to rename big files.
The problem with #2 is that it is basically fsync-on-rename.
The btrfs fsync log would allow me to get away with #1 without too much pain, because fsyncs don't force full btrfs commits and so they won't actually wait for the renamed file data to hit disk.
But the important discussion isn't whether I can sneak in a good implementation for popular but incorrect API usage. The important discussion is: what is the API today, and what should it really be?
Applications have known how to get consistent data on disk for a looong time. Mail servers do it, databases do it. Changing rename to include significant performance penalties when it isn't documented or expected to work this way seems like a very bad idea to me.
I'd much rather make a new system call or flag for open that explicitly documents the extra syncing, and give application developers the choice.
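
For reference, the sequence that careful applications already use looks roughly like the sketch below. It is only an illustration, not code from any particular mail server or database: the helper name atomic_replace and the ".tmp" suffix are made up for the example, and it assumes a POSIX system where a directory can be opened and fsynced.

```python
import os

def atomic_replace(path, data):
    """Replace path with data (bytes) so that a crash leaves either the
    old contents or the new contents, never a truncated file."""
    dirname = os.path.dirname(os.path.abspath(path))
    tmp = path + ".tmp"  # hypothetical temp name; real code should pick a unique one

    # 1. Write the new contents to a temporary file in the same directory.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # os.write may write fewer bytes than asked
            view = view[written:]
        # 2. Force the file data to disk before the rename is issued.
        os.fsync(fd)
    finally:
        os.close(fd)

    # 3. Atomically swap the new file into place.
    os.rename(tmp, path)

    # 4. fsync the directory so the rename itself is durable (POSIX assumption).
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

The explicit fsync in step 2 is the cost the application chooses to pay. The argument here is about whether rename should impose that cost on every caller implicitly.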