Postgres actually uses a similar split: the individual backends insert log entries into an in-memory buffer and the "wal writer" writes them to disk in the background. Only when a backend needs some WAL records to be written/fsynced (e.g. because it wants its commit record to be safely on disk) and that portion of the WAL has not yet been written/fsynced will the backend do so itself.
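To make that split concrete, here is a toy model in Python. It is only a sketch of the idea, not how Postgres implements it: the class `ToyWal` and all of its method names are made up for illustration, the "disk" is a plain list, and a single lock covers both inserting and flushing, which the real thing certainly does not do.

```python
import threading


class ToyWal:
    """Toy model: records are inserted into memory, flushing is a separate step."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = []     # in-memory WAL buffer (hypothetical)
        self.insert_pos = 0    # number of records inserted so far
        self.flushed_pos = 0   # number of records already "on disk"
        self.disk = []         # stand-in for the WAL file

    def insert(self, record):
        """Backend path: append to memory only and return the record's end position."""
        with self._lock:
            self._records.append(record)
            self.insert_pos += 1
            return self.insert_pos

    def flush_upto(self, pos):
        """Flush everything up to pos. Returns False if that part was already flushed."""
        with self._lock:
            if pos <= self.flushed_pos:
                return False  # somebody else already did the work
            self.disk.extend(self._records[self.flushed_pos:pos])
            self.flushed_pos = pos
            return True

    def background_flush(self):
        """One iteration of a wal-writer-like loop: flush whatever has been inserted."""
        return self.flush_upto(self.insert_pos)
```

A backend that commits would call `insert()` for its commit record and then `flush_upto()` with the returned position. If `background_flush()` got there first, that call returns `False` and the backend does no I/O of its own.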
There are two separate areas of contention here:
- synchronous disk writes/syncs. That can be eased by things like batching the syncs required by several backends/transactions into a single sync, and by relaxing durability requirements a bit (like setting synchronous_commit=off).
- contention around the WAL data structures. For relatively obvious reasons only one backend can insert into the WAL (in memory!) at a time, so there can be rather heavy contention around the locks protecting it. There's some work going on to make the locking more fine-grained, but it's complicated stuff and the price of screwing up would be way too high, so it might take some time.
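The batching mentioned in the first point can be sketched like this. Again, this is a single-threaded simulation under my own assumptions, not Postgres code: `GroupCommitLog`, `insert_commit` and `wait_for` are hypothetical names, and `sync_calls` just counts what would be fsync() calls.

```python
class GroupCommitLog:
    """Toy model of batching: one sync covers every record inserted before it."""

    def __init__(self):
        self.inserted = 0    # position of the last inserted commit record
        self.synced = 0      # everything up to this position is durable
        self.sync_calls = 0  # how many (simulated) fsync() calls were issued

    def insert_commit(self):
        """Insert a commit record into memory and return its position."""
        self.inserted += 1
        return self.inserted

    def wait_for(self, pos):
        """Make sure the record at pos is durable before reporting the commit."""
        if pos <= self.synced:
            return  # an earlier sync already covered this record
        self.sync_calls += 1          # stands in for one fsync()
        self.synced = self.inserted   # that sync covers everything inserted so far


log = GroupCommitLog()
positions = [log.insert_commit() for _ in range(5)]  # five transactions commit
for pos in positions:
    log.wait_for(pos)
print(log.sync_calls)  # prints 1: five commits, a single sync
```

The first waiter pays for the sync and the other four find their records already durable. With synchronous_commit=off the backend would, roughly speaking, skip the wait entirely and leave the flushing to the background.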
I don't think there is a fundamental difference here. PG will always have a higher overhead than a filesystem because the guarantees it gives (by default at least) are far stricter and, more fundamentally, because it's layered *above* a filesystem, so it inherits that overhead in the first place.
.oO(Don't mention postgres if you don't want to be talked to death :P)
Copyright © 2017, Eklektix, Inc.