The Tux3 filesystem returns
Posted Jan 3, 2013 7:33 UTC (Thu) by daniel (guest, #3181)
In reply to: The Tux3 filesystem returns by jeltz
Parent article: The Tux3 filesystem returns
Posted Jan 6, 2013 3:19 UTC (Sun) by jeltz (guest, #88600)
Posted Jan 6, 2013 13:47 UTC (Sun) by andresfreund (subscriber, #69562)
Postgres actually uses a similar split: the individual backends insert log entries into memory, and the "wal writer" writes them to disk in the background. Only when an individual backend requires some WAL records to be written/fsynced (e.g. because it wants its commit record to be safely on disk) and that portion of the WAL has not yet been written/fsynced does the backend do so itself.
I don't think there is any fundamental difference here. PG will always have higher overhead than a filesystem because the guarantees it gives (by default, at least) are far stricter, and more fundamentally because it's layered *above* a filesystem, so it pays the filesystem's overhead in the first place.
.oO(Don't mention postgres if you don't want to be talked to death :P) 
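The split described above can be sketched in a few lines. This is a toy model, not Postgres code: all names (WalBuffer, insert, flush_upto) are hypothetical, and a Python list stands in for the WAL file and for write()+fsync().

```python
import threading

class WalBuffer:
    """Toy model of the backend / wal-writer split: backends append WAL
    records to memory only; flushing to 'disk' happens separately, done
    either by a background writer or by a backend that needs durability
    and finds its records not yet flushed. (Hypothetical sketch.)"""

    def __init__(self):
        self.lock = threading.Lock()
        self.records = []        # in-memory WAL
        self.flushed_upto = 0    # number of records already on "disk"
        self.disk = []           # stand-in for the WAL file

    def insert(self, record):
        """Backend fast path: append to memory only; return the record's LSN."""
        with self.lock:
            self.records.append(record)
            return len(self.records)

    def flush_upto(self, lsn):
        """Write/fsync everything up to lsn, unless already done.
        Called periodically by the wal writer, or directly by a backend
        that wants its commit record safely on disk."""
        with self.lock:
            if self.flushed_upto < lsn:
                self.disk.extend(self.records[self.flushed_upto:lsn])
                self.flushed_upto = lsn   # models write() + fsync()

wal = WalBuffer()
lsn = wal.insert(b"commit xid=17")   # cheap: memory only
wal.flush_upto(lsn)                  # backend forces its commit record out
```

Note that a second `flush_upto(lsn)` with the same LSN is a no-op, which is exactly why a backend only pays the sync cost when the wal writer has not got there first.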
 
     
    
Posted Jan 6, 2013 21:29 UTC (Sun) by raven667 (subscriber, #5198)
Only in a few places: when new files are created, or when the metadata for existing files changes, for example by changing the file size.  Files that are preallocated have very little overhead to (re)write, except perhaps mtime updates.
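The preallocation point can be demonstrated directly. A small sketch (assuming Linux and a filesystem that supports `posix_fallocate`, such as ext4 or XFS): after the blocks are reserved up front, rewriting within them changes neither the file size nor the block allocation, so no allocation metadata needs updating.

```python
import os, tempfile

# Preallocate blocks up front; later rewrites inside that range touch
# no block-allocation metadata, only (at most) timestamps.
fd, path = tempfile.mkstemp()
try:
    os.posix_fallocate(fd, 0, 1 << 20)   # reserve 1 MiB of blocks now
    os.pwrite(fd, b"x" * 8192, 4096)     # rewrite within the preallocated
                                         # range: size and layout unchanged
    size = os.fstat(fd).st_size
    print(size)                          # 1048576, fixed at preallocation time
finally:
    os.close(fd)
    os.unlink(path)
```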
 
     
There are two separate areas of contention here:
- Synchronous disk writes/syncs. That can be eased by batching the required syncs for several backends/transactions into one sync, and by lowering the consistency requirements a bit (e.g. setting synchronous_commit=off).
- Contention around the WAL data structures. For relatively obvious reasons only one backend can insert into the WAL (in memory!) at a time, so there can be rather heavy contention around the locks protecting it. There's some work going on to make the locking more fine-grained, but it's complicated stuff and the price of screwing up would be way too high, so it might take some time.