
sync() starves other reader/writers


Posted Mar 26, 2014 23:24 UTC (Wed) by MrWim (subscriber, #47432)
In reply to: sync() starves other reader/writers by seanyoung
Parent article: PostgreSQL pain points

I've similarly used sync_file_range and posix_fadvise (LWN article) to avoid trashing caches while timeshifting live TV on the YouView set-top box. It seemed to work well and avoided all the gotchas associated with O_DIRECT. It was based on the advice from Linus on how to copy files while avoiding cache trashing, without O_DIRECT. You can find the documentation and code for this on GitHub.
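For readers who have not seen that pattern, here is a minimal sketch of the idea, not the code from the repository mentioned above. It assumes sequential writes in equal-sized chunks; the function name stream_write_chunk is made up for illustration. The cache calls are treated as best-effort hints, so their return values are deliberately ignored.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one chunk at `off`, start writeback for it,
 * then wait for the PREVIOUS chunk and drop it from the page cache.
 * Assumes every chunk has the same length and chunks are written in order.
 * Returns 0 on success, -1 if the write itself fails.
 */
int stream_write_chunk(int fd, const char *buf, size_t len, off_t off)
{
    assert(buf != NULL && off >= 0);

    size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(fd, buf + done, len - done, off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)n;
    }

    /* Ask the kernel to start writing this chunk out; do not wait for it. */
    (void)sync_file_range(fd, off, (off_t)len, SYNC_FILE_RANGE_WRITE);

    if (off >= (off_t)len) {
        off_t prev = off - (off_t)len;

        /* Wait until the previous chunk has been written back... */
        (void)sync_file_range(fd, prev, (off_t)len,
                              SYNC_FILE_RANGE_WAIT_BEFORE |
                              SYNC_FILE_RANGE_WRITE |
                              SYNC_FILE_RANGE_WAIT_AFTER);
        /* ...so its now-clean pages can be dropped from the cache. */
        (void)posix_fadvise(fd, prev, (off_t)len, POSIX_FADV_DONTNEED);
    }
    return 0;
}
```

Waiting on the previous chunk rather than the current one lets writeback of one chunk overlap with filling the next. Note that this is cache management, not a durability guarantee.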


sync() starves other reader/writers

Posted Mar 27, 2014 10:52 UTC (Thu) by seanyoung (subscriber, #28711) [Link]

Thank you for that.

The problem PostgreSQL has is that you want the WAL/journal to be written ASAP (so with fsync). You want the database files to be written WITHOUT fsync, but you do want to know when they complete/are persisted, so you can safely discard old WAL/journal files, for example.

So what I was trying to avoid was calling fsync(). sync_file_range() just ends up calling fsync in the file system driver; it is no different, as you have done in your code.

I'm not sure there is a method for this, although the aio functions seem to provide an API for this.

sync() starves other reader/writers

Posted Mar 27, 2014 18:06 UTC (Thu) by dlang (subscriber, #313) [Link]

Plus you want to make sure none of the database files get written ahead of the corresponding WAL data.

sync() starves other reader/writers

Posted Mar 27, 2014 19:44 UTC (Thu) by seanyoung (subscriber, #28711) [Link]

Indeed, there are two solutions to this:

1) Write the journal before you do anything (requires doing the operations twice).
2) Before modifying a database page, ensure that any scheduled I/O has completed. If not, either copy the page or move on to other pending work.

So ideally you want completion information at page level for non-fsync writes.

sync() starves other reader/writers

Posted Mar 27, 2014 20:05 UTC (Thu) by dlang (subscriber, #313) [Link]

Given that your transaction is likely to affect multiple database pages, #2 isn't viable: you can never guarantee that all or none of the transaction will be visible after a crash. That's why databases do #1.
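The ordering behind #1 can be sketched in a few lines of C. This is an illustration under simple assumptions, not PostgreSQL's implementation: log_then_apply and pwrite_full are hypothetical names, the log record is assumed to describe the whole change, and the data file is left for a later checkpoint to sync.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write the whole buffer at `off`, retrying on short writes and EINTR. */
static int pwrite_full(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/*
 * Hypothetical helper: make the log record durable first, and only then
 * touch the data page. If we crash after the fdatasync() the change can
 * be replayed from the log; if we crash before it, the page is untouched.
 */
int log_then_apply(int wal_fd, off_t wal_off, const void *rec, size_t rec_len,
                   int data_fd, off_t page_off, const void *page, size_t page_len)
{
    assert(rec != NULL && page != NULL);

    if (pwrite_full(wal_fd, rec, rec_len, wal_off) != 0)
        return -1;
    if (fdatasync(wal_fd) != 0)      /* log must be on disk before the page */
        return -1;

    /* No sync here: the data file is written back lazily. */
    return pwrite_full(data_fd, page, page_len, page_off);
}
```

The change is written twice, once to the log and once to the data file, which is the cost mentioned for #1.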

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds