> You are right that people should respect the spec, but I think that POSIX compliance is not the problem here.
It strikes me as odd that an open source OS uses a non-free spec to define its operations. Doesn't it strike anybody else as odd that we have a whole pile of people here arguing about compliance with a spec they most likely haven't seen? I see statements like "ensure your app only relies on stuff in POSIX". Perfectly good advice, except how is your typical open source developer meant to do that when he can't get access to the bloody thing?
That aside, I gather (not having been able to get a copy of POSIX myself) that POSIX doesn't offer much to programmers who want to ensure some combination of consistency and durability. This sort of thing is a basic requirement if you want to produce a reliable application. The furor here is an indication of just how basic it is. Yet even if you did have access to the spec, I gather it doesn't spell out how to do this. So programmers have learnt a bunch of ad hoc heuristics, like "to get consistency without the slowdowns caused by durability, use open(); write(); close(); rename()". Then we get accused of "not adhering to the spec" when the next version of the FS doesn't implement the heuristic. Give us a break!
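To make it concrete, here is a minimal sketch of that heuristic as I understand it (the function name, temp-file naming, and error handling are all made up for the example). Note it ends up needing the fsync() before the rename() on filesystems with delayed allocation - the very call the trick was supposed to let us avoid:

```c
/* Hypothetical sketch of the "write new file, rename over old one"
   heuristic. Not a definitive recipe - just the shape of it. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

/* Atomically replace `path` with `data`. Returns 0 on success. */
int atomic_replace(const char *path, const char *data)
{
    char tmp[4096];
    snprintf(tmp, sizeof tmp, "%s.tmp", path);

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(data);
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        unlink(tmp);
        return -1;
    }

    /* Without this fsync(), a crash shortly after the rename() can
       leave a zero-length file on delayed-allocation filesystems. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    close(fd);

    /* rename() within a filesystem is atomic: readers see either
       the complete old file or the complete new one. */
    return rename(tmp, path);
}
```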
Ted's suggestion that you should be using sqlite if you want to write out a few hundred bytes of text reliably is on one level almost a joke. I presume he suggested it because the sqlite authors have taken the time to learn all the heuristics needed to get data onto the disc reliably. Given it _is_ so hard to figure out all those heuristics for the various file systems your application could find itself running on, I guess it is a reasonable suggestion. Unfortunately, as the firefox programmers found out, it doesn't always work. Yeah, sqlite got the data onto the disc reliably, but only by using fsync(), which killed performance on some platforms. Given you probably don't care whether your latest browsing history hits the disc in five minutes' time, it is a great illustration of why programmers are so fond of "open(); write(); close(); rename()".
From talking to a MySql developer, I gather the situation is even worse than most posting here realise. Not only does the rename() trick not work, it turns out just about anything short of fdatasync() doesn't work. For example, you might expect that appending to a file would be fairly safe. Apparently not, according to POSIX. He said that if you append to a file, there is a chance on a POSIX system that the entire file could be truncated if you crash at the wrong moment. The only way to guarantee a file can't be corrupted by a write is to ensure you don't touch the metadata (think block allocations) - ie always write to pre-allocated blocks. Need to extend your 100GB database? Well then you have to copy it, write zeros to the extra space at the end to ensure it isn't sparsely allocated, then use the fsync(); rename() trick.
And that should be a joke. Pity it isn't. Given that filesystems aren't going to implement ACID, we need a set of primitives we can use to build up our own implementations of ACID. Fast, simple things, along the lines of the CPU instruction "test and set", which exists so assembly programmers can implement all sorts of complex locking schemes on top of it. And we need them defined in a spec that we can actually access - unlike POSIX.
Given that ain't going to happen, the only way out of this for Ted is to publish such a document for his filesystems - the ext* series. Just a series of HOWTOs would be a good start - HOWTO extend a large file reliably, HOWTO get consistent data written to disc (ie impose ordering on writes) without the slowdowns of unwanted sync() calls, HOWTO ensure a rename() of a file you don't have open has hit the disc. Nothing fancy. Just the basic operations we application programmers are expected to implement reliably every day on his file systems.