
PostgreSQL's fsync() surprise

Posted Apr 20, 2018 10:19 UTC (Fri) by anton (subscriber, #25547)
In reply to: PostgreSQL's fsync() surprise by flussence
Parent article: PostgreSQL's fsync() surprise

> POSIXly correct filesystems have surprised users in unpleasant ways in the past; recall early ext4 eating people's DE config files, all because the standard had some undefined behaviour around file writes and renames.
If a standard does not define something, it's up to the implementation to do it; i.e., it's the implementor's responsibility. Sufficiently bloody-minded implementors produce unpleasant surprises, and then point to standards or benchmarks as an excuse; but as long as the standard does not require the unpleasant behaviour (in which case it would be defined, not undefined), the implementor has the choice, and therefore the responsibility. Of course, implementors who blame the standard don't want you to recognize this, and often argue as if lack of definition in the standard required them to behave unpleasantly. It doesn't.

I wonder if the "what POSIX mandates" in the article really refers to an actual mandate by POSIX, or to another case of lack of definition that an implementor sees as a welcome opportunity for an unpleasant surprise.



PostgreSQL's fsync() surprise

Posted Apr 20, 2018 18:12 UTC (Fri) by zlynx (guest, #2285) [Link] (9 responses)

If the user-friendly, pleasant behavior is expected then it should be in the standard. If it isn't, there's a reason for that and implementors should be able to be as bloody-minded as they please.

If everyone is expected to be nice instead of following the standards, then there's no point in the current standard and it should be replaced with the "be nice" version.

For example, there are people who expect TCP/IP to deliver their packets in the same-sized chunks they were sent. These people are simply wrong. But by the "be nice" standard we'd have to write stupid networking stacks because some people expect behavior that isn't required.
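
(To make the TCP example concrete, here is a minimal C sketch of what the receiving side has to do instead: read() on a stream socket may return data in whatever chunking the stack chooses, so reassembling messages is the application's job. The 4-byte length-prefix framing is an illustration of my own, not anything the standard prescribes.)

    #include <stddef.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <arpa/inet.h>

    /* Read exactly n bytes; read() may return fewer than requested. */
    static int read_exact(int fd, void *buf, size_t n)
    {
        char *p = buf;
        while (n > 0) {
            ssize_t r = read(fd, p, n);
            if (r <= 0)
                return -1;      /* error, or EOF in mid-frame */
            p += r;
            n -= (size_t)r;
        }
        return 0;
    }

    /* Illustrative framing: a 4-byte network-order length, then payload. */
    static long read_frame(int fd, char *payload, size_t max)
    {
        uint32_t len_net;
        if (read_exact(fd, &len_net, sizeof len_net) != 0)
            return -1;
        uint32_t len = ntohl(len_net);
        if (len > max)
            return -1;          /* frame too large for caller's buffer */
        return read_exact(fd, payload, len) == 0 ? (long)len : -1;
    }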

Maybe it's time for a POSIX 2020 standard. But if it isn't in there, don't expect it to work like anything else.

PostgreSQL's fsync() surprise

Posted Apr 21, 2018 14:54 UTC (Sat) by anton (subscriber, #25547) [Link] (8 responses)

Yes, ideally standards would be complete. In practice, they tend to specify just the intersection of the behaviour of the existing implementations (in line with the requirement that a standard should standardize common practice), while also accommodating various constraints on outlier systems; e.g., "We want this standard to be implementable on a system with 64KB RAM, and mandating the pleasant behaviour would cost several KB for this subfeature alone, so we leave the behaviour unspecified." And then a bloody-minded implementor for systems that use multiple GBs of RAM uses the lack of specification as justification to implement unpleasant behaviour.

And don't forget that standards are decided through consensus in the committee, so it takes just a few bloody-minded implementors on the standards committee to block any progress towards pleasantness.

> If everyone is expected to be nice instead of following the standards
That's an excellent example of what I mean by "hiding behind the standard", and why I suspect that "what POSIX mandates" is in reality different from what was claimed in the discussion described in the article. If the standards do not specify what the implementation should do ("undefined behaviour" or some such), there is nothing in the standard that the implementation could follow, and it's the sole responsibility of the implementor to choose a particular behaviour. If, in such a situation, the implementor chooses to implement unpleasant behaviour, it's his fault, and his fault alone; the standard did not make him do it.

PostgreSQL's fsync() surprise

Posted Apr 24, 2018 16:32 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (7 responses)

> If the standards do not specify what the implementation should do ("undefined behaviour" or somesuch), there is nothing in the standard that the implementation could follow, and it's the sole responsibility of the implementor to choose a particular behaviour. If, in such a situation, the implementor chooses to implement unpleasant behaviour, it's his fault, and his fault alone; the standard did not make him do it.

All true, of course, but "unpleasant behavior" can still be a reasonable choice. Any application which *relied* on system-specific "pleasant" behavior would necessarily be non-portable. If "pleasant" behavior is desirable then, IMHO, the right solution is to standardize the behavior so that applications can be written against the standard and not one particular implementation. In the meantime, the most productive choice when undefined behavior is detected is to complain as loudly as possible, or even terminate the process, rather than allow the application to silently continue in an undefined state. This ensures that the application developer is made aware of the issue and has both the opportunity and incentive to fix it. (However, this outcome should remain *undefined* behavior so that this can be changed in the future if and when more pleasant behavior is standardized.) Going out of one's way to make undefined behavior "pleasant" is a form of attractive nuisance, in that it tends to encourage non-portable code.
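
(A sketch of "complain as loudly as possible" at the implementation level; the library function and its contract here are hypothetical, chosen only to show the shape of the approach.)

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical library entry point whose behaviour for a NULL path
       the (equally hypothetical) standard leaves undefined: report and
       terminate rather than silently continue in an undefined state. */
    void lib_frobnicate(const char *path)
    {
        if (path == NULL) {
            fprintf(stderr, "lib_frobnicate: NULL path is undefined "
                            "behaviour; aborting\n");
            abort();
        }
        /* ... defined behaviour continues here ... */
    }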

In the end, an application which relies on a specific implementation of undefined behavior, pleasant or unpleasant, is broken. A particular installation may do the right thing for certain known inputs; one may even be able to prove that it does the right thing for all possible inputs given perfect knowledge of the implementation in use on a particular system. However, the third layer of software[1]—design/logic—is missing: since the application is not in compliance with the standard, one cannot prove that it will work on any standard-compliant system, including future versions of the same system.

[1] http://www.pathsensitive.com/2018/01/the-three-levels-of-...

PostgreSQL's fsync() surprise

Posted Apr 26, 2018 16:22 UTC (Thu) by anton (subscriber, #25547) [Link] (6 responses)

> All true, of course, but "unpleasant behavior" can still be a reasonable choice.
Yes, as mentioned, when implementing on a system with 64KB of RAM, you may not be able to afford the pleasantness. But we would not be discussing this topic if all cases of unpleasant behaviour were reasonable.
> Any application which *relied* on system-specific "pleasant" behavior would necessarily be non-portable.
It would be *potentially* non-portable, not necessarily so. It would become actually non-portable if an unpleasant implementation appears. But so what? I am pretty keen on portability, but life's too short for unreasonably unpleasant implementations. If your program does not run in 64KB anyway, there is no need to cater to that reasonable unpleasantness; and if you want to cater to unreasonable unpleasantness, it's your time and money to waste (after all, some people write programs in Brainfuck), but I would not recommend it to anyone else.
> If "pleasant" behavior is desirable then, IMHO, the right solution is to standardize the behavior so that applications can be written against the standard and not one particular implementation.
If you think so, go ahead and work on standardizing pleasant behaviours. But as mentioned, there is the issue of constrained systems where you cannot afford the pleasantness. One solution is to specify several levels of the standard: the minimal level allows unpleasantness that is reasonable on constrained systems; a higher level specifies more pleasantness. However, if you have unreasonable implementors on the standards committee, you will be out of luck in your standardization effort.

Concerning reporting when undefined behaviour is performed, that's a relatively pleasant way to deal with the situation. It's not appropriate when the application developer actually wants to rely on a specific behaviour and does not want to "fix" it, but it certainly makes it clear that your implementation is not pleasant enough to run this application.

> In the end, an application which relies on a specific implementation of undefined behavior, pleasant or unpleasant, is broken.
No, it isn't. If it behaves as intended in a specific setting, it's working, not broken. It may be unportable, but that does not make it broken.
> since the application is not in compliance with the standard, one cannot prove that it will work on any standard-compliant system
Most programmers do not formally verify their programs, but instead test them. There is no way to prove that a program is in compliance with a standard by testing, even if the programmer intends to avoid undefined behavior. But even the few programmers that actually use formal verification cannot prove that their programs comply with most standards (e.g., POSIX), because most standards are not formally specified. So this whole proof issue is a red herring.
> including future versions of the same system.
Any system worth using (e.g., Linux) maintains in future versions the pleasantness it has supported in earlier versions.

PostgreSQL's fsync() surprise

Posted Apr 26, 2018 17:06 UTC (Thu) by zlynx (guest, #2285) [Link]

> Any system worth using (e.g., Linux) maintains in future versions the pleasantness it has supported in earlier versions.

No, because that is an unreasonable limit.

Simply because of implementation limits, ext3 serialized file and directory updates in a certain way, and did so for many years, so people got used to it. But that behavior never applied to ext2, XFS, FAT, or literally ANY other filesystem, not to mention BSD's UFS or Hammer2, or Apple's HFS. Heck, it didn't even apply to ext3 in certain configurations.

And then people tried to require that ext4 work the same way. And btrfs. And even wanted to go back to force XFS to work that way too.

The correct answer is to fsync() everything, which would show how bad ext3 was at that particular operation. All those fsyncs make things slower for people using ext3, but that does not mean fsync is the wrong answer. It just means ext3 was a filesystem with a terrible fsync() implementation that people got used to using.
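
(For concreteness, the "fsync() everything" discipline for atomically replacing a file looks roughly like the C sketch below: write a temporary file, fsync it, rename it over the target, then fsync the directory so the rename itself reaches stable storage. This is the conventional pattern, not code from the article, and exactly which of these syncs a given filesystem needs is the crux of the whole debate.)

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int replace_file(const char *dir, const char *tmp, const char *dst,
                     const void *buf, size_t len)
    {
        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        if (close(fd) != 0 || rename(tmp, dst) != 0) {
            unlink(tmp);
            return -1;
        }
        int dfd = open(dir, O_RDONLY | O_DIRECTORY);
        if (dfd < 0)
            return -1;
        int r = fsync(dfd);     /* make the rename itself durable */
        close(dfd);
        return r;
    }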

"Pleasant behavior" is often simply what programmers have become used to. It doesn't make it correct or actually pleasant.

PostgreSQL's fsync() surprise

Posted Apr 26, 2018 22:42 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (4 responses)

> It would be *potentially* non-portable, not necessarily. It would become actually non-portable if an unpleasant implementation appears.

See, you're talking about level 2 (particular implementations). Portable program *design* happens at level 3 (design/logic). If your program relies on behavior which is undefined according to the standard then it is non-portable, regardless of whether other implementations behave the same way. You can't say "this program works on any POSIX-compatible system", for example. You know that it works on Linux version X and maybe BSD version Y, but if someone puts together a new OS which follows all the relevant standards neither you nor they can be confident that your program will work on it unmodified.

> Most programmers do not formally verify their programs, but instead test them.

Formal verification in this context is a red herring. Tests are also a form of proof, albeit in the weaker courtroom-style, balance-of-evidence sense rather than the strict mathematical sense. The point is that without a standard you don't have a sound basis for reasoning "I called the function with these arguments, therefore the implementer and I both know that it should do this." Standards are how users and implementers of an API communicate. Relying on undefined behavior in your program is like speaking gibberish and expecting the listener to guess what you meant; there is a breakdown in communication, and the problem isn't on the implementer's end.

> Any system worth using (e.g., Linux) maintains in future versions the pleasantness it has supported in earlier versions.

As zlynx already explained, that is an unreasonable expectation and even Linux doesn't always operate that way.

PostgreSQL's fsync() surprise

Posted Feb 14, 2019 21:21 UTC (Thu) by dvdeug (guest, #10998) [Link] (3 responses)

A POSIX-compliant system could have malloc just return an error for all calls. A POSIX system that might be reasonable in some circumstances could have malloc return an error for any allocation over a megabyte; the first port of Unix was to the Interdata 8/32, with 256KB of memory. There is no non-trivial Unix program that doesn't make assumptions about the POSIX system it's running on.

> if someone puts together a new OS which follows all the relevant standards neither you nor they can be confident that your program will work on it unmodified.

Even POSIX-compatible systems aren't perfectly interchangeable. In the case of a program like PostgreSQL, it's usually important not just that it runs, but that it runs well, and POSIX cannot and does not guarantee speed constraints; even Linux alone can store its filesystems in many different ways on many different media, and some of those combinations may not work in practice for PostgreSQL.

> Standards are how users and implementers of an API communicate.

In theory, but not in reality. Most of the APIs a major program depends on are implemented by one library and have only vague descriptions of how they work outside the source code and behavior of that library. There were many Unixes before POSIX, and many C and C++ compilers before the first standard was written down. Many people still depend on specialized features of GNU C, enough that several compilers have had to copy those unstandardized features. Standards are wonderful if they're followed, but many are underspecified or just usually ignored. New versions of the C, C++, and Scheme standards have removed features that older versions mandated, because they were not well supported.
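
(One concrete instance, as an illustration of my own: GNU C statement expressions and __typeof__ are not in ISO C, yet compilers such as Clang implement them because so much existing code depends on them.)

    /* GNU C extension: a statement expression plus __typeof__ lets a
       macro evaluate each argument exactly once. ISO C offers no
       equivalent, so other compilers copy the extension instead. */
    #define MAX(a, b) ({ __typeof__(a) _a = (a); \
                         __typeof__(b) _b = (b); \
                         _a > _b ? _a : _b; })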

A huge example is the fact that most of these standards are written in English, an unstandardized language, not Lojban or even French. How can we know what a standard means if the language it is written in is unstandardized? But, for the most part, we manage.

PostgreSQL's fsync() surprise

Posted Feb 18, 2019 20:09 UTC (Mon) by nybble41 (subscriber, #55106) [Link] (2 responses)

> A POSIX-compliant system could have malloc just return an error for all calls. ... Even POSIX-compatible systems aren't perfectly interchangeable.

True, but irrelevant. I only mentioned POSIX as an example. No one is expecting a complex project like PostgreSQL to work equally well under all POSIX-compliant operating systems; there will be other dependencies.

Regarding the first point, a POSIX-compliant program would check for malloc() errors and either recover or terminate in a well-defined way. The program is portable as long as the behavior is well-defined for all conforming implementations; this is a separate consideration from being *useful*.
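
(A trivial C sketch of that distinction: the program below has well-defined behaviour even on a system whose malloc() always fails; whether it is *useful* on such a system is, as said, a separate question.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *buf = malloc(1 << 20);
        if (buf == NULL) {          /* defined, portable failure path */
            fprintf(stderr, "out of memory\n");
            return EXIT_FAILURE;
        }
        strcpy(buf, "hello");       /* reached only if allocation succeeded */
        puts(buf);
        free(buf);
        return EXIT_SUCCESS;
    }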

>> Standards are how users and implementers of an API communicate.
> In theory, but not in reality. Most of the APIs a major program depends on are implemented by one library and have only vague descriptions of how they work outside the source code and behavior of that library.

What you are describing is a failure to communicate. Programs written this way are inherently non-portable because they are written to fit the specifics of particular implementations. Any change to an implementation can cause any program to break in unspecified ways. This is the problem which standards exist to solve. They allow implementers and users of an interface to agree on roles and responsibilities; implementers can improve their code without worrying about breaking standards-compliant users, and users know which parts of the interface they can rely on and which parts may vary from one implementation (or version) to the next.

> How can we know what a standard means if the language it is written in is unstandardized?

"How can digital logic exist when all electronic components have analog characteristics?" This is bordering on abstract philosophy in the "can two people ever truly communicate" sense, but I'll try to answer it seriously anyway: We distinguish between parts of the language we can rely on for clear communication and parts which, while perhaps useful in other contexts, fail to clearly convey our intent, and build up more complex constructs from elements of the first set. The subset of natural language used for formal standards is actually pretty tightly constrained compared to literature in general. Even so, the dependency on natural language for formal specifications is a weak point and communication does occasionally break down as a result. We have feedback mechanisms in place to detect such breakdowns and correct them by issuing clarifications or revising the standards.

PostgreSQL's fsync() surprise

Posted Feb 19, 2019 0:42 UTC (Tue) by dvdeug (guest, #10998) [Link] (1 responses)

Of what interest is a portable program that is not useful? It's trivial to write a portable program; just check uname at the start and exit on any system but the one it was written for. Nobody does that, because it's not useful.

As for which parts may vary from version to version, version 3 may adhere to an entirely different standard than version 2. The fact that there is a standard may do you no good if it's evolving rapidly along with the software.

From the other side, even if you are standards conforming, that may not be enough. A user can expect that qsort sorts, but can they expect that it does so reasonably quickly? How often can you call fsync to maintain a reasonable balance between speed and safety? That's never going to be defined by the standard, but an understanding still needs to be reached between implementors and the authors of a program like PostgreSQL.

I don't believe it's a question of abstract philosophy. If standards were merely a tool used in some places and not others in the computer world, understood at their best as insufficient to bind either implementer or user, then it would be reasonable to use unstandardized language in writing them. But if _all_ APIs are supposed to depend on standards, then writing those standards in an unstandardized language is hard to justify when, again, formal languages like Lojban, or simply standardized ones like French, exist.

PostgreSQL's fsync() surprise

Posted Feb 19, 2019 22:39 UTC (Tue) by nybble41 (subscriber, #55106) [Link]

> Of what interest is a portable program that is not useful?

It's not a matter of either/or. Programs should be both portable *and* useful.

> A user can expect that qsort sorts, but can they expect that it does so reasonably quickly? How often can you call fsync to maintain a reasonable balance between speed and safety? That's never going to be defined by the standard...

Why not? Standards do sometimes specify things like algorithmic complexity. C doesn't specify that for qsort(), unfortunately, but C++ does require std::sort() to be O(n log n) in the number of comparisons. What constitutes a "reasonable balance" is up to the user, but there is no reason in principle why there couldn't be a standard for "filesystems usable with PostgreSQL" which defines similar timing requirements for fsync().
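
(To illustrate the gap: ISO C specifies qsort()'s interface and result but says nothing about its running time, so the call in the sketch below is portable while its performance is not.)

    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);   /* avoids overflow of x - y */
    }

    int main(void)
    {
        int v[] = { 3, 1, 2 };
        /* The standard guarantees v ends up sorted, but not how fast:
           a conforming qsort() could be O(n^2), where C++'s std::sort()
           is required to be O(n log n) in comparisons. */
        qsort(v, sizeof v / sizeof v[0], sizeof v[0], cmp_int);
        return 0;
    }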

PostgreSQL's fsync() surprise

Posted Apr 26, 2018 16:29 UTC (Thu) by Wol (subscriber, #4433) [Link]

> I wonder if the "what POSIX mandates" in the article really refers to an actual mandate by POSIX, or to another case of lack of definition that an implementor sees as a welcome opportunity for an unpleasant surprise.

As I understand it, POSIX explicitly *avoids* specifying what happens when things go wrong, precisely because POSIX has no idea what's happened.

So a Linux standard that says "this is the way we handle errors" would be completely orthogonal to POSIX. And it would be a good thing ...

The trouble with POSIX is that it's an old standard that is out of date; while I believe there is some effort at updating it, there is far too much undefined behaviour out there.

Cheers,
Wol

