|
|
Subscribe / Log in / New account

Errors on close

Errors on close

Posted Feb 12, 2025 23:06 UTC (Wed) by NYKevin (subscriber, #129325)
In reply to: Errors on close by farnz
Parent article: Maintainer opinions on Rust-for-Linux

> In practice, though, I'm struggling to think of a case where sync_all( also known as fsync(2)) is the wrong thing, and checking returns from close(2) is the right thing.

High-performance file I/O is an exercise in optimism. By not calling fsync, you accept some probability of silent data loss in exchange for higher performance. But there's an even more performant way of doing that: You can just skip all the calls to File::write(), and for that matter, skip opening the file altogether, and just throw away the data now.

Presumably, then, it is not enough to just maximize performance. We also want to lower the probability of data loss as much as possible, without compromising performance. Given that this is an optimization problem, we can imagine various different points along the tradeoff curve:

* Always call fsync. Zero probability of silent data loss (ignoring hardware failures and other things beyond our reasonable control), but slower.
* Never call fsync and never check the close return code. Significant probability of silent data loss, but faster.
* Never call fsync, but do check the close return code. Presumably a lower probability of silent data loss, since you might catch some I/O errors, but almost as fast as not checking the error code (in the common case where there are no errors). In the worst case, this is a main memory load (cache miss) for the errno thread-local, followed by comparing with an immediate.

Really, it's the middle one that makes no sense, since one main memory cache miss is hardly worth writing home about in terms of performance. Maybe if you're writing a ton of small files really quickly, but then errno will be in cache and the performance cost becomes entirely unremarkable.


to post comments

Errors on close

Posted Feb 13, 2025 10:13 UTC (Thu) by farnz (subscriber, #17727) [Link] (3 responses)

* Never call fsync and never check the close return code. Significant probability of silent data loss, but faster. * Never call fsync, but do check the close return code. Presumably a lower probability of silent data loss, since you might catch some I/O errors, but almost as fast as not checking the error code (in the common case where there are no errors). In the worst case, this is a main memory load (cache miss) for the errno thread-local, followed by comparing with an immediate.

This is the core of our disagreement - as far as I can find, the probability of silent data loss on Linux is about the same whether or not you check the close return code, with the exception of NFS. Because you can't do anything with the FD after close, all FSes but NFS seem to only return EINTR (if a signal interrupted the call) or EBADF (you supplied a bad file descriptor), and in either case, the FD is closed. NFS is slightly different, because it can return the server error associated with a previous write call, but it still closes the FD, so there is no way to recover.

Errors on close

Posted Feb 13, 2025 17:54 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (2 responses)

> but it still closes the FD, so there is no way to recover.

Of course error-on-close is recoverable, you just delete the file and start over. Or if that doesn't work, report the error to the user so that they know their data has not been saved (and can take whatever action they deem appropriate, such as saving the data to a different filesystem, copying the data into the system clipboard and pasting it somewhere to be preserved by other means, etc.).

Errors on close

Posted Feb 13, 2025 18:12 UTC (Thu) by Wol (subscriber, #4433) [Link]

> Of course error-on-close is recoverable, you just delete the file and start over.

Until you can't start over ... which is probably par for the course in most data entry applications ...

Cheers,
Wol

Errors on close

Posted Feb 14, 2025 10:49 UTC (Fri) by farnz (subscriber, #17727) [Link]

But, by the nature of Linux's close syscall, error on close means one of three things:
  1. You supplied a bad file descriptor to close. No data loss, nothing to do.
  2. A signal came in mid-close. No data loss, nothing to do.
  3. You're on NFS, and a previous operation failed - but you don't know which one, or whether the data is safe.

Deleting the file is the worst possible thing to do with an error on close - two of the three are cases where the data has been saved, and it's an oddity of your code that resulted in the error being reported on close. The third is one where the file is on an NFS mount, the NFS server is set up to write data immediately upon receiving a command (rather than on fsync, since you won't get a delayed error for a write if the NFS server is itself caching) and you didn't fsync before close (required on NFS to guarantee that you get errors).

But even in the latter case, close is not enough to guarantee that you get a meaningful error that tells you that the data has not been saved - you need fsync, since the NFS server is permitted to return success to all writes and closes, and only error on fsync.

And just to be completely clear, I think this makes error on close useless, because all it means in most cases is either "your program has a bug" or "a signal happened at a funny moment". There's a rare edge case if you have a weird NFS setup where an error on close can mean "data lost", but if you're not in that edge case (which cannot be detected programmatically, since it depends on the NFS server's configuration), the two worst possible things you can do if there's an error on close are "delete the file (containing safe data) and start over" and "report to the user that you've saved their data, probably, so that they can take action just in case this is an edge case system.

On the other hand, fsync deterministically tells you either that the data is as safe as can reasonably be promised, or thatit's lost, and you should take action.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds