
Ada does this too

Posted Feb 12, 2025 20:28 UTC (Wed) by plugwash (subscriber, #29694)
In reply to: Ada does this too by NYKevin
Parent article: Maintainer opinions on Rust-for-Linux

IIRC you can hack together an undroppable type by putting a call to a nonexistent external function in the drop glue.
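
For the curious, a minimal sketch of that trick might look like the following (the extern function name is made up, and whether the build actually breaks depends on the drop call not being optimized away):

struct NeverDrop;

extern "C" {
    // Deliberately never defined anywhere: if the drop glue for NeverDrop
    // is ever instantiated, the final link fails with an undefined symbol.
    fn never_drop_was_instantiated();
}

impl Drop for NeverDrop {
    fn drop(&mut self) {
        unsafe { never_drop_was_instantiated() }
    }
}

fn main() {
    let token = NeverDrop;
    // Disposing of the value without running Drop keeps the symbol
    // unreferenced; letting `token` fall out of scope normally would not.
    std::mem::forget(token);
}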

There are two problems though.

1. A value can be forgotten without being dropped, which is known as "leaking" it. There was a discussion in the run-up to Rust 1.0 about whether this should be considered unsafe, which ultimately came down on the side of no.
2. Values are dropped on panic; if the compiler can't prove your code won't panic, it will have to generate the drop glue.



Ada does this too

Posted Feb 12, 2025 21:55 UTC (Wed) by NYKevin (subscriber, #129325) (15 responses)

There are different levels of this:

A weakly-undroppable type may not be dropped. It may be leaked, forgotten, put into an Arc/Rc cycle, or smuggled inside of ManuallyDrop. It may also be dropped on an unwinding panic (with the understanding that panics are bad, unwinding is worse, and some features just won't play nicely with them). It does not provide a safety guarantee, so unsafe code must assume that undroppable types may still get lost by other means.

A strongly-undroppable type may not be dropped, and additionally provides a safety guarantee that its drop glue is never called. It cannot be dropped on unwinding panic, so such panics are converted into aborts if they would drop the object (you can still use catch_unwind to manually handle the situation, if you really want to). Unsafe code may assume that the object is never dropped, but still may not make any other assumptions about the object's ultimate fate.

An unleakable type may not be leaked or forgotten. It must always be dropped or destructured. You may not put it into Rc, Arc, std::mem::forget, ManuallyDrop, MaybeUninit (which is just ManuallyDrop in a funny hat), Box::leak, or anything else that could cause it to become leaked (unsafe code may do some or all of these things, but must not actually leak the object). You also may not define a static of that type, because statics don't get dropped at exit and are functionally equivalent to leaking an instance on startup.

A strongly-linear type is both strongly-undroppable and unleakable. It cannot be dropped or leaked by any means, and is subject to all of the constraints listed above. It may only be destructured by code that has visibility into all of its fields.

Now my commentary:

* Weakly-undroppable types are already very useful as a form of linting. For example, the std::fs::File type currently does not have a close() method, because the type already closes itself on drop. But that means that it swallows errors when it is closed. The documentation recommends calling sync_all() (which is equivalent to fsync(2)) if you care about errors, but I imagine that some filesystem developers would have choice words about doing that in lieu of checking the error code from close(2). If File were weakly-undroppable, then it could provide a close() method that returns errors (e.g. as Result<()>) and fails the compilation if you forget to call it (a rough present-day approximation is sketched after this list). This isn't a safety issue since the program won't perform UB if you forget to close a file, so we don't need a strong guarantee that it is impossible to do so. We just want a really strong lint to stop the user from making a silly mistake. It would also help with certain problems involving async destructors, but I don't pretend to understand async nearly well enough to explain that. On the downside, it would interact poorly with most generic code, and you'd probably end up copying the semantics of ?Sized to avoid massive backcompat headaches (i.e. every generic type would be droppable by default, and would need to be qualified as ?Drop to allow undroppable types).
* I'm not sure that strongly-undroppable types provide much of a useful improvement over weakly-undroppable types, and it would be absurd to provide both features at once. But it could be argued that having one exceptional case where drop glue is invoked is a recipe for bugs, so it might be a cleaner implementation. OTOH, if you have chosen to have unwinding panics, you probably don't want them to magically transform into aborts just because some library in your project decided to use an undroppable somewhere. I currently think that weakly-undroppable is the more pragmatic choice, but I think there are valid arguments to the contrary.
* Unleakable types would allow the original API for std::thread::scope to be sound, and would probably enable some other specialized typestates. They're also pretty invasive, although probably not quite as much as undroppable types. They would not solve the File::close() problem.
* Strongly-linear types are just the combination of two of the above features. If either feature is too invasive to be practical, then so are strongly-linear types. But they would provide a strong guarantee that every instance is cleaned up in one and only one way.
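
To make the File::close() idea from the first bullet concrete, here is a rough sketch of the closest approximation available in today's Rust: a "drop bomb" wrapper that panics at run time if you forget to call close(). A weakly-undroppable type would turn that panic into a compile-time error; sync_all() stands in for an error-returning close, since std::fs::File has no such method, and MustClose and its methods are made-up names for illustration.

use std::fs::File;
use std::io;

struct MustClose {
    file: Option<File>,
}

impl MustClose {
    fn new(file: File) -> Self {
        MustClose { file: Some(file) }
    }

    // The only intended way out: consume the wrapper and report errors.
    fn close(mut self) -> io::Result<()> {
        let file = self.file.take().expect("file already taken");
        file.sync_all()
        // `file` is still closed by File's own Drop here, with any error
        // from close(2) silently discarded, as described above.
    }
}

impl Drop for MustClose {
    fn drop(&mut self) {
        // Runtime stand-in for the compile error an undroppable type would give.
        if self.file.is_some() && !std::thread::panicking() {
            panic!("MustClose dropped without calling close()");
        }
    }
}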

Of course, there is another problem: We cannot guarantee that an arbitrary Turing-complete program makes forward progress. If the program drops into an infinite loop, deadlock, etc., then no existing object will ever get cleaned up, meaning that everything is de facto leaked whether our types allow for it or not. To some extent, this is fine, because a program stuck in an infinite loop will never execute unsafe code that makes assumptions about how objects are cleaned up. To some extent, it is not fine, because we can write this function (assuming that ?Leak means "a type that can be unleakable"):

fn really_forget<T: Send + 'static + ?Leak>(t: T) {
    std::thread::spawn(move || {
        let _t = t;
        loop { std::thread::park(); }
    });
}

...or some variation thereof, and there is probably no general way to forbid such functions from existing. So any type that is Send + 'static (i.e. has no lifetime parameters and can be moved between threads) should be implicitly Leak.

The "obvious" approach is to make 'static imply Leak, and require all unleakable (and maybe also undroppable) types to have an associated lifetime parameter, which describes the lifetime in which they are required to be cleaned up. More pragmatically, you might instead say that 'static + !Leak is allowed as a matter of type coherence, but provides no useful guarantees beyond 'static alone, and unsafe code must have a lifetime bound if it wants to depend on something not leaking. I'm not entirely sure how feasible that is in practice, but it is probably more theoretically sound than just having !Leak imply no leaks by itself, and unsafe code probably does want to have a lifetime bound anyway (it provides a more concrete and specific guarantee than "no leaks," since it allows you to assert that object A is cleaned up no later than object B).

Errors on close

Posted Feb 12, 2025 22:33 UTC (Wed) by farnz (subscriber, #17727) (5 responses)

> For example, the std::fs::File type currently does not have a close() method, because the type already closes itself on drop. But that means that it swallows errors when it is closed. The documentation recommends calling sync_all() (which is equivalent to fsync(2)) if you care about errors, but I imagine that some filesystem developers would have choice words about doing that in lieu of checking the error code from close(2). If File were weakly-undroppable, then it could provide a close() method that returns errors (e.g. as Result<()>) and fails the compilation if you forget to call it

Note that Rust could, with some effort, have a method fn close(self) -> io::Result<()>, without the weakly-undroppable property, so that developers who really care can get at the errors from closing a file. It'd be stronger if it'd fail the compilation if you forgot to call it, but it'd resolve the issue with those filesystem developers.
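
A minimal sketch of what that could look like on today's Rust (Unix-only, and assuming the libc crate; close_with_result is a made-up free function, not anything std provides):

use std::fs::File;
use std::io;
use std::os::fd::IntoRawFd;

// Consume the File and surface the result of close(2) instead of letting
// Drop swallow it.
fn close_with_result(file: File) -> io::Result<()> {
    // into_raw_fd() transfers ownership of the descriptor, so File's Drop
    // will not close it a second time.
    let fd = file.into_raw_fd();
    // SAFETY: we own `fd` and close it exactly once.
    if unsafe { libc::close(fd) } == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}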

In practice, though, I'm struggling to think of a case where sync_all() (also known as fsync(2)) is the wrong thing, and checking returns from close(2) is the right thing. The problem is that close returning no error is a rather nebulous state - there aren't really any guarantees about what this means, beyond Linux telling you that the FD is definitely closed (albeit this is non-standard - the FD state is "unspecified" on error by POSIX) - whereas fsync at least guarantees that this file's data and its metadata are fully written to the permanent storage device.

Errors on close

Posted Feb 12, 2025 23:06 UTC (Wed) by NYKevin (subscriber, #129325) (4 responses)

> In practice, though, I'm struggling to think of a case where sync_all() (also known as fsync(2)) is the wrong thing, and checking returns from close(2) is the right thing.

High-performance file I/O is an exercise in optimism. By not calling fsync, you accept some probability of silent data loss in exchange for higher performance. But there's an even more performant way of doing that: You can just skip all the calls to File::write(), and for that matter, skip opening the file altogether, and just throw away the data now.

Presumably, then, it is not enough to just maximize performance. We also want to lower the probability of data loss as much as possible, without compromising performance. Given that this is an optimization problem, we can imagine various different points along the tradeoff curve:

* Always call fsync. Zero probability of silent data loss (ignoring hardware failures and other things beyond our reasonable control), but slower.
* Never call fsync and never check the close return code. Significant probability of silent data loss, but faster.
* Never call fsync, but do check the close return code. Presumably a lower probability of silent data loss, since you might catch some I/O errors, but almost as fast as not checking the error code (in the common case where there are no errors). In the worst case, this is a main memory load (cache miss) for the errno thread-local, followed by comparing with an immediate.

Really, it's the middle one that makes no sense, since one main memory cache miss is hardly worth writing home about in terms of performance. Maybe if you're writing a ton of small files really quickly, but then errno will be in cache and the performance cost becomes entirely unremarkable.

Errors on close

Posted Feb 13, 2025 10:13 UTC (Thu) by farnz (subscriber, #17727) (3 responses)

> * Never call fsync and never check the close return code. Significant probability of silent data loss, but faster.
> * Never call fsync, but do check the close return code. Presumably a lower probability of silent data loss, since you might catch some I/O errors, but almost as fast as not checking the error code (in the common case where there are no errors). In the worst case, this is a main memory load (cache miss) for the errno thread-local, followed by comparing with an immediate.

This is the core of our disagreement - as far as I can find, the probability of silent data loss on Linux is about the same whether or not you check the close return code, with the exception of NFS. Because you can't do anything with the FD after close, all FSes but NFS seem to only return EINTR (if a signal interrupted the call) or EBADF (you supplied a bad file descriptor), and in either case, the FD is closed. NFS is slightly different, because it can return the server error associated with a previous write call, but it still closes the FD, so there is no way to recover.

Errors on close

Posted Feb 13, 2025 17:54 UTC (Thu) by NYKevin (subscriber, #129325) (2 responses)

> but it still closes the FD, so there is no way to recover.

Of course error-on-close is recoverable, you just delete the file and start over. Or if that doesn't work, report the error to the user so that they know their data has not been saved (and can take whatever action they deem appropriate, such as saving the data to a different filesystem, copying the data into the system clipboard and pasting it somewhere to be preserved by other means, etc.).

Errors on close

Posted Feb 13, 2025 18:12 UTC (Thu) by Wol (subscriber, #4433)

> Of course error-on-close is recoverable, you just delete the file and start over.

Until you can't start over ... which is probably par for the course in most data entry applications ...

Cheers,
Wol

Errors on close

Posted Feb 14, 2025 10:49 UTC (Fri) by farnz (subscriber, #17727)

But, by the nature of Linux's close syscall, error on close means one of three things:
  1. You supplied a bad file descriptor to close. No data loss, nothing to do.
  2. A signal came in mid-close. No data loss, nothing to do.
  3. You're on NFS, and a previous operation failed - but you don't know which one, or whether the data is safe.

Deleting the file is the worst possible thing to do with an error on close - two of the three are cases where the data has been saved, and it's an oddity of your code that resulted in the error being reported on close. The third is one where the file is on an NFS mount, the NFS server is set up to write data immediately upon receiving a command (rather than on fsync, since you won't get a delayed error for a write if the NFS server is itself caching) and you didn't fsync before close (required on NFS to guarantee that you get errors).

But even in the latter case, close is not enough to guarantee that you get a meaningful error that tells you that the data has not been saved - you need fsync, since the NFS server is permitted to return success to all writes and closes, and only error on fsync.

And just to be completely clear, I think this makes error on close useless, because all it means in most cases is either "your program has a bug" or "a signal happened at a funny moment". There's a rare edge case if you have a weird NFS setup where an error on close can mean "data lost", but if you're not in that edge case (which cannot be detected programmatically, since it depends on the NFS server's configuration), the two worst possible things you can do if there's an error on close are "delete the file (containing safe data) and start over" and "report to the user that you've saved their data, probably, so that they can take action just in case this is an edge-case system".

On the other hand, fsync deterministically tells you either that the data is as safe as can reasonably be promised, or that it's lost, and you should take action.

Ada does this too

Posted Feb 13, 2025 13:24 UTC (Thu) by daroc (editor, #160859) (8 responses)

As someone who enjoys unusual programming language features, I have to point out that it actually is possible to guarantee that a surprisingly large subset of programs in Turing-complete languages make forward progress. See Idris or Agda's termination checkers, which prove that a given function must eventually terminate because every loop involves one or more arguments to the loop getting "smaller" in a specific sense.
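
For a flavour of the kind of evidence such a checker looks for, here are two small functions (written in Rust purely for familiarity; Rust itself performs no termination checking, and the function names are mine):

// Accepted in principle: every recursive call is on a strictly smaller
// argument, so the recursion must bottom out at 0.
fn sum_to(n: u64) -> u64 {
    match n {
        0 => 0,
        _ => n + sum_to(n - 1),
    }
}

// Rejected in principle: nothing provably shrinks on each iteration
// (whether this terminates for all inputs is the open Collatz conjecture),
// so a checker cannot certify it. Overflow is ignored for simplicity.
fn collatz_steps(mut n: u64) -> u64 {
    let mut steps = 0;
    while n > 1 {
        n = if n % 2 == 0 { n / 2 } else { 3 * n + 1 };
        steps += 1;
    }
    steps
}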

You are of course right in general; the price that Idris and Agda pay for being able to say that some programs terminate is that the termination checker is not perfect, and will sometimes disallow perfectly okay programs. So I don't think it's necessarily a good idea for Rust to add termination checking to its type system, but it is technically a possibility.

Ada does this too

Posted Feb 13, 2025 15:13 UTC (Thu) by Wol (subscriber, #4433) (7 responses)

> You are of course right in general; the price that Idris and Agda pay for being able to say that some programs terminate is that the termination checker is not perfect, and will sometimes disallow perfectly okay programs. So I don't think it's necessarily a good idea for Rust to add termination checking to its type system, but it is technically a possibility.

Can't Rust have several checkers? If any one of them returns "okay", then the proof has passed and the code is okay. The programmer could then also add hints, maybe saying "run these checkers in this order", or "don't bother with these checkers", or whatever. So long as the rule is "any positive result from a checker is okay", that could reduce the checking time considerably.

Oh - and I don't know how many other languages do this sort of thing, but DataBASIC decrements -1 to 0 to find the end of an array :-) It started out as a "feature", and then people came to rely on it so it's standard documented behaviour.

(I remember surprising a C tutor by adding a bunch of bools together - again totally normal DataBASIC behaviour, but it works in C as well because I believe TRUE is defined as 1 in the standard?)

Cheers,
Wol

Running several checkers instead of one

Posted Feb 13, 2025 15:34 UTC (Thu) by farnz (subscriber, #17727) (5 responses)

It's not the runtime of the checker that's the problem; it's that we cannot write a checker that definitely accepts or rejects all reasonable programs. The underlying problem is that, thanks to Rice's Theorem (a generalisation of Turing's Halting Problem), a checker or combination of checkers can, at best, give you one of "undecided", "property proven to hold", or "property proven to not hold". There are two get-outs we use to make this tractable:
  1. We treat "undecided" as "property proven to not hold", and reject programs where the desired properties don't hold, or where it's undecided.
  2. We work to minimise the amount of code that falls into "undecided"; we can't make it zero, thanks to the theorems proven by Turing and Rice, but we can reduce it to "things that a skilled human would understand why they can't make that work".

If we could ensure that "undecided" was small enough, we'd not have a problem - but the problem we have is that all known termination checkers reject programs that humans believe terminate.

Running several checkers instead of one

Posted Feb 13, 2025 18:01 UTC (Thu) by Wol (subscriber, #4433) (4 responses)

That's not what I'm getting at, though. We might have three or four different checkers, which return different subsets of "I can prove this code".

It only takes one checker to return success, and we know that that Rust code is okay. So if we know (or suspect) which is the best checker to run, why can't we give Rust hints, to minimise the amount of checking Rust (has to) do? That would then mean we can run more expensive checkers at less cost.

Cheers,
Wol

Running several checkers instead of one

Posted Feb 13, 2025 18:31 UTC (Thu) by daroc (editor, #160859) (2 responses)

Sure, but there's a mathematical proof that any set of checkers you can make will have some programs that they can't decide. It's not just a matter of finding the right one.

So the tradeoff will always be between not being able to check this property, or being able to check it but rejecting some programs that are probably fine.

That said, I do think there is a place for termination checking in some languages — the fact that Idris has it lets you do some really amazing things with dependent typing. Whether _Rust_ should accept that tradeoff is a matter of which things it will make harder and which things it will make easier, not just performance.

Running several checkers instead of one

Posted Feb 14, 2025 15:12 UTC (Fri) by taladar (subscriber, #68407) (1 response)

Maybe there is also a place for termination checking that doesn't cover the entire program? It could limit where you have to look for bugs, similar to unsafe blocks or the way test coverage is displayed.

Uses of termination checking

Posted Feb 14, 2025 15:27 UTC (Fri) by farnz (subscriber, #17727)

Idris has both as part of its totality checker; a function can be partial (in which case it may never terminate or produce a value - it can crash or loop forever) or total (in which case it must either terminate for all possible inputs, or produce a prefix of a possibly infinite result for all possible inputs).

Idris then uses this to determine whether it can evaluate a function at compile time (total functions) or whether it must defer to runtime (partial functions). This becomes important because Idris is dependently typed, so you can write a type that depends on the outcome of evaluating a function; if that function is total, then the type can be fully checked at compile time, while if it's partial, it cannot.

Running several checkers instead of one

Posted Feb 14, 2025 11:02 UTC (Fri) by farnz (subscriber, #17727)

If you run all the currently known termination checker algorithms that actually come up with useful results (with the exception of "run until a timeout is hit, say it might not terminate if the timeout is hit, or it does terminate if the timeout is not hit"), you're looking at a few seconds at most. The pain is not the time that the algorithms we know of take; it's the fact that most of them return "undecided" on programs that humans can tell will terminate.

Ada does this too

Posted Feb 13, 2025 16:06 UTC (Thu) by adobriyan (subscriber, #30858)

> it works in C as well because I believe TRUE is defined as 1 in the standard?

It works because "_Bool + _Bool" is promoted to "int + int" first, and each true is implicitly converted to 1.

