
A review of file descriptor memory safety in the kernel

By Daroc Alden
August 22, 2024

On July 30, Al Viro sent a patch set to the linux-fsdevel mailing list with a comprehensive cover letter explaining his recent work on ensuring that the kernel's internal representations of file descriptors are used correctly. File descriptors are ubiquitous; many system calls need to handle them. Viro's review identified a few existing bugs, and may prevent more in the future. He also had suggestions for ways to keep uses consistent throughout the kernel.

File descriptors are represented in user space as non-negative integers. In the kernel, these are actually indexes into the process's file-descriptor table, which stores pointers to entries of type struct file. Most system calls that take a file descriptor — with the exception of things such as dup2() that only touch the file-descriptor table, not the files themselves — need to refer to the associated struct file to determine how to handle the call. These structures are, like many things in the kernel, reference counted.

Therefore, a typical system call could increment the reference count for a file object, do something with it, and then decrement the reference count again. This would be functional, but, as Viro said in his cover letter, "incrementing and decrementing refcount has an overhead and for some applications it's non-negligible." So in practice, the kernel only changes the reference count when it knows the descriptor table is shared with other threads. If the table is not shared, the kernel knows that the reference count can't change, and the file won't be freed, so there's no reason to incur the overhead.
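
The decision can be illustrated with a userspace sketch. This is a simplification, not the kernel's actual code: the structure layouts, the names file_get()/file_put(), and the need_put flag are all illustrative stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's structures. */
struct file {
    atomic_int f_count;        /* reference count on the file itself */
};

struct files_table {
    atomic_int count;          /* threads sharing this descriptor table */
    struct file *entries[16];
};

/*
 * Look up a file by descriptor. If the descriptor table is private to
 * the calling thread, borrow the reference without touching f_count;
 * otherwise take a real reference. The caller must remember (via
 * need_put) whether a matching decrement is owed.
 */
static struct file *file_get(struct files_table *t, int fd, bool *need_put)
{
    struct file *f = t->entries[fd];
    if (!f)
        return NULL;
    if (atomic_load(&t->count) > 1) {
        atomic_fetch_add(&f->f_count, 1);  /* shared table: take a reference */
        *need_put = true;
    } else {
        *need_put = false;                 /* private table: just borrow */
    }
    return f;
}

static void file_put(struct file *f, bool need_put)
{
    if (need_put)
        atomic_fetch_sub(&f->f_count, 1);
}
```

In the single-threaded case the hot path touches no shared cache lines at all, which is where the savings come from.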

Unfortunately, this complicates handling the structures in question, because it's always possible that a thread could end between checking that the table is shared and the end of the system call. Therefore, the kernel needs to store whether or not it skipped incrementing the reference count. Some file operations also need to take a lock on the file, so the kernel also needs to remember whether it took a lock at the same time. Both of these pieces of information are stored alongside a pointer to a struct file in struct fd, the primary subject of Viro's patch set.
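
Roughly, that looks like the following userspace mock. The flag names match the kernel's, but the opaque struct file and the instrumented fput() stub are stand-ins for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

struct file;                     /* opaque in this sketch */

/* Flag bits carried alongside the pointer (names as in the kernel). */
#define FDPUT_FPUT       1u      /* a reference was taken; fdput() must drop it */
#define FDPUT_POS_UNLOCK 2u      /* the file-position lock must be released */

struct fd {
    struct file *file;
    unsigned int flags;
};

static int fput_calls;           /* stub instrumentation for this sketch */
static void fput(struct file *f) { (void)f; fput_calls++; }

/* Drop the reference only if one was actually taken. */
static void fdput(struct fd fd)
{
    if (fd.flags & FDPUT_FPUT)
        fput(fd.file);
}
```

A borrowed reference (flags of zero) makes fdput() a no-op; that property is central to the code-generation discussion below.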

Instances of struct fd are mostly created and destroyed by the special helper functions: fdget() and fdput(). Viro's investigation touched on all 160 call sites throughout the kernel, including a few places that don't go through the helpers, and caught several leaks and use-after-free bugs. "That kind of stuff keeps cropping up; it would be nice to have as much as possible done automatically."

Overall, Viro's report shows the kernel to be in a good state. Most of the places where struct fd is used have no problem. One of the exceptions is overlayfs, which uses the fd structure not only to indicate "non-reference-counted file", "reference counted file", or "empty", but also various errors. struct fd is not meant to store errors, and overlayfs's use gets in the way of some of Viro's follow-up work. Since that is the only code that treats the structure like that, Viro suggested that overlayfs might want to switch to a separate type that has better support for representing errors.

Almost everywhere else, the kernel acquires and releases a struct fd within a single function, which makes it easy to audit the code to be sure that the use is correct. But, manually checking so many uses is tedious; Viro ended his cover letter by exploring ways that the compiler could potentially help with that check. The kernel already uses the __cleanup attribute to manage function-local values in several places, so it would be ideal if the attribute could be used in this case as well. Unfortunately, doing so is not trivial — at least, not if one also wants good code generation.

Specifically, with the current representation of struct fd, the compilers used to build the kernel are not smart enough to tell that calling fdput() with an empty struct fd (one that does not point to a struct file) is a no-op and can just be elided. This adds a bunch of unnecessary code — mostly to error paths, but it can still be a performance hit. Using __cleanup makes this problem worse, because it will add cleanup code to branches where a human could tell it was unnecessary, but the compiler cannot. Viro said that Linus Torvalds had suggested making the struct a single pointer-sized integer, and packing the flags into the lowest bits. This representation would allow the compiler to determine when calls to fdput() can be elided with greater accuracy.
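
Since struct file is word-aligned, its address always has free low bits to hold the flags. A sketch of that packed representation follows; the helper names (make_fd(), fd_file(), fd_empty(), FD_FLAG_MASK) are illustrative, not necessarily what the kernel adopted.

```c
#include <assert.h>
#include <stdint.h>

struct file;   /* opaque in this sketch */

/* struct file is word-aligned, so the low two bits of its address are free. */
#define FDPUT_FPUT       1ul
#define FDPUT_POS_UNLOCK 2ul
#define FD_FLAG_MASK     3ul

struct fd {
    unsigned long word;    /* pointer and flag bits packed together */
};

static inline struct fd make_fd(struct file *f, unsigned long flags)
{
    return (struct fd){ (unsigned long)(uintptr_t)f | flags };
}

static inline struct file *fd_file(struct fd f)
{
    return (struct file *)(uintptr_t)(f.word & ~FD_FLAG_MASK);
}

static inline int fd_empty(struct fd f)
{
    /* An empty fd is all-zero, so a single comparison covers both
     * "no file" and "no flags" — easy for the compiler to track. */
    return f.word == 0;
}
```

With a single word, proving that a struct fd is still zero on an error path (and therefore that the cleanup is dead code) becomes a much simpler dataflow problem for the compiler.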

That isn't the only issue standing in the way of using __cleanup, however. The attribute interacts badly with goto statements; in version 12 of GCC and earlier, using a goto statement to jump past the initialization of an annotated variable still results in the cleanup code getting called, but it refers to the uninitialized variable, which usually has undesirable effects. Clang does catch this problem, "but we can neither make it mandatory (not all targets are supported, to start with) nor bump the minimal gcc version from 5.1 all the way to 13."
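
For reference, the attribute looks like this in use — a minimal userspace sketch, with release() standing in for fdput() or an unlock. The goto hazard is described only in a comment, since actually writing such a jump is precisely the bug.

```c
#include <assert.h>

static int cleanups_run;

/* Cleanup callback: receives a pointer to the annotated variable. */
static void release(int *p)
{
    (void)p;
    cleanups_run++;    /* stands in for fdput(), unlocking, etc. */
}

static int demo(void)
{
    /* release(&guard) runs automatically when 'guard' goes out of scope,
     * on every exit path from this function. */
    __attribute__((cleanup(release))) int guard = 0;

    /*
     * Hazard: in GCC 12 and earlier, a goto that jumps over the
     * initialization of a __cleanup variable still arranges for the
     * cleanup call at scope exit — passing it the uninitialized
     * variable. Clang diagnoses such jumps instead.
     */
    return guard;
}
```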

So, while most functions can use __cleanup, some require a more manual approach. The bulk of Viro's patch set consists of patches for different subsystems that convert uses of struct fd to use __cleanup where possible. Viro used Clang to verify the parts of the code that support building that way, and manually inspected the rest.

The review

Reviewers were largely happy with the patch set. Christian Brauner said the patch set as a whole "looks good and pretty simple...nothing really surprising". As with any large patch set, there were a few places where the patches collided with other in-progress changes, but that was resolved without fuss.

Andrii Nakryiko and the other BPF developers had some comments about Viro's approach in the BPF code, suggesting that a small refactoring could make the necessary change less intrusive. Nakryiko ended up putting together a separate patch set to go via the bpf-next tree with the changes.

Overall, Viro's patch set is not terribly exciting or controversial — which is a good thing. While LWN often reports on the most contentious parts of the kernel-development process, it's important to remember that the vast majority of changes are like this one (although perhaps less broad in scope). Once the patch set is merged, users will be able to enjoy having fewer use-after-free vulnerabilities in the kernel, and kernel developers will be able to enjoy less complicated and error-prone code around file access.




Single threaded

Posted Aug 22, 2024 16:56 UTC (Thu) by willy (subscriber, #9762) [Link] (7 responses)

> Unfortunately, this complicates handling the structures in question, because it's always possible that a thread could be spawned between checking that the table is not shared and the end of the system call.

That's not possible. The table is not shared because the process has a single thread. We can't spawn a new thread because the single thread is doing something with a file descriptor. Fortunately there is no way to remote-spawn a thread in a different process (if you're thinking about tricks with gdb or ptrace, the thread needs to exit from the kernel before executing whatever gadget you've inserted)

Single threaded

Posted Aug 22, 2024 17:11 UTC (Thu) by daroc (editor, #160859) [Link] (1 responses)

Ah, that's good to know. I had wondered, and now I suspect that I misunderstood the relevant part of the cover letter:

That, of course, does not extend to keeping the file reference past the return from syscall, passing it to other threads, etc. - for those uses we have to bump the file refcount, no matter what. The borrowed reference is not going to remain valid forever; the things that may terminate its validity are

  • return to userland (close(2) might follow immediately)
  • removing and dropping some reference from descriptor table (some ioctls have to be careful because of that; they can't outright close a descriptor - removing a file reference from descriptor table is fine, but dropping it must be delayed)
  • spawning a thread that would share descriptor table. Note that this is the only thing that could turn a previously unshared table into a shared one.

As long as the borrowing thread does not do any of the above, it is safe.

It does still need to store the information in case things go the other way, with a thread ending while the call is in progress, I think.

Single threaded

Posted Aug 22, 2024 22:38 UTC (Thu) by viro (subscriber, #7872) [Link]

Quite. If descriptor table is not shared, it will remain that way until you spawn a child and share descriptors with it (i.e. pass CLONE_FILES to copy_process()). Going the other way is possible without any actions taken by your thread - other threads can terminate and that's all it takes.

Single threaded

Posted Aug 22, 2024 19:26 UTC (Thu) by geofft (subscriber, #59789) [Link] (3 responses)

> Fortunately there is no way to remote-spawn a thread in a different process (if you're thinking about tricks with gdb or ptrace, the thread needs to exit from the kernel before executing whatever gadget you've inserted)

Huh, it hadn't occurred to me that this is effectively a downside of supporting something like CreateRemoteThread.

In userspace, glibc skips some forms of locking until the process is multi-threaded, and I think the safety of that relies on the same argument - in a single-threaded process, you can't create a second thread at the point that you're inside some glibc code that would need to be holding a lock, because the glibc code isn't creating a thread itself, and it's done with the locked data / critical section when control returns from inside glibc. So I think CreateRemoteThread would break that too, and on such an OS, you would need to defensively take locks even when you're single-threaded.
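
A simplified sketch of that pattern, using a plain flag rather than glibc's actual internals (glibc exposes a similar idea to applications as __libc_single_threaded):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static bool multi_threaded;          /* set before any second thread starts */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_ops;                 /* counts real lock acquisitions */

static void lock_if_needed(void)
{
    /*
     * Safe only because a second thread cannot appear while we are in
     * the critical section: thread creation goes through this same
     * library, which sets multi_threaded first. An outside mechanism
     * like CreateRemoteThread would break this assumption.
     */
    if (multi_threaded) {
        pthread_mutex_lock(&lock);
        lock_ops++;
    }
}

static void unlock_if_needed(void)
{
    if (multi_threaded)
        pthread_mutex_unlock(&lock);
}
```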

Single threaded

Posted Aug 22, 2024 22:04 UTC (Thu) by Sesse (subscriber, #53779) [Link]

You could also simply say that CreateRemoteThread on a non-cooperating process makes for undefined results, just like any other poking into a process' memory from the outside really isn't something a libc can support. (This “solves” the glibc problem, though obviously the kernel still needs some other strategy.)

Single threaded

Posted Aug 23, 2024 5:14 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> Huh, it hadn't occurred to me that this is effectively a downside of supporting something like CreateRemoteThread.

Not really. CreateRemoteThread operates similarly to ptrace(), which Linux supports. Alternatively, if your environment is even slightly cooperative, you can implement it as a special type of message processing. Wine does it this way: https://github.com/wine-mirror/wine/blob/master/dlls/ntdl... invoked via https://github.com/wine-mirror/wine/blob/6a7bfbab10d653f6...

Single threaded

Posted Aug 23, 2024 7:35 UTC (Fri) by randomguy3 (subscriber, #71063) [Link]

it's notable that the documentation for CreateRemoteThread gives two common use-cases, but explicitly recommends against using CreateRemoteThread (due to its side-effects) for one, and is hardly positive about using it for the other

Single threaded

Posted Aug 22, 2024 22:02 UTC (Thu) by viro (subscriber, #7872) [Link]

Yes. The only files_struct instance a thread can increment a refcount on is current->files. That's done by spawning a thread with CLONE_FILES in flags. Each thread with ->files pointing to given instance contributes to the refcount, so if current->files->count is 1, there must be no other threads with ->files pointing to the same instance *AND* the only way for that refcount to grow beyond 1 is explicit spawning the child by this thread.

What is forbidden is fdget() ... create_io_thread() ... fdput() - the only other call chains that might lead to copy_process() with CLONE_FILES in the arguments are either from syscall (not called by kernel code) or via kernel_thread(); that gets called only by PID 0 (future idle thread, spawning kthreadd, no fdget()/fdput() pairs spanning that call ;-) and by kthreadd itself (again, no fdget()/fdput() in that sucker).

There are places where we have create_io_thread() uncomfortably near the fdget()/fdput() scopes (io_sq_offload_create()), but they are not inside the scopes in question.

In any case, such breakage can only be self-inflicted - no other thread can do that to you.

Downsides of C language culture?

Posted Aug 23, 2024 7:45 UTC (Fri) by taladar (subscriber, #68407) [Link] (33 responses)

> Overall, Viro's report shows the kernel to be in a good state. Most of the places where struct fd is used have no problem. One of the exceptions is overlayfs, which uses the fd structure not only to indicate "non-reference-counted file", "reference counted file", or "empty", but also various errors. struct fd is not meant to store errors, and overlayfs's use gets in the way of some of Viro's follow-up work.

I can't help but feel that this sort of thing would just never happen culturally in a language like Rust which emphasizes correctness and clean abstractions even if it is technically possible to do so too.

Downsides of C language culture?

Posted Aug 23, 2024 12:01 UTC (Fri) by pizza (subscriber, #46) [Link] (25 responses)

> I can't help but feel that this sort of thing would just never happen culturally in a language like Rust which emphasizes correctness and clean abstractions even if it is technically possible to do so too.

Feel free to go back in time and release current bleeding-edge Rust 35 years ago, in a form that is capable of running on 40-year-old commodity hardware/OSes.

Downsides of C language culture?

Posted Aug 23, 2024 12:06 UTC (Fri) by fishface60 (subscriber, #88700) [Link] (24 responses)

I don't think this was meant to criticise how things got how they are, more that future Rust rewrites could eliminate this class of bug eventually.

Downsides of C language culture?

Posted Aug 23, 2024 12:18 UTC (Fri) by pizza (subscriber, #46) [Link] (18 responses)

> I don't think this was meant to criticise how things got how they are, more that future Rust rewrites could eliminate this class of bug eventually.

...That is a very naive assumption.

When you make your tent bigger to include more folks, the collective culture _will_ change.

(And, incidentally, as long as Linux is a C project with Rust bolted to it, as opposed to a Rust project with lots of C bolted to it, that is not going to change)

...Personally, I find "Rust Culture" incredibly sanctimonious and off-putting. Naturally, much like Agile, that can only be because I'm ClearlyDoingItWrong.

Downsides of C language culture?

Posted Aug 23, 2024 14:09 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (1 responses)

Even if Rust is bolted on the side, the type system can prevent certain bugs in Rust code. Of course it's unlikely that Rust will ever be a majority of Linux code, and that's not a goal at all. But if at some point there will be no reason to write a new driver in C, that will be something.

Downsides of C language culture?

Posted Aug 23, 2024 14:50 UTC (Fri) by pizza (subscriber, #46) [Link]

> But if at some point there will be no reason to write a new driver in C, that will be something.

...In other words, Linux becomes a Rust project with a large pile of legacy/still-necessary C bolted onto it. [1]

Because "But Rust is optional!" only applies if a driver or subsystem *you* need is Rust-only.

Rust-first (and eventually, Rust-only) is the _only_ logical outcome/goal of this entire exercise. [2]

[1] My understanding is that this is already technically possible, at least for some subsystems. However, it is not (yet) "culturally" acceptable.

[2] From Linux's perspective, that is. From Rust's perspective, this exercise has lead to improvements to the core language, tooling, and various libraries.

Downsides of C language culture?

Posted Aug 24, 2024 19:31 UTC (Sat) by ringerc (subscriber, #3071) [Link] (15 responses)

What I'm struggling with is

"I want/need X"

"Rust can't do X so what you want/need is wrong, stop wanting/needing it".

Downsides of C language culture?

Posted Aug 24, 2024 21:07 UTC (Sat) by intelfx (subscriber, #130118) [Link]

> What I'm struggling with is
> "I want/need X"
> "Rust can't do X so what you want/need is wrong, stop wanting/needing it".

Yes, this is unfortunately one of the themes quite prevalent in the Rust community/culture.

Downsides of C language culture?

Posted Aug 26, 2024 11:42 UTC (Mon) by taladar (subscriber, #68407) [Link] (13 responses)

It is less that Rust can't do what you want, in many cases it is more a matter of people trying to bring their idioms from other languages to Rust (that is not a uniquely Rust problem btw) and then complaining that doing it that way won't work. That exact same problem happened 10-15 years ago when everyone was trying to apply Java OOP design patterns to languages that didn't even have OOP and had very different idioms to Java.

Downsides of C language culture?

Posted Aug 26, 2024 12:08 UTC (Mon) by pizza (subscriber, #46) [Link] (12 responses)

> It is less that Rust can't do what you want, in many cases it is more a matter of people trying to bring their idioms from other languages to Rust [...]

Well, sure, but those other languages aren't trying to sell themselves as "encompassing C's use cases" which implies that various necessities are supported. [1]

> and then complaining that doing it that way won't work.

No, the complaint is not that "<X> won't work"; it's being sanctimoniously told "you don't need to accomplish <X> at all".

[1] One year into the Rust-on-Linux experiment, it _still_ depends on bleeding-edge toolchains and non-stable features. Which is an improvement from requiring out-of-tree patches, but clearly it is not yet a universal replacement for C.

Downsides of C language culture?

Posted Aug 26, 2024 12:19 UTC (Mon) by mb (subscriber, #50428) [Link] (4 responses)

The kernel always required a special C compiler with special non-standard features (gcc).
The standard compliant C compiler llvm was not able to compile the kernel until it also gained additional features (and the kernel was modified as well).

And now Rust also needs to get some more special features for Linux.
So what?

I don't see how on earth this could be *any* hint as to whether Rust is "clearly [..] not yet a universal replacement for C" or not.

And there are other operating systems implemented in Rust.

Downsides of C language culture?

Posted Aug 26, 2024 12:42 UTC (Mon) by pizza (subscriber, #46) [Link] (3 responses)

> And there are other operating systems implemented in Rust.

Good for them. But they don't have the same requirements/features as Linux. And they certainly didn't have approximately eleventy bajillion lines of existing code, with more LoC changing on every release than these "RustOSes" have in their collective (and combined) codebases.

Mind you, I'm not holding Linux up as some holy untouchable artifact here -- But when your sales pitch is "It can do everything you need, only better!" it's incumbent on you to actually make good on that claim instead of moving the goalposts, okay?

It is the height of hubris to believe that RustCulture(tm) will remain dominant when the TrueBelievers of Lake Woebegone [1] are diluted to homoeopathic levels.

[1] Where all the women are strong, all the men are good-looking, and all the developers are above-average.

Downsides of C language culture?

Posted Aug 26, 2024 12:59 UTC (Mon) by mb (subscriber, #50428) [Link] (2 responses)

Linux is not implemented in standard C.
And Rust never claimed to be "universal replacement for Linux Kernel C".

>it's incumbent on you to actually make good on that claim

ridiculous

Downsides of C language culture?

Posted Aug 26, 2024 13:07 UTC (Mon) by pizza (subscriber, #46) [Link] (1 responses)

> Linux is not implemented in standard C.

As khim and other folks here are so quick to point out, pretty much nothing is implemented in purely "standard" C.

But if you're going to play that card, after many years of work, there are multiple versions of two completely independent toolchains that can be used to compile modern Linux. But only one Rust toolchain. One _version_ (ie the latest at that time) of the only extant Rust toolchain.

Not exactly a good argument to be making.

Downsides of C language culture?

Posted Aug 26, 2024 14:24 UTC (Mon) by intelfx (subscriber, #130118) [Link]

> But if you're going to play that card, after many years of work, there are multiple versions of two completely independent toolchains that can be used to compile modern Linux. But only one Rust toolchain. One _version_ (ie the latest at that time) of the only extant Rust toolchain.
>
> Not exactly a good argument to be making.

As you correctly say, it took the C codebase (and C toolchains) **many years of work** to get to this point. And it's still not "standard C" — it's still a pile of documented, non-documented and semi-documented extensions that only recently ended up being supported by another toolchain besides GCC.

And now a new contender (Rust) appears and in just a few months people are asking it to clear the bar that literally nothing else is held to?

I'm not a believer of the Church of Rust, far from it, but this line of argumentation is ridiculous (and disingenuous).

Downsides of C language culture?

Posted Aug 26, 2024 12:33 UTC (Mon) by ringerc (subscriber, #3071) [Link] (6 responses)

> No, the complaint is not that "<X> won't work"; it's being sanctimoniously told "you don't need to accomplish <X> at all".

This. Or you're wrong for wanting to.

It gets even better when the capabilities of the language/toolchain improve so you can now accomplish <X> ... and now it's ok, you're allowed to want to now.

The Rust community is far from unique in this, to be clear, but it's still very frustrating.

Downsides of C language culture?

Posted Aug 27, 2024 18:41 UTC (Tue) by Wol (subscriber, #4433) [Link] (5 responses)

But is this an "X & Y" problem (or whatever it's called)?

Where X is *your* *perceived* solution, and the actual problem is Y?

Are you fixated on your perceived solution (X), or do you actually want to solve the real problem (Y)? Because X may just be a very bad fit to the language, which has a *different*, totally acceptable, solution to Y.

Certainly I get the impression that the Rust designers may be aggressive in asking "Why do you want X?" But that's because they want to understand what Y is, and provide a solution, just not necessarily the X solution.

Cheers,
Wol

Downsides of C language culture?

Posted Aug 27, 2024 19:42 UTC (Tue) by pizza (subscriber, #46) [Link] (3 responses)

> But is this an "X & Y" problem (or whatever it's called)?
> Where X is *your* *perceived* solution, and the actual problem is Y?

No.

> Are you fixated on your perceived solution (X), or do you actually want to solve the real problem (Y)? Because X may just be a very bad fit to the language, which has a *different*, totally acceptable, solution to Y.

The determination of Y being "totally acceptable" (or even _equivalent_) is not up to the Rust evangelists.

Downsides of C language culture?

Posted Aug 28, 2024 7:33 UTC (Wed) by taladar (subscriber, #68407) [Link] (2 responses)

On the other hand the determination that X is not an acceptable solution is usually based on 50+ years of evidence of C programs having giant security holes or recurring types of bugs because C programmers insist on solving Y by doing X.

Downsides of C language culture?

Posted Aug 28, 2024 10:50 UTC (Wed) by ringerc (subscriber, #3071) [Link]

There are absolutely legitimate reasons to say "we intentionally don't support [X] and think it's a terrible idea; these features provide other ways to solve the same underlying problems without the same hazards."

And sometimes those other ways come with compromises that don't suit everyone. Which is also ok so long as fans of the tool (like Rust) are willing to say "ok, it doesn't look like Rust is a good fit for your specific use case right now."

It becomes a problem when the story is then "this is the only way, if this way conflicts with other constraints you have you need to remove the other constraints and do it our way because that's the one right true way".

Even if it's 10x the time and development effort or mountains of boilerplate, or has a huge runtime performance penalty unacceptable in the use case etc.

For example, as a big postgres fan and sometimes postgres contributor, when someone asks me how to embed Postgres in their application I'll say "you probably shouldn't, there are other DBs that are much more suitable for embedding in apps. You can kind of do it with postgres but it'll be more fragile and harder for the user to back up, run, update and migrate your app. You can probably just use SQLite."

I won't tell them "embedding a DB in an app is fundamentally wrong, educate your users about properly installing and managing an RDBMS and its backups, then have them configure your app to connect to a postgres they install themselves."

Just because postgres is a bad choice for in-app embedding doesn't mean it's wrong to want to embed a DB in your app.

But I've seen this sort of thing a lot, and definitely not just from rust, go, Python, postgres, or any other community. It's especially funny when the tool in question gains a new feature that makes the use case not suck with that tool, and now it's ok, even cool, to want to do it, when before it was considered a terrible idea irrespective of what tool you used.

Downsides of C language culture?

Posted Aug 28, 2024 12:42 UTC (Wed) by pizza (subscriber, #46) [Link]

> On the other hand the determination that X is not an acceptable solution is usually based on 50+ years of evidence of C programs having giant security holes or recurring types of bugs because C programmers insist on solving Y by doing X.

Since you're making sweeping generalizations, here's one more.

Folks advocating for Y are quick to hype theoretical _benefits_, but completely (and conveniently) ignore its _cost_, be it up front (ie initial implementation), ongoing (maintenance), and at runtime (eg performance, UX, etc)

Downsides of C language culture?

Posted Aug 28, 2024 10:36 UTC (Wed) by ringerc (subscriber, #3071) [Link]

Yes this can absolutely be the case.

I've been on the other side at times telling people "no, this isn't how to approach this problem," "this tool works best using a set-based rather than iterative formulation", "this isn't the right tool to solve your problem". I've also done a lot of walking people back from "I want to do X" until we get to "I want to solve problem Y" then forward again to find effective solutions without their assumptions and preconceptions.

So I get it.

And yes sometimes I too have said "dear God no you don't want to do that and here's why". By which I mean "if you attempt this you are in for a world of pain and slow failure."

But I do my best not to tell people that they're wrong for wanting to solve a problem at all. Or that the tool I'm helping them with is unsuitable for their particular problem right now.

The real trouble arises when all problems are reframed around the tool's capabilities.

You don't want dynamic loading of functionality (because golang doesn't/didn't do that), decouple it via gRPC services and immutable containers because it's Just Better(TM). You don't ever want to use dynamic key/value data in a relational DB because it's fundamentally wrong, stop wanting to do that (unless your RDBMS has dynamic set types or something, in which case you're allowed to want it for appropriate use cases). The list goes on.

Downsides of C language culture?

Posted Aug 23, 2024 14:21 UTC (Fri) by viro (subscriber, #7872) [Link] (4 responses)

You do realize that "future Rust rewrites" would have to do exact same kind of work, don't you? Unless you believe in Sapir-Whorf bollocks to truly breath-taking extent, that is...

Downsides of C language culture?

Posted Aug 23, 2024 17:16 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (3 responses)

I don't think anyone is claiming that Rust is a free lunch. My understanding of the argument is more along the following lines:

* This general category of work has to be done (because otherwise, your code is buggy).
* In Rust, the compiler can enforce that you either do it correctly, or pinky swear that you've done it correctly for this specific line/block of code (i.e. you have to write unsafe).
* In C, the compiler does help you to some extent, but it will not outright fail to compile if you miss a spot or do something wrong.
* Rust doesn't reduce the amount of work that needs to be done, but it does make it easier to confirm that you've done all of the work you wanted to do.
* Arguably, Rust does slightly increase the amount of work to be done, because you probably need to implement some kind of typestate API to enforce correctness, and that is probably more complex than an idiomatic implementation in C would be. But you only have to do this once (per API that you want to protect), and typestates are not especially hard to implement (in Rust). So, assuming lots of callsites that need to be checked for correctness, this is an acceptable tradeoff.

Downsides of C language culture?

Posted Aug 23, 2024 19:33 UTC (Fri) by viro (subscriber, #7872) [Link] (2 responses)

Not the point. I don't give a rat's arse for advocacy and culture wars of any description. Language is a tool; it might be more or less convenient to express some things, but let's not dive into the linguistic relativity nonsense - it does *not* shape one's thoughts. If you are unable to separate the concepts from specific syntax, you simply do not understand the language in question.

Rust type system is interesting in some respects; in any case, C type system is pretty weak. Memory safety is obviously a desirable property of program (and it's a property of program, not of a language), and so are correctness proofs in that area. And automating some parts of those proofs is obviously useful - to an extent determined by how good your build coverage is, which is a bloody serious caveat for the kernel. The hard part is to find out _what_ to verify. And that is language-independent; you need to reason about the things done by the program to come up with invariants and predicates that need to be verified. And if you end up with "OK, it's actually safe here, but the reasons are seriously convoluted", you need to come up with some way to massage the damn thing to make the proof of safety more straightforward, etc.

What's more, you need to be able to follow the proof; without that "compiler will catch any violations" is worse than useless when it comes to the changes you (or somebody else) will be doing several years down the road. If compiler warnings become a black box, or worse, an oracle to be appeased, you are fucked; cargo-culting will follow, and you'll find out that memory safety is not the only thing that can go wrong in code where nobody can tell you why such and such thing is done.

Choice of syntax, etc., to be used for automating the verification is secondary. "It wouldn't be needed with Rust" is asinine - the PITA would be reordered, but it would still be there. All of it. Including coming up with sane semantics for objects, figuring out how to use them safely, etc. And you would not even get the benefit of having all of that done upfront, before there's enough users of that thing to make things painful - even leaving aside the question of how much of a benefit it is, imagine a perfectly memory-safe set of primitives. Refcounted struct file, fget() taking an integer and returning a cloned reference or none and fput() dropping a reference. Memory safety is obviously not an issue for users of that API; descriptor table modifications need to play safe wrt fget(), but that's also not hard to do. All of that is perfectly doable in Rust. Now, at some point somebody points you to noticeable overhead on real-world loads, with cacheline ping-pong on refcount modifications found to be contributing a lot.

Now, you notice that considerable part of that could be handled by a switch from cloning to borrowing. Again, perfectly fine for Rust, innit? Except that then you need to figure out how many of the existing users of old API could be converted to the new one and what discipline is needed to keep the things safe afterwards. And _that_ will take exact same kind of analysis. On codebase already in Rust.
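The clone-to-borrow switch is the part Rust's borrow checker can express directly. A sketch of the borrowed variant, under the assumption that the table is not shared with other threads (types again hypothetical): fget() hands out a plain reference, and the lifetime does the work a refcount would otherwise do.

```rust
// Hypothetical single-owner descriptor table: when the table is not
// shared, fget() can hand out a plain borrow instead of bumping a
// refcount. The borrow checker pins the table (and hence the file)
// in place for as long as the borrow lives.
struct File {
    path: String,
}

struct PrivateFdTable {
    slots: Vec<Option<File>>,
}

impl PrivateFdTable {
    fn fget(&self, fd: usize) -> Option<&File> {
        self.slots.get(fd)?.as_ref()
    }
}

fn main() {
    let table = PrivateFdTable {
        slots: vec![Some(File { path: "/dev/null".into() })],
    };
    let f = table.fget(0).expect("valid fd");
    assert_eq!(f.path, "/dev/null");
    // table.slots.clear(); // would not compile while `f` is live
}
```

Deciding which callers can be moved from the cloning API to this one - and what happens to the ones that can't - is the analysis Viro is describing; the types only record its conclusion.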

Downsides of C language culture?

Posted Aug 24, 2024 0:39 UTC (Sat) by roc (subscriber, #30627) [Link]

> Now, you notice that considerable part of that could be handled by a switch from cloning to borrowing. Again, perfectly fine for Rust, innit? Except that then you need to figure out how many of the existing users of old API could be converted to the new one and what discipline is needed to keep the things safe afterwards. And _that_ will take exact same kind of analysis. On codebase already in Rust.

It all depends on how much of the API's rules can be enforced by the Rust type system. If all of them, then converting users to the new API is a lot easier than with C: just try a mechanical replacement; if the compiler is OK with it, you're done, and the compiler will keep things safe afterwards too. If none of them are, then you're not much better off than with C. If some of them are, you benefit commensurably.

A similar example that I have personally experienced a lot is using Rayon to parallelize loops, which in many cases is as simple as "replace `iter()` with `par_iter()`; if it compiles, then there are no data races, and in practice it will almost always work".
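Rayon is a third-party crate, but the same "if it compiles, there are no data races" property is visible with nothing but the standard library's scoped threads - a hand-rolled stand-in for par_iter() on a slice:

```rust
use std::thread;

// Sum the two halves of a slice in parallel with scoped threads. As
// with Rayon's par_iter(), if this compiles the compiler has proven
// the absence of data races: each thread gets a disjoint borrow.
fn parallel_sum(data: &[u64]) -> u64 {
    let mid = data.len() / 2;
    let (left, right) = data.split_at(mid);
    thread::scope(|s| {
        let l = s.spawn(|| left.iter().sum::<u64>());
        let r = s.spawn(|| right.iter().sum::<u64>());
        l.join().unwrap() + r.join().unwrap()
    })
}

fn main() {
    assert_eq!(parallel_sum(&[1, 2, 3, 4, 5]), 15);
}
```

Had the closures tried to mutate overlapping data, the spawn calls simply would not have type-checked.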

Downsides of C language culture?

Posted Aug 28, 2024 3:37 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

> Not the point. I don't give a rat's arse for advocacy and culture wars of any description. Language is a tool; it might be more or less convenient to express some things, but let's not dive into the linguistic relativity nonsense - it does *not* shape ones thoughts. If you are unable to separate the concepts from specific syntax, you simply do not understand the language in question.

Frankly I do not understand why this is written in response to my comment. I did not make any of the claims you are rebutting, and I explicitly positioned Rust as a tool for solving problems, not an ideology.

> What's more, you need to be able to follow the proof; without that "compiler will catch any violations" is worse than useless when it comes to the changes you (or somebody else) will be doing several years down the road. If compiler warnings become a black box, or worse, an oracle to be appeased, you are fucked; cargo-culting will follow, and you'll find out that memory safety is not the only thing that can go wrong in code where nobody can tell you why such and such thing is done.

Well... this depends on what we're talking about.

For the borrow checker, the rules are pretty well defined (object must outlive references to them, mutable references may not alias anything, shared references guarantee immutability). It is not always easy to understand how the compiler is able to enforce these rules, true, but the rules themselves are very clear to everyone. If you find yourself writing unsafe code, you should know exactly what invariants you are expected to uphold (or else you should have another read of the nomicon before trying to write unsafe code). And if you don't write unsafe code... then it's not your problem in the first place.

For typestate APIs, it's usually as simple as "the foo() function needs an object of type A, you have an object of type B, so you can't call foo() until you call something else that turns Bs into As, and then you can't call anything that takes a B anymore." I'm struggling to imagine how that logic would become difficult to follow. It's just a state machine with extra type safety.
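A minimal typestate sketch of that state machine, using a hypothetical connection that must be opened before it can be read. There is no runtime "is it open?" check anywhere; the state lives entirely in the type:

```rust
// Typestate: Closed and Open are distinct types, so "read before
// open" is a compile error, not a runtime check.
struct Closed;
struct Open {
    data: Vec<u8>,
}

impl Closed {
    // open() consumes the Closed value, so it cannot be used again.
    fn open(self) -> Open {
        Open { data: vec![1, 2, 3] }
    }
}

impl Open {
    fn read(&self) -> &[u8] {
        &self.data
    }
}

fn main() {
    let conn = Closed;
    // conn.read();         // does not compile: Closed has no read()
    let conn = conn.open(); // Closed is gone; only Open remains
    assert_eq!(conn.read(), &[1, 2, 3]);
}
```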

Downsides of C language culture?

Posted Aug 23, 2024 14:07 UTC (Fri) by viro (subscriber, #7872) [Link] (5 responses)

_Which_ sort of thing? disjoint union of cloned refs/borrowed refs/none? Really? And here I thought that Rust had a type system that could express that...

Or do you mean that all abstractions must be there from the very beginning? I'm not fond of Rust as a language, but I think you are doing it a disservice here...

Seriously, if you want to use that thing for kernel work, you'd better be ready to do that sort of analysis, again and again. Figuring out the memory safe abstractions that can express something existing instances could be massaged into, that is. _NOT_ the "oh, we have something that could express the toy cases we'd been able to think of and as for anything else... well, somebody will deal with it somehow - or it can just stay in that legacy stuff and rot" kind of attitude you guys seem to be so fond of.

Downsides of C language culture?

Posted Aug 23, 2024 17:30 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (3 responses)

> _Which_ sort of thing? disjoint union of cloned refs/borrowed refs/none? Really? And here I thought that Rust had a type system that could express that...

This is spelled std::borrow::Cow. Wrap it in Option<T> if you want None as well (which I believe should have no size overhead, because one of the Cow enum variants has a reference in it, and references are non-nullable, so there's a niche the compiler can use for Option layout optimization).
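For the record, the borrowed/owned disjoint union in action - the classic Cow pattern of returning the input unchanged when possible and an owned copy only when something had to change (the size check reflects current rustc layout, which the language does not guarantee):

```rust
use std::borrow::Cow;
use std::mem::size_of;

// Return the input borrowed if it already ends in '/', otherwise
// allocate a fixed-up owned copy.
fn ensure_trailing_slash(path: &str) -> Cow<'_, str> {
    if path.ends_with('/') {
        Cow::Borrowed(path)
    } else {
        Cow::Owned(format!("{path}/"))
    }
}

fn main() {
    assert!(matches!(ensure_trailing_slash("/tmp/"), Cow::Borrowed(_)));
    assert!(matches!(ensure_trailing_slash("/tmp"), Cow::Owned(_)));
    assert_eq!(ensure_trailing_slash("/tmp"), "/tmp/");

    // With current rustc, Option<Cow<str>> costs nothing extra: the
    // unused values of Cow's discriminant give Option a niche. This
    // is an observation about today's layout, not a language contract.
    assert_eq!(size_of::<Option<Cow<'_, str>>>(), size_of::<Cow<'_, str>>());
}
```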

But Cow has its own problems, because it is specifically designed for borrow-checked references. If you want something less restrictive, like refcounted references, then you probably need to use Rc/Arc::make_mut() instead (or one of its sister methods like unwrap_or_clone()).
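Arc::make_mut() is clone-on-write for refcounted data: if the count is 1 it mutates in place, otherwise it clones first so other holders are undisturbed. A small illustration (the helper name is made up for the example):

```rust
use std::sync::Arc;

// Append to a shared Vec without disturbing the other holders:
// make_mut() sees the refcount is >= 2 and clones the Vec rather
// than mutating through the shared allocation.
fn push_without_disturbing(shared: &Arc<Vec<i32>>, v: i32) -> Arc<Vec<i32>> {
    let mut mine = Arc::clone(shared);
    Arc::make_mut(&mut mine).push(v);
    mine
}

fn main() {
    let original = Arc::new(vec![1, 2, 3]);
    let updated = push_without_disturbing(&original, 4);
    assert_eq!(*original, vec![1, 2, 3]);    // untouched
    assert_eq!(*updated, vec![1, 2, 3, 4]);  // our mutated copy
    assert_eq!(Arc::strong_count(&updated), 1); // no longer shared
}
```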

> Or do you mean that all abstractions must be there from the very beginning? I'm not fond of Rust as a language, but I think you are doing it a disservice here...

In this case, I think the argument is that Rust enforces struct field visibility, so you can't just reach into a Rust struct and fiddle with its members to represent arbitrary states. You would be forced to wrap it in some other struct or enum that describes your additional state data. In practice, you would probably just use Result<T, E> for this use case, because then you get the question mark operator etc. for free.

Downsides of C language culture?

Posted Aug 23, 2024 19:53 UTC (Fri) by viro (subscriber, #7872) [Link] (1 responses)

If you look at the series, you'll see that well over 99% of accesses are fetches of f.file (turned into fd_file(f) in one patch, semi-automatically); what remains is about a dozen of places over the entire kernel (in ~8 files, IIRC) where we do something trickier. All gone very early in that series. The painful parts had been elsewhere...

Downsides of C language culture?

Posted Aug 27, 2024 4:45 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

The other possibility (that I can think of) is something like "cast it to char* and modify the bytes by hand." You can do that in Rust, of course, but Rust is a lot more willing to hit you with the UB hammer if you do something nonsensical like setting a bool to 3, an enum to a nonexistent variant, etc. In C, this sort of restriction either does not exist at all, or if it does exist, it is labeled as a "trap representation" and blamed on the hardware. In practice, people are very quick to demand/assume that x86 does not have trap representations, and therefore C-compiled-for-x86 also cannot/should not have trap representations either.

Of course, C does not even have visibility restrictions in the first place, except for really blunt measures like an opaque struct. In Rust, visibility restrictions are considered part of the safety boundary - there are plenty of APIs in the standard library and third-party crates which rely on visibility restrictions to maintain safety. For example, if you reach into a Vec and change its size or capacity, you can cause it to read uninitialized memory or perform other kinds of UB. But that's still considered a sound API, because you can't reach into a Vec and fiddle with its private members (without writing unsafe code that does some kind of type-punning).

Anyway, the end result of all this is that, in Rust, if you care to do so, it is possible to design a struct or enum in such a way that it can only represent those states you actually want it to represent, and anyone who manages to get it to represent some other state has already done UB before your code even runs.
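A toy version of that safety boundary, with an invariant protected purely by field visibility - outside the module there is no way to construct or mutate the value, so every instance that exists really does satisfy the invariant (the `Even` type is invented for the example):

```rust
// Visibility as an invariant boundary: Even's field is private, so
// outside this module the only way to get an Even is through new(),
// which validates the invariant.
mod even {
    pub struct Even(u32); // private field

    impl Even {
        pub fn new(n: u32) -> Option<Even> {
            if n % 2 == 0 { Some(Even(n)) } else { None }
        }

        pub fn get(&self) -> u32 {
            self.0
        }
    }
}

fn main() {
    use even::Even;
    assert!(Even::new(3).is_none());
    let e = Even::new(4).expect("even");
    assert_eq!(e.get(), 4);
    // e.0 = 5; // does not compile: field is private to the module
}
```

Vec's length and capacity are guarded the same way, just with far more unsafe code relying on the invariant.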

Downsides of C language culture?

Posted Aug 26, 2024 11:37 UTC (Mon) by taladar (subscriber, #68407) [Link]

My argument was less that Rust would prevent you from doing this and more that this sort of micro-optimization at the cost or readability and correctness is part of C culture and Rust culture (and the culture of quite a few other languages) is that you just don't do things like that because they are a bad idea.

Downsides of C language culture?

Posted Aug 26, 2024 11:48 UTC (Mon) by taladar (subscriber, #68407) [Link]

Just reusing fields of some struct from somewhere else in the code base for an unrelated purpose. That just wouldn't be done in most language cultures but especially not in languages like Rust that put clean abstractions and correctness first.

Downsides of C language culture?

Posted Aug 25, 2024 16:37 UTC (Sun) by cytochrome (subscriber, #58718) [Link]

I'm disappointed that it took so long for someone to bring up Rust in the comments for this piece. ;-)

What applications benefit from that single-threaded optimization?

Posted Aug 23, 2024 8:42 UTC (Fri) by roc (subscriber, #30627) [Link] (8 responses)

What applications these days are so performance-sensitive that they care about the overhead of taking and releasing the reference count on a file descriptor, but so performance *in*sensitive that they don't mind making high-frequency syscalls and using only a single core?

What applications benefit from that single-threaded optimization?

Posted Aug 23, 2024 9:09 UTC (Fri) by dezgeg (subscriber, #92243) [Link] (1 responses)

Perhaps the most important benchmark of them all - compiling the kernel? :P

What applications benefit from that single-threaded optimization?

Posted Aug 24, 2024 10:55 UTC (Sat) by adobriyan (subscriber, #30858) [Link]

Kernel compile is dominated by lookup in some gcc hashtables last time I checked.

Ultimately it is a clash between "everything below 10th line in profile doesn't exist" and "everything in profile exists" cultures.

Kernel for better or worse is firmly in the latter category, otherwise things like ERR_PTR wouldn't exist probably.

What applications benefit from that single-threaded optimization?

Posted Aug 23, 2024 14:23 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (5 responses)

I am not sure this is the case that is optimized here, but for short lived processes, startup (in ld.so or in an interpreter) might do many stat or open/read/close system calls in a single threaded environment. I guess someone measured the effect?

What applications benefit from that single-threaded optimization?

Posted Aug 24, 2024 0:45 UTC (Sat) by roc (subscriber, #30627) [Link] (4 responses)

What person in their right mind would take an incredibly performance-sensitive problem and say "the best way to implement this would be to launch a huge number of short-lived single-threaded processes"?

I suppose dezgeg answered this --- `make` :-(.

Maybe it's time for someone to wrap gcc in a service that forks threads or processes after startup.

What applications benefit from that single-threaded optimization?

Posted Aug 24, 2024 0:59 UTC (Sat) by roc (subscriber, #30627) [Link] (3 responses)

Although actually I'm a little confused and don't see why `make` would be a problem here. Al Viro mentioned above that the performance issue is cache-line ping-ponging from bumping reference counts on shared file descriptors. But that can only happen if you're manipulating a file descriptor that is actually shared across multiple processes, so the file operations performed by the dynamic linker and the compiler itself would not trigger this.

What applications benefit from that single-threaded optimization?

Posted Aug 24, 2024 10:07 UTC (Sat) by intelfx (subscriber, #130118) [Link] (2 responses)

I'm assuming the atomics themselves are part of the issue — even relaxed, they are necessarily slower than non-atomic accesses.

So as I understood it, the context here is "why not drop the refcounting-elision optimization at all and save some complexity".

What applications benefit from that single-threaded optimization?

Posted Aug 24, 2024 11:21 UTC (Sat) by roc (subscriber, #30627) [Link] (1 responses)

That's not what Viro said up above.

Generally I assume cache-hitting relaxed atomic operations are very cheap as long as you don't have a data dependency on the result, and for refcounts, you don't. When decrementing there is a conditional branch but it should be well predicted. Maybe I'm wrong... if so, I'd like to know more.

What applications benefit from that single-threaded optimization?

Posted Aug 27, 2024 17:50 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

To my understanding, relaxed atomics means:

* Don't tear the result if it's bigger than a word (or if the architecture is otherwise susceptible to tearing values of this size).
* If there would be a data race involving this value, don't produce UB, and instead just pick an arbitrary order at runtime.
* In cases where there would be a data race, the arbitrary order is per-variable and does not need to be consistent with anything else, including relaxed atomic operations on other variables.
* Because arbitrary ordering is permitted, cyclic data dependencies between multiple variables can create temporal paradoxes according to the spec (and, to a limited extent, on some real architectures), but it's still not UB for some weird reason. Reading through the literature, I'm having a hard time understanding why anyone would want to write code with such dependencies using relaxed-atomics in the first place. But we don't care, because we're not doing that anyway.
* Atomic load-modify-store (including atomic increment/decrement) is still a single atomic operation, not three operations, and it still has to produce coherent results when consistently used across all callsites. Ordering is relaxed, but atomicity is not.

So, here's the thing. This is (or at least should be) very cheap if all atomic operations are simple reads and writes. You just emit basic loads and stores, perhaps decorate the IR with some very lax optimization barriers, and don't think about synchronization at all. The architecture will almost certainly do something that is permitted by the existing rules, even without any flushes, fences, interlocking, or other nonsense. But I tend to expect that atomic read-modify-write (as seen in atomic reference counting) cannot assume away the ordering problem so easily. If you have two cores that both have cached copies of the variable, and they both concurrently try to do an atomic increment, the variable must ultimately increase by two, not one, and that implies a certain amount of synchronization between cores. They cannot both emit a simple (non-atomic) load/modify/store sequence and expect everything to be fine and dandy.
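The "ordering is relaxed, atomicity is not" point is easy to demonstrate: even with Ordering::Relaxed, fetch_add is one indivisible read-modify-write, so concurrent increments are never lost - which is exactly why it cannot compile down to a plain load/modify/store.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Relaxed gives up cross-variable ordering, not atomicity: every
// increment lands, so the final count is exact.
fn count_to(threads: usize, per_thread: usize) -> usize {
    let counter = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    counter.load(Ordering::Relaxed)
}

fn main() {
    assert_eq!(count_to(4, 100_000), 400_000);
}
```

With non-atomic increments this loop would routinely lose updates; the atomic RMW is what forces the inter-core synchronization NYKevin describes.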

Brauners mail server

Posted Aug 24, 2024 9:12 UTC (Sat) by jtepe (subscriber, #145026) [Link] (2 responses)

While looking at the mail conversation for this patch set, I noticed that the mail server of Christian Brauner generates "unconventional" message ids. Two German words (which mostly don't make a whole lot of sense used together) with some randomness as suffix (e.g. "20240805-fehlbesetzung-nilpferd-1ed58783ad4d"). I'd really like to know which server generates these. Doing a quick search revealed nothing.

Brauners mail server

Posted Aug 24, 2024 16:01 UTC (Sat) by cyphar (subscriber, #110703) [Link] (1 responses)

It's not generated by his server, it's generated by neomutt. Last year, neomutt completely broke Message-IDs for kernel development (turning them into completely random strings[1] that make lore.kernel.org links even less legible than they already are). I think he has his muttrc somewhere, but here's mine that does something almost identical (except the random words are English)[2,3].

[1]: https://github.com/neomutt/neomutt/pull/3655
[2]: https://github.com/cyphar/dotfiles/blob/main/.mutt/muttrc...
[3]: https://github.com/cyphar/dotfiles/commit/cc751369dd0b6ac...

Brauners mail server

Posted Aug 24, 2024 19:04 UTC (Sat) by jtepe (subscriber, #145026) [Link]

Thank you. I've never used mutt or neomutt. But I'll check out neomutt.

Leveraging leak/address sanitizer for reference count checking

Posted Aug 29, 2024 6:26 UTC (Thu) by irogers (subscriber, #121692) [Link]

The perf tool followed the kernel style for reference counts. Code would increment reference counts just because, and fixing issues often resulted in use-after-free crashes - not crashing was a priority, memory safety less so. We resolved this without API changes and cleanups by leveraging leak sanitizer as a missing clean up detector and address sanitizer as a checker of double puts, etc. The approach is written up here:
https://perf.wiki.kernel.org/index.php/Reference_Count_Ch...

It may be possible to use the approach in the kernel with kasan and say maintaining the memory obtained by a get on a list of possibly leaked objects that could be dumped at will. The approach could be used to sanity check that the compiler inserts cleanup code appropriately - not a given with things like asm goto.

Reducing overhead from refcounting

Posted Sep 2, 2024 18:15 UTC (Mon) by kmeyer (subscriber, #50720) [Link]

It sounds like many refcount adjustment sites could instead be good candidates for hazard pointer protection. This would lower the coordination overhead required to protect frequent accesses, at some slight delay to destruction. Hazard pointers might also make it possible to eliminate some of the ad-hoc "did we adjust the refcount or not?" logic for single-threaded processes, etc.


Copyright © 2024, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds