LWN: Comments on "Improving bindgen for the kernel" https://lwn.net/Articles/992693/ This is a special feed containing comments posted to the individual LWN article titled "Improving bindgen for the kernel". en-us Tue, 16 Sep 2025 10:47:50 +0000 Tue, 16 Sep 2025 10:47:50 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net enum conversion https://lwn.net/Articles/993708/ https://lwn.net/Articles/993708/ NYKevin <div class="FormattedComment"> Yes, you can do that, but in practice what people actually do is much worse, usually one of the following:<br> <p> * Force everything to use a Factory or Builder interface (you too can write Java EE in C++ if you try hard enough).<br> * Create the object in an "empty" state and then initialize/populate it as a separate step after it has been constructed (the object's empty state is exposed to client code, and not just in a moved-from temporary variable that nobody is going to look at or use, so now all client code has to defensively account for the possibility of an empty object).<br> <p> There is also the saner option of making the constructor private and exposing a public static method that constructs instances in one step, with validation, and returns std::optional. But std::optional was only introduced in C++17, so it's hardly idiomatic in preexisting codebases.<br> </div> Fri, 11 Oct 2024 03:25:27 +0000 enum conversion https://lwn.net/Articles/993664/ https://lwn.net/Articles/993664/ farnz <p>A nit - C++ doesn't require infallible constructor functions, either; they are expected to throw exceptions for errors. Many places forbid this (Google definitely does, for example), because you either need everything to provide some level of <a href="https://courses.cs.umbc.edu/undergraduate/CMSC202/spring06/Lectures/Exceptions/Safety.shtml">exception safety</a> (weak is enough - you don't need strong here), or you get into a major mess trying to keep track of which things are initialized, and which things aren't.
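The "private constructor plus validating factory" pattern NYKevin describes is the idiomatic default in Rust, where constructors are just functions and can be fallible. A minimal sketch (the `Temperature` type, its field, and its bounds are hypothetical, for illustration only):

```rust
// "Private constructor + validating factory" in Rust: the field is private,
// so `Temperature::new` is the only way client code can obtain a value, and
// it returns Option instead of throwing or exposing an "empty" state.
// (Type and field names are hypothetical, chosen only to illustrate.)

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Temperature {
    kelvin: f64, // private: no way to construct an unvalidated value
}

impl Temperature {
    /// Validating factory: one step, with validation, returning Option.
    pub fn new(kelvin: f64) -> Option<Self> {
        if kelvin.is_finite() && kelvin >= 0.0 {
            Some(Temperature { kelvin })
        } else {
            None // never hand out a value below absolute zero (or NaN)
        }
    }

    pub fn kelvin(&self) -> f64 {
        self.kelvin
    }
}

fn main() {
    assert!(Temperature::new(293.15).is_some());
    assert!(Temperature::new(-5.0).is_none());
    assert!(Temperature::new(f64::NAN).is_none());
}
```

Because no "empty" object ever escapes, callers never have to defensively check for one; the `Option` forces the error case to be handled at the construction site instead.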
Thu, 10 Oct 2024 17:03:43 +0000 enum conversion https://lwn.net/Articles/993659/ https://lwn.net/Articles/993659/ NYKevin <div class="FormattedComment"> This comes down to exactly what promises you want your newtype to make (and, in principle, what promises you make to rustc, although AFAIK there is no stable way to make such promises right now). If you promise the user (and/or rustc, for niche optimization) that bit pattern X can never exist in your type, then you can't construct instances with that bit pattern, so you have to either mask them out or panic (or fail in some more graceful way - Rust does not require the use of infallible constructor functions as in C++, so you can return an Option or Result if you so desire). But sometimes, purity yields to pragmatism. If you know that bit pattern X plausibly could appear in the future, and you don't want to require future developers to explicitly accommodate that change, one option would be to preserve all bits but only allow Rust code to construct values within the acceptable range (unless using some sort of from_raw() function that is safe but documented as "can return values outside the expected range"). Another option would be to make an enum with one variant for "all set bits are recognized" and another for "some unrecognized bits are set," and then make all of your methods variant-agnostic and let the user write whatever code they see fit. But that probably can't be #[repr(transparent)], so IDK if that's workable in every context.<br> <p> The other question is how you implement Eq and Hash for such a type. Do you mask out the unused bits before doing the comparison, or do you leave them as-is?
Both options feel like they might make sense in some circumstances, so probably the best way is to compare all bits, but also provide a method that explicitly masks out the unexpected bits (and returns a new instance, rather than modifying the existing one in-place, since I assume this object would be Copy anyway).<br> </div> Thu, 10 Oct 2024 16:50:13 +0000 enum conversion https://lwn.net/Articles/993624/ https://lwn.net/Articles/993624/ gdt <p><i>If we know (or at least reasonably believe) that C will never give us a value outside of the range of possible flag values...</i></p> <p>That assumption is shaky if the source of the value is an I/O register. New or formerly unknown bits can appear in the register. The question is if the language allows that situation to be handled with grace: it's often safe for the program to continue since the hardware vendor does not want its new hardware to cause shipped drivers to fail.</p> Thu, 10 Oct 2024 13:16:14 +0000 enum conversion https://lwn.net/Articles/993562/ https://lwn.net/Articles/993562/ farnz <p>Note that there's a neat trick to make match statements work in limited cases (i.e. not "is nth bit set", but "is only the nth bit set"): <code> <pre>
#[repr(transparent)]
#[derive(Copy, Clone, Eq, PartialEq)]
struct Thing(u32);

impl Thing {
    const ONE: Self = Self(1);
    const USER: Self = Self(0x1000);
    const ALL_USER: Self = Self(0x1001);
    const KERNEL: Self = Self(0x2000);
    …
}
</pre> </code> <p>This is sometimes useful, if the C enum contains all the combinations you want to match on already - but it's not useful if you want a "don't care" bit.
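Filling the snippet above out into a compilable sketch: because `Eq`/`PartialEq` are derived (and therefore structural), the associated constants can be used directly as match patterns, so exact flag combinations match while anything else falls through to the wildcard arm. The `describe` helper is hypothetical, added only to show the match in use:

```rust
// Sketch of the constants-as-match-patterns trick: derived (structural)
// Eq/PartialEq let the associated consts appear as path patterns in match.
// Constant names/values follow the comment above; `describe` is hypothetical.

#[repr(transparent)]
#[derive(Copy, Clone, Eq, PartialEq)]
struct Thing(u32);

impl Thing {
    const ONE: Self = Self(1);
    const USER: Self = Self(0x1000);
    const ALL_USER: Self = Self(0x1001);
    const KERNEL: Self = Self(0x2000);
}

fn describe(t: Thing) -> &'static str {
    match t {
        Thing::ONE => "one",
        Thing::USER => "user",
        Thing::ALL_USER => "user|one",
        Thing::KERNEL => "kernel",
        _ => "unrecognized combination", // any bit pattern not listed above
    }
}

fn main() {
    assert_eq!(describe(Thing::USER), "user");
    assert_eq!(describe(Thing(0x1001)), "user|one"); // exact combination
    assert_eq!(describe(Thing(0x3000)), "unrecognized combination");
}
```

As the comment notes, this only covers exact combinations that are enumerated as constants; a "don't care" bit would still need explicit masking before the match.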
Thu, 10 Oct 2024 10:34:51 +0000 macro are not const https://lwn.net/Articles/993559/ https://lwn.net/Articles/993559/ tialaramex <div class="FormattedComment"> People from WG14 (the C committee) have complained that WG21 (C++) exerts undue influence trying to get their language to shift in ways that would be convenient for the other language while retaining its claim of "compatibility" without giving much back in exchange.<br> <p> But I'm not aware of any complaints about Rust. WG21 cares what WG14 standardises because it's awkward to pretend that C++ is somehow a "superset" of a language which has instead evolved in different ways. They would prefer WG14 to take care of all the awkward low level problems they don't care about, treating it as a junior partner, which is not appreciated. But Rust doesn't care about C at all, and especially not about the ISO document. To the extent Rust interfaces with C it's via the implementations for ABI reasons. The ISO standard may say A and B are different, but if in practice they're the same on every real platform, Rust needs to know that.<br> </div> Thu, 10 Oct 2024 10:12:40 +0000 glibc's approach https://lwn.net/Articles/993545/ https://lwn.net/Articles/993545/ fw <div class="FormattedComment"> With the one-file approach for all headers, I would be more worried about conflicting #defines in the headers. And there are probably #defines that are unused today and expand to something that can no longer be parsed or type-checked. Either way, you'd have to special-case some stuff—or send cleanup patches, given that this is not some open-world scenario that has to deal with arbitrary header files (like regular bindgen).<br> </div> Thu, 10 Oct 2024 08:20:49 +0000 macro are not const https://lwn.net/Articles/993541/ https://lwn.net/Articles/993541/ taladar <div class="FormattedComment"> That seems unlikely. Why would anyone who hates C put so much effort into the language?
Not to mention that most people who hate C probably also hate C++, so your grouping of C++ and Rust together makes for a very unlikely alliance.<br> <p> Which new features in C23 cause you so much worry, anyway? Most of the changes seem quite small, judging by the feature overview on Wikipedia.<br> </div> Thu, 10 Oct 2024 08:08:40 +0000 glibc's approach https://lwn.net/Articles/993539/ https://lwn.net/Articles/993539/ taladar <div class="FormattedComment"> It has been a few years since I looked at the source code but I do remember the PHP project doing weird things like that with C preprocessor macros in their source code.<br> </div> Thu, 10 Oct 2024 08:02:23 +0000 Reason for clang behavior? https://lwn.net/Articles/993538/ https://lwn.net/Articles/993538/ taladar <div class="FormattedComment"> <span class="QuotedText">&gt; Clang actually only supports using one precompiled header per source file, and silently ignores any others passed.</span><br> <p> Is there a good reason for that or did someone just decide that it wouldn't matter to ignore others because it would "only" impact performance? I would have expected a well-written tool to give an error in this situation.<br> </div> Thu, 10 Oct 2024 07:44:13 +0000 macro are not const https://lwn.net/Articles/993528/ https://lwn.net/Articles/993528/ milesrout <div class="FormattedComment"> It isn't and shouldn't ever be.
C23 is C in name only: the C committee has been taken over by Rust and C++ enthusiasts that say explicitly that they hate the language and want it to die, and are treating the C standard development process as a method for transitioning people away from C towards those languages.<br> </div> Thu, 10 Oct 2024 04:55:56 +0000 enum conversion https://lwn.net/Articles/993523/ https://lwn.net/Articles/993523/ NYKevin <div class="FormattedComment"> Technically, we can do better than that, but only if the C side is willing to play ball.<br> <p> If we know (or at least reasonably believe) that C will never give us a value outside of the range of possible flag values, then in principle, we could define a newtype around the flag, add functions that validate the flag is within the expected range, and even write a custom Debug impl to automatically expand the flag out into symbolic names (instead of displaying a number). Whether all of that trouble is worth it is a harder question.<br> <p> Unfortunately, what we can't do is make it compatible with match expressions. It would be nice to be able to match a flag on "is the nth bit set?" with some sort of match pattern, but I don't believe that is practical with Rust's existing match syntax. As such, probably the sensible thing to do is to either work with the flag as-is with typical bit fiddling, or if it is sufficiently complicated and we don't need to share memory with C on an ongoing basis, construct a proper enum or struct to represent the full state of the interface (which is probably more than just a flag variable).<br> </div> Thu, 10 Oct 2024 03:51:16 +0000 macro are not const https://lwn.net/Articles/993521/ https://lwn.net/Articles/993521/ NYKevin <div class="FormattedComment"> Technically, macro_rules! 
is mostly equivalent to #define (with numerous minor semantic differences that are not particularly interesting or relevant here), but nobody wants to go around writing NAME!() everywhere when you could just use a const and call it NAME like a normal person.<br> </div> Thu, 10 Oct 2024 03:37:49 +0000 enum conversion https://lwn.net/Articles/993518/ https://lwn.net/Articles/993518/ JoeBuck <div class="FormattedComment"> An alternative would be to just translate to integers and not to a Rust enum if any explicit numeric values are provided, because this would imply that the values matter and must be preserved.<br> <p> <p> </div> Thu, 10 Oct 2024 00:27:49 +0000 enum conversion https://lwn.net/Articles/993516/ https://lwn.net/Articles/993516/ guillemj <div class="FormattedComment"> clang has this nice attribute that makes it possible to describe how C enums are intended to be used &lt;<a href="https://clang.llvm.org/docs/AttributeReference.html#enum-extensibility">https://clang.llvm.org/docs/AttributeReference.html#enum-...</a>&gt;.<br> <p> This could perhaps be used in Linux by annotating such enums via a macro when the attribute is supported (so when using clang), then I assume bindgen could probably use that to decide how to best map these enums into Rust?<br> </div> Wed, 09 Oct 2024 23:42:01 +0000 macro are not const https://lwn.net/Articles/993513/ https://lwn.net/Articles/993513/ neggles <p> C has semantic differences between an inserted literal (which a macro definition becomes, since <code>#define NAME 3</code> is effectively shorthand for "replace all instances of <code>NAME</code> with <code>3</code>") and a constant variable, especially when the <code>const int</code> is declared in a header file. I'm not sufficiently familiar with the spec to know what the specific differences actually are, so I'll defer to others on that. 
</p> <p> Rust lacks an exact equivalent to <code>#define</code>, but semantically treats a <code>const u32 NAME</code> almost the same as C treats <code>#define NAME</code>; it's a constant <i>value</i> (albeit with an explicit type) and can thus be folded with other constants in an expression at compile time, etc. </p> Wed, 09 Oct 2024 23:19:01 +0000 macro are not const https://lwn.net/Articles/993509/ https://lwn.net/Articles/993509/ Sesse <div class="FormattedComment"> C23 finally imported constexpr from C++11. I guess C23 isn't allowed in the kernel yet, though?<br> </div> Wed, 09 Oct 2024 21:51:09 +0000 macro are not const https://lwn.net/Articles/993508/ https://lwn.net/Articles/993508/ roc <div class="FormattedComment"> In C `const int NAME = 3;` is technically not a compile-time constant, so you can't do e.g.<br> int foo[NAME + 1];<br> Also, `const int NAME = 3;` defines the `NAME` variable, so if you #include that from multiple translation units that are then linked together, you get an error at link time due to multiple definitions.<br> <p> The C-compatible way to do this is using enums:<br> enum { NAME = 3 };<br> </div> Wed, 09 Oct 2024 21:36:17 +0000 editions? https://lwn.net/Articles/993507/ https://lwn.net/Articles/993507/ intelfx <div class="FormattedComment"> <span class="QuotedText">&gt; it sounds like the backwards compatible defaults might over time drift away from "sensible" defaults. is there something like editions that could help with this?</span><br> <p> This was my first thought as I was finishing reading the article.<br> <p> Sounds like even what was described in the article is already enough changes to warrant bundling them together into some sort of `--new-and-better` flag.
And, indeed, such a flag should be generalized to an edition selection flag instead.<br> </div> Wed, 09 Oct 2024 21:20:03 +0000 macro are not const https://lwn.net/Articles/993497/ https://lwn.net/Articles/993497/ ballombe <div class="FormattedComment"> In C, why do<br> #define NAME 3<br> instead of <br> const int NAME=3;<br> ? <br> I can see several reasons, but all of them are incompatible with<br> pub const NAME: u32 = 3;<br> <p> </div> Wed, 09 Oct 2024 20:47:41 +0000 enum conversion https://lwn.net/Articles/993504/ https://lwn.net/Articles/993504/ WolfWings <div class="FormattedComment"> There's plenty of cases where enums are used as grouped bitfields as well. 0x1/0x2/0x3/0x4/0x8/0x10/0x20/0x30 as the defined values for example, so unfortunately 'power of two' wouldn't help as much I'm afraid.<br> </div> Wed, 09 Oct 2024 20:15:38 +0000 glibc's approach https://lwn.net/Articles/993500/ https://lwn.net/Articles/993500/ comex <div class="FormattedComment"> I had the same thought.<br> <p> To be fair, bindgen by default tries to generate bindings for all macros, rather than only a subset filtered by regular expression. If you try to group everything into one C file, there will almost certainly be at least one macro that expands to some series of tokens other than a constant expression, causing compilation errors.<br> <p> But with libclang it should be possible to parse a C file that contains errors, and still extract information about the non-erroneous parts. Or you could run multiple passes (but still fewer than one for every macro).<br> <p> There would still be the risk of macro expansions 'escaping' from their references and producing unexpected definitions. Suppose bindgen generated a single source file that looked like<br> <p> enum { foo = (FOO) };<br> enum { bar = (BAR) };<br> enum { baz = (BAZ) };<br> <p> and so on for every macro. 
What if one of the macros had a definition like this?<br> <p> #define FOO ) }; enum { baz = (1000<br> <p> …That's probably too pathological a case to care about, though.<br> <p> There's also the option of just biting the bullet and adding a regex filter. Bindgen already supports such filters ('allowlists'); they're just not enabled by default. I don't know whether Linux uses them.<br> <p> <p> </div> Wed, 09 Oct 2024 19:25:13 +0000 glibc's approach https://lwn.net/Articles/993495/ https://lwn.net/Articles/993495/ fw <div class="FormattedComment"> This seems rather slow? In glibc, we use gcc -E -dM to get the macro names, and then generate assembler files for all constants of interest in one go: <a href="https://sourceware.org/git/?p=glibc.git;a=blob;f=scripts/glibcextract.py">https://sourceware.org/git/?p=glibc.git;a=blob;f=scripts/...</a><br> <p> I think it takes a fraction of a second per header file, pretty much independently of the number of macro values we need to extract.<br> </div> Wed, 09 Oct 2024 18:41:47 +0000 enum conversion https://lwn.net/Articles/993490/ https://lwn.net/Articles/993490/ JoeBuck <div class="FormattedComment"> Often in C, enums are used as bitmasks and defined to have values like 1, 2, 4, 8 etc, and bitwise-ors of the values are considered valid values. This design doesn't map cleanly into a Rust enum. I suppose the case can be detected by noting when the numerical values associated with enum constants are powers of two: in this case the translation should produce integer variables and constants of appropriate width, rather than a Rust enum. Or perhaps there's a type-safe way to implement this kind of model.<br> <p> <p> </div> Wed, 09 Oct 2024 18:12:38 +0000 editions? https://lwn.net/Articles/993485/ https://lwn.net/Articles/993485/ shironeko <div class="FormattedComment"> it sounds like the backwards compatible defaults might over time drift away from "sensible" defaults.
is there something like editions that could help with this?<br> </div> Wed, 09 Oct 2024 17:59:40 +0000