
Moving the kernel to modern C

By Jonathan Corbet
February 24, 2022
Despite its generally fast-moving nature, the kernel project relies on a number of old tools. While critics like to focus on the community's extensive use of email, a possibly more significant anachronism is the use of the 1989 version of the C language standard for kernel code — a standard that was codified before the kernel project even began over 30 years ago. It is looking like that longstanding practice could be coming to an end as soon as the 5.18 kernel, which can be expected in May of this year.

Linked-list concerns

The discussion started with this patch series from Jakob Koschel, who is trying to prevent speculative-execution vulnerabilities tied to the kernel's linked-list primitives. The kernel makes extensive use of doubly-linked lists defined by struct list_head:

    struct list_head {
        struct list_head *next, *prev;
    };

This structure is normally embedded into some other structure; in this way, linked lists can be made with any structure type of interest. Along with the type, the kernel provides a vast array of functions and macros that can be used to traverse and manipulate linked lists. One of those is list_for_each_entry(), which is a macro masquerading as a sort of control structure. To see how this macro is used, imagine that the kernel included a structure like this:

    struct foo {
        int fooness;
        struct list_head list;
    };

The list member can be used to create a doubly-linked list of foo structures; a separate list_head structure is usually declared as the beginning of such a list; assume we have one called foo_list. Traversing this list is possible with code like:

    struct foo *iterator;

    list_for_each_entry(iterator, &foo_list, list) {
        do_something_with(iterator);
    }
    /* Should not use iterator here */

The list parameter tells the macro what the name of the list_head structure is within the foo structure. This loop will be executed once for each element in the list, with iterator pointing to that element.
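
To make the mechanics concrete, here is a minimal user-space sketch of how such a macro can be built. It is a simplified stand-in rather than the kernel's actual definition: container_of(), list_add_tail(), and sum_fooness_demo() are written for this illustration, and __typeof__ is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>	/* offsetof */

struct list_head {
	struct list_head *next, *prev;
};

/* Map a pointer to an embedded member back to its enclosing structure. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Simplified traversal macro: it expands to a bare for() header, so the
 * caller supplies the loop body and must declare pos beforehand.
 */
#define list_for_each_entry(pos, head, member)				\
	for (pos = container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);					\
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

/* Insert entry just before head, i.e. at the tail of the circular list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

struct foo {
	int fooness;
	struct list_head list;
};

/* Illustrative helper: build a three-element list and sum its fooness. */
int sum_fooness_demo(void)
{
	struct list_head foo_list = { &foo_list, &foo_list };	/* empty list */
	struct foo a = { .fooness = 1 }, b = { .fooness = 2 }, c = { .fooness = 3 };
	struct foo *iterator;
	int sum = 0;

	list_add_tail(&a.list, &foo_list);
	list_add_tail(&b.list, &foo_list);
	list_add_tail(&c.list, &foo_list);

	list_for_each_entry(iterator, &foo_list, list)
		sum += iterator->fooness;

	/*
	 * The loop ran to completion, so iterator is now computed from the
	 * list head itself and does not point at a real struct foo.
	 */
	assert(&iterator->list == &foo_list);
	return sum;
}
```

The assertion at the end shows why the iterator is unsafe after the loop: once traversal finishes, it is derived from the list head, which is not embedded in any struct foo.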

Koschel included a patch fixing a bug in the USB subsystem where the iterator passed to this macro was used after the exit from the macro, which is a dangerous thing to do. Depending on what happens within the list, the contents of that iterator could be something surprising, even in the absence of speculative execution. Koschel fixed the problem by reworking the code in question to stop using the iterator after the loop.

The plot twists

Linus Torvalds didn't much like the patch and didn't see how it related to speculative-execution vulnerabilities. After Koschel explained the situation further, though, Torvalds agreed that "this is just a regular bug, plain and simple" and said it should be fixed independently of the larger series. But then he wandered into the real source of the problem: that the iterator passed to the list-traversal macros must be declared in a scope outside of the loop itself:

The whole reason this kind of non-speculative bug can happen is that we historically didn't have C99-style "declare variables in loops". So list_for_each_entry() - and all the other ones - fundamentally always leaks the last HEAD entry out of the loop, simply because we couldn't declare the iterator variable in the loop itself.

If it were possible to write a list-traversal macro that could declare its own iterator, then that iterator would not be visible outside of the loop and this kind of problem would not arise. But, since the kernel is stuck on the C89 standard, declaring variables within the loop is not possible.
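
As a rough sketch of what such a macro might look like under C99 or later: list_for_each_entry_scoped() is a hypothetical name, and it takes the element type as an explicit first argument so that the iterator can be declared in the for() header. It is an illustration only and is not taken from the patch series or the kernel tree; the list helpers are simplified user-space stand-ins.

```c
#include <stddef.h>	/* offsetof */

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Hypothetical C99-style variant: the iterator is declared inside the
 * for() header, so it goes out of scope when the loop ends.
 */
#define list_for_each_entry_scoped(type, pos, head, member)		\
	for (type *pos = container_of((head)->next, type, member);	\
	     &pos->member != (head);					\
	     pos = container_of(pos->member.next, type, member))

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

struct foo {
	int fooness;
	struct list_head list;
};

/* Illustrative helper: count the entries whose fooness exceeds threshold. */
int count_big_foos_demo(int threshold)
{
	struct list_head foo_list = { &foo_list, &foo_list };
	struct foo a = { .fooness = 5 }, b = { .fooness = 50 }, c = { .fooness = 500 };
	int count = 0;

	list_add_tail(&a.list, &foo_list);
	list_add_tail(&b.list, &foo_list);
	list_add_tail(&c.list, &foo_list);

	list_for_each_entry_scoped(struct foo, iterator, &foo_list, list) {
		if (iterator->fooness > threshold)
			count++;
	}
	/* Referring to iterator here would fail to compile. */
	return count;
}
```

With this shape, using the iterator after the loop becomes a compile-time error instead of a latent bug, though callers that rely on the leaked value would need to be restructured.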

Torvalds said that perhaps the time had come to look to moving to the C99 standard — it is still over 20 years old, but is at least recent enough to allow block-level variable declarations. As he noted, this move hasn't been done in the past "because we had some odd problem with some ancient gcc versions that broke documented initializers". But, in the meantime, the kernel has moved its minimum GCC requirement to version 5.1, so perhaps those bugs are no longer relevant.

Arnd Bergmann, who tends to keep a close eye on cross-architecture compiler issues, agreed that it should be possible for the kernel to move forward. Indeed, he suggested that it would be possible to go as far as the C11 standard (from 2011) while the change was being made, though he wasn't sure that C11 would bring anything new that would be useful to the kernel. It might even be possible to move to C17 or even the yet-unfinished C2x version of the language. That, however, has a downside in that it "would break gcc-5/6/7 support", and the kernel still supports those versions currently. Raising the minimum GCC version to 8.x would likely be more of a jump than the user community would be willing to accept at this point.

Moving to C11 would not require changing the minimum GCC version, though, and thus might be more readily doable. Torvalds was in favor of that idea: "I really would love to finally move forward on this, considering that it's been brewing for many many years". After Bergmann confirmed that it should be possible to do so, Torvalds declared: "Ok, somebody please remind me, and let's just try this early in the 5.18 merge window". The 5.18 merge window is less than one month away, so this is a change that could happen in the near future.

It is worth keeping in mind, though, that a lot of things can happen between the merge window and the 5.18 release. Moving to a new version of the language standard could reveal any number of surprises in obscure places in the kernel; it would not take many of those to cause the change to be reverted for now. But, if all goes well, the shift to C11 will happen in the next kernel release. Converting all of the users of list_for_each_entry() and variants (of which there are well over 15,000 in the kernel) to a new version that doesn't expose the internal iterator seems likely to take a little longer, though.


Moving the kernel to modern C

Posted Feb 24, 2022 15:12 UTC (Thu) by ballombe (subscriber, #9523) [Link] (8 responses)

Note that, if for some reason you need to stay with c89, you can always add a block around the for() statement to hold the loop variable.

Moving the kernel to modern C

Posted Feb 24, 2022 15:51 UTC (Thu) by smurf (subscriber, #17840) [Link] (7 responses)

You'd need to do that to each caller, which is a *lot* of code churn.

Moving the kernel to modern C

Posted Feb 25, 2022 3:08 UTC (Fri) by kmeyer (subscriber, #50720) [Link] (5 responses)

You would just do it once, in the macro definition.

Moving the kernel to modern C

Posted Feb 25, 2022 6:24 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (1 responses)

Based on the code shown in this article, I'm not clear on how you would actually do that. It *looks* like the macro expands to while(something) or for(something), and doesn't even have the "regular" set of braces, so you can't edit it to include an outer set of braces without modifying the call sites.

Moving the kernel to modern C

Posted Feb 25, 2022 7:08 UTC (Fri) by josh (subscriber, #17465) [Link]

I don't think you could with Linux's list macros. You could with Sparse's, which have pairs of macros invoked at the start and end of each loop (and which use two levels of braces).

But C99 makes this easy.
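
For illustration, a paired-macro scheme of the kind described above might look roughly like this. The LIST_FOR_EACH_ENTRY_BEGIN and LIST_FOR_EACH_ENTRY_END names are hypothetical and are not Sparse's actual macros; the sketch sticks to block-start declarations so that no C99 features are needed, and the list helpers are simplified user-space stand-ins.

```c
#include <stddef.h>	/* offsetof */

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Hypothetical begin/end pair: the outer brace opens a block that holds
 * the iterator declaration, the inner brace opens the loop body.
 */
#define LIST_FOR_EACH_ENTRY_BEGIN(type, pos, head, member)		\
	{								\
		type *pos;						\
		for (pos = container_of((head)->next, type, member);	\
		     &pos->member != (head);				\
		     pos = container_of(pos->member.next, type, member)) {

/* Closes the loop body and then the block holding the iterator. */
#define LIST_FOR_EACH_ENTRY_END						\
		}							\
	}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

struct foo {
	int fooness;
	struct list_head list;
};

/* Illustrative helper: return the largest fooness in a small list. */
int max_fooness_demo(void)
{
	struct list_head foo_list;
	struct foo a, b, c;
	int max = 0;

	foo_list.next = foo_list.prev = &foo_list;	/* empty list */
	a.fooness = 4;
	b.fooness = 9;
	c.fooness = 2;
	list_add_tail(&a.list, &foo_list);
	list_add_tail(&b.list, &foo_list);
	list_add_tail(&c.list, &foo_list);

	LIST_FOR_EACH_ENTRY_BEGIN(struct foo, iterator, &foo_list, list)
		if (iterator->fooness > max)
			max = iterator->fooness;
	LIST_FOR_EACH_ENTRY_END
	/* iterator is out of scope here. */

	return max;
}
```

The cost is that every call site has to be rewritten to use the closing macro, which is the code churn raised earlier in this thread.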

Moving the kernel to modern C

Posted Feb 25, 2022 11:33 UTC (Fri) by 4m1rk (guest, #157085) [Link] (2 responses)

The macro just creates the `for` expression. If you need to put the for expression inside a block then the macro needs to accept the body of the for too. Do C macros allow passing a code block?

Moving the kernel to modern C

Posted Feb 27, 2022 16:24 UTC (Sun) by mina86 (guest, #68442) [Link] (1 responses)

> Do C macros allow passing a code block?

Yes. You can pass pretty much whatever you want so long as parentheses match and there is no comma outside of parentheses.

Moving the kernel to modern C

Posted Mar 9, 2022 23:56 UTC (Wed) by bartoc (guest, #124262) [Link]

The comma is a bit of a huge pita if you're passing a whole loop body.

Moving the kernel to modern C

Posted Feb 27, 2022 23:59 UTC (Sun) by Sesse (subscriber, #53779) [Link]

#define for if (false); else for

:-)

Moving the kernel to modern C

Posted Feb 24, 2022 16:22 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

Is "documented initializers" Linus's typo for "designated initializers"?

Moving the kernel to modern C

Posted Feb 24, 2022 17:36 UTC (Thu) by iabervon (subscriber, #722) [Link] (31 responses)

Instead of leaking a not-necessarily-valid pointer, couldn't the macro set it to NULL at the end? Actually, I'm surprised there isn't a standard trick for doing an assignment that will be an error unless the compiler eliminates it as dead code.

Moving the kernel to modern C

Posted Feb 24, 2022 17:59 UTC (Thu) by Paf (subscriber, #91811) [Link] (28 responses)

Cost. Pretty significant cost in some cases for something that shouldn’t even be necessary.

Moving the kernel to modern C

Posted Feb 24, 2022 20:09 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (5 responses)

It should be cost-free in any case where the iteration variable isn't accessed after the loop, since the compiler would eliminate the dead store. The code change is also fairly trivial: just edit the condition from "&pos->member != (head)" to "(&pos->member != (head)) || ((pos = NULL))".

Unfortunately this alone doesn't handle loops which exit early due to "break" or "goto". The "goto" case is unavoidable, but the "break" case can be dealt with by wrapping the macro in a second, trivial loop as shown in this example[0]. Note that the generated code (for gcc 5.1 with -O2) is *identical* between the version with the extra loop (traverse1) and the original version which does not set the iterator to NULL after the loop (traverse2). The initialization of the iterator to the flag state (-1), the condition for the outer loop, and the store of NULL to the iterator after the loop are all successfully eliminated.

[0] https://godbolt.org/z/4obYManzc
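
A minimal user-space sketch of the condition change described above, assuming simplified stand-ins for the kernel's list helpers: list_for_each_entry_null() and find_foo() are hypothetical names, __typeof__ is a GCC/Clang extension, and the extra outer loop for handling "break" is left out, so an early break simply leaves the iterator pointing at the current entry.

```c
#include <stddef.h>	/* offsetof, NULL */

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Hypothetical variant: when the traversal reaches the head again, the
 * right-hand side of || stores NULL in pos and evaluates to false,
 * ending the loop.
 */
#define list_for_each_entry_null(pos, head, member)			\
	for (pos = container_of((head)->next, __typeof__(*pos), member); \
	     (&pos->member != (head)) || ((pos = NULL));		\
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

struct foo {
	int fooness;
	struct list_head list;
};

/* Return the first entry with the wanted fooness, or NULL if none. */
static struct foo *find_foo(struct list_head *head, int wanted)
{
	struct foo *iterator;

	list_for_each_entry_null(iterator, head, list) {
		if (iterator->fooness == wanted)
			break;	/* iterator keeps pointing at the match */
	}
	return iterator;	/* NULL when the loop ran to completion */
}

/* Illustrative helper: look up a value in a two-element list. */
int lookup_demo(int wanted)
{
	struct list_head foo_list = { &foo_list, &foo_list };
	struct foo a = { .fooness = 10 }, b = { .fooness = 20 };
	struct foo *hit;

	list_add_tail(&a.list, &foo_list);
	list_add_tail(&b.list, &foo_list);

	hit = find_foo(&foo_list, wanted);
	return hit ? hit->fooness : -1;
}
```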

Moving the kernel to modern C

Posted Feb 24, 2022 20:48 UTC (Thu) by iabervon (subscriber, #722) [Link] (3 responses)

It might work to have:

extern unsigned long list_iterator_live_after_loop;

and "|| ((pos = (void *) list_iterator_live_after_loop), 0)"

I didn't try changing the kernel macro that way, but my little test code doesn't link if the iterator is used after the loop, but does link and work if it's not used. As I recall, the kernel is already using that sort of trick to use compiler optimization to remove an error message only if the compiler can disprove it.

Moving the kernel to modern C

Posted Feb 25, 2022 22:20 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (2 responses)

Unfortunately, this is probably UB: https://en.cppreference.com/w/c/language/extern

> The entire program may have zero or one external definition of every identifier with external linkage.
>
> If an identifier with external linkage is used in any expression other than a non-VLA (since C99) sizeof or _Alignof (since C11), there must be one and only one external definition for that identifier somewhere in the entire program.

There's no exception for short-circuit operators. If you use it at compile time, for anything other than sizeof, then it has to exist (have storage allocated somewhere).

Moving the kernel to modern C

Posted Feb 26, 2022 0:56 UTC (Sat) by khim (subscriber, #9252) [Link] (1 responses)

It's UB according to the standard. But Linus is very rarely concerned with that: he tends to accept such stupidity only when there is no way to convince the compiler to stop breaking sane (from a Linux developer's POV!) code.

It's one of the reasons why GCC is the only supported compiler, BTW.

And GCC not only supports that feature, it even provides the __attribute__((__error__(msg))) extension to make error messages more explicit. And GLibC uses it to define its __errordecl macro.

Moving the kernel to modern C

Posted Feb 26, 2022 2:34 UTC (Sat) by foom (subscriber, #14868) [Link]

And this sort of bad idea is why the kernel is only possible to compile with optimizations enabled.

It would be a lot better if Linux used c++ constexpr functions and templates for compile time evaluation semantics, instead of abusing the optimizer to very poorly emulate them.

Moving the kernel to modern C

Posted Feb 25, 2022 1:19 UTC (Fri) by ianloic (subscriber, #54050) [Link]

It's kind of fascinating how small the cost is, even when using the pointer afterwards: https://godbolt.org/z/cbv4fqan3

Moving the kernel to modern C

Posted Feb 24, 2022 20:26 UTC (Thu) by iabervon (subscriber, #722) [Link] (21 responses)

Oh, I meant to imply that the compiler would eliminate all of those writes except for ones that expose bugs, but then I got side-tracked by wondering if you could make the kernel not even link unless the compiler eliminated the write. Anyway, it wouldn't affect the generated code unless the compiler can't tell the code is correct.

Moving the kernel to modern C

Posted Feb 24, 2022 22:12 UTC (Thu) by Paf (subscriber, #91811) [Link] (20 responses)

Very good point.

God I’d sure love to get to a newer C standard though…

Moving the kernel to modern C

Posted Feb 25, 2022 8:54 UTC (Fri) by ncm (guest, #165) [Link] (19 responses)

The smarter move would be to start compiling the kernel with a C++ compiler. A weenier step would be to accept source files with a ".cc" suffix and build those with a C++ compiler.

There would, in any case, be no need to step outside Gcc, where in fact that was done long ago, with no disruption, but with massive benefits. Anybody spooked about C++ should understand that Gcc and Clang are both coded in C++, whatever the language you compile on them.

Similarly, anybody spooked by C++ "hidden code" should understand that Rust does literally all of the things they are spooked by; and all of its power comes from that.

Staying on ancient EOL'd language Standards does nobody any good.

Moving the kernel to modern C

Posted Feb 25, 2022 9:02 UTC (Fri) by mpr22 (subscriber, #60784) [Link] (1 responses)

I suspect the people most likely to be spooked (for whatever value of "spooked") by any "hidden code" aspect of C++ are equally likely to be similarly spooked by the similar aspects of Rust.

Moving the kernel to modern C

Posted Feb 25, 2022 9:33 UTC (Fri) by ncm (guest, #165) [Link]

If they understood programming better, they would be more spooked by their C compiler failing to emit such code where, without, the code they wrote is buggy.

Moving the kernel to modern C

Posted Feb 25, 2022 9:30 UTC (Fri) by Wol (subscriber, #4433) [Link] (12 responses)

A weenier step accepting .cc files? The problem with that is "all of C++" is a security / close-to-the-metal nightmare, and the definition of what is the acceptable subset varies with who you talk to.

A further problem is the size / speed of the code. Yes C++ is *mostly* pretty good, but I suspect the compiler devs will barf on that word "mostly".

To what extent does kernel C currently drop out of C into assembler, and to what extent will C++ make that worse?

No I don't actually know the answers, I'm just predicting the devs' reactions.

Cheers,
Wol

Moving the kernel to modern C

Posted Feb 25, 2022 21:24 UTC (Fri) by ncm (guest, #165) [Link] (10 responses)

You demonstrate you know neither the answers, nor the questions, and are simply making things up as you go along. Just stop.

Moving the kernel to modern C

Posted Feb 26, 2022 13:46 UTC (Sat) by Paf (subscriber, #91811) [Link] (9 responses)

Well, it’s nice to know you’ve solved the problems and all of the kernel devs opposed are just fools who cannot see the light. I’m no expert in this particular area but I suspect there’s a *small* chance they’re not just all idiots.

Moving the kernel to modern C

Posted Feb 26, 2022 22:30 UTC (Sat) by camhusmj38 (subscriber, #99234) [Link] (4 responses)

I would not say they are fools. But they are somewhat hidebound - viz the use of email as their primary collaboration system. They are also as capable as the rest of us of having prejudices. C++ in the 1990s was not a good choice for a Kernel - the programming paradigms popular then, the quality of compilers then and the language features then were not there. We were also a lot less aware of the potential for security holes that C code was subject to without a careful eye. Things have changed. Knowledge has been acquired about the best approaches to things. It’s been over 20 years. But as evidenced by some comments on this article - attitudes change slowly. Linux is unusual in major software projects as being very much a product of one person’s idiosyncrasies. There is a certain groupthink that it is particularly susceptible to. Different ways of thinking have historically led to an unpleasant email.
I suppose what I am saying is that experience outside the Kernel community suggests that using a language which enables the use of low or zero cost abstractions and automates resource management is a good idea. Trying to emulate these features in C89 (using macros!) because C89 is all you have is not a good solution. Persevering with C because it’s what you know is also not the best for a project that is used as a bedrock of modern computing.
Kernel mode C++ standards exist. They’re quite reasonable and not hard to implement or learn.

Moving the kernel to modern C

Posted Feb 27, 2022 0:09 UTC (Sun) by Wol (subscriber, #4433) [Link] (3 responses)

> I would not say they are fools. But they are somewhat hidebound - viz the use of email as their primary collaboration system.

So you're another idiot who thinks newer = better.

I'm not saying these new-fangled things don't work for you. And plenty of kernel devs use newer tools. But all these idiots going "ooh! new! shiny!" make life hell for people doing the work.

I'm on a kernel mailing list. And reading the emails, I feel like tearing my hair out sometimes. But moving to Rust seems a far better solution than C++.

Cheers,
Wol

Moving the kernel to modern C

Posted Feb 27, 2022 8:13 UTC (Sun) by camhusmj38 (subscriber, #99234) [Link] (2 responses)

Newer does not always equal better, but using technology to solve problems is literally the business we are in. Submission of patches and PRs by email has real implications - security for one, less open and effective collaboration for another. Again, I’m not saying that it wasn’t the right solution for its time but there are alternatives that work better.
And as for Rust, I’m a big fan of it but there is no way that the Kernel is going to be rewritten in Rust. It’s much more viable to replace some data structures and paradigms with C++ alternatives. This is possible without rewriting everything or completely retraining all existing contributors.
And I don’t think calling people idiots is particularly helpful to what is a technical discussion.

Moving the kernel to modern C

Posted Feb 27, 2022 10:37 UTC (Sun) by Wol (subscriber, #4433) [Link] (1 responses)

You just admitted yourself in another comment that "using C++ is not a technical problem". The problem is the social problem of making sure the C++ code really is smaller and more efficient than the C macros it replaces.

AND THERE ARE TOO MANY "OOH NEW SHINY" LEMMINGS...

Likewise moving away from email - the problem is a social problem - there AREN'T ENOUGH DEVELOPERS. I think a fair few subsystems ARE developed using solutions like github, gitlab, whatever. If it really worked, surely that model would spread rapidly. But it's not working, and it's not "not working" because the solution is better or worse, it's not working because there aren't enough people to make either solution work.

And actually, probably one of the biggest problems with C++, is that IT'S NOT TRANSPARENT. Developers don't have a clear model in their mind of HOW it works. I come from the days when tomorrow's weather forecast took a day to run on the most powerful computers if you were lucky! If you can't model performance down to the bare metal, you have no clue how long the program is going to take to run. That's one of the reasons the kernel has held back on compilers so long, the disconnect between engineering reality and theoretical correctness.

That's one of the reasons I rant about relational. With Pick the database is transparent - as an application developer I can REASON about performance right through to the OS. That's what's so hard with C++ - the devs can NOT reason through to the hardware. (Okay, it's getting harder even with C, but it's not obfuscated ...)

Cheers,
Wol

Stop this please

Posted Feb 27, 2022 16:51 UTC (Sun) by corbet (editor, #1) [Link]

Surely we can find a way to discuss things without calling each other idiots or lemmings, right? Please don't do this anymore.

Moving the kernel to modern C

Posted Mar 5, 2022 12:28 UTC (Sat) by nix (subscriber, #2304) [Link] (3 responses)

[Caveat: haven't written lots of C++ in years, and that was back in the early '10s. The language has changed and I haven't kept up, I know...]

The thing ncm likely meant was that Wol's imagined kernel-developers' worries are not actually what they are documented as having worried about. C++ is not a security nightmare, not any more than C, anyway; it hasn't been worse than C optimization-wise for about twenty years; and there is no sudden extra need to drop into asm just because this is C++ (C++ is still nearly a superset of C, and this is just as true of the GNU variant).

Their stated worries are more that C++ has abstractions that enable things to magically happen behind your back with no immediate indication at the call site, and since the kernel developers are really looking for a portable assembler a lot of the time, where everything the machine does is obvious, this is *far* from what they want: they have their hands full coping with parallelism-induced complexities, memory model complexities, looking out for speculative execution gadgets etc etc, without worrying about the apparently same code doing wildly different things depending on what type they're operating on.

Many of C++'s transparently-do-things features are routinely used and almost essential to use anything resembling modern C++: references are the classic case (now function parameters' values can change in the caller without an & at the call site), but also C++ before std::move used to not make it terribly clear whether things were being copied or not (and it was at the very least wordy to enforce one alternative), and even now we have things like stringviews which seem to come with built-in footguns. (Of course, the kernel would never use such pieces, but code review would need to make sure they never crept in... and you only need to forget *once*). Many of the pieces that *don't* amount to 'do this invisibly albeit usually helpfully' are related to templates, and, uh... the kernel is nonswappable code in which size is at a premium, and having the compiler promiscuously generate code to monomorphize templates on the fly was anathema for a long time (though Rust does the same thing, and people seem to be complaining less: maybe RAM is just that much cheaper now and kernel code size is less important? but icache bloat still matters, and Linus has worried about it in public, and that was years ago and it's worse now).

And that's without even mentioning the really big painful problem, so big and painful that there are still compiler switches to disable the feature entirely, so big and painful that it took decades to figure out how to write code safely in the presence of these things and the last time I looked at it the safe code was extremely unobvious and if it was wrong you were unlikely to know for many years until things blew up, because there was no way to automatically check for safety: exceptions. Lovely idea, makes code's non-exceptional path much clearer, but the implementation explodes exceptional flow paths and *all of them are invisible* and many might be in what looks like the middle of an atomic, indivisible entity to someone not thinking "what if this were overloaded and threw?". If you use RTTI for absolutely everything religiously you don't need to worry, but you only have to forget and do manual cleanup once and you're in trouble when you next get an exception passing through that region. The kernel would obviously never use exceptions in the first place, mind you. Of course that now makes it impossible for destructors to fail, which probably rules out *use* of destructors for anything nontrivial, which means you can't use RTTI, which means you can't write anything resembling modern C++. You don't pay for what you don't use, but many of the bits require many of the other bits to use them non-clumsily, and then many of those bits are papering over design faults in the earlier bits -- std::move, again -- and the result of adding all those bits together is *ferociously* complex.

Moving the kernel to modern C

Posted Mar 5, 2022 13:37 UTC (Sat) by Wol (subscriber, #4433) [Link]

> The thing ncm likely meant was that Wol's imagined kernel-developers' worries are not actually what they are documented as having worried about. C++ is not a security nightmare, not any more than C, anyway; it hasn't been worse than C optimization-wise for about twenty years; and there is no sudden extra need to drop into asm just because this is C++ (C++ is still nearly a superset of C, and this is just as true of the GNU variant).

> Their stated worries are more that C++ has abstractions that enable things to magically happen behind your back with no immediate indication at the call site, and since the kernel developers are really looking for a portable assembler a lot of the time, where everything the machine does is obvious, this is *far* from what they want: they have their hands full coping with parallelism-induced complexities, memory model complexities, looking out for speculative execution gadgets etc etc, without worrying about the apparently same code doing wildly different things depending on what type they're operating on.

Actually, this is pretty much exactly what I was trying to say ... that C++ does things behind your back, and when you're trying to make sure that your code fits in L1 cache or whatever, code bloat is SERIOUS STUFF.

How often do you see kernel developers talking about "fast path"? Quite a lot. And it only takes C++ to do something you don't expect and the fast path will become orders of magnitude slower. WHOOPS!

Cheers,
Wol

Moving the kernel to modern C

Posted Mar 11, 2022 16:51 UTC (Fri) by timon (subscriber, #152974) [Link] (1 responses)

I think you mean RAII (Resource acquisition is initialization) instead of RTTI (Run-time type information).

Moving the kernel to modern C

Posted Mar 17, 2022 16:27 UTC (Thu) by nix (subscriber, #2304) [Link]

Um, yes, of course. I've had a mental hash collision between those two acronyms for literally decades, to such an extent that if you asked me what RTTI stood for I'd often say "resource acquisition is oh wait".

Moving the kernel to modern C

Posted Feb 26, 2022 22:16 UTC (Sat) by camhusmj38 (subscriber, #99234) [Link]

The two aspects of C++ that are usually disabled are RTTI and exceptions. GCC and Clang both disable at least one if not both of the features (I believe GCC both and LLVM at least RTTI). C++ code which just uses RAII and non-virtual methods (an awful lot of modern C++ code) optimises to the same as equivalent C but does not rely on people remembering to do all the manual steps, and it helps with abstraction. Apple and Microsoft both have standards for Kernel mode C++ that could be used as a starting point if need be.

Moving the kernel to modern C

Posted Feb 25, 2022 14:52 UTC (Fri) by jd (guest, #26381) [Link] (3 responses)

If we aren't going to go with ancient, then move to D. It's much, much newer and doesn't carry anything like the risks or overheads of C++. (Not that I'd recommend it, for other reasons, but it helps illustrate the age of the standard shouldn't matter as much as the quality of the result.)

Is there markup for any of the static checkers beloved by kernel developers that could be used to improve the quality of the results? (And when was the last time Coverity checked the kernel?)

There must be plenty that could be done to improve the kernel code without a drastic change of language.

Moving the kernel to modern C

Posted Feb 25, 2022 18:04 UTC (Fri) by davej (subscriber, #354) [Link] (1 responses)

> And when was the last time Coverity checked the kernel

99% of the time, the answer to this question is the same as "when did Linus last cut an -rc/final".
I usually kick off a run the same day, failing that the following morning.

Moving the kernel to modern C

Posted Feb 28, 2022 18:38 UTC (Mon) by jd (guest, #26381) [Link]

That's very comforting to know. I don't know how much it is picking up these days, but I've been haunting the fringes long enough to know that it has always provided a very important service to kernel developers. (Through the mead haze, I can dimly recall when it was first used.)

Moving the kernel to modern C

Posted Feb 25, 2022 21:33 UTC (Fri) by ncm (guest, #165) [Link]

I doubt Coverity works on D code. It does work on C++. "Risks and overheads" is wholesale speculation, unwelcome here.

Moving the kernel to modern C

Posted Feb 24, 2022 21:11 UTC (Thu) by abatters (✭ supporter ✭, #6932) [Link] (1 responses)

It would break code that does this:
list_for_each_entry(iterator, &foo_list, list) {
    if (do_something_with(iterator)) {
        break;
    }
}
if (list_entry_is_head(iterator, &foo_list, list)) {
    // iteration finished
} else {
    do_something_else_with(iterator);
}
All this "compare to head" nonsense is why I prefer regular NULL-terminated linked lists to the kernel's circular linked lists. Insert/delete may take more instructions but iteration is much easier.

Moving the kernel to modern C

Posted Feb 24, 2022 21:55 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

Yes, and you also have macros like list_for_each_entry_continue() which depend on the value being left in the iterator. All of these would also break if the macro was changed to declare the iterator inside the `for` statement, C99-style.

One way to work around the problem in your example would be to move the condition inside the loop, like this:

list_for_each_entry(iterator, &foo_list, list) {
    // ...
    if (do_something_with(iterator)) {
        do_something_else_with(iterator);
        break;
    }
    // ...
    if (iterator->list.next == &foo_list) {
        // this is the last entry; iteration finished
    }
}

The compiler should be smart enough to avoid checking the end condition twice in each iteration. Of course this becomes much less convenient if there is more than one break statement.

Moving the kernel to modern C

Posted Feb 24, 2022 17:46 UTC (Thu) by flussence (guest, #85566) [Link]

My thought is that it's silly to require compatibility with standards old enough that even software implementing their newer versions is falling out of long-term support. There are other cases where arguably there's a risk of causing a flag day, but moving off of C89 isn't one of them.

Moving the kernel to modern C

Posted Feb 24, 2022 18:47 UTC (Thu) by adobriyan (subscriber, #30858) [Link] (47 responses)

Yay!

Don't forget this one too:

# warn about C99 declaration after statement
KBUILD_CFLAGS += -Wdeclaration-after-statement

Moving the kernel to modern C

Posted Feb 24, 2022 19:37 UTC (Thu) by zuzzurro (subscriber, #61118) [Link]

If the plan is to make this move at the beginning of the next cycle, shouldn't the -next kernel adopt it right now?

Moving the kernel to modern C

Posted Feb 24, 2022 20:30 UTC (Thu) by marcH (subscriber, #57642) [Link] (45 responses)

Yes please, finally! Combined declarations and initializations like every other programming language. More 'const' and fewer "this variable 'may' be used uninitialized" guessing/silliness. No more reverse Christmas trees.

In even more advanced languages 'const' is the default but let's not get carried away; too much maths that could scare hardware engineers emotionally attached to their registers.

Moving the kernel to modern C

Posted Feb 25, 2022 9:29 UTC (Fri) by geert (subscriber, #98403) [Link] (6 responses)

> In even more advanced languages 'const' is the default

"const" is the default for /var/iables?!?

Moving the kernel to modern C

Posted Feb 25, 2022 9:37 UTC (Fri) by ncm (guest, #165) [Link]

Let us not confuse the name with the thing named.

Moving the kernel to modern C

Posted Feb 26, 2022 0:14 UTC (Sat) by camhusmj38 (subscriber, #99234) [Link] (1 responses)

Const is the default for named values. Mutability is opt in in Rust. C and C++ have const as the opt in.
Scala has different words for mutable and non-mutable values (var and val respectively.)

Constant v Immutable

Posted Feb 28, 2022 19:10 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

Rust distinguishes constants from variables which simply can't be mutated. By default you get a variable but it can't be mutated. You can declare constants instead with "const" or, if you annotate your variable with "mut" you allow the variable to be mutated subsequently.

let cannot_change = some::expression(with_variables_if_you, want);

const COMPILE_TIME: u32 = my::WayToGet::a_constant_value(PERHAPS_FROM_OTHER_CONSTANTS);

let mut count = 0; /* We will change this, presumably when counting stuff */

At compile time the constants must be, well, constant (over time the amount of labour the compiler is willing to undertake to determine what that constant *is* has increased, as it has in C++ to a much greater extent). The ordinary immutable value need not be known at compile time, yet it is immutable (of course, if it's inside the scope of a loop, it will be conjured into existence, perhaps with a new value, each time the loop runs).

This means Rust programs naturally do Kate Gregory's first step from maintaining a C++ codebase, marking everything immutable and then only marking as mutable the stuff that actually changes, so now the maintenance programmer has some idea what's actually going on.

If you're familiar with C, Rust's const is like a type-safe improvement on #define and Rust's default immutable variables are more like C's const. You may notice that the types were elided from my variable examples but not the constant, Rust insists on being explicitly told the type of constants but it will infer types for many variables from how they're defined or used.
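
A rough C rendering of that analogy, as a sketch only: MAX_SLOTS plays the role of a compile-time constant, and the const local is a run-time value that is never reassigned. All names here are hypothetical.

```c
#include <stddef.h>

/* Textual substitution: known at compile time, but carries no declared type. */
#define MAX_SLOTS 8u

/* Hypothetical helper: how many slots a buffer needs, capped at MAX_SLOTS. */
size_t slots_needed(size_t bytes, size_t slot_size)
{
    /* Computed at run time, read-only afterwards: closer to a Rust `let` without `mut`. */
    const size_t needed = (bytes + slot_size - 1) / slot_size;

    return needed < MAX_SLOTS ? needed : MAX_SLOTS;
}
```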

Moving the kernel to modern C

Posted Feb 28, 2022 3:36 UTC (Mon) by marcH (subscriber, #57642) [Link] (2 responses)

> > In even more advanced languages 'const' is the default
>
> "const" is the default for /var/iables?!?

Sorry, I should have used the standard name "non-modifiable lvalue"  /s

More seriously, you're highlighting a serious "const" problem in programming languages and culture, and especially in C. It's very confusing to call "constant" something in a local scope that does not change after initialization but that is _different_ every time the enclosing function is called. So yes, an "immutable variable" is the unfortunate and confusing name used to make that difference.

Rust does make a formal difference between 1) constant, 2) immutable and 3) mutable variables: https://doc.rust-lang.org/book/ch03-01-variables-and-muta...

Consider this example:

some_function()
{
... 
z = f(g(x1) + h(x2)) / (j(x3) - k(x4)) - l(x5) + ... ; 
...
}

There is a simple reason why most people don't write code like this and why they break it down into multiple steps: readability. Not just to avoid very long lines but to simply give a good NAME to carefully chosen checkpoints in the middle:

some_function()
{
  ... some code, including of course some statements and not just declarations ...

  const meaningful_name1 = f(g(x1) + h(x2));
  const meaningful_name2 = j(x3) - k(x4);
  etc.
  ...
}

Funnily enough, I've sometimes seen this lack of intermediate "variables" being abused by people new to functional languages ("look Ma, no variables!"). It's especially tempting when you have a ternary operator more readable than "cond ? A : B". I digress.

It's sad that many programming languages seem to care so little about the difference between read-only and read/write when mutability is in fact the most critical programming concept for both correctness (unintended side effects) and concurrency:
https://doc.rust-lang.org/book/ch16-00-concurrency.html

Every piece of documentation about concurrency, locking, RCU and whatnot uses the words READ and WRITE every other line. Yet C does not care and calls everything "a variable". Can you see a problem / gap here? C, the low-level language supposedly in charge of managing memory accessed concurrently by devices and multicores, got a formal memory model in... 2011! After Java, and I believe by basically borrowing the C++ one. RCU and locking experts aside, the vast majority of kernel developers underestimate or even ignore the ridiculousness of that C-tuation.

And of course the more read-only variables you have, the less likely you are to modify them by mistake. Can't hurt when coding in _the_ language of memory corruptions.

C has been influenced too much by the hardware engineering perspective where a variable is a memory location / register and not enough by the more "mathematical" view where a variable is just a name given to the result of some computation. Allowing declarations after statements is a baby step, but in the right direction. All grown-up languages have already taken this step.

Moving the kernel to modern C

Posted Feb 28, 2022 8:17 UTC (Mon) by geert (subscriber, #98403) [Link]

Thanks, I do like the mathematical view!
And allowing declarations after statements is a requirement for making intermediate results of non-trivial processing const.
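
To make that concrete, a small sketch with made-up names: the two intermediate results can only be const because they are declared after the loop that produces their input.

```c
/* Hypothetical example; the function and parameter names are illustrative. */
double average_rate(const double *samples, int count, double seconds)
{
    double total = 0.0;

    for (int i = 0; i < count; i++)
        total += samples[i];

    /* Declared after the statements above, so each is initialized once and stays const. */
    const double mean = count > 0 ? total / count : 0.0;
    const double rate = seconds > 0.0 ? mean / seconds : 0.0;

    return rate;
}
```

With C89 rules, mean and rate would have to be declared at the top of the block without a value, which rules out const.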

Moving the kernel to modern C

Posted Mar 1, 2022 0:09 UTC (Tue) by marcH (subscriber, #57642) [Link]

Forgot the classic reference: https://queue.acm.org/detail.cfm?id=3212479

C Is Not a Low-level Language
Your computer is not a fast PDP-11.
David Chisnall

> Caches are large, but their size isn't the only reason for their complexity. The cache coherency protocol is one of the hardest parts of a modern CPU to make both fast and correct. Most of the complexity involved comes from supporting a language in which data is expected to be both shared and mutable as a matter of course. Consider in contrast an Erlang-style abstract machine, where every object is either thread-local or immutable

Etc.

Moving the kernel to modern C

Posted Feb 25, 2022 9:39 UTC (Fri) by wtarreau (subscriber, #51152) [Link] (37 responses)

> Combined declarations and initializations like every other programming language.

Please no! That's the most horrible thing I hate in modern C.

Normally when reviewing code and looking for a variable, you just need to glance at the top of each enclosing block, right after its opening brace, and nothing more. With those insane declarations after statements, you work like in bash: you have to read *ALL* lines above where you are, hoping you didn't miss the right one. This serves absolutely no purpose; its only effect is to complicate code reviews and ease the introduction of new bugs.

Moving the kernel to modern C

Posted Feb 25, 2022 11:14 UTC (Fri) by Wol (subscriber, #4433) [Link] (14 responses)

You're confusing "use" and "initialise".

Don't allow mixing USE and declaration. But DO allow the *compiler* to set the initial value.

Cheers,
Wol

Moving the kernel to modern C

Posted Feb 25, 2022 15:13 UTC (Fri) by wtarreau (subscriber, #51152) [Link] (13 responses)

I'm not sure we're speaking about the same thing. I'm speaking about not making this monstrosity possible, where I'd say "good luck" for figuring the type of "i" depending on the line you're reading, and its bounds:

#include <stdio.h>
#include <unistd.h>

int blah(long x, int j)
{
long i = x ? x : -1;
int k = i;

for (int i = 1; i < j; i++) {
k += i * 2;
char i = (k & 1) ? 'O' : 'E';
int pid = getpid();
printf("i=%d j=%d pid=%d\n", i, j, pid);
}
return k;
}

PS: sorry for the formatting, I didn't find how to make a code block.

Moving the kernel to modern C

Posted Feb 25, 2022 16:16 UTC (Fri) by farnz (subscriber, #17727) [Link] (1 responses)

Making a code block on LWN needs two tags in HTML formatting: <pre> to indicate that formatting matters, and <tt> to indicate that you want monospaced fonts. Below is <pre><tt> followed by your code (with indentation added by my brain), followed by </tt></pre> - I've also had to escape special characters with HTML escapes (but it's a simple matter to write code to do this for you).


#include <stdio.h>
#include <unistd.h>

int blah(long x, int j)
{
    long i = x ? x : -1;
    int k = i;

    for (int i = 1; i < j; i++) {
        k += i * 2;
        char i = (k & 1) ? 'O' : 'E';
        int pid = getpid();
        printf("i=%d j=%d pid=%d\n", i, j, pid);
    }
    return k;
}

Moving the kernel to modern C

Posted Feb 28, 2022 10:15 UTC (Mon) by wtarreau (subscriber, #51152) [Link]

> I've also had to escape special characters with HTML escapes

Thanks. That was the thing that made me think I was heading the wrong direction and that possibly there was something simpler in order to just paste a piece of code.

Moving the kernel to modern C

Posted Feb 25, 2022 16:51 UTC (Fri) by Wol (subscriber, #4433) [Link] (1 responses)

I thought that was allowed in ancient C ...

Unless you mean actually declaring inside the "if" statement ... but I thought declaring after a { was permitted anywhere. I dunno, it's ages since I've programmed C in anger.

But it would be nice to say you can ONLY declare after a {, but that includes things like "int i = 1". You shouldn't be able to do things like "int i; i=1; int j; j=2;", though.

Cheers,
Wol

Moving the kernel to modern C

Posted Feb 25, 2022 18:44 UTC (Fri) by nybble41 (subscriber, #55106) [Link]

In C89 and GNU89 declarations must occur before statements within each block. C89 additionally requires the expressions in brace-enclosed initializer lists for aggregates to be compile-time constants. However, it's not as if this equivalent GNU89 code is any easier to follow:

#include <stdio.h>
#include <unistd.h>

int blah(long x, int j)
{
    long i = x ? x : -1;
    int k = i;
    {
        int i;
        for (i = 1; i < j; i++) {
            k += i * 2;
            {
                char i = (k & 1) ? 'O' : 'E';
                {
                    int pid = getpid();
                    printf("i=%d j=%d pid=%d\n", i, j, pid);
                }
            }
        }
    }
    return k;
}

The real lessons here are "use meaningful names" and "avoid shadowing".
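
As a sketch of what that looks like, here is the same arithmetic with one name per value and no shadowing, written C99-style. The names are my own invention, and I dropped the getpid() call to keep the example portable.

```c
#include <stdio.h>

/* Same arithmetic as blah() above; every value gets its own name. */
int blah_renamed(long start, int limit)
{
    const long seed = start ? start : -1;
    int total = (int)seed;

    for (int step = 1; step < limit; step++) {
        total += step * 2;

        /* Printed as a character here, where the original printed its code with %d. */
        const char parity = (total & 1) ? 'O' : 'E';
        printf("parity=%c limit=%d\n", parity, limit);
    }
    return total;
}
```

Each declaration still sits next to its first use, but there is only one candidate for each name at any line.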

Moving the kernel to modern C

Posted Feb 26, 2022 0:21 UTC (Sat) by camhusmj38 (subscriber, #99234) [Link]

This is called variable shadowing and is possible in C89 as well. It can be very ugly which is why it is discouraged although a good compiler should warn on shadowing.
A good practice in modern languages is to combine declaration and initialisation so that you reduce the chance of accessing an uninitialised value. It also encourages locality in the code which makes it easier to comprehend.

Moving the kernel to modern C

Posted Feb 28, 2022 11:48 UTC (Mon) by ianmcc (subscriber, #88379) [Link] (7 responses)

main.cpp:11:6: error: redeclaration of ‘char i’
   11 | char i = (k & 1) ? 'O' : 'E';
      |      ^
main.cpp:9:10: note: ‘int i’ previously declared here
    9 | for (int i = 1; i < j; i++) {
      |          ^

Moving the kernel to modern C

Posted Feb 28, 2022 13:56 UTC (Mon) by jem (subscriber, #24231) [Link] (6 responses)

You seem to have a faulty C compiler, or you didn't copy the code correctly. The curly bracket at the end of line 9 starts a new block, and it's perfectly legal to declare a new 'i' variable inside that block.

Moving the kernel to modern C

Posted Feb 28, 2022 17:06 UTC (Mon) by ianmcc (subscriber, #88379) [Link] (4 responses)

That might be valid C (although I don't know why, but it doesn't give any errors in an online C compiler). It isn't valid C++. The scope of the control variable declared in the for loop is the loop itself, so you can't declare another variable with the same name in the same scope.

for (int i = ..)
{
int i = 2; // not valid C++. There is already a variable 'i' declared in this scope
}

Moving the kernel to modern C

Posted Feb 28, 2022 20:10 UTC (Mon) by nybble41 (subscriber, #55106) [Link] (3 responses)

> The scope of the control variable declared in the for loop is the loop itself, so you can't declare another variable with the same name in the same scope.

What you say agrees with the C++ standard, but it makes me wonder why the standard authors appear to have been competing to come up with the most Byzantine special cases and exceptions they could think of to integrate into the standard rather than taking the simplest and least surprising route. Syntactically, the body of the for loop is a single statement which may be a compound statement. That part is the same as C. The braces are *not* part of the syntax for the loop. If the scope of the control variable were in fact the loop itself, and the body were treated the same as any other statement, then the compound statement would be an independent scope nested *within* that for-loop scope, and declarations within the compound statement would shadow any declarations scoped to the for loop (as they do in C). Instead the standard pierces the abstraction and treats compound statements in a for loop body differently than compound statements located elsewhere. There is no logic to this that I can see, just a bald statement that "If a name introduced in an init-statement or for-range-declaration is redeclared in the outermost block of the substatement, the program is ill-formed."

Moving the kernel to modern C

Posted Mar 1, 2022 16:27 UTC (Tue) by ianmcc (subscriber, #88379) [Link] (2 responses)

In C++ the declaration and the body of the loop are the same scope. In C, the initializer in the for loop establishes its own scope, so there are actually two scopes created with a C for loop. This was unintended behavior in C, and a defect report was raised about it, but it seems it wasn't seen as important enough. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2257.htm...

After all, who would write such code anyway? It's a strange thing to take issue with.

Moving the kernel to modern C

Posted Mar 1, 2022 21:58 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (1 responses)

> In C++ the declaration and the body of the loop are the same scope.

for (init-statement condition[opt] ; expression) statement

The body of the for loop is just *statement*. If the declaration were in that scope it wouldn't survive from one iteration of the loop to the next or be visible in *condition* or *expression*. Declarations in *init-statement* are scoped over the entire for loop, not just the body.

Normally a compound statement within *statement* would introduce its own separate block scope *below* the level of *statement*, but in C++ the lines are blurred between the body of the for loop and the *inside* of the compound statement. In other words, I would expect this to be a redeclaration error, because `char i` and `int i` are declared in the same scope (note that all the examples in the standard are of this form):

for (int i = 0; i < N; ++i)
    char i = 7;

but not this, because `char i` is declared in the new *nested* scope created by the compound statement and not directly in the body of the for loop:

for (int i = 0; i < N; ++i) {
    char i = 7;
}

Contrast this with the following code which the standard (C++20 draft) claims is "equivalent" to the second example "except that names declared in the init-statement are in the same declarative region as those declared in the condition, and except that a continue in statement (not enclosed in another iteration statement) will execute expression before re-evaluating condition":

{
    int i = 0;  /* init-statement */
    while (i < N  /* condition */) {
        { char i = 7; }  // statement
        ++i;
    }
}

In the "equivalent" while loop version there is clearly no redeclaration error—the `char i` declaration is within not just one but two levels of compound statements under the while loop and the scope where `int i` was declared.

> After all, who would write such code anyway? Its a strange thing to take issue with.

Whether you would write that by hand or not, it's an unnecessary (and IMHO completely pointless) complication which moreover breaks compatibility with C. Redeclaration conflicts could appear as a result of macro expansion or other code generation, not just in hand-written code.

Moving the kernel to modern C

Posted Mar 2, 2022 8:56 UTC (Wed) by ianmcc (subscriber, #88379) [Link]

You've got the history the wrong way around. The behaviour of C++ here hasn't changed since it was first standardized in 1998. At that time, C didn't allow a declaration in a for statement. C99 borrowed the wording from the Annotated C++ Reference Manual, without realizing that the wording had been updated during the C++ standardization process. So C introduced an incompatibility with C++, not the other way around. The C standards committee documents are very clear that this was accidental, not intentional.

The bottom line is that C++ will flag an error in some instances of very dubious code that is most likely a bug anyway (i.e. declaring a variable that shadows the loop control variable) where C99 would allow it. Neither standards committee sees it as worth the bother of fixing. If you really did intend to introduce a shadow declaration, the simple fix is to enclose it in another compound statement.
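
A minimal sketch of both forms, with hypothetical function names. The first compiles as C (a compiler may warn with -Wshadow) but is rejected as C++; the second uses the extra compound statement and should be accepted by both.

```c
/* Legal in C99 and later: the body block is nested inside the for statement's scope. */
int sum_shadowed(int n)
{
    int sum = 0;

    for (int i = 0; i < n; ++i) {   /* ++i and i < n refer to the int counter */
        char i = 7;                 /* shadows the counter inside the body only */
        sum += i;
    }
    return sum;
}

/* The extra braces put the shadowing declaration one level deeper. */
int sum_nested(int n)
{
    int sum = 0;

    for (int i = 0; i < n; ++i) {
        {
            char i = 7;
            sum += i;
        }
    }
    return sum;
}
```

In both functions the loop still runs n times, since the condition and the increment sit outside the body and see only the int counter.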

Moving the kernel to modern C

Posted Feb 28, 2022 17:14 UTC (Mon) by ianmcc (subscriber, #88379) [Link]

See also http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1865.htm

Moving the kernel to modern C

Posted Feb 25, 2022 11:28 UTC (Fri) by jem (subscriber, #24231) [Link] (4 responses)

You can add this to the list of things you hate about Rust, too. In Rust you can even do this:

fn main() {
    let i = 0;
    println!("{}", i);    // Prints 0.
    
    let i = i+1;
    println!("{}", i);    // Prints 1.
}

Note that the i variables are immutable ("const") and there are two of them. The second let introduces a new variable which is initialized with the value i+1, where i refers to the first variable.

Moving the kernel to modern C

Posted Feb 28, 2022 8:09 UTC (Mon) by geert (subscriber, #98403) [Link] (3 responses)

So is this really any safer in the end?
I still cannot rely on the constant i being constant over the full range of the block, as anyone can have inserted one or more redefinitions in the middle of the block.

Moving the kernel to modern C

Posted Feb 28, 2022 9:38 UTC (Mon) by marcH (subscriber, #57642) [Link] (1 responses)

I believe every compiler for pretty much every language can detect shadowing and warn about it. C included.

Moving the kernel to modern C

Posted Feb 28, 2022 10:09 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

Go is a notable exception.

Moving the kernel to modern C

Posted Feb 28, 2022 11:26 UTC (Mon) by taladar (subscriber, #68407) [Link]

It is quite idiomatic in Rust to check a Result<T, E> value and reuse the same name for the content of type T after checking.

The alternative isn't really any better, coming up with extra names for what is logically the same value.

Moving the kernel to modern C

Posted Feb 25, 2022 19:30 UTC (Fri) by ballombe (subscriber, #9523) [Link] (14 responses)

> Please no! That's the most horrible thing I hate in modern C.

I am glad to see I am not alone.

Usually, when it is used, it is a sign the function is too large and should be split, which would resolve the scoping issue.
Of course since C17 still does not support gnu89 nested functions, sometimes splitting the function requires passing an inordinate number of parameters.

Moving the kernel to modern C

Posted Feb 25, 2022 19:53 UTC (Fri) by marcH (subscriber, #57642) [Link] (9 responses)

> > Please no! That's the most horrible thing I hate in modern C.

You meant: in _any_ vaguely modern language. What rock have you been living under?

> Usually, when it is used, it is a sign the function is too large and should be split, which would resolve the scoping issue.

Exactly. If you can't find a variable declaration then the function is simply too long.

Of course with https://en.wikipedia.org/wiki/Type_inference (1958) you don't even need declarations _at all_ but again, let's not get carried away and scare ancient species...

Moving the kernel to modern C

Posted Feb 25, 2022 20:53 UTC (Fri) by mpr22 (subscriber, #60784) [Link] (8 responses)

If I can't find the variable declaration, I'm using generic or weakly language-aware editing tools instead of strongly language-aware editing tools.

Moving the kernel to modern C

Posted Feb 26, 2022 14:07 UTC (Sat) by Wol (subscriber, #4433) [Link] (7 responses)

Are you telling me strongly language-aware editing tools can find non-existing declarations?

Okay, I find the lack of declarations ON OCCASION a complete pain, but don't blame the tool if the language doesn't require declarations. There's plenty of languages like that out there ...

(Which is why I tend to use Option Explicit in VB, or -DCLVAR in FORTRAN, etc etc.)

Cheers,
Wol

Moving the kernel to modern C

Posted Feb 26, 2022 14:55 UTC (Sat) by mpr22 (subscriber, #60784) [Link]

> There's plenty of languages like that out there ...

There are, but in the small handful that I use (or have used in the past), I have never written a program where losing track of where I first initialized a variable would be a serious concern.

Moving the kernel to modern C

Posted Feb 26, 2022 15:19 UTC (Sat) by nix (subscriber, #2304) [Link]

> Are you telling me strongly language-aware editing tools can find non-existing declarations?

Well, no, but they can tell you what types the compiler has inferred for those variables (assuming the program is currently syntactically valid enough to do so, which in my experience is usually true if you're doing maintenance rather than writing a new function: if you're writing a new function, I hope you already know what types the variables you're using in your new code are. :) )

Moving the kernel to modern C

Posted Feb 26, 2022 22:10 UTC (Sat) by camhusmj38 (subscriber, #99234) [Link] (4 responses)

Type Inference is not the same as no declaration. Type Inference just means types don’t have to be explicitly stated where the compiler can infer it from the expression used to initialise the variable.

Moving the kernel to modern C

Posted Feb 27, 2022 16:04 UTC (Sun) by marcH (subscriber, #57642) [Link] (3 responses)

What would be the purpose of a declaration without an explicit type?

How would a typeless declaration help the people in this thread who complain about not finding declarations? If not the type, what else are they looking for?

Moving the kernel to modern C

Posted Feb 27, 2022 18:55 UTC (Sun) by camhusmj38 (subscriber, #99234) [Link] (1 responses)

The declaration has a type - it’s inferred from the initialiser. It’s really only appropriate with local variables.

Moving the kernel to modern C

Posted Feb 27, 2022 20:49 UTC (Sun) by marcH (subscriber, #57642) [Link]

I see nothing factually wrong in your last two comments above but I really don't understand why they're posted as replies to mine. I'm afraid you're misunderstanding my points and questions.

Moving the kernel to modern C

Posted Feb 27, 2022 19:06 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

In Rust, it can move the lifetime of a variable around. So if I have some code that needs to live for the outer scope, but is only "known" in some nested place, I can use `let somename;` on a block in the right place, initialize it where the data is available, and continue on. I've used this before to keep a string alive long enough for a context where it may have been either allocated (and therefore need a place to hang its destructor) or came from some other string that lived longer than the function anyways.

Moving the kernel to modern C

Posted Feb 25, 2022 20:31 UTC (Fri) by abatters (✭ supporter ✭, #6932) [Link] (3 responses)

> gnu89 nested functions

Which require your entire program to have an executable stack.

Moving the kernel to modern C

Posted Feb 25, 2022 20:55 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (2 responses)

They only do if they are used as function pointers.

Moving the kernel to modern C

Posted Feb 25, 2022 22:12 UTC (Fri) by ncm (guest, #165) [Link] (1 responses)

Of course in C++ and Rust you have function literals. In C++,

  auto f = [](auto a, auto b) { return a + b; };
  assert(f(3, 4) == 7);
  assert(f(3.25, 3.75) == 7.0);
  assert(f("3"s, "4"s) == "34");

Rust lambdas might be less versatile.

Moving the kernel to modern C

Posted Feb 26, 2022 0:18 UTC (Sat) by camhusmj38 (subscriber, #99234) [Link]

No, Rust Lambdas generally work the same as C++ lambdas - subject to the usual Rust rules about lifetime.

Moving the kernel to modern C

Posted Feb 25, 2022 23:25 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

As a vim user, I just use the asterisk and hash keys to find the declaration. Not sure what your editor supports, but if it can't do that, you might want to consider switching to a different editor (not necessarily vim, of course, whatever works for you).

Moving the kernel to modern C

Posted Feb 26, 2022 10:40 UTC (Sat) by niner (subscriber, #26151) [Link]

I have fixed far too many bugs caused by people having to restructure perfectly readable code to appease the C89 gods. Some of those bugs I had introduced myself. On the other hand I have yet to come across a single bug in our code that was clearly caused by any confusion supposedly allowed by having variables declared where they are actually going to be used. Indeed the latter can make things even clearer, because it allows for more usage of const as in:
{
    foo();
    void * const x = must_be_called_after_foo();
    ...
}

Moving the kernel to modern C

Posted Feb 25, 2022 4:31 UTC (Fri) by pabs (subscriber, #43278) [Link] (1 responses)

I wonder what the implications of this are for Bootstrappable Builds. I guess they will just need a longer bridge of GCC versions before getting to the step of building Linux.

https://bootstrappable.org/

Moving the kernel to modern C

Posted Feb 25, 2022 10:49 UTC (Fri) by georgm (subscriber, #19574) [Link]

As written in the article, the minimum supported gcc version (5.1) already supports C11, so there shouldn't be any change here.

Moving the kernel to modern C

Posted Feb 26, 2022 0:26 UTC (Sat) by camhusmj38 (subscriber, #99234) [Link] (53 responses)

It always amuses me how C programmers / Linux Kernel types disavow C++ and then proceed to recreate C++ features using a mixture of macros and a hope that everyone is on their best behaviour. I seem to recall that this was why GCC moved to using C++ because it has features that help when working with data structures (RAII, methods etc.)
No one is saying you have to use RTTI or exceptions although templates won’t be the worst idea in some of these cases if used properly.

Moving the kernel to modern C

Posted Feb 26, 2022 21:56 UTC (Sat) by marcH (subscriber, #57642) [Link] (52 responses)

> then proceed to recreate C++ features using a mixture of macros

You have a very good point but you also have the exact reason why this keeps happening:

> No one is saying you have to use RTTI or exceptions ...

The problem is: how do you tell developers _not_ to use RTTI, exceptions or any other of the gazillion features from the C++ kitchen sink that you don't want them to use? Most people don't read coding standards until they're told to (too late) and maintainers don't scale.

The _only_ way to enforce something efficiently and consistently is to have a fully automated tool (compiler, linter, static analyzer,...) that catches it. Anyone who has ever used CI knows that.

C++ fans who keep telling everyone "modern C++ can do that too" keep missing the point. The question is absolutely not about what C++ can do; we all know the answer is "everything" - which is exactly the problem. The C++ question is "What are the compiler/linter checks to STOP people from doing X, Y and Z?" Cause most projects can't afford the maintainer bandwidth to enforce coding standards manually and even maintainers make mistakes, especially when overwhelmed.

This is the difference between theoretical computer science and real world engineering with severe labor shortages, project management, deadlines etc. Maybe even some QA sometimes.

These C++ checks will unfortunately never be good enough because:
- Everyone has a slightly different definition of what is modern and/or safe C++ or about what they want to forbid.
- C++ fans are fans of C++ because it lets you do whatever you want
- The language wasn't designed for most of these checks in the first place, so they can be at best heuristics with many false positives. Just look at existing static analyzers.

So the only such check is to forbid C++ entirely and to wait for C to keep stealing very specific stuff from C++.

Or Rust.

PS: I had given up any hope of inexperienced people producing anything decent in shell scripting and then someone mentioned shellcheck in some obscure comment on LWN. Thank you!

Moving the kernel to modern C

Posted Feb 26, 2022 22:05 UTC (Sat) by camhusmj38 (subscriber, #99234) [Link] (47 responses)

Many projects use the no RTTI / no exceptions dialect of C++ - every major compiler provides switches to enable that. I believe GCC itself is built in that way, many major games and financial applications are built that way. It’s simple to set up a build system to do that.
Apple’s driver kit uses C++ for kernel code and Microsoft also has a subset of C++ they approve for writing Kernel Mode code. It’s not beyond the wit of the Linux Kernel developers to do that or to set simple standards for code they will accept in mainline. All we are talking about is using RAII and data encapsulation to enable safer more reliable coding. As a comment above says, the aspects of Rust that are being promoted to the Kernel can all be enabled in C++ as a start.

Moving the kernel to modern C

Posted Feb 27, 2022 0:02 UTC (Sun) by Wol (subscriber, #4433) [Link] (6 responses)

Both Apple and Microsoft have C++ compilers they control and pay the developers to do as they are told ...

Cheers,
Wol

Moving the kernel to modern C

Posted Feb 27, 2022 7:57 UTC (Sun) by camhusmj38 (subscriber, #99234) [Link] (5 responses)

That is not a technical argument. The fact is it’s perfectly possible to write C++ code that behaves well in Kernel mode and is safer and more reliable than emulating the same features in C using macros etc.

Moving the kernel to modern C

Posted Feb 28, 2022 11:00 UTC (Mon) by farnz (subscriber, #17727) [Link] (4 responses)

It is a technical argument; both Apple and Microsoft are able to enforce a subset of C++ that their kernel developers are happy with, and have CI setups that prohibit the use of features that do not behave well in kernel mode. On top of that, when new features appear (C++17, C++20, C++2x), Apple and Microsoft change the one acceptable kernel compiler to not support those features when compiling for the kernel.

One of the technical "superpowers" of compiled languages is that the compiler does not accept programs that will misbehave at runtime. We accept that, as a side effect of this, there are some programs that will behave correctly at runtime that the compiler also does not accept.

If we have to rely on humans to spot use of features that are "bad" at runtime, we're putting mental effort onto reviewers, whose time is a scarce resource in kernel land. Thus, it's better for the kernel to use a compiler that puts the mental effort onto code authors (a C compiler) rather than one that simplifies coding, but puts much more mental strain on reviewers.

A Linux kernel C++ dialect could be created, but it would need buy-in from GCC and Clang to enforce that dialect, so that reviewer time is not spent on something the machine can enforce.

Moving the kernel to modern C

Posted Feb 28, 2022 13:04 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> It is a technical argument; both Apple and Microsoft are able to enforce a subset of C++ that their kernel developers are happy with, and have CI setups that prohibit the use of features that do not behave well in kernel mode.

I wrote quite a bit of code in Windows kernel space, and most of the Windows kernel is written in pure C, with a few notable exceptions like the bad old GDI code.

Apple's kernel code is also mostly C, and you can download it and check yourself: https://github.com/apple/darwin-xnu

Moving the kernel to modern C

Posted Feb 28, 2022 15:09 UTC (Mon) by camhusmj38 (subscriber, #99234) [Link]

The situation has changed. The Windows Implementation Library includes C++ RAII helpers which are used in the operating system and drivers. They take care of closing handles etc. MSVC also has a kernel mode C++ switch which disables RTTI and Exceptions as well as floating point.
Apple's driver kit has always been C++.

Moving the kernel to modern C

Posted Feb 28, 2022 14:59 UTC (Mon) by adobriyan (subscriber, #30858) [Link] (1 responses)

checkpatch.pl can take care of operator overloading if you're too scared.

Hey, even grep can do it.

Moving the kernel to modern C

Posted Feb 28, 2022 19:13 UTC (Mon) by farnz (subscriber, #17727) [Link]

I'm not scared of C++ at all. But I understand why the kernel developers might be, and I respect their position.

Looking at checkpatch.pl, it doesn't even know about operator overloading, or any other C++ feature - and the problem is not limited to operator overloading, but to any other C++ feature the kernel developers don't want to review.

Moving the kernel to modern C

Posted Feb 27, 2022 0:14 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (39 responses)

It's not just RTTI. Also multiple inheritance, operator overloading, fallible constructors (Google-style 'Init' methods), etc.

Moving the kernel to modern C

Posted Feb 27, 2022 7:50 UTC (Sun) by adobriyan (subscriber, #30858) [Link]

> operator overloading

std::span<T>::operator[] would be nice in fact.

__le32 + __le32 would be nice too, maybe.
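
As a rough illustration of what such an addition has to spell out in plain C, here is a minimal userspace sketch. The `le32` struct and the `le32_to_host()`, `host_to_le32()` and `le32_add()` names are hypothetical stand-ins invented for this example; they are not the kernel's `__le32` type or its conversion helpers.

```c
#include <stdint.h>

/* Hypothetical stand-in for a little-endian 32-bit value; not the kernel's __le32. */
typedef struct { uint8_t b[4]; } le32;

/* Decode little-endian bytes into a host-order integer. */
uint32_t le32_to_host(le32 x)
{
	return (uint32_t)x.b[0] | ((uint32_t)x.b[1] << 8) |
	       ((uint32_t)x.b[2] << 16) | ((uint32_t)x.b[3] << 24);
}

/* Encode a host-order integer as little-endian bytes. */
le32 host_to_le32(uint32_t v)
{
	le32 x = { { (uint8_t)v, (uint8_t)(v >> 8),
		     (uint8_t)(v >> 16), (uint8_t)(v >> 24) } };
	return x;
}

/* The steps an overloaded '+' would hide: convert, add, convert back. */
le32 le32_add(le32 a, le32 b)
{
	return host_to_le32(le32_to_host(a) + le32_to_host(b));
}
```

Because the value is wrapped in a struct, writing `a + b` on two `le32` values is a compile error in C, so the conversion cannot be forgotten; the price is the explicit `le32_add(a, b)` call.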

Moving the kernel to modern C

Posted Feb 27, 2022 8:04 UTC (Sun) by camhusmj38 (subscriber, #99234) [Link] (37 responses)

Fear of operator overloading is not based on anything more than FUD - there is a reason why Rust and other languages of recent origin permit the customisation of operations for user-defined types. It can make them feel part of the language, makes programming more regular and encourages the use of abstractions. std::array’s overloaded operator [] makes it behave like a built-in array; overloading operator < for a user-defined type makes it possible to build generic algorithms for sorting etc.
As for multiple inheritance - depends on what you use it for. Mixins and interfaces are quite nice - but yes complex inheritance hierarchies esp. with virtual inheritance probably don’t have a space in kernel mode.

Moving the kernel to modern C

Posted Feb 27, 2022 14:50 UTC (Sun) by deater (subscriber, #11746) [Link] (35 responses)

> Fear of operator overloading is not based on anything more than FUD

There are real reasons to avoid operator overloading.

When doing system or embedded work it can be disastrous (for performance, and possibly also for correctness) if what appears to be a standard operator that should map to a handful of assembly operations instead turns out to be a complex function with unknown side effects.

This can also be a problem when reviewing a patch, and it can be unclear from the small amount of context whether an operation is overloaded or not.

I suppose then the argument is that you should only view code in an IDE environment that requires 4GB of RAM and a full GUI interface and be sure to hover over each operation and offload all your thinking to a program written by someone else. Sometimes newer/flashier ways of writing code aren't necessarily better.

Moving the kernel to modern C

Posted Feb 28, 2022 9:33 UTC (Mon) by wtarreau (subscriber, #51152) [Link] (34 responses)

> This can also be a problem when reviewing a patch, and it can be unclear from the small amount of context whether an operation is overloaded or not.

That's exactly why I despise operator overloading. You cannot trust anymore what you're reading.
I would just be fine with defining *new* operators using other unused symbols, though. At least
when you read them you figure you're on something special that requires certain care.

Moving the kernel to modern C

Posted Mar 1, 2022 23:32 UTC (Tue) by dvdeug (guest, #10998) [Link] (33 responses)

> That's exactly why I despise operator overloading. You cannot trust anymore what you're reading.

It's certainly an extreme reaction for a relatively minor thing. In an object-oriented language, if your function gets passed an object x of type t and t can be inherited from, x.method can do anything, no matter what the code for t says.

Even in C, a + b can do quite a few different things. Off the top of my head, I can't tell you what happens if a is a signed int with a negative value and b is an unsigned int. I do recall that integer overflow is undefined behavior, which is all sorts of fun. If you know the underlying types and the type programmer was sane, extending + to new types shouldn't make things much more complex.
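
For the mixed-sign case, a minimal sketch in C follows. The function names are made up for illustration. Under the usual arithmetic conversions, when `int` and `unsigned int` meet in `a + b`, the `int` operand is converted to `unsigned int` first, so the arithmetic is done modulo `UINT_MAX + 1`. Signed overflow, by contrast, is undefined behavior.

```c
#include <limits.h>

/*
 * Adds an int to an unsigned int. The int operand is converted to
 * unsigned int before the addition, so a negative value first wraps
 * to a large positive one and the result type is unsigned int.
 */
unsigned int add_mixed(int a, unsigned int b)
{
	return a + b;
}

/* Returns 1 when the mixed-sign sum compares greater than zero, else 0. */
int mixed_sum_is_positive(int a, unsigned int b)
{
	return (a + b) > 0;
}
```

With these definitions, `add_mixed(-2, 1u)` is `UINT_MAX` rather than -1, and `mixed_sum_is_positive(-2, 1u)` is 1, which is rarely what someone reading `a + b` at a glance would expect.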

Moving the kernel to modern C

Posted Mar 2, 2022 18:19 UTC (Wed) by marcH (subscriber, #57642) [Link] (31 responses)

> > That's exactly why I despise operator overloading. You cannot trust anymore what you're reading.

> It's certainly an extreme reaction for a relatively minor thing. In an object-oriented language, if your function gets passed an object x of type t and t can be inherited from, x.method can do anything, no matter what the code for t says.

Not knowing what code will run is considered a "relatively minor thing" only by fans of inheritance and object-oriented languages. Even in C, where OO is super-explicit, tracking what .ops will run is one of the most time-consuming things.

"Object-oriented programming is an exceptionally bad idea which could only have originated in California." :-)

And of course the classic http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html

> The King, consulting with the Sun God on the matter, has at times threatened to banish entirely all Verbs from the Kingdom of Java. If this should ever to come to pass, the inhabitants would surely need at least one Verb to do all the chores, and the King, who possesses a rather cruel sense of humor, has indicated that his choice would be most assuredly be "execute".
> The Verb "execute", and its synonymous cousins "run", "start", "go", "justDoIt", "makeItSo", and the like, can perform the work of any other Verb by replacing it with an appropriate Executioner and a call to execute(). Need to wait? Waiter.execute(). Brush your teeth? ToothBrusher(myTeeth).go(). Take out the garbage? TrashDisposalPlanExecutor.doIt(). No Verb is safe; all can be replaced by a Noun on the run.

Moving the kernel to modern C

Posted Mar 2, 2022 18:55 UTC (Wed) by marcH (subscriber, #57642) [Link]

> Not knowing what code will run is considered a "relatively minor thing" only by fans of inheritance and object-oriented languages. Even in C, where OO is super-explicit, tracking what .ops will run is one of the most time-consuming things.

Of course it's not a problem at all for pure user space code that can run entirely in a graphical IDE with an excellent debugger. But good luck when troubleshooting some intermittent hardware bug or cache misconfiguration using "printk" over a slow serial link.

Moving the kernel to modern C

Posted Mar 3, 2022 2:11 UTC (Thu) by dvdeug (guest, #10998) [Link] (29 responses)

> Not knowing what code will run is considered a "relatively minor thing" only by fans of inheritance and object-oriented languages.

Every step in computer history has reduced the programmer's ability to know what code will run. As Mel, the Real Programmer, said when refusing to use the optimizing assembler: “You never know where it's going to put things, so you'd have to use separate constants”. A C program that runs on *nix? The number of different compilers, processors, kernels and C libraries is amazing; the number of different combinations in practice runs at least into the thousands. If you're using a library that you're not bundling, that's another source of change, and even if you are, have you read that library?

(And as the IOCCC has shown over the years, #define is an awesome tool for code doing what you think it should do, not what the naive C programmer thinks it's doing.)

a+b should add two items. If it's not obvious what it's doing, then that's a bug in the library. You can go with function names like cos, cosf, cosl, ccos, ccosf, ccosl (standard C adds type information to both sides of the function name, for all the mockery *nix programmers gave Hungarian notation), but I'd really rather not.

> good luck when troubleshooting some intermittent hardware bug or cache misconfiguration using "printk" over a slow serial link.

I started to appreciate OO more when I read the argument that it succeeded because of GUIs, and I started to have to write some of those. If you've got a screen box that takes a bunch of GUI elements, I don't know of any better way than making them all inherit from Widget, especially if you want to be able to add arbitrary widgets from arbitrary libraries. Java's "everything is a class" is an annoying box, but when OO is useful, it's really useful.

Moving the kernel to modern C

Posted Mar 3, 2022 3:22 UTC (Thu) by marcH (subscriber, #57642) [Link] (26 responses)

> Every step in computer history has reduced the programmer's ability to know what code will run

You're mixing up operator overloading with totally unrelated things. Yes of course we have no idea what machine code runs, what micro code underneath that and of course with code-reuse (at last...) we don't know what code runs in libraries. But operator overloading and inheritance are very different: with them you don't know _which source tab in your own editor you should look at_! In other words, you get lost in _your own code_.

That's a huge difference. There is a very simple solution for all the rest: only use compiler and library versions that everyone else uses too. Not only does it reduce the number of bugs massively, but when you hit one you can just google it. Not knowing which _source code_ runs in your own project is a totally different problem.

> a+b should add two items. If it's not obvious what it's doing, then that's a bug in the library.

I agree there's only so much damage that can be done by overloading '+'. But once again the problem is the same: where is the compiler / linter flag that limits operator overloading to only "sensible" use cases? Where do you even draw that line?

> Java's "everything is a class" is an annoying box, but when OO is useful, it's really useful.

To a hammer, everything looks like a nail..

Moving the kernel to modern C

Posted Mar 3, 2022 19:05 UTC (Thu) by dvdeug (guest, #10998) [Link] (9 responses)

> But operator overloading and inheritance are very different: with them you don't know _which source tab in your own editor you should look at_! In other words, you get lost in _your own code_.

It's easy to know where a + b comes from; if you know that a is of type Quaternion, then look in Quaternion."+" or whatever the specific notation is. (If you were running Java, that would be in quaternion.java; in C, of course, even if you know the function name, it could be anywhere, so if you're getting lost in your own code, you might want to look into Java or at least Java-like file naming conventions.) If you don't know what type a is, you have bigger problems than operator overloading.

Inheritance is more complex, yes. Ultimately, if it's your own code, you should know what type is getting passed in and where to look for that code. Whether you're using inheritance or switches, you're going to need to know what the value is to see what code is being run, and if you know what that value is, you can find the code.

> But once again the problem is the same: where is the compiler / linter flag that limits operator overloading to only "sensible" use cases?

You write in a language that has all code go through a general-purpose macro processor before the compiler, one that has long been shown to do evil, evil stuff (the Bourne shell being a notorious example of horrors that can be done by the well-intentioned.) You trust the programmer to be sensible, or (arguably) you run a more locked down language like Java.

> To a hammer, everything looks like a nail..

In response to the statement that sometimes a Phillips head screwdriver is useful? If you don't believe that OO is useful for GUIs, point to alternatives and explain why OO languages are so popular for GUIs. KDE was done in C++ back before libstdc++ existed.

Moving the kernel to modern C

Posted Mar 3, 2022 22:19 UTC (Thu) by Wol (subscriber, #4433) [Link] (3 responses)

> In response to the statement that sometimes a Phillips head screwdriver is useful? If you don't believe that OO is useful for GUIs, point to alternatives and explain why OO languages are so popular for GUIs. KDE was done in C++ back before libstdc++ existed.

Except that this article is not about GUIs. Maybe it should be - why do we keep on having to buy newer, faster computers just to stop the GUIs slowing everything down? Do we really want to let that bloat into the kernel - pretty much bringing our super-fast new CPUs to a grinding halt ...

Cheers,
Wol

Moving the kernel to modern C

Posted Mar 3, 2022 22:49 UTC (Thu) by dvdeug (guest, #10998) [Link] (2 responses)

This subthread started with a generalized bashing of OO languages, which is why a mention of where OO languages are useful came up.

I'd be interested to see good measurements of why modern GUIs are so much slower; I'm curious if once you compensate for the increase in screen size and the increase in data sizes, if they really are all that much slower.

Moving the kernel to modern C

Posted Mar 3, 2022 23:54 UTC (Thu) by Wol (subscriber, #4433) [Link] (1 responses)

I thought it started with a suggestion that OO languages (C++) be used in the kernel.

In which case, OO-bashing is justified. OO *IS* useful, but in the kernel when a few extra bytes of object code can cost you dearly in page faults etc, those faults are pretty fatal for the concept. THAT was the purpose of the bashing ...

Cheers,
Wol

Moving the kernel to modern C

Posted Mar 4, 2022 9:42 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

The kernel embraced the size overheads of OO back in the 1990s, and C++'s OO model is just as much "pay for what you use" as the one the kernel uses.

If a class has no virtual members, it doesn't need a function table and doesn't have function-pointer dispatch overheads.

If it has some virtual members, only the members that are virtual have to be identified in the function table and called through function pointers.

Moving the kernel to modern C

Posted Mar 4, 2022 6:27 UTC (Fri) by marcH (subscriber, #57642) [Link] (4 responses)

> > But operator overloading and inheritance are very different: with them you don't know _which source tab in your own editor you should look at_! In other words, you get lost in _your own code_.

> Ultimately, if it's your own code, you should know what type is getting passed in and where to look for that code.

I should not have written _your own code_, I meant: you get lost _in the code base that you are working on_ as opposed to some external library, compiler or other abstraction you don't care about.

Most developers spend most of their time reading and debugging code they did not write themselves. Which is _exactly_ why kernel maintainers don't want C++.

> If you don't believe that OO is useful for GUIs

I was just rephrasing you.

Moving the kernel to modern C

Posted Mar 4, 2022 14:57 UTC (Fri) by dvdeug (guest, #10998) [Link] (3 responses)

> I should not have written _your own code_, I meant: you get lost _in the code base that you are working on_

And it still stands; if it's the code base that you are working on, you should know what type is getting passed in and where to look for that code. Assuming you're not using Ruby, the type of the variable should be named nearby in the code.

And again, "I can't find which source tab to look at" is weird coming from someone bashing Java. In a.b(...), the code is found in file named after the type of a. In C, the definition of the function may be found in one of the files you #included in this code (or not; you can always declare external functions directly), but that header file name may have no relation to the file where the code is written. Of course, with the handy -D option to the C compiler, who knows what the function is actually named.

Moving the kernel to modern C

Posted Mar 4, 2022 17:28 UTC (Fri) by marcH (subscriber, #57642) [Link] (2 responses)

> Assuming you're not using Ruby, the type of the variable should be named nearby in the code. [...] In a.b(...), the code is found in file named after the type of a.

??

The type of the variable does not provide vtable resolution. That's done only at run-time. Same problem for explicit .ops function pointers in C. You cannot just "grep" through vtables / .ops because they're resolved at run-time. That's the whole OO proposition.
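
A minimal sketch of that run-time resolution in C, using a hypothetical operations table loosely in the style of kernel `.ops` structures. All type and function names here are invented for the example.

```c
/* Hypothetical operations table; names are illustrative only. */
struct shape;

struct shape_ops {
	int (*area)(const struct shape *s);
};

struct shape {
	const struct shape_ops *ops;	/* chosen when the object is set up */
	int w, h;
};

int rect_area(const struct shape *s) { return s->w * s->h; }
int tri_area(const struct shape *s)  { return s->w * s->h / 2; }

const struct shape_ops rect_ops = { .area = rect_area };
const struct shape_ops tri_ops  = { .area = tri_area };

/*
 * The call site names neither rect_area nor tri_area: which one runs
 * depends on the pointer stored in s->ops at run time.
 */
int shape_area(const struct shape *s)
{
	return s->ops->area(s);
}
```

Searching for `->area(` finds the call site and searching for `.area =` finds the candidate implementations, but nothing in the source text of `shape_area()` says which candidate a given call will reach.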

> Assuming you're not using Ruby, the type of the variable should be named nearby in the code.

Afraid you lost me there.

> In C, the definition of the function may be found in one of the files you #included in this code (or not; you can always declare external functions directly), but that header file name may have no relation to the file where the code is written.

Code indexers can do a surprisingly good job most of the time. For the rest there is always git grep. It works as long as nothing is resolved at run-time.

> Of course, with the handy -D option to the C compiler, who knows what the function is actually named.

I've seen many -D and I've never seen one redefining function names. As I already wrote above, no one likes the pre-processor, and its abuse is very easily and routinely caught at code review. The entire problem with C++ is "where do you draw the line?". That's really not a problem with the pre-processor: it's always very obvious whether you use it or not and I've seen "no pre-processor here" review comments countless times.

Moving the kernel to modern C

Posted Mar 4, 2022 20:22 UTC (Fri) by marcH (subscriber, #57642) [Link]

BTW a great trick to unravel (evil) cpp macros is to deliberately insert a compilation error. gcc then shows the entire stack of macros involved. Much easier than cc -E or something.

Same trick with build systems and many other situations with "too many layers of indirections": deliberately injecting errors is often a great shortcut.

(none of that available at run-time of course)

Moving the kernel to modern C

Posted Mar 5, 2022 2:42 UTC (Sat) by dvdeug (guest, #10998) [Link]

> The entire problem with C++ is "where do you draw the line?".

That's a problem with C++, not operator overloading or object orientation.

> That's really not a problem with the pre-processor: it's always very obvious whether you use it or not and I've seen "no pre-processor here" review comments countless times.

It is quite similar to operator overloading; you can see where the macro or operator is defined, but not necessarily where it is used.

Moving the kernel to modern C

Posted Mar 3, 2022 22:20 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (12 responses)

"Operator overloading" is no different than any other overloading. Operators are just functions with symbols for names rather than alphanumeric strings. If you don't have an issue with `add(a, b)` where `add` can be overloaded then there isn't much rational basis for objecting to `a + b` where `+` can be overloaded.

> But once again the problem is the same: where is the compiler / linter flag that limits operator overloading to only "sensible" use cases? Where do you even draw that line?

Sensible languages (such as Haskell) only allow overloading through the implementation of an interface (typeclass), which narrows the problem down: if you want to override `+` in Haskell then you need to provide an instance of `Num` for that data type, which implies that you need to implement the other numerical operations like `*`, `negate`, `fromInteger`, etc. which make up the minimal complete definition. If you try to define `+` at global scope outside of a `Num` instance without explicitly suppressing `Prelude.+` you'll get an error due to the conflicting names.

Typeclasses, in turn, generally come with "laws" which specify how their members should relate to each other, in addition to declaring types for each member which instances must share. (For historical reasons `Num` does not have any official typeclass laws itself, but there are certain basic expectations regarding associativity, commutativity, distributivity, and other properties[0] which amount to informal laws.) The laws, unlike the types, are not automatically enforced, but unlawful instances are strongly discouraged and it is usually simple to write property-based tests to verify that they are met. The key is that the instances of an interface are related by more than just a common name. Implementing a typeclass serves as a declaration of intent that the instance will follow the typeclass laws and generally behave as expected for an instance of that typeclass.

It helps that Haskell allows almost any sequence of symbols as an operator name, so there is no pressure to abuse the left-shift operator for stream output, for example. You can also use any function name as an infix operator by putting it in backquotes; you can even define precedence and fixity for named functions used with infix syntax, just as you can for symbolic operators.

Rust uses essentially the same model, with traits in place of typeclasses, though the set of numerical traits in Rust is a bit more nuanced than Haskell's catch-all `Num` typeclass and the syntax unfortunately doesn't allow for custom operators.

[0] https://hackage.haskell.org/package/base-4.16.0.0/docs/GH...

Moving the kernel to modern C

Posted Mar 4, 2022 8:12 UTC (Fri) by wtarreau (subscriber, #51152) [Link] (5 responses)

> If you don't have an issue with `add(a, b)` where `add` can be overloaded then there isn't much rational basis for objecting to `a + b` where `+` can be overloaded.

No, that's precisely the opposite. If you don't have an issue with "add(a, b)", then please by all means use that and leave the operators to their original meaning so that the vast majority of the characters in the code you are reading do what you were always taught they do. Nobody imagines that when you're doing your errands and see "3 + 1 offered", this "+" means "perform a database access and do some special operation to return a different value" nor "concatenate them and say 31". No, you imagine an addition, with all the simplicity that comes with it, but within some technical constraints imposed by computers (e.g. domain limitations causing wrapping, saturation or overflows). When I read "a + b", I hear "a plus b" and nothing else. I'm not hearing "a, the first argument of a function using a symbol looking like the plus I know, and a second argument b, now let's check if such a function exists otherwise I'll assume it's in fact a regular plus".

It is important to be able to read code as naturally as possible. It's a matter of efficiency and reliability. And most exploited security flaws in software are found by careful code review and could be spotted by their developers if the code was not constantly cheating on them doing nasty tricks that do not resemble what it seems to do. That's the same reason why many developers prefer to use upper case for macros. It's a signal that you should look it up and that it might evaluate your arguments more than once, for example.
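
The double-evaluation hazard can be shown with a minimal sketch. `MAX`, `next_value()` and `count_evaluations()` are names chosen for this example only.

```c
/* Classic function-like macro: an argument may be evaluated more than once. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

int calls;	/* counts how often next_value() has run */

/* A function with a side effect: each call bumps the counter. */
int next_value(void)
{
	return ++calls;
}

/* Returns how many times next_value() ran while evaluating one MAX(). */
int count_evaluations(void)
{
	calls = 0;
	/* Expands to ((next_value()) > (0) ? (next_value()) : (0)). */
	(void)MAX(next_value(), 0);
	return calls;
}
```

Here `count_evaluations()` returns 2: the argument is evaluated once for the comparison and once more to produce the result. If `MAX` were an ordinary function, it would return 1.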

Moving the kernel to modern C

Posted Mar 4, 2022 9:38 UTC (Fri) by mpr22 (subscriber, #60784) [Link] (1 responses)

I do have an issue with add(a, b) anywhere that isn't the definition of a type's implementation of the Arithmetic interface, though.

When I'm trying to understand the calculations a piece of code is doing on data entities for which arithmetic is a "natural" concept, but which are not Sacred Primitive Types of the implementation language, wading through the resulting vast piles of "x = add_foo(a, multiply_foo(b, c))" etc is the kind of chore that degrades my attention span.
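
For comparison, a minimal sketch of that call-chain style in C. The `vec2` type and its helper functions are hypothetical and exist only for this example.

```c
/* Hypothetical two-component integer vector; names are illustrative only. */
struct vec2 { int x, y; };

/* Component-wise sum of two vectors. */
struct vec2 vec2_add(struct vec2 a, struct vec2 b)
{
	struct vec2 r = { a.x + b.x, a.y + b.y };
	return r;
}

/* Multiply both components of v by the scalar k. */
struct vec2 vec2_scale(struct vec2 v, int k)
{
	struct vec2 r = { v.x * k, v.y * k };
	return r;
}

/* Computes a + b * k, spelled out as nested function calls. */
struct vec2 vec2_axpy(struct vec2 a, struct vec2 b, int k)
{
	return vec2_add(a, vec2_scale(b, k));
}
```

In a language with operator overloading the body of `vec2_axpy()` could read `a + b * k`. Whether the nested calls are clearer or noisier is exactly the point under dispute here.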

Moving the kernel to modern C

Posted Mar 4, 2022 13:04 UTC (Fri) by wtarreau (subscriber, #51152) [Link]

It's also more difficult to read for me, but way less than if I cannot trust any of the most elementary operators anymore.

A good example is the MMX/SSE/AVX* APIs, with all those complicated functions that map 1-to-1 to the underlying instructions. They are particularly hard to read, but it would be even worse if operators were abused to perform some of them. And there are still far fewer operators than possible functions anyway, so being a bit more explicit doesn't hurt.

Moving the kernel to modern C

Posted Mar 4, 2022 15:59 UTC (Fri) by dvdeug (guest, #10998) [Link] (1 responses)

> Nobody imagines that when you're doing your errands and see "3 + 1 offered", this "+" means "perform a database access

In all the programming languages under discussion, 3 + 1 returns 4. In mathematics, a + b is an arbitrary function, almost always associative and commutative; 3 + 1 can equal 0, in Z mod 4. I've certainly seen + used as a concatenation operator in real life, like in rebuses.

> if the code was not constantly cheating on them doing nasty tricks that do not resemble what it seems to do. That's the same reason why many developers prefer to use upper case for macros. It's a signal that you should look it up and that it might evaluate your arguments more than once, for example.

Non-hygienic macros are insane. Had you said

> That's exactly why I despise [C's macro system]. You cannot trust anymore what you're reading.

I wouldn't have disagreed. So why the difference? Why can developers be trusted with macros and not operator overloading?

Moving the kernel to modern C

Posted Mar 4, 2022 17:02 UTC (Fri) by marcH (subscriber, #57642) [Link]

> > That's the same reason why many developers prefer to use upper case for macros. It's a signal that you should look it up and that it might evaluate your arguments more than once, for example.

> Why can developers be trusted with macros and not operator overloading?

They can't with either. No one likes the pre-processor. It's always seen as a necessary evil to work around C's limitations. Macros are subject to especially high review scrutiny and new ones can be introduced only if they solve a very generic problem and only when they are used all across the board (which ensures a lot of test coverage).

Moving the kernel to modern C

Posted Mar 4, 2022 17:06 UTC (Fri) by marcH (subscriber, #57642) [Link]

> And most exploited security flaws in software are found by careful code review and could be spotted by their developers if the code was not constantly cheating on them doing nasty tricks that do not resemble what it seems to do.

I wonder where you got that from. Sure, there's Spectre and the like, but the security flaws I have looked at (admittedly not that many) were all "mundane" out-of-bounds accesses, uninitialized use, use after free, integer overflows, etc. All the usual and mundane memory corruption features of C which, according to Microsoft and Google, account for 70% of all security issues in C/C++ projects (I don't know what the other 30% are).

Moving the kernel to modern C

Posted Mar 4, 2022 14:37 UTC (Fri) by foom (subscriber, #14868) [Link] (5 responses)

> "Operator overloading" is no different than any other overloading.

It's different in a really key way in most languages (not Haskell): operators have a more convenient infix syntax, but there's a limited number of them which you can use, and that causes people to prefer to overload them for inappropriate operations.

Let's pretend C++ had no infix operators whatsoever (overloaded or not). So instead of `1 << 2` you'd say `1.left_shift(2)`. When creating an output stream type nobody would _even consider_ having programmers spell "write to output" as `cout.left_shift("hello world")`. That's just ridiculous on its face! Of course you'd use a more appropriate name.

Yet, because of the limited operator vocabulary that is available to work with, there's a great temptation to use operator overloads for these nonsensical operations. An absolutely irresistible temptation, I'd say. And that's why operator overloading is unfortunate in practice -- even though it shouldn't be in theory.

For example: C++20 just repeated this mistake, introducing an overload of bitwise_or `|` for (effectively) function composition in the new ranges library. With the weak rationale that `|` means pipe in Unix shell which is kinda the same thing so it "makes sense". But it doesn't. That's not what the operator | is supposed to mean in C++.

Moving the kernel to modern C

Posted Mar 4, 2022 16:07 UTC (Fri) by dvdeug (guest, #10998) [Link] (4 responses)

> operators have a more convenient infix syntax, but there's a limited number of them which you can use, and that causes people to prefer to overload them for inappropriate operations.

Good point in general. Scala mitigates that by letting methods with one argument be used infix (in version 2, in Scala 3 you have to declare it infix), and allowing pretty much arbitrary operator/method names (for better and worse, :^$*+ is a valid method name that can be used infix in Scala.)

I'd point out your C++ examples are self-inflicted; unlike programmers, the C++ designers could have added operators to the system.

Moving the kernel to modern C

Posted Mar 4, 2022 16:44 UTC (Fri) by atnot (subscriber, #124910) [Link]

> I'd point out your C++ examples are self-inflicted; unlike programmers, the C++ designers could have added operators to the system.

This is a really long tradition in C++ land. For example, afaict the only reason C++ attempts to use operators for streams in the first place is because the language itself was not rich enough to be able to express generic, type-safe string formatting well. It just so happened that the existing function overloading and a convenient left-associative operator let you cobble together a hacky workaround which then became standard.

Meanwhile nearly every other language decided to go and do what needed to be done to enable type safe string formatting instead. Including C++, which accidentally eventually enabled it to be written anyway, giving rise to {fmt} and then std::format.

Then they went and did the same thing again with iterators, sfinae, variant/visit/optional, etc.

I think the moral of iostreams is less that operator overloading is a bad idea, but that people will do absolutely horrible things if a language is not interested in finding ways to adequately address their needs.

Moving the kernel to modern C

Posted Mar 4, 2022 21:29 UTC (Fri) by foom (subscriber, #14868) [Link]

> I'd point out your C++ examples are self-inflicted

Indeed, the C++ standard library designers often (but not always) act as if they have no ability to influence the core language design. And they may be correct, to some degree -- the language and library changes are done by different working groups within the standards committee, so there's going to be a greater organizational friction to get your change in, if you need to modify both.

Moving the kernel to modern C

Posted Mar 5, 2022 17:13 UTC (Sat) by wtarreau (subscriber, #51152) [Link] (1 responses)

> I'd point out your C++ examples are self-inflicted; unlike programmers, the C++ designers could have added operators to the system.

That was my point. Having a set of "free to use" operators that are never defined by default and are always type-specific would be perfect, because they're sufficient to ring a bell when you read that code. But as the previous commenter said, using explicit names instead of left-shifting strings still remains much better. After all, in some languages (BASIC for example) we were not shocked by reading "OR", "AND" or "XOR" as operators between two numbers. I'd be fine with a "CAT" operator to concatenate two strings, which would remove the ambiguity that you have in certain languages like JS, where a=b+1 is ambiguous when b="1" because you don't know if you'll get string "11" or integer 2.

Moving the kernel to modern C

Posted Mar 5, 2022 17:45 UTC (Sat) by dvdeug (guest, #10998) [Link]

> After all in some languages (BASIC for example) we were not shocked by reading "OR", "AND" or "XOR" as operators between two numbers. ... remove the ambiguity that you have in certain languages like JS where a=b+1 is ambiguous when b="1" where you don't know if you'll get string "11" or integer 2.

Does OR do a bitwise OR or a boolean OR? I don't see any saving in ambiguity there. Likewise for the left shift; for all the fuss over it, in practice I've never seen it be the least bit ambiguous.

As for JavaScript, as I said, I was thinking of statically typed languages, and if you're programming JavaScript, you should know the answer to that. (It's "11".) But it is a little more confusing than it would be in other languages, since "1" / 1 implicitly converts "1" to a number, which is problematic, since lossy conversions should generally be avoided, especially when you're shoving a round peg into a square hole. The problem is not with operator overloading, so much as it's with lossy conversions. "String" / 1 should be caught at compile time, not run time.

Moving the kernel to modern C

Posted Mar 4, 2022 7:42 UTC (Fri) by wtarreau (subscriber, #51152) [Link] (2 responses)

> But operator overloading and inheritance are very different: with them you don't know _which source tab in your own editor you should look at_! In other words, you get lost in _your own code_.

Exactly. More than a decade ago, one of our projects died after the Ruby developer who was leaving admitted he was totally unable to tell us which assignments or operations were causing database accesses! Because doing a=b+c could *possibly* perform requests in a database to fetch some values or store the results. It became so deep that the code was a living being of its own that no one could tame anymore, and fixing bugs became totally impossible. Definitely a very bad idea. The only thing it provides is ease of *writing* code, but we must never forget that we write once and read it many times.

Moving the kernel to modern C

Posted Mar 4, 2022 14:30 UTC (Fri) by ianmcc (subscriber, #88379) [Link]

How is that fixed by spelling it a=add(b,c) ?

Moving the kernel to modern C

Posted Mar 4, 2022 14:36 UTC (Fri) by dvdeug (guest, #10998) [Link]

= or + doing database requests is insane, but so is naming the function f() or add() or shelia(). I was not aware that any dynamically typed language had operator overloading; in a statically typed language, the types of a, b and c would have made automatic renaming possible. Clarity of naming is important, and I'd argue that there's a conservation of information; the more ambiguous the types, the more explicit the function naming needs to be to compensate.

> The only thing it provides is ease of *writing* code

One of the recent Scheme additions was writing (add (times a c) b) as {a * c + b}, because it actually is easier to read code like that, and

"hi " + name.toString + " and good morning"

is also easier to read than

"hi ".append(name.toString).append(" and good morning")

Moving the kernel to modern C

Posted Mar 3, 2022 8:50 UTC (Thu) by Wol (subscriber, #4433) [Link] (1 responses)

> (And as the IOCCC has shown over the years, #define is an awesome tool for code doing what you think it should do, not what the naive C programmer thinks it's doing.)

What you are *completely* missing, is that most code like you describe does NOT do what it NEEDS to do.

For the application programmer (real or not), throwing hardware at the problem is an acceptable solution.

For the systems programmer, you can't just go out and get a couple of extra MEG of L1 cache. You can't go out and get a couple of extra cache lines.

I date from the years when tomorrow's weather forecast took 20hrs to run. "Who cares if the program takes twice as long?" - that guy would have been fired!

I did a load of work (copying files around, etc) a while back on my computer. Okay, I run a protection-heavy disk stack, but I think my PC spent the next 20 minutes flushing cache. I don't give a monkey's, it was otherwise idle and responsive, but there are PLENTY of people out there who HAVE to care.

If you can optimise your code FOR SPEED without having a clue what's going on under the bonnet, you're a better man than I am, Gunga Din ...

Cheers,
Wol

Moving the kernel to modern C

Posted Mar 3, 2022 23:00 UTC (Thu) by dvdeug (guest, #10998) [Link]

What does any of this have to do with how slow a program is? Lots of fast programs are unreadable; I think it was Code Complete that talked about writing a DES decrypter in C that was easy to read, and speeding it up one hundred times by turning it into an unreadable pile of assembly. The Linux kernel uses a lot of those nightmare defines to get maximal inline speed at the cost of clarity and ease of finding the code running. Operator overloading has zero effect on speed.

> I date from the years when tomorrow's weather forecast took 20hrs to run.

Weather forecasting always wants faster computers to do more detailed and accurate forecasts. The National Hurricane Center mentioned for their hurricane forecasts that they post every four hours, they run various simulations, and one of them takes six hours to run, meaning it's never as up to date as the other simulations.

Moving the kernel to modern C

Posted Mar 3, 2022 0:07 UTC (Thu) by HenrikH (subscriber, #31152) [Link]

Agreed that the implicit promotion and conversion rules in C are a bit complex (and actually it would be nice if there were a tool one could use to highlight an expression and have the tool explain exactly what happens in this regard), but they are at least specified in the standard.

For your example at hand, "a + b" with "int a" and "unsigned int b" where a has a negative value: since both have the same rank (int) but different types (signed vs. unsigned), the signed integer is implicitly converted to an unsigned integer. Overflow is undefined only for the signed case, while unsigned arithmetic is defined to wrap around, which is why most "a + b" code works as intended.

That is, if a is -1 and b is 2, then -1 is "converted" to 0xFFFFFFFF (with a 32-bit int), 2 is added, and the result wraps around to 1. In practice the compiler simply moves both into registers and issues the x86 ADD instruction, since converting signed to unsigned of the same size is a no-op and binary addition in particular is signedness-agnostic.
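A minimal sketch of the conversion described above, for anyone who wants to try it. The function names are made up for illustration, and the 0xFFFFFFFF value assumes a 32-bit int; the final result of 1 does not depend on that assumption.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: mixes a signed and an unsigned operand.
 * The usual arithmetic conversions turn a into unsigned int
 * (its value modulo UINT_MAX + 1) before the addition happens. */
unsigned int add_mixed(int a, unsigned int b)
{
    return a + b;   /* unsigned addition: wraps, never undefined */
}

/* The same computation with the implicit conversion spelled out. */
unsigned int add_mixed_explicit(int a, unsigned int b)
{
    return (unsigned int)a + b;
}
```

Calling add_mixed(-1, 2u) first turns -1 into UINT_MAX, then adds 2 and wraps around to 1, the same value the explicit version returns.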

Rust operator overloading

Posted Feb 28, 2022 2:41 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

In fact Rust isn't really overloading at all, what's happening in Rust is that types get to implement some langitem Traits whose functions are invoked by operators. Without those trait implementations, you can't use these "overloadable" operators on a user-defined type at all. For example if my type Apple doesn't implement PartialEq you just can't write thisApple == thatApple, it won't compile. There was no "overloading" in the strict sense.

And where C++ is very profligate with these overloads, Rust is conservative, for example recently there was work to let you do more natural things with Wrapping types, so e.g. today you write

use std::num::Wrapping;
let a = Wrapping(127i8);
let b = Wrapping(1i8);
let c = a + b;
/* now c is Wrapping<i8>(-128) */

The authors originally thought it would be nice if this worked:

let a = Wrapping(127i8);
let b: i8 = 1;
let c = a + b;
/* authors thought it's reasonable c is Wrapping<i8>(-128) but is that definitely what was expected?*/

After feedback to their patches they agreed this could be surprising and instead only implemented *Assign Traits, thus:

let mut a = Wrapping(127i8);
let b: i8 = 1;
a += b;
/* The type of a hasn't changed, so it is now Wrapping<i8>(-128) and that makes sense */

In Rust for Linux they deliberately didn't implement most of these "operator" traits for String and similar types which need implied allocation, because Linus doesn't like implied allocation.

Two asides: whilst C++ can't do anything about it now, I'm convinced that subtyping is a bad idea and Rust's choice to provide inheritance only for Traits (i.e. you can inherit behaviour but not state) was the Right Thing there too. Multiple inheritance the way C++ does it is unambiguously a bad idea.

And, std::array is an example of C++ refusing to just fix the language. The built-in array in C 89 sucks. C++ inherits that, so the built-in array in C++ 98 sucks. And instead in modern C++ std::array is offered as a substitute, the built-in array syntax is just roped off as dangerous trash not to be used. Rust took the right stance here, in Rust 1.0 the arrays are not good, they're better than in C 89 (and thus modern C++) but they're far worse than Rust's library container types, unrelated examples in documentation ended up using a Vec because it works. Unlike C++ the built-in array wasn't left to die, and as of Rust 2021 the array type is pretty nice, it can do all the things you'd expect a container type to do, and only has a few small warts left to fix.

Moving the kernel to modern C

Posted Feb 27, 2022 18:53 UTC (Sun) by rgmoore (✭ supporter ✭, #75) [Link] (3 responses)

> C++ fans are fans of C++ because it lets you do whatever you want

Which is more or less the explanation Linus used for why he doesn't want to move to C++. It isn't about the technical side of the language; it's about excluding people who don't have the discipline to code the way he thinks the kernel should be coded.

Moving the kernel to modern C

Posted Feb 27, 2022 19:05 UTC (Sun) by camhusmj38 (subscriber, #99234) [Link] (2 responses)

C lets you do whatever you want. C++ is more restrictive than C. As I’ve said, the decisions made in the 1990s made sense then, but they don’t anymore. People who write the most performant, high-security code write it in C++ these days. There are no points in this game for being chads programming the way of the ancestors; on the contrary, this can be harmful. This is particularly the case where people are reinventing language features using text substitution macros and a set of instructions to be carefully followed. Using the correct subset of C++ for Kernel mode is also a discipline - just as writing the correct type of C code is a discipline (using VLAs can overflow your stack for example, so don’t do it) - one that can be enforced with analysis tools and compiler switches.
It’s OK to say that you don’t want to do it because you don’t know how or are scared to learn new things. But it’s also OK for people to say maybe sheer obstinacy is not a reason to not consider an idea. The cult of personality in this area is not helpful.

Moving the kernel to modern C

Posted Feb 27, 2022 21:03 UTC (Sun) by marcH (subscriber, #57642) [Link]

> Using the correct subset of C++ for Kernel mode is also a discipline - just as writing the correct type of C code is a discipline (using VLAs can overflow your stack for example, so don’t do it) -

No, you're missing the point exactly like all other C++ fans. -Wvla is not a "discipline", it's a compiler flag. A discipline requires some mental effort whereas adding -Wvla to a configuration file requires zero human effort. The computer does all the work.
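To make the flag concrete, here is a rough sketch of the kind of code -Wvla reports, next to a fixed-size alternative. The function names and the SCRATCH_MAX bound are invented for this example and are not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SCRATCH_MAX 64  /* hypothetical compile-time bound */

/* Uses a VLA: stack usage depends on the run-time value of n.
 * Building with -Wvla makes the compiler warn on the declaration of tmp. */
size_t reverse_vla(const char *src, char *dst, size_t n)
{
    if (n == 0)
        return 0;       /* a zero-length VLA would be undefined behaviour */
    char tmp[n];        /* variable-length array: -Wvla warns here */
    for (size_t i = 0; i < n; i++)
        tmp[i] = src[n - 1 - i];
    memcpy(dst, tmp, n);
    return n;
}

/* Fixed-size alternative: the stack cost is known at compile time and
 * oversized requests are rejected instead of growing the stack. */
size_t reverse_bounded(const char *src, char *dst, size_t n)
{
    char tmp[SCRATCH_MAX];
    if (n > SCRATCH_MAX)
        return 0;
    for (size_t i = 0; i < n; i++)
        tmp[i] = src[n - 1 - i];
    memcpy(dst, tmp, n);
    return n;
}
```

Both functions produce the same output for small inputs; only the first one lets the caller decide how much stack is consumed, which is the part the compiler can flag without any reviewer effort.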

> one that can be enforced with analysis tools and compiler switches.

Great news: please share now the set of analysis tools and compiler switches that enforce the exact C++ subset kernel maintainers are willing to accept without increasing their code review workload that is already stretched past the limit. For at least all recent gcc and clang versions.

Good luck getting all kernel maintainers to agree on that C++ subset in the first place! So that was a trick question, sorry.

Moving the kernel to modern C

Posted Feb 28, 2022 8:55 UTC (Mon) by ballombe (subscriber, #9523) [Link]

VLA is a C99 extension.


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds