Moving the kernel to modern C

Posted Feb 24, 2022 17:36 UTC (Thu) by iabervon (subscriber, #722)
Parent article: Moving the kernel to modern C

Instead of leaking a not-necessarily-valid pointer, couldn't the macro set it to NULL at the end? Actually, I'm surprised there isn't a standard trick for doing an assignment that will be an error unless the compiler eliminates it as dead code.

Moving the kernel to modern C

Posted Feb 24, 2022 17:59 UTC (Thu) by Paf (subscriber, #91811) [Link] (28 responses)

Cost. Pretty significant cost in some cases for something that shouldn’t even be necessary.

Moving the kernel to modern C

Posted Feb 24, 2022 20:09 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (5 responses)

It should be cost-free in any case where the iteration variable isn't accessed after the loop, since the compiler would eliminate the dead store. The code change is also fairly trivial: just edit the condition from "&pos->member != (head)" to "(&pos->member != (head)) || ((pos = NULL))".

Unfortunately this alone doesn't handle loops which exit early due to "break" or "goto". The "goto" case is unavoidable, but the "break" case can be dealt with by wrapping the macro in a second, trivial loop as shown in this example[0]. Note that the generated code (for gcc 5.1 with -O2) is *identical* between the version with the extra loop (traverse1) and the original version which does not set the iterator to NULL after the loop (traverse2). The initialization of the iterator to the flag state (-1), the condition for the outer loop, and the store of NULL to the iterator after the loop are all successfully eliminated.

[0] https://godbolt.org/z/4obYManzc
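
For readers without the godbolt link handy, here is a minimal userspace sketch of the idea. It is not the code behind the link: the macro name list_for_each_entry_nulled, the struct item type, and the demo function are all made up for illustration, and __typeof__ is a GNU extension. The outer one-shot loop uses (void *)-1 as the "not started" flag, so the iterator ends up NULL after either a natural exit or a "break".

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct item { struct list_head link; int value; };

/*
 * Hypothetical macro (not the kernel's). The outer loop runs exactly once:
 * it starts with the flag value (void *)-1, and its increment step stores
 * NULL into the iterator whether the inner loop ended normally or by break.
 */
#define list_for_each_entry_nulled(pos, head, member)                         \
    for ((pos) = (void *)-1; (pos) == (void *)-1; (pos) = NULL)               \
        for ((pos) = container_of((head)->next, __typeof__(*(pos)), member);  \
             &(pos)->member != (head);                                        \
             (pos) = container_of((pos)->member.next, __typeof__(*(pos)), member))

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Returns 1 if the iterator is NULL after both kinds of loop exit. */
static int demo_nulled_iterator(void)
{
    struct list_head head = { &head, &head };
    struct item a = { { NULL, NULL }, 1 }, b = { { NULL, NULL }, 2 };
    struct item *it;
    int seen = 0;

    list_add_tail(&a.link, &head);
    list_add_tail(&b.link, &head);

    list_for_each_entry_nulled(it, &head, link) {
        seen += it->value;
        if (it->value == 1)
            break;                  /* early exit: only the inner loop ends */
    }
    if (it != NULL || seen != 1)
        return 0;

    seen = 0;
    list_for_each_entry_nulled(it, &head, link)
        seen += it->value;          /* natural exit */
    return it == NULL && seen == 3;
}
```

As noted above, a "goto" out of the body still skips the outer loop's increment step, so the iterator keeps its value in that case.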

Moving the kernel to modern C

Posted Feb 24, 2022 20:48 UTC (Thu) by iabervon (subscriber, #722) [Link] (3 responses)

It might work to have:

extern unsigned long list_iterator_live_after_loop;

and "|| ((pos = (void *) list_iterator_live_after_loop), 0)"

I didn't try changing the kernel macro that way, but my little test code doesn't link if the iterator is used after the loop, but does link and work if it's not used. As I recall, the kernel is already using that sort of trick to use compiler optimization to remove an error message only if the compiler can disprove it.
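
A rough sketch of what that could look like in userspace follows. The macro name list_for_each_entry_checked, the struct item type, and the ENFORCE_LINK_CHECK switch are assumptions made for this example. By default the symbol is defined so the sketch links at any optimization level; building with -DENFORCE_LINK_CHECK removes the definition, and then the link is only expected to succeed when the optimizer eliminates the load (which it can only do if the iterator is not used after the loop).

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct item { struct list_head link; int value; };

extern unsigned long list_iterator_live_after_loop;
#ifndef ENFORCE_LINK_CHECK
/* Defined only so that this sketch links without optimization.
 * The trick itself relies on this definition NOT existing. */
unsigned long list_iterator_live_after_loop;
#endif

/* Hypothetical macro: on normal termination the iterator is overwritten
 * with a value loaded from the (ideally undefined) symbol. */
#define list_for_each_entry_checked(pos, head, member)                       \
    for ((pos) = container_of((head)->next, __typeof__(*(pos)), member);     \
         (&(pos)->member != (head)) ||                                       \
             (((pos) = (void *)list_iterator_live_after_loop), 0);           \
         (pos) = container_of((pos)->member.next, __typeof__(*(pos)), member))

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* The iterator is dead after the loop, so the store (and the load feeding
 * it) can be removed by the optimizer. */
static int demo_checked_sum(void)
{
    struct list_head head = { &head, &head };
    struct item a = { { NULL, NULL }, 1 }, b = { { NULL, NULL }, 2 },
                c = { { NULL, NULL }, 3 };
    struct item *it;
    int sum = 0;

    list_add_tail(&a.link, &head);
    list_add_tail(&b.link, &head);
    list_add_tail(&c.link, &head);

    list_for_each_entry_checked(it, &head, link)
        sum += it->value;
    return sum;
}
```

Like the version with NULL, this only covers normal termination; a "break" skips the assignment in the loop condition.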

Moving the kernel to modern C

Posted Feb 25, 2022 22:20 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (2 responses)

Unfortunately, this is probably UB: https://en.cppreference.com/w/c/language/extern

> The entire program may have zero or one external definition of every identifier with external linkage.
>
> If an identifier with external linkage is used in any expression other than a non-VLA (since C99) sizeof or _Alignof (since C11), there must be one and only one external definition for that identifier somewhere in the entire program.

There's no exception for short-circuit operators. If you use it at compile time, for anything other than sizeof, then it has to exist (have storage allocated somewhere).
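
To illustrate the sizeof exception in the quoted rule, here is a small sketch; the identifier never_defined_anywhere and the function name are made up for the example. The object is declared but never defined, and the program is still fine because sizeof does not evaluate its operand.

```c
#include <stddef.h>

/* Declared here, deliberately defined nowhere in the program. */
extern unsigned long never_defined_anywhere;

/* Allowed: the operand of sizeof is not evaluated, so the rule does not
 * require a definition for this use. */
static size_t size_of_undefined(void)
{
    return sizeof never_defined_anywhere;
}

/* By contrast, an expression that reads the object, for example
 *     return never_defined_anywhere;
 * is a use that requires exactly one definition somewhere in the program,
 * even if the optimizer would later discard the read. */
```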

Moving the kernel to modern C

Posted Feb 26, 2022 0:56 UTC (Sat) by khim (subscriber, #9252) [Link] (1 responses)

It's UB according to the standard. But Linus is very rarely concerned with that: he tends to accept such stupidity only when there is no way to convince the compiler to stop breaking sane (from a Linux developer's POV!) code.

It's one of the reasons why GCC is the only supported compiler, BTW.

And GCC not only supports that feature, it even provides the __attribute__((__error__(msg))) extension to make the error messages more explicit. And glibc uses it to define its __errordecl macro.

Moving the kernel to modern C

Posted Feb 26, 2022 2:34 UTC (Sat) by foom (subscriber, #14868) [Link]

And this sort of bad idea is why the kernel is only possible to compile with optimizations enabled.

It would be a lot better if Linux used C++ constexpr functions and templates for compile-time evaluation semantics, instead of abusing the optimizer to very poorly emulate them.

Moving the kernel to modern C

Posted Feb 25, 2022 1:19 UTC (Fri) by ianloic (subscriber, #54050) [Link]

It's kind of fascinating how small the cost is, even when using the pointer afterwards: https://godbolt.org/z/cbv4fqan3

Moving the kernel to modern C

Posted Feb 24, 2022 20:26 UTC (Thu) by iabervon (subscriber, #722) [Link] (21 responses)

Oh, I meant to imply that the compiler would eliminate all of those writes except for ones that expose bugs, but then I got side-tracked by wondering if you could make the kernel not even link unless the compiler eliminated the write. Anyway, it wouldn't affect the generated code unless the compiler can't tell the code is correct.

Moving the kernel to modern C

Posted Feb 24, 2022 22:12 UTC (Thu) by Paf (subscriber, #91811) [Link] (20 responses)

Very good point.

God I’d sure love to get to a newer C standard though…

Moving the kernel to modern C

Posted Feb 25, 2022 8:54 UTC (Fri) by ncm (guest, #165) [Link] (19 responses)

The smarter move would be to start compiling the kernel with a C++ compiler. A weenier step would be to accept source files with a ".cc" suffix and build those with a C++ compiler.

There would, in any case, be no need to step outside GCC, where in fact that was done long ago, with no disruption, but with massive benefits. Anybody spooked about C++ should understand that GCC and Clang are both coded in C++, whatever the language you compile with them.

Similarly, anybody spooked by C++ "hidden code" should understand that Rust does literally all of the things they are spooked by; and all of its power comes from that.

Staying on ancient EOL'd language Standards does nobody any good.

Moving the kernel to modern C

Posted Feb 25, 2022 9:02 UTC (Fri) by mpr22 (subscriber, #60784) [Link] (1 responses)

I suspect the people most likely to be spooked (for whatever value of "spooked") by any "hidden code" aspect of C++ are equally likely to be similarly spooked by the similar aspects of Rust.

Moving the kernel to modern C

Posted Feb 25, 2022 9:33 UTC (Fri) by ncm (guest, #165) [Link]

If they understood programming better, they would be more spooked by their C compiler failing to emit such code where, without, the code they wrote is buggy.

Moving the kernel to modern C

Posted Feb 25, 2022 9:30 UTC (Fri) by Wol (subscriber, #4433) [Link] (12 responses)

A weenier step accepting .cc files? The problem with that is "all of C++" is a security / close-to-the-metal nightmare, and the definition of what is the acceptable subset varies with who you talk to.

A further problem is the size / speed of the code. Yes C++ is *mostly* pretty good, but I suspect the compiler devs will barf on that word "mostly".

To what extent does kernel C currently drop out of C into assembler, and to what extent will C++ make that worse?

No I don't actually know the answers, I'm just predicting the devs' reactions.

Cheers,
Wol

Moving the kernel to modern C

Posted Feb 25, 2022 21:24 UTC (Fri) by ncm (guest, #165) [Link] (10 responses)

You demonstrate you know neither the answers, nor the questions, and are simply making things up as you go along. Just stop.

Moving the kernel to modern C

Posted Feb 26, 2022 13:46 UTC (Sat) by Paf (subscriber, #91811) [Link] (9 responses)

Well, it’s nice to know you’ve solved the problems and all of the kernel devs opposed are just fools who cannot see the light. I’m no expert in this particular area but I suspect there’s a *small* chance they’re not just all idiots.

Moving the kernel to modern C

Posted Feb 26, 2022 22:30 UTC (Sat) by camhusmj38 (subscriber, #99234) [Link] (4 responses)

I would not say they are fools. But they are somewhat hidebound - viz the use of email as their primary collaboration system. They are also as capable as the rest of us of having prejudices. C++ in the 1990s was not a good choice for a Kernel - the programming paradigms popular then, the quality of compilers then and the language features then were not there. We were also a lot less aware of the potential for security holes that C code was subject to without a careful eye. Things have changed. Knowledge has been acquired about the best approaches to things. It’s been over 20 years. But as evidenced by some comments on this article - attitudes change slowly. Linux is unusual among major software projects in being very much a product of one person’s idiosyncrasies. There is a certain groupthink that it is particularly susceptible to. Different ways of thinking have historically led to an unpleasant email.
I suppose what I am saying is that experience outside the Kernel community suggests that using a language which enables the use of low or zero cost abstractions and automates resource management is a good idea. Trying to emulate these features in C89 (using macros!) because C89 is all you have is not a good solution. Persevering with C because it’s what you know is also not the best for a project that is used as a bedrock of modern computing.
Kernel mode C++ standards exist. They’re quite reasonable and not hard to implement or learn.

Moving the kernel to modern C

Posted Feb 27, 2022 0:09 UTC (Sun) by Wol (subscriber, #4433) [Link] (3 responses)

> I would not say they are fools. But they are somewhat hidebound - viz the use of email as their primary collaboration system.

So you're another idiot who thinks newer = better.

I'm not saying these new-fangled things don't work for you. And plenty of kernel devs use newer tools. But all these idiots going "ooh! new! shiny!" make life hell for people doing the work.

I'm on a kernel mailing list. And reading the emails, I feel like tearing my hair out sometimes. But moving to Rust seems a far better solution than C++.

Cheers,
Wol

Moving the kernel to modern C

Posted Feb 27, 2022 8:13 UTC (Sun) by camhusmj38 (subscriber, #99234) [Link] (2 responses)

Newer does not always equal better, but using technology to solve problems is literally the business we are in. Submission of patches and PRs by email has real implications - security for one, less open and effective collaboration for another. Again, I’m not saying that it wasn’t the right solution for its time but there are alternatives that work better.
And as for Rust, I’m a big fan of it but there is no way that the Kernel is going to be rewritten in Rust. It’s much more viable to replace some data structures and paradigms with C++ alternatives. This is possible without rewriting everything or completely retraining all existing contributors.
And I don’t think calling people idiots is particularly helpful to what is a technical discussion.

Moving the kernel to modern C

Posted Feb 27, 2022 10:37 UTC (Sun) by Wol (subscriber, #4433) [Link] (1 responses)

You just admitted yourself in another comment that "using C++ is not a technical problem". The problem is the social problem of making sure the C++ code really is smaller and more efficient than the C macros it replaces.

AND THERE ARE TOO MANY "OOH NEW SHINY" LEMMINGS...

Likewise moving away from email - the problem is a social problem - there AREN'T ENOUGH DEVELOPERS. I think a fair few subsystems ARE developed using solutions like github, gitlab, whatever. If it really worked, surely that model would spread rapidly. But it's not working, and it's not "not working" because the solution is better or worse, it's not working because there aren't enough people to make either solution work.

And actually, probably one of the biggest problems with C++, is that IT'S NOT TRANSPARENT. Developers don't have a clear model in their mind of HOW it works. I come from the days when tomorrow's weather forecast took a day to run on the most powerful computers if you were lucky! If you can't model performance down to the bare metal, you have no clue how long the program is going to take to run. That's one of the reasons the kernel has held back on compilers so long, the disconnect between engineering reality and theoretical correctness.

That's one of the reasons I rant about relational. With Pick the database is transparent - as an application developer I can REASON about performance right through to the OS. That's what's so hard with C++ - the devs can NOT reason through to the hardware. (Okay, it's getting harder even with C, but it's not obfuscated ...)

Cheers,
Wol

Stop this please

Posted Feb 27, 2022 16:51 UTC (Sun) by corbet (editor, #1) [Link]

Surely we can find a way to discuss things without calling each other idiots or lemmings, right? Please don't do this anymore.

Moving the kernel to modern C

Posted Mar 5, 2022 12:28 UTC (Sat) by nix (subscriber, #2304) [Link] (3 responses)

[Caveat: haven't written lots of C++ in years, and that was back in the early '10s. The language has changed and I haven't kept up, I know...]

The thing ncm likely meant was that Wol's imagined kernel-developers' worries are not actually what they are documented as having worried about. C++ is not a security nightmare, not any more than C, anyway; it hasn't been worse than C optimization-wise for about twenty years; and there is no sudden extra need to drop into asm just because this is C++ (C++ is still nearly a superset of C, and this is just as true of the GNU variant).

Their stated worries are more that C++ has abstractions that enable things to magically happen behind your back with no immediate indication at the call site, and since the kernel developers are really looking for a portable assembler a lot of the time, where everything the machine does is obvious, this is *far* from what they want: they have their hands full coping with parallelism-induced complexities, memory model complexities, looking out for speculative execution gadgets etc etc, without worrying about the apparently same code doing wildly different things depending on what type they're operating on.

Many of C++'s transparently-do-things features are routinely used and almost essential to use anything resembling modern C++: references are the classic case (now function parameters' values can change in the caller without an & at the call site), but also C++ before std::move used to not make it terribly clear whether things were being copied or not (and it was at the very least wordy to enforce one alternative), and even now we have things like stringviews which seem to come with built-in footguns. (Of course, the kernel would never use such pieces, but code review would need to make sure they never crept in... and you only need to forget *once*). Many of the pieces that *don't* amount to 'do this invisibly albeit usually helpfully' are related to templates, and, uh... the kernel is nonswappable code in which size is at a premium, and having the compiler promiscuously generate code to monomorphize templates on the fly was anathema for a long time (though Rust does the same thing, and people seem to be complaining less: maybe RAM is just that much cheaper now and kernel code size is less important? but icache bloat still matters, and Linus has worried about it in public, and that was years ago and it's worse now).

And that's without even mentioning the really big painful problem, so big and painful that there are still compiler switches to disable the feature entirely, so big and painful that it took decades to figure out how to write code safely in the presence of these things and the last time I looked at it the safe code was extremely unobvious and if it was wrong you were unlikely to know for many years until things blew up, because there was no way to automatically check for safety: exceptions. Lovely idea, makes code's non-exceptional path much clearer, but the implementation explodes exceptional flow paths and *all of them are invisible* and many might be in what looks like the middle of an atomic, indivisible entity to someone not thinking "what if this were overloaded and threw?". If you use RTTI for absolutely everything religiously you don't need to worry, but you only have to forget and do manual cleanup once and you're in trouble when you next get an exception passing through that region. The kernel would obviously never use exceptions in the first place, mind you. Of course that now makes it impossible for destructors to fail, which probably rules out *use* of destructors for anything nontrivial, which means you can't use RTTI, which means you can't write anything resembling modern C++. You don't pay for what you don't use, but many of the bits require many of the other bits to use them non-clumsily, and then many of those bits are papering over design faults in the earlier bits -- std::move, again -- and the result of adding all those bits together is *ferociously* complex.

Moving the kernel to modern C

Posted Mar 5, 2022 13:37 UTC (Sat) by Wol (subscriber, #4433) [Link]

> The thing ncm likely meant was that Wol's imagined kernel-developers' worries are not actually what they are documented as having worried about. C++ is not a security nightmare, not any more than C, anyway; it hasn't been worse than C optimization-wise for about twenty years; and there is no sudden extra need to drop into asm just because this is C++ (C++ is still nearly a superset of C, and this is just as true of the GNU variant).

> Their stated worries are more that C++ has abstractions that enable things to magically happen behind your back with no immediate indication at the call site, and since the kernel developers are really looking for a portable assembler a lot of the time, where everything the machine does is obvious, this is *far* from what they want: they have their hands full coping with parallelism-induced complexities, memory model complexities, looking out for speculative execution gadgets etc etc, without worrying about the apparently same code doing wildly different things depending on what type they're operating on.

Actually, this is pretty much exactly what I was trying to say ... that C++ does things behind your back, and when you're trying to make sure that your code fits in L1 cache or whatever, code bloat is SERIOUS STUFF.

How often do you see kernel developers talking about "fast path"? Quite a lot. And it only takes C++ to do something you don't expect and the fast path will become orders of magnitude slower. WHOOPS!

Cheers,
Wol

Moving the kernel to modern C

Posted Mar 11, 2022 16:51 UTC (Fri) by timon (subscriber, #152974) [Link] (1 responses)

I think you mean RAII (Resource acquisition is initialization) instead of RTTI (Run-time type information).

Moving the kernel to modern C

Posted Mar 17, 2022 16:27 UTC (Thu) by nix (subscriber, #2304) [Link]

Um, yes, of course. I've had a mental hash collision between those two acronyms for literally decades, to such an extent that if you asked me what RTTI stood for I'd often say "resource acquisition is oh wait".

Moving the kernel to modern C

Posted Feb 26, 2022 22:16 UTC (Sat) by camhusmj38 (subscriber, #99234) [Link]

The two aspects of C++ that are usually disabled are RTTI and exceptions. GCC and Clang both disable at least one if not both of the features in their own codebases (I believe GCC both and LLVM at least RTTI). C++ code which just uses RAII and non-virtual methods (an awful lot of modern C++ code) optimises to the same as equivalent C but does not rely on people remembering to do all the manual steps, and it helps with abstraction. Apple and Microsoft both have standards for Kernel mode C++ that could be used as a starting point if need be.

Moving the kernel to modern C

Posted Feb 25, 2022 14:52 UTC (Fri) by jd (guest, #26381) [Link] (3 responses)

If we aren't going to go with ancient, then move to D. It's much, much newer and doesn't carry anything like the risks or overheads of C++. (Not that I'd recommend it, for other reasons, but it helps illustrate the age of the standard shouldn't matter as much as the quality of the result.)

Is there markup for any of the static checkers beloved by kernel developers that could be used to improve the quality of the results? (And when was the last time Coverity checked the kernel?)

There must be plenty that could be done to improve the kernel code without a drastic change of language.

Moving the kernel to modern C

Posted Feb 25, 2022 18:04 UTC (Fri) by davej (subscriber, #354) [Link] (1 responses)

> And when was the last time Coverity checked the kernel

99% of the time, the answer to this question is the same as "when did Linus last cut an -rc/final".
I usually kick off a run the same day, failing that the following morning.

Moving the kernel to modern C

Posted Feb 28, 2022 18:38 UTC (Mon) by jd (guest, #26381) [Link]

That's very comforting to know. I don't know how much it is picking up these days, but I've been haunting the fringes long enough to know that it has always provided a very important service to kernel developers. (Through the mead haze, I can dimly recall when it was first used.)

Moving the kernel to modern C

Posted Feb 25, 2022 21:33 UTC (Fri) by ncm (guest, #165) [Link]

I doubt Coverity works on D code. It does work on C++. "Risks and overheads" is wholesale speculation, unwelcome here.

Moving the kernel to modern C

Posted Feb 24, 2022 21:11 UTC (Thu) by abatters (✭ supporter ✭, #6932) [Link] (1 responses)

It would break code that does this:

list_for_each_entry(iterator, &foo_list, list) {
    if (do_something_with(iterator)) {
        break;
    }
}
if (list_entry_is_head(iterator, &foo_list, list)) {
    // iteration finished
} else {
    do_something_else_with(iterator);
}

All this "compare to head" nonsense is why I prefer regular NULL-terminated linked lists to the kernel's circular linked lists. Insert/delete may take more instructions but iteration is much easier.
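
For comparison, a minimal sketch of the same "search, then check whether we found anything" pattern on a NULL-terminated list; struct node and the function names are made up for the example. The iterator itself tells you how the loop ended, with no comparison against a list head.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;  /* NULL marks the end of the list */
};

/* Returns the first node holding 'wanted', or NULL if the whole list was
 * walked without a match. */
static struct node *find_first(struct node *head, int wanted)
{
    struct node *it;

    for (it = head; it != NULL; it = it->next) {
        if (it->value == wanted)
            break;
    }
    return it;  /* NULL here simply means "iteration finished" */
}

/* Returns 1 if a present value is found and an absent one yields NULL. */
static int demo_null_terminated(void)
{
    struct node c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };

    return find_first(&a, 2) == &b && find_first(&a, 9) == NULL;
}
```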

Moving the kernel to modern C

Posted Feb 24, 2022 21:55 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

Yes, and you also have macros like list_for_each_entry_continue() which depend on the value being left in the iterator. All of these would also break if the macro was changed to declare the iterator inside the `for` statement, C99-style.

One way to work around the problem in your example would be to move the condition inside the loop, like this:

list_for_each_entry(iterator, &foo_list, list) {
    // ...
    if (do_something_with(iterator)) {
        do_something_else_with(iterator);
        break;
    }
    // ...
    if (list_is_last(&iterator->list, &foo_list)) {
        // this is the last entry; iteration finished
    }
}

The compiler should be smart enough to avoid checking the end condition twice in each iteration. Of course this becomes much less convenient if there is more than one break statement.
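
Another option, sketched here in userspace under my own assumptions, is to declare the iterator inside the `for` statement and carry the result out through a separate pointer. The macro name list_for_each_entry_scoped, its explicit type parameter, and the struct item type are all hypothetical.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct item { struct list_head link; int value; };

/* Hypothetical macro: the iterator is declared C99-style, so it does not
 * exist outside the loop at all. */
#define list_for_each_entry_scoped(type, pos, head, member)          \
    for (type *pos = container_of((head)->next, type, member);       \
         &pos->member != (head);                                     \
         pos = container_of(pos->member.next, type, member))

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* The only thing that escapes the loop is 'found', which is either a real
 * entry or NULL, never a pointer computed from the list head. */
static struct item *find_item(struct list_head *head, int wanted)
{
    struct item *found = NULL;

    list_for_each_entry_scoped(struct item, it, head, link) {
        if (it->value == wanted) {
            found = it;
            break;
        }
    }
    return found;
}

/* Returns 1 if a present value is found and an absent one yields NULL. */
static int demo_scoped_iterator(void)
{
    struct list_head head = { &head, &head };
    struct item a = { { NULL, NULL }, 1 }, b = { { NULL, NULL }, 2 };

    list_add_tail(&a.link, &head);
    list_add_tail(&b.link, &head);
    return find_item(&head, 2) == &b && find_item(&head, 7) == NULL;
}
```

This handles any number of break statements, at the cost of one extra variable per loop that needs a result.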