
Some __nonstring__ turbulence

By Jonathan Corbet
April 24, 2025
New compiler releases often bring with them new warnings; those warnings are usually welcome, since they help developers find problems before they turn into nasty bugs. Adapting to new warnings can also create disruption in the development process, though, especially when an important developer upgrades to a new compiler at an unfortunate time. This is just the scenario that played out with the 6.15-rc3 kernel release and the implementation of -Wunterminated-string-initialization in GCC 15.

Consider a C declaration like:

    char foo[8] = "bar";

The array will be initialized with the given string, including the normal trailing NUL byte indicating the end of the string. Now consider this variant:

    char foo[8] = "NUL-free";

This is a legal declaration, even though the declared array now lacks the room for the NUL byte. That byte will simply be omitted, creating an unterminated string. That is often not what the developer who wrote that code wants, and it can lead to unpleasant bugs that are not discovered until some later time. The -Wunterminated-string-initialization option emits a warning for this kind of initialization, with the result that, hopefully, the problem — if there is a problem — is fixed quickly.
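As a minimal, contrived illustration (not taken from any kernel code), anything that treats such an array as a normal C string will read past its end looking for the terminator:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char foo[8] = "NUL-free";   /* exactly fills the array; no NUL is stored */

        /* strlen() keeps scanning beyond foo[7] until it happens to find a
         * zero byte somewhere else in memory: undefined behavior. */
        printf("strlen() sees %zu bytes\n", strlen(foo));
        return 0;
    }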

The kernel community has worked to make use of this warning and, hopefully, eliminate a source of bugs. There is only one little problem with the new warning, though: sometimes the no-NUL initialization is exactly what is wanted and intended. See, for example, this declaration from fs/cachefiles/key.c:

    static const char cachefiles_charmap[64] =
	"0123456789"			/* 0 - 9 */
	"abcdefghijklmnopqrstuvwxyz"	/* 10 - 35 */
	"ABCDEFGHIJKLMNOPQRSTUVWXYZ"	/* 36 - 61 */
	"_-"				/* 62 - 63 */
	;

This char array is used as a lookup table, not as a string, so there is no need for a trailing NUL byte. GCC 15, being unaware of that usage, will emit a false-positive warning for this declaration. There are many places in the kernel with declarations like this; the ACPI code, for example, uses a lot of four-byte string arrays to handle the equally large set of four-letter ACPI acronyms.

Naturally, there is a way to suppress the warning when it does not apply by adding an attribute to the declaration indicating that the char array is not actually holding a string:

    __attribute__((__nonstring__))
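For example, a hedged sketch of how the cachefiles declaration above might be annotated (the actual kernel patch uses a shorter macro form, described next, and may differ in detail):

    static const char cachefiles_charmap[64] __attribute__((__nonstring__)) =
	"0123456789"			/* 0 - 9 */
	"abcdefghijklmnopqrstuvwxyz"	/* 10 - 35 */
	"ABCDEFGHIJKLMNOPQRSTUVWXYZ"	/* 36 - 61 */
	"_-"				/* 62 - 63 */
	;

With the attribute in place, GCC 15 knows that the missing NUL byte is intentional and stays quiet, while the array itself is unchanged.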

Within the kernel, the macro __nonstring is used to shorten that attribute syntax. Work has been ongoing, primarily by Kees Cook, to fix all of the warnings added by GCC 15. Many patches have been circulated; quite a few of them are in linux-next. Cook has also been working with the GCC developers to improve how this annotation works and to fix a problem that the kernel project ran into. There was some time left to get this job done, though, since GCC 15 had not actually been released — or so Cook thought.

Fedora 42 has been released, though, and the Fedora developers, for better or worse, decided to include a pre-release version of GCC 15 with it as the default compiler. The Fedora project, it seems, has decided to follow a venerable Red Hat tradition with this release. Linus Torvalds, for better or worse, decided to update his development systems to Fedora 42 the day before tagging and releasing 6.15-rc3. Once he tried building the kernel with the new compiler, though, things started to go wrong, since the relevant patches were not yet in his repository. Torvalds responded with a series of changes of his own, applied directly to the mainline about two hours before the release, to fix the problems that he had encountered. They included this patch fixing warnings in the ACPI subsystem, and this one fixing several others, including the example shown above. He then tagged and pushed out 6.15-rc3 with those changes.

Unfortunately, his last-minute changes broke the build on any version of GCC prior to the GCC 15 pre-release — a problem that was likely to create a certain amount of inconvenience for any developers who were not running Fedora 42. So, shortly after the 6.15-rc3 release, Torvalds tacked on one more patch backing out the breaking change and disabling the new warning altogether.

This drew a somewhat grumpy note from Cook, who said that he had already sent patches fixing all of the problems, including the build-breaking one that Torvalds ran into. He asked Torvalds to revert the changes and use the planned fixes, adding: "It is, once again, really frustrating when you update to unreleased compiler versions". Torvalds disagreed, saying that he needed to make the changes because the kernel failed to build otherwise. He also asserted that GCC 15 was released by virtue of its presence in Fedora 42. Cook was unimpressed:

Yes, I understand that, but you didn't coordinate with anyone. You didn't search lore for the warning strings, you didn't even check -next where you've now created merge conflicts. You put insufficiently tested patches into the tree at the last minute and cut an rc release that broke for everyone using GCC <15. You mercilessly flame maintainers for much much less.

Torvalds stood his ground, though, blaming Cook for not having gotten the fixes into the mainline quickly enough.

That is where the situation stands, as of this writing. Others will undoubtedly take the time to fix the problems properly, adding the changes that were intended all along. But this course of events has created some bad feelings all around, feelings that could maybe have been avoided with a better understanding of just when a future version of GCC is expected to be able to build the kernel.

As a sort of coda, it is worth saying that Torvalds also has a fundamental disagreement with how this attribute is implemented. The __nonstring__ attribute applies to variables, not types, so it must be used in every place where a char array is used without trailing NUL bytes. He would rather annotate the type, indicating that every instance of that type holds bytes rather than a character string, and avoid the need to mark a rather larger number of variable declarations. But that is not how the attribute works, so the kernel will have to include __nonstring markers for every char array that is used in that way.
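To make the distinction concrete, here is a hedged sketch (with made-up names, not actual kernel declarations) of the per-variable marking the attribute requires today, alongside the kind of type-level annotation Torvalds would prefer; since the attribute applies only to variables, the typedef form is purely hypothetical:

    #define __nonstring __attribute__((__nonstring__))	/* simplified; the kernel gates this on compiler support */

    /* What the attribute requires today: every affected array is marked
     * individually (illustrative declarations only). */
    static const char sig_a[4] __nonstring = "RSDT";
    static const char sig_b[4] __nonstring = "XSDT";

    /*
     * What a type-level annotation might look like if the attribute could
     * be attached to a type; GCC treats nonstring as a variable attribute,
     * so this is only a sketch of the approach Torvalds would prefer:
     *
     *	typedef char acpi_sig_t[4] __nonstring;
     *	static const acpi_sig_t table_sigs[] = { "RSDT", "XSDT" };
     */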


No safeguards?

Posted Apr 24, 2025 16:27 UTC (Thu) by estansvik (guest, #127963) [Link] (51 responses)

Seems like a broken development process if patches can be added without going through review and CI. I thought the kernel had that, even for patches from Linus?

No safeguards?

Posted Apr 24, 2025 18:53 UTC (Thu) by koverstreet (✭ supporter ✭, #4296) [Link] (42 responses)

Automated testing/CI? No.

There's automated /build/ testing that would've caught this, which Linus skipped. But beyond that, it's "every subsystem for itself", which means filesystem folks are regularly stuck doing QA for the rest of the kernel.

I've had to triage, debug, and chase people down for bugs in mm, sched, 9p, block, dm - and that's really just counting the major ones that blew up my CI, never mind the stuff that comes across my desk in the course of supporting my users.

6.14 was the first clean rc1 in ages, and I was hoping that things were improving - but in 6.15 I've lost multiple days due to bugs in other subsystems, again. And this is basic stuff that we have automated tests for, but people don't want to fund testing, let alone be bothered with looking at a dashboard.

I have fully automated test infrastructure, with a dashboard, that I could start running on other subsystem trees today, and wrapping other test suites is trivial (right now it mainly runs fstests and my own test suite for bcachefs).

People just don't care, they're happy to do things the same old way they've always done as long as they still get to act like cowboys.

No safeguards?

Posted Apr 25, 2025 5:37 UTC (Fri) by marcH (subscriber, #57642) [Link] (8 responses)

> I have fully automated test infrastructure, with a dashboard, that I could start running on other subsystem trees today, and wrapping other test suites is trivial.

If you have already paid for enough resources, just go and do it. At least for all the suites and coverage that don't meltdown your infrastructure.

> People just don't care, they're happy to do things the same old way they've always done as long as they still get to act like cowboys.

_Some_ people don't care. But, there are these wonderful things called "name and shame", "peer pressure", etc. It's not nice so people don't say it out loud but the massive success of CI is largely based on those. There are some memes, search "you broke the build" for instance. Don't get me wrong: these are neither an exact science nor a silver bullet. So it may make a large difference in some areas and very little in others. But for sure it _will_ make some difference and be worth it.

If you can for a reasonable effort, then stop thinking and discussing about it; just go and do it.

No safeguards?

Posted Apr 25, 2025 6:00 UTC (Fri) by koverstreet (✭ supporter ✭, #4296) [Link] (4 responses)

> If you have already paid for enough resources, just go and do it. At least for all the suites and coverage that don't meltdown your infrastructure.

I've got enough hardware for my own resources, but it's not cheap, and I run those machines hard, so I'm not terribly inclined to subsidize the big tech companies that don't want to pay for testing or support the community. Already been down that road.

And I don't want to get suckered into being the guy who watches everyone else's test dashboards, either :)

It really does need to be a community effort, or at least a few people helping out a bit with porting more tests, and maintainers have to want to make use of it. I've got zero time left over right now for that sort of thing, since I'm trying to lift the experimental label on bcachefs by the end of the year.

No safeguards?

Posted Apr 25, 2025 18:47 UTC (Fri) by marcH (subscriber, #57642) [Link] (3 responses)

> I have fully automated test infrastructure, with a dashboard, that I could start running on other subsystem trees today, and wrapping other test suites is trivial

2 comments later:

> ... but it's not cheap, and I run those machines hard, so I'm not terribly inclined to subsidize the big tech companies that don't want to pay for testing or support the community. Already been down that road.

Understood but please don't give everyone false hopes again :-)

A last, somewhat desperate attempt to change your mind: please don't underestimate the awesome power of "role-modeling"[*]. For instance, you could pick a test suite that regularly finds regressions in only a couple of the worst subsystems and run a small test subset to keep usage to a minimum? Based on what you wrote, this could be enough to continuously highlight regressions in those subsystems and continuously crank up the pressure on them to take over what you continuously demonstrate. If you keep the test workload small, this should cost you nothing but the initial time to set it up, which you wrote would be small. Who knows, other subsystems might even fear you'll come after them next? :-) Sorry, I meant: be continuously impressed by that demo and desire copying it?

BTW wouldn't that also help your _own_ workload, at least a bit? I mean you have to keep debugging this incoming flow of regressions anyway, don't you?

[*] A personal, very recent example: I finally figured out the security model of https://github.com/actions/cache , combined it with ccache and cut down kernel compilation in paltry GitHub runners from 10 minutes down to 30 seconds. If I had a "role-model" like you said you could be, I would have done this months earlier!

No safeguards?

Posted Apr 25, 2025 20:46 UTC (Fri) by koverstreet (✭ supporter ✭, #4296) [Link] (2 responses)

> Understood but please don't give everyone false hopes again :-)

I'm trying to motivate other people to step up and help out, either by getting it funded or contributing tests, by talking about what's already there and what we do have.

I am _not_ going to get this done on my own. But I'll certainly help out and lead the effort if other people are interested.

No safeguards?

Posted Apr 26, 2025 0:41 UTC (Sat) by marcH (subscriber, #57642) [Link] (1 responses)

From a "motivational" perspective, seeing some random test suite in CI is "nice". But it's nothing like seeing (someone else) demo your _own_ test suite automated. The latter is mind blowing; it's really night and day. Bonus points when you see it catching some of your own regressions.

There are some obviously subjective elements (familiarity, ...) but there is also a more objective "gap" because the devil really is in the details: replicating with a different test suite is very rarely "trivial".

> > > and wrapping other test suites is trivial.

That "trivial" was likely optimistic :-)

No safeguards?

Posted Apr 26, 2025 0:59 UTC (Sat) by koverstreet (✭ supporter ✭, #4296) [Link]

> That "trivial" was likely optimistic :-)

Here's the generic wrapper for fstests:

https://evilpiepirate.org/git/ktest.git/tree/tests/fs/fst...

And the bcachefs wrapper on top of that:

https://evilpiepirate.org/git/ktest.git/tree/tests/fs/bca...

bcachefs has more wrappers for testing nocow, largebs, and then a bunch more wrappers for kasan/kmsan/lockdep/etc., but that's the basics.

There's a tiny patch to fstests to have it emit ktest style test start/end markers, but overall there's really not much, and other test suites (blktests, mmtests) are similar in style to fstests - so it's really not bad.

Again though, I'm not going to volunteer my time for work other subsystems should be doing themselves when I already have a filesystem to write.

People need to stop acting so helpless. I've provided the tools to make this easy, the rest is up to the community.

No safeguards?

Posted Apr 25, 2025 7:42 UTC (Fri) by josh (subscriber, #17465) [Link] (2 responses)

> But, there are these wonderful things called "name and shame", "peer pressure", etc. It's not nice so people don't say it out loud but the massive success of CI is largely based on those.

No, one of the many massive successes of CI is that it gets *rid* of those. The right answer to "you broke the build" is not "shame on you", it's "shame on our lack of tooling, that that wasn't caught before it was merged".

No safeguards?

Posted Apr 25, 2025 14:20 UTC (Fri) by marcH (subscriber, #57642) [Link]

There are developers who have the discipline and desire to try to break their own code before sharing it. There are others who do not and prefer to wing it in order to "save time" and effort. They don't push all the test buttons they have and wait for bug reports instead. The world is not that binary and the same people can be either at different times but you get the idea. Whether the latter people actually save time themselves is debatable (bug fixing can be extremely time-consuming) but for sure they massively disrupt and waste the time of the former people and of the project as a whole.

The "name and blame" comes from version control and "git blame"; CI does not change that. But automation acts as a dispassionate referee by removing most of the personal and subjective elements like:
- Removes debates about workspace-specific configurations: the automated configurations are "gold".
- The first messenger is a robot
- Choice of test coverage and priorities. You still need to discuss what gets tested, how often etc. but these discussions happen when configuring CI, _before_ regressions and tensions happen.

It's not a silver bullet and you still have projects that ignore their own CI results, don't have enough test coverage, have enraged CI debates,... but in every case it exposes core issues at the very least which is still major progress.

No safeguards?

Posted Apr 25, 2025 18:13 UTC (Fri) by marcH (subscriber, #57642) [Link]

> ... lack of tooling, that that wasn't caught before it was merged

_Pre-merge_ checks are critical and they indeed make a massive difference wrt avoiding disrupting others and reducing tensions, good point.

Automation is not just "pre-merge" though. Longer, post-merge daily/weekly test runs are still required in the many projects that limit pre-merge checks to less than ~1h for prompt feedback; those projects will still have some regressions merged. Much fewer and narrower regressions merged but still some.

There is also the funny issue of A and B passing separately but not together. Rare but happens. This is solved by Gitlab "merge trains", Github "merge queues" and similar but these require more infrastructure.

Last but not least: the issue of flaky tests, infra failures and other false positives that degrade SNR and confuse "aggressive" maintainers who merge regressions. And who'd want to fix the flaky tests or the infra? Gets little credit and promotions. As often, the main issue is not technical. It's cultural and/or a business decision.

No safeguards?

Posted Apr 25, 2025 6:53 UTC (Fri) by estansvik (guest, #127963) [Link] (1 responses)

Okay, I feel your pain Kent.

I'm not doing kernel stuff, but thought that patches were at least gated by some build testing. Pretty amazing for such a high profile project to not at least have that. So it's all done on scout's honor?

No safeguards?

Posted Apr 29, 2025 0:53 UTC (Tue) by Paf (subscriber, #91811) [Link]

The merge process is “Linus hits a button on a git command”, how could it be gated on anything without a more complex infra?

No safeguards?

Posted Apr 26, 2025 17:48 UTC (Sat) by marcH (subscriber, #57642) [Link] (30 responses)

I saw your email about Unicode testing at https://lore.kernel.org/lkml/l7pfaexlj6hs56znw754bwl2spco...

> It is _not_ enough to simply rely on the automated tests.
> You have to have eyes on what your code is doing.

I wanted to reply there but that thread turned into a shouting match that was again leveraged to make some clicks and I don't want my name in the middle of that. So I'm going to reply here instead. It's relevant here too.

The simplest and most underrated technique is: manually _test the tests_ by temporarily breaking the product code. I keep being amazed at how few developers do or even know that technique.

For a start, temporarily revert your changes and run the tests. If they still pass, then you know for sure there is no coverage. Another great technique I use all the time: deliberately insert one-line bugs. This works really well. If there is coverage, it's actually a faster way to find where it is[1]. You will also be amazed to discover how many tests do provide coverage but don't notice failures and never failed in their entire life! Especially true with test code in C and bash that ignore errors by default (errexit goes a long way but not all the way).
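A toy C sketch of the "break it on purpose" step (nothing to do with any particular kernel test suite; names are made up):

    #include <assert.h>

    /* Toy "product" code: report whether a byte is an ASCII digit. */
    static int is_digit(unsigned char c)
    {
        /*
         * To test the test, temporarily break this line, e.g. change it to
         * "return c >= '0';", rerun the suite, and make sure something
         * actually fails.  If nothing does, the coverage was an illusion.
         */
        return c >= '0' && c <= '9';
    }

    int main(void)
    {
        /* The "suite": it exercises both sides of the condition, so the
         * deliberate one-line bug above would be caught. */
        assert(is_digit('0') && is_digit('9'));
        assert(!is_digit('a') && !is_digit('/'));
        return 0;
    }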

But too many developers either never heard of these basic testing techniques (code must go "forward" always!) or don't want to discover that they have to fix or even - the horror - add new tests. That's double the work when you thought you were done _and_ customers and users don't run it so it's generally less valued in the industry[2]. On the other hand, letting others find your bugs does not always have consequences; I mean what's a couple angry emails and reverts? There's a good chance it won't affect your career at all. Just keep winging it, it works after a few tries.

There's been fantastic progress here and there, but when you look at the greater picture this is still a generally young and immature industry and a management generally clueless about quality. Every manager knows who writes the most code, few know who breaks the most stuff and costs the most time. To be a bit fairer with managers, when there is poor test coverage then _no one_ really knows who breaks the most stuff!

[1] deliberate breakage is also the only way to make sense of obscure systemd like build systems or the C preprocessor. I digress.
[2] one of the telltales is the "scripts/" directory found in most projects. "scripts/" does not mean anything besides "less important".

No safeguards?

Posted Apr 26, 2025 18:03 UTC (Sat) by koverstreet (✭ supporter ✭, #4296) [Link] (2 responses)

> The simplest and most underrated technique is: manually _test the tests_ by temporarily breaking the product code. I keep being amazed at how few developers do or even know that technique

That's a brilliant one. I use that regularly, but I never would've thought to document it - it's great for illustrating the right mindset, too. I might add that to bcachefs's Submitting patches.

No safeguards?

Posted Apr 29, 2025 14:30 UTC (Tue) by jezuch (subscriber, #52988) [Link] (1 responses)

I think it's called mutation testing?

No safeguards?

Posted Apr 29, 2025 14:32 UTC (Tue) by jezuch (subscriber, #52988) [Link]

Bah, should've read the entire thread before answering :)

No safeguards?

Posted Apr 27, 2025 0:03 UTC (Sun) by gmaxwell (guest, #30048) [Link] (2 responses)

I have a set of shell scripts that will go through source code and make various mutations one at a time, such as changing + to -, replacing && with ||, replacing 0 with ~0, changing 1 to 2 or 2 to 1, swapping <, >, and =, inserting negations, blanking a line entirely, etc. Then it attempts to compile the code with optimizations. If it compiles, the script checks the sha256sum of the resulting binary against all the hashes it's seen before and if it's a new hash it runs the tests. If the tests pass the source is saved off for my manual inspection later.

The single point changes tend to not make errors which self cancel out, and usually if an error does cancel out or the change is in code that doesn't do anything the binary will not change. In code where tests have good condition/decision branch coverage most things this procedure catches are test omissions or bugs.

This approach is super slow and kludgy, I've been repeatedly surprised and frustrated that no one has made a C-syntax aware tool to do similar testing without wasting tons of time on stuff that won't compile or won't make a difference (e.g. mutating comments.. though sometimes I've addressed this by first running the code through something that removes all the comments).

But it's worked well enough for me and parsing C syntax is far enough away from the kind of programming I enjoy that I haven't bothered trying to close this gap myself.

No safeguards?

Posted Apr 27, 2025 1:51 UTC (Sun) by roc (subscriber, #30627) [Link] (1 responses)

This is called mutation testing. There are a lot of existing tools for it, some of which are C-syntax-aware. Also there are mutation-testing tools that work by patching binary code.

No safeguards?

Posted Apr 28, 2025 14:49 UTC (Mon) by daroc (editor, #160859) [Link]

LWN covered one such tool for Rust code in October. I've tried it in some of my personal projects since then and found it somewhat useful for expanding my test suites.

No safeguards?

Posted Apr 27, 2025 13:55 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (1 responses)

I like all of what's mentioned about testing, but I think it's worth mentioning one radical option that people need to have in the back of their heads when thinking about tests: Exhaustive testing.

256 seems like lots to us, and so we instinctively don't want to try all 256 possible inputs to a function which takes a single byte. But 256 is nothing to a machine, so exhaustive tests are an effective choice here and might catch bugs in weird cases you hadn't considered.

Obviously you can't always do this, for a variety of reasons, and when you can it may be too slow and need to run overnight or something - but it's worth having the idea in your mind because when you can just try everything that's it, you're done, all inputs were tested.

I was testing my impl TryFrom<f32> for realistic::Real and impl From<Real> for f32 to check that they round trip when non-NaN and finite. I quickly discovered a one epsilon problem for some values, not a big deal given the relatively low precision of 32-bit floats but good to know and worth fixing - however a million into the exhaustive testing it found huge deviations because my previous "I know what to test" testing hadn't hit upon some important cases and the exhaustive testing had stumbled onto these - we're talking an order of magnitude size error like oops 0.07 isn't 0.0067
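A minimal C sketch of the idea for a single-byte input (my_tolower() is a made-up function under test, checked against the C library in the default "C" locale): simply comparing every possible value against a reference is cheap and complete.

    #include <assert.h>
    #include <ctype.h>

    /* Hand-rolled function under test. */
    static int my_tolower(int c)
    {
        return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
    }

    int main(void)
    {
        /* 256 inputs is nothing to a machine: just try them all. */
        for (int c = 0; c < 256; c++)
            assert(my_tolower(c) == tolower(c));
        return 0;
    }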

Orders of magnitude and exhaustive testing

Posted Apr 27, 2025 18:54 UTC (Sun) by farnz (subscriber, #17727) [Link]

The combination of property testing and order of magnitude thinking can help you find your way here, if you weren't already thinking about it.

Property testing randomly generates a test case, and confirms that a property holds (like your "must round trip from f32 to realistic::Real to f32 correctly"); there's libraries that can help with this, but it's also reasonable to write your own property-based tester if the problem space doesn't benefit from shrinking test inputs. Once you have a property tester, it's not hard to use it to find out how much of the input domain you can test in one second - you adjust the number of random inputs you generate until your test case takes about a second to run.

From there, you need to know that an hour is about three times 10^3 seconds, and a day is about 80% of 10^5 seconds, while your input space size is roughly 10^(number of bits * 3 / 10); a 32 bit space is thus about 10^9.6 items (which you can round to 10^10 - approximation is the name of the game here), and a 64 bit space is about 10^19.2 (which you can round to 10^20).

You then put the two together to work out how long your test would take if you stopped randomly generating test cases, and instead just went exhaustive; if your property test can test around 10^6 items in a second, then a day's run will cover 10^11 possibilities. This is more than the number of possibilities in 32 bits, so you can exhaustively test the 32 bit space in a day. Similarly, if you can get your random tester to test around 10^13 possibilities in a second on the available hardware, you know that a day will cover about 10^18 tests, and so you need 10^2 (or 100) days to exhaustively test a 64 bit space (which, while slow, is fast enough that you might leave it running and pick up all the cases it finds for manual testing into the future, at least on significant releases).

And, of course, you can short-circuit this if, in your judgement, a single data item can be tested in few enough clock cycles; you know that 1 GHz is 10^9 clock cycles per second, so if you judge that your test takes under 1,000 clock cycles per item, that's 10^6 tests per second, which can exhaust a 32 bit space in a day.

No safeguards?

Posted Apr 28, 2025 16:34 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (21 responses)

> The simplest and most underrated technique is: manually _test the tests_ by temporarily breaking the product code. I keep being amazed at how few developers do or even know that technique.

As stated in another reply, this is mutation testing. One of my ideas is to use code and mutation testing to discover "this code affects this test" relations so that one could take a diff and run just the tests that "care" about it. This would help with test turnaround time during patch review (while still doing full runs prior to merging). One of our bottlenecks in our regular CI is that testing always runs everything even for "obviously cannot be affected by the diff" changes. It'd be much nicer to cycle just the relevant tests to green before moving onto the full test suite.

No safeguards?

Posted Apr 28, 2025 19:16 UTC (Mon) by marcH (subscriber, #57642) [Link] (19 responses)

> > The simplest and most underrated technique is: manually _test the tests_ by temporarily breaking the product code. I keep being amazed at how few developers do or even know that technique.

> As stated in another reply, this is mutation testing.

Yes and no.

If you ask a developer who is not interested in finding problems "Could you please perform some mutation testing?" then guess what will happen: nothing at all (Mutation what?)

On the other hand, if you tell them "Did you realize breaking this or that line passes the tests anyway?" after they proudly claimed to have tested their changes, then there is a small chance it will make some difference.

Who knows; the younger ones who don't think they know everything yet might even feel like they just tasted something useful and "upgrade" to more advanced and extensive mutation testing in the longer term.

Baby steps!

No safeguards?

Posted Apr 28, 2025 19:55 UTC (Mon) by pizza (subscriber, #46) [Link] (14 responses)

> On the other hand, if you tell them "Did you realize breaking this or that line passes the tests anyway?" after they proudly claimed to have tested their changes, then there is a small chance it will make some difference.

...or respond with "patches with improved tests welcome!"

No safeguards?

Posted Apr 29, 2025 0:02 UTC (Tue) by marcH (subscriber, #57642) [Link] (13 responses)

If someone submitting changes is seriously asking someone ELSE
to fix their _own_ lack of test coverage, right after being caught red-handed lying about said coverage, well now there is very simple, clear and compelling evidence that this person cannot be trusted with testing.

That's pretty far from not wanting to perform "mutation testing" or some other fancy word that most people won't even bother googling.

No safeguards?

Posted Apr 29, 2025 0:34 UTC (Tue) by pizza (subscriber, #46) [Link] (12 responses)

> If someone submitting changes is seriously asking someone ELSE to fix their _own_ lack of test coverage, right after being caught red-handed lying about said coverage, well now there is very simple, clear and compelling evidence that this person cannot be trusted with testing.

That's a nice sequence of what-ifs.

But even if it's true, so what? That person is not your supplier [1].

[1] https://www.softwaremaxims.com/blog/not-a-supplier

No safeguards?

Posted Apr 29, 2025 1:17 UTC (Tue) by marcH (subscriber, #57642) [Link] (5 responses)

> That's a nice sequence of what-ifs.

This is indeed a pretty specific and hypothetical path that we followed... _together_. Until now?

> But even if it's true, so what?

Then there are two possibilities:

1. The maintainer of that subsystem does not care and merges code anyway.
2. He cares and does not merge.

In _either_ case, everyone can draw very clear, evidence-based and useful conclusions about the quality of that subsystem.

The most important thing in quality is not the quality level itself. That level does matter of course, but what is even more important is not being ignorant and having at least some idea of where quality stands.

Take a look at this very short section
https://en.wikipedia.org/wiki/ISO_9000_family#ISO_9000_se...
It's all about processes, evidence and transparency. It's not concerned about defining what's "good" or "bad" quality, it's more about having some metrics in the first place - which unfortunately cannot be taken for granted.

When testing is "underrated" and mostly ignored in code reviews, that quality information is not even available, no one knows! Maybe the engineer who submits the code has been following some strict but private company QA process? Or maybe he just winged the whole thing due to unreasonable deadlines. Who knows - anyone with a bit of experience in this industry has already seen both. So, even a very basic "did you test this?" discussion already goes a long way.

Quality information is critical and actionable: it lets a company that sells some actual Linux-based product decide whether they should rewrite the Bluetooth subsystem or implement their own sound daemon versus stress-testing the existing one and participating upstream. Just some random examples; this sort of decisions happens all the time because open-source is "not a supplier".

Note that an evidence-based testing discussion is also (and in many cases has been) very useful to reduce maintainer overload. FAIL | UNTESTED -> NACK. Done! Next (assuming that subsystem is interested in landing in some products)

No safeguards?

Posted Apr 29, 2025 3:02 UTC (Tue) by pizza (subscriber, #46) [Link] (4 responses)

> Take a look at this very short section of [ISO9000]

Seriously?

If you want ISO9000 compliance from me, you had damn well better be paying me.

If you're not, I repeat: I AM NOT YOUR SUPPLIER.

No safeguards?

Posted Apr 29, 2025 4:31 UTC (Tue) by marcH (subscriber, #57642) [Link] (3 responses)

> Seriously?

Clearly not:

> If you want ISO9000 compliance from me...

No safeguards?

Posted Apr 29, 2025 14:36 UTC (Tue) by pizza (subscriber, #46) [Link] (2 responses)

> Clearly not:

I've been part of the team dragging an organization through ISO9000 certification.
I've also worked for organizations in highly regulated spaces where ISO9000 was just the first of many, many steps.

But hey, if you're all about that process, guess what? You can report the test coverage bug through official channels, it will be triaged and prioritized based on the documented process, and if it meets the actionable threshold (ie it's the supplier's responsibility to fix, as opposed to "new work" which may itself require further negotiations) it will be added to the development backlog. Eventually, it gets handed to a developer, and has to be QA'd and signed off by whatever else the process requires, and _eventually_ will land in some future release.

So if you want that level of assurance and process from your suppliers? It's going to cost you... a _lot_. Not just in money, but time as well.

What's that, you have no formal contract that specifies the deliverables, compensation, and processes for reporting problems, plus the SLA for responses? Then they're not your supplier, and you have precisely *zero* legal (or moral) right to demand anything from them. Enjoy the "AS IS, NO WARRANTY WHATSOEVER" terms of the software you didn't pay for.

No safeguards?

Posted Apr 29, 2025 16:11 UTC (Tue) by marcH (subscriber, #57642) [Link] (1 responses)

I'm really sorry that you have been hurt by ISO 9000 in the past and that this tangent of mine acted as a "trigger" for you. But this was just a tangent, I know nothing about ISO 9000 specifically and there is not much I can do to help with your healing process. On the contrary, my key message was: baby steps, don't even use fancy words like "mutation" to avoid scaring people.

I bet you're not the only one in such a difficult situation, maybe there are other people who can help?

No safeguards?

Posted Apr 29, 2025 20:06 UTC (Tue) by pizza (subscriber, #46) [Link]

> I bet you're not the only one in such a difficult situation, maybe there are other people who can help?

I'm afraid the difference between "can help" and "will help" is effectively insurmountable -- at least when there is no money involved.

No safeguards?

Posted Apr 29, 2025 8:29 UTC (Tue) by Wol (subscriber, #4433) [Link] (5 responses)

> But even if it's true, so what? That person is not your supplier [1].

I think you're missing the point that that person cannot be trusted. If you've got any sense, you're going to blacklist him as a supplier ... especially if you're paying him in credibility!

"Honesty is the best policy" - I do my best to test my code, but I still regularly get bitten by quirks of the language, things I've forgotten, etc etc. And I try and make sure that my "customers" know what is tested and what isn't. The fact that half the time they don't listen, and the other half they don't understand, isn't my problem. Well it is, I have to fix the mess, but that's another incentive for me to try and get it right.

Cheers,
Wol

No safeguards?

Posted Apr 29, 2025 14:20 UTC (Tue) by pizza (subscriber, #46) [Link] (1 responses)

> I think you're missing the point that that person cannot be trusted. If you've got any sense, you're going to blacklist him as a supplier ... especially if you're paying him in credibility!

How many times do I have to point out that this person is *NOT* your supplier?

You're going to get better results when starting with "hey, your test coverage is missing something, here's a patch that fills the gap" versus "I created a situation that resulted in a failure but the existing tests didn't catch it, and if you don't fix this coverage gap immediately you're clearly lying about everything and can't be trusted blablabla"

No safeguards?

Posted Apr 29, 2025 16:01 UTC (Tue) by marcH (subscriber, #57642) [Link]

> versus "I created a situation that resulted in a failure but the existing tests didn't catch it, and if you don't fix this coverage gap immediately you're clearly lying about everything and can't be trusted blablabla"

Well beyond a strawman: it's elevated to an art form! :-)

Open source upstreams aren't suppliers

Posted Apr 29, 2025 14:50 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

The question then becomes where you're going to go for open source software if you blacklist all upstreams that refuse to act as suppliers. You can, of course, pay Red Hat, SUSE, Canonical, CIQ or others to act as your supplier based on top of open source upstreams, but then you're not going straight to the person who writes the code.

In the end, an upstream is not a supplier; there's even a different word to describe the relationship, since you have supplier/customer relationships in business, and upstream/downstream relationships in open source. Expecting an upstream to act as a supplier is opening yourself up to a world of pain every time the upstream's priorities and yours don't coincide.

Open source upstreams aren't suppliers

Posted Apr 29, 2025 15:12 UTC (Tue) by Wol (subscriber, #4433) [Link] (1 responses)

> In the end, an upstream is not a supplier; there's even a different word to describe the relationship, since you have supplier/customer relationships in business, and upstream/downstream relationships in open source. Expecting an upstream to act as a supplier is opening yourself up to a world of pain every time the upstream's priorities and yours don't coincide.

I thought we were talking about a DOWNSTREAM SUPPLYING faulty patches.

So we're blacklisting people who can't be arsed to supply properly working, tested code.

I have to be careful here, as I've done exactly that, but I've done my best to provide stuff that works (in a language, Scheme, that I find very difficult to work with), and I just have to accept that if nobody else sees value in what I've done they won't take it on. But if I behaved "entitled" and expected somebody to finish the work for me, then they have every right not to want to do business with me.

It all boils down, once again, to the "entitled" mentality a lot of people seem to have about other people doing work for "free" ...

Cheers,
Wol

Open source upstreams aren't suppliers

Posted Apr 29, 2025 15:31 UTC (Tue) by farnz (subscriber, #17727) [Link]

They're either an upstream offering you a project, or a downstream offering you a patch. They are not a supplier in either case, and expecting them to behave like a supplier is going to lead to problems and misunderstandings, precisely because that's the wrong sort of relationship to imagine.

Upstream and downstream also gets interesting because, unlike supplier and customer, it's one where the same entities dealing with the same project can change roles; Linus Torvalds is upstream of me in the Linux kernel fork he runs (and that we generally accept as the mainline), but if I chose to run my own Linux fork, Linus could choose to be downstream of me and submitting patches to me, or pulling patches from my upstream project into his downstream project. And, for added fun, Linus and I can swap roles - I can treat Linus as my upstream when I pull in 6.16, but then treat him as my downstream if he notices that I have a useful change that he'd like in his kernel.

No safeguards?

Posted May 2, 2025 7:53 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (3 responses)

It's not that easy. Saying "mutation testing" to a developer who has never heard of it is probably not the best way to evangelize mutation testing... but it *is* the best search term to find tooling that enables you to actually do it (without having to reinvent everything from scratch). So we're stuck using fancy ten dollar words for these things at least some of the time.

No safeguards?

Posted May 2, 2025 14:44 UTC (Fri) by marcH (subscriber, #57642) [Link] (2 responses)

Like all people in a "privileged" work environment, I'm not sure you realize how ridiculously little testing some code gets before being submitted. Discussing "mutation testing" with the corresponding submitters is like trying to discuss literature with children learning to read. Please wait until they've reached middle school? In the mean time, "manually" point at a couple lines in their submission and ask them if any test fails when they break them. You might rarely ever interact with any such people so you can't notice but I can promise you there are some who never bother to try anything like that. Learning to do that does not require _any_ search which is already too much for people absolutely not interested in spending any time finding any bug with their code[*]. The very first ("baby") step is fixing that _mindset_ and that's already difficult enough. Don't scare these people with textbooks, at least not at first.

I've used an education analogy which makes me wonder: does any programming education teach you anything about test coverage and trying to break your own code? Or version control, or code reviews, or CI, or any quality topic,... I don't remember any at all but it was a while ago. I learned it all on the job. But these were full time software jobs. Now think about all the people who do not do software full time and think: How hard could software be? If it were hard, it wouldn't be called "soft"ware :-)

[*] that's the job of the "validation team". Their precious time should be spent writing new bugs^H code.

No safeguards?

Posted May 2, 2025 14:56 UTC (Fri) by marcH (subscriber, #57642) [Link]

> does any programming education teach you anything about test coverage and trying to break your own code? Or version control, or code reviews, or CI, or any quality topic,...

How could I forget the "ugliest" child of them all: build systems :-D

No safeguards?

Posted May 2, 2025 15:34 UTC (Fri) by Wol (subscriber, #4433) [Link]

> Now think about all the people who do not software full time and think: How hard could software be? If it were hard, it wouldn't be called "soft"ware :-)

Job security? If software is hard, you have to leave it to the professionals?

I've only once worked in a pure software environment - it drove me almost suicidal. Pretty much every job I've had has been a small DP team supporting end users. There's no reason why software should be hard. If you have a mixed team of professional end users who can program, professional programmers who can end-user, AND EASY-TO-USE SOFTWARE, then doing things "right" isn't hard. That's why I'm a Pickie!!!

(And I don't call Excel, SQL, BQ/Oracle/etc easy to use.)

Cheers,
Wol

No safeguards?

Posted Apr 29, 2025 15:15 UTC (Tue) by kleptog (subscriber, #1183) [Link]

We solve this by having a "make quick-test" target which looks at the last commit and, based on the directories of the files modified, runs some specific subset of the tests. So if you're modifying the Makefile or some shared repo it's still going to take a while. But for the majority of patches you end up only running the tests for a single subsystem which reduces the turnaround time significantly.

Sure, it occasionally happens that a patch does actually break a test in another subsystem that you didn't expect, but that's pretty uncommon.

Like you, I wanted to automate this dependency detection, but a hand-maintained list gave almost all the bang for very little buck.

No safeguards?

Posted Apr 24, 2025 19:17 UTC (Thu) by ballombe (subscriber, #9523) [Link]

"regression testing"? What's that? If it compiles, it is good, if it boots up it is perfect.

No safeguards?

Posted Apr 25, 2025 9:09 UTC (Fri) by Avamander (guest, #152359) [Link] (6 responses)

You seem to expect too modern development practices from the kernel. Keep in mind they're still lugging around patches over email.

Can't wait for them to switch to actually usable tools where patches don't get stuck in spam filters or between different mailing lists. Maybe then I'll bother to change a few small bugs that have been in the kernel for years.

No safeguards?

Posted Apr 25, 2025 12:02 UTC (Fri) by pizza (subscriber, #46) [Link] (5 responses)

> Maybe then I'll bother to change a few small bugs that have been in the kernel for years.

You'll just come up with some other excuse.

No safeguards?

Posted Apr 25, 2025 13:31 UTC (Fri) by Avamander (guest, #152359) [Link] (4 responses)

> You'll just come up with some other excuse.

I've actually submitted patches, but will not do so any longer because of how truly terrible the workflow is. It's not worth the hassle, especially not for small changes.

Have you contributed anything though? I doubt it for some reason.

That's enough

Posted Apr 25, 2025 13:37 UTC (Fri) by corbet (editor, #1) [Link]

We really do not need to be flinging mudballs at each other; stop, please?

No safeguards?

Posted Apr 25, 2025 14:43 UTC (Fri) by pizza (subscriber, #46) [Link] (2 responses)

> Have you contributed anything though? I doubt it for some reason.

I'm in the MAINTAINERS file.

No safeguards?

Posted Apr 30, 2025 9:43 UTC (Wed) by Avamander (guest, #152359) [Link] (1 responses)

> I'm in the MAINTAINERS file.

That explains it.

No safeguards?

Posted Apr 30, 2025 11:21 UTC (Wed) by pizza (subscriber, #46) [Link]

>> I'm in the MAINTAINERS file.
>That explains it.

Explains what, exactly?

Bad Practice

Posted Apr 24, 2025 16:42 UTC (Thu) by JanSoundhouse (subscriber, #112627) [Link] (8 responses)

IMO it's pretty hard to side with Linus on this one. Making last-minute changes without even checking if it's actually working is pretty bad practice. Does Linus not have access to some kind of infrastructure that does the absolute minimal checks in the form of "does-it-compile"? Works on my machine, let's ship it!

Maybe someone can help and setup some basic actions for him on his github repo?

Bad Practice

Posted Apr 25, 2025 6:49 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Agreed. The number of times "oh, this is trivial" and skipped CI has bitten me is a significant percentage of the times I've done it. *Sometimes* it works, but that "this is trivial" judgement really hampers the self-review process for spotting actual issues. But eating crow when you screw up is not a fatal thing…accept it, learn from it, and move on.

Bad Practice

Posted Apr 25, 2025 8:14 UTC (Fri) by jlargentaye (subscriber, #75206) [Link]

I agree that, as presented, this is a particularly poor showing from Torvalds.

> Maybe someone can help and setup some basic actions for him on his github repo?

Surely you're joking. Or are you unaware of his poor opinion of GitHub's every design decision? To make it clear, the GitHub Linux repo is offered only as a read-only mirror.

Bad Practice

Posted Apr 25, 2025 15:38 UTC (Fri) by rweikusat2 (subscriber, #117920) [Link] (5 responses)

This wasn't 'shipped'. It was a release candidate published so that others could test it, precisely to catch bugs which might only occur in environments different from the one the person who put together the release candidate used. Considering that the behaviour of GCC 15 and GCC 14 differs with regard to this particular use of an attribute, it's also not inappropriate to refer to the actual issue as a GCC 14 bug, or at least a property of GCC 14 that the GCC developers no longer consider useful.

Bad Practice

Posted Apr 25, 2025 16:19 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (4 responses)

No, not at all. There is a set of supported GCC versions which is much larger than "whatever Linus has on his machine". If he didn't want to check what was in linux-next or on the mailing list he totally could have worked around the issue on his machine, but he shouldn't have pushed untested crap to the -rc.

Bad Practice

Posted Apr 25, 2025 17:21 UTC (Fri) by rweikusat2 (subscriber, #117920) [Link] (3 responses)

Then, what's your theory about the reason for this change of behaviour from GCC 14 to GCC 15, if it was neither a bugfix nor something the GCC developers considered a necessary improvement? Random mutation perhaps?

The very point of having a release candidate is to enable testing by others. It's not a release and bugs are expected.

Bad Practice

Posted Apr 25, 2025 18:21 UTC (Fri) by marcH (subscriber, #57642) [Link]

> The very point of having a release candidate is to enable testing by others. It's not a release and bugs are expected.

Come on, failure to compile (!) with everything except a pre-release GCC is absolutely not expected from a _release candidate_. Yes, people are expected to test release candidates and find bugs but they are not expected to waste everyone's time with that sort of issue in 2025. This steals from actual test time.

The Error of -Werror

Posted Apr 25, 2025 18:45 UTC (Fri) by mussell (subscriber, #170320) [Link]

So right now if you try to build the 6.15-rc3 tag with GCC 14 and CONFIG_WERROR, the error you will get is
/media/net/src/linux/drivers/acpi/tables.c:399:1: error: ‘nonstring’ attribute ignored on objects of type ‘const char[][4]’ [-Werror=attributes]

399 | static const char table_sigs[][ACPI_NAMESEG_SIZE] __initconst __nonstring = {

And the GCC documentation for nonstring, says that
The nonstring variable attribute specifies that an object or member declaration with type array of char, signed char, or unsigned char, or pointer to such a type is intended to store character arrays that do not necessarily contain a terminating NUL.
According to the C standard (and contrary to popular belief), char* and char[] are two distinct types as the latter has storage associated with it (ie. in .rodata) while the former is a single machine word (assuming pointers are represented by machine words.) What seems to have changed in GCC 15 is that you can now declare an array of char arrays as nonstring. On older compilers, trying to use an attribute where it can't be used gives a warning from -Wattributes, which is upgraded to an error by -Werror.

From my perspective, GCC did the right thing by allowing nonstring to be applied to char[][] since it aligns with programmers' expectations that char*[] and char[][] are basically the same. In fact, I consider GCC <15's behaviour a bug and I see no reason not to backport this change to earlier versions. Really this is just -Werror doing -Werror things.

Bad Practice

Posted Apr 26, 2025 8:42 UTC (Sat) by jwakely (subscriber, #60262) [Link]

There's no GCC bug here. GCC 15 has a new feature, which isn't supported by GCC 14 so it gives a warning, which is escalated to breaking the build because of -Werror

Using new features needs to be gated on version checks or feature test macro checks (e.g. using __has_attribute). This has nothing to do with whether you're using a snapshot from two weeks before the GCC 15 release or you're using the final GCC 15.1 release, the behaviour is the same, because it's not a bug in some unstable early development preview. It's the expected behaviour of GCC 15.
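A hedged sketch of that kind of gating (the macro names and version cutoff are illustrative, not the kernel's actual definitions). Note that __has_attribute() alone does not cover the case in this article, since GCC 14 already knows the attribute and only GCC 15 accepts it on arrays of char arrays, hence the additional version check:

    #ifndef __has_attribute
    # define __has_attribute(x) 0	/* fallback for compilers without it */
    #endif

    #if __has_attribute(__nonstring__)
    # define __nonstring		__attribute__((__nonstring__))
    #else
    # define __nonstring
    #endif

    /* Hypothetical extra macro for char[][N] objects, which only GCC 15
     * and later accept the attribute on. */
    #if defined(__GNUC__) && __GNUC__ >= 15
    # define __nonstring_array	__attribute__((__nonstring__))
    #else
    # define __nonstring_array
    #endif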

Imagine: the future

Posted Apr 24, 2025 17:17 UTC (Thu) by marcH (subscriber, #57642) [Link] (22 responses)

Imagine a programming language where a string is a real type and not just an array of bytes. Maybe the language could keep track of the length instead of using a special escape character?

Imagine CI environments that continuously compile and run some basic checks with a range of toolchains and environments.

Imagine not upgrading your OS the day before a release. Or adding last minute, unreviewed and barely tested changes, a.k.a "Friday Deployment".

Imagine some "team development software" that automagically connect dots and people with this new fancy "hyperlink" concept, so people spend less time searching email archives and git logs. It could be called a "forge"?

All too futuristic, sorry. I bet no one has ever ventured in such uncharted territories.

Imagine: the future

Posted Apr 25, 2025 8:20 UTC (Fri) by eru (subscriber, #2753) [Link] (21 responses)

C "strings" work the way they do because C is a low level language, where you want to be able to do low-level things when necessary. It's a feature, not a deficiency.
If you don't need byte-level control, there are of course lots of languages with smart strings and other nice things, and most userland applications should definitely be written in such languages, not C.

Imagine: the future

Posted Apr 25, 2025 8:29 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (2 responses)

Note that there's nothing stopping C from having better size-annotated strings except the dearth of useful APIs for such things in the standard library and crappy package management. Personally, I think size-annotated strings are *better* than NUL-terminated strings because you don't have to reallocate to get a substring that doesn't happen to coincide with the end of the source string. You just return a struct with the size and an interior pointer. Of course, the lack of tracking ownership for such things in C is also a major problem (which C++ indeed has with `std::string_view`), but I think it'd be much nicer overall than having to juggle `\0` bytes at the right places.
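A minimal sketch of the kind of size-annotated string view being described (all names made up for illustration): taking a substring is just pointer arithmetic on the view, with no allocation and no NUL byte written into the source.

    #include <stdio.h>
    #include <stddef.h>

    /* A pointer plus a length; no trailing NUL required. */
    struct strview {
        const char *data;
        size_t len;
    };

    /* A real implementation would bounds-check off and len against s.len. */
    static struct strview sv_sub(struct strview s, size_t off, size_t len)
    {
        struct strview r = { s.data + off, len };
        return r;
    }

    int main(void)
    {
        static const char buf[] = "EPERMENOENT";	/* no separators needed */
        struct strview all = { buf, sizeof(buf) - 1 };
        struct strview eperm = sv_sub(all, 0, 5);

        printf("%.*s\n", (int)eperm.len, eperm.data);	/* prints EPERM */
        return 0;
    }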

Imagine: the future

Posted Apr 26, 2025 11:25 UTC (Sat) by LtWorf (subscriber, #124958) [Link] (1 responses)

You would need to hide writing behind an API to have copy on write, and keep track of the owner. Basically a C++ QString.

Imagine: the future

Posted Apr 28, 2025 12:16 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

And…? `strcat` is a function and not some operator, so it's not like NUL-terminated strings really "win" there either. IMO, removing direct writes via pointers to strings would be a further benefit.

Imagine: the future

Posted Apr 25, 2025 10:15 UTC (Fri) by tialaramex (subscriber, #21167) [Link] (17 responses)

No. C is like this because it's just barely enough to get software out the door in the mid-early 1970s.

You can do all this stuff "where you want to be able to do low-level things when necessary" in Rust, but Rust wouldn't have fitted on the minicomputer a research team owned in 1974. The question is, why do we still care about whether the compiler could fit on a 50+ year old machine? I argue we should not care about that.

Dennis specifically tried to get fat pointers into C later and this proposal was rejected. Fat pointers, if you don't know, are how Rust's &str knows how big that string slice is - instead of an ordinary pointer it's a pointer plus more data the same size, in this case a length. If you're register starved (as older machines often were) this is expensive, but modern ISAs often have lots of named registers (and far more actual registers behind them for perf reasons) so this is an excellent choice on anything vaguely modern.

Fat pointers are also how Rust chooses to do dynamic dispatch, a pointer to "A bird of some kind" plus a pointer to "A table of functions for this kind of bird" is a fat pointer which enables us to emit code which doesn't care specifically what kind of bird this is, only that it's a bird and here's how to do bird things with it.

Imagine: the future

Posted Apr 25, 2025 11:15 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (1 responses)

I'm aware of why it is like it is from its roots. But improvements can be done. At least conceptually, because…

> Dennis specifically tried to get fat pointers into C later and this proposal was rejected.

Well, that's disappointing. Sometimes the standardization committees are their own worst enemy :( . (FD: I'm a member of the C++ committee, but not C.)

Imagine: the future

Posted Apr 27, 2025 7:56 UTC (Sun) by wahern (subscriber, #37304) [Link]

In Ritchie's proposal[1] the fat pointers weren't just a 2-tuple like Rust's str. They were an N-tuple, according to the dimension of the array. char[][] would require a 3-word fat pointer, which is actually more space efficient compared to, say, an array of Rust str's, while preserving the ability to cast between fat and non-fat multidimensional arrays without copying. Also, technically, Rust therefore doesn't actually have fat pointers in the sense of Ritchie's proposal.

I don't know why his approach was rejected. Perhaps it was the same reason Rust's fat-pointer-like data types aren't as general as what Ritchie proposed? But C99 ended up adopting the alternative approach he argued against--VLAs. And proposals for addressing some of the criticisms he pointed out seem poised for adoption, though they unfortunately missed the C23 window.

[1] See https://web.archive.org/web/20151226050349/https://www.be...

Imagine: the future

Posted Apr 25, 2025 11:52 UTC (Fri) by uecker (guest, #157556) [Link] (3 responses)

I think the parent is right. Having a fat pointer means you embed some hidden information, and this is what is not "low-level." You can have length-prefixed string pointers in C just fine and I use them in some projects. You certainly do not need Rust for this. As some other commenter pointed out, the APIs are all just built around traditional C strings and we would need to agree on a single string pointer type to really get people to switch (which we should do in WG14).
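
A minimal sketch of what such a length-prefixed string could look like (hypothetical names, not from any particular project):

    #include <stdlib.h>
    #include <string.h>

    /* Length-prefixed string: the length sits directly in front of the
     * bytes, and no terminating NUL is required. */
    struct lpstr {
        size_t len;
        char   data[];               /* flexible array member */
    };

    static struct lpstr *lpstr_new(const char *src, size_t len)
    {
        struct lpstr *s = malloc(sizeof *s + len);
        if (s) {
            s->len = len;
            memcpy(s->data, src, len);
        }
        return s;
    }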

Imagine: the future

Posted Apr 25, 2025 13:28 UTC (Fri) by tialaramex (subscriber, #21167) [Link] (1 responses)

How is the fat pointer "hidden information"? What's hidden about it ?

The thing for WG14 to have done was land the fat pointer types for, say, C99. I think that would have been one of those "controversial at the time but rapidly proved correct" choices, like removing gets in C11. If you (or WG14 as a whole) make that happen in C2b that'd be welcome, but it's very late; I do not expect I would make any use of it, as the decades of C programming are behind me.

The Pascal-style length-prefixed string is completely orthogonal to the fat-pointer string slice. That's why, if I look inside a Rust binary, it has text like "EPERMENOENTESRCHEINTREIOENXIOE2BIGENOEXECEBADF" baked inside it; there aren't any zero terminators _or_ length prefixes, since neither is needed to refer to EPERM or E2BIG in that text with a slice.

Imagine: the future

Posted Apr 27, 2025 10:29 UTC (Sun) by wahern (subscriber, #37304) [Link]

There were two competing proposals, Tom MacDonald's Variable Length Arrays (http://jclt.iecc.com/Jct13.pdf#page=67), and Ritchie's Variable-Size Arrays (http://jclt.iecc.com/Jct22.pdf#page=5). The committee eventually went with MacDonald's, apparently. Both proposals seemed to emphasize the utility for numerical computing a la Fortran, not for automatic bounds checking. Also, neither resolved the function parameter syntax problem--i.e. array parameter decay. In Ritchie's proposal, same as in MacDonald's, you were (syntactically) passed a pointer to an array, which required (syntactic) dereferencing. Ritchie's example prototype: foo(int (*a)[?][?]). MacDonald's (and VLAs since C99): foo(int n, int m, int (*a)[n][m]). In both cases sizeof (*a) worked as expected, but resolved at run time.

Notably, Ritchie's proposal lacked variable-size automatic storage arrays. With the VLA proposal, you can declare arrays on the stack: int a[n][m];. In Ritchie's proposal you were required to use malloc (or alloca?): int (*a)[?][?] = (int (*)[n][m]) malloc(n * m * sizeof (int));. I don't think this was an intrinsic limitation; I think Ritchie just objected to the ambiguity of how and when sizeof evaluated the integral expression(s) used in the type definition (i.e. n and m above). MacDonald's paper mentions that the compiler would have to cache the evaluated value so it reflected the value of the type at its declaration; subsequently modifying n or m wouldn't change the result of sizeof. Except when using VLAs in parameters, or with variable-length structures (part of the original proposal), this wasn't quite true (or true but irrelevant), something fat pointers avoid.

It would have been better if we had ended up with Ritchie's proposal. But I surmise (based on those papers alone) that MacDonald's won the day because 1) it was easier to implement--no ABI changes nor introduction of fat pointers into the compiler architecture; 2) GCC already seemed to have most of the implementation, albeit with a slightly different syntax; 3) Ritchie gave short shrift to declaring automatic-storage variable-size arrays; 4) MacDonald was working at Cray so presumably was deemed to speak for pressing industry demands. #3 and #4 seem especially pivotal given everybody's preoccupation with numerical computing rather than bounds safety per se.
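
For readers who haven't used them, a small sketch (not taken from either proposal) of the C99 form, with sizeof on the VLA parameter resolving at run time:

    #include <stdio.h>

    /* sizeof (*a) is evaluated at run time: n * m * sizeof(int). */
    static void show(int n, int m, int (*a)[n][m])
    {
        printf("row block is %zu bytes\n", sizeof (*a));
    }

    int main(void)
    {
        int grid[3][4] = { 0 };
        show(3, 4, &grid);       /* prints 48 where int is 4 bytes */
        return 0;
    }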

Imagine: the future

Posted Apr 25, 2025 16:54 UTC (Fri) by khim (subscriber, #9252) [Link]

> I think parent is right. Having a fat pointer means you embed some hidden information and this is what is not "low-level."

That's just nonsense. By that reasoning we shouldn't have structs, or opaque pointers, or any other ways of “hiding” information… yet that's literally the bread and butter of any large project, including the Linux kernel.

> You certainly do not need Rust for this.

You need something-else-not-C for that, though.

We have all that stupidity because C literals behave the way they do, and without changing the language one couldn't fix that problem.

> As some other commenter pointed out, the APIs are all just build around traditional C strings and we would need to agree on a single string pointer type to really get people to switch (which we should do in WG14).

Yeah, that, too. Rust got saner strings than C because it started from scratch, while C++ is a mess because it didn't.

The question of “how often a new language has to be introduced” is a good one, but it feels as if the right answer is somewhere between “every 10 years” and “every 20 years”… with all languages being supported for about 3-5x as long.

Simply because certain things do need a full rewrite on new foundations with certain critical fixes embedded… and yet the only way to efficiently do that is the well-known “one funeral at a time” way… languages have to change with generations… and these happen once per approximately 15 years.

Imagine: the future

Posted Apr 25, 2025 15:11 UTC (Fri) by excors (subscriber, #95769) [Link] (2 responses)

I've recently been looking at FORTRAN 77 and it effectively takes the fat pointer approach too. You declare a character variable (i.e. a string) with a compile-time-constant size, like `CHARACTER*(72) MSG`. Subroutines can declare a dummy argument (parameter) as `CHARACTER*(*)`, meaning the caller determines the size of the string. Assignment will automatically truncate to the string's actual size, so buffer overflows are impossible (in this case); and assignment of a shorter string will pad the result with space characters. (Most built-in operations ignore trailing spaces, e.g. 'A' .EQ. 'A ' will return .TRUE., so you can have variable-length strings in statically-allocated storage without an explicit terminator.)

That also works when you pass a substring into a call, much like string slices in Rust, meaning the size may be dynamic and the compiler has to pass it as a hidden argument alongside the pointer to the bytes.

(Subroutines can also declare the dummy argument with an explicit size, in which case the size passed by the caller is ignored. If it's declared as larger than the actual size, you get undefined behaviour.)

It seems to get a bit tricky when you have functions that return `CHARACTER*(*)` (meaning the caller has to provide the storage and pass it as another hidden argument) and use concatenation (`MSG = "HELLO " // WORLD()` etc, meaning an efficient compiler has to figure out where the result of the concatenation is going to be assigned to and how much has already been written before it calls the function, so WORLD can write its output directly into a substring of MSG, and if there are zero bytes left then the compiler is explicitly allowed to not call the function at all). And character arrays are weird.

But in general, considering it was half a century ago, it doesn't seem an entirely terrible system.

Earlier versions of FORTRAN sound actually terrible since there was no CHARACTER type; apparently you'd just store multiple characters (maybe up to 10 depending on platform) in an INTEGER or DOUBLE PRECISION variable and manipulate them arithmetically. CHARACTER wasn't standardised until FORTRAN 77, long after the origins of C, so I guess it wouldn't have served as inspiration for C, but at least those ideas were around in the 70s.

Imagine: the future

Posted Apr 25, 2025 16:20 UTC (Fri) by eru (subscriber, #2753) [Link]

FORTRAN was (and I guess still is) used for number crunching, and strings were needed mainly for labelling data in output to fanfold paper. So you can

      WRITE 5,100
100   FORMAT(41HGET AWAY WITH NONEXISTENT STRING FEATURES)

Imagine: the future

Posted Apr 25, 2025 19:08 UTC (Fri) by Wol (subscriber, #4433) [Link]

(As I understand it, the "official" spelling is FORTRAN IV and earlier, Fortran 77 and later ...)

So Fortran actually has strings as part of the language. I'd have thought it had a double-size - the space available, and the space used.

And from using FORTRAN, I know we had a string library, and we just shoved them into an integer array, or used Hollerith variables, as in 16HThis is a string. I don't personally remember using Hollerith, though.

Cheers,
Wol

Imagine: the future

Posted Apr 25, 2025 16:28 UTC (Fri) by eru (subscriber, #2753) [Link] (7 responses)

I don't know much about Rust, but I am guessing it then also needs raw pointers and conversions between them and fat pointers, otherwise some low-level operations would be impossible to program.

I believe there is absolutely no point in extending C with such things now. C occupies its particular niche as a "just do what I say" language fine, and someone who is not happy with its feature set can and should use some more modern language, like Rust.

Imagine: the future

Posted Apr 25, 2025 17:14 UTC (Fri) by khim (subscriber, #9252) [Link]

> I don't know much about Rust, but I am guessing it then needs also raw pointers and conversions between them and fat pointers, otherwise some low-level operations would be impossible to program.

Sure. from_raw_parts is there for cases where it's needed (like in FFI).

> C occupies its particular niche as a "just do what I say" language fine

I would say that C occupies the niche of “it should just go away, eventually” language.

Just like we no longer use PL/I (like Multics did) or Pascal (like classic Mac OS did)… C should, eventually, go away, too.

It's just a bit sad that, with all the fascination with managed code, people stopped thinking about a replacement for low-level languages… we have got one, sure, but that was actually an accident; the plan was to produce yet another high-level language.

P.S. And, of course, it would be stupid to start the grand rewrite from the OS kernel: OS kernels are extremely stubborn and it takes a lot of time to do them right… but apparently we are slowly going in that direction anyway. Studies show that green, freshly written Rust code is comparable in quality to C code that has had 3-5 years of bugfixing… which means a Rust rewrite only makes sense when the code in question is being rewritten for some other reason, not just because “it's Rust, it's shiny, let's rewrite everything”… even if, somehow, people are still doing it anyway.

Imagine: the future

Posted Apr 25, 2025 17:16 UTC (Fri) by intelfx (subscriber, #130118) [Link] (1 responses)

> C occupies its particular niche as a "just do what I say" language fine

As repeatedly demonstrated by optimizing compilers, the problems start when C does *not* "just do what I say".

Imagine: the future

Posted Apr 25, 2025 18:53 UTC (Fri) by marcH (subscriber, #57642) [Link]

Mandatory reference...

https://queue.acm.org/detail.cfm?id=3212479 "C Is Not a Low-level Language"

Rust thin and fat pointers

Posted Apr 25, 2025 17:17 UTC (Fri) by farnz (subscriber, #17727) [Link] (3 responses)

In Rust, whether a raw pointer is "thin" or "fat" is determined by the type of the pointee; a *const u32 is always a thin pointer to a u32, while a *const (dyn Debug) is always a fat pointer. Note that wherever I say *const, there's an equivalent *mut for raw pointers to mutable memory, and you can only get mutable (exclusive) references from *mut pointers.

There's also an API (marked experimental, because there's a chunk of details that everyone wants to get right before it stabilises, after which it cannot be changed if it turns out to be bad) that lets you "split" any pointer into a *const () (which is always thin) and an opaque representation of the other part (either a zero-sized type for a thin pointer, or the other half of the fat pointer), and rebuild a pointer from a thin pointer and the opaque representation of the other half.

Additionally, there's casts that let you do any conversions between raw pointer types that you want to do, and a set of APIs (the "strict provenance" APIs) which exist to allow you to write code that will fail to compile if it does things that aren't guaranteed to work on all reasonable CPUs and compilers - at the expense of also failing to compile some things that are guaranteed to work on all reasonable compilers for your choice of CPU.

It ends up being a much bigger API surface for raw pointers than C has, but that's largely because where C's API surface says "make a mistake in a reasonable use of pointers, and you have UB", Rust is aiming for "there is a function in the API surface that will either work, or fail to compile, if your use of raw pointers is reasonable".

Rust thin and fat pointers

Posted Apr 26, 2025 14:06 UTC (Sat) by tialaramex (subscriber, #21167) [Link]

AFAIK the strict provenance APIs don't get you a guarantee that your code won't _compile_ if it can't actually work with a provenance model for pointers, only that Rust's tooling, such as MIRI, and hardware features such as CHERI, will be able to spot that this isn't working and blow up, rather than the program having some subtle issue that's basically impossible to debug.

AIUI, suppose we intend to smuggle a 2-bit value in the bottom two bits of pointers to a 4-aligned structure. Rust's compiler will allow us to write code to do: integer smuggled + 64-bit real pointer -> 64-bit smuggling pointer, and to do: 64-bit smuggling pointer -> integer smuggled + 64-bit real pointer. If we screwed up and in some cases are using the bottom six bits rather than the bottom two because our arithmetic was faulty, the compiler will not catch that, and the Rust will compile and maybe do something crazy. However, if we use MIRI (which is much slower) to test this code, MIRI will be able to understand that we smashed some pointer bits and never restored them; if it sees that happen in execution it will flag exactly what went wrong, meaning we can probably find and fix our bug if we can reproduce it under MIRI.
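
For comparison, the plain-C form of the trick being described (hypothetical helper names; just the low-bit tagging itself, with none of the provenance tracking under discussion):

    #include <stdint.h>

    /* Smuggle a 2-bit tag in the low bits of a 4-aligned pointer. */
    static void *tag_ptr(void *p, unsigned tag)
    {
        return (void *)((uintptr_t)p | (tag & 3u));
    }

    static void *untag_ptr(void *p)
    {
        return (void *)((uintptr_t)p & ~(uintptr_t)3u);
    }

    static unsigned ptr_tag(void *p)
    {
        return (unsigned)((uintptr_t)p & 3u);
    }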

Without a strict provenance API the code is still wrong, but the tooling just goes "Eh, it's a pointer, I have no idea", and so we can't be sure why it's corrupted or what went wrong. This will have to be true for many Linux kernel drivers doing MMIO - if we're supposed to emit a write to address 0x4008 and 0x4009 to flip bits in an interrupt controller, there's no provenance; those are just magic numbers with no internal rationale, they are simply the correct addresses - so the strict provenance APIs are not relevant, and a tool like MIRI can't understand what's going on (not that MIRI is suitable for debugging a kernel driver AFAIK). That is why parallel "exposed provenance" APIs were provided in Rust as well as strict provenance, for these cases where the tools can't help and you need to acknowledge that it's on you to choose the right magic addresses.

Rust thin and fat pointers

Posted Apr 26, 2025 14:16 UTC (Sat) by hmh (subscriber, #3838) [Link] (1 responses)

I sure hope people get this through CHERRI first, to ensure you could still run Rust on a future platform with hardware-based enforcing of pointer provenance (i.e. "fat pointers in hardware which get invalidated if you look at them wrong, and trap").

If there is *one* thing C got right, it is that standards-compliant C (which is *difficult* to write) actually respects pointer provenance and forbids type-conversion round-trip shenanigans that destroy hardware-based pointer provenance... But, as usual, the C compiler will be only too happy to do whatever you say and break that (which is why it is difficult to write such code without extra guidance).

Rust thin and fat pointers

Posted Apr 26, 2025 20:28 UTC (Sat) by tialaramex (subscriber, #21167) [Link]

Historically C did not have a coherent pointer provenance rule. Here's https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm Defect Report #260 from the start of this century in which provenance basically is summoned into existence to explain why question 2 "if two objects hold identical representations derived from different sources, can they be used exchangeably ?" has the answer "No" in C.

Martin Uecker (who has commented elsewhere in this LWN thread) helped write the forthcoming ISO/IEC TS 6010 which you could think of as like an "addendum" for standard C which says here's how we think provenance should work. It is functionally equivalent to what's offered in Rust, albeit obviously C's pointer types don't have member functions, etc. and so the API is IMO nowhere close to the same ergonomics.

Yes, if you use Rust's strict provenance APIs to meddle with pointers, what you're doing makes sense under CHERI hardware (just one R) just as it does under Rust's software MIRI, typically operations which remain sound under this scenario are things like "We'll hide some flag bits", "We're going to use provenance from pointer A with data from related pointer B" and "Look ma, XOR" all of which CHERI isn't bothered by.

The change to say Rust's usize is the same width as the _address_ of a pointer, rather than like C uintptr_t being the same width as the pointer itself, is what falls out of the existence of CHERI. The strict provenance APIs are compatible, but they'd make sense even if CHERI didn't exist.

Cataloguing errors and fixes

Posted Apr 24, 2025 17:59 UTC (Thu) by geofft (subscriber, #59789) [Link]

A couple years back at my day job we set up a Slack bot to auto-respond to particular error messages (with regex matching) and suggest what you should do about them. Usually we'd run into this situation with something that had been fixed on trunk, but would still affect people with out-of-date clones, people intentionally working on older branches, production systems that hadn't been redeployed, etc. A few of us noticed we were spending a lot of time manually replying on Slack, which also meant the person asking had to wait until someone saw their message and replied; the bot was a way to write a high-quality answer once and get it to people as soon as they asked. It got to the point where we would proactively set up regex matches and replies for migrations that we knew would have backwards incompatibilities, even before shipping the migration.

In this case, you could imagine Kees setting up a responder for these particular GCC errors and a reply saying that the patches to fix them are over here and are waiting to be pulled into mainline (assuming, hypothetically, that there were some system like Slack or even email that Linus would have posted a message to saying, "anyone run into these errors before?").

I've always thought that it would be a neat next step to integrate this directly into command-line output, somehow, so that if your terminal ran a command that produced an error, it could look it up against some trusted/curated source of known errors applicable to your work and display an alert if it recognized one. The key trick here is that the catalog of errors has to be online in some fashion: this isn't just about improving error messages or behavior in the code you have on disk, because you want to be able to intelligently respond to problems that were discovered after whatever git commit you're on. ("Online" could of course mean that you regularly download a database of errors and query it locally; it doesn't have to involve sending your build logs somewhere in real time, but the database has to be updated frequently.)

make -j allmodconfig

Posted Apr 25, 2025 9:45 UTC (Fri) by adobriyan (subscriber, #30858) [Link]

It took 30 minutes to compile amd64 allmodconfig on my previous potato (8c/16t) and it takes 13.5 minutes on the current one (16c/32t).

So it is ~4 hours per allmodconfig per 1c/2t core per 1 compiler.

Allyesconfig is even worse.

Good luck with that, dear volunteers.

Do not underestimate the impact of fedora breakage.

Posted Apr 25, 2025 11:29 UTC (Fri) by ballombe (subscriber, #9523) [Link] (9 responses)

GCC 15 will introduce a change of the default C standard level which causes some software to fail to build. For a list of examples, see <https://bugs.debian.org/cgi-bin/pkgreport.cgi?tag=ftbfs-g...>
However, GCC has a reliable and well-documented release timeline, so software projects know when they need to publish a new version to support gcc 15.

By releasing Fedora with an unreleased version of gcc, Fedora causes all such software projects to have to rush to fix gcc-15-related breakage before that date or appear broken to Fedora users.

This is not respectful of the gcc team nor of the other projects.

This is gcc 2.96 all over again.

Do not underestimate the impact of fedora breakage.

Posted Apr 25, 2025 12:38 UTC (Fri) by jwakely (subscriber, #60262) [Link] (7 responses)

No, this is overblown rubbish. Fedora does this *every year*, and the release of fedora 42 with a GCC 15.0.1 snapshot happened 10 days before the release of the final GCC 15.1 version.

Do you really think everybody was waiting for those last 10 days to fix their code, and were just caught unawares by fedora 42 being a few days sooner? Don't be silly.

As for not being respectful of the gcc team, the people who made the fedora changes are many of the same people releasing GCC 15. The maintainer for fedora's gcc package was also the release manager of GCC 15.1.

GCC 2.96 was a completely different situation, with a version that didn't exist upstream, containing unreleased changes unique to the "2.96" release. The snapshot in fedora 42 is almost identical to the final GCC 15.1, and will be updated to a newer 15.1.1 snapshot in a few days.

Do not underestimate the impact of fedora breakage.

Posted Apr 25, 2025 17:49 UTC (Fri) by geert (subscriber, #98403) [Link] (6 responses)

Can you build all packages (incl. the Linux kernel) in Fedora 42 with the compiler that comes with Fedora 42?

Do not underestimate the impact of fedora breakage.

Posted Apr 25, 2025 18:14 UTC (Fri) by pizza (subscriber, #46) [Link] (4 responses)

> Can you build all packages (incl. the Linux kernel) in Fedora 42 with the compiler that comes with Fedora 42?

Yes, that is a hard requirement for being shipped as part of F42.

(Though that may require shipping patches that are not yet upstream)

Do not underestimate the impact of fedora breakage.

Posted Apr 25, 2025 22:32 UTC (Fri) by AdamW (subscriber, #48457) [Link] (3 responses)

No it isn't. The F42 FTBFS (Fails To Build From Source) tracker still depends on several hundred open bugs - https://bugzilla.redhat.com/show_bug.cgi?id=2300528 . Not all of those are triggered by GCC 15, but some are for sure.

It's not practically possible to ensure *every* package in a distro as big as Fedora builds with a new compiler version, and the release date of the compiler is only really marginally related (GCC 15 is now officially released, but that doesn't mean all those apps have fixes upstream now).

Do not underestimate the impact of fedora breakage.

Posted Apr 26, 2025 0:00 UTC (Sat) by pizza (subscriber, #46) [Link] (1 responses)

> No it isn't. The F42 FTBFS (Fails To Build From Source) tracker still depends on several hundred open bugs - https://bugzilla.redhat.com/show_bug.cgi?id=2300528 . Not all of those are triggered by GCC 15, but some are for sure.

If it FTBFS, how can it be shipping as part of F42?

Do not underestimate the impact of fedora breakage.

Posted Apr 26, 2025 0:50 UTC (Sat) by AdamW (subscriber, #48457) [Link]

F42 contains the most recent successful build. That might be an early F42 build, an F41 build, or even in rare cases an F40 build. https://dl.fedoraproject.org/pub/fedora/linux/releases/42... has 17 packages with 'fc41' in the name, for example.

Do not underestimate the impact of fedora breakage.

Posted Apr 26, 2025 8:28 UTC (Sat) by jwakely (subscriber, #60262) [Link]

>GCC 15 is now officially released, but that doesn't mean all those apps have fixes upstream now

Right. Shipping gcc-15.0.1 ten days before the official release (and then updating to gcc-15.1.1 immediately after the release) is really not a big deal. It would have made no practical difference if GCC had happened to release two weeks earlier, or F42 had been delayed a couple of weeks.

It's not like we radically change GCC just days before a new major release, breaking the ABI of all the packages that have been built with it up to that point.

The story here is Linus being cavalier about pushing without adequate testing, not fedora's policies which have been this way for years.

Do not underestimate the impact of fedora breakage.

Posted Apr 26, 2025 8:34 UTC (Sat) by jwakely (subscriber, #60262) [Link]

You can build the kernel, of course, and 99% or more of other packages. But that has nothing to do with whether F42 shipped with gcc-15.0.1 or gcc-15.1.0 or gcc-15.1.1

Fedora rawhide had been using gcc-15 for months, long before F42 branched from rawhide.

Do not underestimate the impact of fedora breakage.

Posted Apr 25, 2025 12:43 UTC (Fri) by pizza (subscriber, #46) [Link]

> By releasing Fedora with an unreleased version of gcc, Fedora cause all such software project to have to rush to fix gcc-15 related breakage before that date or appear broken to fedora users.

You conveniently forget that Fedora shipping the bleeding edge of compilers has been their MO from the beginning, and that shipping an almost-released compiler happens every few releases. You also conveniently leave out the fact that Fedora packagers are the primary contributors of upstream fixes for problems that using bleeding-edge GCC uncovers.

> This is not respectful of the gcc team nor of the other projects.

...You mean the GCC folks that are employed by Red Hat... and produce and maintain the Fedora GCC packages?

Meanwhile, what's the effective difference for other projects? They will still need to be fixed for GCC 15 regardless, and approximately nobody pre-emptively tries major new compiler releases before they get packaged and shipped in at least one major distribution. And even then, they probably don't care what a bleeding-edge/rolling distribution like Fedora ships, because the only binaries they support are produced by ancient EL/LTS releases.

(On that note, I recently filed a bug against AppImage's tooling because it currently barfs when you try to use it with binaries produced with <3yr-old binutils)

> This is gcc 2.96 all over again.

Oh, you mean it produces C++ binaries that are ABI-incompatible with both the previous _and_ subsequent releases due to an ongoing years-long rewrite of the C++ standard library, plus other breakages caused by fixing innumerable spec-compliance bugs?

GCC pre-releases and venerable traditions

Posted Apr 25, 2025 11:35 UTC (Fri) by decathorpe (subscriber, #170615) [Link] (8 responses)

All even-numbered Fedora releases ship with new major versions of GCC, that's nothing new. And given how GCC and Fedora development cycles line up, it's usually a snapshot close to the first stable release of the new version:
The Fedora 42 mass rebuild was done with a very early GCC 15 snapshot (this is basically the first "real world" testing new GCC versions get every year!), and it was now released with a very late GCC 15 snapshot that isn't quite yet the new 15.1 release (though I don't really understand why 15.1 is the first stable release and not 15.0). It's definitely not an ideal situation, but arguably better than shipping a less-tested major version ~5 months after release instead.

GCC pre-releases and venerable traditions

Posted Apr 25, 2025 12:49 UTC (Fri) by jwakely (subscriber, #60262) [Link] (7 responses)

>though I don't really understand why 15.1 is the first stable release and not 15.0

Because what number do you give "not yet 15.0" snapshots? 14.99?

15.0.0 means the development trunk for the 9-10 months of new features development, then 15.0.1 is a pre-release snapshot as we approach the release, then 15.1.0 is the first release. After that release, new snapshots from that branch are 15.1.1, until the 15.2.0 release then snapshots will be 15.2.1 and so on.

https://gcc.gnu.org/develop.html#num_scheme

GCC pre-releases and venerable traditions

Posted Apr 25, 2025 13:51 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (3 responses)

> Because what number do you give "not yet 15.0" snapshots? 14.99?

That's one option. What we do on CMake is, when branching, we set the `main` branch's patch number "to the stratosphere", usually with YYYYMMDD, but a static "high" number like 100 also works (though, e.g., the kernel may prefer 1000 here since it *does* reach 100+ patch releases). The minor level is shared between the latest release branch and `main`. So if you have official releases:

14.1.0 (released 2025-04-01)
14.1.1 (released 2025-04-13)
14.2.0 (released 2025-04-23)

while, correspondingly, `main` could be:

14.1.20250401 # post-14.1.0
14.1.20250402

14.1.20250412
14.1.20250413 # post-14.1.1
14.1.20250414

14.1.20250422
14.2.20250423 # post-14.2.0

We bump the patch level every night, but it may also make sense to do so only if the last change was not a patch-date bump. This allows main-tracking users to get at least some gradient on headed-to-the-next-version development if needed while not "reserving" a magic `.0` patch version interpretation or doing even/odd splits.

GCC pre-releases and venerable traditions

Posted Apr 26, 2025 8:16 UTC (Sat) by jwakely (subscriber, #60262) [Link] (2 responses)

But this means that main still has the major version 14 after a release branch is created for 14.0, and presumably stays like that all the way until main is about to become 15.0?

An advantage of the GCC scheme is that the trunk became 16 as soon as a release branch for gcc-15 was created. There can be no doubt about whether somebody using an early GCC 16 snapshot is actually using some version of GCC 15.x, which is good because they're now separate branches with diverging histories and diverging content.

The GCC scheme works *very* well for us in practice. Some people just don't like it because it's not what they're used to. But it's just a one-based numbering system for the versions within a given release series, it's not all that whacky. Not everything has to start counting from one :-)

GCC pre-releases and venerable traditions

Posted Apr 26, 2025 9:11 UTC (Sat) by jwakely (subscriber, #60262) [Link] (1 responses)

Also, your scheme has a major disadvantage: 14.1.20250422 looks like it comes somewhere between 14.1.0 and 14.2.0, but that's only true chronologically, not in terms of features (or bugs). You just have to "know" that a yyyymmdd value means the major number is misleading. Maybe that works for you, but it wouldn't for GCC; we really want to know that GCC 16.something is not from the same branch as 15.something.

The GCC release numbering scheme is actually very logical and perfectly fits our branching and release workflow. It's funny, nobody has a problem with one-based numbering for major releases, everybody accepts that 1.0 is the first proper release and 0.9 comes before that, but as soon as we do that for 15.1.0 being the first release of the 15 series and 15.0.1 being before that, people are all "ugh, why do you have to be weird, it's not cool to be weird, you know".

GCC pre-releases and venerable traditions

Posted Apr 28, 2025 12:10 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

FWIW, I was answering "what else", not giving suggestions for GCC :) .

In any case, I like it and GCC's more than the even/odd flipping that used to be way more prevalent. AFAIK, HDF5 is a holdout on that front, but I struggle to think of others I come across these days.

GCC pre-releases and venerable traditions

Posted Apr 25, 2025 14:18 UTC (Fri) by decathorpe (subscriber, #170615) [Link] (2 responses)

Why introduce even more special numbers? Ways to indicate pre-releases that are not based on assigning arbitrary meaning to "special" numbers exist, like 15.0.0-pre.

GCC pre-releases and venerable traditions

Posted Apr 25, 2025 16:49 UTC (Fri) by jwakely (subscriber, #60262) [Link] (1 responses)

15.0.0 generally sorts before 15.0.0-pre though, and the advantage of the current system is that it doesn't need to add any new parts, it fits the longstanding major.minor.patchlevel numbering scheme.

GCC pre-releases and venerable traditions

Posted May 22, 2025 15:25 UTC (Thu) by sammythesnake (guest, #17693) [Link]

Debian uses a tilde for version numbers that should reliably sort before, rather than after, an upcoming version. You'll see this with the "backports" section (i.e. versions from testing adapted to be installed on the stable release of Debian).

E.g. there's foo-1.8 in stable, foo-2.0~bp1 in backports, and foo-2.0 in testing. When the next stable release of Debian happens, foo-2.0 will be considered a "newer" version than foo-2.0~bp1 and installed preferentially.

Typically, the last backports version is identical with the new stable version except for being compiled against (and declaring dependencies on) the older libraries available in stable.

Conceptually, it's like having a patch version number of "-1" which naturally sorts earlier than a patch version of "0" (i.e. foo-2.0~bp1 sorts like 2 point 0 point -1 point (bp)1 < 2 point 0 point 0)

I like the semantics of this approach, but it does require that whatever is sorting version numbers understands what the tilde means, and it isn't really a widespread enough idiom for that to be something one can assume in general.

Fedora not stable distro.

Posted Apr 25, 2025 14:38 UTC (Fri) by r1w1s1 (guest, #169987) [Link] (2 responses)

Personally, I’ve always seen Fedora as Red Hat’s testing lab — not a stable distro. But Linus uses it as a daily driver anyway. 🤷

Fedora not stable distro.

Posted Apr 25, 2025 18:28 UTC (Fri) by marcH (subscriber, #57642) [Link] (1 responses)

Using a bleeding edge distro as a "daily driver" is fine. What is not fine is not having at least one _other_, more stable system at least compile-testing release candidates. Ideally this would of course be done in some basic kernel.org CI but short of having any CI for release candidates for some unknown reason, some other box or VM would also do the trick.

Fedora not stable distro.

Posted Apr 27, 2025 0:08 UTC (Sun) by shemminger (subscriber, #5739) [Link]

Agree, the point of being an upstream developer is to be a scout and clear the path.
It is much worse when some enterprise customer reports an issue when a new widget comes out.

on Fedora and GCC

Posted Apr 25, 2025 22:09 UTC (Fri) by AdamW (subscriber, #48457) [Link]

"Fedora 42 has been released, though, and the Fedora developers, for better or worse, decided to include a pre-release version of GCC 15 with it as the default compiler. The Fedora project, it seems, has decided to follow a venerable Red Hat tradition with this release."

Well, no, it's a bit more complicated than this.

We include a technically-prerelease GCC with *every* even-numbered Fedora release. We have done so all the way back to Fedora 28. Even-numbered Fedora releases have .0.1 gcc builds, which - per the GCC plan, https://gcc.gnu.org/develop.html - are late (stage 4) pre-releases. The first release in a GCC series is always versioned .1. Fedora 28 had gcc 8.0.1 (pre-release for 8.1). 30 had 9.0.1, 32 had 10.0.1 and so on.

This is nothing like the ancient-history GCC 2.96 thing, because it's all worked out and co-ordinated with the GCC developers, who are *entirely* aware of it. It's effectively part of the GCC development process: rebuilding everything in Fedora with the under-development GCC shakes out a lot of bugs and issues that otherwise might not be found till much later.

The kernel team in general is also aware of this, AFAIK. It really does seem like Linus was, uh, kinda headstrong in just slapping in a 'fix' for his box instead of working with Kees, here.

Attribute on type vs variable

Posted Apr 27, 2025 6:55 UTC (Sun) by wahern (subscriber, #37304) [Link] (1 responses)

> The __nonstring__ attribute applies to variables, not types, so it must be used in every place where a char array is used without trailing NUL bytes. He would rather annotate the type, indicating that every instance of that type holds bytes rather than a character string, and avoid the need to mark rather larger numbers of variable declarations. But that is not how the attribute works, so the kernel will have to include __nonstring markers for every char array that is used in that way.

I don't think it could work well as an attribute on the type. The relevant type is a char array, not just a char, nor (in this scenario) a pointer-to-char. You can typedef a char array, but only as either an incomplete array type (typedef char foo[]) or an array of a specific size (typedef char foo[42]). I don't think there's a way to use the typedef while also being able to set the size at the variable declaration, as required by the example (cachefiles_charmap[64]). And you want to be able to set the size at the declaration, not the definition, in order to be able to use a string literal for initialization. You can use an initializer list ({ 'A', 'B', ...}), but that's not only more cumbersome, it entirely defeats the purpose, as there would be no diagnostic to suppress in that case. You could in theory just set the attribute on the scalar typedef (typedef char foo) and make it work in the compiler, as Linus clearly had in mind, but it's ugly and arguably nonsensical--you're setting an attribute on a type that's only meaningful when the type is used derivatively. This kind of attribute exposes the gaps in C array semantics.
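
Concretely, the per-variable annotation looks something like this (a sketch with made-up names):

    /* The attribute is attached to each declaration, since it cannot
     * usefully live on a type. 16 bytes, deliberately no NUL. */
    static const char hexmap[16] __attribute__((__nonstring__)) =
        "0123456789abcdef";

    /* A char-array typedef must be either incomplete or of fixed size,
     * so it cannot leave the size to be chosen at each declaration. */
    typedef char tag4[4];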

Seems what's really wanted here is the ability to use string literal syntax, but without an implicit trailing NUL, obviating the need to warn about it being dropped. But that's not the feature GCC has. :( This problem is kind of par for the course for a lot of C extensions, especially ones involving arrays--leaky abstractions predicated on internal compiler semantics and conspicuously highlighting gaps in C array semantics.

Attribute on type vs variable

Posted Apr 27, 2025 9:34 UTC (Sun) by jwakely (subscriber, #60262) [Link]

>Seems what's really wanted here is the ability to use string literal syntax, but without an implicit trailing NUL, obviating the need to warn about it being dropped

I agree that some special literal syntax to exclude the NUL would be useful, but I think there's also value in marking the variable as "not a string". That could allow the compiler to warn if you pass it to strlen, strcpy, etc.

That might not be very relevant to the kernel but seems useful in general.

But outside the kernel I would just use C++ which has better ways to do all this anyway (user-defined literals, string views, std::array, ...)


Copyright © 2025, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds