nullability annotations in C

Posted Feb 11, 2025 22:53 UTC (Tue) by khim (subscriber, #9252)
In reply to: nullability annotations in C by alx.manpages
Parent article: Maintainer opinions on Rust-for-Linux

> You may be happy that C23 changed that.

So it took 33 years to replace one function. Great. How quickly would it propagate through a project of Linux's size, at this rate? 330 or 3300 years?

> So yes, the language evolves slowly, but it continuously evolves into a safer dialect.

Sure, but evolution speed is so glacial that it only makes sense if you postulate, by fiat, that full rewrite in another language is not an option.

And don't forget that backers may, at some point, just throw in the towel, unable to deal with the excruciating inability to change anything of substance.

Apple have Swift, Google is still picking between Rust and Carbon, but eventual decision would be one or another, but, notably, not C or C++, Microsoft seems to think about Rust too… so who would be left to advance C and C++ after all players that actually wanted to change it would leave?

> But it's not been forgotten.

The question is not whether it's forgotten or not, but whether we can expect to see programs with most pointers marked either const or noalias.

And the simple answer, given the above example, is that one may spend maybe 10 or 20 years rewriting the Linux kernel in Rust (yes, that's big work, but a journey of a thousand miles begins with a single step)… or go with C – and then never achieve that. Simply because in 50 or 100 years, well before C would become ready to adopt such a paradigm, everything would be rewritten in something other than C anyway.

Simply because it's hard to find anyone under 40 (let alone anyone under 30) who may even want to touch C if they have a choice.

> Having researched into string APIs for some years almost half of my workday, returning a pointer is usually more useful.

No, it's not. It's only “more useful” if you insist on zero-string abominations. If your strings are proper slices (or standalone strings on the heap… only C conflates them, C++, Rust and even such languages as Java and C# have separate types) then returning pointer is not useful. You either need to have generic type that returns slice or return index. And returning index is more flexible.

> You just need to get right the issue with qualifiers.

No, you also need to guarantee that C would continue to be used. That's a tall order.

I wonder why no one ever made proper C/C++ replacement (as in: language that is designed to interoperate with C/C++ but is not built on top of C/C++ core) before Rust… but now, when it's done, we may finally face the question about why should we continue to support strange and broken C semantics with null-terminated strings… invent crazy schemes, CPU extensions – all to support something that shouldn't have existed in the first place.

That's not the question for the next 3-5 years, but in 10 years… when world would separate into competing factions… it would be interesting to see if any of them would stay faithful to C/C++ and what would they pick up instead to develop “sovereign software lands”.

nullability annotations in C

Posted Feb 11, 2025 23:22 UTC (Tue) by alx.manpages (subscriber, #145117) [Link] (10 responses)

> So it took 33 years to replace one function.

Not one. The entire libc has been updated in that direction.

> How quickly would it propagate through project of Linux size, at this rate?

The problem was devising the way to do it properly. Once we have that, the idea can be propagated more easily. It's not like you need a decade to update each one function.

The kernel can probably implement this for their internal APIs pretty easily. The kernel already supports C11, so all the pieces are there. The bottleneck is in committer and reviewer time.

> Simply because it's hard to find about under 40 (let alone anyone under 30) who may even want to touch C if they have a choice.

I'm 31. I hope to continue using C for many decades. :)

> It's only “more useful” if you insist on zero-string abominations.

I do enjoy NUL-terminated strings, yes, which is why I find returning pointers more useful. The problems with strings, and why they have been blamed for so long, weren't really the fault of strings themselves, but of the language, which wasn't as expressive as it could be. That's changing.

nullability annotations in C

Posted Feb 11, 2025 23:45 UTC (Tue) by khim (subscriber, #9252) [Link] (7 responses)

> I'm 31. I hope to continue using C for many decades. :)

You certainly would be able to do that, after all Cobol 2023 and Fortran 2023 both exist.

The question is how many outside of “Enterprise” (where nothing is updated… ever, except when it breaks down and falls apart completely) would care.

> The problems with strings, why they have been blamed for so long, wasn't really fault of strings themselves, but of the language, which wasn't as expressive as it could be.

The problem with strings are precisely strings. It's not even the fact that NUL is forbidden to be used inside (after all Rust strings are guaranteed UTF-8 inside).

The problem lies with the fact that something that should be easy and simple (just look in register and you know the length) is incredibly hard with null-terminated strings. It breaks speculations, requires special instructions (like what SSE4.2 added or “no fault vector load”, used by RISC-V), plays badly with many algorithms (why operation that shouldn't change anything in memory at all – like splitting string in two – ever changes anything?).

Null-terminated strings are not quite up there with the billion dollar mistake but they are very solid contenders for the 2nd place.
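
As a rough illustration of the splitting point (a sketch only; `struct slice` and `slice_split` are hypothetical names, not from any library mentioned in this thread): with a pointer-plus-length view, splitting a string only computes two new views and never writes to the buffer, whereas strtok(3)-style splitting of a NUL-terminated string has to overwrite the separator with a NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-plus-length view; not part of any real library. */
struct slice {
    const char *ptr;
    size_t      len;
};

/*
 * Split `s` at the first occurrence of `sep` without writing to memory.
 * On success, *head is the part before `sep`, *tail the part after it,
 * and 1 is returned.  If `sep` is absent, 0 is returned and the outputs
 * are left untouched.
 */
static int slice_split(struct slice s, char sep,
                       struct slice *head, struct slice *tail)
{
    assert(head != NULL && tail != NULL);

    /* Avoid calling memchr() on an empty (possibly NULL) slice. */
    const char *p = s.len ? memchr(s.ptr, sep, s.len) : NULL;
    if (p == NULL)
        return 0;

    size_t i = (size_t)(p - s.ptr);
    *head = (struct slice){ s.ptr, i };
    *tail = (struct slice){ p + 1, s.len - i - 1 };
    return 1;
}
```

Because the input is only read, the same function works on string literals and other read-only data, which a strtok(3)-style split cannot do.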

> The problem was devising the way to do it properly.

Try again. _Generic is C11 and tgmath is C99.

Means were there for 12 or 24 years (depending on how you are counting), there was just no interest… till 100% guaranteed job-security stance that C would never be replaced (simply because all prospective “C killers” were either built around C core or were unable to support effective interop with C) was threatened by Rust and Swift.

Then and only then wheels started moving… but I'm pretty sure they would pretty soon be clogged again… when it would be realized that on one side only legacy projects are interested in using C anyway and the other side the majority of legacy projects just don't care to change anything unless they are forced to do that.

> The kernel can probably implement this for their internal APIs pretty easily. The kernel already supports C11, so all the pieces are there. The bottleneck is in committer and reviewer time.

Yeah, but that's precisely the issue: while existing kernel developers may want to perform such a change, they are already overworked and overstressed… and newcomers, normally, want nothing to do with C. I guess the fact that exceptions like you exist gives it a chance… but it would be interesting to see how it'll work.

Kernel is one of the few projects that can actually pull that off.

nullability annotations in C

Posted Feb 13, 2025 8:46 UTC (Thu) by aragilar (subscriber, #122569) [Link] (6 responses)

I do wonder how much of people not adopting new features of C is/was due to the primary compiler of a certain OS basically limiting itself to C89, and the other options not integrating with said compiler/OS?

nullability annotations in C

Posted Feb 13, 2025 9:59 UTC (Thu) by taladar (subscriber, #68407) [Link] (5 responses)

So called enterprise/long term support distros also have a lot to answer for in terms of holding back the adoption of new features because they would be a problem for their back ports to their ancient 10-15 year old versions.

nullability annotations in C

Posted Feb 13, 2025 10:28 UTC (Thu) by khim (subscriber, #9252) [Link] (4 responses)

How would backports hurt anyone? Sure, you can only use GCC 12 on RHEL 7, but that beast was released more than ten years ago, before first version of Rust, even!

Sure, at some point backporting stops, but I don't think the hold ups are “enterprise distros” (at least not RHEL specifically): these, at least, provide some updated toolchains. GCC 12 was released in a year 2022, thus it's pretty modern, by C standards. “Community distros” don't bother, most of the time.

nullability annotations in C

Posted Feb 14, 2025 14:26 UTC (Fri) by taladar (subscriber, #68407) [Link] (3 responses)

Your link goes to the Developer Toolset, those are optional tools that can be used on the platform but are not used for the platform itself.

nullability annotations in C

Posted Feb 14, 2025 14:29 UTC (Fri) by khim (subscriber, #9252) [Link] (2 responses)

Who even cares what they use for the development of the platform itself?

Developers shouldn't even care about that, it's an internal implementation detail.

nullability annotations in C

Posted Feb 17, 2025 8:49 UTC (Mon) by taladar (subscriber, #68407) [Link] (1 responses)

It is relevant for the language features that can be used in code backported to the versions of software used in their distro, the entire point of this discussion thread. Or rather the language features that can not be adopted yet because people who want to do those backports will complain.

nullability annotations in C

Posted Feb 17, 2025 9:15 UTC (Mon) by khim (subscriber, #9252) [Link]

How is that relevant? Linux kernel was all too happy to adopt features not implemented by clang, and patches needed to support clang – and clang, at that point, was already used by Android, the most popular Linux distribution used by billions… why should RHEL be treated differently?

Let RHEL developers decide what to do with their kernel: they can create special kgcc package (like they already did years ago) or rework features in any way they like.

nullability annotations in C

Posted Feb 12, 2025 6:14 UTC (Wed) by interalia (subscriber, #26615) [Link] (1 responses)

Out of interest, what are the compiler version requirements to use the new string functions? I don't do C/C++ much any more nowadays so I'm extremely rusty and out of date.

In theory the kernel could switch easily enough given review time as you say, but would doing this also require bumping the required compiler version for the kernel? If so I'm not sure if they would feel safe for doing so for quite a few years, and Rust would also advance in the meantime.

nullability annotations in C

Posted Feb 12, 2025 8:41 UTC (Wed) by alx.manpages (subscriber, #145117) [Link]

> Out of interest, what are the compiler version requirements to use the new string functions?

Any compiler that supports C11 should be able to support these.

Here's an example of how to write such a const-generic API:

```
alx@devuan:~/tmp$ cat strchr.c
const char *my_const_strchr(const char *s, int c);
char *my_nonconst_strchr(char *s, int c);

#define my_strchr(s, c) \
( \
_Generic(s, \
char *: my_nonconst_strchr, \
void *: my_nonconst_strchr, \
const char *: my_const_strchr, \
const void *: my_const_strchr \
)(s, c) \
)
alx@devuan:~/tmp$ gcc -Wall -Wextra -pedantic -S -std=c11 strchr.c
alx@devuan:~/tmp$
```
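
To make the effect visible, here is a trimmed-down variant of the macro above with definitions filled in. It is only a sketch: the two function bodies simply forward to strchr(3) and are my assumption about what such wrappers would do, and the `void *` cases are dropped for brevity.

```c
#include <assert.h>
#include <string.h>

/* Illustrative definitions; both just forward to strchr(3). */
static const char *my_const_strchr(const char *s, int c)
{
    assert(s != NULL);
    return strchr(s, c);
}

static char *my_nonconst_strchr(char *s, int c)
{
    assert(s != NULL);
    return strchr(s, c);
}

/* Pick the function, and thus the return type, from the type of `s`. */
#define my_strchr(s, c) \
( \
    _Generic(s, \
        char *:       my_nonconst_strchr, \
        const char *: my_const_strchr \
    )(s, c) \
)
```

With this, `my_strchr(p, 'x')` yields a `char *` when `p` is a `char *` and a `const char *` when `p` is a `const char *`, so assigning the latter to a plain `char *` draws a diagnostic about the discarded qualifier instead of silently dropping const.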

nullability annotations in C

Posted Feb 12, 2025 11:17 UTC (Wed) by alx.manpages (subscriber, #145117) [Link] (19 responses)

> > Having researched into string APIs for some years almost half of my workday, returning a pointer is usually more useful.
>
> No, it's not. It's only “more useful” if you insist on zero-string abominations.
> If your strings are proper slices (or standalone strings on the heap…
> only C conflates them, C++, Rust and even such languages as Java and C# have separate types)
> then returning pointer is not useful.
> You either need to have generic type that returns slice or return index.
> And returning index is more flexible.

Actually, I worked at a project that used counted strings (not terminated by a NUL, unless we needed to pass them to syscalls), and even there, functions returning a pointer were overwhelmingly more used than ones returning a count.

Consider the creation of a counted string:

```
s.str = malloc(s1.len + s2.len + s3.len);
p = s.str;
p = mempcpy(p, s1.str, s1.len);
p = mempcpy(p, s2.str, s2.len);
p = mempcpy(p, s3.str, s3.len);
s.len = p - s.str;
```

Equivalent code that uses a count would be more complex (and thus more unsafe):

```
s.str = malloc(s1.len + s2.len + s3.len);
s.len = 0;
s.len += foo(s.str + s.len, s1.str, s1.len);
s.len += foo(s.str + s.len, s2.str, s2.len);
s.len += foo(s.str + s.len, s3.str, s3.len);
```
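
For reference, a self-contained version of the pointer-returning variant might look like this. It is a sketch under assumptions: `struct cstr`, `copy_end` and `cstr_concat3` are hypothetical names, and `copy_end` stands in for mempcpy(3), which is a GNU extension rather than standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: `len` bytes at `str`, no NUL terminator. */
struct cstr {
    char   *str;
    size_t  len;
};

/* Portable stand-in for mempcpy(3): copy n bytes, return one past the end. */
static void *copy_end(void *dst, const void *src, size_t n)
{
    if (n != 0)
        memcpy(dst, src, n);
    return (char *)dst + n;
}

/*
 * Concatenate three counted strings into *out.
 * Returns 0 on success, -1 on size overflow or allocation failure
 * (in which case *out is left untouched).
 */
static int cstr_concat3(struct cstr *out,
                        struct cstr s1, struct cstr s2, struct cstr s3)
{
    assert(out != NULL);

    /* Reject lengths whose sum would wrap around. */
    if (s1.len > SIZE_MAX - s2.len || s1.len + s2.len > SIZE_MAX - s3.len)
        return -1;
    size_t total = s1.len + s2.len + s3.len;

    /* Allocate at least one byte so an empty result is still a valid pointer. */
    char *buf = malloc(total ? total : 1);
    if (buf == NULL)
        return -1;

    char *p = buf;
    p = copy_end(p, s1.str, s1.len);
    p = copy_end(p, s2.str, s2.len);
    p = copy_end(p, s3.str, s3.len);

    out->str = buf;
    out->len = (size_t)(p - buf);
    return 0;
}
```

The caller owns `out->str` and releases it with free(3).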

nullability annotations in C

Posted Feb 12, 2025 12:04 UTC (Wed) by excors (subscriber, #95769) [Link] (17 responses)

I think the equivalent code in C should be using a library that provides a struct containing pointer, length and capacity. E.g. something like https://github.com/antirez/sds (not recommending this specific library, it's just the first one I found) where you can say:

sds s = sdsempty();
s = sdscatsds(s, s1);
s = sdscatsds(s, s2);
s = sdscatsds(s, s3);
sdsfree(s);

(and in the unlikely event that you're doing a lot of concatenation and really care about minimising malloc calls, you can add `s = sdsMakeRoomFor(s, sdslen(s1) + sdslen(s2) + sdslen(s3));` near the top). That makes it both simpler and safer than the original code. You should never be directly manipulating the length field.

(Of course in almost all other languages the equivalent code would be `s = s1 + s2 + s3;` which is even simpler and safer.)

nullability annotations in C

Posted Feb 12, 2025 12:40 UTC (Wed) by alx.manpages (subscriber, #145117) [Link] (16 responses)

> (and in the unlikely event that you're doing a lot of concatenation and really care about minimising malloc calls, you can add `s = sdsMakeRoomFor(s, sdslen(s1) + sdslen(s2) + sdslen(s3));` near the top). That makes it both simpler and safer than the original code.

I disagree with the last sentence. It was true in the past, without powerful static analyzers. Managed memory within APIs hides information to the compiler (and static analyzer), and thus provides less safety overall, provided that you have a language expressive enough and a static analyzer powerful enough to verify the program.

Consider the implementation of mempcpy(3) as a macro around memcpy(3) (or an equivalent inline function that provides the same information to the compiler):

#define mempcpy(dst, src, n) (memcpy(dst, src, n) + n)

A compiler (which knows that memcpy(3) returns the input pointer unmodified; this could be expressed for arbitrary APIs with an attribute in the future, but for now the compiler knows memcpy(3) magically) can trace all offsets being applied to the pointer 'p', and thus enforce array bounds statically. You don't need dynamic verification of the code.

With a managed string like you propose, you're effectively blinding the compiler to all of those operations. You're blindly handing the trust over to the string library. If the library has a bug, you'll suffer it. But also, if you misuse the library, you'll have no help from the compiler.

nullability annotations in C

Posted Feb 12, 2025 12:49 UTC (Wed) by khim (subscriber, #9252) [Link] (15 responses)

> With a managed string like you propose, you're effectively blinding the compiler from all of those operations.

Why? What's the difference? If everything is truly “static enough” then managed string can be optimized away. That's not a theory, if you would look on Rust's example then temporary string is completely elided and removed from the generated code, C compiler (which is, essentially, the exact same compiler) should be able to do the same.

> You're blindly handling the trust into the string library. If the library has a bug, you'll suffer it. But also, if you misuse the library, you'll have no help from the compiler.

So you would trust your ad-hoc code, but wouldn't trust widely tested and reviewed library.

Haven't the history of Linux kernel fuzzing shown us that this approach simply doesn't work?

nullability annotations in C

Posted Feb 12, 2025 13:04 UTC (Wed) by alx.manpages (subscriber, #145117) [Link] (14 responses)

> So you would trust your ad-hoc code, but wouldn't trust widely tested and reviewed library.

I personally use NUL-terminated strings because they require less (almost none) ad-hoc code. I'm working on a hardened string library based on <string.h>, providing some higher-level abstractions that preclude the typical bugs.

<https://github.com/shadow-maint/shadow/tree/master/lib/st...>

> Why? What's the difference?

Complexity. Yes, you can write everything inline and let the compiler analyze it. But the smaller the APIs are, the less work you impose on the analyzer, and thus the more effective the analysis is (less false negatives and positives). You can't beat the simplicity of <string.h> in that regard.

nullability annotations in C

Posted Feb 12, 2025 13:16 UTC (Wed) by khim (subscriber, #9252) [Link] (13 responses)

> But the smaller the APIs are, the less work you impose on the analyzer, and thus the more effective the analysis is (less false negatives and positives).

Nope. Things don't work like that. Smaller API may help human to manually optimize things, because humans are awfully bad at keeping track of hundreds and thousands of independent variables, but really good at finding non-trivial dependencies between few of them.

Compiler optimizer is the exact opposite: it doesn't have smarts to glean all possible optimizations from a tiny, narrow, API, but it's extremely good at finding and eliminating redundant calculations in different pieces on thousands lines of code.

> You can't beat the simplicity of <string.h> in that regard.

Possibly. And if your goal is something extremely tiny (like code for the smallest possible microcontrollers) then this may be a good choice (people have successfully used Rust on microcontrollers, but usually without the standard library since it's too big for them). But using these for anything intended to be used on “big” CPUs with caches measured in megabytes? Why?

nullability annotations in C

Posted Feb 12, 2025 13:27 UTC (Wed) by alx.manpages (subscriber, #145117) [Link] (12 responses)

> Compiler optimizer is the exact opposite

I never cared about optimized code. I only care about correct code. C++ claims to be safer than C, among other things by providing (very-)high-level abstractions in the library. I think that's a fallacy.

There's a reason why -fanalyzer works reasonably well in C and not in C++. All of that complexity triggers many false positives and negatives. Not being able to run -fanalyzer in C++ makes it a less safe language, IMO.

The optimizer might be happy with abstractions, but the analyzer not so much. I care about the analyzer.

> But using these for anything intended to be used on “big” CPUs with caches measured in megabytes? Why?

Safety.

My string library has helped find and fix many classes of bugs (not just instances of bugs) from shadow-utils. It's a balance between not adding much complexity (not going too high-level), but going high enough that you get rid of the common classes of bugs, such as off-by-ones with strncpy(3), or passing an incorrect size to snprintf(3), with for example a macro that automagically calculates the size from the destination array.

You'd have a hard time introducing bugs with this library. Theoretically, it's still possible, but the library makes it quite difficult.
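
A minimal sketch of what such a size-deducing macro could look like. This is not the actual shadow-utils code; `NITEMS`, `snprintf_checked` and `SNPRINTF_ARRAY` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Format into dst[0..size-1].  Returns the number of characters written
 * (excluding the NUL), or -1 on a formatting error or if the output was
 * truncated, so truncation cannot go unnoticed.
 */
static int snprintf_checked(char *dst, size_t size, const char *fmt, ...)
{
    va_list ap;
    int     len;

    assert(dst != NULL && size > 0);

    va_start(ap, fmt);
    len = vsnprintf(dst, size, fmt, ap);
    va_end(ap);

    if (len < 0 || (size_t)len >= size)
        return -1;
    return len;
}

/* Number of elements in a true array (gives a wrong answer for a pointer). */
#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

/* The caller never spells out the size; it is taken from the array itself. */
#define SNPRINTF_ARRAY(dst, ...) \
    snprintf_checked(dst, NITEMS(dst), __VA_ARGS__)
```

The sketch assumes `dst` is a real array. A hardened version would presumably also reject pointers at compile time, which plain ISO C cannot express without compiler extensions.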

nullability annotations in C

Posted Feb 12, 2025 13:53 UTC (Wed) by khim (subscriber, #9252) [Link] (2 responses)

> I never cared about optimized code.

Then why are you even using C and why do we have this discussion?

> C++ claims to be safer than C, among other things by providing (very-)high-level abstractions in the library. I think that's a fallacy.

No, it's not. The fact that we have complicated things like browsers implemented in C++ but nothing similar was ever implemented in C is proof enough of that.

C++ may not be as efficient as C (especially if we care about size and memory consumption) but it's definitely safer.

But if you don't care about efficiency then any memory safe language would do better! Even BASIC!

> The optimizer might be happy with abstractions, but the analyzer not so much. I care about the analyzer.

Why do you care about analyzer if alternative is to use something that simply makes most things that analyzer can detect impossible. Or even something like WUFFS if you need extra assurances?

But again: all these tricks are important if your goal is speed first, safety second. If your primary goal is safety then a huge range of languages from Ada to Haskell and even Scheme would be safer.

> such as off-by-ones with strncpy(3), or passing an incorrect size to snprintf(3), with for example a macro that automagically calculates the size from the destination array.

These are all examples of bugs that any memory-safe language simply wouldn't allow. C++ would allow them, of course, but that's because C++ was designed to be “as fast as C but safer”… one may argue about whether it achieved that or not, but if you don't target the “as fast as C” bucket then there are a bazillion languages that are safer.

nullability annotations in C

Posted Feb 13, 2025 10:43 UTC (Thu) by alx.manpages (subscriber, #145117) [Link] (1 responses)

This reminds me of Esperanto. Such a great language that everybody should learn it. If it works for you, that's great, but please don't tell me which language is safer _for me_. I know better.

nullability annotations in C

Posted Feb 13, 2025 19:05 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> This reminds me of Esperanto. Such a great language that everybody should learn it. If it works for you, that's great, but please don't tell me which language is safer _for me_. I know better.

No, you don't. No human can keep track of all of the C pitfalls in non-trivial code.

Even the most paranoid DJB code for qmail had root holes, and by today's standards it's not a large piece of software.

nullability annotations in C

Posted Feb 14, 2025 23:30 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

> There's a reason why -fanalyzer works reasonably well in C and not in C++.

Yes, I agree. However, IIRC, it is because its main author (David Malcolm) is vastly more familiar with C than C++. Clang also has something like it in some of its `clang-tidy` checks, but I agree that GCC's definitely has a different set of things it covers, so they can coexist nicely.

nullability annotations in C

Posted Feb 15, 2025 0:12 UTC (Sat) by mb (subscriber, #50428) [Link] (7 responses)

>I never cared about optimized code. I only care about correct code.

Why not use an interpreted language then?

>My string library has helped find and fix many classes of bugs ...

Sure. Thanks for that.
Modern languages do that for free, though.

>but the library makes it quite difficult.

Modern languages make it about impossible.

nullability annotations in C

Posted Feb 15, 2025 0:24 UTC (Sat) by alx.manpages (subscriber, #145117) [Link] (6 responses)

> Why not use an interpreted language then?

Because C is my "mother tongue" regarding computers. I can write it much better than other languages, just like I can speak Valencian better than other --possibly easier-- languages.

nullability annotations in C

Posted Feb 15, 2025 0:51 UTC (Sat) by mb (subscriber, #50428) [Link] (5 responses)

>Because C is my "mother tongue"

That explains your "reasoning" indeed.

nullability annotations in C

Posted Feb 15, 2025 22:29 UTC (Sat) by alx.manpages (subscriber, #145117) [Link] (4 responses)

Why is "reasoning" quoted?

nullability annotations in C

Posted Feb 15, 2025 22:40 UTC (Sat) by mb (subscriber, #50428) [Link] (3 responses)

Because I put it in quotes.
And because "I always did it like this" isn't a reasoning that helps in discussions.

nullability annotations in C

Posted Feb 15, 2025 23:05 UTC (Sat) by alx.manpages (subscriber, #145117) [Link] (2 responses)

> Because I put it in quotes.

Why did you put it in quotes? Were you implying that my reasoning is inferior to yours? Isn't that offensive? Please reconsider your language.

> And because "I always did it like this" isn't a reasoning that helps in discussions.

It is, IMO. I'm not a neurologist. Are you? I'm not an expert in how people learn languages and how learning secondary languages isn't as easy as learning a mother tongue. But it is common knowledge that one can speak their mother tongue much better than languages learned after it. It should be those who argue the opposite who should justify it.

Or should I take at face value that I learnt the wrong language, and that somehow learning a different one will magically make me write better *without regressions*? What if it doesn't? And why should I trust you?

nullability annotations in C

Posted Feb 15, 2025 23:12 UTC (Sat) by mb (subscriber, #50428) [Link] (1 responses)

>> Because I put it in quotes.
>Please reconsider your language.

I will from now on block you here on LWN and anywhere else.

nullability annotations in C

Posted Feb 15, 2025 23:31 UTC (Sat) by alx.manpages (subscriber, #145117) [Link]

> I will from now on block you here on LWN and anywhere else.

Okay. You don't need to. Just asking me to not talk to you would work just fine. I won't, from now on. I won't block you, though.

nullability annotations in C

Posted Feb 12, 2025 12:15 UTC (Wed) by khim (subscriber, #9252) [Link]

But why would you need all that complexity? If you work with strings a lot… wouldn't you have convenience methods?

In Rust you would write something like this:

    [str1, str2, str3].concat().into_boxed_str()

And that's it. In a C-like language that doesn't use “dot” to chain functions it would be something like:

   string_to_frozen_string(concat_strings(str1, str2, str3))

Or, maybe, even just

   concat_strings(str1, str2, str3)

Sure, NUL-terminated strings are a bad design from the beginning to the end, but the string.h interface is also awful, as a whole.

The only justification for that design is the need to produce something decent without optimizing compiler and in 16KB (or were they up to 128KB by then?) of RAM.

Today you have more RAM in your subway ticket and optimizing compilers exist, why stick to all these manual manipulations where none are needed?