nullability annotations in C
Posted Feb 11, 2025 22:53 UTC (Tue) by khim (subscriber, #9252)In reply to: nullability annotations in C by alx.manpages
Parent article: Maintainer opinions on Rust-for-Linux
> You may be happy that C23 changed that.
So it took 33 years to replace one function. Great. How quickly would it propagate through a project of Linux's size, at this rate? 330 or 3300 years?
> So yes, the language evolves slowly, but it continuously evolves into a safer dialect.
Sure, but the speed of evolution is so glacial that it only makes sense if you postulate, by fiat, that a full rewrite in another language is not an option.
And don't forget that backers may, at some point, just throw in the towel, unable to deal with the excruciating inability to change anything of substance.
Apple has Swift; Google is still picking between Rust and Carbon, but the eventual decision will be one or the other and, notably, not C or C++; Microsoft seems to be thinking about Rust too… so who would be left to advance C and C++ after all the players that actually wanted to change them have left?
> But it's not been forgotten.
The question is not whether it's forgotten or not, but whether we can expect to see programs with most pointers marked either const or noalias.
And the simple answer, given the above example, is that one may spend maybe 10 or 20 years rewriting the Linux kernel in Rust (yes, that's a lot of work, but a journey of a thousand miles begins with a single step)… or go with C, and then never achieve that. Simply because in 50 or 100 years, well before C would become ready to adopt such a paradigm, everything would be rewritten in something other than C anyway.
Simply because it's hard to find anyone under 40 (let alone anyone under 30) who may even want to touch C if they have a choice.
> Having researched into string APIs for some years almost half of my workday, returning a pointer is usually more useful.
No, it's not. It's only “more useful” if you insist on zero-string abominations. If your strings are proper slices (or standalone strings on the heap… only C conflates them; C++, Rust and even such languages as Java and C# have separate types) then returning a pointer is not useful. You either need to have a generic type that returns a slice, or return an index. And returning an index is more flexible.
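To make the slice-plus-index style argued for here concrete, this is a minimal C sketch. The `struct slice` type and the `slice_find` and `slice_sub` helpers are hypothetical names invented for this illustration; they do not come from any library mentioned in the thread. The point it tries to show is that one returned index is enough to build both the part before and the part after the match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view: pointer plus length, no NUL required. */
struct slice {
    const char *ptr;
    size_t      len;
};

/* Index of the first occurrence of c in s, or s.len if c is absent. */
static size_t slice_find(struct slice s, char c)
{
    const char *p = memchr(s.ptr, c, s.len);

    return p ? (size_t)(p - s.ptr) : s.len;
}

/* Sub-slice [from, to). Out-of-range indices are clamped to the slice. */
static struct slice slice_sub(struct slice s, size_t from, size_t to)
{
    if (to > s.len)
        to = s.len;
    if (from > to)
        from = to;
    return (struct slice){ s.ptr + from, to - from };
}
```

With `i = slice_find(s, '=')`, the key would be `slice_sub(s, 0, i)` and the value `slice_sub(s, i + 1, s.len)`; neither call writes to the underlying buffer.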
> You just need to get right the issue with qualifiers.
No, you also need to guarantee that C would continue to be used. That's a tall order.
I wonder why no one ever made a proper C/C++ replacement (as in: a language that is designed to interoperate with C/C++ but is not built on top of a C/C++ core) before Rust… but now that it's done, we may finally face the question of why we should continue to support strange and broken C semantics with null-terminated strings… invent crazy schemes, CPU extensions… all to support something that shouldn't have existed in the first place.
That's not a question for the next 3-5 years, but in 10 years… when the world separates into competing factions… it will be interesting to see if any of them stay faithful to C/C++ and what they pick instead to develop “sovereign software lands”.
Posted Feb 11, 2025 23:22 UTC (Tue)
by alx.manpages (subscriber, #145117)
[Link] (10 responses)
Not one. The entire libc has been updated in that direction.
> How quickly would it propagate through a project of Linux's size, at this rate?
The problem was devising the way to do it properly. Once we have that, the idea can be propagated more easily. It's not like you need a decade to update each function.
The kernel can probably implement this for their internal APIs pretty easily. The kernel already supports C11, so all the pieces are there. The bottleneck is in committer and reviewer time.
> Simply because it's hard to find anyone under 40 (let alone anyone under 30) who may even want to touch C if they have a choice.
I'm 31. I hope to continue using C for many decades. :)
> It's only “more useful” if you insist on zero-string abominations.
I do enjoy NUL-terminated strings, yes, which is why I find returning pointers more useful. The problems with strings, and the reason they have been blamed for so long, weren't really the fault of strings themselves, but of the language, which wasn't as expressive as it could be. That's changing.
Posted Feb 11, 2025 23:45 UTC (Tue)
by khim (subscriber, #9252)
[Link] (7 responses)
You certainly would be able to do that; after all, Cobol 2023 and Fortran 2023 both exist. The question is how many outside of “Enterprise” (where nothing is updated… ever, except when it breaks down and falls apart completely) would care.
The problem with strings is precisely the strings. It's not even the fact that NUL is forbidden inside them (after all, Rust strings are guaranteed to be UTF-8 inside). The problem lies in the fact that something that should be easy and simple (just look in a register and you know the length) is incredibly hard with null-terminated strings. It breaks speculation, requires special instructions (like what SSE4.2 added, or the “no fault vector load” used by RISC-V), and plays badly with many algorithms (why would an operation that shouldn't change anything in memory at all, like splitting a string in two, ever change anything?). Null-terminated strings are not quite up there with the billion dollar mistake, but they are very solid contenders for 2nd place.
Try again. _Generic is C11 and tgmath is C99. The means were there for 12 or 24 years (depending on how you count); there was just no interest… until the 100% guaranteed job-security stance that C would never be replaced (simply because all prospective “C killers” were either built around a C core or were unable to support effective interop with C) was threatened by Rust and Swift. Then, and only then, the wheels started moving… but I'm pretty sure they will pretty soon be clogged again… when it is realized that, on one side, only legacy projects are interested in using C anyway and, on the other side, the majority of legacy projects just don't care to change anything unless they are forced to.
Yeah, but that's precisely the issue: while existing kernel developers may want to perform such a change, they are already overworked and overstressed… and newcomers, normally, want nothing to do with C.
I guess the fact that exceptions like you exist gives it a chance… but it will be interesting to see how it works out. The kernel is one of the few projects that can actually pull that off.
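The remark above about splitting a string in two can be illustrated with a small C sketch. Everything here is an assumption made for the example: `struct slice`, `slice_split` and `cstr_split` are hypothetical helpers, not an existing API. The slice version only reads the source buffer, while the NUL-terminated version has to overwrite the separator so that the first half ends somewhere.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view. */
struct slice {
    const char *ptr;
    size_t      len;
};

/*
 * Split `in` at the first `sep`. The source is never written to.
 * Returns 1 if `sep` was found, 0 otherwise (then head is the whole input
 * and tail is empty).
 */
static int slice_split(struct slice in, char sep,
                       struct slice *head, struct slice *tail)
{
    const char *p = memchr(in.ptr, sep, in.len);

    if (p == NULL) {
        *head = in;
        *tail = (struct slice){ in.ptr + in.len, 0 };
        return 0;
    }
    *head = (struct slice){ in.ptr, (size_t)(p - in.ptr) };
    *tail = (struct slice){ p + 1, in.len - head->len - 1 };
    return 1;
}

/*
 * NUL-terminated counterpart: the separator in the caller's buffer is
 * overwritten with '\0'. Returns the second half, or NULL if not found.
 */
static char *cstr_split(char *s, char sep)
{
    char *p = strchr(s, sep);

    if (p == NULL)
        return NULL;
    *p = '\0';      /* the write that a slice-based split does not need */
    return p + 1;
}
```

One consequence of the difference: `slice_split` can be applied to a string literal or to a read-only mapping, whereas `cstr_split` needs a writable copy.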
Posted Feb 13, 2025 8:46 UTC (Thu)
by aragilar (subscriber, #122569)
[Link] (6 responses)
Posted Feb 13, 2025 9:59 UTC (Thu)
by taladar (subscriber, #68407)
[Link] (5 responses)
Posted Feb 13, 2025 10:28 UTC (Thu)
by khim (subscriber, #9252)
[Link] (4 responses)
How would backports hurt anyone? Sure, you can only use GCC 12 on RHEL 7, but that beast was released more than ten years ago, before the first version of Rust, even! Sure, at some point backporting stops, but I don't think the holdups are the “enterprise distros” (at least not RHEL specifically): these, at least, provide some updated toolchains. GCC 12 was released in 2022, so it's pretty modern by C standards. “Community distros” don't bother, most of the time.
Posted Feb 14, 2025 14:26 UTC (Fri)
by taladar (subscriber, #68407)
[Link] (3 responses)
Posted Feb 14, 2025 14:29 UTC (Fri)
by khim (subscriber, #9252)
[Link] (2 responses)
Who even cares what they use for the development of the platform itself? Developers shouldn't even care about that; it's an internal implementation detail.
Posted Feb 17, 2025 8:49 UTC (Mon)
by taladar (subscriber, #68407)
[Link] (1 responses)
Posted Feb 17, 2025 9:15 UTC (Mon)
by khim (subscriber, #9252)
[Link]
How is that relevant? The Linux kernel was all too happy to adopt features not implemented by clang, and patches needed to support clang – and clang, at that point, was already used by Android, the most popular Linux distribution, used by billions… why should RHEL be treated differently? Let RHEL developers decide what to do with their kernel: they can create a special kgcc package (like they already did years ago) or rework features in any way they like.
Posted Feb 12, 2025 6:14 UTC (Wed)
by interalia (subscriber, #26615)
[Link] (1 responses)
In theory the kernel could switch easily enough given review time, as you say, but would doing this also require bumping the required compiler version for the kernel? If so, I'm not sure they would feel safe doing so for quite a few years, and Rust would also advance in the meantime.
Posted Feb 12, 2025 8:41 UTC (Wed)
by alx.manpages (subscriber, #145117)
[Link]
Any compiler that supports C11 should be able to support these.
Here's an example of how to write such a const-generic API:
```
#define my_strchr(s, c) \
Posted Feb 12, 2025 11:17 UTC (Wed)
by alx.manpages (subscriber, #145117)
[Link] (19 responses)
Actually, I worked at a project that used counted strings (not terminated by a NUL, unless we needed to pass them to syscalls), and even there, functions returning a pointer were overwhelmingly more used than ones returning a count.
Consider the creation of a counted string:
```
Equivalent code that uses a count would be more complex (and thus more unsafe):
```
Posted Feb 12, 2025 12:04 UTC (Wed)
by excors (subscriber, #95769)
[Link] (17 responses)
sds s = sdsempty();
(and in the unlikely event that you're doing a lot of concatenation and really care about minimising malloc calls, you can add `s = sdsMakeRoomFor(s, sdslen(s1) + sdslen(s2) + sdslen(s3));` near the top). That makes it both simpler and safer than the original code. You should never be directly manipulating the length field.
(Of course in almost all other languages the equivalent code would be `s = s1 + s2 + s3;` which is even simpler and safer.)
Posted Feb 12, 2025 12:40 UTC (Wed)
by alx.manpages (subscriber, #145117)
[Link] (16 responses)
I disagree with the last sentence. It was true in the past, without powerful static analyzers. Managed memory within APIs hides information from the compiler (and static analyzer), and thus provides less safety overall, provided that you have a language expressive enough and a static analyzer powerful enough to verify the program.
Consider the implementation of mempcpy(3) as a macro around memcpy(3) (or an equivalent inline function that provides the same information to the compiler):
#define mempcpy(dst, src, n) (memcpy(dst, src, n) + n)
A compiler (which knows that memcpy(3) returns the input pointer unmodified; this could be expressed for arbitrary APIs with an attribute in the future, but for now the compiler knows memcpy(3) magically) can trace all offsets being applied to the pointer 'p', and thus enforce array bounds statically. You don't need dynamic verification of the code.
With a managed string like you propose, you're effectively blinding the compiler to all of those operations. You're blindly handing your trust to the string library. If the library has a bug, you'll suffer it. But also, if you misuse the library, you'll have no help from the compiler.
Posted Feb 12, 2025 12:49 UTC (Wed)
by khim (subscriber, #9252)
[Link] (15 responses)
Why? What's the difference? If everything is truly “static enough”, then the managed string can be optimized away. That's not just theory: if you look at the Rust example, the temporary string is completely elided and removed from the generated code; a C compiler (which is, essentially, the exact same compiler) should be able to do the same. So you would trust your ad-hoc code, but wouldn't trust a widely tested and reviewed library? Hasn't the history of Linux kernel fuzzing shown us that this approach simply doesn't work?
Posted Feb 12, 2025 13:04 UTC (Wed)
by alx.manpages (subscriber, #145117)
[Link] (14 responses)
I personally use NUL-terminated strings because they require less (almost none) ad-hoc code. I'm working on a hardened string library based on <string.h>, providing some higher-level abstractions that preclude the typical bugs.
<https://github.com/shadow-maint/shadow/tree/master/lib/st...>
> Why? What's the difference?
Complexity. Yes, you can write everything inline and let the compiler analyze it. But the smaller the APIs are, the less work you impose on the analyzer, and thus the more effective the analysis is (less false negatives and positives). You can't beat the simplicity of <string.h> in that regard.
Posted Feb 12, 2025 13:16 UTC (Wed)
by khim (subscriber, #9252)
[Link] (13 responses)
Nope. Things don't work like that. A smaller API may help a human to manually optimize things, because humans are awfully bad at keeping track of hundreds and thousands of independent variables, but really good at finding non-trivial dependencies between a few of them. A compiler's optimizer is the exact opposite: it doesn't have the smarts to glean all possible optimizations from a tiny, narrow API, but it's extremely good at finding and eliminating redundant calculations in different pieces across thousands of lines of code.
Possibly. And if your goal is something extremely tiny (like code for the smallest possible microcontrollers) then this may be a good choice (people have successfully used Rust on microcontrollers, but usually without the standard library, since it's too big for them). But using these for anything intended to be used on “big” CPUs with caches measured in megabytes? Why?
Posted Feb 12, 2025 13:27 UTC (Wed)
by alx.manpages (subscriber, #145117)
[Link] (12 responses)
I never cared about optimized code. I only care about correct code. C++ claims to be safer than C, among other things by providing (very-)high-level abstractions in the library. I think that's a fallacy.
There's a reason why -fanalyzer works reasonably well in C and not in C++. All of that complexity triggers many false positives and negatives. Not being able to run -fanalyzer in C++ makes it a less safe language, IMO.
The optimizer might be happy with abstractions, but the analyzer not so much. I care about the analyzer.
> But using these for anything intended to be used on “big” CPUs with caches measured in megabytes? Why?
Safety.
My string library has helped find and fix many classes of bugs (not just instances of bugs) from shadow-utils. It's a balance between not adding much complexity (not going too high-level), but going high enough that you get rid of the common classes of bugs, such as off-by-ones with strncpy(3), or passing an incorrect size to snprintf(3), with for example a macro that automagically calculates the size from the destination array.
You'd have a hard time introducing bugs with this library. Theoretically, it's still possible, but the library makes it quite difficult.
Posted Feb 12, 2025 13:53 UTC (Wed)
by khim (subscriber, #9252)
[Link] (2 responses)
Then why are you even using C, and why do we have this discussion?
No, it's not. The fact that we have complicated things like browsers implemented in C++, but nothing similar was ever implemented in C, is proof enough of that. C++ may not be as efficient as C (especially if we care about size and memory consumption) but it's definitely safer. But if you don't care about efficiency then any memory-safe language would do better! Even BASIC!
Why do you care about the analyzer if the alternative is to use something that simply makes most things that the analyzer can detect impossible? Or even something like WUFFS if you need extra assurances? But again: all these tricks are important if your goal is speed first, safety second. If your primary goal is safety then a huge range of languages, from Ada to Haskell and even Scheme, would be safer.
These are all examples of bugs that any memory-safe language simply wouldn't allow. C++ would allow them, of course, but that's because C++ was designed to be “as fast as C but safer”… one may debate whether it achieved that or not, but if you don't target the “as fast as C” bucket then there are a bazillion languages that are safer.
Posted Feb 13, 2025 10:43 UTC (Thu)
by alx.manpages (subscriber, #145117)
[Link] (1 responses)
Posted Feb 13, 2025 19:05 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
No, you don't. No human can keep track of all of the C pitfalls in non-trivial code.
Even the most paranoid DJB code for qmail had root holes, and by today's standards it's not a large piece of software.
Posted Feb 14, 2025 23:30 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Yes, I agree. However, IIRC, it is because its main author (David Malcolm) is vastly more familiar with C than C++. Clang also has something like it in some of its `clang-tidy` checks, but I agree that GCC's definitely has a different set of things it covers, so they can coexist nicely.
Posted Feb 15, 2025 0:12 UTC (Sat)
by mb (subscriber, #50428)
[Link] (7 responses)
Why not use an interpreted language then?
>My string library has helped find and fix many classes of bugs ...
Sure. Thanks for that.
>but the library makes it quite difficult.
Modern languages make it about impossible.
Posted Feb 15, 2025 0:24 UTC (Sat)
by alx.manpages (subscriber, #145117)
[Link] (6 responses)
Because C is my "mother tongue" regarding computers. I can write it much better than other languages, just like I can speak Valencian better than other --possibly easier-- languages.
Posted Feb 15, 2025 0:51 UTC (Sat)
by mb (subscriber, #50428)
[Link] (5 responses)
That explains your "reasoning" indeed.
Posted Feb 15, 2025 22:29 UTC (Sat)
by alx.manpages (subscriber, #145117)
[Link] (4 responses)
Posted Feb 15, 2025 22:40 UTC (Sat)
by mb (subscriber, #50428)
[Link] (3 responses)
Posted Feb 15, 2025 23:05 UTC (Sat)
by alx.manpages (subscriber, #145117)
[Link] (2 responses)
Why did you put it in quotes? Were you implying that my reasoning is inferior to yours? Isn't that offensive? Please reconsider your language.
> And because "I always did it like this" isn't a reasoning that helps in discussions.
It is, IMO. I'm not a neurologist. Are you? I'm not an expert in how people learn languages and how learning secondary languages isn't as easy as learning a mother tongue. But it is common knowledge that one can speak one's mother tongue much better than languages learned after it. It is those who argue the opposite who should justify it.
Or should I take at face value that I learnt the wrong language, and that somehow learning a different one will magically make me write better *without regressions*? What if it doesn't? And why should I trust you?
Posted Feb 15, 2025 23:12 UTC (Sat)
by mb (subscriber, #50428)
[Link] (1 responses)
I will from now on block you here on LWN and anywhere else.
Posted Feb 15, 2025 23:31 UTC (Sat)
by alx.manpages (subscriber, #145117)
[Link]
Okay. You don't need to. Just asking me to not talk to you would work just fine. I won't, from now on. I won't block you, though.
Posted Feb 12, 2025 12:15 UTC (Wed)
by khim (subscriber, #9252)
[Link]
But why would you need all that complexity? If you work with strings a lot… wouldn't you have convenience methods? In Rust you would write something like this: Sure, NUL-terminated strings are a bad design from beginning to end, but also The only justification for that design is the need to produce something decent without an optimizing compiler and in 16KB (or was it up to 128KB by then?) of RAM. Today you have more RAM in your subway ticket, and optimizing compilers exist, so why stick to all these manual manipulations where none are needed?
> I'm 31. I hope to continue using C for many decades. :)
alx@devuan:~/tmp$ cat strchr.c
const char *my_const_strchr(const char *s, int c);
char *my_nonconst_strchr(char *s, int c);

#define my_strchr(s, c) \
( \
    _Generic(s, \
        char *:       my_nonconst_strchr, \
        void *:       my_nonconst_strchr, \
        const char *: my_const_strchr, \
        const void *: my_const_strchr \
    )(s, c) \
)
alx@devuan:~/tmp$ gcc -Wall -Wextra -pedantic -S -std=c11 strchr.c
alx@devuan:~/tmp$
```
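For readers who want to try the const-generic macro, here is a self-contained variation of the listing above. It is only a sketch: the two helper functions are given bodies that simply forward to the standard `strchr(3)`, which is an assumption of this example (the original comment shows declarations only).

```c
#include <stddef.h>
#include <string.h>

/* Assumed implementations: both simply forward to strchr(3). */
static const char *my_const_strchr(const char *s, int c)
{
    return strchr(s, c);
}

static char *my_nonconst_strchr(char *s, int c)
{
    return strchr(s, c);
}

/* Select the helper from the static type of s, so constness is preserved. */
#define my_strchr(s, c) \
( \
    _Generic(s, \
        char *:       my_nonconst_strchr, \
        void *:       my_nonconst_strchr, \
        const char *: my_const_strchr, \
        const void *: my_const_strchr \
    )(s, c) \
)
```

Given `const char *cs`, the expression `my_strchr(cs, 'l')` has type `const char *`, so assigning it to a plain `char *` should draw a discarded-qualifier diagnostic; given `char *ms`, the result is a writable `char *`.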
>
> No, it's not. It's only “more useful” if you insist on zero-string abominations.
> If your strings are proper slices (or standalone strings on the heap…
> only C conflates them, C++, Rust and even such languages as Java and C# have separate types)
> then returning pointer is not useful.
> You either need to have generic type that returns slice or return index.
> And returning index is more flexible.
s.str = malloc(s1.len + s2.len + s3.len);
p = s.str;
p = mempcpy(p, s1.str, s1.len);
p = mempcpy(p, s2.str, s2.len);
p = mempcpy(p, s3.str, s3.len);
s.len = p - s.str;
```
s.str = malloc(s1.len + s2.len + s3.len);
s.len = 0;
s.len += foo(s.str + s.len, s1.str, s1.len);
s.len += foo(s.str + s.len, s2.str, s2.len);
s.len += foo(s.str + s.len, s3.str, s3.len);
```
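The two fragments above can be turned into a compilable comparison. This is a sketch under stated assumptions: `struct cstr`, `my_mempcpy` (a portable stand-in for the GNU `mempcpy(3)`), `copy_count` (playing the role of `foo`) and the two `concat3_*` functions are names invented here. Both variants produce the same counted string; they differ only in whether the running position is carried as a pointer or as a length.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: not NUL-terminated, length stored alongside. */
struct cstr {
    char   *str;
    size_t  len;
};

/* Portable stand-in for GNU mempcpy(3): returns one past the last byte written. */
static void *my_mempcpy(void *dst, const void *src, size_t n)
{
    return (char *)memcpy(dst, src, n) + n;
}

/* Count-returning variant, the role played by foo() above. */
static size_t copy_count(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
    return n;
}

/* Pointer style: the cursor p advances, the length is computed at the end. */
static struct cstr concat3_ptr(struct cstr a, struct cstr b, struct cstr c)
{
    size_t total = a.len + b.len + c.len;   /* overflow not checked in this sketch */
    struct cstr s = { malloc(total ? total : 1), 0 };
    char *p;

    if (s.str == NULL)
        return s;                           /* allocation failure: {NULL, 0} */
    p = s.str;
    p = my_mempcpy(p, a.str, a.len);
    p = my_mempcpy(p, b.str, b.len);
    p = my_mempcpy(p, c.str, c.len);
    s.len = (size_t)(p - s.str);
    return s;
}

/* Count style: the length itself is the cursor. */
static struct cstr concat3_cnt(struct cstr a, struct cstr b, struct cstr c)
{
    size_t total = a.len + b.len + c.len;   /* overflow not checked in this sketch */
    struct cstr s = { malloc(total ? total : 1), 0 };

    if (s.str == NULL)
        return s;
    s.len += copy_count(s.str + s.len, a.str, a.len);
    s.len += copy_count(s.str + s.len, b.str, b.len);
    s.len += copy_count(s.str + s.len, c.str, c.len);
    return s;
}
```

The caller owns the returned buffer and releases it with `free(s.str)`. Which of the two styles is easier to review is exactly what the thread disagrees about; the sketch only shows that they are interchangeable in effect.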
s = sdscatsds(s, s1);
s = sdscatsds(s, s2);
s = sdscatsds(s, s3);
sdsfree(s);
> With a managed string like you propose, you're effectively blinding the compiler from all of those operations.
> But the smaller the APIs are, the less work you impose on the analyzer, and thus the more effective the analysis is (less false negatives and positives).
> I never cared about optimized code.
Modern languages do that for free, though.
And because "I always did it like this" isn't a reasoning that helps in discussions.
>Please reconsider your language.
[str1, str2, str3].concat().into_boxed_str()
And that's it. In a C-like language that doesn't use a “dot” to chain functions, it would be something like:
string_to_frozen_string(concat_strings(str1, str2, str3))
Or, maybe, even just
concat_strings(str1, str2, str3)
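A convenience function of that shape could look roughly like the following in C. This is a hedged sketch, not an existing API: `struct str`, `struct owned_str`, `concat_n` and the `concat_strings` wrapper macro are all hypothetical names chosen to mirror the call written above. The macro packs its arguments into an array via a compound literal (C99 and later) so the function can take any number of parts.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical borrowed view of a counted string. */
struct str {
    const char *ptr;
    size_t      len;
};

/* Hypothetical heap-allocated result; release with free(ptr). */
struct owned_str {
    char   *ptr;
    size_t  len;
};

/* Concatenate n parts. Returns {NULL, 0} on length overflow or allocation failure. */
static struct owned_str concat_n(const struct str *parts, size_t n)
{
    struct owned_str out = { NULL, 0 };
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (parts[i].len > SIZE_MAX - total)
            return out;                     /* total length would overflow */
        total += parts[i].len;
    }

    out.ptr = malloc(total ? total : 1);
    if (out.ptr == NULL)
        return out;

    for (size_t i = 0; i < n; i++) {
        if (parts[i].len != 0)
            memcpy(out.ptr + out.len, parts[i].ptr, parts[i].len);
        out.len += parts[i].len;
    }
    return out;
}

/* Variadic front end: concat_strings(a, b, c) with any number of struct str. */
#define concat_strings(...) \
    concat_n((const struct str[]){ __VA_ARGS__ }, \
             sizeof((const struct str[]){ __VA_ARGS__ }) / sizeof(struct str))
```

With three `struct str` values `s1`, `s2`, `s3`, a call reads `struct owned_str r = concat_strings(s1, s2, s3);`, and all length bookkeeping stays inside the helper rather than at the call site.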
string.h interface is awful, as a whole.