
nullability annotations in C

Posted Feb 12, 2025 12:40 UTC (Wed) by alx.manpages (subscriber, #145117)
In reply to: nullability annotations in C by excors
Parent article: Maintainer opinions on Rust-for-Linux

> (and in the unlikely event that you're doing a lot of concatenation and really care about minimising malloc calls, you can add `s = sdsMakeRoomFor(s, sdslen(s1) + sdslen(s2) + sdslen(s3));` near the top). That makes it both simpler and safer than the original code.

I disagree with the last sentence. It was true in the past, without powerful static analyzers. Managed memory within APIs hides information from the compiler (and static analyzer), and thus provides less safety overall, provided that you have a language expressive enough and a static analyzer powerful enough to verify the program.

Consider the implementation of mempcpy(3) as a macro around memcpy(3) (or an equivalent inline function that provides the same information to the compiler):

#define mempcpy(dst, src, n) (memcpy(dst, src, n) + n)

A compiler (which knows that memcpy(3) returns the input pointer unmodified; this could be expressed for arbitrary APIs with an attribute in the future, but for now the compiler knows memcpy(3) magically) can trace all offsets being applied to the pointer 'p', and thus enforce array bounds statically. You don't need dynamic verification of the code.
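As a minimal sketch of the kind of code being described (not taken from the thread), here is a concatenation written with an inline stand-in for mempcpy(3). The names `my_mempcpy` and `concat3` are hypothetical, chosen to avoid clashing with the libc declaration. Every offset applied to 'p' is a compile-time constant, so the final position relative to the buffer is fully visible to the compiler. Whether a particular compiler or analyzer actually diagnoses an overflow here depends on its version and flags.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical inline stand-in for mempcpy(3): copy n bytes and
 * return a pointer to the byte just past the last one written. */
static inline void *
my_mempcpy(void *dst, const void *src, size_t n)
{
	return (char *) memcpy(dst, src, n) + n;
}

/* Concatenate three 2-byte pieces into buf.
 * buf must have room for at least 7 bytes (6 + terminating NUL). */
static void
concat3(char *buf)
{
	char *p = buf;

	p = my_mempcpy(p, "ab", 2);  /* p == buf + 2 */
	p = my_mempcpy(p, "cd", 2);  /* p == buf + 4 */
	p = my_mempcpy(p, "ef", 2);  /* p == buf + 6 */
	*p = '\0';                   /* last byte written is buf[6] */
}
```

Because the helper is inline and memcpy(3) is known to return its first argument, nothing about the pointer arithmetic is hidden behind an opaque library call.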

With a managed string like you propose, you're effectively blinding the compiler to all of those operations. You're blindly handing your trust over to the string library. If the library has a bug, you'll suffer from it. But also, if you misuse the library, you'll have no help from the compiler.



nullability annotations in C

Posted Feb 12, 2025 12:49 UTC (Wed) by khim (subscriber, #9252) [Link] (15 responses)

> With a managed string like you propose, you're effectively blinding the compiler to all of those operations.

Why? What's the difference? If everything is truly “static enough” then a managed string can be optimized away. That's not just theory: if you look at the Rust example, the temporary string is completely elided and removed from the generated code. A C compiler (which is, essentially, the exact same compiler) should be able to do the same.

> You're blindly handing your trust over to the string library. If the library has a bug, you'll suffer from it. But also, if you misuse the library, you'll have no help from the compiler.

So you would trust your ad-hoc code, but wouldn't trust a widely tested and reviewed library?

Hasn't the history of Linux kernel fuzzing shown us that this approach simply doesn't work?

nullability annotations in C

Posted Feb 12, 2025 13:04 UTC (Wed) by alx.manpages (subscriber, #145117) [Link] (14 responses)

> So you would trust your ad-hoc code, but wouldn't trust a widely tested and reviewed library?

I personally use NUL-terminated strings because they require less (almost no) ad-hoc code. I'm working on a hardened string library based on <string.h>, providing some higher-level abstractions that preclude the typical bugs.

<https://github.com/shadow-maint/shadow/tree/master/lib/st...>

> Why? What's the difference?

Complexity. Yes, you can write everything inline and let the compiler analyze it. But the smaller the APIs are, the less work you impose on the analyzer, and thus the more effective the analysis is (fewer false negatives and positives). You can't beat the simplicity of <string.h> in that regard.

nullability annotations in C

Posted Feb 12, 2025 13:16 UTC (Wed) by khim (subscriber, #9252) [Link] (13 responses)

> But the smaller the APIs are, the less work you impose on the analyzer, and thus the more effective the analysis is (fewer false negatives and positives).

Nope. Things don't work like that. A smaller API may help a human to manually optimize things, because humans are awfully bad at keeping track of hundreds and thousands of independent variables, but really good at finding non-trivial dependencies between a few of them.

A compiler's optimizer is the exact opposite: it doesn't have the smarts to glean all possible optimizations from a tiny, narrow API, but it's extremely good at finding and eliminating redundant calculations across different pieces of thousands of lines of code.

> You can't beat the simplicity of <string.h> in that regard.

Possibly. And if your goal is something extremely tiny (like code for the smallest possible microcontrollers) then this may be a good choice (people have successfully used Rust on microcontrollers, but usually without the standard library since it's too big for them). But using these for anything intended to be used on “big” CPUs with caches measured in megabytes? Why?

nullability annotations in C

Posted Feb 12, 2025 13:27 UTC (Wed) by alx.manpages (subscriber, #145117) [Link] (12 responses)

> A compiler's optimizer is the exact opposite

I never cared about optimized code. I only care about correct code. C++ claims to be safer than C, among other things by providing (very-)high-level abstractions in the library. I think that's a fallacy.

There's a reason why -fanalyzer works reasonably well in C and not in C++. All of that complexity triggers many false positives and negatives. Not being able to run -fanalyzer in C++ makes it a less safe language, IMO.

The optimizer might be happy with abstractions, but the analyzer not so much. I care about the analyzer.

> But using these for anything intended to be used on “big” CPUs with caches measured in megabytes? Why?

Safety.

My string library has helped find and fix many classes of bugs (not just instances of bugs) in shadow-utils. It's a balance between not adding much complexity (not going too high-level), but going high enough that you get rid of the common classes of bugs, such as off-by-ones with strncpy(3), or passing an incorrect size to snprintf(3), with for example a macro that automagically calculates the size from the destination array.
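To illustrate the idea of a macro that derives the size from the destination array, here is a minimal sketch. It is not the API of the library mentioned above: `NITEMS`, `snprintf_trunc`, and `SNPRINTF_A` are hypothetical names invented for this example. The caller never writes the buffer size, so it cannot get it wrong, and truncation is reported as an error instead of being silently ignored.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Number of elements in a true array. This sketch has no guard
 * against being handed a pointer, where it would silently compute
 * the wrong value; a real library would want to reject that. */
#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: like snprintf(3), but returns -1 if the
 * output was truncated or an error occurred. */
static int
snprintf_trunc(char *dst, size_t size, const char *fmt, ...)
{
	va_list ap;
	int len;

	va_start(ap, fmt);
	len = vsnprintf(dst, size, fmt, ap);
	va_end(ap);

	if (len < 0 || (size_t) len >= size)
		return -1;
	return len;
}

/* The size argument is computed from the destination array itself. */
#define SNPRINTF_A(dst, ...)  snprintf_trunc(dst, NITEMS(dst), __VA_ARGS__)
```

With `char buf[8];`, a call such as `SNPRINTF_A(buf, "%s", name)` passes 8 as the size without the caller spelling it out, and returns -1 when `name` does not fit.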

You'd have a hard time introducing bugs with this library. Theoretically, it's still possible, but the library makes it quite difficult.

nullability annotations in C

Posted Feb 12, 2025 13:53 UTC (Wed) by khim (subscriber, #9252) [Link] (2 responses)

> I never cared about optimized code.

Then why are you even using C and why do we have this discussion?

> C++ claims to be safer than C, among other things by providing (very-)high-level abstractions in the library. I think that's a fallacy.

No, it's not. The fact that we have complicated things like browsers implemented in C++ but nothing similar was ever implemented in C is proof enough of that.

C++ may not be as efficient as C (especially if we care about size and memory consumption) but it's definitely safer.

But if you don't care about efficiency then any memory safe language would do better! Even BASIC!

> The optimizer might be happy with abstractions, but the analyzer not so much. I care about the analyzer.

Why do you care about the analyzer if the alternative is to use something that simply makes most things the analyzer can detect impossible? Or even something like WUFFS, if you need extra assurances?

But again: all these tricks are important if your goal is speed first, safety second. If your primary goal is safety then a huge range of languages from Ada to Haskell and even Scheme would be safer.

> such as off-by-ones with strncpy(3), or passing an incorrect size to snprintf(3), with for example a macro that automagically calculates the size from the destination array.

These are all examples of bugs that any memory-safe language simply wouldn't allow. C++ would allow them, of course, but that's because C++ was designed to be “as fast as C but safer”… one may debate whether it achieved that or not, but if you don't target the “as fast as C” bucket then there are a bazillion languages that are safer.

nullability annotations in C

Posted Feb 13, 2025 10:43 UTC (Thu) by alx.manpages (subscriber, #145117) [Link] (1 responses)

This reminds me of Esperanto. Such a great language that everybody should learn it. If it works for you, that's great, but please don't tell me which language is safer _for me_. I know better.

nullability annotations in C

Posted Feb 13, 2025 19:05 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> This reminds me of Esperanto. Such a great language that everybody should learn it. If it works for you, that's great, but please don't tell me which language is safer _for me_. I know better.

No, you don't. No human can keep track of all of the C pitfalls in non-trivial code.

Even the most paranoid DJB code for qmail had root holes, and by today's standards it's not a large piece of software.

nullability annotations in C

Posted Feb 14, 2025 23:30 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

> There's a reason why -fanalyzer works reasonably well in C and not in C++.

Yes, I agree. However, IIRC, it is because its main author (David Malcolm) is vastly more familiar with C than C++. Clang also has something like it in some of its `clang-tidy` checks, but I agree that GCC's definitely has a different set of things it covers, so they can coexist nicely.

nullability annotations in C

Posted Feb 15, 2025 0:12 UTC (Sat) by mb (subscriber, #50428) [Link] (7 responses)

>I never cared about optimized code. I only care about correct code.

Why not use an interpreted language then?

>My string library has helped find and fix many classes of bugs ...

Sure. Thanks for that.
Modern languages do that for free, though.

>but the library makes it quite difficult.

Modern languages make it about impossible.

nullability annotations in C

Posted Feb 15, 2025 0:24 UTC (Sat) by alx.manpages (subscriber, #145117) [Link] (6 responses)

> Why not use an interpreted language then?

Because C is my "mother tongue" regarding computers. I can write it much better than other languages, just like I can speak Valencian better than other --possibly easier-- languages.

nullability annotations in C

Posted Feb 15, 2025 0:51 UTC (Sat) by mb (subscriber, #50428) [Link] (5 responses)

>Because C is my "mother tongue"

That explains your "reasoning" indeed.

nullability annotations in C

Posted Feb 15, 2025 22:29 UTC (Sat) by alx.manpages (subscriber, #145117) [Link] (4 responses)

Why is "reasoning" quoted?

nullability annotations in C

Posted Feb 15, 2025 22:40 UTC (Sat) by mb (subscriber, #50428) [Link] (3 responses)

Because I put it in quotes.
And because "I always did it like this" isn't a reasoning that helps in discussions.

nullability annotations in C

Posted Feb 15, 2025 23:05 UTC (Sat) by alx.manpages (subscriber, #145117) [Link] (2 responses)

> Because I put it in quotes.

Why did you put it in quotes? Were you implying that my reasoning is inferior to yours? Isn't that offensive? Please reconsider your language.

> And because "I always did it like this" isn't a reasoning that helps in discussions.

It is, IMO. I'm not a neurologist. Are you? I'm not an expert in how people learn languages and how learning secondary languages isn't as easy as learning a mother tongue. But it is common knowledge that one can speak one's mother tongue much better than languages learned after it. It is those who argue the opposite who should justify their position.

Or should I take at face value that I learnt the wrong language, and that somehow learning a different one will magically make me write better *without regressions*? What if it doesn't? And why should I trust you?

nullability annotations in C

Posted Feb 15, 2025 23:12 UTC (Sat) by mb (subscriber, #50428) [Link] (1 responses)

>> Because I put it in quotes.
>Please reconsider your language.

I will from now on block you here on LWN and anywhere else.

nullability annotations in C

Posted Feb 15, 2025 23:31 UTC (Sat) by alx.manpages (subscriber, #145117) [Link]

> I will from now on block you here on LWN and anywhere else.

Okay. You don't need to. Just asking me to not talk to you would work just fine. I won't, from now on. I won't block you, though.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds