Idiom exclusion is really so important

Posted Nov 13, 2024 8:06 UTC (Wed) by error27 (subscriber, #8346)
Parent article: Progress on toolchain security features

Whenever there is a compiler warning, hopefully, the warnings are mostly useful but there are always going to be some which are not useful. The idiom exclusion feature cuts down on the non-useful warnings.

For example, it's mostly useful to have a warning about unsigned comparisons with zero. But when the code looks has both and upper and lower bound like this: "if (x < 0 || x >= 10) {", then the warning is not useful. The code works 100% as intended.

There are different strategies you could use to silence the warning. In the unsigned example, you could just delete the "x < 0" comparison. But that's work which provides no benefit and arguably makes the code slightly less readable.

For the integer overflows, presumably you would add a annotation like "if (wrap_ok(x + y) < x)". And actually I don't hate that too much... But probably other people do, and it's also a lot of work to do it retroactively over a giant project like the kernel. It could be avoided if we use idiom exclusions.

I understand why the GCC developers don't like idiom exclusions but compiler warnings are already a mostly random collection of warnings and they already miss 95% of bugs. If the code looks fine, then it's okay to not print a warning.

Idiom exclusion is really so important

Posted Nov 13, 2024 9:31 UTC (Wed) by pm215 (subscriber, #98099) [Link]

Coverity has a particularly irritating version of that "unsigned comparison with zero in a bounds check" warning, where it will do so when the type being bounds checked is an enum. Unless I have got confused (not unlikely where C standard minutiae are concerned :-)) enums with no negative values may be signed or unsigned as the compiler chooses, so for portability the greater-than-or-equal-to-zero check is necessary, but Coverity gripes about it because the implementation happens to have used an unsigned type for the enum...

Idiom exclusion is really so important

Posted Nov 13, 2024 9:44 UTC (Wed) by magfr (subscriber, #16052) [Link] (19 responses)

I have a hand time understanding why

x+y<x

is considered easier to read and better than

INT_MAX-y<=x

Sure, the former is type agnostic and naïvely generates smaller code but I would expect both to end up checking the flag result of x+y.

Idiom exclusion is really so important

Posted Nov 13, 2024 10:31 UTC (Wed) by intelfx (subscriber, #130118) [Link] (7 responses)

> I have a hand time understanding why
>
> x+y<x
>
> is considered easier to read and better than
>
> INT_MAX-y<=x

It is considered better in the eyes of the language lawyers. Which, apparently, is enough to trump any kind of readability concerns.

I'd ask a different question, though. With the amount of compiler extensions already part of "kernel C" (some of which are even in this article), I'm having a hard time understanding reasons for not adding a `__builtin_wraps(expr)` kind of thing that explicitly evaluates `expr` under `-fwrapv` rules and tests that for overflow.

Surely an `if (__builtin_wraps(a + b))` is more readable than both `if (a + b < a)` and `if (INT_MAX - b <= a)`?

Idiom exclusion is really so important

Posted Nov 13, 2024 10:36 UTC (Wed) by pm215 (subscriber, #98099) [Link] (5 responses)

The compilers have had that for a while: see __builtin_add_overflow_p in https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Built...

Idiom exclusion is really so important

Posted Nov 13, 2024 10:40 UTC (Wed) by intelfx (subscriber, #130118) [Link] (4 responses)

Nope. Sorry, these builtins are a just a planetary abomination. The identity of the arithmetic operation is completely erased, making it absolutely in-obvious that it's in fact an arithmetic operation rather than a random function call.

It needs to be a `__some_builtin(expr)`, where `expr` is a **regular** arithmetic expression using **regular** + and - symbols.

Idiom exclusion is really so important

Posted Nov 13, 2024 10:50 UTC (Wed) by pm215 (subscriber, #98099) [Link] (2 responses)

I certainly wouldn't use the naked builtin directly, but that's true of most GCC builtins. But I'll absolutely take "looks like a function call and has 'add' as a word rather than '+' as a symbol, but is extremely obviously doing an overflow check" over "has '+' in the expression but is ridiculously opaque about what it is actually doing", especially when the thing it is doing is important for correctness and often for security.

No objections if you want to get nicer facilities added to the compiler, but in the interim I'll use the ones we have. And if I did have to work with a compiler where I had to use "x + y < x" I would wrap it in a function so I could give it a clearer name...

Idiom exclusion is really so important

Posted Nov 13, 2024 11:09 UTC (Wed) by intelfx (subscriber, #130118) [Link] (1 responses)

> But I'll absolutely take "looks like a function call and has 'add' as a word rather than '+' as a symbol, but is extremely obviously doing an overflow check" over "has '+' in the expression but is ridiculously opaque about what it is actually doing"

Who was saying anything about being opaque?

I just proposed a `__builtin_overflows(expr)` thing which makes it equally obvious that it is a test for overflow. **And** that it is an arithmetic expression.

All those builtins disguising arithmetic operations as function calls — no. Just no. The arithmetics are infix for a reason. This is C, not some kind of a Lisp or a reverse Forth.

Idiom exclusion is really so important

Posted Nov 13, 2024 11:20 UTC (Wed) by pm215 (subscriber, #98099) [Link]

You seemed to me to be proposing that until this suggestion of yours is implemented in compilers we should continue to open-code "if (x + y < x)". That open-coding is what I am describing as opaque.

Your idea is clearly nicer than the existing builtins, but we don't have it yet and in the best case won't have it widely available for years, so the question of what is most legible and least bug prone given current compiler facilities still matters, I think.

Idiom exclusion is really so important

Posted Nov 13, 2024 11:01 UTC (Wed) by error27 (subscriber, #8346) [Link]

Yeah. That would be nice. My idea is that it would be scope based. The compiler would trap any integer overflows and goto the label. I've left off \ characters and probably made typos but you get the idea:

#define saturate(math...) (__attribute__(overflow overflowed_label) {
size_t result;
bool overflow = false;
result = (math);
goto done;

overflowed_label:
overflow = true;
done:
overflow ? SIZE_MAX : result;
})

Idiom exclusion is really so important

Posted Nov 13, 2024 10:37 UTC (Wed) by intelfx (subscriber, #130118) [Link]

Ah, apologies, I apparently misread that. My brain thought that you just missed a "not" somewhere in there because I didn't realize someone could be asking _that_ question.

Well, `if (a + b < a)` preserves the identity of participating arithmetic operations. When you see a `if (a + b < a)`, it is immediately obvious that you are dealing with a sum of `a` and `b`, and if you are familiar with wrapping arithmetic, it's also almost immediately parsable as checking for overflow.

On the other hand, `if (INT_MAX - a <= b)` does not preserve neither the identity of the operation, nor the identity of operands. You need significant brain time to mentally carry over the operands to one side, invert the signs and realize that we are talking about `a + b`, not some weird arithmetic expression with unrelated `a`, `b` and a constant.

Personally, it's a no-brainer which one is clearer.

Idiom exclusion is really so important

Posted Nov 13, 2024 10:32 UTC (Wed) by pm215 (subscriber, #98099) [Link] (2 responses)

Personally I think something like add_would_overflow(x, y) is easier to read than both and less susceptible to "looks almost like the idiom but isn't quite right" errors. (And you can implement it via the compiler builtins.)

Idiom exclusion is really so important

Posted Nov 13, 2024 10:54 UTC (Wed) by adobriyan (subscriber, #30858) [Link] (1 responses)

Half of the time you want to both add and bail. Another half is to only check because operation in question will be done much later
and passing the result is cumbersome or not necessary.

The _right_ way is to not invent new idioms (which every project will do differently) but to use __builtin_add_overflow().

T len;
if (__builtin_add_overflow(a, b, &len)) { return -E; };
[use len]

or (because clang doesn't do BAO_p)

if (__builtin_add_overflow(a, b, &(int[1]){})) { return -E; };

Those who fear that BAO does addition differently and will make people break glass and cut hands may want to lobby for a warning if BAO is done on different types (especially with both different signedness _and_ length).

BAO is cool for checking if result fits into some other type:

if (__builtin_add_overflow(a, 0, &(T[1]){})

Just use BAO.

Idiom exclusion is really so important

Posted Nov 27, 2024 23:58 UTC (Wed) by jwakely (subscriber, #60262) [Link]

>The _right_ way is to not invent new idioms (which every project will do differently) but to use __builtin_add_overflow().

It's in C23 as chk_add so instead of inventing your own, write chk_add as a simple wrapper around bao and then everybody agrees on a standard API.

N.B. chk_add has the output pointer first, not last. Otherwise it's the same.

Idiom exclusion is really so important

Posted Nov 14, 2024 16:13 UTC (Thu) by anton (subscriber, #25547) [Link] (7 responses)

I would expect both to end up checking the flag result of x+y.

Looking at <2024Sep6.152642@mips.complang.tuwien.ac.at>, gcc-10 generates the "naïve" code (on AMD64 9 bytes for "b<b=1" and 14 bytes for "b==LONG_MIN"), while gcc-12 generates the longer code from both idioms. Neither checks the flag result. You have to explicitly use __builtin_sub_overflow(b,1,&c) to get that, and then both versions generate a 6-byte sequence using add instead of the 5-byte sequence using dec. Whatever goes on in the minds of the gcc maintainers, there seem to be things on their agenda that are more important than generating short and fast code.

Idiom exclusion is really so important

Posted Nov 14, 2024 16:51 UTC (Thu) by pizza (subscriber, #46) [Link] (6 responses)

> Whatever goes on in the minds of the gcc maintainers, there seem to be things on their agenda that are more important than generating short and fast code.

As the saying goes, "patches welcome"

Idiom exclusion is really so important

Posted Nov 15, 2024 8:53 UTC (Fri) by anton (subscriber, #25547) [Link] (5 responses)

Are you speaking for the gcc maintainers?

If not, what makes you think so? What makes you think that they would welcome a patch that reverts the change that pessimises the "b<b-1" case rather than exercising one of the justification mechanisms for their deeds, as they have done in other cases? Or simply ignoring the issue as in Bug 93811 where I provided working code (not a patch, though).

If yes, why should I do for free what most of you are paid to do? Especially given your lack of fulfilling your part of the deal, as discussed above.

Idiom exclusion is really so important

Posted Nov 15, 2024 12:15 UTC (Fri) by pizza (subscriber, #46) [Link] (4 responses)

> Are you speaking for the gcc maintainers?

I'm speaking as a maintainer of completely unrelated free software.

> If not, what makes you think so? What makes you think that they would welcome a patch that reverts the change that pessimises the "b<b-1" case rather than exercising one of the justification mechanisms for their deeds, as they have done in other cases? Or simply ignoring the issue as in Bug 93811 where I provided working code (not a patch, though).

What makes you think that complaining (and disparaging the GCC maintainers) on LWN is remotely productive or useful?

Everyone involved with any free software project has far, far more things on their to-do lists than they can possibly accomplish; consequently they prioritize the things they care about.

If the GCC maintainers' priorities differ from yours, it is incumbent upon *you* do step up and contribute in some way (eg code/patches, funding, or advocacy). Either way, disparaging the maintainers (and/or other active contributors) isn't going to accomplish anything productive.

> If yes, why should I do for free what most of you are paid to do? Especially given your lack of fulfilling your part of the deal, as discussed above.

There is no "deal" here, or at least not one with *you*.

...The ones that pay the maintainers (if they are being paid at all -- most aren't!) get to decide the priorities.

You don't like that? Patches welcome. Heck, you can even keep that patch entirely to yourself if the GCC maintainers don't like it.

"Patches welcome" means "Shut up!"

Posted Nov 15, 2024 13:07 UTC (Fri) by anton (subscriber, #25547) [Link] (3 responses)

What makes you think that complaining (and disparaging the GCC maintainers) on LWN is remotely productive or useful?

My comment was productive or useful by correcting the expectation of magfr.

Everyone involved with any free software project has far, far more things on their to-do lists than they can possibly accomplish; consequently they prioritize the things they care about.

Certainly. And, as I wrote, some gcc maintainer found the time to recognize the "b<b-1" idiom, and found the time to transform it into longer code even when gcc is invoked with -Os. So what makes you think that submitting a patch that reverts that change would achieve anything remotely productive or useful?

If the GCC maintainers' priorities differ from yours, it is incumbent upon *you* do step up and contribute in some way (eg code/patches, funding, or advocacy).

You think that I should provide a patch for a project maintained by people who will ignore the patch because their priorities differ from mine? Why should that be remotely productive or useful?

You think I should provide funding or advocacy for a project where the maintainers' priorities differ from mine? I can see that the gcc maintainers would like that, but why should I?

There is no "deal" here, or at least not one with *you*.

In that case, what did you want to say with "patches welcome"? As used by you it apparently means "I don't like what you are writing and I wish you would shut up, so I suggest a wild-goose chase". Your last paragraph makes that even clearer.

"Patches welcome" means "Shut up!"

Posted Nov 15, 2024 14:35 UTC (Fri) by pizza (subscriber, #46) [Link] (2 responses)

> Certainly. And, as I wrote, some gcc maintainer found the time to recognize the "b<b-1" idiom, and found the time to transform it into longer code even when gcc is invoked with -Os. So what makes you think that submitting a patch that reverts that change would achieve anything remotely productive or useful?

Because there may be very good intentional reasons for that change in GCC? Or maybe this is an unintentional side effect of some other fix/improvement and can itself be fixed incrementally with sufficient elbow grease?

(I can say from experience that micro-optimizations like that rarely make a meaningful difference in code size/performance. Unless you're at the point of truly counting/shaving bytes, in which case you are going to have a real budget to throw at this)

> You think that I should provide a patch for a project maintained by people who will ignore the patch because their priorities differ from mine? Why should that be remotely productive or useful?

You would create the patch because it makes things that much better for *you*. Sharing/pushing it upstream is entirely optional.

> You think I should provide funding or advocacy for a project where the maintainers' priorities differ from mine? I can see that the gcc maintainers would like that, but why should I?

I meant advocate for *your* use case/problems, not advocate for the project in general. (And "advocate" means "persuade the audience", not "crap all over them")

But in a more general sense, very little out there is ever fully aligned with one's personal priorities. Welcome to life.

> In that case, what did you want to say with "patches welcome"? As used by you it apparently means "I don't like what you are writing and I wish you would shut up, so I suggest a wild-goose chase". Your last paragraph makes that even clearer.

...You're the one expecting other folks to care about and perform (potentially very specialized) work on your behalf because *someone else* is paying them to work on other use cases. That's some pretty serious entitlement.

"Patches welcome" means "Shut up!"

Posted Nov 15, 2024 18:41 UTC (Fri) by anton (subscriber, #25547) [Link] (1 responses)

I can say from experience that micro-optimizations like that rarely make a meaningful difference in code size/performance.

In that case, why have they made the change at all? Or many other changes? The usual claim is that each transformation by itself rarely makes any meaningful difference, but they add a whole lot of them, and together they usually result in smaller/faster code. And it's certain that you cannot generate smaller code by compiling an individual idiom into larger code.

You would create the patch because it makes things that much better for *you*. Sharing/pushing it upstream is entirely optional.

I would not know how. I distribute free software, i.e., as source code. How would a patch that only I use make things better for me, much less "much better"? Because I can then answer bug reports with "works for me"? That's not how I prefer to work.

You're the one expecting other folks to care about and perform (potentially very specialized) work on your behalf because *someone else* is paying them to work on other use cases.

What makes you think so? Concerning gcc, I have mostly given up on the idea that they feel bound by the social contract where users find and report bugs and gcc maintainers fix them, so no, I don't expect them to do anything, and therefore I have mostly stopped reporting gcc bugs.

Concerning people who pay them, I doubt that Intel pays for gcc development with the intention that they implement transformations that make the code longer when using -Os.

"Patches welcome" means "Shut up!"

Posted Nov 16, 2024 10:46 UTC (Sat) by joib (subscriber, #8541) [Link]

As a nowadays inactive maintainer of a small corner of gcc, and speaking only for myself, largely I agree with what "pizza" has said in this subthread. GCC is a large project with a lot of stakeholders, and nobody gets exactly what they want. Those who put in more, be it in terms of funding developers, or individual contributors putting in elbow grease to scratch their own itches, get out more than those who contribute less.

For some specific points you made:

> In that case, why have they made the change at all? Or many other changes? The usual claim is that each transformation by itself rarely makes any meaningful difference, but they add a whole lot of them, and together they usually result in smaller/faster code. And it's certain that you cannot generate smaller code by compiling an individual idiom into larger code.

Nobody intentionally pessimizes code, unless it's to fix some correctness issue. There's obviously no conspiracy against people with passionate opinions about generating optimal code for "b < b - 1". Without looking into specifics, my guess would be that this is an unintentional side-effect of some other work, and the effect it caused was insignificant in size/performance on the benchmark suites that are used to track the development of the compiler, mainly SPECcpu, but also other similar application benchmarks. Those are ultimately what people tend to care about, not some particular micro-optimization (unless that particular micro-optimization affects the size and/or performance of some larger application).

That is not to say micro-optimizations aren't useful; they certainly are used for analyzing the impact of some particular optimization passes etc. But again, ultimately what matters to users is the size/performance of complete applications. In some cases it can mean a code transformation that pessimizes some particular aspect (like generating optimal code for "b < b - 1") but makes some other code faster, for an ultimately positive result on SPEC, gets merged.

> Concerning gcc, I have mostly given up on the idea that they feel bound by the social contract where users find and report bugs and gcc maintainers fix them, so no, I don't expect them to do anything, and therefore I have mostly stopped reporting gcc bugs.

Like so many other projects, GCC has many more bug reports than it has developers able to fix them. Thus bugs have to be triaged, and if it so happens that your favourite bug isn't considered important enough to get the attention of the very limited developer time, then unfortunately it means it will languish. If you're unable to convince the developers that your bug is very important and needs attention, then other ways around that conundrum are to fix it yourself or pay someone to do it, hence 'patches welcome'. To the extent that can be construed as "Shut up!", it's pointing out that complaining about poor code generation on comp.arch, lwn, or some other forum, fun as it may be as a way to pass your free time, is unlikely to cause other people to step up and fix your issue.

> Concerning people who pay them, I doubt that Intel pays for gcc development with the intention that they implement transformations that make the code longer when using -Os.

Very true, but it's also likely true that out of the near infinite ways the compiler can be improved, Intel doesn't consider generating optimal code for "b < b - 1" to be particularly high on the list of priorities.