Footguns

Posted Jul 19, 2021 13:16 UTC (Mon) by mrugiero (guest, #153040)
In reply to: Footguns by farnz
Parent article: Rust for Linux redux

> There are two interpretations here; either forty is NULL, or it isn't. If forty is NULL, then ptr->value; is UB, and thus the code snippet invokes UB. If forty is not NULL, then ptr->value is not UB, and so the compiler reasons that !forty must be false, always.
AFAICT, the logical conclusion here is that every pointer argument can be NULL unless proved otherwise. Since you can't prove that for non-static functions, that's necessarily UB. This is why many compilers have extensions to explicitly tell the compiler an argument can never be NULL.

> In this particular example, the compiler's reasoning chain (and the user's mistake) is fairly obvious, and it wouldn't be challenging to have a warning that says "hey, !forty is always false because ptr->value would be UB otherwise, so this if statement is not meaningful", and have the user spot the error from there, but it gets more challenging with complex code. And neither compiler developers nor standard committees seem willing to say that good C compilers give a friendly warning when they find a codepath that invokes UB - in part because it'd trigger a lot of noise on legacy codebases.
Which was exactly my point. Besides, the point of triggering noise is moot IMO, since you can always add a flag to ignore the warning like with any other warning.

Footguns

Posted Jul 19, 2021 13:27 UTC (Mon) by farnz (subscriber, #17727) [Link] (2 responses)

> > There are two interpretations here; either forty is NULL, or it isn't. If forty is NULL, then ptr->value; is UB, and thus the code snippet invokes UB. If forty is not NULL, then ptr->value is not UB, and so the compiler reasons that !forty must be false, always.

> AFAICT, the logical conclusion here is that every pointer argument can be NULL unless proved otherwise. Since you can't prove that for non-static functions, that's necessarily UB. This is why many compilers have extensions to explicitly tell the compiler an argument can never be NULL.

This is why the function included the check "if (!forty) { return; }"; after that test, forty is provably not NULL, and the compiler can optimize knowing that forty is not NULL. The gotcha is that the user invoked UB before that check by dereferencing ptr, and the compiler used that information to know that forty cannot be NULL after the dereference, because if it is, UB is invoked.

> > In this particular example, the compiler's reasoning chain (and the user's mistake) is fairly obvious, and it wouldn't be challenging to have a warning that says "hey, !forty is always false because ptr->value would be UB otherwise, so this if statement is not meaningful", and have the user spot the error from there, but it gets more challenging with complex code. And neither compiler developers nor standard committees seem willing to say that good C compilers give a friendly warning when they find a codepath that invokes UB - in part because it'd trigger a lot of noise on legacy codebases.

> Which was exactly my point. Besides, the point of triggering noise is moot IMO, since you can always add a flag to ignore the warning like with any other warning.

And my point is that this is entirely a social problem - nobody with the power to say "compilers that use UB to optimize must tell the user what UB the compiler is exploiting" is willing to do so. If compiler writers for C were doing that, even if it was the preserve of the very best C compilers (so GCC and CLang), it'd be enough to reduce the number of developers who get surprised by compiler use of UB.

Footguns

Posted Jul 19, 2021 18:57 UTC (Mon) by mrugiero (guest, #153040) [Link] (1 responses)

> This is why the function included the check "if (!forty) { return; }"; after that test, forty is provably not NULL, and the compiler can optimize knowing that forty is not NULL. The gotcha is that the user invoked UB before that check by dereferencing ptr, and the compiler used that information to know that forty cannot be NULL after the dereference, because if it is, UB is invoked.
Of course. My point was that _on entry_ it isn't provably not NULL, so NULL dereference is a possibility in that line, and thus the conclusion should consistently that the first dereference is UB. My point is that there's no true ambiguity about that.

> And my point is that this is entirely a social problem - nobody with the power to say "compilers that use UB to optimize must tell the user what UB the compiler is exploiting" is willing to do so. If compiler writers for C were doing that, even if it was the preserve of the very best C compilers (so GCC and CLang), it'd be enough to reduce the number of developers who get surprised by compiler use of UB.
So we agree, then?

Footguns

Posted Jul 19, 2021 21:10 UTC (Mon) by farnz (subscriber, #17727) [Link]

And the problem with your point is that C is not very amenable to whole program analysis (for a variety of reasons to do with only having a global and a local static namespace, plus using textual inclusion instead of modules), which means that there's a lot of legacy code where the first dereference is not UB. It's only UB if the program execution can call ub with forty being NULL, which is not guaranteed. If the rest of the code ensures that ub is always called with a valid pointer to configs, then no UB is present.

As I've pointed out in another comment, that check could be the consequence of a macro expansion - in which case, pessimising the output machine code because you've used a macro to access configs isn't a good look - there was no UB before, now you're adding code that verifies something that the human writing the code has already verified is true outside ub, and the use of a macro has been sufficient to introduce extra machine instructions.

This is part of why it's a damn hard social problem - it's possible that ub is only called from code in 3 different C files, and all of those files already check for NULL before calling ub, so in the actual final application, there is no UB. The compiler using the fact that the check can only be true and cause an early return if UB has already happened to optimize ub simply speeds things up at no cost.