|
|
Subscribe / Log in / New account

Footguns

Footguns

Posted Jul 19, 2021 13:58 UTC (Mon) by farnz (subscriber, #17727)
In reply to: Footguns by mrugiero
Parent article: Rust for Linux redux

Trivial plain C counterexample, where the check is seen by the compiler, but the programmer did not intend to tell the compiler that the pointer could be NULL:


/* In a header file, for config handling */
struct configs {
/* Lots of pointers to config members here, omitted */
};

#define GET_STR_CONFIG_OR_DEFAULT(config, member, default) \
    (config ? config->member ? config->member : default : default)

#define GET_INT_CONFIG_OR_DEFAULT(config, member, default)
    (config ? config->member ; default)

/* In a C file, using the header file */
void do_amazing_things(void *config) {
    int rating = GET_INT_CONFIG_OR_DEFAULT(config, rating, 0);
    char * expertise = GET_STR_CONFIG_OR_DEFAULT(config, expertise, "novice");

    /* Do the amazing things */
}

In this case, the call to GET_INT_CONFIG_OR_DEFAULT (a macro) has promised that config is not NULL because it does an early return; a compiler that doesn't optimize this will be issuing repeated NULL checks for something that it knows must be non-NULL, or complaining because the user has an unnecessary NULL check in the expansion of GET_STR_CONFIG_OR_DEFAULT.

The code the compiler sees when the preprocessor hands it off is:


struct configs {
/* Lots of pointers to config members here, omitted */
};
/* Rest of headers etc */

void do_amazing_things(void *config) {
    int rating = (config ? config->member : 0);
    char * expertise = (config ? config->member ? "expertise" : "expertise");
}

Which leads you into the question of provenance of your "always true" if statement; in this case, the compiler has to know that it knows config is not NULL because the programmer did something that's an explicit check, not a "if it's NULL, then we have UB" case, otherwise you get a noisy warning because the programmer included an unnecessary NULL check in the macro. But they did that because that way the macro is safe to use even if it's only used once in a function, and if the user does do:


/* In a C file, using the header file */
void do_amazing_things(void *config) {
    int rating = ((struct configs*) config)-gt;rating;
    char * expertise = GET_STR_CONFIG_OR_DEFAULT(config, expertise, "novice");

    /* Do the amazing things */
}

then the compiler will need to warn because its "config is not NULL" assumption came from "if config is NULL, then there is UB in the execution". But first we need compiler authors to get onboard with the idea that whenever UB is invoked to permit an optimization (noting that this can require you to do complex tracking to get from your low-level IR (MachineInstr in the LLVM case, for example) all the way back to C, so that you can confirm that a given decision in codegen is not causing issues because it results in bad signed overflow behaviour (for example).


to post comments

Footguns

Posted Jul 19, 2021 19:11 UTC (Mon) by mrugiero (guest, #153040) [Link] (1 responses)

> In this case, the call to GET_INT_CONFIG_OR_DEFAULT (a macro) has promised that config is not NULL because it does an early return; a compiler that doesn't optimize this will be issuing repeated NULL checks for something that it knows must be non-NULL, or complaining because the user has an unnecessary NULL check in the expansion of GET_STR_CONFIG_OR_DEFAULT.
The compiler can prove the extra check is unnecessary there. I'm not sure it would be wrong for it to complain, but it's an entirely different case than adding a test after a dereference of something you can't prove to not be NULL.

> Which leads you into the question of provenance of your "always true" if statement; in this case, the compiler has to know that it knows config is not NULL because the programmer did something that's an explicit check, not a "if it's NULL, then we have UB" case, otherwise you get a noisy warning because the programmer included an unnecessary NULL check in the macro. But they did that because that way the macro is safe to use even if it's only used once in a function, and if the user does do:

> /* In a C file, using the header file */
> void do_amazing_things(void *config) {
> int rating = ((struct configs*) config)-gt;rating;
> char * expertise = GET_STR_CONFIG_OR_DEFAULT(config, expertise, "novice");

> /* Do the amazing things */
> }

> then the compiler will need to warn because its "config is not NULL" assumption came from "if config is NULL, then there is UB in the execution". But first we need compiler authors to get onboard with the idea that whenever UB is invoked to permit an optimization (noting that this can require you to do complex tracking to get from your low-level IR (MachineInstr in the LLVM case, for example) all the way back to C, so that you can confirm that a given decision in codegen is not causing issues because it results in bad signed overflow behaviour (for example).

The only thing I said is always true is that you can't dereference something you can't prove not to be NULL and expect it to not be UB; the check reordering is allowed because it was UB before and we can assume at that point what the author wanted. This doesn't mean a warning should not be emitted unless (somehow) specified otherwise. If there's a check before in the same function it's trivial to prove it's no longer NULL after that. Besides, you don't need to know whether an optimization resulted from the UB. Knowing it exists is enough. And AFAICT that can be known before transforming to IR.

Footguns

Posted Jul 20, 2021 11:54 UTC (Tue) by farnz (subscriber, #17727) [Link]

> The only thing I said is always true is that you can't dereference something you can't prove not to be NULL and expect it to not be UB; the check reordering is allowed because it was UB before and we can assume at that point what the author wanted. This doesn't mean a warning should not be emitted unless (somehow) specified otherwise. If there's a check before in the same function it's trivial to prove it's no longer NULL after that. Besides, you don't need to know whether an optimization resulted from the UB. Knowing it exists is enough. And AFAICT that can be known before transforming to IR.

This is the difference between your model, and the C model. In C, you can dereference something that you can't prove not to be NULL, and expect it to not be UB; it's only UB if, on a specific execution, the thing is NULL. If it happens not to be NULL, then there's no UB. This comes down to whole program analysis - as a programmer, you might ensure that you only call ub(config) with non-NULL pointers, and thus it's fine.

So the C compiler is able to reason backwards - if thing *is* NULL, then there is UB. Ergo is must not be NULL, because UB isn't allowed, which permits optimization on the basis that it's not NULL. This is fine if the explicit NULL check is (e.g.) from a macro, or in generated code, or left over from refactoring and not yet removed; you've taken out something that's always false in this context, and just deleted the dead code.

It only becomes a problem where the human is surprised by the deletion of dead code - i.e. where the human thought the check was still important, but it's actually considered always false (or always true) by the compiler. And as deletion of dead code takes place at every level - from AST, through IR, through machine instruction choice - the compiler needs to link back its elimination of dead code and determine whether or not the human would be surprised by this particular elimination.

And that's what makes it a hard problem - we may know that ub(config) is only ever called with a non-NULL pointer, but that needs a whole program analysis which the compiler cannot perform. We may also know this simply because we've refactored so that callers of ub(config) do the NULL check themselves, and the extra NULL check inside the function may be there because a preprocessor (which can be outside the compiler!) has used a general form of code that works regardless of the presence of the NULL checks; why use two macros and force the human to think about whether or not a NULL check has happened already, when you can use one?


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds