Undefined Behaviour as usual

Posted Feb 23, 2024 18:36 UTC (Fri) by geofft (subscriber, #59789)
In reply to: Undefined Behaviour as usual by tialaramex
Parent article: Stenberg: DISPUTED, not REJECTED

You are correct about it being undefined behavior, and you are correct that a compiler can cause the program to behave arbitrarily in response to it, but I would argue it is still not a vulnerability.

A vulnerability is a misbehavior in a program that compromises its security in response to crafted input / malicious actions from an attacker. This requires a few things to exist. "Misbehavior" requires a coherent concept of intended behavior. "Attacker" and "crafted input" require a system that interacts with untrusted parties in some way. "Security" requires things that you're trying to keep secure (maintain confidentiality, integrity, and/or availability) from the attacker.

As an obvious example, the fact that bash executes arbitrary code is not an arbitrary code execution vulnerability. It's what bash is supposed to do. The thing that supplies input to bash is expected to be the legitimate user of bash, not an attacker. bash is supposed to be able to run any command, and so it doesn't have a distinction between behavior and misbehavior. If you can get arbitrary code execution in rbash, for instance, then yes, that's a vulnerability, because rbash is designed to take untrusted input and maintain security of the system it runs against that input. If you can run arbitrary commands by setting environment variables even if you can't control input, then there's probably a vulnerability (as Shellshock was). But for regular bash, if you are setting up a system where you're piping untrusted input into bash, that's a vulnerability in your system, not in bash.

The only way to distinguish that is to know at a human level what bash is supposed to do and where it's supposed to take inputs from. There is no automated technical distinction to be made between bash and rbash. There is no automated technical distinction to determine that commands like bash and python are only intended to be run with trusted input, but commands like grep and sort are supposed to be able to handle untrusted input. You can call this a "gut feeling" if you like, but it's inherent to how we use computers. We never run a program for the sake of running a program; we run a program in order to accomplish some task of human interest, and the parameters of that task, not the program, determine what counts as misbehavior and insecurity.

There is a simple argument that CVE-2020-19909 (note there is a typo in today's article, it's 19909, not 1909) is not a vulnerability. It is not that the undefined behavior doesn't exist, or that the risk of a compiler doing something arbitrarily weird and unwanted is low. It is entirely compatible with a compiler generating code to do system("rm -rf /") in the case of signed integer overflow. The argument is that attackers do not have access to set an arbitrary retry delay value, and any real-world system that uses curl where attackers do have this access has a more fundamental vulnerability - e.g., they can provide arbitrary arguments to the curl command line, or they already have code execution in an address space in which libcurl is linked. Even in the most limited case where the attacker can only specify this one value and nothing other than curl is imposing limits on the value they'd still be able to effectively request that curl should hang for 24 days on a 32-bit system or until well past the heat death of the universe on a 64-bit system, which is already a denial of service vulnerability, and fixing that would avoid hitting the integer overflow in curl. And if you, yourself, as the legitimate user of curl or libcurl provide a ludicrous value for the timeout, it cannot be said to be a vulnerability, because you are not an attacker of your own system.

How do we know if this claim is correct? You're right in a technical sense that you can't really know - it's always theoretically possible that someone wrote a web form that takes input from untrusted users and one of the fields is the retry vulnerability value But it's also equally theoretically possible that someone wrote a web form that takes input from untrusted users and splices it into a bash command line without escaping. And, in fact, it's not just theoretically possible, it's quite common. But nobody would say this means there's a vulnerability in bash, would they?

The process of determining what is or isn't a vulnerability has to come down to human judgment. You could plausibly argue, for instance, that Shellshock wasn't a vulnerability because attackers shouldn't be able to pass arbitrary environment variables into bash. But the real-world deployment of CGI meant that there was a large installed base of users where attackers were in fact able to set environment variables to arbitrary values. Moreover, it meant that humanity believed that it was a reasonable design to do that, and the flaw was not with CGI for setting those variables.

And it's not sufficient to just lean on documented behavior. First, would you consider it an adequate fix for the vulnerability if curl made no code changes but just changed the documentation to say that the input to the timeout value must be small enough to avoid integer overflow? But also there have actually been real-world vulnerabilities, that are unquestionably vulnerabilities in human judgment, that were documented. Log4shell comes to mind: the behavior of loading JNDI plugins in log statements was absolutely intentional, and the support in JNDI for loading plugins from arbitrary servers was also absolutely intentional. But the resulting behavior was so unreasonable that the Log4j authors did not argue "there is no deviation from the documented behavior" - which they could have argued with much more certainty than a gut feeling. Or consider the KASLR bypass from the other day: it isn't material whether the kernel developers intended to publish a world-readable file with post-ASLR addresses or not, it is still a security bug either way.

There is, simply, no way to determine what is or isn't a vulnerability without the involvement of human judgment. You can make a reasonable argument that the maintainers of the software are poorly incentivized to make accurate judgments, yes. But someone has to make the judgment.

(Also - I actually fully agree with you about CVE-2023-52071. The argument that it only applies in debug builds and not release builds is reasonable as far as it goes, but in my human judgment, it is totally reasonable to run debug builds in production while debugging things, and you're right that Daniel's claim that it can only possibly cause a crash is incorrect. Because the bad code deterministically does an out-of-bounds access, it's totally reasonable for the compiler to treat the code as unreachable and thus conclude the rest of the block is also unreachable, which can change the actual output of the curl command in a security-sensitive way. The compiler can tear out the whole if statement via dead-code elimination, or it can lay out something that isn't actually valid code in the true case, since it's allowed to assume the true case never gets hit. He's quite possibly right that no compiler actually does that today; he's wrong that it's reasonable to rely on this.)

Undefined Behaviour as usual

Posted Feb 23, 2024 19:54 UTC (Fri) by adobriyan (subscriber, #30858) [Link] (8 responses)

> Because the bad code deterministically does an out-of-bounds access, it's totally reasonable for the compiler to treat the code as unreachable

How is it reasonable? If compiler can prove OOB access to on-stack array, it should _refuse_ to compile and report an error.

The only semi-legitimate use case for such access is stack protector runtime test scripts (and even those should be done in assembler for 100% reliability).

> warning: array subscript 3 is above array bounds of ‘wchar_t[3]’ {aka ‘int[3]’} [-Warray-bounds=]

int f(void);
int main(int argc, char *argv[])
{
wchar_t prefix[3] = {0};
if (f()) {
assert(prefix[3] == L'\0');
}
return EXIT_SUCCESS;
}

Undefined Behaviour as usual

Posted Feb 23, 2024 21:16 UTC (Fri) by geofft (subscriber, #59789) [Link] (4 responses)

I gave a somewhat contrived example in this other comment. It is entirely possible that the OOB-ness of the access is conditional in some way, such as via preprocessor macros or code generation from some template, and the programmer knows that f() is not going to actually return true in the case where the access would be out of bounds.

Here's another example, though you might argue that it is also contrived. Suppose you have a binary format that stores numbers between 0 and 32767 in the following way: if the number is less than 128, store it in one byte, otherwise store it in two bytes big-endian and set the high bit.

inline int is_even(unsigned char *p) {
    if (p[0] & 0x80)
        return p[1] % 2 == 0;
    return p[0] % 2 == 0;
}

unsigned char FIFTEEN[] = {0x15};

if (is_even(FIFTEEN))
    printf("15 is even\n");

After inlining there's a line of code talking about FIFTEEN[1] which is out of bounds, inside an if statement, just like your example. The if statement doesn't succeed, so there's no UB, but you need to do some compile-time constant expression evaluation to conclude that, and it's pretty reasonable, I think, to have a compiler that supports inlining but does no compile-time arithmetic.

Undefined Behaviour as usual

Posted Feb 24, 2024 5:32 UTC (Sat) by adobriyan (subscriber, #30858) [Link] (3 responses)

It is probably less work to just emit potentially UB access and let pagefault handler sort it out.

Undefined Behaviour as usual

Posted Feb 24, 2024 21:22 UTC (Sat) by jrtc27 (subscriber, #107748) [Link] (2 responses)

On a system with 4K pages, you have a 1 in 4096 chance that the OOB access is on a different page and thus *could* even generate a page fault. Let alone the fact that in a large program there will very likely be something on the next page anyway and so you still wouldn't get a page fault.

Undefined Behaviour as usual

Posted Feb 25, 2024 7:03 UTC (Sun) by adobriyan (subscriber, #30858) [Link] (1 responses)

Yes pagefault is not reliable test but so what.

Again, if compiler can 100% prove UB access it should refuse to compile.

If UB access cannot be proven then it should shut up and emit access on the grounds that maybe, just maybe, it doesn't know something.

Linus(?) once made an example that future very powerful gcc 42 LTOing whole kernel may observe that kernel never sets PTE dirty bit
and helpfully optimise away all reads of said bit. Which, of course, will break everything.

Undefined Behaviour as usual

Posted Feb 25, 2024 8:36 UTC (Sun) by mb (subscriber, #50428) [Link]

> Again, if compiler can 100% prove UB access it should refuse to compile.

A compiler cannot at the same time assume UB doesn't exist and refuse to compile if it does exist.

You have to decide on a subset of UB that you want to abort instead of assuming it doesn't exist.
Which kind of defeats the purpose of UB then. It's defined behavior then.

We *do* have languages that have a proper language subset without UB. Just use them.

Undefined Behaviour as usual

Posted Feb 23, 2024 22:58 UTC (Fri) by khim (subscriber, #9252) [Link] (2 responses)

> How is it reasonable? If compiler can prove OOB access to on-stack array, it should _refuse_ to compile and report an error.

Such approach is incompatible with SPEC CPU 2006 which means none of compiler vendors would ever release such a compiler.

Undefined Behaviour as usual

Posted Feb 24, 2024 5:09 UTC (Sat) by adobriyan (subscriber, #30858) [Link] (1 responses)

There should be translate.google.com but for programming languages.

Undefined Behaviour as usual

Posted Feb 24, 2024 10:57 UTC (Sat) by excors (subscriber, #95769) [Link]

Programming language translation is one use case where LLMs seem moderately useful. Sometimes they output complete garbage, and sometimes they give a subtly incorrect translation, but sometimes it'll be enough to let you get a basic understanding of an unfamiliar language. (Which is similar to Google Translate.)

Assuming you mean the Fortran code in the bug report, ChatGPT 3.5 (asked to translate it into C) interprets the loop as "for (int I = 1; I <= NAT; I++) {" and the WRITEs as e.g. "if (LAST && MASWRK) fprintf(IW, "%d %s %s %f %f %f %f\n", I, ANAM[I], BNAM[I], PM, QM, PL, QL);", which seems plausible. Though it does fail to mention that the Fortran array indexing starts at 1, while C starts at 0, which is a potentially important detail when talking about bounds-checking.

(But that code seems largely irrelevant to the actual bug, which is that the benchmark illegally declares an array to have size 1 in one source file and 80M in another source file, and GCC optimised it by treating the 1 as a meaningful bound, and SPEC won't fix their buggy benchmarks because they "represent common code in use" and because any changes "might affect performance neutrality" (https://www.spec.org/cpu2006/Docs/faq.html#Run.05), so the compiler developers have to work around it.)