
Stenberg: DISPUTED, not REJECTED

The Curl project has previously had problems with CVEs issued for things that are not security issues. On February 21, Daniel Stenberg wrote about the Curl project's most recent issue with the CVE system, saying:

I keep insisting that the CVE system is broken and that the database of existing CVEs hosted by MITRE (and imported into lots of other databases) is full of questionable content and plenty of downright lies. A primary explanation for us being in this ugly situation is that it is simply next to impossible to get rid of invalid CVEs.



Undefined Behaviour as usual

Posted Feb 23, 2024 16:44 UTC (Fri) by tialaramex (subscriber, #21167) [Link] (111 responses)

This "not a vulnerability" CVE-2020-1909 is a signed overflow. I'm sure Daniel believes it's harmless. I'm sure his (internal) initial reporter believes it's harmless. In reality it might be harmless on my system and yours. We have no way to know, it's completely impractical to figure it out across the broad swathe of supported platforms and deployed uses of Curl. So actually "it's not a vulnerability" just comes down to Daniel's gut feeling. But in future Daniel's gut feeling decides whether a CVE is issued.

C says if we do signed arithmetic and there's an overflow, that's Undefined Behaviour. The compiler can transform the program in any way it sees fit so long as it preserves observable behaviour - all transforms of UB are valid, so the behaviour of the software in this case is in reality completely arbitrary. This "not a vulnerability" could cause absolutely anything to happen.

I have absolutely no doubt that this "not a vulnerability" distinction is not useful, and I suspect it's actively harmful. If your program has documented behaviour but it doesn't actually do what was documented it's not useful to pretend to know whether this deviation is or is not a "vulnerability". My guess is that in insisting that some cases are and some are not, you help bad guys circle the pieces of your system you're not actually defending...

I understand Daniel's instinct. But it's misdirected here. If the worst bug you had in 2020 was an integer overflow in command line parsing code that's not a bad year, take the win. Unfortunately it wasn't the worst Curl bug in 2020.

For whatever it's worth, this variable should have a Duration type but neither C nor its standard library provide one. And so everything flows from there. Why are we multiplying it by 1000? Because we have a retry wait in seconds but we want to calculate milliseconds. In C instead it's a signed "long" integer and of course those have Undefined Behaviour on overflow.
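In rough outline the pattern looks something like this (a sketch, not the actual curl code; the option handling is simplified and the names are made up):

#include <stdio.h>
#include <stdlib.h>

/* A retry delay given in seconds on the command line is converted to
 * milliseconds in a signed long. If the value exceeds LONG_MAX / 1000 the
 * multiplication overflows, which is Undefined Behaviour in C. */
int main(int argc, char **argv)
{
    long retry_wait_s = (argc > 1) ? strtol(argv[1], NULL, 10) : 1;
    long retry_wait_ms = retry_wait_s * 1000;   /* UB when retry_wait_s > LONG_MAX / 1000 */

    printf("waiting %ld ms between retries\n", retry_wait_ms);
    return 0;
}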

In the more recent CVE that Daniel mentions it's a different cause of Undefined Behaviour. In this case a classic one-past-the-end pointer dereference. But again, Daniel characterises it as fine, because in his mind the worst case is a crash, even though "Undefined Behaviour" is not in fact specified as "Don't worry, it's fine, worst case it will crash"...

Mostly the (presumably unintended) effect of this article was to make me think Curl is worse software and the people maintaining it have prioritised their personal feeling of self-worth and aren't too bothered whether the software is good.

Undefined Behaviour as usual

Posted Feb 23, 2024 16:58 UTC (Fri) by JoeBuck (subscriber, #2330) [Link] (1 responses)

Everything you say here is correct in that the bugs should be fixed. However, we live in a world where people are encouraged by many software distributors to do

curl https:// some.domain / who_knows_what_this_is.sh | sh

and sometimes they are told to become root first. Compared to the possible negative consequences of that, other security issues with Curl are in the noise.

Undefined Behaviour as usual

Posted Feb 23, 2024 17:19 UTC (Fri) by wtarreau (subscriber, #51152) [Link]

Yeah, and it doesn't take long to find examples of "curl -k ... | sh" either, which users sometimes fail to run because they place sudo on curl rather than on bash, until someone suggests doing it the other way... I totally agree that *this* is a real problem, and it's a cultural problem, not one that's easy to fix by just filing a CVE.

Undefined Behaviour as usual

Posted Feb 23, 2024 17:02 UTC (Fri) by wtarreau (subscriber, #51152) [Link] (35 responses)

This is exactly the type of generic analysis, focused on pure theory rather than on code or use cases, that has made CVEs totally useless over the years and that harms software development and security in general by making reports untrustworthy. Everyone believes they have found the bug that could have caused the world to collapse and wants to be rewarded for saving it in time, filing a CVE with the highest possible CVSS. But practical implications are no longer analyzed.

Why don't you trust the analysis the developer and maintainer of the project did on the impacts of these issues ?

Undefined Behaviour as usual

Posted Feb 23, 2024 17:35 UTC (Fri) by mb (subscriber, #50428) [Link] (30 responses)

> Why don't you trust the analysis the developer and maintainer of the project did on the impacts of these issues ?

The code does not seem to be compiled with -fwrapv. (I can't find it in the sources)
Therefore, signed overflow is UB in curl.

One cannot rule out security implications without knowing the whole system in that case.
For a given set of compiler, libs, curl, etc.. maybe one can assess that it is a harmless bug by looking at the generated asm code.
But if I use a different compiler version, the assessment is completely invalid. It could very well have security implications in my setup.

Curl should use -fwrapv or -ftrapv.

Undefined Behaviour as usual

Posted Feb 23, 2024 18:12 UTC (Fri) by Sesse (subscriber, #53779) [Link] (24 responses)

Can you point to any (remotely real) system where this actually causes a security issue?

Undefined Behaviour as usual

Posted Feb 23, 2024 18:15 UTC (Fri) by mb (subscriber, #50428) [Link] (6 responses)

No. Because I have not tried to find one.

But neither can curl ensure there is no such system anywhere.

Undefined Behaviour as usual

Posted Feb 23, 2024 18:25 UTC (Fri) by Sesse (subscriber, #53779) [Link] (5 responses)

Well, normally, the burden of proof is on the accuser? I mean, I'm pretty sure curl will hit UB in various ways if you have e.g. int at 18 bits (presumably with 9-bit bytes), which is also not impossible at all as per the C standard. Would you file curl CVEs for every piece of code that would behave incorrectly on such a system? After all, curl certainly cannot rule out that such a system exists somewhere. Since you seem to assume -fwrapv would nullify this bug (even though it does absolutely nothing to change the C standard), what about compilers where such an option does not exist?

Sometimes, you can do bad things (have bugs) without them turning into a security vulnerability in practice.

Undefined Behaviour as usual

Posted Feb 23, 2024 18:35 UTC (Fri) by mb (subscriber, #50428) [Link] (2 responses)

>Well, normally, the burden of proof is on the accuser?

No. That is not how security works.

> if you have e.g. int at 18 bits (presumably with 9-bit bytes),

What-about-ism.
Has nothing to do with signed-overflow being UB.

>Since you seem to assume -fwrapv would nullify this bug

No. I didn't say that. I was talking about UB. Not whether the bug would still be a bug or not.

Undefined Behaviour as usual

Posted Feb 27, 2024 11:02 UTC (Tue) by JustABug (guest, #169930) [Link]

A bug can just be a bug.
It doesn't need to have a CVE entry to be remediated quickly.

Does this UB allow for privilege escalation? Data exposure? What's the attack vector? A user intentionally entering a stupid value?

If the user can run curl they can run rm -rf

What's the output? Program crash? Exploitable unintended behaviour? What's an abuse scenario?

The researcher filing the CVE needs to demonstrate their CVE isn't a nothing burger.

The only advantage I can think of for filing a CVE for every UB is ensuring the fix is backported. Using BS CVEs as a tool to get things backported is an abuse of the system to address the problem of selective backporting.

Undefined Behaviour as usual

Posted Mar 22, 2024 18:58 UTC (Fri) by DanilaBerezin (guest, #168271) [Link]

> No. That is not how security works.

That is how security works though? Any line of code has the potential to be exploited. If the mere possibility of an exploit is the bar we set to file a CVE, then I can mark every line of code in every project as a CVE. Obviously, that would be very silly.

Undefined Behaviour as usual

Posted Feb 23, 2024 18:41 UTC (Fri) by wtarreau (subscriber, #51152) [Link] (1 responses)

What's fun is that often the people who go through extreme mind-stretching to imagine a possible vulnerability in a random bug are the same who complain about the excess of CVEs reported in new stable kernels, which just apply a softened version of their logic :-)

Undefined Behaviour as usual

Posted Feb 24, 2024 0:19 UTC (Sat) by mb (subscriber, #50428) [Link]

Do you have an actual example of such a person?
I could not imagine a single one.

Undefined Behaviour as usual

Posted Feb 23, 2024 21:57 UTC (Fri) by bagder (guest, #38414) [Link] (16 responses)

No, they can't. This, like many other of these UB problems, is mostly a *theoretical* problem in imaginary systems that *might* "do anything". Blindly considering all UBs to be security problems is not helping the world.

I consider it a fairly important property that a security problem should be possible to actually happen on existing hardware where the code was built with existing compilers. If not, it actually is not a security problem and we don't do any service to the world by treating it as such.

Undefined Behaviour as usual

Posted Feb 23, 2024 22:08 UTC (Fri) by mb (subscriber, #50428) [Link] (15 responses)

>we don't do any service to the world by treating as such.

That means you'll have to constantly re-assess old UB bugs with every new compiler and system release.
That's an almost infinite amount of work.
Treating UB bugs as security bugs and fixing them right away is easy compared to that.

Undefined Behaviour as usual

Posted Feb 24, 2024 1:03 UTC (Sat) by jwarnica (subscriber, #27492) [Link] (13 responses)

We already have to reassess all behavior, even defined behavior, with new compiler and system releases, as the compiler and system itself may be buggy. Or less buggy, but with previously relied-upon buggy behavior changed in ways that are usually good but, in one particular situation, bad.

Undefined Behaviour as usual

Posted Feb 24, 2024 11:53 UTC (Sat) by mb (subscriber, #50428) [Link] (12 responses)

No. Finding newly introduced misbehavior is a completely different class of task than re-assessing known UB bugs over and over again for actual misbehavior.
Re-assessing known UB bugs is completely unnecessary if they are understood as ticking time bombs that can trigger undefined behavior (including security issues) at any point in time if even the slightest bit of the environment changes.

UB bugs must be documented as such and must get fixed right away.
Rejecting a CVE is the opposite of that. It is not a responsible reaction to an UB bug. It is hiding your head in the sand.
Documenting and fixing the UB *solves* the problem once and for all. No re-assessment needed.

Undefined Behaviour as usual

Posted Feb 24, 2024 12:07 UTC (Sat) by jwarnica (subscriber, #27492) [Link] (11 responses)

UB is UB in the spec. It is defined in the implementation.

DB is DB in the spec. It is further defined (possibly incorrectly) in the implementation.

System v+1 might change both of these scenarios.

The only difference, if v+1 changes your application, is how righteous your bug report to the system is.

Undefined Behaviour as usual

Posted Feb 24, 2024 12:33 UTC (Sat) by mb (subscriber, #50428) [Link] (1 responses)

>System v+1 might change both of these scenarios.

Correct.
Upon system change: Having to check two things or having to check only one out of the two. What is better?

Also: The defined-behavior check is being mostly done by the release tests of the system components.

> UB is UB in the spec. It is defined in the implementation.

That is not true.
Data race UB is not defined in the implementation anywhere. It cannot be defined. It is non-deterministic. That's just one example.

You are talking about Implementation-Defined and just assume that UB is the same thing.
It is not.

Undefined Behaviour as usual

Posted Mar 1, 2024 15:48 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

> Data race UB is not defined in the implementation anywhere. It cannot be defined. It is non-deterministic. That's just one example.

The concrete behavior is non-deterministic, sure. However, the language spec can say things like "must treat the value as valid" even if its value is not what is expected (or even naïvely possible given the code). Instead, we have "if a data race happens, the bits in the memory must not be interpreted". The JVM has the former (you get *some* behavior of the code you wrote even if it's not what you expect). C has the latter (if the compiler notices a data race, it can instead assume it never happens and optimize accordingly; this may be "no behavior" after optimization passes complete). Note that C compilers don't stick in "if we notice UB at runtime, do_bad_thing()" code (unless requested via UBSAN and friends). Instead, their noticing of UB ends up affecting how optimizations treat the code, which *then* transforms it into unexpected logic flows based on the raw syntax of the code available.

In this sense, I think that overflowing on a command line argument parse is unlikely to have any real effect as the optimizer is just going to assume it stays in range and "means something". However, instead we have "the timeout parameter takes an integer, multiplication makes an integer…maybe it overflows, but who cares as 'it doesn't happen'". The code is the same whether overflow happens or not…it's just that the value is not interpreted as 1000*input but as 1000*input % UINT_MAX (or whatever type it is). Given the stratospheric values needed to overflow, I dare say that anyone flirting with these values already had a DoS on their plate in case the intended timeout actually ended up having to expire. It's only a real problem for those running with UBSAN on…but they're asking for DoS crashes by doing so anyways.

IMO, should UB be patched and guarded against? Yes. Does *this* instance warrant a CVE? No. In a `curl | sh` scenario, an attacker gets more mileage from replacing the URL with a download from malicious.site than from bogus timeout parameters.

Undefined Behaviour as usual

Posted Feb 27, 2024 11:21 UTC (Tue) by danielthompson (subscriber, #97243) [Link] (8 responses)

> UB is UB in the spec. It is defined in the implementation.

I think this confuses undefined behavior (e.g. reading past the end of an array) with implementation-defined behavior (e.g. signedness of char, size of int, etc). In some cases (such as reading past the end of an array whose length is not known at compile time) the compiler implementation cannot define what happens!

Signed integer overflow is interesting because it is an undefined behaviour (according to the C spec) and many implementations really do make its behavior undefined. For example the behavior on overflow can and does change based on the optimization level. Nevertheless it is possible for an implementation to fully define what happens on overflow if the application is willing to accept missed optimization opportunities (hence -fwrapv).

It is also interesting because one of the common undefined behaviors linked to signed overflow is the removal of security bounds checks that (incorrectly) assumed integer overflow would wrap... and this certainly did lead to vulnerabilities.
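The classic shape of that pattern (a sketch, not taken from curl):

#include <stdio.h>

/* A bounds check written on the assumption that signed overflow wraps.
 * Because the overflow is UB, the compiler may conclude that off + len can
 * never be less than off and delete the first check entirely. */
static int in_bounds(int off, int len, int size)
{
    if (off + len < off)        /* intended wrap-around check; UB, may be removed */
        return 0;
    return off + len <= size;
}

int main(void)
{
    /* With wrapping arithmetic (e.g. -fwrapv) this prints 0; once the check
     * is optimized away, the overflowed sum can compare as "in bounds". */
    printf("%d\n", in_bounds(2000000000, 2000000000, 100));
    return 0;
}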

However, despite the above, I'm unconvinced by the what-about-ism in the "every overflow is a ticking bomb" argument: the set of plausible optimizations on a time conversion during command line processing is relatively small.

Undefined Behaviour as usual

Posted Feb 29, 2024 8:35 UTC (Thu) by anton (subscriber, #25547) [Link]

There is no way to "really make its behaviour undefined", except by compiling to code where the architecture makes the behaviour undefined. Sane computer architects don't, and even insane architects don't do it for anything that a compiler is likely to output for signed addition. When reading past the end of an array, the code produced by the compiler usually just reads past the end of the array; it takes special effort by the compiler writer to do something different. But yes, compiler writers have taken this special effort, with the result being that there are cases of "optimizing" a loop with a read past the end of an array into an infinite loop.

The removal of bounds checks is another such "optimization"; however, it is often based on the assumption that pointer arithmetic (not signed integer arithmetic) does not wrap around. Therefore, to avoid this kind of miscompilation, -fwrapv is insufficient. The better option is -fno-strict-overflow (which combines -fwrapv and -fwrapv-pointer).

Undefined Behaviour as usual

Posted Feb 29, 2024 11:47 UTC (Thu) by SLi (subscriber, #53131) [Link] (6 responses)

I think what the parent may have wished to say is that undefined behavior, in some sense, becomes defined behavior once you are looking at the binary, possibly sometimes together with some environmental details.

While I'm all for making code robust and even moving to safer languages, I think most people are more interested in vulnerabilities in actual binaries running on actual computers than in ones where a theoretical, possibly evil compiler could read the spec and perform a theoretically valid compilation where the output does something bad.

Undefined Behaviour as usual

Posted Feb 29, 2024 14:31 UTC (Thu) by pizza (subscriber, #46) [Link] (5 responses)

> I think what the parent may have wished to say is that undefined behavior, in some sense, becomes defined behavior once you are looking at the binary, possibly sometimes together with some environmental details.

Yeah, that's my take on this too.

It's "undefined" in the spec, but the actual compiled binary (+runtime environment) exhibits highly consistent (albeit unexpected/unintended) behavior. After all, script kiddies couldn't exploit those bugs into a privilege escalation without the binary behaving in this implementation-specific manner.

Undefined Behaviour as usual

Posted Feb 29, 2024 19:10 UTC (Thu) by farnz (subscriber, #17727) [Link] (4 responses)

Am I right in thinking that you'd agree that it's the output binary that's usually deterministic in its environment (modulo things with clearly-defined non-determinism such as the RDRAND instruction, or runtime data races between multiple threads), and not the combination of compiler (including flags) and source code?

In other words, while this would be a nasty surprise, you wouldn't be surprised if recompiling the same UB-containing source resulted in a different binary, but you would be surprised if using the same binary in the same environment with the same inputs had a different output, unless it was doing something that is clearly specified to have non-deterministic behaviour by the hardware (like a data race, or RDRAND).

Undefined Behaviour as usual

Posted Mar 1, 2024 1:37 UTC (Fri) by pizza (subscriber, #46) [Link] (3 responses)

> Am I right in thinking that you'd agree that it's the output binary that's usually deterministic in its environment (modulo things with clearly-defined non-determinism such as the RDRAND instruction, or runtime data races between multiple threads), and not the combination of compiler (including flags) and source code?

Yes, I'd agree with this, and I don't think that it's a terribly controversial opinion.

> In other words, while this would be a nasty surprise, you wouldn't be surprised if recompiling the same UB-containing source resulted in a different binary

If you're recompiling with the same toolchain+options I'd expect the resulting binaries to behave damn near identically from one build to the next [1]. (Indeed, as software supply-chain attacks become more widespread, 100% binary reproducibility is a goal many folks are working towards)

> but you would be surprised if using the same binary in the same environment with the same inputs had a different output, unless it was doing something that is clearly specified to have non-deterministic behaviour by the hardware (like a data race, or RDRAND).

Yep. After all, the script kiddies wouldn't be able to do their thing unless a given binary on a given platform demonstrated pretty consistent behavior.

[1] The main difference would be the linker possibly putting things in different places (especially if multiple build threads are involved) but that doesn't change the fundamental attack vector -- eg a buffer overflow that smashes your stack on one build (and/or platform) will still smash your stack on another, but since the binary layout is different, you'll likely need to adjust your attack payload to achieve the results you want. Similarly, data-leakage-type attacks (eg Heartbleed) usually rely on being able to repeat the attack with impunity until something "interesting" is eventually found.

Undefined Behaviour as usual

Posted Mar 1, 2024 10:04 UTC (Fri) by farnz (subscriber, #17727) [Link] (2 responses)

> If you're recompiling with the same toolchain+options I'd expect the resulting binaries to behave damn near identically from one build to the next [1]. (Indeed, as software supply-chain attacks become more widespread, 100% binary reproducibility is a goal many folks are working towards)

This is a potentially dangerous expectation in the presence of UB in the source code; there are optimizations that work by searching for a local maximum, and where for fully-defined code (even where it's unspecified behaviour, where there are multiple permissible outcomes), we know that there is only one maximum they can find. We use non-determinism in that search to speed it up, and for UB we run into the problem that there can be multiple maxima, all of which are locally the best option.

Because the search is non-deterministic, exactly which maximum we end up in for some UB cases is also non-deterministic. This does mean that 100% binary reproducibility has the nice side-effect of wanting to reduce UB - by removing UB, you make the search-type optimizations find the one and only one optimal stable state every time, instead of choosing a different one each time.

And I'd agree that it's not terribly controversial to believe that a binary running in user mode has no UB - there's still non-deterministic behaviour (like the result of a data race between threads, or the output of RDRAND), and if your binary's behaviour is defined by something non-deterministic, it could end up in what the standards call unspecified behaviour. This is not universally true once you're in supervisor mode (since you can do things on some platforms like change power rails to be out-of-spec, which results in CPUs having UB since the logic no longer behaves digitally, and thus it's possible for software UB to turn into binary defined behaviour of changing platform state such that the platform's behaviour is now unpredictable).

Undefined Behaviour as usual

Posted Mar 1, 2024 14:48 UTC (Fri) by pizza (subscriber, #46) [Link] (1 responses)

> we know that there is only one maximum they can find. We use non-determinism in that search to speed it up, and for UB we run into the problem that there can be multiple maxima, all of which are locally the best option.

FWIW I've seen my fair share of shenanigans caused by nondeterministic compiler/linking behavior. To this day, there's one family of targets in the CI system that yields a final binary that varies by about 3KB from one build to the next depending on which nonidentical build host was used to cross-compile it. I suspect that is entirely due to the number of cores used in the highly parallel build; I've never seen any variance from back-to-back builds on the same machine (binaries are identical except for the intentionally-embedded buildinfo).

But I do understand what you're saying, and even agree -- but IME compilers are already very capable of loudly warning about the UB scenarios that can trigger what you described. Of course, folks are free to ignore/disable warnings, but I have no professional sympathy for them, or the consequences.

> This is not universally true once you're in supervisor mode (since you can do things on some platforms like change power rails to be out-of-spec [...]

I've spent most of my career working in bare-metal/supervisory land, and yeah, even an off-by-one *read* could have some nasty consequences depending on which bus address that happens to hit. OTOH, while the behavior of that off-by-one read is truly unknowable from C's perspective, if the developer "corrects" the bug by incrementing the array size by one (therefore making it too large for the hw resource) the program is now "correct" from C's perspective, but will trigger the same nasty consequences on the actual hardware.
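To make that concrete, something like this (made-up base address and register count, obviously):

#include <stdint.h>

#define REG_BASE  0x40000000u   /* hypothetical peripheral base address */
#define NUM_REGS  4             /* the hardware really has 4 registers */

typedef struct {
    volatile uint32_t reg[NUM_REGS];
} periph_t;

#define PERIPH ((periph_t *)(uintptr_t)REG_BASE)

uint32_t read_reg(unsigned i)
{
    /* i == 4 is the off-by-one read: UB as far as C is concerned, and on the
     * bus it touches whatever lives at REG_BASE + 16. Bumping NUM_REGS to 5
     * makes the C "correct" but the bus access is just as bogus. */
    return PERIPH->reg[i];
}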

Undefined Behaviour as usual

Posted Mar 1, 2024 16:34 UTC (Fri) by farnz (subscriber, #17727) [Link]

> OTOH, while the behavior of that off-by-one read is truly unknowable from C's perspective, if the developer "corrects" the bug by incrementing the array size by one (therefore making it too large for the hw resource) the program is now "correct" from C's perspective, but will trigger the same nasty consequences on the actual hardware.

I spend a lot of time in the land of qualified compilers, where the compiler promises that as long as you stick to the subset of the language that's qualified, you can look only at the behaviours evident in the source code to determine what the binary will do. You're expected, if you're working in this land, to have proper code review and separate code audit processes so that a fix of the type you describe never makes it to production, since it's obvious from the source code that the program, while "correct" from C's perspective, is incorrect from a higher level perspective.

And a lot of the problems I see with the way UB is handled feel like people expect all compilers to behave like qualified compilers, not just on a subset of the language, but on everything, including UB.

Undefined Behaviour as usual

Posted Feb 26, 2024 13:16 UTC (Mon) by bagder (guest, #38414) [Link]

Bugs should be fixed yes.

A UB bug is a bug that *might* have a security impact. We do not make the world a better place by blindly assuming every UB is a security vulnerability. That's just like crying wolf. It is not helpful. Proper security assessment should still be applied.

Since they are bugs that should be fixed, we don't have to "come back" later to reassess their security impact. Unless we reintroduce the bugs of course.

Those are guidelines I adhere to in the projects I work in.

Undefined Behaviour as usual

Posted Feb 24, 2024 5:57 UTC (Sat) by jmspeex (subscriber, #51639) [Link] (3 responses)

> One cannot rule out security implications without knowing the whole system in that case.

Correct. But then that is also true of most bugs you find in software. Based on that alone almost all bugs should be filed at the highest severity, which isn't exactly helpful. Any UB should get fixed, but when UB gets discovered in a piece of code, someone has to make a reasonable effort to figure out how bad the impact is likely to be. Write out of bounds on the stack: very very bad. Integer wrap-around in a loop counter: could be bad unless careful analysis shows it's unlikely to be. Integer overflow in the value that only gets printed to a log: much less likely to be exploited unless proven otherwise.

Undefined Behaviour as usual

Posted Feb 24, 2024 6:35 UTC (Sat) by Otus (subscriber, #67685) [Link] (2 responses)

> Based on that alone almost all bugs should be filed at the highest severity, which isn't exactly helpful.

Having a CVE doesn't imply highest severity, even low severity vulnerabilities are meant to have one. Severity analysis is a separate matter.

Undefined Behaviour as usual

Posted Feb 24, 2024 6:39 UTC (Sat) by jmspeex (subscriber, #51639) [Link] (1 responses)

Well, if you look at the issue being discussed, it was rated as severity 9.8/10. If you're going to give that rating for any integer overflow because technically it's UB, then you have no room left for the scary stuff.

Undefined Behaviour as usual

Posted Feb 24, 2024 11:30 UTC (Sat) by Otus (subscriber, #67685) [Link]

I can easily believe that the severity was wrong. But shouldn't that then be fixed?

I don't really know what the correct severity would've been here, but the severity part has always been black magic. (I don't think those are particularly useful in practice.)

My point is simply that CVE isn't supposed to be exclusively for highest impact issues, but any vulnerabilities.

Undefined Behaviour as usual

Posted Feb 29, 2024 8:49 UTC (Thu) by anton (subscriber, #25547) [Link]

The gcc option -fno-strict-overflow would imply -fwrapv (but I have not looked in the sources whether curl is compiled with that).

Yes, unless you are compiling a benchmark, compiling with -fno-strict-overflow is a good idea. This limits the kinds of shenanigans that the compilers do, a little. There are also a number of other such flags.

Actually, if we consider C "undefined behaviour" to be a security issue all by itself, then a C compiler that does not (by default) limit what it does is a security issue all by itself. Maybe someone (a Rust fan?) should file one CVE for each undefined behaviour (211 in C11 IIRC) for each C compiler, unless that compiler limits that behaviour for that case by default (offering flags like -fstrict-overflow for compiling benchmarks is allowed, of course).

Undefined Behaviour as usual

Posted Feb 24, 2024 15:40 UTC (Sat) by mcatanzaro (subscriber, #93033) [Link] (3 responses)

> This is exactly the type of generic analysis that focuses on pure theory and neither on code nor use cases that made CVEs totally useless over the years and harms software development and security in general by making reports not trustable anymore.

But it's also correct. Signed integer overflow is a software vulnerability. It doesn't matter whether it's exploitable or not. CVEs are for tracking vulnerabilities, not exploits.

Undefined Behaviour as usual

Posted Feb 25, 2024 23:33 UTC (Sun) by neggles (subscriber, #153254) [Link] (2 responses)

If CVEs aren't for exploits why is the E there, then?

Undefined Behaviour as usual

Posted Feb 26, 2024 1:15 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (1 responses)

Because they're an Enumeration? This idea comes out of the paper "Towards a Common Enumeration of Vulnerabilities"

Undefined Behaviour as usual

Posted Feb 26, 2024 9:07 UTC (Mon) by geert (subscriber, #98403) [Link]

Undefined Behaviour as usual

Posted Feb 23, 2024 17:34 UTC (Fri) by flussence (guest, #85566) [Link] (43 responses)

> C says if we do signed arithmetic and there's an overflow, that's Undefined Behaviour. The compiler can transform the program in any way it sees fit so long as it preserves observable behaviour - all transforms of UB are valid, so the behaviour of the software in this case is in reality completely arbitrary.

This describes a fantasy world where C compilers (and perhaps all software) are made by insane villains and actively abuse people for doing things outside what a written standard specifies, and to be blunt, it's just "free speech" advocacy with different inflection. I for one am glad the tech culture of 40 years ago has been largely stomped out by more reasonable people.

Undefined Behaviour as usual

Posted Feb 23, 2024 17:37 UTC (Fri) by mb (subscriber, #50428) [Link] (11 responses)

> are made by insane villains and actively abuse people

Compilers exploiting UB happens all the time. It is the base of all optimizations.

> for doing things outside what a written standard specifies,

UB is by the very definition of UB outside of what the standard specifies.

Undefined Behaviour as usual

Posted Feb 23, 2024 18:44 UTC (Fri) by fw (subscriber, #26023) [Link] (1 responses)

> Compilers exploiting UB happens all the time. It is the base of all optimizations.
Not really, Java and C# do not have undefined behavior, and yet there are optimizing compilers. Even for C, it's a rather extreme position to say that register allocation (probably among the top three optimizations to implement in a compiler for current architectures) depends on undefined behavior. For others like constant propagation it's a bit of a stretch, too.

Undefined Behaviour as usual

Posted Feb 23, 2024 22:18 UTC (Fri) by khim (subscriber, #9252) [Link]

> Not really, Java and C# do not have undefined behavior, and yet there are optimizing compilers.

Java and C# absolutely do have undefined behavior. It's just handled the way Rust handles it: the “safe” language guarantees absence of UB via the compiler, while the “unsafe” part allows one to write programs with UB.

Java forces you to write these parts in an entirely different language using JNI, while C# has an unsafe subset, similar to Rust, but in both cases UB still, very much, forms the basis for all optimizations.

> Even for C, it's a rather extreme position to say that register allocation (probably among the top three optimizations to implement in a compiler for current architectures) depends on undefined behavior.

Of course it does. Everything in C depends on the absence of undefined behavior. Simply because it's permitted to convert a pointer to a function into a char *. This may break your program if the compiler does register allocation in a different way.

And it's not even a theoretical issue! Back in the MS-DOS era register variables could only use si and di (and they weren't used for anything else) and thus it was possible to write code that would [ab]use that in its signal handler.

> For others like constant propagation it's a bit of a stretch, too.

That one may be exploited by finding and changing constants in the compiled code. And that, too, was used back when compilers weren't smart enough to break such tricks.

Undefined Behaviour as usual

Posted Feb 24, 2024 12:39 UTC (Sat) by vegard (subscriber, #52330) [Link] (7 responses)

> Compilers exploiting UB happens all the time. It is the base of all optimizations.

The first part is true, but the second seems trivially false. Constant propagation does not in any way relate to or rely on UB, yet it is an optimization. Same with tail call optimizations, inlining, even register allocation just to name a few.

Undefined Behaviour as usual

Posted Feb 24, 2024 12:47 UTC (Sat) by mb (subscriber, #50428) [Link] (6 responses)

Yes. I was wrong. Let me fix it:
Compilers exploiting UB happens all the time. It is the base of most optimizations.

And let me even add something: UB is required for connecting the virtual machine models of the compiler to the real world. Otherwise the virtual machine model would have to *be* the actual machine model. And even then it would still include UB, because actual machines have UB.

Undefined Behaviour as usual

Posted Feb 24, 2024 13:21 UTC (Sat) by vegard (subscriber, #52330) [Link] (5 responses)

I disagree that UB is required for programming languages in general. Maybe it's required if you have a certain set of requirements (perhaps especially performance-related requirements). But it's fully possible to specify a programming language that doesn't have UB. It is widely believed that safe Rust does not have UB (the RustBelt paper proves this for a slightly stripped-down version of the language).

Undefined Behaviour as usual

Posted Feb 24, 2024 13:49 UTC (Sat) by mb (subscriber, #50428) [Link] (4 responses)

What are you going to compute with your Rust program, if it doesn't include a single unsafe block?
You can't do I/O.

If you are going to rewrite and recompile your Rust program each time the input changes, then the compiler is part of your program flow. The compiler is the "unsafe-block" in this case, which provides the input data. Without it, the Rust program can't process anything. It would be static.

Yes, we can have fully safe languages like Rust. But they must *always* interface to an unsafe part. Otherwise they can't produce results. Safe-Rust alone is useless. General purpose languages will always have an interface to unsafe code.

The real world is unsafe. The real world has UB.

Undefined Behaviour as usual

Posted Feb 25, 2024 0:55 UTC (Sun) by tialaramex (subscriber, #21167) [Link]

To be fair if you're willing to give up generality you can have software with excellent performance and no undefined behaviour. That's the point of WUFFS. We can and should write a lot more software in such languages (starting with, we should use WUFFS everywhere it's appropriate)

Generality is of course a price we cannot often afford. You wouldn't write the WUFFS compiler in WUFFS (the current transpiler is in Go) or an operating system, or indeed a web browser but the point is that our industry got into the habit of using chainsaws everywhere because they're so powerful, rather than using the right tool for the job.

Undefined Behaviour as usual

Posted Feb 26, 2024 7:50 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (2 responses)

In principle, it is possible to compose a complete, usable system entirely out of a combination of:

* "Safe" code (where the compiler will not let you write code that contains UB).
* "Verified" code (where the compiler will let you write whatever you want, but you also write a machine-verified proof that no UB will actually occur).

But then this is just a matter of definitions - You can set up your build system in such a way that it will refuse to compile unsafe code without a valid proof of soundness. Then you can consider the proof to be part of the source code, and now your unsafe language, plus the theorem proving language, together function as a safe language.

Formally verified microkernels already exist. The sticking point, to my understanding, is the lack (or at least incompleteness) of a verified-or-safe userland.

Undefined Behaviour as usual

Posted Feb 26, 2024 10:02 UTC (Mon) by farnz (subscriber, #17727) [Link] (1 responses)

The other sticking point is the difficulty of formal verification using current techniques. Formally verifying seL4 took about 20 person-years to verify code that took 2 person-years to write.

Undefined Behaviour as usual

Posted Feb 26, 2024 10:12 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

The thing to bear in mind is that, technically, the type checking in a safe language is also a valid proof of safety (or else it would not type check). So every safe Rust program (and every Java program, etc.) is verifiably safe, assuming the type system is sound.*

The question is, is that good enough? Most formal verification wants to prove more than "mere" type safety or soundness, and that tends to be hard because the property you're trying to prove is complex and highly specific to the individual application. But if you just want to prove a lack of UB, that's probably more feasible.

* There are soundness bugs in the current implementation of Rust. Most of them are rather hard to trigger accidentally, but they do exist.

Undefined Behaviour as usual

Posted Feb 29, 2024 9:01 UTC (Thu) by anton (subscriber, #25547) [Link]

> Compilers exploiting UB happens all the time. It is the base of all optimizations.
Nonsense! Compilers don't need to assume that the program does not exercise undefined behaviour in order to, e.g., optimize 1+1 into 2. Actually, assuming that the most troublesome undefined behaviours (e.g., signed overflow or whatever is behind strict aliasing) do not happen has little performance impact.

Undefined Behaviour as usual

Posted Feb 23, 2024 19:34 UTC (Fri) by geofft (subscriber, #59789) [Link] (29 responses)

I think the traditional description of undefined behavior as "demons fly out of your nose" has done a disservice to understanding it. (After all, if the generated code had access to either demons or your nose, there would be a security vulnerability somewhere in granting that power to your userspace account in the first place. :) )

The point of undefined behavior is not that the compiler is allowed to be lawful-evil about how it interprets your code, and so you have to be paranoid about what it might do. The point is that an optimizing compiler is permitted to assume that you are writing reasonable code that does not require the compiler to be paranoid about what you meant, and so it can make reasonable optimizations on real-world code. And every compiler that people actually use is optimizing. (There is a loose conceptual connection here with speculative execution vulnerabilities: you can avoid them with a non-speculating CPU, but nobody seems to be buying those.)

The code behind CVE-2023-52071 is actually a pretty good example of this:

WCHAR prefix[3] = {0};
if (something) {
    DEBUGASSERT(prefix[3] == L'\0');
    more stuff;
}
While it's pretty obvious to a human reader in context that this is just a typo, it's probably harder for a compiler to distinguish this from, say,
#ifdef LINUX
#define NUMBER_OF_THINGS 10
#else
#define NUMBER_OF_THINGS 5
#endif

thing_t things[NUMBER_OF_THINGS] = {0};
if (some_function_in_another_file_that_only_succeeds_on_linux()) {
    things[7] = something;
    more stuff;
}
which, as long as some_function_in_another_file_that_only_succeeds_on_linux() does what it says, never actually invokes undefined behavior. The compiler can notice that the assignment is undefined in the non-Linux case, and instead of doing something villainous, it can do something useful, i.e., assume that the assignment statement cannot be reached and dead-code-eliminate it and everything after it until the closing brace - and then dead-code eliminate the function call because there's nothing left.

However, in the actual case in the curl source code, the dead-code elimination is actually pretty bad! You really do want that code to execute; the coder's intention was not that the block was skippable. The compiler can do the exact same "useful" action and get you a pretty negative result: the curl command produces no output (I think), but it returns success anyway. It's not far-fetched to imagine that in turn leading to unexpected data loss. The compiler does not need to be actively evil to cause a real problem.

(Note that what's happening isn't that the compiler is doing something in response to undefined behavior being invoked. The compiler is simply not doing something on the assumption that undefined behavior is never invoked; specifically, it just doesn't compile the block. No real-world compiler has any interest in inserting code to do something weird that it wouldn't otherwise insert. But even so, optimizing out something that shouldn't have been optimized can cause problems - impact not intent and all that.)

Signed overflow being undefined behavior is a little bit silly because a well-intentioned optimizing compiler will only use that optimization for one purpose: to emit whatever the underlying CPU's most efficient arithmetic instructions are to handle the non-overflowing case. On essentially every modern CPU, that's two's-complement wrapping operations, but the historic existence of other CPUs means that the standard wanted to allow optimizing compilers to have a chance on those platforms too. Today it would be reasonable to make it no longer undefined behavior. All the other types of undefined behavior are undefined because there are reasonable optimizations that users actually want their compilers to do. Strict aliasing means that a loop that reads an array behind a pointer doesn't have to reread the pointer each time through, just in case something else in the loop changed it. Data races are undefined so that compilers don't have to use atomic operations or lock prefixes for everything. Buffer overflows are undefined so that there aren't bounds checks inserted everywhere. And so forth.
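The strict-aliasing case, as a minimal sketch (a made-up function, just to show the shape of the optimization):

/* Because a store through a "float *" is assumed not to modify an int
 * object, the compiler may load *count once and keep it in a register for
 * the whole loop instead of re-reading it after every store through out. */
void fill(float *out, int *count, float v)
{
    for (int i = 0; i < *count; i++)
        out[i] = v;
}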

Undefined Behaviour as usual

Posted Feb 23, 2024 20:04 UTC (Fri) by Wol (subscriber, #4433) [Link] (3 responses)

> And every compiler that people actually use is optimizing.

And this is an untrue generalisation :-)

Admittedly I don't use it much, but a lot of people make great use of it - I guess many of the DataBASIC compilers are not optimising. I know OpenQM isn't. The guy who wrote it is documented as saying the extra complexity involved wasn't worth the candle.

Okay, it's probably still vulnerable to optimisation, because DataBASIC compiles to high-level p-code, which is then processed by an interpreter written in C ... but that ain't an optimising compiler.

Cheers,
Wol

Off-topic

Posted Feb 23, 2024 20:37 UTC (Fri) by geofft (subscriber, #59789) [Link] (1 responses)

LWN really needs to implement downvotes / flagging, so when someone says "This comment you are making about C compilers, in a thread entitled 'undefined behavior,' is possibly untrue for this one thing that isn't a C compiler and is for a language that does not have undefined behavior," it doesn't permanently stain the comment thread.

Off-topic

Posted Feb 24, 2024 5:12 UTC (Sat) by willy (subscriber, #9762) [Link]

Oh, I thought LWN's flag for "This comment contains no content of value" was to set the author to "Wol".

Undefined Behaviour as usual

Posted Feb 24, 2024 7:53 UTC (Sat) by jem (subscriber, #24231) [Link]

How do you define "optimizing compiler"? Can you tell by comparing the source code and the machine code that the latter was produced by an optimizing compiler or not? Do you have a list of patterns or traits that can be found in the machine code that determine that an optimizing compiler was used to produce the code?

Or, if the definition of a non-optimizing compiler is that the binary code is a series of fragments that can clearly be used to identify the corresponding parts of the source code, how on earth are you going to formalize this definition?

Undefined Behaviour as usual

Posted Feb 23, 2024 21:26 UTC (Fri) by tialaramex (subscriber, #21167) [Link] (1 responses)

> Signed overflow being undefined behavior is a little bit silly because a well-intentioned optimizing compiler will only use that optimization for one purpose: to emit whatever the underlying CPU's most efficient arithmetic instructions are to handle the non-overflowing case.

This does seem to be the expectation of many C++ programmers and I'd assume also C programmers.

It's wrong though. Here's a very easy example, the compiler just constant folded your arithmetic overflow out of existence... https://godbolt.org/z/v4evh3eEG
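Something along these lines (not necessarily the exact snippet behind the link):

#include <limits.h>

/* The compiler can see the overflow at compile time, so no "efficient
 * wrapping add instruction" is ever emitted: the arithmetic is folded away
 * entirely, and because the overflow is UB the folded result does not have
 * to be the two's-complement one. */
int folded(void)
{
    int x = INT_MAX;
    return x + 1;   /* UB; typically constant-folded, no add at run time */
}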

Undefined Behaviour as usual

Posted Feb 26, 2024 14:01 UTC (Mon) by error27 (subscriber, #8346) [Link]

Doing nothing is the definition of the fastest, most efficient way to handle undefined behavior. It's the most typical response as well.

Undefined Behaviour as usual

Posted Feb 23, 2024 22:44 UTC (Fri) by khim (subscriber, #9252) [Link]

> Signed overflow being undefined behavior is a little bit silly because a well-intentioned optimizing compiler will only use that optimization for one purpose: to emit whatever the underlying CPU's most efficient arithmetic instructions are to handle the non-overflowing case.

Sigh. I wonder if the “we code for the hardware” guys will ever learn that a “well-intentioned optimizing compiler” is an oxymoron; it simply couldn't exist and doesn't exist. A compiler can't have intentions, well or ill. That's simply the basis of the whole of compiler theory.

That discussion happened more than a decade ago and it's still relevant. And if you think that gcc has suddenly become “well-intentioned” simply because GCC12 or GCC13 don't turn that particular example into a pile of goo then you are sorely mistaken: it's only because these have learned to [ab]use SIMD instructions in -O2 mode. Take that away and we are back to square one.

At this point we should stop pretending C and C++ are salvageable because the trouble with them is social, not technical: even after decades of discussions the “we code for the hardware” crowd is not ready to give up on their dream of a “well-intentioned” compiler, while compiler makers are not even trying to discuss changes in the language which might help these people produce working programs.

Undefined Behaviour as usual

Posted Feb 24, 2024 1:08 UTC (Sat) by tialaramex (subscriber, #21167) [Link] (20 responses)

I've replied more specifically about signed overflow because at this point it's a trope.

More generally though over the past say five years I've become increasingly comfortable with the "demons fly out of your nose" characterisation despite the fact that yes, technically that specifically won't happen (because demons aren't real). The characterisation is appropriate because it inculcates the appropriate level of caution, whereas the "It will never do anything unreasonable" guidance you prefer reassures C and C++ programmers that they're safe enough when in fact they're in constant danger and _should_ be extremely cautious.

There's an Alisdair Meredith talk, which I can't find right now, where Alisdair confidently explains that your C++ program cannot delete all the files on a customer's hard disk unless you wrote code to delete all their files. He argues that while, sure, as a result of UB it might unexpectedly branch to code you wrote that shouldn't normally run, or pass different parameters than you expected, it cannot ever just do arbitrary stuff. This is of course completely untrue; I'm guessing every LWN reader can see why -- but it does make it easier to feel relaxed about having mountains of crappy C++ code. Sure it has "Undefined Behaviour" but that probably just means it will give the wrong answers for certain inputs, right?

If every C and C++ programmer had the "Ralph in danger" meme image on a poster over their desk I'd feel like at least we're on the same page about the situation and they've just got a different appetite for risk. But that's not the world we have.

Undefined Behaviour as usual

Posted Feb 24, 2024 1:23 UTC (Sat) by pizza (subscriber, #46) [Link] (19 responses)

> He argues that while sure as a result of UB it might unexpectedly branch to code you wrote that shouldn't normally run, or pass different parameters than you expected; it cannot ever just do arbitrary stuff. This is of course completely untrue,

No, it is categorically true, because "undefined" does not mean "arbitrary and unbounded"

Using your logic, triggering UB means the computer could respond by literally exploding. Ditto if your computer gets hit with a cosmic ray.

If you argue that "no, the computer can't do that because nodody built explosives into it", why can't that argument also be applied to UB arbitrarily deleting your files instead? Sure, both are _possible_ in the sense that "anything is possible" but you're far more likely to have your car hit by a train, an airplane, and a piece of the International Space Station... simultaneously.

Undefined Behaviour as usual

Posted Feb 24, 2024 2:03 UTC (Sat) by tialaramex (subscriber, #21167) [Link] (18 responses)

I wasn't expecting I'd have to explain this here, but I guess I have to be disappointed.

For the file deletion situation the usual way this comes up is that bad guys hijack a program (whatever its purpose may have been) to execute arbitrary code (not something it was intended to do, but ultimately not hard to achieve in some UB scenarios as numerous incidents have demonstrated). Then they do whatever they like, which in some cases may include deleting your files (perhaps after having preserved an encrypted "backup" they can sell to you).

Undefined Behaviour as usual

Posted Feb 24, 2024 3:12 UTC (Sat) by pizza (subscriber, #46) [Link] (17 responses)

>Then they do whatever they like, which in some cases may include deleting your files (perhaps after having preserved an encrypted "backup" they can sell to you).

Seriously? Calling that the consequence of "undefined behaviour" is beyond farcical, as the _computer operator_ is *deliberately choosing* to delete files.

Just because the operator is unauthorized doesn't make them not the operator.

And "undefined behaviour" is not a requirement for, nor does it necessarily lead to, arbitrary command execution.

Undefined Behaviour as usual

Posted Feb 25, 2024 18:28 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (16 responses)

Alisdair claims this can't happen. The fact is it does happen. You insist Alisdair is correct anyway and it can't happen.

Emotionally it's satisfying to insist that you're right and Mother Nature is wrong. But pragmatically the problem is that Mother Nature doesn't care how you feel about it

And it's going to keep happening until you stop doing the thing that doesn't work, even though you find that emotionally unsatisfying as an outcome.

Undefined Behaviour as usual

Posted Feb 25, 2024 23:39 UTC (Sun) by pizza (subscriber, #46) [Link] (15 responses)

> Alisdair claims this can't happen. The fact is it does happen. You insist Alisdair is correct anyway and it can't happen.

No, Alisdair and I both claim it can't happen *unless someone intentionally writes code to make it happen*

...It won't happen by pure happenstance. (Which even your contrived script kiddie example demonstrates)

> But pragmatically the problem is that Mother Nature doesn't care how you feel about it

Uh.. there is nothing "natural" about computer software or even computer hardware; they cannot operate in ways that exceed what they were designed to do. But that's neither here nor there; "Mother Nature" doesn't respond to unexpected stimulus in arbitrary ways either; nature has rules that governs how it functions. (Granted, we don't understand many/most of them, but that doesn't mean they don't exist.)

For example, reading an uninitialized memory cell yields "undefined" results. However, in reality (sorry, "Nature") the value will either be 1 or 0. It literally cannot be anything else, because the computer can only register a 1 or a 0 in response to that input -- you won't ever get a value of "0.661233123413" or "blue". So yes, it is "undefined" but it is *bounded*. What happens in response to that? That depends on what that value is used for in the larger system.

Going back to the curl not-a-CVE, when the worst possible outcome is that the user gets access to one byte of data they already had access to, there is no path from that read to "nuke your filesystem" unless curl is being used within a system already designed to nuke your filesystem (or the OS or runtime or whatever was intentionally designed to nuke your filesystem) if you read-out-of-bounds.

Another way of looking at this is that sure, the contents of that extra byte is technically undefined, but so is every other byte in the HTTP response from the server -- including whether or not you get one at all. Similarly, what the server does as a result of you making that request is also undefined and largely outside your control. It could trigger thermonuclear war for all you know. But it won't trigger global thermonuclear war unless someone deliberately gave it those capabilities. In other words, undefined, but *bounded*.

Undefined Behaviour as usual

Posted Feb 26, 2024 6:40 UTC (Mon) by mb (subscriber, #50428) [Link] (11 responses)

>"Mother Nature" doesn't respond to unexpected stimulus in arbitrary ways either; nature has rules that governs how it functions.

That is not true.
https://en.wikipedia.org/wiki/Hardware_random_number_gene...

The rest of your post is also largely not true. But that has been explained often enough to you, so I won't repeat.

Undefined Behaviour as usual

Posted Feb 26, 2024 9:10 UTC (Mon) by geert (subscriber, #98403) [Link] (1 responses)

So it is OK for the optimizer to remove your random number generation code, as it is UB?

Undefined Behaviour as usual

Posted Feb 26, 2024 9:16 UTC (Mon) by mb (subscriber, #50428) [Link]

Yes, it would be totally correct.
Unless the hardware access is correctly marked as unsafe volatile memory access. Which of course is necessary for every access to the real world (a.k.a. hardware) outside of the language's virtual machine model.

Undefined Behaviour as usual

Posted Feb 27, 2024 20:24 UTC (Tue) by pizza (subscriber, #46) [Link] (8 responses)

You're once again conflating "undefined" with "unbounded", mixing in an unhealthy measure of probability.

A TRNG does not respond "arbitrarily"; it still can only operate within its design constraints, which of course include the characteristics of the materials it was constructed from. And, while any given read of a TRNG is "undefined", the value is bounded, with each discrete value being equally probable as long as the device is used within its designed operating conditions. [1]

It will always return a value between 0.0 and 1.0. [2] It cannot return "Fred" or kill your cat unless you put it into a box with a vial of poison.

...And the physical phenomenon that the RNG is measuring also has to have bounds, or you'd not be able to detect it. Certain stimuli can make these events more likely (yay, Fission!) but that's just a change in probabilities. [3] The point being, they don't respond "arbitrarily". Your Pb isn't going to turn into Au because a butterfly flapped its wings halfway across the world. Either an atom decays or it doesn't. Either an electron crosses the P-N junction or it doesn't.

[1] Several $dayjobs ago, I helped design a TRNG, so I have a decent idea how they work... and when they fail.
[2] Strictly speaking it should also be able to return "Failure"
[3] Which is itself a predictable physical property of the materials, and is taken into account in the TRNG design.

Undefined Behaviour as usual

Posted Feb 27, 2024 20:46 UTC (Tue) by mb (subscriber, #50428) [Link] (7 responses)

>Either an atom decays or it doesn't. Either an electron crosses the P-N junction or it doesn't.

https://en.wikipedia.org/wiki/Double-slit_experiment

Undefined Behaviour as usual

Posted Feb 27, 2024 22:12 UTC (Tue) by pizza (subscriber, #46) [Link] (6 responses)

> https://en.wikipedia.org/wiki/Double-slit_experiment

How exactly does a high school physics experiment support your claim that "Mother Nature doesn't respond to unexpected stimulus in arbitrary ways either; nature has rules that govern how it functions" is "not true"? [1]

...This experiment shows that we are still trying to figure out what those rules are, not that they don't exist!

It certainly doesn't change the fact that while any given observation is unpredictable, the probabilities are predictable. (E.g., you can't predict the decay of an individual atom, but you can accurately predict the overall _rate_ of decay of a mole of them.)

[1] https://lwn.net/Articles/963598/

Undefined Behaviour as usual

Posted Feb 28, 2024 10:20 UTC (Wed) by Wol (subscriber, #4433) [Link] (5 responses)

The double-slit experiment is a (-: classic :-) example of the (-: relational :-) phenomenon that observing the results of an experiment changes the results.

That's all, folks.

Cheers,
Wol

Undefined Behaviour as usual

Posted Feb 28, 2024 12:19 UTC (Wed) by Wol (subscriber, #4433) [Link]

Whoops - relative not relational ...

Cheers,
Wol

Undefined Behaviour as usual

Posted Feb 29, 2024 14:24 UTC (Thu) by pizza (subscriber, #46) [Link] (2 responses)

> The double-slit experiment is a (-: classic :-) example of the (-: relational :-) phenomenon that observing the results of an experiment changes the results.

Yeah, so? That doesn't demonstrate that Nature behaves arbitrarily and doesn't follow rules; it demonstrates that Nature's rules are a lot more complicated than we previously understood.

Undefined Behaviour as usual

Posted Feb 29, 2024 18:29 UTC (Thu) by mb (subscriber, #50428) [Link]

That is not true.
Laplace's demon is wrong.

https://en.wikipedia.org/wiki/Laplace%27s_demon

Nature has inherent randomness and undefined behavior.
The thinking that we just don't know all the rules is wrong.

Undefined Behaviour as usual

Posted Feb 29, 2024 21:08 UTC (Thu) by Wol (subscriber, #4433) [Link]

I think I muddled my relative and my quantum, but the thing is, as we understand Nature and Science at present, quantum is truly random.

We have classical physics, where everything follows rules and is deterministic.

We have relativity, which iirc is the same.

And then we have quantum, where things happen at the micro level, but the rules only work at the macro level - we have no idea what (if any at all) the deterministic rules are. Especially as the main thing behind quantum seems to be the making of something out of nothing - if we have nothing there to start with, how can there be anything there to apply deterministic rules TO!?

So if quantum is basically nothing, surely it's reasonable to assume the quantum rules are nothing, too :-)

Cheers,
Wol

Undefined Behaviour as usual

Posted Feb 29, 2024 21:02 UTC (Thu) by rschroev (subscriber, #4164) [Link]

That's a quantum phenomenon, not a relativistic one.

Undefined Behaviour as usual

Posted Feb 26, 2024 7:40 UTC (Mon) by jem (subscriber, #24231) [Link] (2 responses)

>For example, reading an uninitialized memory cell yields "undefined" results. However, in reality (sorry, "Nature") the value will either be 1 or 0. It literally cannot be anything else, because the computer can only register a 1 or a 0 in response to that input -- you won't ever get a value of "0.661233123413" or "blue".

I can easily imagine a memory technology where reading an uninitialized memory cell produces the value 1, and could on the next read (still uninitialized) produce the value 0. If you repeat the process a sufficient number of times you could end up with a mean value of 0.661233123413.

Undefined Behaviour as usual

Posted Feb 26, 2024 11:20 UTC (Mon) by mpr22 (subscriber, #60784) [Link]

> I can easily imagine a memory technology where reading an uninitialized memory cell produces the value 1, and could on the next read (still uninitialized) produce the value 0.

Mmm, delicious sparkling bits. (The client was doing something ill-advised – can't remember whether it was power down or just hard reset – to the system while MLC NOR Flash was being programmed, which is admittedly something a bit worse than just "uninitialized memory".)

Undefined Behaviour as usual

Posted Feb 26, 2024 15:04 UTC (Mon) by farnz (subscriber, #17727) [Link]

Ordinary DRAM can work that way; the capacitor in the DRAM cell is in one of three states; 0, intermediate, and 1. Intermediate is read out as either 0 or 1, but whether intermediate can stay at intermediate, or ends up forced to 0 or 1 on read depends on details of the DRAM design. Some DRAM designs will force the cell to 0 or 1 state on the first read or refresh, others have a more stochastic process where the cell can stay at intermediate until it's written (which sets the state to either 0 or 1, unconditionally), but may stabilise randomly.

And because this is outside the DRAM specifications (normally - you can get slightly more expensive DRAM ICs which guarantee read stability even without writes), different batches of the same IC may have different behaviours. In practice, you need special cases to observe this, since every refresh cycle counts as a read for the purposes of stabilizing the value, and any cell that's been written to is also stable.

As a result, you need to be depending on reads being stable even if the cell hasn't been written yet, and be reading from the DRAM shortly after it's been powered up, before there's been enough refresh cycles to stabilize its value, and using DRAM that can take several read or refresh cycles to stabilize the cell values. The first is almost certainly a lurking bug in your code (it was in the case I hit, it just took a long time to find and fix it, and the "quick" fix was to buy more expensive DRAM that guaranteed stability while we hunted down the software bug), the second pretty much requires you to be running code directly from flash or ROM, not booting the way a PC does (since the boot sequence takes long enough that you've had many 64ms or shorter refresh cycles during boot), and the third requires you to be unlucky with the specific DRAM ICs you buy.

Undefined Behaviour as usual

Posted Feb 24, 2024 11:28 UTC (Sat) by hsivonen (subscriber, #91034) [Link]

> Signed overflow being undefined behavior is a little bit silly because a well-intentioned optimizing compiler will only use that optimization for one purpose: to emit whatever the underlying CPU's most efficient arithmetic instructions are to handle the non-overflowing case.

This is not accurate. Compilers use signed integer overflow being UB to assume that it doesn't happen, which permits mathematical reasoning that is valid in the non-modular integer domain.

Google uses int for loop counters, and they seem to want the optimizations that arise from signed overflow being UB and, therefore, assumed by the compiler not to happen.

https://google.github.io/styleguide/cppguide.html#Integer...
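
A small illustration of the kind of reasoning this permits (a sketch, not taken from any real codebase): since the compiler may assume x + 1 never overflows, it can fold the comparison below to a constant.

#include <limits.h>

/* Because signed overflow is UB, the compiler may assume x + 1 never wraps
   and fold this whole function to "return 1" -- even though, with wrapping,
   x == INT_MAX would make the comparison false. */
int always_greater(int x)
{
    return x + 1 > x;
}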

Undefined Behaviour as usual

Posted Feb 24, 2024 18:26 UTC (Sat) by faramir (subscriber, #2327) [Link]

Not sure if this is an "insane villain" level...

https://www.reddit.com/r/roguelikedev/comments/ytlw2f/a_b...

"Then people first started using the #pragma directive in C, its behaviour was implementation-defined. Version 1.34 of gcc, released around 1988/89, mischievously defined the behaviour as "try running hack, then rogue, then Emacs' version of Towers of Hanoi". i couldn't find 1.34 in the gcc archives, but gcc-1.35.tar.bz2 concedes that the directive might in fact be useful:"

Undefined Behaviour as usual

Posted Feb 23, 2024 17:37 UTC (Fri) by mgb (guest, #3226) [Link] (13 responses)

Not 100% facetiously: Maybe a CVE should be filed against Undefined Behavior in the C standard instead.

Undefined Behaviour as usual

Posted Feb 23, 2024 18:05 UTC (Fri) by wtarreau (subscriber, #51152) [Link] (12 responses)

To be honest, it should be filed against compilers that abuse the original UB, which was mostly "depending on your CPU it will be different", and turned it into "haha, here's a potential though unlikely UB, let's just drop this entire code block that protects against a vulnerability so that we can appear faster than the competition in benchmarks".

UB makes sense for signed integer overflows that might return -x or ~x depending on whether the machine uses two's complement. I.e. "we trust that you don't care about the result past this limit". It makes sense for shifts with high bits set in the shift count, for the same reason (some CPUs will zero the output, others will mask the higher bits), etc. The absurd approach of "the program might burn your house" has been causing a lot of harm to a ton of perfectly behaving programs on their usual platforms.

Sure, some optimizations are based on this. But instead of UB they could have presented the known possible outputs (e.g., what acceptable values x+1 may have). It would have permitted the same optimizations without purposely ruining valid tests. And actually, compilers could agree on this without even revisiting the standard.
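
A sketch of the kind of protective block that gets dropped (hypothetical function; a robust version would reject len < 0 or len > INT_MAX - 16 before adding):

#include <limits.h>
#include <stdlib.h>

/* The intent of the test is to catch a length that would wrap around, but
   since signed overflow is UB the compiler may assume len + 16 cannot wrap,
   conclude the condition is always false, and silently delete the abort(). */
void *alloc_with_header(int len)
{
    if (len + 16 < len)   /* post-hoc overflow check: relies on wrapping */
        abort();
    return malloc((size_t)len + 16);
}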

Undefined Behaviour as usual

Posted Feb 23, 2024 22:53 UTC (Fri) by khim (subscriber, #9252) [Link] (11 responses)

> And actually, compilers could agree on this without even revisiting the standard.

They could, but what would it change? The real problem is not the fact that C and C++ have so many undefined behaviors, but the fact that there are so many people with the attitude of "if your language has rules that I don't like, then I'll just go and break them anyway".

If C developers don't even contemplate the idea that they may, gasp, play by the rules, then what could any change in the compiler ever achieve?

If you may, somehow, identify these people, and then kick them out of your community like Rust did then this exercise starts becoming useful, but I don't believe that may ever happen in C or C++ communities. “We code for the hardware” crowd is too entrenched and influential there.

Undefined Behaviour as usual

Posted Feb 24, 2024 8:03 UTC (Sat) by wtarreau (subscriber, #51152) [Link] (7 responses)

> If C developers don't even contemplate the idea that they may, gasp, play by the rules

As a reminder, the language rules are not public, you have to pay to get them, or find one of the latest drafts. *That* is the biggest issue. Can you imagine that I first learned about the UB status of signed integer wrapping something like 20 years after starting to use C, because all compilers were applying it as one would expect, i.e. like unsigned? I think it's gcc 10 or so that first broke it and tons of programs with it.

I guess many C developers came from other languages and already had asm experience before going to C, and there's no such UB in asm, you know what you're doing and it makes no sense to suddenly pretend that the conversion to C will eliminate some code. But that's reality.

IMHO one thing Rust developers did well was to carefully enumerate (and learn so that they can tirelessly recite them) all UB of C and educate people about them. But I can assure you that the vast majority of C developers are not aware of even 10% of them because the language is made to let the developer express what he wants to do in a portable way and the compiler decides otherwise as some obscure rule allows it to, for reasons that initially had nothing to do with their purpose, but were abused for the sake of optimization.

Undefined Behaviour as usual

Posted Feb 24, 2024 10:57 UTC (Sat) by khim (subscriber, #9252) [Link] (6 responses)

> As a reminder, the language rules are not public, you have to pay to get them, or find one of the latest drafts. *That* is the biggest issue.

Do you really believe that? That's the craziest idea about UB that I have ever heard. Can you point out a single person who may genuinely claim “I wrote a program with UB simply because I couldn't find the definition of the language” and mean that?

Some use it as an excuse for ignorance, but I would argue that you can't trust anyone who couldn't find a draft of the standard anyway: it's much easier to find an eel.is link than to write a “hello world” program in any new language.

> I think it's gcc 10 or so that first broke it and tons of programs with it.

Seriously? First GCC 10 breaks programs in 2020, then I use a time machine to go back to 2011 to start that discussion, then nix uses another time machine to go back to 2007 and start that one, and yet another time machine is used to plant the evidence into gcc 2.95, which was released last century… don't you think that your explanation is starting to look a bit convoluted?

I, for one, don't remember ever jumping from 2020 into 2011 to plan the LWN discussion and I'm pretty sure no one used such a valuable artifact to alter GCC 2.95 behavior retroactively.

> I guess many C developers came from other languages and already had asm experience before going to C, and there's no such UB in asm

There is plenty of UB in asm. CGA snow is a perfect example. The behavior of the Z80 string manipulation instructions is also undefined (they work fine on some systems but stop memory refresh on some others, which makes programs unportable). And I'm pretty sure older CPUs had some interesting UBs, too.

> it makes no sense to suddenly pretend that the conversion to C will eliminate some code. But that's reality.

No, the reality is much simpler: it's simply not possible to translate random code snippets from one language to another if both source and target languages have UB. And if you don't allow UBs in the source language then certain things become simply flat-out impossible.

It's not a coincidence that the majority of widely used languages these days don't have UB. It's hard to deal with languages that have UB. It's also not a coincidence that these same languages are never self-hosting ones: to have a self-hosting language you need a language with UB, otherwise you would hit the Rice theorem.

When people repeat, decade after decade, that it doesn't make any sense to follow the laws of math, and that the fact that something is flat-out impossible shouldn't stop other guys from delivering the things that they want…

At some point you just have to accept that these guys can't be changed, they can only be replaced.

> IMHO one thing Rust developers did well was to carefully enumerate (and learn so that they can tirelessly recite them) all UB of C and educate people about them.

That was a side effect of the fact that Rust developers (at least some of them) come from the math world and thus are not “following the intuition”; they know some important things.

I don't think that, before they started poking at the UBs that C/C++ compilers use in optimizations, anyone had ever tried to catalogue them to see if a language with these UBs is even internally consistent. The Rust guys used LLVM as a basis and thus, naturally, wanted to know what rules it obeys. And they found that both sides go with “intuitively justified” rules. One side happily assumes that they can use hardware capabilities in a C or C++ program even if the language doesn't permit that (which is nonsense), the other side uses “things that are probably correct” to perform optimizations (which may lead to funny results)… and each group postulates that they are right while “the other side” are villains hell-bent on destroying everything in the known universe.

> But I can assure you that the vast majority of C developers are not aware of even 10% of them because the language is made to let the developer express what he wants to do in a portable way

How does that phrase even make any sense? UBs exist precisely to “let the developer express what he wants to do in a portable way”! If you write code that triggers UB then your code is not portable! It's as simple as that!

This article explains how we have arrived at the point where C/C++ have been turned into a “language unsuitable for any purpose”, but it doesn't tell us what to do and how it can be made useful again!

The sad thing is that it's not even that hard to understand why we have that fiasco, only most C and C++ developers tend to insist that they don't need to change anything in their behavior, and that compiler developers need to just go and give them ponies.

> compiler decides otherwise as some obscure rule allows it to

That's the fundamental disconnect. One side knows that the compiler is a dumb, insane entity — simply by definition: there is no organ in a compiler that could make it sane! A “sane” compiler which may “decide” something is simply not happening. As in: NONE of the compilers that were EVER created had the ability to “decide”.

The only way a complicated multi-pass compiler may be written is if you create a fixed-in-advance set of rules that the program being compiled follows, and then ensure that all the hundreds of passes in the compiler never break any invariants that are embedded in these rules.

That's a fundamental limitation: we simply have no idea how to make compilers without fixing the rules of the input language. It's not even clear if that's possible in principle.

> for reasons that initially had nothing to do with their purpose

“Reasons” are another thing that compilers can't use, because, again, they lack an organ which may help them think about reasons. You could probably attach ChatGPT and then they would get such an ability, but I'm not even sure it's a good idea: this would just mean that compilers would sometimes understand your “reasons” and sometimes fail to do that.

Hardly an improvement: compilers today may be insane and unreasonable, but at least they are consistent and builds are “reproducible”; if you added “sanity” and “reasoning” to compilers you would lose that.

> but were abused for the sake of optimization

Except they weren't “abused”. A compiler is, by definition, something that transforms a source language into machine language. If you can't say what the program you have on the input is supposed to do, then you can't say whether a transformation performed on it is correct or incorrect.

Some people actually tried to “help” these developers who complain about how “the compiler breaks their programs” and asked them: okay, the standard doesn't say what all these non-C programs are supposed to do… but can you, as a human, look at them and say how they are supposed to work?

And, of course, the answer was a resounding no, because without a language definition every C user is free to invent their own ideas about how a non-C program is supposed to behave! And they actually do invent their own ideas about that!

Of course, writing a compiler which works with such a non-definition of a non-language is impossible.

You cannot play basketball or football if the players disagree about the rules of the game. And if half of the players think they are playing one game and half of them think they are playing something entirely different, then nothing will ever work.

And, unfortunately, C (and, to a lesser degree, C++) is precisely in such a position today.

Undefined Behaviour as usual

Posted Feb 24, 2024 14:22 UTC (Sat) by wtarreau (subscriber, #51152) [Link] (5 responses)

> > As a reminder, the language rules are not public, you have to pay to get them, or find one of the latest drafts. *That* is the biggest issue.

> Do you really believe that? That's the craziest idea about UB that I have ever heard. Can you point out a single person who may genuinely claim “I wrote a program with UB simply because I couldn't find the definition of the language” and mean that?

This has been the case for most of the C devs I know. Did you ever see a teacher at school enumerate all the UB ? No, they aren't aware either. Even some of the C books I used to read 35 years ago to try to learn the language by myself never made mention of such specificities. I was never informed of any such UB, and everything I did used to work perfectly well and as expected for two decades.

The point is that historically the compiler authors considered that if your code does something undefined, you're on your own, and it's just the developers' job to make sure that the input domain doesn't cause UB. It has been this way for several decades, and developers took care of not causing overflows etc and all was fine. Then later it turned to the compiler trying to detect if some combinations of inputs could possibly trigger an UB, and if so, they would declare the whole block UB if the developer couldn't prove the input cannot be produced. This started to create tons of painful bugs and to make the code much harder to write correctly, since the language doesn't offer provisions for announcing input domains. Sometimes you would just like to be able to tell the compiler "trust me, this variable is not null" or "trust me this value is always between 13 and 27" and be done with it. But no, it will either produce some warnings that are irrelevant to your use case or even eliminate some code considering that it might burn your house so why compile it in the first place.

And the fact that the compiler silently eliminates some code without any option left to at least get a compile-time warning about such decisions is a real disaster.

Undefined Behaviour as usual

Posted Feb 24, 2024 15:35 UTC (Sat) by khim (subscriber, #9252) [Link]

> Even some of the C books I used to read 35 years ago to try to learn the language by myself never made mention of such specificities.

Why is that even relevant? You couldn't do much using 35-year-old manuals today. Heck, most manuals that existed back then talked about a completely different language which is not even accepted by many compilers today!

Times change and people learn new things. What was perceived as acceptable decades ago is today deemed “dangerous”.

We have learned how to do continuous integration, fuzz testing and many other things. The importance of UB and the need to write programs which don't ever trigger it was supposed to be realized somewhere along that road, too… only lots of C and C++ developers, somehow, happily ignore these changes even today.

> The point is that historically the compiler authors considered that if your code does something undefined, you're on your own, and it's just the developers' job to make sure that the input domain doesn't cause UB.

Nope. Let's take a look at the Turbo C manual. You can download it here. What does it say about the use of registers in assembler? This:

If there is a register declaration in a function, in-line assembly may use or change the value of the register variable by using SI or DI, but the preferred method is to use the C symbol in case the internal implementation of register variables ever changes.

This doesn't yet elevate UB to the position where it is today (simply because it was assumed, back then, that you wouldn't ever use your program with different compilers and thus could rely on the behavior of the compiler), but it already talks about how to write “future-proof” code.

The only thing that changed in the past three decades was the availability of new versions of the compilers. Today it is assumed that you write code for some version of the language, not for a particular compiler.

> Then later it turned to the compiler trying to detect if some combinations of inputs could possibly trigger an UB

No. All optimisations start from deciding that certain “dumb and stupid” code would never be written. And said “dumb and stupid” code always includes some form of UB. If you go back to the example that I have already shown you:

#include <stdio.h>

void foo(int* i) {
  i[1] = 42;   /* writes one past the int it was handed */
}

int bar() {
  int i = 3, j = 3, k = 3;
  foo(&i);     /* relies on the locals' stack layout, which the language does not guarantee */
  return j + k;
}

int main() {
  printf("%d\n", bar());
}

This code is fine, according to the Turbo C manual (and it actually works with Turbo C), but it stopped working last century already (it would be interesting to know which compiler was first to break it)!

And nobody objected.

Maybe if people had realized back then where this road would eventually lead… we would be living in a different world now. Not sure if it would have been better or worse, but it would have been different for sure.

> This started to create tons of painful bugs and to make the code much harder to write correctly, since the language doesn't offer provisions for announcing input domains.

Well, the standard doesn't offer anything, but GCC does.
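
As a sketch of what GCC (and Clang) let you write for the “always between 13 and 27” case (the helper name is made up, and lying in the hint is itself UB):

/* "Trust me, it's always between 13 and 27" expressed with a GCC/Clang
   builtin; the optimizer may use the assumption, and violating the promise
   is undefined behavior, exactly like any other broken promise. */
static inline int assume_13_to_27(int x)
{
    if (x < 13 || x > 27)
        __builtin_unreachable();
    return x;
}

There is also __attribute__((nonnull)) for the “this pointer is never null” case.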

> And the fact that the compiler silently eliminates some code without any option left to at least get a compile-time warning about such decisions is a real disaster.

No, the disaster is the assumption that the compiler could offer such an option. Look at that silly foo/bar code again. Most compilers released last century would silently eliminate some code from that example, without any option left to at least get a compile-time warning. But would you really want warnings there?

If the compiler issued a warning for every change that it makes, you would get a million warnings for a thousand lines of code, most of them pointless. You don't want that.

What you do want are warnings for the “dangerous” transformations, but that's not possible either, because, as the attempt to create a friendly C has shown, for every optimization which someone considers “dangerous” there is a bunch of people who consider that exact same optimization “essential” (start with the simplest case of shifting a 32-bit value by 33 and watch how the group of “we code for the hardware” C guys becomes divided; repeat 100 times and end up with as many definitions of C as you have people involved, perhaps with even more definitions, because if you ask the same question twice in different forms you get more than one answer from the “we code for the hardware” guys).
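
To make the shift example concrete (a sketch; the listed behaviours are the usual ones for those CPUs, and a compiler that sees the constant may fold the expression to anything at all):

#include <stdint.h>

/* Shifting a 32-bit value by 33 is UB in C, and the hardware really does
   disagree: a 32-bit x86 shift masks the count to 5 bits (so 33 acts like 1),
   while 32-bit ARM uses the low byte of the count register (so 33 gives 0).
   Whichever behaviour you consider "obvious" is the one somebody else's
   hardware doesn't have. */
uint32_t shift_by(uint32_t x, unsigned n)
{
    return x << n;   /* well-defined only for n < 32 */
}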

> Did you ever see a teacher at school enumerate all the UB ? No, they aren't aware either.

I recently helped a friend with a modern, C++17-based course. They didn't list all possible UBs, but they did talk about how UB may affect your program in unpredictable ways, and about how you have to use asan/tsan/ubsan/etc. to avoid it.

But that doesn't solve the issue of people who have learned C and/or C++ long ago and refuse to accept the fact that they have to avoid UB.

It's one of the areas where things change in the ages-old fashion: an important scientific innovation rarely makes its way by gradually winning over and converting its opponents: it rarely happens that Saul becomes Paul. What does happen is that its opponents gradually die out, and that the growing generation is familiarized with the ideas from the beginning.

But if that is how that whole stupid story is going to be resolved, then we may as well change the language (Rust is a good candidate, but there are others), too: “the growing generation” doesn't yet have any investment in C or C++, and if the plan is to gradually replace developers and rewrite everything… it's just easier to track the status of that work if new code is written in a new language, isn't it?

Undefined Behaviour as usual

Posted Feb 25, 2024 2:54 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (3 responses)

> Sometimes you would just like to be able to tell the compiler "trust me, this variable is not null" or "trust me this value is always between 13 and 27" and be done with it.

These seem extremely dangerous and I would not recommend them. Being able to insist that things are true when maybe they aren't is almost certainly going to make the things you complain of even worse. Prefer instead to help the compiler figure out for itself that variables must have the properties you desire.

For example Rust's NonNull<T> is a pointer that's not null. If you have a pointer, you can ask NonNull::new to give you an Option<NonNull<T>> instead, and it will either give you Some(NonNull<T>) or None accordingly.

WUFFS will let you express "I claim that value is between 13 and 27" in code, and check that indeed it can see why value is between 13 and 27 so you're correct. For example:

assert value < 27 via "a < b: a < c; c <= b"(c: foo)

That says, a human has promised you elsewhere (in a list of propositions humans have proved) that this obvious rule about comparison operators is true. Using that proposition, and the facts that foo is less than or equal to 27 and that value is less than foo, you can see that value is less than 27. Done.

Note that this won't compile if you're wrong (e.g. value isn't less than foo) unless you fed WUFFS a faulty proposition and then relied on that proposition, so, don't do that, if you need to prove tricky things hire a mathematician. And especially note that I wrote _compile_ there. This isn't a runtime problem, WUFFS rejects faulty programs at compile time.

Undefined Behaviour as usual

Posted Feb 25, 2024 10:35 UTC (Sun) by khim (subscriber, #9252) [Link] (1 responses)

> For example Rust's NonNull<T> is a pointer that's not null.

And yet you may use unsafe to forcibly shove null into it and then “bad things happen”™.

You can't take your knowledge that NonNull<T> is just an 8-byte piece of data in memory which happens to accept a sequence of 64 zero bits just fine, and use that to save some memory somewhere. This is just not guaranteed to work, and people (not the compiler!) would punish you if you insisted that it's just fine because you have unit tests and they pass.

That is the fundamental difference between C/C++ and Rust.

> Note that this won't compile if you're wrong (e.g. value isn't less than foo) unless you fed WUFFS a faulty proposition and then relied on that proposition, so, don't do that, if you need to prove tricky things hire a mathematician.

You still have to have someone who kicks out the violators, because every language is either safe, incomplete, and non-self-hosting (and then you have some other language which is used to host that safe language), or unsafe, complete and, optionally, self-hosting.

And attempts to skip that duty of keeping violators in line simply don't work: Java and PHP are memory-safe languages, yet how many security breaches have people seen via programs written in these languages?

A compiler may help you avoid stupid mistakes, but if you actively lie to the compiler then nothing can ever be truly reliable or secure.

Undefined Behaviour as usual

Posted Feb 26, 2024 2:47 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

I would characterise what you've described as the fundamental difference as Safety Culture, and I agree it's the essential difference at the heart of Rust, more so than a technical difference like their very different answer to the question implied by Rice's Theorem (when we can't tell if this is a valid program, what then? Rust's answer is that this program does not compile, the C++ answer is that it must compile anyway)

For WUFFS it's not quite the same in two crucial ways, firstly because it's not a general purpose language it can't have and won't have a large audience. So it can be completely picky, it would be fine if they don't want people who like Vim, or anybody whose name has too many vowels in it, no problem.

Secondly though their safety is much more fundamental to the language itself. You could take Rust's technology and build something completely unsafe with that. The Rust community wouldn't, but it's not a restriction of the technology. WUFFS can't really do that, the technology doesn't really do anything else, there's a lot of clever maths stuff going on there which is heading for only one destination, certainty of correctness. The only other extraordinary thing about WUFFS is performance, but even there it does so in a way that only makes sense if you assume correctness is mandatory. I don't believe you would build this technology for any other reason.

For both reasons I don't think there's any need to "kick out the violators". To such people WUFFS makes no sense, it's like worrying about whether the crows on your web cam accept the Terms of Service of your ISP. The crows don't even understand what Terms of Service are as an idea, much less accept these particular terms.

In your terms I'd guess you'd consider WUFFS "incomplete". That's fine, it's not supposed to be "complete" in this sense.

Undefined Behaviour as usual

Posted Feb 25, 2024 11:19 UTC (Sun) by Wol (subscriber, #4433) [Link]

> > Sometimes you would just like to be able to tell the compiler "trust me, this variable is not null" or "trust me this value is always between 13 and 27" and be done with it.

> These seem extremely dangerous and I would not recommend them. Being able to insist that things are true when maybe they aren't is almost certainly going to make the things you complain of even worse. Prefer instead to help the compiler figure out for itself that variables must have the properties you desire.

Not quite the same thing, but again to be able to say "the only valid values are between 13 and 27". Then the compiler goes "if I detect a value that isn't, that's an error". Or trigger a warning "cannot prove boundaries are met".

Cheers,
Wol

Undefined Behaviour as usual

Posted Feb 25, 2024 0:18 UTC (Sun) by laurent.pinchart (subscriber, #71290) [Link] (2 responses)

> If you may, somehow, identify these people, and then kick them out of your community like Rust did

Whoever you may be, I hope for the "Rust community" that you don't represent them here. "Kicking people out" when you disagree with them doesn't build welcoming communities, as is clearly explained in the article you linked. This whole thread is degenerating into a cockfight, it would be nice if everybody could take a deep breath.

Undefined Behaviour as usual

Posted Feb 25, 2024 1:50 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (1 responses)

The reason Rust has safety is that Rust has a safety culture. Safety isn't just a tick box, it's a principle. The technology is an enabler, but you could do whatever you want with the technology: it is perfectly easy to make C++-style YOLO types which destroy all the safety properties you'd enjoy in the rest of the Rust ecosystem, and if those are just fine somehow then why even have Rust?

I wouldn't perhaps choose the phrase "kicking people out" but it's clearly not going to work to have people who don't agree with your safety culture in a community that specifically values this property. It's the old "tolerance of intolerance" problem. We cannot have shared values *and* welcome people who specifically disagree with those values: either those people aren't welcome or those aren't shared values after all.

Undefined Behaviour as usual

Posted Apr 2, 2024 19:16 UTC (Tue) by ssokolow (guest, #94568) [Link]

Undefined Behaviour as usual

Posted Feb 23, 2024 18:36 UTC (Fri) by geofft (subscriber, #59789) [Link] (9 responses)

You are correct about it being undefined behavior, and you are correct that a compiler can cause the program to behave arbitrarily in response to it, but I would argue it is still not a vulnerability.

A vulnerability is a misbehavior in a program that compromises its security in response to crafted input / malicious actions from an attacker. This requires a few things to exist. "Misbehavior" requires a coherent concept of intended behavior. "Attacker" and "crafted input" require a system that interacts with untrusted parties in some way. "Security" requires things that you're trying to keep secure (maintain confidentiality, integrity, and/or availability) from the attacker.

As an obvious example, the fact that bash executes arbitrary code is not an arbitrary code execution vulnerability. It's what bash is supposed to do. The thing that supplies input to bash is expected to be the legitimate user of bash, not an attacker. bash is supposed to be able to run any command, and so it doesn't have a distinction between behavior and misbehavior. If you can get arbitrary code execution in rbash, for instance, then yes, that's a vulnerability, because rbash is designed to take untrusted input and maintain security of the system it runs against that input. If you can run arbitrary commands by setting environment variables even if you can't control input, then there's probably a vulnerability (as Shellshock was). But for regular bash, if you are setting up a system where you're piping untrusted input into bash, that's a vulnerability in your system, not in bash.

The only way to distinguish that is to know at a human level what bash is supposed to do and where it's supposed to take inputs from. There is no automated technical distinction to be made between bash and rbash. There is no automated technical distinction to determine that commands like bash and python are only intended to be run with trusted input, but commands like grep and sort are supposed to be able to handle untrusted input. You can call this a "gut feeling" if you like, but it's inherent to how we use computers. We never run a program for the sake of running a program; we run a program in order to accomplish some task of human interest, and the parameters of that task, not the program, determine what counts as misbehavior and insecurity.

There is a simple argument that CVE-2020-19909 (note there is a typo in today's article, it's 19909, not 1909) is not a vulnerability. It is not that the undefined behavior doesn't exist, or that the risk of a compiler doing something arbitrarily weird and unwanted is low. It is entirely compatible with a compiler generating code to do system("rm -rf /") in the case of signed integer overflow. The argument is that attackers do not have access to set an arbitrary retry delay value, and any real-world system that uses curl where attackers do have this access has a more fundamental vulnerability - e.g., they can provide arbitrary arguments to the curl command line, or they already have code execution in an address space in which libcurl is linked. Even in the most limited case where the attacker can only specify this one value and nothing other than curl is imposing limits on the value they'd still be able to effectively request that curl should hang for 24 days on a 32-bit system or until well past the heat death of the universe on a 64-bit system, which is already a denial of service vulnerability, and fixing that would avoid hitting the integer overflow in curl. And if you, yourself, as the legitimate user of curl or libcurl provide a ludicrous value for the timeout, it cannot be said to be a vulnerability, because you are not an attacker of your own system.
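
For the curious, a back-of-the-envelope check of that 24-day figure (purely illustrative arithmetic, not curl's code):

#include <stdio.h>

/* The largest delay a 32-bit signed long can represent, once a seconds value
   has been scaled to milliseconds, is 2147483647 ms (LONG_MAX on such a
   system). */
int main(void)
{
    double days = 2147483647.0 / 1000 / 60 / 60 / 24;
    printf("%.1f days\n", days);   /* prints about 24.9 */
    return 0;
}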

How do we know if this claim is correct? You're right in a technical sense that you can't really know - it's always theoretically possible that someone wrote a web form that takes input from untrusted users and one of the fields is the retry delay value. But it's also equally theoretically possible that someone wrote a web form that takes input from untrusted users and splices it into a bash command line without escaping. And, in fact, it's not just theoretically possible, it's quite common. But nobody would say this means there's a vulnerability in bash, would they?

The process of determining what is or isn't a vulnerability has to come down to human judgment. You could plausibly argue, for instance, that Shellshock wasn't a vulnerability because attackers shouldn't be able to pass arbitrary environment variables into bash. But the real-world deployment of CGI meant that there was a large installed base of users where attackers were in fact able to set environment variables to arbitrary values. Moreover, it meant that humanity believed that it was a reasonable design to do that, and the flaw was not with CGI for setting those variables.

And it's not sufficient to just lean on documented behavior. First, would you consider it an adequate fix for the vulnerability if curl made no code changes but just changed the documentation to say that the input to the timeout value must be small enough to avoid integer overflow? But also there have actually been real-world vulnerabilities, that are unquestionably vulnerabilities in human judgment, that were documented. Log4shell comes to mind: the behavior of loading JNDI plugins in log statements was absolutely intentional, and the support in JNDI for loading plugins from arbitrary servers was also absolutely intentional. But the resulting behavior was so unreasonable that the Log4j authors did not argue "there is no deviation from the documented behavior" - which they could have argued with much more certainty than a gut feeling. Or consider the KASLR bypass from the other day: it isn't material whether the kernel developers intended to publish a world-readable file with post-ASLR addresses or not, it is still a security bug either way.

There is, simply, no way to determine what is or isn't a vulnerability without the involvement of human judgment. You can make a reasonable argument that the maintainers of the software are poorly incentivized to make accurate judgments, yes. But someone has to make the judgment.

(Also - I actually fully agree with you about CVE-2023-52071. The argument that it only applies in debug builds and not release builds is reasonable as far as it goes, but in my human judgment, it is totally reasonable to run debug builds in production while debugging things, and you're right that Daniel's claim that it can only possibly cause a crash is incorrect. Because the bad code deterministically does an out-of-bounds access, it's totally reasonable for the compiler to treat the code as unreachable and thus conclude the rest of the block is also unreachable, which can change the actual output of the curl command in a security-sensitive way. The compiler can tear out the whole if statement via dead-code elimination, or it can lay out something that isn't actually valid code in the true case, since it's allowed to assume the true case never gets hit. He's quite possibly right that no compiler actually does that today; he's wrong that it's reasonable to rely on this.)

Undefined Behaviour as usual

Posted Feb 23, 2024 19:54 UTC (Fri) by adobriyan (subscriber, #30858) [Link] (8 responses)

> Because the bad code deterministically does an out-of-bounds access, it's totally reasonable for the compiler to treat the code as unreachable

How is it reasonable? If the compiler can prove an OOB access to an on-stack array, it should _refuse_ to compile and report an error.

The only semi-legitimate use case for such access is stack protector runtime test scripts (and even those should be done in assembler for 100% reliability).

> warning: array subscript 3 is above array bounds of ‘wchar_t[3]’ {aka ‘int[3]’} [-Warray-bounds=]

#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

int f(void);   /* defined elsewhere */

int main(int argc, char *argv[])
{
    wchar_t prefix[3] = {0};
    if (f()) {
        assert(prefix[3] == L'\0');   /* index 3 is one past the end */
    }
    return EXIT_SUCCESS;
}

Undefined Behaviour as usual

Posted Feb 23, 2024 21:16 UTC (Fri) by geofft (subscriber, #59789) [Link] (4 responses)

I gave a somewhat contrived example in this other comment. It is entirely possible that the OOB-ness of the access is conditional in some way, such as via preprocessor macros or code generation from some template, and the programmer knows that f() is not going to actually return true in the case where the access would be out of bounds.

Here's another example, though you might argue that it is also contrived. Suppose you have a binary format that stores numbers between 0 and 32767 in the following way: if the number is less than 128, store it in one byte, otherwise store it in two bytes big-endian and set the high bit.

#include <stdio.h>

static inline int is_even(unsigned char *p) {
    if (p[0] & 0x80)
        return p[1] % 2 == 0;
    return p[0] % 2 == 0;
}

unsigned char FIFTEEN[] = {0x0F};   /* 15 fits in one byte, high bit clear */

int main(void) {
    if (is_even(FIFTEEN))           /* 15 is odd, so nothing is printed */
        printf("15 is even\n");
    return 0;
}

After inlining there's a line of code talking about FIFTEEN[1] which is out of bounds, inside an if statement, just like your example. The if statement doesn't succeed, so there's no UB, but you need to do some compile-time constant expression evaluation to conclude that, and it's pretty reasonable, I think, to have a compiler that supports inlining but does no compile-time arithmetic.

Undefined Behaviour as usual

Posted Feb 24, 2024 5:32 UTC (Sat) by adobriyan (subscriber, #30858) [Link] (3 responses)

It is probably less work to just emit the potentially-UB access and let the page fault handler sort it out.

Undefined Behaviour as usual

Posted Feb 24, 2024 21:22 UTC (Sat) by jrtc27 (subscriber, #107748) [Link] (2 responses)

On a system with 4K pages, you have a 1 in 4096 chance that the OOB access is on a different page and thus *could* even generate a page fault. Let alone the fact that in a large program there will very likely be something on the next page anyway and so you still wouldn't get a page fault.

Undefined Behaviour as usual

Posted Feb 25, 2024 7:03 UTC (Sun) by adobriyan (subscriber, #30858) [Link] (1 responses)

Yes, a page fault is not a reliable test, but so what.

Again, if compiler can 100% prove UB access it should refuse to compile.

If the UB access cannot be proven, then it should shut up and emit the access, on the grounds that maybe, just maybe, it doesn't know something.

Linus(?) once gave an example: a future, very powerful gcc 42, LTOing the whole kernel, may observe that the kernel never sets the PTE dirty bit (the hardware sets it behind the compiler's back) and helpfully optimise away all reads of said bit. Which, of course, would break everything.

Undefined Behaviour as usual

Posted Feb 25, 2024 8:36 UTC (Sun) by mb (subscriber, #50428) [Link]

> Again, if compiler can 100% prove UB access it should refuse to compile.

A compiler cannot at the same time assume UB doesn't exist and refuse to compile if it does exist.

You have to decide on a subset of UB that you want to abort instead of assuming it doesn't exist.
Which kind of defeats the purpose of UB then. It's defined behavior then.

We *do* have languages that have a proper language subset without UB. Just use them.

Undefined Behaviour as usual

Posted Feb 23, 2024 22:58 UTC (Fri) by khim (subscriber, #9252) [Link] (2 responses)

> How is it reasonable? If compiler can prove OOB access to on-stack array, it should _refuse_ to compile and report an error.

Such an approach is incompatible with SPEC CPU 2006, which means none of the compiler vendors would ever release such a compiler.

Undefined Behaviour as usual

Posted Feb 24, 2024 5:09 UTC (Sat) by adobriyan (subscriber, #30858) [Link] (1 responses)

There should be translate.google.com but for programming languages.

Undefined Behaviour as usual

Posted Feb 24, 2024 10:57 UTC (Sat) by excors (subscriber, #95769) [Link]

Programming language translation is one use case where LLMs seem moderately useful. Sometimes they output complete garbage, and sometimes they give a subtly incorrect translation, but sometimes it'll be enough to let you get a basic understanding of an unfamiliar language. (Which is similar to Google Translate.)

Assuming you mean the Fortran code in the bug report, ChatGPT 3.5 (asked to translate it into C) interprets the loop as "for (int I = 1; I <= NAT; I++) {" and the WRITEs as e.g. "if (LAST && MASWRK) fprintf(IW, "%d %s %s %f %f %f %f\n", I, ANAM[I], BNAM[I], PM, QM, PL, QL);", which seems plausible. Though it does fail to mention that the Fortran array indexing starts at 1, while C starts at 0, which is a potentially important detail when talking about bounds-checking.

(But that code seems largely irrelevant to the actual bug, which is that the benchmark illegally declares an array to have size 1 in one source file and 80M in another source file, and GCC optimised it by treating the 1 as a meaningful bound, and SPEC won't fix their buggy benchmarks because they "represent common code in use" and because any changes "might affect performance neutrality" (https://www.spec.org/cpu2006/Docs/faq.html#Run.05), so the compiler developers have to work around it.)

Undefined Behaviour as usual

Posted Mar 22, 2024 18:56 UTC (Fri) by DanilaBerezin (guest, #168271) [Link] (4 responses)

No. Undefined behavior does not mean "your software will literally do anything." It means that the standard does not define the behavior. It does not mean that an implementation is not allowed to define the behavior. Or that another standard like POSIX can't define the behavior. All it means is that a conformant implementation is not guaranteed to define behavior when a program has UB. That's it.

If "good" software written in C should unilaterally and universally avoid UB then a lot of software you rely on would simply be impossible to. The family of dlopen, dlsym, dlclose functions, for example, relies on casting a function pointer to a `void *`. This is undefined behavior, but nonetheless it is impossible to implement that functionality without eventually performing such a cast. So what is the solution? POSIX defines the behavior of casting a function pointer to a "void *" where the C standard still leaves it undefined. As another example, the Linux kernel is littered with undefined behavior. There are countless violations of the strict aliasing rule when it does a performant checksum by casting some arbitrary pointer to a "u32 *" or when it implements some sort of memory re-use (e.g. pooling, memory allocators etc.). Ironically, it also is littered with countless examples of signed overflow, because they serve an intended purpose see: https://lwn.net/Articles/959189/. The Linux kernel avoids the UB in these cases by supplying the options `-fno-strict-aliasing` and `fno-strict-overflow` to GCC, which tell it to define the behavior in the case that violations of strict aliasing occur or signed overflow occur respectively.

Now, I don't know how Curl is compiled (I don't have the time to build it from source and analyze the makefile, but feel free to try and prove me wrong), but if it is compiled with those options, signed overflow is simply and deterministically not a problem. In the worst case, it presents itself as a logic error where an integer will unintentionally wrap around. Now the case for the out-of-bounds access is certainly tougher to justify. It is impossible to define behavior in the case of an out-of-bounds access since that behavior is undefined on the architectural level. It's certainly a bug and one that should be fixed, but at the same time, I can also see why Daniel would be reluctant to call it a security issue. It *may* be a security issue, but then again so can any other line of code. I don't think it makes sense marking things as security issues when there isn't a proven way to exploit it.

Undefined Behaviour as usual

Posted Mar 23, 2024 0:57 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

> If "good" software written in C should unilaterally and universally avoid UB then a lot of software you rely on would simply be impossible to. The family of dlopen, dlsym, dlclose functions, for example, relies on casting a function pointer to a `void *`

Casting function pointers to "void*" or vice versa is not a UB. It's only UB if the pointer types have different alignment requirements.

Undefined Behaviour as usual

Posted Mar 23, 2024 1:04 UTC (Sat) by DanilaBerezin (guest, #168271) [Link] (1 responses)

Nope. This is a very common misconception, but the standard only defines casts to "void *" for pointers to data types, not function types. This is because function pointers can have a different size, and even live in an entirely different memory address space, than pointers to data types, depending on the architecture. See this: https://stackoverflow.com/questions/5579835/c-function-po...

This just shows how abstract the C memory model is. A lot of things that would otherwise not make any sense to leave as UB, given the dominance of modern PC systems, are made UB solely for the purpose of being able to support a wide variety of obscure architectures.

Undefined Behaviour as usual

Posted Mar 23, 2024 13:35 UTC (Sat) by kleptog (subscriber, #1183) [Link]

I ran into this with microcontrollers, the ATmega328p. You have a code memory space and a data memory space, and you cannot cast pointers to one into pointers to the other.

The large memory models in MS-DOS did something similar. The code segment and the data segment did not overlap. You needed far pointers, which included the segment, to be able to pass references around.

For PCs with a lot of memory it's feasible to have a unified address space, but on micro-controllers where every byte counts it's different. It's perfectly fair for the C standard to consider it undefined behaviour, while POSIX explicitly defines it.

Undefined Behaviour as usual

Posted Mar 25, 2024 9:17 UTC (Mon) by farnz (subscriber, #17727) [Link]

It's UB in Standard C; POSIX defines it the way you state, but Standard C intends to support Harvard Architecture systems where void * is not large enough to store a function pointer.

This relates to one of the less annoying misconceptions about UB; while the ISO Standard may leave something as UB, or explicitly state that it's UB, it is perfectly permissible for downstream standards and implementations to define something that's UB in Standard C. So, for example, instead of getting Standard C to fix the shifting behaviour, we could (at least in theory) have made it a POSIX obligation on both C and C++ compilers, and required compilers to either drop POSIX support or do the sensible shift behaviour defined in C++20.

Stenberg: DISPUTED, not REJECTED

Posted Feb 23, 2024 19:56 UTC (Fri) by ewen (subscriber, #4772) [Link] (2 responses)

As far as I can tell the curl project did succeed in getting this “CVE” (CVE-2023-52071) rejected, although their statement about it (linked in the original post) doesn’t seem to have been updated to reflect that new status yet. Both of:

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-5...
https://nvd.nist.gov/vuln/detail/CVE-2023-52071

show “REJECTED” status for me now (and minimal details on what the original claimed vulnerability was in the change history). It seems to have changed earlier today.

So it seems curl becoming a CNA did give them some additional leverage to reject “CVE” reports they thought were bogus.

But the curl project and others are right that the CVE system is pretty broken. And basically designed for the opposite of today’s situation, ie designed to track good faith reports against vendors unwilling to fix real problems. Sadly there’s a lot of momentum behind the CVE system now, so I don’t see it being fixed any time soon. And all vendors just being able to arbitrarily reject any report without documentation is itself vulnerable to abuse too.

Ewen

Stenberg: DISPUTED, not REJECTED

Posted Feb 23, 2024 21:34 UTC (Fri) by bagder (guest, #38414) [Link]

Thanks, I had not been told about this. I have now updated the blog post with a mention of this fact. Wow. I did not expect it to happen (this quickly).

Stenberg: DISPUTED, not REJECTED

Posted Feb 26, 2024 14:19 UTC (Mon) by hkario (subscriber, #94864) [Link]

> But the curl project and others are right that the CVE system is pretty broken. And basically designed for the opposite of today’s situation, ie designed to track good faith reports against vendors unwilling to fix real problems.

those vendors haven't gone away, Microsoft is still operating very much in this way, and I'm sure there are plenty others

