Stenberg: DISPUTED, not REJECTED
The Curl project has previously had problems with CVEs issued for things that are not security issues. On February 21, Daniel Stenberg wrote about the Curl project's most recent issue with the CVE system, saying:
I keep insisting that the CVE system is broken and that the database of existing CVEs hosted by MITRE (and imported into lots of other databases) is full of questionable content and plenty of downright lies. A primary explanation for us being in this ugly situation is that it is simply next to impossible to get rid of invalid CVEs.
Posted Feb 23, 2024 16:44 UTC (Fri)
by tialaramex (subscriber, #21167)
[Link] (111 responses)
C says if we do signed arithmetic and there's an overflow, that's Undefined Behaviour. The compiler can transform the program in any way it sees fit so long as it preserves observable behaviour - all transforms of UB are valid, so the behaviour of the software in this case is in reality completely arbitrary. This "not a vulnerability" could cause absolutely anything to happen.
I have absolutely no doubt that this "not a vulnerability" distinction is not useful, and I suspect it's actively harmful. If your program has documented behaviour but it doesn't actually do what was documented it's not useful to pretend to know whether this deviation is or is not a "vulnerability". My guess is that in insisting that some cases are and some are not, you help bad guys circle the pieces of your system you're not actually defending...
I understand Daniel's instinct. But it's misdirected here. If the worst bug you had in 2020 was an integer overflow in command line parsing code that's not a bad year, take the win. Unfortunately it wasn't the worst Curl bug in 2020.
For whatever it's worth, this variable should have a Duration type but neither C nor its standard library provide one. And so everything flows from there. Why are we multiplying it by 1000? Because we have a retry wait in seconds but we want to calculate milliseconds. In C instead it's a signed "long" integer and of course those have Undefined Behaviour on overflow.
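To make the shape of the bug concrete, here is a minimal sketch of that conversion (hypothetical names, not curl's actual source):

    #include <limits.h>

    long retry_after_ms(long retry_after_secs)
    {
        /* If retry_after_secs > LONG_MAX / 1000, this signed multiplication
         * overflows, which is undefined behaviour in C. A defensive version
         * would clamp the input to LONG_MAX / 1000 before multiplying. */
        return retry_after_secs * 1000;
    }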
In the more recent CVE that Daniel mentions it's a different cause of Undefined Behaviour. In this case a classic one-past-the-end pointer dereference. But again, Daniel characterises it as fine, because in his mind the worst case is a crash, even though "Undefined Behaviour" is not in fact specified as "Don't worry, it's fine, worst case it will crash"...
Mostly the (presumably unintended) effect of this article was to make me think Curl is worse software and the people maintaining it have prioritised their personal feeling of self-worth and aren't too bothered whether the software is good.
Posted Feb 23, 2024 16:58 UTC (Fri)
by JoeBuck (subscriber, #2330)
[Link] (1 responses)
curl https://some.domain/who_knows_what_this_is.sh | sh
and sometimes they are told to become root first. Compared to the possible negative consequences of that, other security issues with Curl are in the noise.
Posted Feb 23, 2024 17:19 UTC (Fri)
by wtarreau (subscriber, #51152)
[Link]
Posted Feb 23, 2024 17:02 UTC (Fri)
by wtarreau (subscriber, #51152)
[Link] (35 responses)
Why don't you trust the analysis the developer and maintainer of the project did on the impacts of these issues?
Posted Feb 23, 2024 17:35 UTC (Fri)
by mb (subscriber, #50428)
[Link] (30 responses)
The code does not seem to be compiled with -fwrapv. (I can't find it in the sources)
One cannot rule out security implications without knowing the whole system in that case.
Curl should use -fwrapv or -ftrapv.
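For reference, a tiny sketch of what those two flags change on a typical two's-complement target (flag descriptions follow the GCC documentation; the snippet itself is illustrative, not curl code):

    int next(int x)
    {
        /* Default: if x == INT_MAX this addition is undefined behaviour.
         * With -fwrapv: the result wraps around to INT_MIN.
         * With -ftrapv: the program traps (aborts) at run time instead. */
        return x + 1;
    }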
Posted Feb 23, 2024 18:12 UTC (Fri)
by Sesse (subscriber, #53779)
[Link] (24 responses)
Posted Feb 23, 2024 18:15 UTC (Fri)
by mb (subscriber, #50428)
[Link] (6 responses)
But neither can curl ensure there is no such system anywhere.
Posted Feb 23, 2024 18:25 UTC (Fri)
by Sesse (subscriber, #53779)
[Link] (5 responses)
Sometimes, you can do bad things (have bugs) without them turning into a security vulnerability in practice.
Posted Feb 23, 2024 18:35 UTC (Fri)
by mb (subscriber, #50428)
[Link] (2 responses)
No. That is not how security works.
> if you have e.g. int at 18 bits (presumably with 9-bit bytes),
What-about-ism.
>Since you seem to assume -fwrapv would nullify this bug
No. I didn't say that. I was talking about UB. Not whether the bug would still be a bug or not.
Posted Feb 27, 2024 11:02 UTC (Tue)
by JustABug (guest, #169930)
[Link]
Does this UB allow for privilege escalation? Data exposure? What's the attack vector? User intentionally entering a stupid value?
If the user can run curl they can run rm -rf
What's the output? Program crash? Exploitable unintended behaviour? What's an abuse scenario?
The researcher filing the CVE needs to demonstrate their CVE isn't a nothing burger.
The only advantage I can think of for filing a CVE for every UB is ensuring the fix is backported. Using BS CVEs as a tool to get things backported is an abuse of the system to address the problem of selective backporting.
Posted Mar 22, 2024 18:58 UTC (Fri)
by DanilaBerezin (guest, #168271)
[Link]
That is how security works though? Any line of code has the potential to be exploited. If the mere possibility of an exploit is the bar we set to file a CVE, then I can mark every line of code in every project as a CVE. Obviously, that would be very silly.
Posted Feb 23, 2024 18:41 UTC (Fri)
by wtarreau (subscriber, #51152)
[Link] (1 responses)
Posted Feb 24, 2024 0:19 UTC (Sat)
by mb (subscriber, #50428)
[Link]
Posted Feb 23, 2024 21:57 UTC (Fri)
by bagder (guest, #38414)
[Link] (16 responses)
I consider it a fairly important property that a security problem should be possible to actually happen on existing hardware where the code was built with existing compilers. If not, it actually is not a security problem and we don't do any service to the world by treating it as such.
Posted Feb 23, 2024 22:08 UTC (Fri)
by mb (subscriber, #50428)
[Link] (15 responses)
That means you'll have to constantly re-assess old UB bugs with every new compiler and system release.
Posted Feb 24, 2024 1:03 UTC (Sat)
by jwarnica (subscriber, #27492)
[Link] (13 responses)
Posted Feb 24, 2024 11:53 UTC (Sat)
by mb (subscriber, #50428)
[Link] (12 responses)
UB bugs must be documented as such and must get fixed right away.
Posted Feb 24, 2024 12:07 UTC (Sat)
by jwarnica (subscriber, #27492)
[Link] (11 responses)
UB is UB in the spec. It is further defined (possibly incorrectly) in the implementation.
System v+1 might change both of these scenarios.
The only difference if v+1 changes your application is how righteous your bug report to the system is.
Posted Feb 24, 2024 12:33 UTC (Sat)
by mb (subscriber, #50428)
[Link] (1 responses)
Correct.
Also: The defined-behavior check is being mostly done by the release tests of the system components.
> UB is UB in the spec. It is defined in the implementation.
That is not true.
You are talking about Implementation-Defined and just assume that UB is the same thing.
Posted Mar 1, 2024 15:48 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
The concrete behavior is non-deterministic, sure. However, the language spec can say things like "must treat the value as valid" even if its value is not what is expected (or even naïvely possible given the code). Instead, we have "if a data race happens, the bits in the memory must not be interpreted". The JVM has the former (you get *some* behavior of the code you wrote even if it's not what you expect). C has the latter (if the compiler notices a data race, it can instead assume it never happens and optimize accordingly; this may be "no behavior" after optimization passes complete). Note that C compilers don't stick in "if we notice UB at runtime, do_bad_thing()" code (unless requested via UBSAN and friends). Instead, their noticing of UB ends up affecting how optimizations treat the code which *then* transform it into unexpected logic flows based on the raw syntax of the code available.
In this sense, I think that overflowing on a command line argument parse is unlikely to have any real effect as the optimizer is just going to assume it stays in range and "means something". However, instead we have "the timeout parameter takes an integer, multiplication makes an integer…maybe it overflows, but who cares as 'it doesn't happen'". The code is the same whether overflow happens or not…it's just that the value is not interpreted as 1000*input but as 1000*input % UINT_MAX (or whatever type it is). Given the stratospheric values needed to overflow, I dare say that anyone flirting with these values already had a DoS on their plate in case the intended timeout actually ended up having to expire. It's only a real problem for those running with UBSAN on…but they're asking for DoS crashes by doing so anyways.
IMO, should UB be patched and guarded against? Yes. Does *this* instance warrant a CVE? No. `curl | sh` with bogus timeout parameters is more easily handled with replacing a URL with a download from malicious.site instead.
Posted Feb 27, 2024 11:21 UTC (Tue)
by danielthompson (subscriber, #97243)
[Link] (8 responses)
I think this confuses undefined behavior (e.g. reading past the end of an array) with implementation-defined behavior (e.g. signedness of char, size of int, etc.). In some cases (such as reading past the end of an array whose length is not known at compile time) the compiler implementation cannot define what happens!
Signed integer overflow is interesting because it is undefined behaviour (according to the C spec) and many implementations really do make its behavior undefined. For example the behavior on overflow can and does change based on the optimization level. Nevertheless it is possible for an implementation to fully define what happens on overflow if the application is willing to accept missed optimization opportunities (hence -fwrapv).
It is also interesting because one of the common undefined behaviors linked to signed overflow is the removal of security bounds checks that (incorrectly) assumed integer overflow would wrap... and this certainly did lead to vulnerabilities.
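The classic shape of that removed check, as an illustrative sketch (not taken from any particular project):

    /* The test intends to detect overflow, but since signed overflow is
     * undefined the compiler may assume len + 64 never wraps, fold the
     * condition to false, and drop the early return entirely. */
    int length_ok(int len)
    {
        if (len + 64 < len)
            return 0;        /* "overflow happened" -- may be optimized away */
        return 1;
    }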
However, despite the above, I'm unconvinced by the what-about-ism in the "every overflow is a ticking bomb" argument: the set of plausible optimizations on a time conversion during command line processing is relatively small.
Posted Feb 29, 2024 8:35 UTC (Thu)
by anton (subscriber, #25547)
[Link]
The removal of bounds checks is another such "optimization"; however, it is often based on the assumption that pointer arithmetic (not signed integer arithmetic) does not wrap around. Therefore, to avoid this kind of miscompilation, -fwrapv is insufficient. The better option is -fno-strict-overflow (which combines -fwrapv and -fwrapv-pointer).
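A sketch of the pointer-arithmetic variant (illustrative only):

    #include <stddef.h>

    /* Because pointer overflow is undefined, the compiler may assume that
     * buf + len cannot wrap around and delete the first check; with
     * -fwrapv-pointer (or -fno-strict-overflow) it has to keep it. */
    int within_bounds(const char *buf, size_t len, const char *end)
    {
        if (buf + len < buf)          /* intended wraparound check */
            return 0;
        return buf + len <= end;
    }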
Posted Feb 29, 2024 11:47 UTC (Thu)
by SLi (subscriber, #53131)
[Link] (6 responses)
While I'm all for making code robust and even moving to safer languages, I think most people are more interested in vulnerabilities in actual binaries running on actual computers than in ones where a theoretical, possibly evil compiler could read the spec and perform a theoretically valid compilation where the output does something bad.
Posted Feb 29, 2024 14:31 UTC (Thu)
by pizza (subscriber, #46)
[Link] (5 responses)
Yeah, that's my take on this too.
It's "undefined" in the spec, but the actual compiled binary (+runtime environment) exhibits highly consistent (albeit unexpected/unintended) behavior. After all, script kiddies couldn't exploit those bugs into a privilege escalation without the binary behaving in this implementation-specific manner.
Posted Feb 29, 2024 19:10 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (4 responses)
Am I right in thinking that you'd agree that it's the output binary that's usually deterministic in its environment (modulo things with clearly-defined non-determinism such as the RDRAND instruction, or runtime data races between multiple threads), and not the combination of compiler (including flags) and source code?
In other words, while this would be a nasty surprise, you wouldn't be surprised if recompiling the same UB-containing source resulted in a different binary, but you would be surprised if using the same binary in the same environment with the same inputs had a different output, unless it was doing something that is clearly specified to have non-deterministic behaviour by the hardware (like a data race, or RDRAND).
Posted Mar 1, 2024 1:37 UTC (Fri)
by pizza (subscriber, #46)
[Link] (3 responses)
Yes, I'd agree with this, and I don't think that it's a terribly controversial opinion.
> In other words, while this would be a nasty surprise, you wouldn't be surprised if recompiling the same UB-containing source resulted in a different binary
If you're recompiling with the same toolchain+options I'd expect the resulting binaries to behave damn near identically from one build to the next [1]. (Indeed, as software supply-chain attacks become more widespread, 100% binary reproducibility is a goal many folks are working towards)
> but you would be surprised if using the same binary in the same environment with the same inputs had a different output, unless it was doing something that is clearly specified to have non-deterministic behaviour by the hardware (like a data race, or RDRAND).
Yep. After all, the script kiddies wouldn't be able to do their thing unless a given binary on a given platform demonstrated pretty consistent behavior.
[1] The main difference would be the linker possibly putting things in different places (especially if multiple build threads are involved), but that doesn't change the fundamental attack vector -- e.g. a buffer overflow that smashes your stack on one build (and/or platform) will still smash your stack on another, but since the binary layout is different, you'll likely need to adjust your attack payload to achieve the results you want. Similarly, data-leakage-type attacks (e.g. Heartbleed) usually rely on being able to repeat the attack with impunity until something "interesting" is eventually found.
Posted Mar 1, 2024 10:04 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (2 responses)
This is a potentially dangerous expectation in the presence of UB in the source code; there are optimizations that work by searching for a local maximum, and where for fully-defined code (even where it's unspecified behaviour, where there are multiple permissible outcomes), we know that there is only one maximum they can find. We use non-determinism in that search to speed it up, and for UB we run into the problem that there can be multiple maxima, all of which are locally the best option.
Because the search is non-deterministic, exactly which maximum we end up in for some UB cases is also non-deterministic. This does mean that 100% binary reproducibility has the nice side effect of wanting to reduce UB: by removing UB, you make the search-type optimizations find the one and only optimal stable state every time, instead of choosing a different one each time.
And I'd agree that it's not terribly controversial to believe that a binary running in user mode has no UB - there's still non-deterministic behaviour (like the result of a data race between threads, or the output of RDRAND), and if your binary's behaviour is defined by something non-deterministic, it could end up in what the standards call unspecified behaviour. This is not universally true once you're in supervisor mode (since you can do things on some platforms like change power rails to be out-of-spec, which results in CPUs having UB since the logic no longer behaves digitally, and thus it's possible for software UB to turn into binary defined behaviour of changing platform state such that the platform's behaviour is now unpredictable).
Posted Mar 1, 2024 14:48 UTC (Fri)
by pizza (subscriber, #46)
[Link] (1 responses)
FWIW I've seen my fair share of shenanigans caused by nondeterministic compiler/linking behavior. To this day, there's one family of targets in the CI system that yields a final binary that varies by about 3KB from one build to the next depending on which nonidentical build host was used to cross-compile it. I suspect that is entirely due to the number of cores used in the highly parallel build; I've never seen any variance from back-to-back builds on the same machine (binaries are identical except for the intentionally-embedded buildinfo).
But I do understand what you're saying, and even agree -- but IME compilers are already very capable of loudly warning about the UB scenarios that can trigger what you described. Of course, folks are free to ignore/disable warnings, but I have no professional sympathy for them, or the consequences.
I've spent most of my career working in bare-metal/supervisory land, and yeah, even an off-by-one *read* could have some nasty consequences depending on which bus address that happens to hit. OTOH, while the behavior of that off-by-one read is truly unknowable from C's perspective, if the developer "corrects" the bug by incrementing the array size by one (therefore making it too large for the hw resource) the program is now "correct" from C's perspective, but will trigger the same nasty consequences on the actual hardware.
Posted Mar 1, 2024 16:34 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
I spend a lot of time in the land of qualified compilers, where the compiler promises that as long as you stick to the subset of the language that's qualified, you can look only at the behaviours evident in the source code to determine what the binary will do. You're expected, if you're working in this land, to have proper code review and separate code audit processes so that a fix of the type you describe never makes it to production, since it's obvious from the source code that the program, while "correct" from C's perspective, is incorrect from a higher level perspective.
And a lot of the problems I see with the way UB is handled feel like people expect all compilers to behave like qualified compilers, not just on a subset of the language, but on everything, including UB.
Posted Feb 26, 2024 13:16 UTC (Mon)
by bagder (guest, #38414)
[Link]
An UB is a bug that *might* have a security impact. We do not make the world a better place by blindly assuming every UB is a security vulnerability. That's just like crying wolf. It is not helpful. Proper security assessment should still be applied.
Since they are bugs that should be fixed, we don't have to "come back" later to reassess their security impact. Unless we reintroduce the bugs of course.
Those are guidelines I adhere to in the projects I work in.
Posted Feb 24, 2024 5:57 UTC (Sat)
by jmspeex (subscriber, #51639)
[Link] (3 responses)
Correct. But then that is also true of most bugs you find in software. Based on that alone almost all bugs should be filed at the highest severity, which isn't exactly helpful. Any UB should get fixed, but when UB gets discovered in a piece of code, someone has to make a reasonable effort to figure out how bad the impact is likely to be. Write out of bounds on the stack: very very bad. Integer wrap-around in a loop counter: could be bad unless careful analysis shows it's unlikely to be. Integer overflow in the value that only gets printed to a log: much less likely to be exploited unless proven otherwise.
Posted Feb 24, 2024 6:35 UTC (Sat)
by Otus (subscriber, #67685)
[Link] (2 responses)
Having a CVE doesn't imply highest severity, even low severity vulnerabilities are meant to have one. Severity analysis is a separate matter.
Posted Feb 24, 2024 6:39 UTC (Sat)
by jmspeex (subscriber, #51639)
[Link] (1 responses)
Posted Feb 24, 2024 11:30 UTC (Sat)
by Otus (subscriber, #67685)
[Link]
I don't really know what the correct severity would've been here, but the severity part has always been black magic. (I don't think those are particularly useful in practice.)
My point is simply that CVE isn't supposed to be exclusively for highest impact issues, but any vulnerabilities.
Posted Feb 29, 2024 8:49 UTC (Thu)
by anton (subscriber, #25547)
[Link]
Yes, unless you are compiling a benchmark, compiling with -fno-strict-overflow is a good idea. This limits the kinds of shenanigans that the compilers do, a little. There are also a number of other such flags.
Actually, if we consider C "undefined behaviour" to be a security issue all by itself, then a C compiler that does not (by default) limit what it does is a security issue all by itself. Maybe someone (a Rust fan?) should file one CVE for each undefined behaviour (211 in C11 IIRC) for each C compiler, unless that compiler limits that behaviour for that case by default (offering flags like -fstrict-overflow for compiling benchmarks is allowed, of course).
Posted Feb 24, 2024 15:40 UTC (Sat)
by mcatanzaro (subscriber, #93033)
[Link] (3 responses)
But it's also correct. Signed integer overflow is a software vulnerability. It doesn't matter whether it's exploitable or not. CVEs are for tracking vulnerabilities, not exploits.
Posted Feb 25, 2024 23:33 UTC (Sun)
by neggles (subscriber, #153254)
[Link] (2 responses)
Posted Feb 26, 2024 1:15 UTC (Mon)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
Posted Feb 26, 2024 9:07 UTC (Mon)
by geert (subscriber, #98403)
[Link]
Posted Feb 23, 2024 17:34 UTC (Fri)
by flussence (guest, #85566)
[Link] (43 responses)
This describes a fantasy world where C compilers (and perhaps all software) are made by insane villains and actively abuse people for doing things outside what a written standard specifies, and to be blunt, it's just "free speech" advocacy with different inflection. I for one am glad the tech culture of 40 years ago has been largely stomped out by more reasonable people.
Posted Feb 23, 2024 17:37 UTC (Fri)
by mb (subscriber, #50428)
[Link] (11 responses)
Compilers exploiting UB happens all the time. It is the basis of all optimizations.
> for doing things outside what a written standard specifies,
UB is by the very definition of UB outside of what the standard specifies.
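A one-line illustration of that (hedged: what a given compiler actually emits depends on version and flags):

    /* Since signed overflow is undefined, a compiler is allowed to fold
     * this comparison to "always true" and return 1 unconditionally. */
    int always_greater(int x)
    {
        return x + 1 > x;
    }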
Posted Feb 23, 2024 18:44 UTC (Fri)
by fw (subscriber, #26023)
[Link] (1 responses)
Posted Feb 23, 2024 22:18 UTC (Fri)
by khim (subscriber, #9252)
[Link]
Java and C# absolutely do have undefined behavior. It's just handled the way Rust handles it: the “safe” language guarantees the absence of UB via the compiler, while the “unsafe” part allows one to write programs with UB. Java forces you to write these parts in an entirely different language using JNI, while C# has an unsafe subset, similarly to Rust, but in both cases UB still very much forms the basis for all optimizations. Of course it does. Everything in C depends on the absence of undefined behavior. Simply because it's permitted to convert a pointer to a function into a pointer to… And it's not even a theoretical issue! Back in the MS-DOS era… That one may be exploited by finding and changing constants in the compiler code. And that, too, was used back when compilers weren't smart enough to break such tricks.
Posted Feb 24, 2024 12:39 UTC (Sat)
by vegard (subscriber, #52330)
[Link] (7 responses)
The first part is true, but the second seems trivially false. Constant propagation does not in any way relate to or rely on UB, yet it is an optimization. Same with tail call optimizations, inlining, even register allocation just to name a few.
Posted Feb 24, 2024 12:47 UTC (Sat)
by mb (subscriber, #50428)
[Link] (6 responses)
And let me even add something: UB is required for connecting the virtual machine models of the compiler to the real world. Otherwise the virtual machine model would have to *be* the actual machine model. And even then it would still include UB, because actual machines have UB.
Posted Feb 24, 2024 13:21 UTC (Sat)
by vegard (subscriber, #52330)
[Link] (5 responses)
Posted Feb 24, 2024 13:49 UTC (Sat)
by mb (subscriber, #50428)
[Link] (4 responses)
If you are going to rewrite and recompile your Rust program each time the input changes, then the Compiler is part of your program flow. The compiler is the "unsafe-block" in this case, which provides the input data. Without it, the Rust program can't process anything. It would be static.
Yes, we can have fully safe languages like Rust. But they must *always* interface to an unsafe part. Otherwise they can't produce results. Safe-Rust alone is useless. General purpose languages will always have an interface to unsafe code.
The real world is unsafe. The real world has UB.
Posted Feb 25, 2024 0:55 UTC (Sun)
by tialaramex (subscriber, #21167)
[Link]
Generality is of course a price we cannot often afford. You wouldn't write the WUFFS compiler in WUFFS (the current transpiler is in Go) or an operating system, or indeed a web browser but the point is that our industry got into the habit of using chainsaws everywhere because they're so powerful, rather than using the right tool for the job.
Posted Feb 26, 2024 7:50 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
* "Safe" code (where the compiler will not let you write code that contains UB).
But then this is just a matter of definitions - You can set up your build system in such a way that it will refuse to compile unsafe code without a valid proof of soundness. Then you can consider the proof to be part of the source code, and now your unsafe language, plus the theorem proving language, together function as a safe language.
Formally verified microkernels already exist. The sticking point, to my understanding, is the lack (or at least incompleteness) of a verified-or-safe userland.
Posted Feb 26, 2024 10:02 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (1 responses)
The other sticking point is the difficulty of formal verification using current techniques. Formally verifying seL4 took about 20 person-years to verify code that took 2 person-years to write.
Posted Feb 26, 2024 10:12 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link]
The question is, is that good enough? Most formal verification wants to prove more than "mere" type safety or soundness, and that tends to be hard because the property you're trying to prove is complex and highly specific to the individual application. But if you just want to prove a lack of UB, that's probably more feasible.
* There are soundness bugs in the current implementation of Rust. Most of them are rather hard to trigger accidentally, but they do exist.
Posted Feb 29, 2024 9:01 UTC (Thu)
by anton (subscriber, #25547)
[Link]
Posted Feb 23, 2024 19:34 UTC (Fri)
by geofft (subscriber, #59789)
[Link] (29 responses)
The point of undefined behavior is not that the compiler is allowed to be lawful-evil about how it interprets your code, and so you have to be paranoid about what it might do. The point is that an optimizing compiler is permitted to assume that you are writing reasonable code that does not require the compiler to be paranoid about what you meant, and so it can make reasonable optimizations on real-world code. And every compiler that people actually use is optimizing. (There is a loose conceptual connection here with speculative execution vulnerabilities: you can avoid them with a non-speculating CPU, but nobody seems to be buying those.)
The code behind CVE-2023-52071 is actually a pretty good example of this.
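(The snippet itself isn't reproduced here; the following is a simplified, hypothetical sketch of the same general shape of bug, not curl's code: a one-past-the-end read that lets the optimizer drop a block the author expected to run.)

    static const int table[4] = {2, 3, 5, 7};

    int index_of(int x)
    {
        int i = 0;
        while (i <= 4 && table[i] != x)   /* bug: should be i < 4 */
            i++;
        /* Reading table[4] is undefined, so the compiler may assume the
         * loop always exits with i < 4 and eliminate this block. */
        if (i == 4)
            return -1;                    /* the "not found" handling */
        return i;
    }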
However, in the actual case in the curl source code, the dead-code elimination is actually pretty bad! You do really want that code to execute; the coder's intention was not that the block was skippable. The compiler can do the exact same "useful" action and get you a pretty negative result: the curl command does no output (I think), but it's returning success anyway. It's not far-fetched to imagine that in turn leading to unexpected data loss. The compiler does not need to be actively evil to cause a real problem.
(Note that what's happening isn't that the compiler is doing something in response to undefined behavior being invoked. The compiler is simply not doing something on the assumption that undefined behavior is never invoked; specifically, it just doesn't compile the block. No real-world compiler has any interest in inserting code to do something weird that it wouldn't otherwise insert. But even so, optimizing out something that shouldn't have been optimized can cause problems - impact not intent and all that.)
Signed overflow being undefined behavior is a little bit silly because a well-intentioned optimizing compiler will only use that optimization for one purpose: to emit whatever the underlying CPU's most efficient arithmetic instructions are to handle the non-overflowing case. On essentially every modern CPU, that's two's-complement wrapping operations, but the historic existence of other CPUs means that the standard wanted to allow optimizing compilers to have a chance on those platforms too. Today it would be reasonable to make it no longer undefined behavior. All the other types of undefined behavior are undefined because there are reasonable optimizations that users actually want their compilers to do. Strict aliasing means that a loop that reads an array behind a pointer doesn't have to reread the pointer each time through, just in case something else in the loop changed it. Data races are undefined so that compilers don't have to use atomic operations or lock prefixes for everything. Buffer overflows are undefined so that there aren't bounds checks inserted everywhere. And so forth.
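For instance, the strict-aliasing case looks like this (a minimal sketch, assuming GCC/Clang defaults):

    /* A store through a float pointer is assumed not to modify an int
     * object, so the compiler may load *len once before the loop instead
     * of rereading it on every iteration. With -fno-strict-aliasing it
     * must allow for the possibility that out[i] overlaps *len. */
    void scale(float *out, const float *in, const int *len)
    {
        for (int i = 0; i < *len; i++)
            out[i] = 2.0f * in[i];
    }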
Posted Feb 23, 2024 20:04 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (3 responses)
And this is an untrue generalisation :-)
Admittedly I don't use it much, but a lot of people make great use of it - I guess many of the DataBASIC compilers are not optimising. I know OpenQM isn't. The guy who wrote it is documented as saying the extra complexity involved wasn't worth the candle.
Okay, it's probably still vulnerable to optimisation, because DataBASIC compiles to high-level p-code, which is then processed by an interpreter written in C ... but that ain't an optimising compiler.
Cheers,
Posted Feb 23, 2024 20:37 UTC (Fri)
by geofft (subscriber, #59789)
[Link] (1 responses)
Posted Feb 24, 2024 5:12 UTC (Sat)
by willy (subscriber, #9762)
[Link]
Posted Feb 24, 2024 7:53 UTC (Sat)
by jem (subscriber, #24231)
[Link]
Or, if the definition of a non-optimizing compiler is that the binary code is a series of fragments that can clearly be used to identify the corresponding parts of the source code, how on earth are you going to formalize this definition?
Posted Feb 23, 2024 21:26 UTC (Fri)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
This does seem to be the expectation of many C++ programmers and I'd assume also C programmers.
It's wrong though. Here's a very easy example, the compiler just constant folded your arithmetic overflow out of existence... https://godbolt.org/z/v4evh3eEG
Posted Feb 26, 2024 14:01 UTC (Mon)
by error27 (subscriber, #8346)
[Link]
Posted Feb 23, 2024 22:44 UTC (Fri)
by khim (subscriber, #9252)
[Link]
Sigh. I wonder if the “we code for the hardware” guys will ever learn that a “well-intentioned optimizing compiler” is an oxymoron; it simply couldn't exist and doesn't exist. A compiler couldn't have intentions, well-intentions or ill-intentions. That's simply the basis of the whole of compiler theory. That discussion happened more than a decade ago and it's still relevant. And if you think that gcc has suddenly become “well-intentioned” simply because GCC 12 or GCC 13 don't turn that particular example into a pile of goo, then you are sorely mistaken: it's only because these have learned to [ab]use SIMD instructions in… At this point we should stop pretending C and C++ are salvageable, because the trouble with them is social, not technical: even after decades of discussions the “we code for the hardware” crowd is not ready to give up on their dream of a “well-intentioned” compiler, while compiler makers are not even trying to discuss changes in the language which might help these people produce working programs.
Posted Feb 24, 2024 1:08 UTC (Sat)
by tialaramex (subscriber, #21167)
[Link] (20 responses)
More generally though over the past say five years I've become increasingly comfortable with the "demons fly out of your nose" characterisation despite the fact that yes, technically that specifically won't happen (because demons aren't real). The characterisation is appropriate because it inculcates the appropriate level of caution, whereas the "It will never do anything unreasonable" guidance you prefer reassures C and C++ programmers that they're safe enough when in fact they're in constant danger and _should_ be extremely cautious.
There's an Alisdair Meredith talk which I can't find right now where Alisdair confidently explains that your C++ program cannot delete all the files on a customer's hard disk unless you wrote code to delete all their files - He argues that while sure as a result of UB it might unexpectedly branch to code you wrote that shouldn't normally run, or pass different parameters than you expected; it cannot ever just do arbitrary stuff. This is of course completely untrue, I'm guessing every LWN reader can see why -- but it does make it easier to feel relaxed about having mountains of crappy C++ code. Sure it has "Undefined Behaviour" but that probably just means it will give the wrong answers for certain inputs, right?
If every C and C++ programmer had the "Ralph in danger" meme image on a poster over their desk I'd feel like at least we're on the same page about the situation and they've just got a different appetite for risk. But that's not the world we have.
Posted Feb 24, 2024 1:23 UTC (Sat)
by pizza (subscriber, #46)
[Link] (19 responses)
No, it is categorically true, because "undefined" does not mean "arbitrary and unbounded"
Using your logic, triggering UB means the computer could respond by literally exploding. Ditto if your computer gets hit with a cosmic ray.
If you argue that "no, the computer can't do that because nodody built explosives into it", why can't that argument also be applied to UB arbitrarily deleting your files instead? Sure, both are _possible_ in the sense that "anything is possible" but you're far more likely to have your car hit by a train, an airplane, and a piece of the International Space Station... simultaneously.
Posted Feb 24, 2024 2:03 UTC (Sat)
by tialaramex (subscriber, #21167)
[Link] (18 responses)
For the file deletion situation the usual way this comes up is that bad guys hijack a program (whatever its purpose may have been) to execute arbitrary code (not something it was intended to do, but ultimately not hard to achieve in some UB scenarios as numerous incidents have demonstrated). Then they do whatever they like, which in some cases may include deleting your files (perhaps after having preserved an encrypted "backup" they can sell to you).
Posted Feb 24, 2024 3:12 UTC (Sat)
by pizza (subscriber, #46)
[Link] (17 responses)
Seriously? Calling that the consequence of "undefined behaviour" is beyond farcical, as the _computer operator_ is *deliberately choosing* to delete files.
Just because the operator is unauthorized doesn't make them not the operator.
And "undefined behaviour" is not a requirement for, nor does it necessarily lead to, arbitrary command execution.
Posted Feb 25, 2024 18:28 UTC (Sun)
by tialaramex (subscriber, #21167)
[Link] (16 responses)
Emotionally it's satisfying to insist that you're right and Mother Nature is wrong. But pragmatically the problem is that Mother Nature doesn't care how you feel about it
And it's going to keep happening until you stop doing the thing that doesn't work, even though you find that emotionally unsatisfying as an outcome.
Posted Feb 25, 2024 23:39 UTC (Sun)
by pizza (subscriber, #46)
[Link] (15 responses)
No, Alisdair and I both claim it can't happen *unless someone intentionally writes code to make it happen*
...It won't happen by pure happenstance. (Which even your contrived script kiddie example demonstrates)
> But pragmatically the problem is that Mother Nature doesn't care how you feel about it
Uh.. there is nothing "natural" about computer software or even computer hardware; they cannot operate in ways that exceed what they were designed to do. But that's neither here nor there; "Mother Nature" doesn't respond to unexpected stimulus in arbitrary ways either; nature has rules that governs how it functions. (Granted, we don't understand many/most of them, but that doesn't mean they don't exist.)
For example, reading an uninitialized memory cell yields "undefined" results. However, in reality (sorry, "Nature") the value will either be 1 or 0. It literally cannot be anything else, because the computer can only register a 1 or a 0 in response to that input -- you won't ever get a value of "0.661233123413" or "blue". So yes, it is "undefined" but it is *bounded*. What happens in response to that? That depends on what that value is used for in the larger system.
Going back to the curl not-a-CVE, when the worst possible outcome is that the user gets access to one byte of data they already had access to, there is no path from that read to "nuke your filesystem" unless curl is being used within a system already designed to nuke your filesystem (or the OS or runtime or whatever was intentionally designed to nuke your filesystem) if you read-out-of-bounds.
Another way of looking at this is that sure, the contents of that extra byte is technically undefined, but so is every other byte in the HTTP response from the server -- including whether or not you get one at all. Similarly, what the server does as a result of you making that request is also undefined and largely outside your control. It could trigger thermonuclear war for all you know. But it won't trigger global thermonuclear war unless someone deliberately gave it those capabilities. In other words, undefined, but *bounded*.
Posted Feb 26, 2024 6:40 UTC (Mon)
by mb (subscriber, #50428)
[Link] (11 responses)
That is not true.
The rest of your post is also largely not true. But that has been explained often enough to you, so I won't repeat.
Posted Feb 26, 2024 9:10 UTC (Mon)
by geert (subscriber, #98403)
[Link] (1 responses)
Posted Feb 26, 2024 9:16 UTC (Mon)
by mb (subscriber, #50428)
[Link]
Posted Feb 27, 2024 20:24 UTC (Tue)
by pizza (subscriber, #46)
[Link] (8 responses)
A TRNG does not respond "arbitrarily"; it still can only operate within its design constraints, which of course includes the characteristics of the materials it was constructed from. And, while any given read of a TRNG is "undefined" the value is bounded, with each discrete value being equally probabilistic as long as it is used within its designed operating conditions. [1]
It will always return a value between 0.0 and 1.0. [2] It cannot return "Fred" or kill your cat unless you put it into a box with a vial of poison.
...And the physical phenomenon that the RNG is measuring also has to have bounds, or you'd not be able to detect it -- Certain stimuli can make these events more likely (yay, Fission!) but that's just a change in probabilities [3] The point being, they don't respond "arbitrarily". Your Pb isn't going to turn into Au because a butterfly flapped its wings halfway across the world. Either an atom decays or it doesn't. Either an electron crosses the P-N junction or it doesn't.
[1] Several $dayjobs ago, I helped design a TRNG, so I have a decent idea how they work... and when they fail.
Posted Feb 27, 2024 20:46 UTC (Tue)
by mb (subscriber, #50428)
[Link] (7 responses)
Posted Feb 27, 2024 22:12 UTC (Tue)
by pizza (subscriber, #46)
[Link] (6 responses)
How exactly does a high school physics experiment support your claim that "Mother Nature doesn't respond to unexpected stimulus in arbitrary ways either; nature has rules that governs how it functions" is "not true"? [1]
...This experiment shows that we are still trying to figure out what those rules are, not that they don't exist!
It certainly doesn't change the fact that while any given observation is unpredictable, the probabilities are not. (e.g. you can't predict the decay of an individual atom, but you can accurately predict the overall _rate_ of decay of a mole of them)
[1] https://lwn.net/Articles/963598/
Posted Feb 28, 2024 10:20 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (5 responses)
That's all, folks.
Cheers,
Posted Feb 28, 2024 12:19 UTC (Wed)
by Wol (subscriber, #4433)
[Link]
Cheers,
Posted Feb 29, 2024 14:24 UTC (Thu)
by pizza (subscriber, #46)
[Link] (2 responses)
Yeah, so? That doesn't demonstrate that Nature behaves arbitrarily and doesn't follow rules; it demonstrates that Nature's rules are a lot more complicated than we previously understood.
Posted Feb 29, 2024 18:29 UTC (Thu)
by mb (subscriber, #50428)
[Link]
https://en.wikipedia.org/wiki/Laplace%27s_demon
Nature has inherent randomness and undefined behavior.
Posted Feb 29, 2024 21:08 UTC (Thu)
by Wol (subscriber, #4433)
[Link]
We have classical physics, where everything follows rules and is deterministic.
We have relativity, which iirc is the same.
And then we have quantum, where things happen at the micro level, but the rules only work at the macro level - we have no idea what (if any at all) the deterministic rules are. Especially as the main thing behind quantum seems to be the making of something out of nothing - if we have nothing there to start with, how can there be anything there to apply deterministic rules TO!?
So if quantum is basically nothing, surely it's reasonable to assume the quantum rules are nothing, too :-)
Cheers,
Posted Feb 29, 2024 21:02 UTC (Thu)
by rschroev (subscriber, #4164)
[Link]
Posted Feb 26, 2024 7:40 UTC (Mon)
by jem (subscriber, #24231)
[Link] (2 responses)
I can easily imagine a memory technology where reading an uninitialized memory cell produces the value 1, and could on the next read (still uninitialized) produce the value 0. If you repeat the process a sufficient number of times you could end up with a mean value of 0.661233123413.
Posted Feb 26, 2024 11:20 UTC (Mon)
by mpr22 (subscriber, #60784)
[Link]
Mmm, delicious sparkling bits. (The client was doing something ill-advised – can't remember whether it was power down or just hard reset – to the system while MLC NOR Flash was being programmed, which is admittedly something a bit worse than just "uninitialized memory".)
Posted Feb 26, 2024 15:04 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
Ordinary DRAM can work that way; the capacitor in the DRAM cell is in one of three states; 0, intermediate, and 1. Intermediate is read out as either 0 or 1, but whether intermediate can stay at intermediate, or ends up forced to 0 or 1 on read depends on details of the DRAM design. Some DRAM designs will force the cell to 0 or 1 state on the first read or refresh, others have a more stochastic process where the cell can stay at intermediate until it's written (which sets the state to either 0 or 1, unconditionally), but may stabilise randomly.
And because this is outside the DRAM specifications (normally - you can get slightly more expensive DRAM ICs which guarantee read stability even without writes), different batches of the same IC may have different behaviours. In practice, you need special cases to observe this, since every refresh cycle counts as a read for the purposes of stabilizing the value, and any cell that's been written to is also stable.
As a result, you need to be depending on reads being stable even if the cell hasn't been written yet, and be reading from the DRAM shortly after it's been powered up, before there's been enough refresh cycles to stabilize its value, and using DRAM that can take several read or refresh cycles to stabilize the cell values. The first is almost certainly a lurking bug in your code (it was in the case I hit, it just took a long time to find and fix it, and the "quick" fix was to buy more expensive DRAM that guaranteed stability while we hunted down the software bug), the second pretty much requires you to be running code directly from flash or ROM, not booting the way a PC does (since the boot sequence takes long enough that you've had many 64ms or shorter refresh cycles during boot), and the third requires you to be unlucky with the specific DRAM ICs you buy.
Posted Feb 24, 2024 11:28 UTC (Sat)
by hsivonen (subscriber, #91034)
[Link]
This is not accurate. Compilers use the fact that signed integer overflow is UB to assume that it doesn't happen, which permits mathematical reasoning that's valid in the non-modular integer domain.
Google uses int for loop counters, and they seem to want the optimizations that arise from signed overflow being UB and therefore being assumed by the compiler not to happen.
https://google.github.io/styleguide/cppguide.html#Integer...
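A sketch of the kind of loop that style choice has in mind (illustrative; the actual codegen benefit depends on the target and compiler):

    /* With int overflow being undefined, the compiler may assume i never
     * wraps and can, for example, widen i to a 64-bit induction variable
     * and vectorize freely. With an unsigned counter, wraparound at 2^32
     * is well defined and must be preserved, which can block some of
     * those transformations. */
    void scale_even(float *y, const float *x, float a, int n)
    {
        for (int i = 0; i < n; i += 2)
            y[i] += a * x[i];
    }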
Posted Feb 24, 2024 18:26 UTC (Sat)
by faramir (subscriber, #2327)
[Link]
https://www.reddit.com/r/roguelikedev/comments/ytlw2f/a_b...
"Then people first started using the #pragma directive in C, its behaviour was implementation-defined. Version 1.34 of gcc, released around 1988/89, mischievously defined the behaviour as "try running hack, then rogue, then Emacs' version of Towers of Hanoi". i couldn't find 1.34 in the gcc archives, but gcc-1.35.tar.bz2 concedes that the directive might in fact be useful:"
Posted Feb 23, 2024 17:37 UTC (Fri)
by mgb (guest, #3226)
[Link] (13 responses)
Posted Feb 23, 2024 18:05 UTC (Fri)
by wtarreau (subscriber, #51152)
[Link] (12 responses)
UB makes sense for signed integer overflows that might return -x or ~x depending on the machine's integer representation. I.e. "we trust that you don't care about the result past this limit". It makes sense for shifts with high bits set, for the same reason (some CPUs will zero the output, others will mask higher bits), etc. The absurd approach of "the program might burn your house" has been causing a lot of harm to a ton of perfectly behaving programs on their usual platforms.
Sure, some optimizations are based on this. But instead of UB they could have presented the known possible outputs (e.g. what acceptable values x+1 may have). It would have permitted the same optimizations without purposely ruining valid tests. And actually, compilers could agree on this without even revisiting the standard.
Posted Feb 23, 2024 22:53 UTC (Fri)
by khim (subscriber, #9252)
[Link] (11 responses)
They could, but what would it change? The real problem is not the fact that C and C++ have so many undefined behaviors, but the fact that there are so many people with the attitude “if your language has rules that I don't like then I'll just go and break them anyway”. If C developers don't even contemplate the idea that they may, gasp, play by the rules, then what can any change in the compiler ever achieve? If you can, somehow, identify these people and then kick them out of your community like Rust did, then this exercise starts becoming useful, but I don't believe that can ever happen in the C or C++ communities. The “we code for the hardware” crowd is too entrenched and influential there.
Posted Feb 24, 2024 8:03 UTC (Sat)
by wtarreau (subscriber, #51152)
[Link] (7 responses)
As a reminder, the language rules are not public; you have to pay to get them, or find one of the latest drafts. *That* is the biggest issue. Can you imagine that I first learned about the UB status of signed integer wrapping something like 20 years after starting to use C, because all compilers were applying it as one would expect, i.e. like unsigned? I think it's gcc 10 or so that first broke it, and tons of programs with it.
I guess many C developers came from other languages and already had asm experience before going to C, and there's no such UB in asm: you know what you're doing, and it makes no sense to suddenly pretend that the conversion to C will eliminate some code. But that's reality.
IMHO one thing Rust developers did well was to carefully enumerate (and learn so that they can tirelessly recite them) all UB of C and educate people about them. But I can assure you that the vast majority of C developers are not aware of even 10% of them because the language is made to let the developer express what he wants to do in a portable way and the compiler decides otherwise as some obscure rule allows it to, for reasons that initially had nothing to do with their purpose, but were abused for the sake of optimization.
Posted Feb 24, 2024 10:57 UTC (Sat)
by khim (subscriber, #9252)
[Link] (6 responses)
Do you really believe that? That's the craziest idea about UB that I have ever heard. Can you point out a single person who may genuinely claim “I wrote a program with UB simply because I couldn't find the definition of the language” and mean that? Some use it as an excuse for ignorance, but I would argue that you can't trust anyone who couldn't find a draft of the standard anyway: it's much easier to find an eel.is link than to write a “hello world” program in any new language. Seriously? First GCC 10 breaks programs in 2020, then I use a time machine to go back to 2011 to start that discussion, then nix uses another time machine to go back to 2007 and start that one, and yet another time machine is used to plant the evidence into gcc 2.95, which was released last century… don't you think that your explanation is starting to look a bit convoluted? I, for one, don't remember ever jumping from 2020 into 2011 to plan the LWN discussion, and I'm pretty sure no one used such a valuable artifact to alter GCC 2.95 behavior retroactively. There is plenty of UB in asm. CGA snow is a perfect example. The behavior of the Z80 string manipulation instructions is also undefined (it works fine on some systems but stops memory refresh on some others, which makes programs unportable). And I'm pretty sure older CPUs had some interesting UBs, too. No, the reality is much simpler: it's simply not possible to translate random code snippets from one language to another if both source and target languages have UB. And if you don't allow UBs in the source language then certain things become simply flat-out impossible. It's not a coincidence that the majority of widely used languages these days don't have UBs. It's hard to deal with languages that have UBs. It's also not a coincidence that these same languages are never self-hosting ones. Because to have a self-hosting language you need a language with UB; otherwise you would hit the Rice theorem. When people repeat, decade after decade, that it doesn't make any sense to follow mathematical law, and that the fact that something is flat-out impossible shouldn't stop other guys from delivering the things that they want… at some point you just have to accept that these guys couldn't be changed, they could only be replaced. That was a side effect of the fact that Rust developers (at least some of them) come from the math world and thus are not “following the intuition”; they know some important things. I don't think, before they started poking at the UBs that C/C++ compilers are using in optimizations, anyone ever tried to catalogue them to see if a language with these UBs is even internally consistent. The Rust guys used LLVM as a basis and thus naturally wanted to know what rules it obeys. And they have found that both sides go with “intuitively justified” rules. One side happily assumes that they can use hardware capabilities in a C or C++ program even if the language doesn't permit that (which is nonsense), the other side uses “things that are probably correct” to perform optimizations (which may lead to funny results)… and every group postulates that they are right while “the other side” are villains hell-bent on destroying everything in the known universe. How does that phrase even make any sense? UBs exist precisely to “let the developer express what he wants to do in a portable way”! If you write code that triggers UB then your code is not portable! It's as simple as that!
This article explains how we have arrived at the point where C/C++ have been turned into a “language unsuitable for any purpose”, but it doesn't tell us what to do and how it can be made useful again! The sad thing is that it's not even that hard to understand why we have that fiasco, only most C and C++ developers tend to insist that they don't need to change anything in their behavior and that compiler developers just need to go and give them ponies. That's the fundamental disconnect. One side knows that a compiler is a dumb, insane entity, simply by definition: there is no organ in a compiler that could have made it sane! A “sane” compiler which may “decide” something is simply not happening. As in: NONE of the compilers that were EVER created had the ability to “decide”. The only way a complicated multi-pass compiler may be written is if you create a fixed-in-advance set of rules that the program being compiled follows, and then ensure that all the hundreds of passes in the compiler never break any invariants that are embedded in these rules. That's a fundamental limitation; we simply have no idea how to make compilers without fixing the rules of the input language. It's not even clear if that's possible in principle. “Reasons” are another thing that compilers couldn't use. Because, again, they lack an organ which may help them think about reasons. You could probably attach ChatGPT and then they would get such an ability, but I'm not even sure that's a good idea: this would just mean that compilers would sometimes understand your “reasons” and sometimes fail to do that. Hardly an improvement: compilers today may be insane and unreasonable, but at least they are consistent and builds are “reproducible”; if you added “sanity” and “reasoning” to compilers you would lose that. Except they weren't “abused”. A compiler is, by definition, something that transforms a source language into machine language. If you can't say what the program which you have on the input is supposed to do, then you can't say whether a transformation performed on it is correct or incorrect. Some people actually tried to “help” these developers who complain about how the “compiler breaks their programs” and asked them: okay, the standard doesn't say what all these non-C programs are supposed to do… but can you, as a human, look at it and say how it's supposed to work? And, of course, the answer was a resounding no, because without a language definition every C user is free to invent their own ideas about how a non-C program is supposed to behave! And they actually do invent their own ideas about that! Of course writing a compiler which works with such a non-definition of a non-language is impossible. You cannot play basketball or football if players disagree about the rules of the game. And if half of the players think they are playing one game and half of them think they are playing something entirely different, then nothing would ever work. And, unfortunately, C (and, to a lesser degree, C++) is precisely in such a position today.
Posted Feb 24, 2024 14:22 UTC (Sat)
by wtarreau (subscriber, #51152)
[Link] (5 responses)
> Do you really believe that? That's the craziest idea about UB that I have every had. Can you point out a single person who may genuinely claim “I wrote program with UB simply because I couldn't find the definition of the language?” and mean that?
This has been the case for most of the C devs I know. Did you ever see a teacher at school enumerate all the UBs? No, they aren't aware either. Even some of the C books I used to read 35 years ago to try to learn the language by myself never made mention of such specificities. I was never informed of any such UB, and everything I used to do worked perfectly well and as expected for two decades.
The point is that historically the compiler authors considered that if your code does something undefined, you're on your own, and it's just the developers' job to make sure that the input domain doesn't cause UB. It has been this way for several decades, and developers took care of not causing overflows etc and all was fine. Then later it turned to the compiler trying to detect if some combinations of inputs could possibly trigger an UB, and if so, they would declare the whole block UB if the developer couldn't prove the input cannot be produced. This started to create tons of painful bugs and to make the code much harder to write correctly, since the language doesn't offer provisions for announcing input domains. Sometimes you would just like to be able to tell the compiler "trust me, this variable is not null" or "trust me this value is always between 13 and 27" and be done with it. But no, it will either produce some warnings that are irrelevant to your use case or even eliminate some code considering that it might burn your house so why compile it in the first place.
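The closest thing available today is a compiler extension rather than a language feature; a hedged sketch using GCC/Clang's __builtin_unreachable():

    /* Promise the optimizer that val is within [13, 27]. If the promise
     * is ever false at run time the behaviour is undefined -- this narrows
     * the UB rather than removing it, so it cuts both ways. */
    static inline void assume_range(int val)
    {
        if (val < 13 || val > 27)
            __builtin_unreachable();
    }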
And the fact that the compiler silently eliminates some code without any option left to at least get a compile-time warning about such decisions is a real disaster.
Posted Feb 24, 2024 15:35 UTC (Sat)
by khim (subscriber, #9252)
[Link]
Why is that even relevant? You couldn't do much using 35-year-old manuals today. Heck, most manuals that existed back then would have talked about a completely different language which is not even accepted by many compilers today! Times change and people learn new things. What was perceived as acceptable decades ago is today deemed “dangerous”. We have learned how to do continuous integration, fuzz testing and many other things. The importance of UB and the need to write programs which don't ever trigger UB was supposed to become realized somewhere on that road, too… only lots of C and C++ developers, somehow, happily ignore these changes even today. Nope. Let's take a look at the Turbo C manual. You can download it here. What does it say about the use of registers in assembler? This: if there is a register declaration in a function, in-line assembly may use or change the value of the register variable by using SI or DI, but the preferred method is to use the C symbol in case the internal implementation of register variables ever changes. This doesn't yet elevate UB to the position where it is today (simply because it was assumed, back then, that you wouldn't ever use your program with different compilers and thus could rely on the behavior of the compiler), but it already talks about how to write “future-proof” code. The only thing that changed in the past three decades was the availability of new versions of the compilers. Today it is assumed that you write code for some version of the language, not for a particular compiler. No. All optimisations always start from deciding that certain “dumb and stupid code” would never be written. And said “dumb and stupid code” always included some form of UB. If you go back to that example that I have already shown you:
This code is fine, according to the Turbo C manual (and it actually works with Turbo C), but it stopped working in last century already (would be interesting to know what compiler was first to break it)! And nobody objected. Maybe if people would have realized back then where this road would, eventually, lead… we would have lived in a different world now. Not sure if it would have been better or worse, but it would have been different for sure. Well, standard doesn't offer anything, but GCC does. No, the disaster is that assumption that compiler may offer such code. Look on that silly If compiler would issue warning for every change that it does that you would get million of warnings for thousand lines of code, most of them pointless. You don't want that. What you do want are warnings for the “dangerous” transformations, but that's not possible, too, because, as an attempt to create friendly C have shown for every optimization which someone consider “dangerous” there are bunch of people who consider that exact same optimization “essential” (start with simplest case of shifting 32bit value by 33 and watch how group of C “we code for the hardware” guys would become divided, repeat 100 times and end up with as many definitions of C as you would have people involved, perhaps with even more definitions because if you ask the same question twice in different forms you would get more than one answer from “we code for the hardware” guys). I recently helped friend with modern, C++17-based course. They haven't listed all possible UBs, but they have been talking about how UBs may affect your program in unpredictable ways and they have been talking about how you have to use asan/tsan/ubsan/etc to avoid UBs. But that doesn't solve the issue of people who have learned C and/or C++ long ago and refuse to accept the fact that they have to avoid UB. It's one of areas where things change in ages-old fashion: An important scientific innovation rarely makes its way by gradually winning over and converting its opponents: it rarely happens that Saul becomes Paul. What does happen is that its opponents gradually die out, and that the growing generation is familiarized with the ideas from the beginning. But if that is how that whole stupid story would be resolved then we may, as well, change the language (
Posted Feb 25, 2024 2:54 UTC (Sun)
by tialaramex (subscriber, #21167)
[Link] (3 responses)
These seem extremely dangerous and I would not recommend them. Being able to insist that things are true when maybe they aren't is almost certainly going to make the things you complain of even worse. Prefer instead to help the compiler figure out for itself that variables must have the properties you desire.
For example Rust's NonNull<T> is a pointer that's not null. If you have a pointer, you can ask NonNull::new to give you an Option<NonNull<T>> instead, and it will either give you Some(NonNull<T>) or None accordingly.
WUFFS will let you express "I claim that value is between 13 and 27" in code, and check that indeed it can see why value is between 13 and 27 so you're correct. For example:
assert value < 27 via "a < b: a < c; c <= b"(c: foo)
That says, a human has promised you elsewhere (in a list of propositions humans have proved) that this obvious rule about comparison operators is true. Using that proposition, and the facts that foo is less than or equal to 27 and that value is less than foo, you can see that value is less than 27. Done.
Note that this won't compile if you're wrong (e.g. value isn't less than foo) unless you fed WUFFS a faulty proposition and then relied on that proposition; so don't do that, and if you need to prove tricky things, hire a mathematician. And especially note that I wrote _compile_ there. This isn't a runtime problem; WUFFS rejects faulty programs at compile time.
Posted Feb 25, 2024 10:35 UTC (Sun)
by khim (subscriber, #9252)
[Link] (1 responses)
> For example Rust's NonNull<T> is a pointer that's not null.

And yet you may use unsafe to forcibly shove null in it and then “bad things happen”™. You couldn't use your knowledge that NonNull<T> is just an 8-byte piece of data in memory which accepts a sequence of 64 zero bits just fine, and use that to save some memory somewhere. This is just not guaranteed to work, and people (and not the compiler!) would punish you if you insisted that it's just fine because you have unit tests and they are working.

That is the fundamental difference between C/C++ and Rust. You have to have someone who kicks out the violators. Because every language is either safe, incomplete and non-self-hosting (and then you have some other language which is used to host that safe language) or unsafe, complete and, optionally, self-hosting. And attempts to skip that duty of keeping violators in line just simply don't work: Java and PHP are memory-safe languages, yet how many security breaches have people seen via programs written in these languages? The compiler may help you to avoid stupid mistakes, but if you actively lie to the compiler then nothing may ever be truly reliable or secure.
Posted Feb 26, 2024 2:47 UTC (Mon)
by tialaramex (subscriber, #21167)
[Link]
For WUFFS it's not quite the same in two crucial ways, firstly because it's not a general purpose language it can't have and won't have a large audience. So it can be completely picky, it would be fine if they don't want people who like Vim, or anybody whose name has too many vowels in it, no problem.
Secondly though their safety is much more fundamental to the language itself. You could take Rust's technology and build something completely unsafe with that. The Rust community wouldn't, but it's not a restriction of the technology. WUFFS can't really do that, the technology doesn't really do anything else, there's a lot of clever maths stuff going on there which is heading for only one destination, certainty of correctness. The only other extraordinary thing about WUFFS is performance, but even there it does so in a way that only makes sense if you assume correctness is mandatory. I don't believe you would build this technology for any other reason.
For both reasons I don't think there's any need to "kick out the violators". To such people WUFFS makes no sense, it's like worrying about whether the crows on your web cam accept the Terms of Service of your ISP. The crows don't even understand what Terms of Service are as an idea, much less accept these particular terms.
In your terms I'd guess you'd consider WUFFS "incomplete". That's fine, it's not supposed to be "complete" in this sense.
Posted Feb 25, 2024 11:19 UTC (Sun)
by Wol (subscriber, #4433)
[Link]
> These seem extremely dangerous and I would not recommend them. Being able to insist that things are true when maybe they aren't is almost certainly going to make the things you complain of even worse. Prefer instead to help the compiler figure out for itself that variables must have the properties you desire.
Not quite the same thing, but again to be able to say "the only valid values are between 13 and 27". Then the compiler goes "if I detect a value that isn't, that's an error". Or trigger a warning "cannot prove boundaries are met".
Cheers,
Wol
Posted Feb 25, 2024 0:18 UTC (Sun)
by laurent.pinchart (subscriber, #71290)
[Link] (2 responses)
Whoever you may be, I hope for the "Rust community" that you don't represent them here. "Kicking people out" when you disagree with them doesn't build welcoming communities, as is clearly explained in the article you linked. This whole thread is degenerating into a cockfight, it would be nice if everybody could take a deep breath.
Posted Feb 25, 2024 1:50 UTC (Sun)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
I wouldn't perhaps choose the phrase "kicking people out" but it's clearly not going to work to have people who don't agree with your safety culture in a community that specifically values this property. It's the old "tolerance of intolerance" problem. We cannot have shared values *and* welcome people who specifically disagree with those values; either those people aren't welcome or those aren't shared values after all.
Posted Feb 23, 2024 18:36 UTC (Fri)
by geofft (subscriber, #59789)
[Link] (9 responses)
You are correct about it being undefined behavior, and you are correct that a compiler can cause the program to behave arbitrarily in response to it, but I would argue it is still not a vulnerability.
A vulnerability is a misbehavior in a program that compromises its security in response to crafted input / malicious actions from an attacker. This requires a few things to exist. "Misbehavior" requires a coherent concept of intended behavior. "Attacker" and "crafted input" require a system that interacts with untrusted parties in some way. "Security" requires things that you're trying to keep secure (maintain confidentiality, integrity, and/or availability) from the attacker.
As an obvious example, the fact that bash executes arbitrary code is not an arbitrary code execution vulnerability. It's what bash is supposed to do. The thing that supplies input to bash is expected to be the legitimate user of bash, not an attacker. bash is supposed to be able to run any command, and so it doesn't have a distinction between behavior and misbehavior. If you can get arbitrary code execution in rbash, for instance, then yes, that's a vulnerability, because rbash is designed to take untrusted input and maintain security of the system it runs against that input. If you can run arbitrary commands by setting environment variables even if you can't control input, then there's probably a vulnerability (as Shellshock was). But for regular bash, if you are setting up a system where you're piping untrusted input into bash, that's a vulnerability in your system, not in bash.
The only way to distinguish that is to know at a human level what bash is supposed to do and where it's supposed to take inputs from. There is no automated technical distinction to be made between bash and rbash. There is no automated technical distinction to determine that commands like bash and python are only intended to be run with trusted input, but commands like grep and sort are supposed to be able to handle untrusted input. You can call this a "gut feeling" if you like, but it's inherent to how we use computers. We never run a program for the sake of running a program; we run a program in order to accomplish some task of human interest, and the parameters of that task, not the program, determine what counts as misbehavior and insecurity.
There is a simple argument that CVE-2020-19909 (note there is a typo in today's article, it's 19909, not 1909) is not a vulnerability. It is not that the undefined behavior doesn't exist, or that the risk of a compiler doing something arbitrarily weird and unwanted is low. It is entirely compatible with a compiler generating code to do system("rm -rf /") in the case of signed integer overflow. The argument is that attackers do not have access to set an arbitrary retry delay value, and any real-world system that uses curl where attackers do have this access has a more fundamental vulnerability - e.g., they can provide arbitrary arguments to the curl command line, or they already have code execution in an address space in which libcurl is linked. Even in the most limited case where the attacker can only specify this one value and nothing other than curl is imposing limits on the value, they'd still be able to effectively request that curl should hang for 24 days on a 32-bit system or until well past the heat death of the universe on a 64-bit system, which is already a denial of service vulnerability, and fixing that would avoid hitting the integer overflow in curl. And if you, yourself, as the legitimate user of curl or libcurl provide a ludicrous value for the timeout, it cannot be said to be a vulnerability, because you are not an attacker of your own system.
How do we know if this claim is correct? You're right in a technical sense that you can't really know - it's always theoretically possible that someone wrote a web form that takes input from untrusted users and one of the fields is the retry delay value. But it's also equally theoretically possible that someone wrote a web form that takes input from untrusted users and splices it into a bash command line without escaping. And, in fact, it's not just theoretically possible, it's quite common. But nobody would say this means there's a vulnerability in bash, would they?
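To make the arithmetic concrete, here is a minimal sketch of the overflow pattern being discussed (invented variable names, not curl's actual code):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* A "ludicrous" but accepted retry value, in seconds. */
    long retry_sec = LONG_MAX / 500;

    /* Seconds to milliseconds: this multiplication overflows a signed long,
     * which is undefined behaviour.  With -fwrapv it merely wraps around;
     * without it the compiler is allowed to assume it never happens. */
    long retry_ms = retry_sec * 1000;

    printf("%ld\n", retry_ms);
    return 0;
}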
The process of determining what is or isn't a vulnerability has to come down to human judgment. You could plausibly argue, for instance, that Shellshock wasn't a vulnerability because attackers shouldn't be able to pass arbitrary environment variables into bash. But the real-world deployment of CGI meant that there was a large installed base of users where attackers were in fact able to set environment variables to arbitrary values. Moreover, it meant that humanity believed that it was a reasonable design to do that, and the flaw was not with CGI for setting those variables.
And it's not sufficient to just lean on documented behavior. First, would you consider it an adequate fix for the vulnerability if curl made no code changes but just changed the documentation to say that the input to the timeout value must be small enough to avoid integer overflow? But also there have actually been real-world vulnerabilities, that are unquestionably vulnerabilities in human judgment, that were documented. Log4shell comes to mind: the behavior of loading JNDI plugins in log statements was absolutely intentional, and the support in JNDI for loading plugins from arbitrary servers was also absolutely intentional. But the resulting behavior was so unreasonable that the Log4j authors did not argue "there is no deviation from the documented behavior" - which they could have argued with much more certainty than a gut feeling. Or consider the KASLR bypass from the other day: it isn't material whether the kernel developers intended to publish a world-readable file with post-ASLR addresses or not, it is still a security bug either way.
There is, simply, no way to determine what is or isn't a vulnerability without the involvement of human judgment. You can make a reasonable argument that the maintainers of the software are poorly incentivized to make accurate judgments, yes. But someone has to make the judgment.
(Also - I actually fully agree with you about CVE-2023-52071. The argument that it only applies in debug builds and not release builds is reasonable as far as it goes, but in my human judgment, it is totally reasonable to run debug builds in production while debugging things, and you're right that Daniel's claim that it can only possibly cause a crash is incorrect. Because the bad code deterministically does an out-of-bounds access, it's totally reasonable for the compiler to treat the code as unreachable and thus conclude the rest of the block is also unreachable, which can change the actual output of the curl command in a security-sensitive way. The compiler can tear out the whole if statement via dead-code elimination, or it can lay out something that isn't actually valid code in the true case, since it's allowed to assume the true case never gets hit. He's quite possibly right that no compiler actually does that today; he's wrong that it's reasonable to rely on this.)
Posted Feb 23, 2024 19:54 UTC (Fri)
by adobriyan (subscriber, #30858)
[Link] (8 responses)
How is it reasonable? If compiler can prove OOB access to on-stack array, it should _refuse_ to compile and report an error.
The only semi-legitimate use case for such access is stack protector runtime test scripts (and even those should be done in assembler for 100% reliability).
> warning: array subscript 3 is above array bounds of ‘wchar_t[3]’ {aka ‘int[3]’} [-Warray-bounds=]
int f(void);
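For what it's worth, GCC can already be asked to turn that particular diagnostic into a hard error; a minimal sketch of the idea (not curl's actual code or build flags):

/* Build with:  gcc -O2 -c -Werror=array-bounds oob.c
 * The provably out-of-bounds index then stops the build instead of
 * merely producing the warning quoted above. */
#include <wchar.h>

int f(void);

int check(void)
{
    wchar_t prefix[3] = {0};
    if (f())
        return prefix[3] == L'\0';   /* index 3 is one past the end */
    return 0;
}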
Posted Feb 23, 2024 21:16 UTC (Fri)
by geofft (subscriber, #59789)
[Link] (4 responses)
Here's another example, though you might argue that it is also contrived. Suppose you have a binary format that stores numbers between 0 and 32767 in the following way: if the number is less than 128, store it in one byte, otherwise store it in two bytes big-endian and set the high bit.

inline int is_even(unsigned char *p) {
    if (p[0] & 0x80)
        return p[1] % 2 == 0;
    return p[0] % 2 == 0;
}

unsigned char FIFTEEN[] = {0x15};
if (is_even(FIFTEEN))
    printf("15 is even\n");

After inlining there's a line of code talking about FIFTEEN[1] which is out of bounds, inside an if statement, just like your example. The if statement doesn't succeed, so there's no UB, but you need to do some compile-time constant expression evaluation to conclude that, and it's pretty reasonable, I think, to have a compiler that supports inlining but does no compile-time arithmetic.
Posted Feb 24, 2024 5:32 UTC (Sat)
by adobriyan (subscriber, #30858)
[Link] (3 responses)
Posted Feb 24, 2024 21:22 UTC (Sat)
by jrtc27 (subscriber, #107748)
[Link] (2 responses)
Posted Feb 25, 2024 7:03 UTC (Sun)
by adobriyan (subscriber, #30858)
[Link] (1 responses)
Again, if compiler can 100% prove UB access it should refuse to compile.
If UB access cannot be proven then it should shut up and emit access on the grounds that maybe, just maybe, it doesn't know something.
Linus(?) once gave an example that a future, very powerful gcc 42, LTOing the whole kernel, might observe that the kernel never sets the PTE dirty bit and helpfully optimise away all reads of said bit. Which, of course, will break everything.
Posted Feb 25, 2024 8:36 UTC (Sun)
by mb (subscriber, #50428)
[Link]
A compiler cannot at the same time assume UB doesn't exist and refuse to compile if it does exist.
You have to decide on a subset of UB that you want to abort instead of assuming it doesn't exist.
We *do* have languages that have a proper language subset without UB. Just use them.
Posted Feb 23, 2024 22:58 UTC (Fri)
by khim (subscriber, #9252)
[Link] (2 responses)
Such an approach is incompatible with SPEC CPU 2006, which means none of the compiler vendors would ever release such a compiler.
Posted Feb 24, 2024 5:09 UTC (Sat)
by adobriyan (subscriber, #30858)
[Link] (1 responses)
Posted Feb 24, 2024 10:57 UTC (Sat)
by excors (subscriber, #95769)
[Link]
Assuming you mean the Fortran code in the bug report, ChatGPT 3.5 (asked to translate it into C) interprets the loop as "for (int I = 1; I <= NAT; I++) {" and the WRITEs as e.g. "if (LAST && MASWRK) fprintf(IW, "%d %s %s %f %f %f %f\n", I, ANAM[I], BNAM[I], PM, QM, PL, QL);", which seems plausible. Though it does fail to mention that the Fortran array indexing starts at 1, while C starts at 0, which is a potentially important detail when talking about bounds-checking.
(But that code seems largely irrelevant to the actual bug, which is that the benchmark illegally declares an array to have size 1 in one source file and 80M in another source file, and GCC optimised it by treating the 1 as a meaningful bound, and SPEC won't fix their buggy benchmarks because they "represent common code in use" and because any changes "might affect performance neutrality" (https://www.spec.org/cpu2006/Docs/faq.html#Run.05), so the compiler developers have to work around it.)
Posted Mar 22, 2024 18:56 UTC (Fri)
by DanilaBerezin (guest, #168271)
[Link] (4 responses)
If "good" software written in C should unilaterally and universally avoid UB then a lot of software you rely on would simply be impossible to. The family of dlopen, dlsym, dlclose functions, for example, relies on casting a function pointer to a `void *`. This is undefined behavior, but nonetheless it is impossible to implement that functionality without eventually performing such a cast. So what is the solution? POSIX defines the behavior of casting a function pointer to a "void *" where the C standard still leaves it undefined. As another example, the Linux kernel is littered with undefined behavior. There are countless violations of the strict aliasing rule when it does a performant checksum by casting some arbitrary pointer to a "u32 *" or when it implements some sort of memory re-use (e.g. pooling, memory allocators etc.). Ironically, it also is littered with countless examples of signed overflow, because they serve an intended purpose see: https://lwn.net/Articles/959189/. The Linux kernel avoids the UB in these cases by supplying the options `-fno-strict-aliasing` and `fno-strict-overflow` to GCC, which tell it to define the behavior in the case that violations of strict aliasing occur or signed overflow occur respectively.
Now, I don't know how Curl is compiled (I don't have the time to build it from source and analyze the makefile, but feel free to try and prove me wrong), but if it is compiled with those options, signed overflow is simply and deterministically not a problem. In the worst case, it presents itself as a logic error where an integer will unintentionally wrap around. Now the case for the out-of-bounds access is certainly tougher to justify. It is impossible to define behavior in the case of an out-of-bounds access since that behavior is undefined on the architectural level. It's certainly a bug and one that should be fixed, but at the same time, I can also see why Daniel would be reluctant to call it a security issue. It *may* be a security issue, but then again so can any other line of code. I don't think it makes sense marking things as security issues when there isn't a proven way to exploit it.
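For reference, the classic dlsym pattern looks roughly like this (a sketch assuming a glibc system, with error handling mostly omitted; link with -ldl on older glibc). The cast of the void * returned by dlsym to a function pointer is exactly the conversion POSIX defines but ISO C does not:

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (!handle)
        return 1;

    /* POSIX guarantees this object-pointer-to-function-pointer cast works;
     * plain ISO C leaves it undefined. */
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cosine)
        printf("%f\n", cosine(0.0));

    dlclose(handle);
    return 0;
}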
Posted Mar 23, 2024 0:57 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
Casting function pointers to "void*" or vice versa is not a UB. It's only UB if the pointer types have different alignment requirements.
Posted Mar 23, 2024 1:04 UTC (Sat)
by DanilaBerezin (guest, #168271)
[Link] (1 responses)
This just shows how abstract the C memory model is. A lot of things that would otherwise not make any sense to make UB, due to the dominance of modern PC systems, are made UB solely for the purpose of being able to support a wide variety of obscure architectures.
Posted Mar 23, 2024 13:35 UTC (Sat)
by kleptog (subscriber, #1183)
[Link]
The large memory models in MS-DOS did something similar. The code segment and the data segment did not overlap. You needed far pointers, which included the segment, to be able to pass references around.
For PCs with a lot of memory it's feasible to have a unified address space, but on micro-controllers where every byte counts it's different. It's perfectly fair for the C standard to consider it undefined behaviour, while POSIX explicitly defines it.
Posted Mar 25, 2024 9:17 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
It's UB in Standard C; POSIX defines it the way you state, but Standard C intends to support Harvard Architecture systems where void * is not large enough to store a function pointer.
This relates to one of the less annoying misconceptions about UB; while the ISO Standard may leave something as UB, or explicitly state that it's UB, it is perfectly permissible for downstream standards and implementations to define something that's UB in Standard C. So, for example, instead of getting Standard C to fix the shifting behaviour, we could (at least in theory) have made it a POSIX obligation on both C and C++ compilers, and required compilers to either drop POSIX support or do the sensible shift behaviour defined in C++20.
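To make the shift case mentioned earlier in the thread concrete, a minimal sketch (hypothetical function, covering only the shift-count case):

unsigned int shift_demo(unsigned int x, unsigned int n)
{
    /* If n >= the width of unsigned int (e.g. 33 for a 32-bit int), the
     * C standard makes this undefined behaviour.  Real CPUs disagree about
     * what to do: x86 masks the count to 5 bits, others simply produce 0,
     * which is why "we code for the hardware" gives no single answer here. */
    return x << n;
}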
Posted Feb 23, 2024 19:56 UTC (Fri)
by ewen (subscriber, #4772)
[Link] (2 responses)
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-5...
show “REJECTED” status for me now (and minimal details on what the original claimed vulnerability was in the change history). It seems to have changed earlier today.
So it seems curl becoming a CNA did give them some additional leverage to reject “CVE” reports they thought were bogus.
But the curl project and others are right that the CVE system is pretty broken. And it was basically designed for the opposite of today's situation, i.e. designed to track good-faith reports against vendors unwilling to fix real problems. Sadly there's a lot of momentum behind the CVE system now, so I don't see it being fixed any time soon. And all vendors just being able to arbitrarily reject any report without documentation is itself vulnerable to abuse too.
Ewen
Posted Feb 23, 2024 21:34 UTC (Fri)
by bagder (guest, #38414)
[Link]
Posted Feb 26, 2024 14:19 UTC (Mon)
by hkario (subscriber, #94864)
[Link]
Those vendors haven't gone away; Microsoft is still operating very much in this way, and I'm sure there are plenty of others.
Undefined Behaviour as usual
Therefore, signed overflow is UB in curl.
For a given set of compiler, libs, curl, etc.. maybe one can assess that it is a harmless bug by looking at the generated asm code.
But if I use a different compiler version, the assessment is completely invalid. It could very well have security implications in my setup.
Undefined Behaviour as usual
Has nothing to do with signed-overflow being UB.
Undefined Behaviour as usual
It doesn't need to have a CVE entry to be remediated quickly.
Undefined Behaviour as usual
I could not imagine a single one.
Undefined Behaviour as usual
That's an almost infinite amount of work.
Treating UB bugs as security bugs and fixing them right away is easy compared to that.
Undefined Behaviour as usual
Because re-assessing known UB bugs is completely unnecessary, if they are understood as ticking time bombs that can trigger undefined behavior (including security issues) at any point in time if the environment changes ever so slightly.
Rejecting a CVE is the opposite of that. It is not a responsible reaction to an UB bug. It is hiding your head in the sand.
Documenting and fixing the UB *solves* the problem once and for all. No re-assessment needed.
Undefined Behaviour as usual
Upon system change: Having to check two things or having to check only one out of the two. What is better?
Data race UB is not defined in the implementation anywhere. It cannot be defined. It is non-deterministic. That's just one example.
It is not.
Undefined Behaviour as usual
There is no way to "really make it's behaviour undefined", except by compiling to code where the architecture makes the behaviour undefined. Sane computer architects don't, and even insane architects don't do it for anything that a compiler is likely to output for signed addition. When reading past the end of an array, the code produced by the compiler usually just reads past the end of the array; it takes special effort by the compiler writer to do something different; but yes, compiler writers have taken this special effort, with the result being that there are cases of "optimizing" a loop with a read past the end of an array into an infinite loop.
Undefined Behaviour as usual
If you're recompiling with the same toolchain+options I'd expect the resulting binaries to behave damn near identically from one build to the next [1]. (Indeed, as software supply-chain attacks become more widespread, 100% binary reproducibility is a goal many folks are working towards)
Undefined Behaviour as usual
> This is not universally true once you're in supervisor mode (since you can do things on some platforms like change power rails to be out-of-spec [...]
Undefined Behaviour as usual
OTOH, while the behavior of that off-by-one read is truly unknowable from C's perspective, if the developer "corrects" the bug by incrementing the array size by one (therefore making it too large for the hw resource) the program is now "correct" from C's perspective, but will trigger the same nasty consequences on the actual hardware.
Undefined Behaviour as usual
The gcc option -fno-strict-overflow would imply -fwrapv (but I have not looked in the sources whether curl is compiled with that).
Undefined Behaviour as usual
Compilers exploiting UB happens all the time. It is the base of all optimizations.
Not really, Java and C# do not have undefined behavior, and yet there are optimizing compilers.
Even for C, it's a rather extreme position to say that register allocation (probably among the top three optimizations to implement in a compiler for current architectures) depends on undefined behavior. For others like constant propagation it's a bit of a stretch, too.
> Not really, Java and C# do not have undefined behavior, and yet there are optimizing compilers.
Undefined Behaviour as usual
This may break your program if it does register allocation in a different way. register variables could only use si and di (and they weren't used for anything else) and thus it was possible to write code that would [ab]use that in its signal handler.
Undefined Behaviour as usual
Compilers exploiting UB happens all the time. It is the base of most optimizations.
Undefined Behaviour as usual
You can't do I/O.
Undefined Behaviour as usual
* "Verified" code (where the compiler will let you write whatever you want, but you also write a machine-verified proof that no UB will actually occur).
Undefined Behaviour as usual
Compilers exploiting UB happens all the time. It is the base of all optimizations
Nonsense! Compilers don't need to assume that the program does not exercise undefined behaviour in order to, e.g., optimize 1+1 into 2. Actually, assuming that the most troublesome undefined behaviours (e.g., signed overflow or whatever is behind strict aliasing) do not happen has little performance impact.
I think the traditional description of undefined behavior as "demons fly out of your nose" has done a disservice to understanding it. (After all, if the generated code had access to either demons or your nose, there would be a security vulnerability somewhere in granting that power to your userspace account in the first place. :) )
Undefined Behaviour as usual
WCHAR prefix[3] = {0};
if (something) {
DEBUGASSERT(prefix[3] == L'\0');
more stuff;
}
While it's pretty obvious to a human reader in context that this is just a typo, it's probably harder for a compiler to distinguish this from, say,
#ifdef LINUX
#define NUMBER_OF_THINGS 10
#else
#define NUMBER_OF_THINGS 5
#endif
thing_t things[NUMBER_OF_THINGS] = {0};
if (some_function_in_another_file_that_only_succeeds_on_linux()) {
thing[7] = something;
more stuff;
}
which, as long as some_function_in_another_file_that_only_succeeds_on_linux()
does what it says, never actually invokes undefined behavior. The compiler can notice that the assignment is undefined in the non-Linux case, and instead of doing something villainous, it can do something useful, i.e., assume that the assignment statement cannot be reached and dead-code-eliminate it and everything after it until the closing brace - and then dead-code eliminate the function call because there's nothing left.
Undefined Behaviour as usual
> Signed overflow being undefined behavior is a little bit silly because a well-intentioned optimizing compiler will only use that optimization for one purpose: to emit whatever the underlying CPU's most efficient arithmetic instructions are to handle the non-overflowing case.
Undefined Behaviour as usual
-O2 mode. Take that away and we are back to square one.
Undefined Behaviour as usual
https://en.wikipedia.org/wiki/Hardware_random_number_gene...
Undefined Behaviour as usual
Unless the hardware access is correctly marked as unsafe volatile memory access. Which of course is necessary for every access to the real world (a.k.a. hardware) outside of the language's virtual machine model.
Undefined Behaviour as usual
[2] Strictly speaking it should also be able to return "Failure"
[3] Which is itself a predictable physical property of the materials, and is taken into account in the TRNG design.
Undefined Behaviour as usual
Laplace's demon is wrong.
The thinking that we just don't know all the rules is wrong.
Undefined Behaviour as usual
> And actually, compilers could agree on this without even revisiting the standard.
Undefined Behaviour as usual
> As a reminder, the language rules are not public, you have to pay to get them, or find one of the latest drafts. *That* is the biggest issue.
Undefined Behaviour as usual
int main(int argc, char *argv[])
{
wchar_t prefix[3] = {0};
if (f()) {
assert(prefix[3] == L'\0');
}
return EXIT_SUCCESS;
}
I gave a somewhat contrived example in this other comment. It is entirely possible that the OOB-ness of the access is conditional in some way, such as via preprocessor macros or code generation from some template, and the programmer knows that f() is not going to actually return true in the case where the access would be out of bounds.
Undefined Behaviour as usual
Which kind of defeats the purpose of UB then. It's defined behavior then.
> How is it reasonable? If compiler can prove OOB access to on-stack array, it should _refuse_ to compile and report an error.
Stenberg: DISPUTED, not REJECTED
https://nvd.nist.gov/vuln/detail/CVE-2023-52071