Better handling of integer wraparound in the kernel
Posted Jan 27, 2024 2:31 UTC (Sat)
by tialaramex (subscriber, #21167)
Parent article: Better handling of integer wraparound in the kernel
Posted Jan 27, 2024 4:00 UTC (Sat)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Nowadays we have the "abstract machine" and such affordances, so it's no longer seen as surprising that + might compile into something other than an add instruction. So you could also say this is K&R's fault for designing C too close to the metal. But hey, hindsight is 20/20.
Posted Jan 27, 2024 5:49 UTC (Sat)
by willy (subscriber, #9762)
[Link]
So I'd blame the CDC-6600 although you can blame the PDP-1 or LINC if you're determined to make DEC the bad guy.
Posted Jan 27, 2024 5:45 UTC (Sat)
by adobriyan (subscriber, #30858)
[Link] (12 responses)
Wrapping signed integers is a funny thing by itself.
I'd argue that the Algebraic Types Party doesn't do _that_ much better, because overflowing/saturating is a property of an operation, not of a type.
In a perfect world Rust would invent new syntax for "unsafe" additions/multiplications. Cryptographers would use it and everyone else would use regular + which traps.
Posted Jan 27, 2024 8:30 UTC (Sat)
by NYKevin (subscriber, #129325)
[Link] (11 responses)
Rust already did that. They're simple method calls on the primitive integer types (e.g. wrapping_add, saturating_sub, checked_div, etc.). Wrapping<T> etc. types are for cases where you want to override what + does (because writing wrapping_add all the time is obnoxious). If you want to manually spell out the operation every time, you can do that.
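For concreteness, a small sketch of both styles (standard-library methods on the primitive integer types, plus the Wrapping<T> wrapper from std::num; all values are arbitrary):

use std::num::Wrapping;

fn main() {
    // Explicit per-operation behavior via method calls:
    assert_eq!(i32::MAX.wrapping_add(1), i32::MIN); // wraps around
    assert_eq!(0u8.saturating_sub(10), 0);          // clamps at the boundary
    assert_eq!(10i32.checked_div(0), None);         // Option instead of a panic

    // Or override what + means by picking a wrapper type once:
    let x = Wrapping(u8::MAX);
    let y = x + Wrapping(1); // plain + now wraps
    assert_eq!(y.0, 0);
}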
Posted Jan 27, 2024 9:10 UTC (Sat)
by adobriyan (subscriber, #30858)
[Link] (10 responses)
I'm thinking more about "a [+] b" for unsafe addition, etc.
Posted Jan 27, 2024 15:36 UTC (Sat)
by Paf (subscriber, #91811)
[Link]
Posted Jan 27, 2024 22:50 UTC (Sat)
by NYKevin (subscriber, #129325)
[Link] (8 responses)
There is a whole family of addition methods, one per overflow policy:
* carrying_add(): A helper method for implementing bignum addition. There's also borrowing_sub() for subtraction. Probably not useful for general-purpose addition.
* checked_add(): Returns an Option<T> which is None if overflow would happen. Nicely compatible with the question mark operator, match, if let, etc. constructs.
* overflowing_add(): Returns the result of wrapping_add() as well as a bool indicating whether we overflowed.
* saturating_add(): Returns (the Rust equivalent of) INT_MAX when we overflow.
* unchecked_add(): Overflow is undefined behavior. Can only be called in an unsafe block (or unsafe function).
* wrapping_add(): Wraps around using two's complement (signed) or modular arithmetic (unsigned).
You could, hypothetically, have a menagerie of different symbols for all of these different use cases, but it's going to look like Perl or APL very quickly. You could, I suppose, pick one of them as the "best" overflow mode and just give that a symbol, but I guarantee you it won't be unchecked_add() as your comment seems to suggest (they are never adding a binary operator that can only be used in unsafe blocks). The more flexible option is picking which mode you "usually" want to use, and using the Wrapping<T> etc. types for that.
One thing that does bother me is the fact that overflowing_add() returns a tuple instead of some kind of ADT wrapper. An ADT would force you to explicitly destructure it and handle both the wrapping and non-wrapping cases; with a tuple, you can forget to check the bool in some code paths. It's not really a disaster, because you probably won't have overly elaborate code surrounding the callsite, but it's still an opportunity for things to go wrong.
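To make the differences concrete, a compilable sketch of the checked/overflowing/saturating variants, including the ignore-the-bool pitfall described above (values chosen arbitrarily):

fn main() {
    assert_eq!(i8::MAX.checked_add(1), None);    // overflow reported as None
    assert_eq!(100i8.checked_add(7), Some(107));

    // overflowing_add() returns (wrapped_result, overflowed) as a plain tuple...
    assert_eq!(i8::MAX.overflowing_add(1), (i8::MIN, true));

    // ...so nothing stops you from taking the value and ignoring the flag:
    let (value, _) = i8::MAX.overflowing_add(1); // compiles fine, bug slips through
    let _ = value;

    assert_eq!(i8::MAX.saturating_add(1), i8::MAX); // clamps instead of wrapping
}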
Posted Jan 28, 2024 8:13 UTC (Sun)
by epa (subscriber, #39769)
[Link] (6 responses)
Posted Jan 28, 2024 10:35 UTC (Sun)
by khim (subscriber, #9252)
[Link] (5 responses)
> It would need bounded integer types like those in Ada.

Useless, pointless and not at all helpful in real code. That's what Wuffs does. Turning that into a general-purpose language with pervasive dependent typing is surprisingly hard; maybe someone will manage to do it 10 or 20 years down the road.

No, you can not. The only thing you may achieve with templates is some statically-defined checks, and those are not much use for real programs with dynamically allocated, externally-specified objects. And the kernel is full of these. You may also reject C++ entirely and build an entirely separate language where every type is your own template type and everything is done using those, but at that point you are firmly in “pseudo-language written in C++ templates” territory, and it's better to just create a separate language than to pretend you are still dealing with C++.
Posted Jan 28, 2024 11:26 UTC (Sun)
by epa (subscriber, #39769)
[Link] (4 responses)
Useful in practice? Perhaps not much. My concern was more about eliminating logic errors (and making explicit the places where overflow can occur) rather than memory safety. The common style nowadays is to avoid arbitrary limits: users would not be happy if grep had a maximum line length of 2000 characters. But in the real world you can usually put a bound on things: no aeroplane carries more than ten thousand passengers and no person is aged over two hundred. Ada does not have bounded integers purely as a result of design-by-committee. In the end your integer types will have a bound, like it or not, and forcing the programmer to consider it explicitly can be helpful.
So yes, I do agree with your comment. You can’t realistically retrofit bounded integers into C++ given the amount of existing code; and with everything dynamically allocated (rather than static fixed size buffers) they are not that useful for memory safety. Carrying a full proof of allowed bounds alongside each variable requires more cleverness than compilers have (although even a system that required numerous “trust me” annotations might have its uses). I was blue-skying about my ideal language rather than proposing a practical change to an existing language or codebase.
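As a rough illustration of the blue-sky idea, a hypothetical BoundedU32 newtype sketched in Rust; unlike real Ada range types, which the compiler and runtime enforce throughout, this one only checks at construction time:

#[derive(Debug, Clone, Copy)]
struct BoundedU32<const MIN: u32, const MAX: u32>(u32);

impl<const MIN: u32, const MAX: u32> BoundedU32<MIN, MAX> {
    fn new(value: u32) -> Option<Self> {
        // The bound is explicit in the type; out-of-range values are
        // rejected at the only place a value can be constructed.
        if (MIN..=MAX).contains(&value) { Some(Self(value)) } else { None }
    }
    fn get(self) -> u32 { self.0 }
}

// "No aeroplane carries more than ten thousand passengers":
type PassengerCount = BoundedU32<0, 10_000>;

fn main() {
    assert!(PassengerCount::new(350).is_some());
    assert!(PassengerCount::new(20_000).is_none()); // rejected, not wrapped
}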
Posted Jan 28, 2024 12:01 UTC (Sun)
by khim (subscriber, #9252)
[Link] (3 responses)
> But in the real world you can usually put a bound on things

No, you couldn't. Even if a limit sounds sane today it would, most definitely, be too small tomorrow. Maybe embedded may work with precomputed limits, but every time I saw such limits in practice they were eventually outgrown: even with TeX there was always hugeTeX for people who need more, or something. Beyond a certain program complexity using arbitrary limits is not useful, and below that complexity using them is not too helpful. That's why I said such types are useless and pointless: where they are useful (small programs, limited scale) they are pointless; where they could be beneficial (large programs, complicated logic, data that comes from outside) they are not useful. And then someone tries to use your program for a cruise ship, or tries to enter data about a three-hundred-year-old juridical person, and everything explodes. Thanks, but no, thanks.

> Ada does not have bounded integers purely as a result of design-by-committee.

It does have them because ALGOL and Pascal have them. But they are not much use in any of these languages. Perhaps they were useful when they were invented, when programs were written once, run once and then discarded. But today we are not writing programs in that fashion, and arbitrary static limits cause more harm than good.

> I was blue-skying about my ideal language rather than proposing a practical change to an existing language or codebase.

And I was thinking about practical applicability instead. I'm pretty sure that with enough automation proof-carrying code may be pretty usable, but for now we are stuck with Rust, and that was the right choice: lifetime tracking causes significantly more grief than out-of-bounds access. Perhaps after Rust becomes the “new C” we may start thinking about dependent types. They need to have lifetime tracking as their basis anyway, or else they wouldn't be much use in practice, thus Rust made the right choice.
Posted Jan 28, 2024 12:15 UTC (Sun)
by epa (subscriber, #39769)
[Link]
Airline passenger management is a kind of middle ground between these two. It's not in itself safety-critical and doesn't need to run on tiny systems. But then, it does need to interact with parts of the real world that have fixed limits. Perhaps the passenger number is at most three digits long, or the dot-matrix printer only has 80 columns, or there is a database system with fixed storage size. In those cases I would prefer to be explicit about the bounds of a number rather than write code that purports to work with any number but in fact does have an upper bound somewhere, just nobody knows quite what it is.
Posted Jan 28, 2024 18:00 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (1 responses)
> No, you couldn't. Even if a limit sounds sane today it would, most definitely, be too small tomorrow.
Stop being an arrogant idiot!
Okay, I can't think of an example offhand, and I guess you haven't even bothered to look, but I'm sure other people will be able to find examples where a sufficiently large positive number indicates an error. Certainly I'm sure there are examples where the mere EXISTENCE of a negative value is an error (in other words, 0 is an absolute lower bound).
(Actually, as a chemist, I've just thought of a whole bunch of examples: electron counts in atomic subshells. An s subshell holds either 1 or 2 electrons (or can be empty, aka 0). Likewise, p holds 1 to 6, d holds 1 to 10, and f holds 1 to 14. ANY OTHER VALUE IS AN ERROR.) To have the compiler make sure I can't screw up would be a wise choice.
And please, WHY ON EARTH would you want to store an entry about a company in a genealogical database? While it's possible the limit will change (extremely unlikely), any value for age outside 0-126 is pretty much impossible. In fact, surely you DO want to limit that value, precisely in order to trigger an error if someone is stupid enough to try to enter a juridical person into the database!
Cheers,
Wol
Posted Jan 28, 2024 22:23 UTC (Sun)
by corbet (editor, #1)
[Link]
Enough
Ok let's stop this here please. Seriously.
Posted Jan 28, 2024 10:53 UTC (Sun)
by tialaramex (subscriber, #21167)
[Link]
(I'm sure you know this but) note that saturation occurs at _both_ ends of the range of an integer type: (-100i8).saturating_add(-100) is the 8-bit signed integer -128, aka i8::MIN, not i8::MAX.
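In compilable form (illustrative only):

fn main() {
    assert_eq!((-100i8).saturating_add(-100), i8::MIN); // clamps at -128, the low end
    assert_eq!(100i8.saturating_add(100), i8::MAX);     // and at 127 on the high end
}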
Posted Jan 27, 2024 21:14 UTC (Sat)
by donald.buczek (subscriber, #112892)
[Link] (10 responses)
But they magically have different behavior depending on whether you compile in debug or release mode, because the "debug" and "release" profiles define `overflow-checks` differently. This is surprising, too, and I'm not sure that was a good design choice.
Posted Jan 28, 2024 10:35 UTC (Sun)
by tialaramex (subscriber, #21167)
[Link] (5 responses)
Posted Feb 7, 2024 8:24 UTC (Wed)
by milesrout (subscriber, #126894)
[Link] (4 responses)
"You can only shoot yourself in the foot if you hold it wrong"
"Only incorrect programs have this problem" is the exact issue Rust is meant to prevent. What's the point of Rust if it doesn't prevent one of the most common sources of vulnerabilities?
Posted Feb 7, 2024 10:48 UTC (Wed)
by atnot (guest, #124910)
[Link]
> What's the point of Rust if it doesn't prevent one of the most common sources of vulnerabilities?
The main reason overflows lead to vulnerabilities is by breaking bounds checks. Since bounds checks are still applied by the compiler in safe code after whatever math you did, there's not really any way to turn it into a memory vulnerability, unless you're manually writing bounds checks in unsafe Rust, which is honestly pretty rare.
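A small sketch of the point (hypothetical values; the wrapped index is caught at the point of use):

fn main() {
    let data = [10u8, 20, 30, 40];

    // Suppose some buggy arithmetic wrapped around to a huge value:
    let idx = data.len().wrapping_sub(5); // 4 - 5 wraps to usize::MAX

    // Safe Rust still bounds-checks the access, so a wrapped index becomes
    // a clean panic (with []) or a None (with .get()), never an
    // out-of-bounds read:
    match data.get(idx) {
        Some(v) => println!("value: {v}"),
        None => println!("index {idx} is out of bounds, caught"),
    }
}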
Posted Feb 7, 2024 10:56 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (2 responses)
The different fully-defined behaviours are, in practice, not a common source of vulnerabilities; by default, in production you get wrapping (two's-complement wrapping if signed), while in development you get panics.
This differs from the issue in C, where overflow is a common source of vulnerabilities since it's not defined for signed integers, and thus the compiler is allowed to surprise a developer.
Posted Feb 7, 2024 11:36 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (1 responses)
That would make a huge difference.
Okay, okay, that's being optimistic, but that is the likely course of events ...
Cheers,
Wol
Posted Feb 7, 2024 12:07 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
> That would make a huge difference.

That's one component of it; the other is that if it does wrap, you're not going to be surprised by the resulting behaviour, whereas in C, you can get weirdness where overflowing signed arithmetic does unexpected things. Taking the following code as an example:

int a = 5;
int b = INT_MIN;
int c = b - a;
if (c < b) {
	printf("c less than b\n");
}
if (b < c) {
	printf("b less than c\n");
}
if (c == 0) {
	printf("c is zero\n");
}
if (b == c) {
	printf("b equals c\n");
}
In standard C semantics, because arithmetic overflow is undefined, it is legal for all four comparisons to evaluate to true, and thus for all four printfs to print. In Rust semantics, because of the wraparound rule, c will be equal to (the equivalent of) INT_MAX - 4, and thus only the b < c condition is true.
This, in turn, means that you're less likely to be surprised by the result in Rust, since it's at least internally consistent, even if it's not what you intended. And thus, if you use the result of the arithmetic operation to do something like indexing an array, instead of the compiler being able to go "well, clearly c should be zero here, so I can elide the bounds check", the compiler does the bounds check and panics for a different reason (index out of bounds). You've now got an "impossible" error, and can debug it.
Note that this relies on Rust's memory safety guarantees around indexing; if you used pointer arithmetic with c instead, you could get an out-of-bounds read, and hence have trouble. The only thing that helps here is that pointer dereferencing is unsafe, and hence if you get unexpected SIGSEGVs or similar, that's one of the first places you'll look.
Posted Jan 29, 2024 2:18 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
Are there situations where data integrity might dictate failing rather than producing an incorrect value? Of course, but those are properly handled with checked_add() and friends, not with crashing the entire service. If there is any input that can cause your service to crash, it is an outage waiting to happen.
Posted Jan 29, 2024 7:23 UTC (Mon)
by donald.buczek (subscriber, #112892)
[Link] (1 responses)
Considering your example, isn't there a concern that your internal or external user might have other requirements and would prefer no data over wrong data? Overlooking unexpected overflows might result in serious incidents like Cloudbleed [1]. As a customer, I'd rather see a temporary service disruption than having to learn that my private data has been silently exposed and spilled into search engine caches.
[1]: https://blog.cloudflare.com/quantifying-the-impact-of-clo...
Posted Jan 29, 2024 9:25 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link]
This is fine if the program is running on the user's computer. If it is not, then a crash will affect everyone whose requests are being processed by that machine, and in the case of cascade failure, fallout will be wider still. There may be situations where failing rather than producing wrong data is preferred, but it is the responsibility of the programmer to understand that requirement and use checked_add etc. instead of regular arithmetic operators. This is no different to any other functional requirement of a software system - if you write the code wrong, it will not work.
> Overlooking unexpected overflows might result in serious incidents like Cloudbleed [1].
Security is hard. I'm not going to solve the general problem of malicious inputs in an LWN comment, but in general, defensive programming is necessary (not sufficient) to deal with problems of this nature. Most languages do not provide integers that always crash on overflow, so handling overflow correctly is something that serious programmers will find themselves having to deal with on a regular basis regardless. Rust provides the necessary methods for doing this front and center, which is more than you can say for most languages (C is only standardizing the equivalent functions in C23!). If you want stronger guarantees than that, I would suggest rewriting the parser (and perhaps a small part of the parser-adjacent application logic) in wuffs.
Besides, if you allow malicious inputs to cause a crash, you make yourself far more vulnerable to denial of service attacks, which are also a security concern (albeit a less serious one).
Posted Jan 29, 2024 11:33 UTC (Mon)
by danielthompson (subscriber, #97243)
[Link]
I think here you are describing a choice of *defaults* for overflow-checks rather than a design choice.
In other words, a developer who doesn't care can leave the defaults as they are. If a developer did, for example, want their panic handler run for any overflow, they can change the defaults for the release build. Cargo allows them to change it both for their own code and, with a wildcard override, for any crates they depend on!
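For example, something along these lines in Cargo.toml should do it (a sketch based on Cargo's profile settings; adjust to taste):

# Cargo.toml
[profile.release]
overflow-checks = true      # panic on overflow even in release builds

# And, with the wildcard override, for every crate you depend on too:
[profile.release.package."*"]
overflow-checks = true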
Posted Feb 5, 2024 16:34 UTC (Mon)
by plugwash (subscriber, #29694)
[Link] (1 responses)
Posted Feb 5, 2024 17:00 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
Well, it's configurable behaviour with different defaults in debug and release mode - you can panic on overflow in release, and you can wrap on overflow in debug. Plus there are types like Wrapping which fully define it as wrapping in both modes, and functions like checked_add if you want to change behaviour on overflow.
That said, this is a potential footgun if you're unaware that overflow is potentially problematic; the reason the default is panic in debug builds is to increase the chances of you noticing that you depend on overflow behaviour, so that you can ensure it doesn't happen.
Posted Feb 13, 2024 19:54 UTC (Tue)
by DanilaBerezin (guest, #168271)
[Link] (2 responses)
Posted Feb 13, 2024 20:11 UTC (Tue)
by mb (subscriber, #50428)
[Link] (1 responses)
Posted Feb 13, 2024 22:18 UTC (Tue)
by andresfreund (subscriber, #69562)
[Link]
Or at the very least they could have provided a sane way to check if overflow occurs. Introducing that decades after making signed overflow UB is insane. A correct implementation of checking whether the widest integer type overflows is quite painful, particularly for multiplication.
Here's postgres' fallback implementation for checking if signed 64bit multiplication overflows:
/*
* Overflow can only happen if at least one value is outside the range
* sqrt(min)..sqrt(max) so check that first as the division can be quite a
* bit more expensive than the multiplication.
*
* Multiplying by 0 or 1 can't overflow of course and checking for 0
* separately avoids any risk of dividing by 0. Be careful about dividing
* INT_MIN by -1 also, note reversing the a and b to ensure we're always
* dividing it by a positive value.
*
*/
if ((a > PG_INT32_MAX || a < PG_INT32_MIN ||
     b > PG_INT32_MAX || b < PG_INT32_MIN) &&
    a != 0 && a != 1 && b != 0 && b != 1 &&
    ((a > 0 && b > 0 && a > PG_INT64_MAX / b) ||
     (a > 0 && b < 0 && b < PG_INT64_MIN / a) ||
     (a < 0 && b > 0 && a < PG_INT64_MIN / b) ||
     (a < 0 && b < 0 && a < PG_INT64_MAX / b)))
{
	*result = 0x5EED;	/* to avoid spurious warnings */
	return true;
}
*result = a * b;
return false;
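For contrast, a sketch of what the same check looks like when the language provides it directly (Rust's i64::checked_mul; the wrapper function name is mine):

fn mul_s64_overflow(a: i64, b: i64) -> Option<i64> {
    // One method call replaces the division-based dance above;
    // None signals overflow, Some carries the product.
    a.checked_mul(b)
}

fn main() {
    assert_eq!(mul_s64_overflow(3, 4), Some(12));
    assert_eq!(mul_s64_overflow(i64::MAX, 2), None);  // overflow detected
    assert_eq!(mul_s64_overflow(i64::MIN, -1), None); // the nasty edge case
}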
