Better handling of integer wraparound in the kernel

Posted Jan 27, 2024 5:45 UTC (Sat) by adobriyan (subscriber, #30858)
In reply to: Better handling of integer wraparound in the kernel by tialaramex
Parent article: Better handling of integer wraparound in the kernel

>Wrapping<i64>

Wrapping signed integers is a funny thing by itself.

I'd argue that Algebraic Types Party doesn't do _that_ better because overflowing/saturating is property of an operation not a type.

In a perfect world Rust would invent new syntax for "unsafe" additions/multiplications. Cryptographers would use it and everyone else would use regular + which traps.

Better handling of integer wraparound in the kernel

Posted Jan 27, 2024 8:30 UTC (Sat) by NYKevin (subscriber, #129325) [Link] (11 responses)

> In a perfect world Rust would invent new syntax for "unsafe" additions/multiplications. Cryptographers would use it and everyone else would use regular + which traps.

Rust already did that. They're simple method calls on the primitive integer types (e.g. wrapping_add, saturating_sub, checked_div, etc.). Wrapping<T> etc. types are for cases where you want to override what + does (because writing wrapping_add all the time is obnoxious). If you want to manually spell out the operation every time, you can do that.

Better handling of integer wraparound in the kernel

Posted Jan 27, 2024 9:10 UTC (Sat) by adobriyan (subscriber, #30858) [Link] (10 responses)

Yes, they added wrapping_add() and friends, it's lengthy and wordy.

I'm thinking more about "a [+] b" for unsafe addition , etc

Better handling of integer wraparound in the kernel

Posted Jan 27, 2024 15:36 UTC (Sat) by Paf (subscriber, #91811) [Link]

I dunno, it seems like these should be rare, which makes it ok if they’re lengthy and good that the existence of that code construct is well advertised.

Better handling of integer wraparound in the kernel

Posted Jan 27, 2024 22:50 UTC (Sat) by NYKevin (subscriber, #129325) [Link] (8 responses)

The problem with overflow modes is that there are so many to choose from! Rust provides at least the following:

* carrying_add(): A helper method for implementing bignum addition. There's also borrowing_sub() for subtraction. Probably not useful for general-purpose addition.
* checked_add(): Returns an Option<T> which is None if overflow would happen. Nicely compatible with the question mark operator, match, if let, etc. constructs.
* overflowing_add(): Returns the result of wrapping_add() as well as a bool indicating whether we overflowed.
* saturating_add(): Returns (the Rust equivalent of) INT_MAX when we overflow.
* unchecked_add(): Overflow is undefined behavior. Can only be called in an unsafe block (or unsafe function).
* wrapping_add(): Wraps around using two's complement (signed) or modular arithmetic (unsigned).

You could, hypothetically, have a menagerie of different symbols for all of these different use cases, but it's going to look like Perl or APL very quickly. You could, I suppose, pick one of them as the "best" overflow mode and just give that a symbol, but I guarantee you it won't be unchecked_add() as your comment seems to suggest (they are never adding a binary operator that can only be used in unsafe blocks). The more flexible option is picking which mode you "usually" want to use, and using the Wrapping<T> etc. types for that.

One thing that does bother me is the fact that overflowing_add() returns a tuple instead of some kind of ADT wrapper. That would force you to explicitly destructure it and handle both the wrapping and non-wrapping cases. With a tuple, you can forget to check the bool in some code paths. It's not really a disaster because you probably won't have overly elaborate code surrounding the callsite, but it's still an opportunity for things to go wrong.

Better handling of integer wraparound in the kernel

Posted Jan 28, 2024 8:13 UTC (Sun) by epa (subscriber, #39769) [Link] (6 responses)

In my ideal language there would also be cannot_overflow_add() which checks at compile time overflow is impossible. It would need bounded integer types like those in Ada. You can achieve something like this with template metaprogramming in C++ but I think it important enough to be part of the language.

Better handling of integer wraparound in the kernel

Posted Jan 28, 2024 10:35 UTC (Sun) by khim (subscriber, #9252) [Link] (5 responses)

> It would need bounded integer types like those in Ada.

Useless, pointless and not at all helpful in real code.

> In my ideal language there would also be cannot_overflow_add() which checks at compile time overflow is impossible.

That's what Wuffs does.

Turning that into general-purpose language with pervasive dependent typing is surprisingly hard.

Maybe someone would manage to do that 10 or 20 years down the road.

> You can achieve something like this with template metaprogramming in C++ but I think it important enough to be part of the language.

No, you can not. The only thing you may achieve with templates are some statically-defined checks and these are not much useful for real programs with dynamically allocated externally-specified objects. And kernel is full of these.

You may also reject C++ entirely and built entirely separate language where every type would be your own template type and everything is done using these, but at this point you are firmly in the “pseudo-language written in C++ templates” and it's better to just create a separate language than pretending you are still dealing with C++.

Better handling of integer wraparound in the kernel

Posted Jan 28, 2024 11:26 UTC (Sun) by epa (subscriber, #39769) [Link] (4 responses)

I’ve written a toy C++ template library that provides a bounded<n,m> type (guaranteeing n <= x < m) with the basic arithmetic operations yielding the bounds you’d expect. Then explicit coercion with a run time check for creating a bounded type from an ordinary int, or for narrowing the bounds.

Useful in practice? Perhaps not much. My concern was more about eliminating logic errors (and making explicit the places overflow can occur), rather than memory safety. The common style nowadays is to avoid arbitrary limits. Users would not be happy if grep had a maximum line length of 2000 characters. But in the real world you can usually put a bound on things: no aeroplane carries more than ten thousand passengers and no person is aged over two hundred. Ada does not have bounded integers purely out of design by committee. In the end your integer types will have a bound, like it or not, and forcing the programmer to consider it explicitly can be helpful.

So yes, I do agree with your comment. You can’t realistically retrofit bounded integers into C++ given the amount of existing code; and with everything dynamically allocated (rather than static fixed size buffers) they are not that useful for memory safety. Carrying a full proof of allowed bounds alongside each variable requires more cleverness than compilers have (although even a system that required numerous “trust me” annotations might have its uses). I was blue-skying about my ideal language rather than proposing a practical change to an existing language or codebase.

Better handling of integer wraparound in the kernel

Posted Jan 28, 2024 12:01 UTC (Sun) by khim (subscriber, #9252) [Link] (3 responses)

> But in the real world you can usually put a bound on things

No, you couldn't. Even if limit sounds sane today it would be, most definitely, be too small tomorrow.

Maybe embedded may work with precomputed limits, but every time I saw TeX there was always hugeTeX for people who need more or something.

Beyond certain program complexity using arbitrary limits is not useful and below that limit using them is not too helpful.

That's why I said such types are useless and pointless: where they are useful (small programs, limited scale) they are pointless, where they could be beneficial (large programs, complicated logic, data comes from outside) they are not useful.

> But in the real world you can usually put a bound on things: no aeroplane carries more than ten thousand passengers and no person is aged over two hundred.

And then someone tries to use you program for a cruise ship or tries to enter data about three hundreds years Juridical person and everything explodes.

Thanks, but no, thanks.

> Ada does not have bounded integers purely out of design by committee.

It does have them because ALGOL and Pascal have them. But they are not much useful in any of these languages.

Perhaps they were useful when they were invented and when programs were written once, run once and then discarded. But today we are not writing programs in such fashion and arbitrary static limits cause more harm than good.

> I was blue-skying about my ideal language rather than proposing a practical change to an existing language or codebase.

And I was thinking about practical applicability instead. I'm pretty sure with enough automation proof-carrying code may be pretty usable, but for now are stuck with Rust and that was right choice: lifetime tracking causes significantly more grief than out-of-bounds access.

Perhaps after Rust would become the “new C” we may start thinking about dependent types. They need to have lifetime tracking as their basis, anyway, or else they wouldn't be much useful in practice, thus Rust made the right choice.

Better handling of integer wraparound in the kernel

Posted Jan 28, 2024 12:15 UTC (Sun) by epa (subscriber, #39769) [Link]

It's the "everything explodes" scenario I am trying to avoid. By having the bounds explicit and tracked at every use, the program will reject the value of twenty thousand passengers when that value is first entered. TeX is an example of an old program written in a fixed-allocation style when dynamic would nowadays be considered better. Nobody is doing safety-critical embedded systems running typesetting with hard real-time constraints, or hardening their TeX macros against an attacker. But then for embedded applications, as you mention, you may need to set an upper bound on memory usage or require that no dynamic allocation and freeing happens.

Airline passenger management is a kind of middle ground between these two. It's not in itself safety-critical and doesn't need to run on tiny systems. But then, it does need to interact with parts of the real world that have fixed limits. Perhaps the passenger number is at most three digits long, or the dot-matrix printer only has 80 columns, or there is a database system with fixed storage size. In those cases I would prefer to be explicit about the bounds of a number rather than write code that purports to work with any number but in fact does have an upper bound somewhere, just nobody knows quite what it is.

Better handling of integer wraparound in the kernel

Posted Jan 28, 2024 18:00 UTC (Sun) by Wol (subscriber, #4433) [Link] (1 responses)

> > But in the real world you can usually put a bound on things

> No, you couldn't. Even if limit sounds sane today it would be, most definitely, be too small tomorrow.

Stop being an arrogant idiot!

Okay, I can't think of an example, and I guess you haven't even bothered to look, but I'm sure other people will be able to find examples where a certain positive big number indicates an error. Certainly I'm sure there are examples where the mere EXISTENCE of a negative value is an error (in other words 0 is an absolute lower bound).

(Actually, as a chemist, I've just thought of a whole bunch of examples. s is either 1 or 2 (can be empty aka 0). Likewise, p is 1 to 6. d is 1 to 10. f is 1 to 14. ANY OTHER VALUE IS AN ERROR.) To have the compiler make sure I can't screw up would be a wise choice.

And please, WHY ON EARTH would you want to store an entry about a company into a genealogical database? While it's possible it'll change (extremely unlikely), any value for age outside 0 - 126 is pretty much impossible. In fact, surely you DO want to limit that value, precisely in order to trigger an error if someone is stupid enough to try to enter a judicial person into the database!

Cheers,
Wol

Enough

Posted Jan 28, 2024 22:23 UTC (Sun) by corbet (editor, #1) [Link]

Ok let's stop this here please. Seriously.

Better handling of integer wraparound in the kernel

Posted Jan 28, 2024 10:53 UTC (Sun) by tialaramex (subscriber, #21167) [Link]

> saturating_add(): Returns (the Rust equivalent of) INT_MAX when we overflow.

(I'm sure you know this but) note that saturation occurs at _both_ ends of the range of an integer type, if you (-100i8).saturating_add(-100) that's the 8 bit signed integer -128 aka i8::MIN not i8::MAX