Development quote of the week

Posted Dec 6, 2022 14:03 UTC (Tue) by khim (subscriber, #9252)
In reply to: Development quote of the week by farnz
Parent article: Development quote of the week

Only unsigned types are defined to work like you describe. Signed types work differently.

Development quote of the week

Posted Dec 6, 2022 14:11 UTC (Tue) by farnz (subscriber, #17727) [Link] (7 responses)

I'm sorry, you have me completely confused. I described a system where addition and multiplication are only defined for a subset of pairs of inputs, which is pretty much signed types in C - if I have a 32 bit integer type, then 2**17 * 2 ** 17 is UB in C and in Pure Mathematics for a type that's not a field, but is an abelian group, whereas 4 * 3 is defined, and is the same as 3 * 4 (the property that makes this an abelian group).

Unsigned types are fields, I thought? The gap between what I described and a field is that in a field, all input values for addition and multiplication result in an element of the set, whereas in what I described, some input values for addition and multiplication result in undefined behaviour.

Development quote of the week

Posted Dec 6, 2022 14:49 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

> I described a system where addition and multiplication are only defined for a subset of pairs of inputs

No. I haven't read that in your description and, worse, group is defined like this:

A group is a set 𝐆 together with a binary operation on 𝐆, here denoted "·", that combines any two elements 𝓪 and 𝓫 to form an element of 𝐆, denoted 𝓪·𝓫, such that the following three requirements, known as group axioms, are satisfied: associativity, identity element, inverse element.

You have started talking about group axioms without checking whether result is defined for all elements of supposed group. Unsigned numbers are a group. Signed numbers are not group.

> Unsigned types are fields, I thought?

No, they don't form a field. uint32_t(65536) doesn't have an inverse element (that is: you can not find any 𝔁 to make uint32_t(𝔁)*uint32_t(65536)=uint32_t(1)), it's a ring. Also a well defined mathematical object, but not a field.

> The gap between what I described and a field is that in a field, all input values for addition and multiplication result in an element of the set, whereas in what I described, some input values for addition and multiplication result in undefined behaviour.

Semigroup, group, abelian group, ring… they all call for operations to be defined for all elements.

But field definition includes UB: reciprocal not defined for zero (and only for zero, that's why unsigned numbers in C are not a field).

That, somehow, never changes the opitions of O_PONIES lovers: they, usually, continue to support the notion that it doesn't mean anything (or, worse, start claiming that the fact that it's UB to divide by zero even for unsigned numbers is also a problem… even if rarely bites anyone because most people test for zero, but not for overflows).

> whereas in what I described, some input values for addition and multiplication result in undefined behaviour

If something doesn't follow the definition then you can't say that it's “𝓧, but not 𝓧”. Well… technically you can say that, but it's a tiny bit pointless: math doesn't work like that, you couldn't use theorems proven for 𝓧 with “𝓧, but not 𝓧”.

You may say that while unsigned numbers properly model ℤ₂ᴺ ring, while signed numbers correctly models ℤ if (and only if) you ensure that operations on them don't overflow, but they don't form semigroup, group, abelian group, or ring.

I guess the idea was that people may need ℤ₂ᴺ, but since ℤ is not feasible to provide in low-level lamguage like C (even python have trouble with ℤ) thus asking developer to use them carefully and avoid them when result is not guaranteed wouldn't too problematic. After all they have to remember not to divide by zero in math, why couldn't be they taught not to overflow in C?

As you can see some people don't like that idea (to put it midly).

Development quote of the week

Posted Dec 6, 2022 14:56 UTC (Tue) by farnz (subscriber, #17727) [Link]

I made mistakes in the terminology (I'd forgotten the details of groups) - but I stand by my claim that it's mathematically reasonable to have a finite set, F, with binary operations + and * defined only for a subset of pairs of elements of F, and that maths does not always claim that because I have a finite set F, with a binary operation +, it must follow all the axioms of arithmetic.

Rather, C's signed integers are a finite set that behaves like the integers in some respects but not others, and results that hold for mathematical integers do not necessarily hold for C signed integers, but that this doesn't mean that C signed integers are not a mathematically acceptable set - so reaching for mathematics and saying "but maths says!" just highlights that you're not particularly good at mathematics, and expect C's signed integers to behave like a set you're familiar with from school, rather than like the sort of sets that you might start discussing in a bachelor's degree.

Development quote of the week

Posted Dec 6, 2022 15:54 UTC (Tue) by anselm (subscriber, #2796) [Link] (1 responses)

The gap between what I described and a field is that in a field, all input values for addition and multiplication result in an element of the set, whereas in what I described, some input values for addition and multiplication result in undefined behaviour.

ISTR that even in a group, the result of the operation on two members of the group must always be a member of the group. “Undefined behaviour” where applying the operation on two members of the group doesn't yield a result within the group isn't allowed. Nor does it work to exclude, e.g., 2**17 from the set on the grounds that 2**17 * 2**17 isn't in the set; you need it in the set so that (non-overflowing) operations like 2 * 2 ** 16 can have a result. (Fields get that property from the fact that they're basically built on groups.)

Development quote of the week

Posted Dec 6, 2022 16:37 UTC (Tue) by farnz (subscriber, #17727) [Link]

Yes - that's my error. I stand by the claim that C signed integers are a perfectly reasonable mathematical object, and that their behaviour is happily described by pure mathematics.

But I have to retract my claim that they're a group - they're still a finite set, with several operations that give them superficial similarity to the set of integers, but they're not a group.

Development quote of the week

Posted Dec 6, 2022 21:09 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> Unsigned types are fields, I thought?

They are rings, not fields (no multiplicative inverse, since we're not looking at floats).

Signed integers are not even rings, because at least one element doesn't have an inverse (e.g. there's -32768 but not 32768).

Development quote of the week

Posted Dec 6, 2022 22:33 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

That's not how rings behave. -32768 is inverse of itself. In fact it's exact same ring which you have with unsigned numbers (most CPUs these days don't even bother to give you separate operations for signed or unsigned numbers, they only distinguish signed and unsigned numbers when you compare them, but ring only deals with addition, multiplication, and, like any ring, equality… ordering is not part of it).

That maybe one reason for why C standard defines signed operations like it does: if you want ring — you already have one, unsigned numbers, why would you need another, identical one?

Development quote of the week

Posted Dec 7, 2022 17:31 UTC (Wed) by kleptog (subscriber, #1183) [Link]

Unsigned integers in C form a well defined ring: Z_2^16 or whatever your bit size is, the integers modulo a power of two. This gives you commutative, associative and distributive properties. The operations don't preserve ordering though.

Signed integers on computers don't form a ring, not in a useful sense anyway. For example 10*19=-66 on an 8-bit machine. You can argue that if you look at the bit patterns it's the same as unsigned integers but that's just relabelling. What I think the C designers wanted was to consider the signed integers a subset of the natural numbers. Because that gives you a meaningful way to deal with ordered comparisons.

For example, the implication a<b => 2a<2b is *not true* in the unsigned integers. This is however super annoying for a compiler optimiser. The standard compiler transformation in C where you change loop counters to count by the width of the object you're looping over is strictly not allowed unless you can prove the loop bound won't overflow. The compiler often can't prove it, so that would leave a simple optimisation on the floor.

Unless you assume overflow is UB.

"a+b > a if b is positive" is false in the unsigned integers. But a very useful thing to be able to assume when optimising code. The classic example of code that changes with -fwrapv:

int f(int i) {
return i+1 > i;
}

Is optimised to true without it.

Pointer arithmetic is another case, you don't want to treat this as a full on ring either, because pointer comparison is so useful (can we assume a[b] is always later in memory than a[c] if b>c?). So the compiler simply assumes overflow cannot happen.

I don't think the compiler writers are out to get you. But if you want signed integers to act 100% like CPU registers, you do need to set the compiler flag.