Python cryptography, Rust, and Gentoo

Posted Feb 11, 2021 12:23 UTC (Thu) by vstinner (subscriber, #42675)
In reply to: Python cryptography, Rust, and Gentoo by roc
Parent article: Python cryptography, Rust, and Gentoo

CPython is made of 500K lines of C code. I can testify that it breaks at every GCC major release. Each time, we discover new instances of "undefined behavior" that were running fine previously.



Python cryptography, Rust, and Gentoo

Posted Feb 11, 2021 14:03 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (40 responses)

> Each time, we discover new instances of "undefined behavior" that were running fine previously.

Should be:

> Each time, we discover where we were using undefined behavior and just getting lucky previously.
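
To make the "getting lucky" concrete, here is a classic sketch (an illustration added here, not from the thread): a signed-overflow check that only ever worked by accident.

#include <limits.h>
#include <stdio.h>

/* Signed overflow is undefined behavior in C, so the compiler may
   assume x + 1 never wraps and fold the comparison to 0. At -O0 the
   generated code usually wraps and "works"; at -O2 it often doesn't. */
int will_overflow(int x) {
    return x + 1 < x;   /* UB when x == INT_MAX */
}

int main(void) {
    printf("%d\n", will_overflow(INT_MAX));
    return 0;
}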

Python cryptography, Rust, and Gentoo

Posted Feb 11, 2021 18:44 UTC (Thu) by rgmoore (✭ supporter ✭, #75) [Link] (39 responses)

This isn't a terribly meaningful distinction. Whether you want to blame it on C having lax standards that allow undefined behavior, or on lazy programmers allowing their programs to depend on it, it shows that C is not such a stable platform for building complex programs in practice. It's not like the CPython team are a bunch of slackers who don't know how to program. If they're running into this kind of problem, it's a problem with the system rather than with their specific group.

Python cryptography, Rust, and Gentoo

Posted Feb 11, 2021 19:15 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (38 responses)

Oh, I certainly agree with that. Maybe I got a bit too quip-y here.

I'll note that I just added clang-tidy checking to one of our code bases (takes almost 2 hours too :( ). Tons of things are ignored because we've been lax for far too long, but getting things like ASan, UBSan, clang-tidy and a whole host of other tools looking at the code is important for C and C++ code bases to keep their sanity in the unfortunately not-always-well-understood corners of the languages that stick out all over the place.
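
As a concrete illustration of what UBSan buys you (a sketch added here; the -fsanitize=undefined flag is a real GCC/Clang option, the file name is made up):

/* shift.c: build with "cc -fsanitize=undefined -g shift.c && ./a.out".
   At runtime UBSan prints something like:
   "runtime error: shift exponent 32 is too large for 32-bit type 'int'" */
#include <stdio.h>

int main(void) {
    int n = 32;                /* in a variable to defeat constant folding */
    printf("%d\n", 1 << n);    /* shifting by the full width is UB in C */
    return 0;
}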

But it's also a mistake to then turn around and blame the compiler for utilizing the freedom the language gives it, when the real issue is the developer's lack of knowledge in that area (which is why, IMO, the burden of proof is on the developer, not the linter, when masking lint detections). You either have to live with the dish C has been serving all of us for the past 40+ years, with all the rot and flavor-enhancing spices we now have available, or step up, get in the kitchen, and improve things. IMNSHO, Rust developers have been doing that.

Python cryptography, Rust, and Gentoo

Posted Feb 11, 2021 19:41 UTC (Thu) by Wol (subscriber, #4433) [Link] (37 responses)

But this is what another post on LWN pointed me at - C IS NO LONGER A LOW-LEVEL LANGUAGE.

The whole point behind "undefined" or "implementation specific" behaviour was that - where CPU behaviour varied - it would do whatever was easiest for the CPU. The logical model behind C and the behaviour of modern processors have diverged so much that there is no longer a simple equivalence between the C language and the processor's machine/assembly code. So "undefined behaviour is whatever the hardware does" no longer makes sense, but that is what it is supposed to mean!

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 11, 2021 20:44 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (30 responses)

That's fine when you're coding for a given processor. When you're coding a portable program, undefined behavior is just not acceptable (unless someone foolishly decided "whatever C does here" as part of *their* spec).

Python cryptography, Rust, and Gentoo

Posted Feb 11, 2021 21:30 UTC (Thu) by Wol (subscriber, #4433) [Link] (29 responses)

Which is why C is not a good language - which is what a lot of posters here are saying.

Personally, I find C a perfectly okay language. I just feel that C, and Unix, and all that, are a perfect example of how what matters is not being any good, but being in the right place at the right time. I cut my coding teeth on FORTRAN, and would probably still be using it if I had the opportunity.

As that article said, C is the perfect language for programming a PDP-11. It's just that modern computers behave completely differently to a PDP-11. Again, I cut my teeth on 50-series Pr1mes. Pr1me tried to re-write a large slab of the system in C, and I suspect that was (a small) part of the reason they went under (the bigger part being that microprocessors like the 6502, the 8080 etc were beginning to eat the minicomputers' lunch). And since the 50-series had a strongly segmented architecture, it just didn't map onto the microprocessors' way of working.

Someone needs to do a "C", and design a new low-level language for programming x64.

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 11, 2021 21:38 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> Someone needs to do a "C", and design a new low-level language for programming x64.

Just as aarch64 enters the stage in a meaningful way? Seems apt ;) .

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 2:18 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (17 responses)

> Someone needs to do a "C", and design a new low-level language for programming x64.

Well, there's assembly language. Or LLVM IR, if you wanted something a bit more optimized. But I imagine you wanted something higher-level than either of those options.

IMHO the single most significant pain point for C is undefined behavior. You can broadly divide UB into three types:

1. Essential UB - UB that results from stack/heap corruption or other cases where "You can only figure out what will happen if you know exactly how everything is laid out in memory, the order in which threads are executed, etc." It's "essential" because knowing what architecture you're using only gives you a little information about the program's likely behavior.
2. Accidental UB - UB that results from differences in architectural behavior (e.g. how negative numbers are represented, whether trap representations are a thing, whether memory is segmented, etc.). It's "accidental" because many of these instances of UB are artifacts of the state of the market at the time C was standardized, rather than fundamental constraints on what we can predict about program behavior.
3. UB that should always crash - Mostly, this is just "dereferencing NULL, dividing by zero, and anything else that everyone agrees should always immediately trap," but for the sake of completeness, I would define this as any situation where it's possible (on a reasonable, modern system, when running in userspace) to immediately detect the problem and crash, with no meaningful performance penalty for doing so (e.g. the runtime doesn't have to do array bounds checking or similar).

For addressing #3, the answer is obvious: Crash, and don't have it be UB. For #2, the answer is similarly obvious: Either pick "whatever the x86-64 does" or say "it's implementation-defined" (and not UB). But for #1, the only really effective way to remove it is to prevent stack/heap corruption statically, at compile time. And if you go down that road, you will fairly quickly find yourself reinventing the Rust wheel. Alternatively, you can insert bounds checks everywhere, and go down the Java road instead, but then you're not really a "low-level language" anymore.
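
For illustration, here is what the "Java road" looks like when written out in C (a sketch added here): every access pays for a check, in exchange for a deterministic crash instead of UB.

#include <stdio.h>
#include <stdlib.h>

/* A bounds-checked accessor: the check a low-level language omits. */
static int checked_get(const int *a, size_t len, size_t i) {
    if (i >= len) {
        fprintf(stderr, "index %zu out of bounds (len %zu)\n", i, len);
        abort();    /* crash deterministically instead of invoking UB */
    }
    return a[i];
}

int main(void) {
    int a[4] = {1, 2, 3, 4};
    printf("%d\n", checked_get(a, 4, 2));   /* fine: prints 3 */
    printf("%d\n", checked_get(a, 4, 9));   /* aborts, never reads a[9] */
    return 0;
}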

TL;DR: I am unable to visualize anything that matches your description, but doesn't already exist.

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 11:09 UTC (Fri) by khim (subscriber, #9252) [Link] (2 responses)

> e.g. the runtime doesn't have to do array bounds checking or similar

But even your short list (with two elements) includes two things which are hard to implement on some platforms. Accessing NULL wouldn't be caught on MS-DOS or many other “small” CPUs (and real mode is not dead if we count the platforms discussed in the article as live… heck, in a world where Windows 3.0 support is added to compilers in the year 2020 it can be considered more alive than other architectures discussed here). Catching “divide by zero” is not trivial either, e.g., on AArch64 (fp exceptions are optional there and you need to periodically check whether they happened — that looks more-or-less like “array bounds checking or similar” to me).
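
The polling khim alludes to can be sketched with the standard C99 <fenv.h> interface (an illustration added here; it assumes the platform's FP exceptions are non-trapping, which is the AArch64 default):

#include <fenv.h>
#include <stdio.h>

/* Tell the compiler we read the floating-point status flags. */
#pragma STDC FENV_ACCESS ON

int main(void) {
    feclearexcept(FE_ALL_EXCEPT);     /* clear the sticky exception flags */
    volatile double zero = 0.0;
    volatile double r = 1.0 / zero;   /* sets FE_DIVBYZERO, no trap */
    if (fetestexcept(FE_DIVBYZERO))   /* the periodic check khim means */
        printf("division by zero happened (r = %f)\n", r);
    return 0;
}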

> Alternatively, you can insert bounds checks everywhere, and go down the Java road instead, but then you're not really a "low-level language" anymore.

But you have just said that you should crash instead! Make up your mind, please!

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 17:57 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

> But even your short list (with two elements) includes two things which are hard to implement on some platforms.

Those platforms can use C. I was asked to design a language "for programming x64," so I deliberately neglected to support older platforms.

I also explicitly stated that we were talking about a "modern system." MS-DOS is not a modern system. Windows 3.0 is not a modern system. Please do not snip out parts of my comment and then complain that the snipped out pieces no longer make any sense.

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 18:31 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

Java throws an unchecked exception (which is a reasonable, but much more controlled, approximation to "crashes") if you make an out-of-bounds array access.

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 1:39 UTC (Sat) by Wol (subscriber, #4433) [Link] (13 responses)

> 3. UB that should always crash - Mostly, this is just "dereferencing NULL, dividing by zero, and anything else that everyone agrees should always immediately trap,"

And here we have to disagree - computers are supposed to do maths, and division by zero is a common mathematical operation. The result is (scalar) infinity, I believe, and it's actually absolutely fundamental to that branch of mathematics known as calculus.

(One of the problems people have with infinity(s) is that there are so many, and you can't mix them ... :-)

One of my early projects that I remember involved a lot of Pythagoras. The problem was, the three vertices of my triangle could easily lie on a straight line, which would result (as far as the maths was concerned) in a "divide by divide-by-zero". To which the answer is zero. As far as the program was concerned, though, it did result in a crash, resulting in a load of extra code to trap the fact that computers can't do maths properly :-)

I don't know whether the language people are doing this, but imho they should get rid of both implementation-specific behaviour and undefined behaviour. Let's take the example of shifting by a negative amount. Imho the principle of least surprise says that a negative left shift is a right shift, so if you explicitly ask for the new standard you get the defined behaviour, unless you also ask explicitly for the old behaviour. If you don't ask for anything it remains implementation-specific (until the compiler default advances to the new standard :-) And fix undefined behaviour the same way - it should only be allowed when you ask for something non-sensical :-)
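
(As an illustration of that proposal, added here: a minimal sketch of such a fully defined shift. Treating out-of-range counts as zero is one arbitrary choice among several.)

#include <stdint.h>

/* Hypothetical fully defined shift: a negative count shifts the other
   way, and counts at or beyond the width yield 0, so no input is UB. */
static uint32_t shift_left(uint32_t value, int count) {
    if (count <= -32 || count >= 32)
        return 0;                 /* out-of-range counts defined as 0 */
    if (count < 0)
        return value >> -count;   /* negative left shift == right shift */
    return value << count;
}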

They had this exact problem with FOR/NEXT loops in 1977 :-) FORTRAN did the test at the end, so all loops executed at least once, while Fortran 77 did it at the start, so loops could possibly not execute at all. So we had switches to force either the new or the old behaviour.

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 9:02 UTC (Sat) by mkbosmans (subscriber, #65556) [Link] (4 responses)

> And here we have to disagree - computers are supposed to do maths, and division by zero is a common mathematical operation.
> The result is (scalar) infinity, I believe, and it's actually absolutely fundamental to that branch of mathematics known as calculus.

That is not the case at all.
While you can say: lim x→0 n/x = inf, it does not follow that n/0 = inf.

And as for the more general point, calculus deals with real numbers for the most part. Computers operate on floating point and integer numbers. Operations that make sense in one domain don't necessarily translate 1:1 to another.

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 10:09 UTC (Sat) by Wol (subscriber, #4433) [Link] (3 responses)

> That is not the case at all.
> While you can say: lim x→0 n/x = inf, it does not follow that n/0 = inf.

But isn't that what the MATHS MEANS? It doesn't follow that x will *reach* 0, but if it does, then n/x *must* equal infinity. (Quite often, x=0 is illegal in the problem domain.)

> And as for the more general point, calculus deals with real numbers for the most part. Computers operate on floating point and integer numbers. Operations that make sense in one domain don't necessarily translate 1:1 to another.

Principle of least surprise. Yes, floating point is quantised while the reals aren't, but given that (I believe) the IEEE definition of floating point includes both NaN and inf, it would be nice if computers actually used them - I believe some popular computers did 40 years ago (the DEC VAX), and I guess it's the ubiquity of x86 that killed it :-(

And the whole point of fp is to imitate the reals. Again, principle of least surprise: the fp model should not crash when fed something that is valid in the real domain. It should get as close as possible.

People are too eager to accept that "digital is better" "because it's maths", and ignore the fact that it's just a model. And people find it hard to challenge a mathematical model, even when it's blatantly wrong, "because the maths says so".

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 11:38 UTC (Sat) by Jonno (subscriber, #49613) [Link]

> While you can say: lim x→0 n/x = inf, it does not follow that n/0 = inf.

No, you can't say that. For n∈ℝ⁺, lim x→0⁺ (n/x) = ∞, but lim x→0⁻ (n/x) = -∞, so lim x→0 (n/x) does not exist. [For n∈ℝ⁻, lim x→0⁺ (n/x) = -∞ and lim x→0⁻ (n/x) = ∞, so lim x→0 (n/x) does not exist either; but for n = 0, lim x→0⁺ (n/x) = 0 and lim x→0⁻ (n/x) = 0, and so lim x→0 (n/x) = 0.]

> But isn't that what the MATHS MEANS? It doesn't follow that x will *reach* 0, but if it does, then n/x *must* equal infinity. (Quite often, x=0 is illegal in the problem domain.)

No, the maths says that the domain of the divisor does not include zero; that the closer to zero a positive divisor gets, the closer to positive infinity the value gets; and that the closer to zero a negative divisor gets, the closer to negative infinity the value gets.

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 13:15 UTC (Sat) by mpr22 (subscriber, #60784) [Link]

x86 floating point is (hardware defects aside) IEEE-754, floating point division by 0.0 is defined in C, and if you compile:

#include <stdio.h>

int main(void)
{
    float f = 1.0f/0.0f;
    printf("%f\n", f);
    return 0;
}

with gcc or clang and link against glibc, it prints "inf".

Integer division by 0, on the other hand, is undefined under finite-width two's complement (or unsigned) arithmetic.

Python cryptography, Rust, and Gentoo

Posted Feb 14, 2021 2:47 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

Infinity is neither an integer nor a real number (when both terms are defined in the mathematical sense rather than the computational sense). The real numbers observe something called the "Archimedean property," which states that there are no infinities or infinitesimals (except that zero is infinitely smaller than all non-zero values).

Why do real numbers have this limitation? Well, the blunt fact is, there's only one totally ordered metrically complete field, and it's the real numbers.[1] If you want to introduce an infinity, you have to give up one of the following:

1. The field axioms (which, broadly speaking, say that you can add, subtract, multiply, and divide real numbers, and these operations behave in sensible and familiar ways).
2. The total ordering of the reals (i.e. for any two reals a and b, either a > b, a < b, or a = b, where = is given its ordinary interpretation of "are literally the same mathematical object" rather than something which the ordering is allowed to define).
3. Two additional axioms about how the ordering interacts with the field operations (basically, you can always add the same number to both sides of an inequality without invalidating it, and the product of two positive numbers is always positive).
4. The reals are Dedekind-complete (in simple terms, "you can take limits" - in more precise terms, every non-empty subset of the reals that has an upper bound, has a least upper bound).

For example, IEEE 754:

1. Everything is non-associative, which is not allowed under the field axioms. Also, NaN and ±inf don't have additive inverses.
2. Since 1.0 / 0.0 != 1.0 / -0.0, we cannot have 0.0 and -0.0 be "the same value" (because you get different answers when you try to use them in the same expression). Neither number is greater than the other according to IEEE 754, and so they violate total ordering. Also, all of the NaNs violate total ordering, too.
3. There are cases for which x < y, but x + z == y == y + z (because x is the largest value with exponent n and y is the smallest value with exponent n+1). Also, you can trivially break this with ±inf.
4. Almost satisfied: The set of negative floats has two upper bounds which are incomparable (0.0 and -0.0), so we cannot say which is the "least" upper bound. But I'm pretty sure this is the only counterexample (ignoring trivial alterations such as "negative floats greater than -1.0," etc.) because I can't think of a way to construct a counterexample out of NaN or inf.
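
These violations are easy to observe from C (a sketch added here, assuming IEEE-754 hardware):

#include <math.h>
#include <stdio.h>

int main(void) {
    double pz = 0.0, nz = -0.0;
    printf("%d\n", pz == nz);               /* 1: they compare equal... */
    printf("%f %f\n", 1.0 / pz, 1.0 / nz);  /* inf -inf: ...yet differ */
    double n = NAN;                         /* NaN breaks total ordering: */
    printf("%d %d %d\n", n < 0.0, n > 0.0, n == n);   /* 0 0 0 */
    return 0;
}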

Or the extended real numbers, which IEEE 754 is intended to mimic:

1. inf - inf is not defined (rather than giving an NaN), which is not allowed under the field axioms. Also, ±inf don't have additive inverses.
2. Satisfied if you assume that -inf < all reals < inf.
3. Not satisfied because, for finite numbers x and y with x < y, x + inf = y + inf = inf.
4. The extended reals are compact, which in this context is an even stronger property than completeness.

Or the hyperreal numbers, which are explicitly designed to "follow all of the usual rules" (for use in nonstandard analysis):

1. Satisfied by the transfer principle.
2. Satisfied by the transfer principle.
3. Satisfied by the transfer principle.
4. Not satisfied: Consider the set of infinitesimals. This is clearly bounded, but it cannot have a *least* upper bound, or else you could derive contradictions by doubling this least upper bound (which must give you a non-infinitesimal) and reasoning about the relationship between the resulting number and the set of infinitesimals.

TL;DR: If you like doing calculus etc. in the usual way, then you can't have infinities or infinitesimals.

[1]: Every totally ordered metrically complete field is isomorphic to the real numbers. So they're all, effectively, "the same field" with different names for their elements. We need this caveat because, if you really wanted to, you could just start calling 2 ** 128 "infinity." But it would still be 2 ** 128. You could still multiply two by itself 128 times to get to your "infinity." "A rose by any other name" and all that.

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 15:59 UTC (Sat) by mafrasi2 (guest, #144830) [Link] (7 responses)

> And here we have to disagree - computers are supposed to do maths, and division by zero is a common mathematical operation. The result is (scalar) infinity, I believe, and it's actually absolutely fundamental to that branch of mathematics known as calculus.

> (One of the problems people have with infinity(s) is that there are so many, and you can't mix them ... :-)

Division by zero is *not* a common mathematical operation. It is literally undefined in mathematics as well.

In fact, it is absolutely fundamental that it is undefined, because otherwise you could do for any number x

x / 0 = infinity
x = infinity * 0

which would mean that any number is equal to any other number (because they all equal infinity * 0 and equality is transitive).

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 23:57 UTC (Sat) by Wol (subscriber, #4433) [Link] (6 responses)

Except that anything multiplied by 0 is zero.

So we have infinity*0 = 5*0 = 4*0 = 3*0 = 2*0 = 1*0 = 0*0.

If we divide each term by 0, does that mean infinity = 5 = 4 = 3 = 2 = 1 = 0?

I think we've fallen foul of Godel's incompleteness theorem. In order to make the maths work, we need special rules outside of the maths like "divide by zero, you get infinity" and "divide by infinity, you get zero". And a whole lot of physics depends on infinities. I can't give you any examples (or maybe I can), but there are various different types such that quite often infinity != infinity, and the physics doesn't work. And the "is it 10 or 11 dimensions" model of space-time works, I believe, because it just happens to be true that infinity does actually equal infinity.

Infinity and zero are special cases, required by Godel, that are needed to make everything else work. Take that example I gave of calculating the sides of a triangle - as soon as we accept that "divide by divide-by-zero equals 0" I can use THE SAME maths on any two points in a cartesian system to calculate the distance between them. Basically I try and construct a right angle triangle and calculate the hypotenuse, and if the triangle collapses into a line THE SAME maths still works. And it makes sense that it works...

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 14, 2021 1:28 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (5 responses)

> So we have infinity *0 = 5*0 = 4*0 =3*0 = 2*0 = 1*0 =0*0.

inf * 0 is an indeterminate form. It isn't zero, it isn't inf, it isn't a number. Your logic just breaks down here.

> I think we've fallen foul of Godel's incompleteness theorem.

Umm, no. This is way before Gödel gets involved.

> In order to make the maths work, we need special rules outside of the maths like "divide by zero, you get infinity" and "divde by infinity, you get zero".

No, these rules don't exist (in normal mathematics, see later). They may "make sense" in specific instances, but they are nonsense if you try to extrapolate from them. Dividing by zero is not an operation you can do. It isn't inf, nan, or any other "thing", it just can't be done (at least in the axiomatic framework generally used; IEEE is notably lacking in axioms, so sure inf is fine there). I'm sure one could make an algebra where division by zero "makes sense" (cf. modular algebra or surreal numbers for other number systems; surreal *might* have division by zero, but it is…weird), but it might not be as useful as the algebra we use all the time.

> And a whole lot of physics depends on infinities.

I think you mean infinite series or infinitesimals, not infinities.

> I believe, because it just happens to be true that infinity does actually equal infinity.

Maybe you're thinking of the continuum hypothesis? Though I don't think string theory cares about it in particular (its truth is independent of ZF or ZFC). Though I don't know for sure in that specific instance.

> Infinity and zero are special cases, required by Godel,

I feel like you're not understanding Gödel. Gödel states that there are truths that are unprovable in any given proof system that is consistent. Or you can have all truths, but then you gain all falsities as well without the power to tell the difference. There's nothing in it about infinity or zero (as applied to number theory). Those existed before Gödel came along and are fine. I recommend the book Gödel's Proof by Nagel and Newman which is what finally turned the light bulb on for me (after not getting it in Gödel, Escher, Bach by Hofstadter and another reference I can't remember).

Python cryptography, Rust, and Gentoo

Posted Feb 14, 2021 10:06 UTC (Sun) by Wol (subscriber, #4433) [Link] (2 responses)

As I understand Godel, a simple way to put it is "you cannot use a system to prove itself correct". So it's easy to prove boolean logic correct, AS LONG AS you don't restrict the proof to using only boolean logic. It's easy to prove number theory correct AS LONG AS you don't restrict the proof to using only number theory. That's why we can't prove logic correct, because we have nothing else to throw into the mix.

So I have no qualms about throwing that infinity stuff into the proof, because otherwise you can't class zero as a number, because it behaves completely differently to all the other numbers. "My logic breaks down". Yes, because my logic (as per Godel) MUST be either incomplete, or inconsistent. Without that rule, it's inconsistent. With that rule it's incomplete. Pick one ... I've gone for consistency.

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 14, 2021 13:08 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (1 responses)

> As I understand Godel, a simple way to put it is "you cannot use a system to prove itself correct".

There are two parts.

The first is that any sufficiently powerful[1] system of arithmetic is incomplete. In this sense it means that there are statements one can make in the system for which no proof exists (of either its truth or falsity).

The second is that in such a system, the consistency of the system itself is one such statement.

It makes no such claim as to which statement is required.

> That's why we can't prove logic correct, because we have nothing else to throw into the mix.

Sure, but *that* system is also not provably correct. So what have you gained? You (claim to have) jumped one rung up a countably infinite ladder among a countably infinite selection of such ladders. Yay? :)

> So I have no qualms about throwing that infinity stuff into the proof, because otherwise you can't class zero as a number, because it behaves completely differently to all the other numbers.

Zero is a number. It works just fine. Division has a singularity at its value, but all kinds of functions have singularities. Do we need something else for tan(π/2)? Why not extend to the complex numbers with sqrt(-1) while we're at it? Quaternions? Octonions? Sedenions? Each of these is a separate algebra, an extension of algebra over the reals. We don't use them in general because we don't need the additional power they offer in day-to-day uses.

> Yes, because my logic (as per Godel) MUST be either incomplete, or inconsistent. Without that rule, it's inconsistent. With that rule it's incomplete. Pick one ... I've gone for consistency.

You're using the wrong definition of "consistency". It isn't consistent as in "all values must be able to take places of all other values in all expressions".[2] It is consistent as in "there are no contradictions between provable statements" which is *way* more important in (useful) mathematics.

[1] Peano arithmetic is sufficiently powerful. Arithmetic with just the natural numbers, addition, and multiplication, I believe, is not.
[2] You're still "inconsistent" in this sense about the square root of negative numbers, for example. Why not toss those in? Why stop at trying to make division "consistent" in this sense while leaving the trigonometric functions, square root, and the other singularity-containing or domain-limited functions alone?

Python cryptography, Rust, and Gentoo

Posted Feb 14, 2021 16:34 UTC (Sun) by nix (subscriber, #2304) [Link]

> Each of these is a separate algebra, an extension of algebra over the reals. We don't use them in general because we don't need the additional power they offer in day-to-day uses.

... and because they gain annoying limitations (lose a useful property of the reals) with every such extension, and by the time you get to the sedenions there are not really very many useful properties left (power-associativity is more or less it).

Python cryptography, Rust, and Gentoo

Posted Feb 15, 2021 7:17 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (1 responses)

> I feel like you're not understanding Gödel.

This is nothing to be ashamed of, by the way. I took an entire college course just focusing on the incompleteness theorems, and I still only have a very loose ability to follow their basic form. The incompleteness theorems are very, *very* hairy math. You cannot simply skim a couple of Wikipedia articles and expect to understand Gödel.

If you insist on trying to figure out what Gödel was saying without spending multiple years of your life studying the surrounding mathematics, then I would suggest starting out with Gödel Escher Bach. Yes, that's a very thick book, no, you should not skim it. The main advantage of GEB is that it actually does explain why and how completeness breaks down under arithmetic, using a real (if rather awkward) implementation of PA. For this reason, it is not an easy read, but it's better than an introductory model theory textbook.

Python cryptography, Rust, and Gentoo

Posted Feb 15, 2021 22:46 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link]

GEB is not an easy read, but it is probably as easy and fun a read as any book that makes a serious pretense of explaining Gödel's Incompleteness Theorem is likely to be.

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 3:29 UTC (Fri) by zev (subscriber, #88455) [Link] (4 responses)

C is the perfect language for programming a PDP-11. It's just that modern computers behave completely differently to a PDP-11. [...] Someone needs to do a "C", and design a new low-level language for programming x64.

I see this kind of thing said a lot, and frankly it's never made the slightest bit of sense to me. What exactly about C itself is remotely PDP-specific? It doesn't strike me as specialized for a PDP or any other particular ISA so much as for the von Neumann model of computation, which was still pretty ubiquitous last time I checked. If we were all doing dataflow on FPGAs or whatnot, then sure, it'd be a poor fit, but we're still fetching and executing instructions (semantically) one at a time that load and store bytes in memory, pretty much just like Ken and Dennis did on their DECs.

"But modern machines have out-of-order execution and branch prediction and multi-level cache hierarchies!" I've seen some people argue...sure, but the whole point of that kind of microarchitectural sophistication is that it's microarchitectural -- it's not even directly visible at the assembly level, let alone in a high-level language. (Itanium exposed bits of its microarchitecture at the ISA level and look what a raging success that was.)

C's not without its shortcomings, but this notion that it's inappropriate for today's machines because it was initially run on a PDP-11 seems rather silly. Some of those shortcomings:

  • unsafety: if a 1970 PDP had been exposed to the variety of hostile inputs today's internet-connected machines are, this would have been just as much an issue.
  • pointer aliasing: C's challenges are much more entangled with modern compilers than they are with our hardware.
  • lack of abstractions/facilities for "programming in the large": pretty clearly unrelated to the underlying hardware.

None of these seem at all connected to its PDP origins. How would a "language for programming x64" differ?

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 4:21 UTC (Fri) by roc (subscriber, #30627) [Link]

Memory accesses are much slower on modern machines relative to other operations, so it is more important than it used to be to avoid redundant loads and stores. Thus, alias analysis has become more important to optimization, and C compilers more aggressive about exploiting whatever assumptions they can get away with (e.g. type-based alias analysis).
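
A sketch of the type-based alias analysis roc mentions (an example added here; with -O2 and strict aliasing, GCC and Clang are permitted to compile it this way):

/* The compiler may assume a float* and a long* never point at the
   same memory, so it can keep *f in a register across the store to
   *l and simply return 1.0f without reloading. */
float mix(float *f, long *l) {
    *f = 1.0f;
    *l = 0;
    return *f;   /* may be optimized to "return 1.0f" */
}

If a caller arranges (via casts) for both pointers to alias, the optimized code and a naive reading of the source disagree; that is exactly the kind of assumption roc describes compilers exploiting.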

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 11:20 UTC (Fri) by Wol (subscriber, #4433) [Link] (2 responses)

> C's not without its shortcomings, but this notion that it's inappropriate for today's machines because it was initially run on a PDP-11 seems rather silly. Some of those shortcomings:

It's not that it's inappropriate. One of its major failings is that people *think* it's low level, but it doesn't map that well to what modern processors actually DO. In short, we treat it like the low-level language it *was*.

And it's that disconnect between what we think, and what actually happens, that causes all the problems.

Let's take your "unsafety" point, for example. On a PDP-11, I could have easily reasoned about what was ACTUALLY HAPPENING inside the CPU. That's not to say my programming is perfect, but my mental model of reality would have been reasonably close to reality. Nowadays, that's not true AT ALL.

And that's what bites kernel programmers all the time. Especially the noobs: their mental model of what's going on is wildly out of kilter with reality. The compiler takes the code they wrote and massively rewrites it behind their backs. And then, I often get the impression, the CPU effectively runs the object code in an interpreter ...

That's the point of a low-level language. Imho, if you have well-written code in a low-level language, the compiler SHOULD NOT be able to do that much optimisation. That's not a description of modern C !!!

And therein lies our problem.

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 19:08 UTC (Fri) by khim (subscriber, #9252) [Link] (1 responses)

> That's the point of a low-level language. Imho, if you have well-written code in a low-level language, the compiler SHOULD NOT be able to do that much optimisation.

If we used that definition, then modern systems wouldn't have *any* low-level languages. Not even machine code conforms: CPUs with speculative execution may make massive changes to what you wrote in your code!

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 23:13 UTC (Fri) by Wol (subscriber, #4433) [Link]

Didn't I say that the CPU was an object code INTERPRETER? :-)

But if I'm unable to REASON LOGICALLY about what the CPU is going to do, how on earth am I going to get deterministic (ie it does what I want it to do) behaviour from my program?

It's turtles all the way down and logic (and the ability to debug!) has just gone down the plughole with the bathwater ...

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 15, 2021 15:28 UTC (Mon) by anton (subscriber, #25547) [Link] (4 responses)

Interestingly, I have seen repeated claims that current widely-used architectures have been designed for C. While I don't think that's what actually happened in most cases, the claim that C is a bad fit for current architectures is grotesque (although, admittedly, C does not have language features for all the features that architectures provide; some are reflected in GNU C extensions, e.g., labels-as-values or vector extensions).
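
For readers who have not met it, the labels-as-values extension anton cites looks like this (a sketch added here; it is a GNU C extension accepted by GCC and Clang, not ISO C):

#include <stdio.h>

/* A tiny direct-threaded interpreter: &&label takes a label's
   address, and "goto *p" jumps to it. */
int main(void) {
    static void *dispatch[] = { &&op_inc, &&op_dec, &&op_halt };
    int program[] = {0, 0, 1, 2};   /* inc, inc, dec, halt */
    int acc = 0, pc = 0;

    goto *dispatch[program[pc++]];
op_inc: acc++; goto *dispatch[program[pc++]];
op_dec: acc--; goto *dispatch[program[pc++]];
op_halt:
    printf("acc = %d\n", acc);      /* prints "acc = 1" */
    return 0;
}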

Concerning a new low-level language, yes, we need that, not because C (used as a low-level language) is a bad fit for current architectures, but because the gcc and clang maintainers do not want to support C as a low-level language, and the mindset behind that seems to pervade the C compiler community.

Python cryptography, Rust, and Gentoo

Posted Feb 15, 2021 19:44 UTC (Mon) by Wol (subscriber, #4433) [Link] (3 responses)

Thing is, C *is* a bad fit for modern architectures. It has a whole bunch of features that are undefined, or implementation-defined, which are MEANT to be low-level "match the hardware". Except that they aren't.

Let's just get rid of all these so-called "low level" cock-ups, accept that C is now a high-level language and that undefined and implementation-specific behaviours shouldn't exist, and move on.

Someone brought up retpolines - that monstrosity that tries to make sure that both the hardware and the software agree on the imaginary hardware interface where the CPU microcode and the language macrocode meet ... wtf are we doing!

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 16, 2021 9:27 UTC (Tue) by anton (subscriber, #25547) [Link] (2 responses)

I did not mean the language lawyer version of C. That version is a bad fit for any architecture (including the PDP-11). However, it's great for adversarial compiler maintainers who want to do whatever they want (e.g., produce good benchmark results, grudgingly cater to requests by paying customers, and tell other users that their bug reports are invalid), because this version allows them to always blame the programmer for something or other. After all, no terminating C program is a "strictly conforming program", and whenever someone mentions a "conforming program" (the only other conformance level for programs defined in the C standard), the adversaries produce advocacy for why we should consider "conforming programs" meaningless (interestingly, they claim that we should take the rest of the C standard at face value).

I mean C as used in many programs, which has a pretty simple correspondence to architectural features (and you see it easily in contexts where optimization does not set in, e.g., when you separately compile a function that performs just one thing).

The adversaries want us to consider C as a high-level language with no correspondence to the bare metal; that makes it easier to blame the programmers and absolve compiler maintainers of responsibility. The question is why any programmer would want that. We have plenty of high-level languages, often better than C in that capacity, but not that many low-level languages; basically C is the only popular one.

Concerning a totally defined C: I think that is at odds with a low-level language for multiple architectures, but as most (all?) C compilers have demonstrated for the first quarter-century of the language, that's no hindrance for implementing C in a benign rather than adversarial way. And for those who don't know how to do that, I have written a paper (which also explains why I consider totally defined C impractical).

I don't know what retpolines have to do with any of that. They are a workaround for a vulnerability in some microarchitectures and they cannot be implemented in C (there are limitations to C's low-level nature). The vulnerability should be fixed at the microarchitecture level, and I expect that the hardware manufacturers will come out with microarchitectures that do not have this vulnerability.

Python cryptography, Rust, and Gentoo

Posted Feb 16, 2021 13:18 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (1 responses)

> I did not mean the language lawyer version of C.

What version of C are you talking about? The ISO standard? The image of C that you have in your head? My head? The C that GCC 2.95 accepted and worked with?

Let's imagine a world where C compilers magically stop doing "magic optimization" steps that tend to break code. What's going to happen is that C programmers who don't know this stuff already are going to have their code pessimized and, presumably, slower in practice. What are they going to do? Start writing their C in the way the compiler was already transforming it internally during its optimization passes. They'll learn more C (and, I imagine, be less satisfied with it), and hopefully use linters and tooling to tell them where their NULL checks are inverted with respect to their uses and such.

Rereading that, maybe it wouldn't be so bad. Maybe folks would migrate to better languages. Others might actually learn more about how loose C is in practice. The optimization passes could be migrated to the linters rather than the compiler, to explain "hey, you could reorder your code to This Way and gain some performance". Maybe these passes would then gain some prose explaining the what and why of them.

Then again, I have no idea how such a C would be specified at ISO to disallow these optimizations while still allowing architectures not to be forced into two's-complement representations or the like because "it's faster/easier for them".

Python cryptography, Rust, and Gentoo

Posted Feb 16, 2021 15:28 UTC (Tue) by anton (subscriber, #25547) [Link]

As I wrote: "I mean C as used in many programs", and I actually point to a paper where I explain this in more detail. As for "pessimizing", it's certainly the case that advocates of adversarial C compilers claim that the adversarial behaviour is good for performance, invariably without giving any numbers to support these claims; maybe they think that repeating these claims and wishful thinking makes them true.

Wang et al. checked that for gcc-4.7 and clang-3.1 and found that the adversarial "optimizations" produced a minimal speedup on SPECint 2006, and that speedup could also be achieved by small changes to the source code in two places.

Yes, a performance advisor that points out places where changing the source code may improve performance would be a more productive way to spend both the compiler maintainer's and the compiler user's time than writing "optimizations" that break existing code, "sanitizers" to find the places where the "optimizations" break it, and, on the programmer's side, "sanitizing" their code to withstand the latest attacks by "optimizers" (but not the next one). Moreover, such an advisor could point out optimizations that a programmer can do but a conforming compiler cannot (e.g., because there is an alias that even language lawyering cannot explain away). Of course, unlike maintained programs, benchmarks would not benefit from such a performance advisor, which is why no work goes into performance advisors; and conversely, "optimizations" don't break benchmarks (the compiler maintainers revert the "optimization" in that case), unlike other programs, and that's why we see "optimizations".

But what's more, by insisting on a very limited interpretation of what C means, the language lawyers remove opportunities for optimizations that programmers can make in the source code. I have discussed this at length.

I, too, am skeptical that trying to change the C standard is the way to get rid of adversarial C compilers (not least because you won't be able to achieve consensus with the implementors of these compilers on the committee), and I guess that's why advocates of adversarial compilers like to direct the blame for the misdeeds of these compilers at the standard, rather than at the compiler maintainers. It's not the standard that requires them to miscompile existing, tested, and working programs; it's the compiler maintainers' choice, so they alone are responsible.

Concerning architectures with other than two's complement representation of signed numbers, the last new such architecture was introduced over 50 years ago, and descendants of such architectures exist only in very few places and run select programs. There are far fewer of these machines than of the architectures (all two's-complement) that are not LLVM targets and that have brought about the parent article. And coding in a way that makes use of knowledge about the representation is one of the things that you can do for performance in a low-level language (and compilers do not perform all of these optimizations).

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 10:55 UTC (Fri) by khim (subscriber, #9252) [Link] (5 responses)

> undefined behaviour is whatever the hardware does

If that is the definition of undefined behavior, then what the heck is implementation-defined behavior?

No, the confusion is much deeper. “Undefined behavior” always meant what it means today. And, in fact, most types of undefined behavior don't cause any confusion. Attempting to read through a pointer after calling free, or reading an uninitialized variable, rarely causes confusion.

Something like this:

#include <stdio.h>

int foo() {
  int i;
  i = 42;    /* leaves 42 in foo's stack slot; falling off the end is
                harmless as long as the caller ignores the "result" */
}

int bar() {
  int i;
  return i;  /* reads an uninitialized variable: undefined behavior */
}

int main() {
  foo();
  printf("%d\n", bar());  /* "works" only if bar's i reuses the slot
                             where foo left 42 */
}

Should code like the above work or not? Clang breaks it even when compiled with -O0 (but gcc with -O0 works, although any other optimization level breaks it).

I don't know any practicing programmer who says compilers should support code like the above example.

The tragedy happened when the decisions of the C standards committee clashed with developers' expectations. Because C was designed to create portable programs, lots of things which are, actually, well-defined (yet different!) on many platforms were put into the “undefined behavior, don't use” bucket (instead of the “implementation-defined, use carefully” bucket).

The intention was, of course, to make programs portable, but something completely different happened instead: so many “implementation-defined, use carefully” things were marked as “undefined behavior” that developers started thinking that “undefined behavior” means precisely that: whatever the hardware does.

And now we have all that mess.

But no, “undefined behavior” never meant whatever the hardware does. Not even in C89. It was always “something your program should never do.”

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 12:52 UTC (Fri) by Wol (subscriber, #4433) [Link] (2 responses)

I think that mistake proves my point ... :-)

Undefined, implementation dependent, whatever. The point is, it BREAKS THE PROGRAMMER'S MENTAL MODEL.

And however much you want to blame the programmer, if programmers keep on doing it, it's a design fault ...

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 17:38 UTC (Fri) by khim (subscriber, #9252) [Link] (1 responses)

> And however much you want to blame the programmer, if programmers keep on doing it, it's a design fault ...

Got it. So we have issues with C which even Rust doesn't fully address:

— if you put the check outside of the loop then it wouldn't test all elements of the array.

— if you initialize your variable after it's used then the program doesn't work.

— if you change a variable then other variables (which were calculated on the basis of that variable) don't change as they should.

— you need to actually allocate memory for your data structure; just declaring a pointer doesn't mean you can use it.

And I can probably add dozens more.

</sarcasm off>.

Granted: these are the expectations of people who started studying programming about two months ago… but they are very, very common.

Should we do something about them? If yes then what… if no, then why the heck no?…

> The point is, it BREAKS THE PROGRAMMER'S MENTAL MODEL.

Sure — but pretty much anything can break it if the programmer is not taught properly.

C (and C++) suffer mostly from Hyrum's Law: many things which were supposed not to work… actually work — with a real-world compiler. And then, later… they stop (even if the documentation always warned not to use them)… and that is when trouble happens (think of the glibc story).

That's the only problem with C/C++… but it's pretty severe: the C language on paper and the C language as implemented by typical compilers have been different for so long that it's unclear what can be done at this point.

The thing is: I'm not sure switching to Rust (or any other language) would save us. After 10-20-30 years they would be in the same situation, too.

I'm not even really sure what can be done about it. Have just one fixed compiler without any changes? I don't think it would really work.

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 22:20 UTC (Fri) by roc (subscriber, #30627) [Link]

> I'm not sure switching to Rust (or any other language) would save us. After 10-20-30 years they would be in the same situation, too.

No they won't.

Rust is designed to eliminate "undefined" or "implementation defined" behavior outside of explicit "unsafe" blocks. Yes, there will be compiler bugs etc, but really there will be vastly less of such problematic behaviors in Rust programs than in C and C++ programs.

That means we can expect Rust programs to behave much more consistently over time than C/C++ programs, as hardware and compilers evolve.

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 18:13 UTC (Fri) by anselm (subscriber, #2796) [Link] (1 responses)

> The tragedy happened when the decisions of the C standards committee clashed with developers' expectations. Because C was designed to create portable programs, lots of things which are, actually, well-defined (yet different!) on many platforms were put into the “undefined behavior, don't use” bucket (instead of the “implementation-defined, use carefully” bucket).

AFAIR, the C89 standard carefully distinguished between “undefined” and “implementation-defined” behaviour. “Implementation-defined” behaviour is very emphatically not “undefined” behaviour, it's just that it is not defined by the language standard but by the various implementations (or their underlying platforms).

For example, the result of the >> operator applied to a negative signed integer is implementation-defined – many platforms offer a choice between arithmetical and logical right-shift and the compiler writer needs to pick one of the two, but after that, that particular compiler on that platform will always do it that way. (The reason why this particular behaviour was declared implementation-defined is probably that Ritchie didn't stipulate what was desired and by the late 1980s there were enough C implementations doing it one way or the other that nobody could agree anymore on which way was “correct” without making the other half of the industry “wrong”, and breaking programs that relied on the other behaviour.)
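
A quick illustration of that implementation-defined choice (an example added here): the same expression has two permitted answers.

#include <stdio.h>

int main(void) {
    int x = -4;
    /* Arithmetic right shift copies the sign bit in and gives -2;
       a logical shift would give a large positive value. Mainstream
       compilers on x86 and ARM all pick the arithmetic shift. */
    printf("%d\n", x >> 1);
    return 0;
}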

With appropriate care, you can exploit implementation-defined behaviour – especially if your set of implementations is small – but with undefined behaviour, all bets are off. If you're interested in C code that is maximally portable between implementations, implementation-defined behaviour is, of course, something to avoid, but again it is a good idea to flag it as such in the standard so people can be aware of it.

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 20:31 UTC (Fri) by khim (subscriber, #9252) [Link]

> AFAIR, the C89 standard carefully distinguished between “undefined” and “implementation-defined” behaviour.

Yes, but that wasn't my point.

You have explained perfectly why a right shift of a negative value is “implementation-defined” behavior. All is very logical and proper.

But what about a shift by a negative value? Many (most?) low-level programmers expect that this would be “implementation-defined”, too. After all, most CPUs do something predictable when they get a negative value as the shift amount (different ones do different things, but all CPUs I know do something predictable). More-or-less the same as with a shift of a negative value: there may be different outcomes on different CPUs, yet there would be some outcome, right?

Well… no.

If you actually open the C89 standard you will see that “the result of a right shift of a negative-valued signed integral type (6.3.7)” is listed in “Appendix G, part 3: Implementation-defined behavior”… yet “an expression is shifted by a negative number or by an amount greater than or equal to the width in bits of the expression being shifted (6.3.7)” is not in part 3… it's in “Appendix G, part 2: Undefined behavior”!

I would love to know why that difference is there. Do some CPUs lock up when faced with a negative shift? Or does something crazy happen (like: it takes so long that DRAM starts losing its contents)? Or maybe some compiler couldn't handle it? Or… maybe the committee just decided that if they declared it “undefined behavior” then people would stop using it and compiler writers could generate better code?

I have no idea, really. But the end result: -1 >> 1 is “implementation-defined behavior” yet 1 >> -1 is “undefined behavior”.

To most low-level guys this is sheer insanity… yet that's how C89 is defined.

> If you're interested in C code that is maximally portable between implementations, implementation-defined behaviour is, of course, something to avoid, but again it is a good idea to flag it as such in the standard so people can be aware of it.

It's actually done in exactly this way. Not only does the C standard distinguish “unspecified behavior”, “implementation-defined behavior”, and “undefined behavior”, it actually has all of them listed in three appendices, to make sure no one would mix them up.

The only problem: actual programmers don't consult these when they are writing code. They try to guess, based on their mental model. And for most programmers the mental model either says that you can't shift a negative value and can't shift by a negative value either (these are the sorta-lucky ones: they may not write the fastest code, yet they tend to write correct code) or, alternatively, it says you can push anything you want into a shift and get something back… and then they write something like (a >> (i-1)) * i with the comment /* if i == 0 then the result is zero and we don't care what a >> (i-1) produces */… only then a modern compiler “looks” at that, notices that i can never be zero (because that would lead to undefined behavior), happily nukes the if (i == 0) check, and removes the “dead code”.
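
Spelled out as a compilable unit (a reconstruction of that inline example, added here):

/* The compiler reasons: if i were 0, the shift count would be -1,
   which is undefined behavior; therefore i != 0 on every valid
   execution, and the guard below is "dead code". */
int scaled(int a, int i) {
    int r = (a >> (i - 1)) * i;
    if (i == 0)       /* the programmer's safety net... */
        return 0;     /* ...which the optimizer may silently delete */
    return r;
}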

And that is where the shouting starts. The C89 standard clearly says that “undefined behavior” could lead to anything at all… yet “advanced programmers” say that “removing code which I specifically wrote there to catch errors is not anything at all in my book”… hilarity ensues.

P.S. I wonder if the people who developed C89 are still alive and can say what they think about all that… does anyone know?

