|
|
Log in / Subscribe / Register

Python cryptography, Rust, and Gentoo

Python cryptography, Rust, and Gentoo

Posted Feb 11, 2021 21:30 UTC (Thu) by Wol (subscriber, #4433)
In reply to: Python cryptography, Rust, and Gentoo by mathstuf
Parent article: Python cryptography, Rust, and Gentoo

Which is why C is not a good language - which is what a lot of posters here are saying.

Personally, I find C a perfectly okay language. I just feel that C, and Unix, and all that are a perfect example of what matters is not being any good, what matters is being in the right place at the right time. I cut my coding teeth on FORTRAN, and would probably still be using it if I had the opportunity.

As that article said, C is the perfect language for programming a PDP-11. It's just that modern computers behave completely differently to a PDP-11. Again, I cut my teeth on 50-series Pr1mes. Pr1me tried to re-write a large slab of the system in C, and I suspect that was (a small) part of the reason they went under (the bigger part being microprocessors like the 6502, the 8080 etc, were beginning to eat the minicomputers' lunch). And the 50-series having a strongly segmented architecture, it just didn't map on to the microprocessors' way of working.

Someone needs to do a "C", and design a new low-level language for programming x64.

Cheers,
Wol


to post comments

Python cryptography, Rust, and Gentoo

Posted Feb 11, 2021 21:38 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> Someone needs to do a "C", and design a new low-level language for programming x64.

Just as aarch64 enters the stage in a meaningful way? Seems apt ;) .

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 2:18 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (17 responses)

> Someone needs to do a "C", and design a new low-level language for programming x64.

Well, there's assembly language. Or LLVM IR, if you wanted something a bit more optimized. But I imagine you wanted something higher-level than either of those options.

IMHO the single most significant pain point for C is undefined behavior. You can broadly divide UB into three types:

1. Essential UB - UB that results from stack/heap corruption or other cases where "You can only figure out what will happen if you know exactly how everything is laid out in memory, the order in which threads are executed, etc." It's "essential" because knowing what architecture you're using only gives you a little information about the program's likely behavior.
2. Accidental UB - UB that results from differences in architectural behavior (e.g. how negative numbers are represented, whether trap representations are a thing, whether memory is segmented, etc.). It's "accidental" because many of these instances of UB are artifacts of the state of the market at the time C was standardized, rather than fundamental constraints on what we can predict about program behavior.
3. UB that should always crash - Mostly, this is just "dereferencing NULL, dividing by zero, and anything else that everyone agrees should always immediately trap," but for the sake of completeness, I would define this as any situation where it's possible (on a reasonable, modern system, when running in userspace) to immediately detect the problem and crash, with no meaningful performance penalty for doing so (e.g. the runtime doesn't have to do array bounds checking or similar).

For addressing #3, the answer is obvious: Crash, and don't have it be UB. For #2, the answer is similarly obvious: Either pick "whatever the x86-64 does" or say "it's implementation-defined" (and not UB). But for #1, the only really effective way to remove it is to prevent stack/heap corruption statically, at compile time. And if you go down that road, you will fairly quickly find yourself reinventing the Rust wheel. Alternatively, you can insert bounds checks everywhere, and go down the Java road instead, but then you're not really a "low-level language" anymore.

TL;DR: I am unable to visualize anything that matches your description, but doesn't already exist.

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 11:09 UTC (Fri) by khim (subscriber, #9252) [Link] (2 responses)

> e.g. the runtime doesn't have to do array bounds checking or similar

But even your short list (with two elements) includes two things which are hard to implement on some platforms. Accessing NULL wouldn't be caught on MS-DOS or many other “small” CPUs (and real mode is not dead if we would consider platforms which we are discussing in the article live… heck, in a world where Windows 3.0 support is added to compilers in year 2020 it can be considered more alive than other architectures discussed here). Catching “divide by zero” is not trivial, e.g., on AArch64 (fp exceptions are optional there are you need to periodically check if they happened — looks more-or-less array bounds checking or similar to me).

> Alternatively, you can insert bounds checks everywhere, and go down the Java road instead, but then you're not really a "low-level language" anymore.

But you have just said that you should crash instead! Make up your mind, please!

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 17:57 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

> But even your short list (with two elements) includes two things which are hard to implement on some platforms.

Those platforms can use C. I was asked to design a language "for programming x64," so I deliberately neglected to support older platforms.

I also explicitly stated that we were talking about a "modern system." MS-DOS is not a modern system. Windows 3.0 is not a modern system. Please do not snip out parts of my comment and then complain that the snipped out pieces no longer make any sense.

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 18:31 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

Java throws an unchecked exception (which is a reasonable, but much more controlled, approximation to "crashes") if you make an out-of-bounds array access.

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 1:39 UTC (Sat) by Wol (subscriber, #4433) [Link] (13 responses)

> 3. UB that should always crash - Mostly, this is just "dereferencing NULL, dividing by zero, and anything else that everyone agrees should always immediately trap,"

And here we have to disagree - computers are supposed to do maths, and division by zero is a common mathematical operation. The result is (scalar) infinity, I believe, and it's actually absolutely fundamental to that branch of mathematics known as calculus.

(One of the problems people have with infinity(s) is that there are so many, and you can't mix them ... :-)

One of my early projects that I remember involved a lot of Pythagorus. The problem was, the three vertices of my triangle could easily lie on a straight line, which would result (as far as the maths was concerned) in a "divide by divide-by-zero". To which the answer is zero. As far as the program was concerned, though, it did result in a crash, resulting in a load of extra code to trap the fact that computers can't do maths properly :-)

I don't know whether the language people are doing this, but imho they should get rid of both implementation-specific behaviour, and undefined behaviour. Let's take the example of shifting a negative amount. imho the principle of least surprise says that a negative left shift is a right shift, so if you explicitly ask for the new standard you get the defined behaviour. Unless you also ask explicitly for the old behaviour. If you don't ask for anything it remains implementation-specific (until the compiler default advances to the new standard :-) And fix undefined behaviour the same way - that should only be allowed when asking for something non-sensical :-)

They had this exact problem with FOR/NExT loops in 1977 :-) FORTRAN did the test at the end, so all loops executed at least once, while Fortran77 did it at the start so loops could possibly not execute at all. So we had switches to force either new or old behaviour.

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 9:02 UTC (Sat) by mkbosmans (subscriber, #65556) [Link] (4 responses)

> And here we have to disagree - computers are supposed to do maths, and division by zero is a common mathematical operation.
> The result is (scalar) infinity, I believe, and it's actually absolutely fundamental to that branch of mathematics known as calculus.

That is not the case at all.
While you can say: lim x→0 n/x = inf, it does not follow that n/0 = inf.

And as for the more general point, calculus deals with real numbers for the most part. Computers operate on floating point and integer numbers. Operations that make sense in one domain don't necessarily translate 1:1 to another.

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 10:09 UTC (Sat) by Wol (subscriber, #4433) [Link] (3 responses)

> That is not the case at all.
> While you can say: lim x→0 n/x = inf, it does not follow that n/0 = inf.

But isn't that what the MATHS MEANS? It doesn't follow that x will *reach* 0, but if it does, then n/x *must* equal infinity. (Quite often, x=0 is illegal in the problem domain.)

> And as for the more general point, calculus deals with real numbers for the most part. Computers operate on floating point and integer numbers. Operations that make sense in one domain don't necessarily translate 1:1 to another.

Principle of least surprise. Yes, floating point is a quantum operation, while reals aren't, but given that (I believe) the IEEE definition of floating point includes both NaN and inf, it would be nice if computers actually used them - I believe some popular computers did 40 years ago (DEC Vax), and I guess it's the ubiquity of x86 that killed it :-(

And the whole point of fp is to imitate real. Again, principle of least surprise, the fp model should not crash when fed something that is valid in the real domain. It should get as close as possible.

People are too eager to accept that "digital is better" "because it's maths", and ignore the fact that it's just a model. And people find it hard to challenge a mathematical model, even when it's blatantly wrong, "because the maths says so".

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 11:38 UTC (Sat) by Jonno (guest, #49613) [Link]

> While you can say: lim x→0 n/x = inf, it does not follow that n/0 = inf.

No, you can't say that. For n∈ℝ⁺, lim (x⭢0)⁺ (n/x) = ∞, but lim (x⭢0)⁻ (n/0) = -∞, so lim x⭢0 (n/0) does not exist. [For n∈ℝ⁻, lim (x⭢0)⁺ (n/x) = -∞ and lim (x⭢0)⁻ (n/x) = ∞, so lim (x⭢0) does not exist either; but for n∈{0}, lim (x⭢0)⁺ (n/x) = 0 and lim (x⭢0)⁻ (n/x) = 0, and so lim (x⭢0) (n/x) = 0].

> But isn't that what the MATHS MEANS? It doesn't follow that x will *reach* 0, but if it does, then n/x *must* equal infinity. (Quite often, x=0 is illegal in the problem domain.)

No, the maths says that the domain of the divisor does not include zero; that the closer to zero a positive divisor gets, the closer to positive infinity the value gets; and that the closer to zero a negative divisor gets, the closer to negative infinity the value gets.

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 13:15 UTC (Sat) by mpr22 (subscriber, #60784) [Link]

x86 floating point is (hardware defects aside) IEEE-754, floating point division by 0.0 is defined in C, and if you compile:

#include <stdio.h>
int main()
{
float f = 1.0f/0.0f;
printf("%f\n", f);
return 0;
}

with gcc or clang and link against glibc, it prints "inf".

Integer division by 0, on the other hand, is undefined under finite-width two's complement (or unsigned) arithmetic.

Python cryptography, Rust, and Gentoo

Posted Feb 14, 2021 2:47 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

Infinity is neither an integer nor a real number (when both terms are defined in the mathematical sense rather than the computational sense). The real numbers observe something called the "Archimedean property," which states that there are no infinities or infinitesimals (except that zero is infinitely smaller than all non-zero values).

Why do real numbers have this limitation? Well, the blunt fact is, there's only one totally ordered metrically complete field, and it's the real numbers.[1] If you want to introduce an infinity, you have to give up one of the following:

1. The field axioms (which, broadly speaking, say that you can add, subtract, multiply, and divide real numbers, and these operations behave in sensible and familiar ways).
2. The total ordering of the reals (i.e. for any two reals a and b, either a > b, a < b, or a = b, where = is given its ordinary interpretation of "are literally the same mathematical object" rather than something which the ordering is allowed to define).
3. Two additional axioms about how the ordering interacts with the field operations (basically, you can always add the same number to both sides of an inequality without invalidating it, and the product of two positive numbers is always positive).
4. The reals are Dedekind-complete (in simple terms, "you can take limits" - in more precise terms, every non-empty subset of the reals that has an upper bound, has a least upper bound).

For example, IEEE 754:

1. Everything is non-associative, which is not allowed under the field axioms. Also, NaN and ±inf don't have additive inverses.
2. Since 1.0 / 0.0 != 1.0 / -0.0, we cannot have 0.0 and -0.0 be "the same value" (because you get different answers when you try to use them in the same expression). Neither number is greater than the other according to IEEE 754, and so they violate total ordering. Also, all of the NaNs violate total ordering, too.
3. There are cases for which x < y, but x + z == y == y + z (because x is the largest value with exponent n and y is the smallest value with exponent n+1). Also, you can trivially break this with ±inf.
4. Almost satisfied: The set of negative floats has two upper bounds which are incomparable (0.0 and -0.0), so we cannot say which is the "least" upper bound. But I'm pretty sure this is the only counterexample (ignoring trivial alterations such as "negative floats greater than -1.0," etc.) because I can't think of a way to construct a counterexample out of NaN or inf.

Or the extended real numbers, which IEEE 754 is intended to mimic:

1. inf - inf is not defined (rather than giving an NaN), which is not allowed under the field axioms. Also, ±inf don't have additive inverses.
2. Satisfied if you assume that -inf < all reals < inf.
3. Not satisfied because, for finite numbers x and y with x < y, x + inf = y + inf = inf.
4. The extended reals are compact, which in this context is an even stronger property than completeness.

Or the hyperreal numbers, which are explicitly designed to "follow all of the usual rules" (for use in nonstandard analysis):

1. Satisfied by the transfer principle.
2. Satisfied by the transfer principle.
3. Satisfied by the transfer principle.
4. Not satisfied: Consider the set of infinitesimals. This is clearly bounded, but it cannot have a *least* upper bound, or else you could derive contradictions by doubling this least upper bound (which must give you a non-infinitesimal) and reasoning about the relationship between the resulting number and the set of infinitesimals.

TL;DR: If you like doing calculus etc. in the usual way, then you can't have infinities or infinitesimals.

[1]: Every totally ordered metrically complete field is isomorphic to the real numbers. So they're all, effectively, "the same field" with different names for their elements. We need this caveat because, if you really wanted to, you could just start calling 2 ** 128 "infinity." But it would still be 2 ** 128. You could still multiply two by itself 128 times to get to your "infinity." "A rose by any other name" and all that.

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 15:59 UTC (Sat) by mafrasi2 (guest, #144830) [Link] (7 responses)

> And here we have to disagree - computers are supposed to do maths, and division by zero is a common mathematical operation. The result is (scalar) infinity, I believe, and it's actually absolutely fundamental to that branch of mathematics known as calculus.

> (One of the problems people have with infinity(s) is that there are so many, and you can't mix them ... :-)

Division by zero is *not* a common mathematical operation. It is literally undefined in mathematics as well.

In fact, it is absolutely fundamental that it is undefined, because otherwise you could do for any number x

x / 0 = infinity
x = infinity * 0

which would mean that any number is equal to any other number (because they all equal infinity * 0 and equality is transitive).

Python cryptography, Rust, and Gentoo

Posted Feb 13, 2021 23:57 UTC (Sat) by Wol (subscriber, #4433) [Link] (6 responses)

Except that anything multiplied by 0 is zero.

So we have infinity *0 = 5*0 = 4*0 =3*0 = 2*0 = 1*0 =0*0.

If we divide each term by 0, does that mean infinity = 5 = 4 = 3 = 2 = 1 = 0?

I think we've fallen foul of Godel's incompleteness theorem. In order to make the maths work, we need special rules outside of the maths like "divide by zero, you get infinity" and "divde by infinity, you get zero". And a whole lot of physics depends on infinities. I can't give you any examples (or maybe I can), but there are various different types such that quite often infinity != infinity, and the physics doesn't work. And the "is it 10 or 11 dimensions" model of space-time works, I believe, because it just happens to be true that infinity does actually equal infinity.

Infinity and zero are special cases, required by Godel, that are needed to make everything else work. Take that example I gave of calculating the sides of a triangle - as soon as we accept that "divide by divide-by-zero equals 0" I can use THE SAME maths on any two points in a cartesian system to calculate the distance between them. Basically I try and construct a right angle triange and calculate the hypotenuse, and if the triangle collapses into a line THE SAME maths still works. And it makes sense that it works...

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 14, 2021 1:28 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (5 responses)

> So we have infinity *0 = 5*0 = 4*0 =3*0 = 2*0 = 1*0 =0*0.

inf * 0 is an indeterminite form. It isn't zero, it isn't inf, it isn't a number. Your logic just breaks down here.

> I think we've fallen foul of Godel's incompleteness theorem.

Umm, no. This is way before Gödel gets involved.

> In order to make the maths work, we need special rules outside of the maths like "divide by zero, you get infinity" and "divde by infinity, you get zero".

No, these rules don't exist (in normal mathematics, see later). They may "make sense" in specific instances, but they are nonsense if you try to extrapolate from them. Dividing by zero is not an operation you can do. It isn't inf, nan, or any other "thing", it just can't be done (at least in the axiomatic framework generally used; IEEE is notably lacking in axioms, so sure inf is fine there). I'm sure one could make an algebra where division by zero "makes sense" (cf. modular algebra or surreal numbers for other number systems; surreal *might* have division by zero, but it is…weird), but it might not be as useful as the algebra we use all the time.

> And a whole lot of physics depends on infinities.

I think you mean infinite series or infinitesimals, not infinities.

> I believe, because it just happens to be true that infinity does actually equal infinity.

Maybe you're thinking of the continuum hypothesis? Though I don't think string theory cares about it in particular (its truth is independent of ZF or ZFC). Though I don't know for sure in that specific instance.

> Infinity and zero are special cases, required by Godel,

I feel like you're not understanding Gödel. Gödel states that there are truths that are unprovable in any given proof system that is consistent. Or you can have all truths, but then you gain all falsities as well without the power to tell the difference. There's nothing in it about infinity or zero (as applied to number theory). Those existed before Gödel came along and are fine. I recommend the book Gödel's Proof by Nagel and Newman which is what finally turned the light bulb on for me (after not getting it in Gödel, Escher, Bach by Hofstadter and another reference I can't remember).

Python cryptography, Rust, and Gentoo

Posted Feb 14, 2021 10:06 UTC (Sun) by Wol (subscriber, #4433) [Link] (2 responses)

As I understand Godel, a simple way to put it is "you cannot use a system to prove itself correct". So it's easy to prove boolean logic correct, AS LONG AS you don't restrict the proof to using only boolean logic. It's easy to prove number theory correct AS LONG AS you don't restrict tthe proof to using only number theory. That's why we can't prove logic correct, because we have nothing else to throw into the mix.

So I have no qualms about throwing that infinity stuff into the proof, because otherwise you can't class zero as a number, because it behaves completely differently to all the other numbers. "My logic breaks down". Yes, because my logic (as per Godel) MUST be either incomplete, or inconsistent. Without that rule, it's inconsistent. With that rule it's incomplete. Pick one ... I've gone for consistency.

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 14, 2021 13:08 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (1 responses)

> As I understand Godel, a simple way to put it is "you cannot use a system to prove itself correct".

There are two parts.

The first is that any sufficiently powerful[1] system of arithmetic is incomplete. In this sense it means that there are statements one can make in the system for which no proof exists (of either its truth or falsity).

The second is that in such a system, the consistency of the system itself is one such statement.

It makes no such claim as to which statement is required.

> That's why we can't prove logic correct, because we have nothing else to throw into the mix.

Sure, but *that* system is also not provably correct. So what have you gained? You (claim to have) jumped one rung up a countably infinite ladder among a countably infinite selection of such ladders. Yay? :)

> So I have no qualms about throwing that infinity stuff into the proof, because otherwise you can't class zero as a number, because it behaves completely differently to all the other numbers.

Zero is a number. It works just fine. Division has a singularity at its value, but all kinds of functions have singularities. Do we need something else for tan(π/2)? Why not extend to the complex numbers with sqrt(-1) while we're at it? Quaternions? Octonions? Sedenions? Each of these is a separate algebra, an extension of algebra over the reals. We don't use them in general because we don't need the additional power they offer in day-to-day uses.

> Yes, because my logic (as per Godel) MUST be either incomplete, or inconsistent. Without that rule, it's inconsistent. With that rule it's incomplete. Pick one ... I've gone for consistency.

You're using the wrong definition of "consistency". It isn't consistent as in "all values must be able to take places of all other values in all expressions".[2] It is consistent as in "there are no contradictions between provable statements" which is *way* more important in (useful) mathematics.

[1] Peano arithmetic is sufficiently powerful. Arithmetic with just the natural numbers, addition, and multiplication, I believe, is not.
[2] You're still "inconsistent" in this sense about the square root of negative numbers for example. Why not toss those in? Why stop at trying to make division "consistent" in this sense when you're leaving out the trigonometric functions, square root, and the other infinite singularity-containing or domain-limited functions alone?

Python cryptography, Rust, and Gentoo

Posted Feb 14, 2021 16:34 UTC (Sun) by nix (subscriber, #2304) [Link]

> Each of these is a separate algebra, an extension of algebra over the reals. We don't use them in general because we don't need the additional power they offer in day-to-day uses.

... and because they gain annoying limitations (lose a useful property of the reals) with every such extension, and by the time you get to the sedenions there's not really very many useful properties they have left (associativity is more or less it).

Python cryptography, Rust, and Gentoo

Posted Feb 15, 2021 7:17 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (1 responses)

> I feel like you're not understanding Gödel.

This is nothing to be ashamed of, by the way. I took an entire college course just focusing on the incompleteness theorems, and I still only have a very loose ability to follow their basic form. The incompleteness theorems are very, *very* hairy math. You cannot simply skim a couple of Wikipedia articles and expect to understand Gödel.

If you insist on trying to figure out what Gödel was saying without spending multiple years of your life studying the surrounding mathematics, then I would suggest starting out with Gödel Escher Bach. Yes, that's a very thick book, no, you should not skim it. The main advantage of GEB is that it actually does explain why and how completeness breaks down under arithmetic, using a real (if rather awkward) implementation of PA. For this reason, it is not an easy read, but it's better than an introductory model theory textbook.

Python cryptography, Rust, and Gentoo

Posted Feb 15, 2021 22:46 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link]

GEB is not an easy read, but it is probably as easy and fun a read as any book that makes a serious pretense of explaining Gödel's Incompleteness Theorem is likely to be.

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 3:29 UTC (Fri) by zev (subscriber, #88455) [Link] (4 responses)

C is the perfect language for programming a PDP-11. It's just that modern computers behave completely differently to a PDP-11. [...] Someone needs to do a "C", and design a new low-level language for programming x64.

I see this kind of thing said a lot, and frankly it's never made the slightest bit of sense to me. What exactly about C itself is remotely PDP-specific? It doesn't strike me as terribly specialized for a PDP or any other particular ISA as it is for the Von Neumann model of computation, which was still pretty ubiquitous last time I checked. If we were all doing dataflow on FPGAs or whatnot, then sure, it'd be a poor fit, but we're still fetching and executing instructions (semantically) one at a time that load and store bytes in memory, pretty much just like Ken and Dennis did on their DECs.

"But modern machines have out-of-order execution and branch prediction and multi-level cache hierarchies!" I've seen some people argue...sure, but the whole point of that kind of microarchitectural sophistication is that it's microarchitectural -- it's not even directly visible at the assembly level, let alone in a high-level language. (Itanium exposed bits of its microarchitecture at the ISA level and look what a raging success that was.)

C's not without its shortcomings, but this notion that it's inappropriate for today's machines because it was initially run on a PDP-11 seems rather silly. Some of those shortcomings:

  • unsafety: if a 1970 PDP had been exposed to the variety of hostile inputs today's internet-connected machines are, this would have been just as much an issue.
  • pointer aliasing: C's challenges are much more entangled with modern compilers than they are with our hardware.
  • lack of abstractions/facilities for "programming in the large": pretty clearly unrelated to the underlying hardware.

None of these seem at all connected to its PDP origins. How would a "language for programming x64" differ?

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 4:21 UTC (Fri) by roc (subscriber, #30627) [Link]

Memory accesses are much slower on modern machines relative to other operations, so it is more important than it used to be to avoid redundant loads and stores. Thus, alias analysis has become more important to optimization, and C compilers more aggressive about exploiting whatever assumptions they can get away with (e.g. type-based alias analysis).

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 11:20 UTC (Fri) by Wol (subscriber, #4433) [Link] (2 responses)

> C's not without its shortcomings, but this notion that it's inappropriate for today's machines because it was initially run on a PDP-11 seems rather silly. Some of those shortcomings:

It's not that it's inappropriate. One of its major failings is that people *think* it's low level, but it doesn't map that well to what modern processors actually DO. In short, we treat it like the low-level language it *was*.

And it's that disconnect between what we think, and what actually happens, that causes all the problems.

Let's take your "unsafety" point, for example. On a PDP-11, I could have easily reasoned about what was ACTUALLY HAPPENING inside the CPU. That's not to say my programming is perfect, but my mental model of reality would have been reasonably close to reality. Nowadays, that's not true AT ALL.

And that's what bites kernel programmers all the time. Especially the noobs, their mental model of what's going on is wildly out of kilter with reality. The compiler takes the code they wrote and massively rewrites it behind their backs. And then the CPU effectively runs the object code in an interpreter I often get the impression ...

That's the point of a low-level language. Imho, if you have well-written code in a low-level language, the compiler SHOULD NOT be able to do that much optimisation. That's not a description of modern C !!!

And therein lies our problem.

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 19:08 UTC (Fri) by khim (subscriber, #9252) [Link] (1 responses)

<font class="QuotedText">&gt; That's the point of a low-level language. Imho, if you have well-written code in a low-level language, the compiler SHOULD NOT be able to do that much optimisation.</font>

<p>If we would use that definition then modern systems don't have <b>any</b> low-level languages. Not even machine code conforms: CPUs with speculative execution may do massive changes to what you wrote in your code!</p>

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 23:13 UTC (Fri) by Wol (subscriber, #4433) [Link]

Didn't I say that the CPU was an object code INTERPRETER? :-)

But if I'm unable to REASON LOGICALLY about what the CPU is going to do, how on earth am I going to get deterministic (ie it does what I want it to do) behaviour from my program?

It's turtles all the way down and logic (and the ability to debug!) has just gone down the plughole with the bathwater ...

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 15, 2021 15:28 UTC (Mon) by anton (subscriber, #25547) [Link] (4 responses)

Interestingly, I have seen repeated claims that current widely-used architectures have been designed for C. While I don't think that's what actually happened in most cases, the claim that C is a bad fit for current architectures is grotesque (although, admittedly, C does not have language features for all the architectural features that architectures have; some are reflected in GNU C extensions, e.g., labels-as-values or vector extensions).

Concerning a new low-level language, yes, we need that, not because C (used as a low-level language) is a bad fit for current architectures, but because the gcc and clang maintainers do not want to support C as a low-level language, and the mindset behind that seems to pervade the C compiler community.

Python cryptography, Rust, and Gentoo

Posted Feb 15, 2021 19:44 UTC (Mon) by Wol (subscriber, #4433) [Link] (3 responses)

Thing is, C *is* a bad fit for modern architectures. It has a whole bunch of features that are undefined, or implementation-defined, which are MEANT to be low-level "match the hardware". Except that they aren't.

Let's just get rid of all these so-called "low level" cock-ups, accept that C is now a high-level language and that undefined and implemetation-specific behaviours shouldn't exist, and move on.

Someone brought up retpolines - that monstrosity that tries to make sure that both the hardware and the software agree on the imaginary hardware interface that's where the CPU microcode and language macrocode meet ... wtf are we doing!

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 16, 2021 9:27 UTC (Tue) by anton (subscriber, #25547) [Link] (2 responses)

I did not mean the language lawyer version of C. That version is a bad fit for any architecture (including the PDP-11). However, it's great for adversarial compiler maintainers who want to do whatever they want (e.g., produce good benchmark results, grudgingly cater to requests by paying customers and tell other users that their bug reports are invalid), because this version allows them to always blame the programmer for something or other. After all, no terminating C program is a "strictly conformant program", and whenever someone mentions "conformant program" (the only other conformance level for program defined in the C standard), the adversaries produce advocacy why we should consider "conformant programs" as meaningless (interestingly, they claim that we should take the rest of the C standard at face value).

I mean C as used in many programs, which has a pretty simple correspondence to architectural features (and you see it easily in contexts where optimization does not set in, e.g., when you separately compile a function that performs just one thing).

The adversaries want us to consider C as a high-level language with no correspondence to the bare metal; that makes it easier to blame the programmers and absolve compiler maintainers of responsibility. The question is why any programmer would want that. We have plenty of high-level languages, often better than C in that capacity, but not that many low-level languages; basically C is the only popular one.

Concerning a totally defined C: I think that is at odds with a low-level language for multiple architectures, but as most (all?) C compilers have demonstrated for the first quarter-century of the language, that's no hindrance for implementing C in a benign rather than adversarial way. And for those who don't know how to do that, I have written a paper (which also explains why I consider totally defined C impractical).

I don't know what retpolines have to do with any of that. They are a workaround for a vulnerability in some microarchitectures and they cannot be implemented in C (there are limitations to C's low-level nature). The vulnerability should be fixed at the microarchitecture level, and I expect that the hardware manufacturers will come out with microarchitectures that do not have this vulnerability.

Python cryptography, Rust, and Gentoo

Posted Feb 16, 2021 13:18 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (1 responses)

> I did not mean the language lawyer version of C.

What version of C are you talking about? The ISO standard? The image of C that you have in your head? My head? The C that GCC 2.95 accepted and worked with?

Let's imagine a world where C compilers magically stop doing "magic optimization" steps that tend to break code. What's going to happen is that C programmers that don't know this stuff already is going to have their code be pessimized and, presumably, slower in practice. What are they going to do? Start writing their C in such a way that the compiler was doing internal transformations during optimization passes anyways. They'll learn C more (and, I imagine, less satisfied with it), hopefully be using linters and tooling to tell them where their NULL checks are inverted with uses and such.

Rereading that, maybe it wouldn't be so bad. Maybe folks would migrate to better languages. Others might actually learn more about how loose C is in practice. The optimization passes could be migrated to the linters rather than the compiler to explain "hey, you could reorder your code to This Way and gain some performance". Maybe these passes would then gain some prose explaining what and why of them.

Then again, I have no idea such a C would be specified at ISO to disallow these optimizations while still allowing for architectures to not be forced into twos-complement representations or the like because "it's faster/easier for them".

Python cryptography, Rust, and Gentoo

Posted Feb 16, 2021 15:28 UTC (Tue) by anton (subscriber, #25547) [Link]

As I wrote: "I mean C as used in many programs", and I actually point to a paper where I explain this in more detail. As for "pessimizing", it's certainly the case that advocates of adversarial C compilers claim that the adversarial behaviour is good for performance, invariably without giving any numbers to support these claims; maybe they think that repeating these claims and wishful thinking makes them true.

Wang et al. checked that for gcc-4.7 and clang-3.1 and found that the adversarial "optimizations" produced a minimal speedup on SPECint 2006, and that speedup could also be achieved by small changes to the source code in two places.

Yes, a performance advisor that points out places where changing the source code may improve performance would be a more productive way to spend both the compiler maintainer's and the compiler user's time than writing "optimizations" that break existing code, "sanitiziers" to find the places where the "optimizations" break it, and, on the programmer's side, "sanitizing" their code to withstand the latest attacks by "optimizers" (but not the next one). Moreover, such an advisor could point out optimizations that a programmer can do but a conformant compiler cannot (e.g., because there is an alias that even language lawyering cannot explain away). Of course, unlike maintained programs, benchmarks would not benefit from such a performance advisor, that's why no work goes into performance advisors; and conversely, "optimizations" don't break benchmarks (the compiler maintainers revert the "optimization" in that case), unlike other programs, and that's why we see "optimizations".

But what's more, by insisting on a very limited interpretation of what C means, the language lawyers remove opportunities for optimizations that programmers can make in the source code. I have discussed this at length.

I, too am skeptical that trying to change the C standard is the way to get rid of adversarial C compilers (not the least because you won't be able to achieve consensus with the implementors of these compilers on the committee), and I guess that's why advocates of adversarial compilers like to direct the blame for the misdeeds of these compilers at the standard, rather than at the compiler maintainers. It's not the standard that requires them to miscompile existing, tested and working, programs, it's the compiler maintainers' choice, so they alone are responsible.

Concerning architectures with other than two's complement representation of signed numbers, the last new such architecture was introduced over 50 years ago, and descendants of such architectures exist only in very few places and run select programs. There are far fewer of these machines than of the architectures (all two's-complement) that are not LLVM targets and that have brought about the parent article. And coding in a way that makes use of knowledge about the representation is one of the things that you can do for performance in a low-level language (and compilers do not perform all of these optimizations).


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds