Development quote of the week

Posted Dec 3, 2022 17:38 UTC (Sat) by anton (subscriber, #25547)
In reply to: Development quote of the week by khim
Parent article: Development quote of the week

> Unfortunately, doing what Linux is doing is not really feasible, because there are just too many programs which are not even supposed to work with a random C compiler (and their developers don't plan to fix them).

It is just as feasible as for Linux. Programs that are not supposed to work with a particular C compiler are irrelevant to the question of whether a newer version of that compiler breaks a program, just as programs that are not supposed to work on Linux are irrelevant to the question of whether a newer version of Linux breaks a Linux user-space program (and there are probably more programs that work in Linux user space than programs compiled by, say, gcc).

On my paper about backwards compatibility for C compilers:

> I saw that before. Like all O_PONIES proposals, they end up in the trash can (like similar proposals for Linux, which “doesn't break userspace”, lol) because they lack consensus.
> It's not at all clear why your proposals should be accepted rather than a bazillion other proposals, so compiler developers do the sensible thing and just wait for the ISO C and ISO C++ committees to sort all that mess out.

You may have seen it, but you failed to understand it. My paper does not propose a tightening of the C standard. Instead, it tells C compiler maintainers how they can change their compilers without breaking existing, working, tested programs. Such programs may be compiler-specific and architecture-specific (so beyond anything that a standard tries to address), but that's no reason to break them on the next version of the same compiler on the same architecture.

My paper does not tell anyone how to make a compiler-specific program portable between compilers, nor how to make an architecture-specific program portable between architectures. The mistake of efforts like Regehr's Friendly C is that they try to solve these problems, but that is not necessary for a friendly C compiler.

Development quote of the week

Posted Dec 3, 2022 19:14 UTC (Sat) by khim (subscriber, #9252)

> My paper does not propose a tightening of the C standard. Instead, it tells C compiler maintainers how they can change their compilers without breaking existing, working, tested programs.

What's the difference? If your proposal doesn't make it possible to explain what the “well behaving” program is then it's useless for the compiler developers. If it does make it possible to establish that, then it may as well be a proposal for a change to the C standard.

> Such programs may be compiler-specific and architecture-specific (so beyond anything that a standard tries to address), but that's no reason to break them on the next version of the same compiler on the same architecture.

But nobody tries to break any programs. They just follow the rules.

> The mistake of efforts like Regehr's Friendly C is that they try to solve these problems, but that is not necessary for a friendly C compiler.

You haven't proven that in your article. You haven't presented any “friendly C compiler”, you haven't proven it can do auto-vectorization, but, most importantly, you haven't proven that if you create such a compiler you would be able to make anyone happy.

I'm pretty sure that “writers to the hardware” would find a way to get angry even with your compiler, but it's hard to check because there is no compiler to look at.

At least Regehr tried to do something constructive. The only thing you do is say “O_PONIES are possible, trust me” in different words.

It's typical for proponents of O_PONIES: they never bother to explain how exactly their “friendly C compiler” would work, they never explain what they propose to put inside, they just repeatedly assert that it is possible to create a black box of some shape.

That's not very constructive.

Development quote of the week

Posted Dec 4, 2022 18:31 UTC (Sun) by anton (subscriber, #25547)

> If your proposal doesn't make it possible to explain what the “well behaving” program is then it's useless for the compiler developers.

My paper (not proposal) tells compilers not what a "well behaving" program is (that's not at all its point, and it's not relevant for a friendly C compiler), but how a friendly C compiler behaves. That may be useless for the developer of an adversarial C compiler, true.

> [...] you haven't proven that if you create such a compiler you would be able to make anyone happy.

I leave it up to the reader to decide whether a backwards-compatible compiler would make them happier than an adversarial compiler. But the idea of a proof that a certain kind of compiler makes anybody happy is interesting. What methodology would you accept for the proof?

Development quote of the week

Posted Dec 4, 2022 18:57 UTC (Sun) by khim (subscriber, #9252)

> My paper (not proposal) tells compilers not what a "well behaving" program is (that's not at all its point, and it's not relevant for a friendly C compiler), but how a friendly C compiler behaves.

Yup. O_PONIES, O_PONIES, and more O_PONIES.

> That may be useless for the developer of an adversarial C compiler, true.

Inventing derogatory names for people when you are trying to convince them to do something for you is not a very good strategy.

> I leave it up to the reader to decide whether a backwards-compatible compiler would make them happier than an adversarial compiler.

I'm not asking whether someone would think such a compiler would make them happy, but whether it would actually make them happy. These are different things, you know.

It's like the age-old adage: If users are made to understand that the system administrator's job is to make computers run, and not to make them happy, they can, in fact, be made happy most of the time. If users are allowed to believe that the system administrator's job is to make them happy, they can, in fact, never be made happy.

CompCert apparently made a decent shot at what you are demanding; why haven't you become happy with it, and why do you still try to convince the makers of “adversarial” compilers to do something (and do that by calling them childish names and, in general, making sure they won't listen to you)?

It's not as if it's just a matter of making people aware; CompCert is not a new thing.

> What methodology would you accept for the proof?

Easy: people tend to use what they like and don't use what they dislike. In 10 years no significant users of “adversarial” compilers have made that switch. They prefer to complain about unfair treatment yet continue to use gcc and clang.

Even Linus, who famously refuses to entertain the notion of using clang (which is funny if you consider that certain facilities in the kernel are not compatible with GCC), hasn't made the switch. Why do you think that is?

Development quote of the week

Posted Dec 6, 2022 18:51 UTC (Tue) by anton (subscriber, #25547)

John Regehr wrote: "A sufficiently advanced compiler is indistinguishable from an adversary." I don't agree that this is an "advance", but if compiler maintainers take the attitude that you are advocating here, the compilers are certainly going to become more adversarial the more sophisticated they get.

I find it funny that someone who writes "O_PONIES" as frequently as you do is complaining about supposedly derogatory names.

Your CompCert link does not mention anything that sounds like what I describe. Instead, the headline feature is formal verification of the compiler. CompCert's description of the supported C dialect also makes no mention of any such ambitions.

As for why people have not made "the switch": the switch to what? CompCert, a research project that has few targets, does not fully support setjmp() and longjmp(), does not even talk about anything related to the issue we have been discussing here, and has deviations from the standard ABI of the platforms it supports?

GCC and Clang are apparently not adversarial enough for that; their approach seems to be to try to stay backwards-compatible by testing with a lot of real-world code out there (which is good), and to unleash the adversarial attitude mainly when reacting to bug reports (not good). Also, the available C language flags (like -fwrapv) cover the most common issues, and the remaining cases have not been painful enough to make people switch to a different C compiler (which one?).
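For readers who haven't met the flag: signed integer overflow is undefined behavior in C, so GCC and Clang may assume it never happens and fold a check like `a + 1 < a` to false; `-fwrapv` tells them to treat signed arithmetic as wrapping instead. A minimal sketch (the function names are made up for illustration) of the naive check next to one that never overflows and therefore behaves the same with or without the flag:

```c
#include <limits.h>
#include <stdbool.h>

/* Relies on wraparound. Without -fwrapv, a + 1 overflows (undefined
   behavior) when a == INT_MAX, so the compiler may turn this function
   into "return false". */
bool increment_overflows_naive(int a)
{
    return a + 1 < a;
}

/* Checks the operands before adding, so no overflow ever happens and the
   result does not depend on compiler flags or optimization level. */
bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* a + b would exceed INT_MAX */
    if (b < 0)
        return a < INT_MIN - b;   /* a + b would go below INT_MIN */
    return false;
}
```

GCC and Clang also offer `__builtin_add_overflow` for the same job, and C23 standardizes `ckd_add` in `<stdckdint.h>`.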

Switching to a language with a friendlier compiler-maintainer attitude is a big job, and is not done easily. However, starting a new project is a good time to switch programming languages; now we just need a way to count how many new projects use C as their primary language today, compared to, say, 10 years ago.

Development quote of the week

Posted Dec 6, 2022 19:26 UTC (Tue) by khim (subscriber, #9252)

> I find it funny that someone who writes "O_PONIES" as frequently as you do is complaining about supposedly derogatory names.

These are fine if you don't plan to ask someone to do something for you. And I didn't invent it. It was, basically, invented on LKML precisely when people were discussing applications that expected specific semantics which were never guaranteed or promised, and which new versions of the Linux kernel stopped providing. So much for 100% backward compatibility being a panacea for everything.

As you can guess, the end result was precisely like with C compilers: there was much anguish and lots of discussion, but in the end it was declared that, since these guarantees were never there and the code just happened to work by accident, app developers would have to rewrite their code if they wanted these guarantees.

> CompCert, a research project that has few targets, does not fully support setjmp() and longjmp(), does not even talk about anything related to the issue we have been discussing here, and has deviations from the standard ABI of the platforms it supports?

So now you want full compliance with everything, too? Even more O_PONIES.

> now we just need a way to count how many new projects use C as their primary language today, compared to, say, 10 years ago.

Obviously that number would go down. C, basically, refused to advance when other languages did. C18 is very similar to C90 and almost indistinguishable from C99. I don't think it would be interesting to look at that: C was slowly turning into COBOL even without any tales of adversarial compilers.

More interesting would be the fate of C++. Use of C++ has been growing, not shrinking, recently. It would be interesting to see what happens to it.

Development quote of the week

Posted Dec 4, 2022 0:30 UTC (Sun) by mathstuf (subscriber, #69389)

> My paper does not tell anyone how to make a compiler-specific program portable between compilers, nor how to make an architecture-specific program portable between architectures. The mistake of efforts like Regehr's Friendly C is that they try to solve these problems, but that is not necessary for a friendly C compiler.

This basically leads to an "IBSan" tool that detects implementation-defined behavior and signals on things that used to be UB but are now architecture-dependent. Portability is a benefit of code, and if I know that my x86-compiled code is UB-free, it'll have the same behavior (but certainly not the same performance profile) on ShinyNewArch that gets released a decade from now. I really don't want to have to go to every project and make sure that they CI-test my pet arch so that I don't have live grenades lobbed my way on every update. I expect that Debian and NetBSD porters to obscure architectures appreciate that breaking these rules is just as "bad" on the "native" development platform(s) as it is on their target(s).
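As an illustration of the kind of construct such a hypothetical "IBSan" would flag: in ISO C (C17), right-shifting a negative signed integer gives an implementation-defined result, even though most compilers implement it as an arithmetic shift. A small sketch (the helper name is invented here) that gets arithmetic-shift semantics using only operations whose results the standard fully defines:

```c
/* Arithmetic right shift (rounds toward negative infinity, i.e. computes
   floor(x / 2^n)) without relying on implementation-defined behavior.
   Precondition: n is less than the width of int in bits. */
int shift_right_arith(int x, unsigned n)
{
    if (x >= 0)
        return x >> n;   /* well-defined for non-negative values */

    /* For negative x, -1 - x lies in [0, INT_MAX], so it cannot overflow
       and shifting it is well-defined; mapping back gives the floor. */
    return -1 - ((-1 - x) >> n);
}
```

Compilers for two's complement targets will typically recognize this pattern, but that is an optimization detail, not something the code depends on.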

Now, if there were an in-language mechanism (no, the preprocessor doesn't count) to say "this is targeting x86 because we're talking to an IME, give me native behavior", *then* I could see there being some new "undefined-if-portable behavior" bucket for these kinds of things to go into.

Development quote of the week

Posted Dec 4, 2022 17:46 UTC (Sun) by anton (subscriber, #25547)

Portability is valuable in many settings, and I think I am quite experienced in the area, with Gforth (which "breaks the rules" (in your terminology) a lot) usually working out of the box on new architectures and operating systems.

But my point is that a friendly compiler must also work for non-portable code: If it works with one version of the compiler on a particular platform, it must also work with a later version of that compiler on that platform.

Portability is an orthogonal requirement. Your hypothetical "IBSan" tool may be helpful, although I have my doubts, see below. In practice I test for portability by making test runs on as many different platforms as I can get my hands on. That's not 100% reliable, but it tends to work quite well.

I have my doubts about "IBSan" because it assumes one binary that should cover all portability variants. Real-world portable C programs often have lots of conditional compilation and stuff coming from configure to help with portability. If you write "the preprocessor doesn't count", it's obvious that you are not interested in C as it is used in the real world.

Development quote of the week

Posted Dec 4, 2022 18:27 UTC (Sun) by khim (subscriber, #9252)

> But my point is that a friendly compiler must also work for non-portable code: If it works with one version of the compiler on a particular platform, it must also work with a later version of that compiler on that platform.

Well, none of C, C++, or Rust is even trying to be “friendly” by that definition (here's a recent example where Rust 1.65 doesn't accept source that Rust 1.64 accepted). That's fine with Rust users, yet apparently not fine with a small (but very vocal) group of C users.

That's basically why C and C++ are doomed: in their world, compiler users and compiler developers each talk in ultimatums that the other side is not willing to accept, which means the conflict can never be resolved.

I have seen so much talk about “friendly C” (O_PONIES, really) from C users, but I don't know a single optimizing compiler developer who subscribes to that idea.

Development quote of the week

Posted Dec 4, 2022 20:45 UTC (Sun) by pizza (subscriber, #46)

> Well, none of C, C++, or Rust is even trying to be “friendly” by that definition (here's a recent example where Rust 1.65 doesn't accept source that Rust 1.64 accepted). That's fine with Rust users, yet apparently not fine with a small (but very vocal) group of C users.

That should read -- "that's apparently fine with the current Rust users".

C and C++ have several orders of magnitude more users than Rust. And those users (and compiler writers, and language stewards) are all trying to pull in their own, often incompatible, directions, collectively with literally billions of lines of code/baggage.

Rust, by virtue of being rather youthful, doesn't yet have a significant mass of users or use cases. There is only one implementation, produced by the same folks who define the language, and most of the users are still of the True Believer sort. All of this will inevitably change, and when it does, the needs of these various sub-groups will inevitably begin to diverge, and then the current "our way or the highway" language+implementation stewardship model will start failing.

If Rust does eventually succeed (ie ends up as a "legacy" language with many hundreds of millions of lines of code in wide deployment across tens of thousands (if not more) of organizations with divergent needs spanning a couple decades or so) then continuing to evolve it will run into many of the same sorts of problems that C and C++ face today -- ie problems of politics and governance.

I don't have any skin in this particular game, but I've been around long enough to see certain patterns, including the "we're smarter than those other guys so we'll be immune to their problems" hubris that *always* comes back to bite.

Development quote of the week

Posted Dec 4, 2022 21:53 UTC (Sun) by khim (subscriber, #9252)

> C and C++ have several orders of magnitude more users than Rust.

I don't think so. There is certainly a lot more existing code in C and C++, because they had a head start of several decades. As for the number of actual users, it's hard to say for sure, but recent counts put Rust at around half of Go or a third of Kotlin (and about ten times less popular than JavaScript, which, you must admit, is definitely more popular than C, C++, or Rust).

> Rust, by virtue of being rather youthful, doesn't yet have a significant mass of users or use cases.

No. The important thing is not the fact that Rust is youthful, but the fact that Rust users are youthful. The fiasco that happened with C and C++ is mostly caused by old people who still remember the times when it was possible to pretend that C is a “portable assembler”, to “program to the hardware”, and to expect that the compiler wouldn't screw you.

I have dealt with quite a few new grads, and they accept the strange and bizarre rules of standard C/C++ without much complaint. For them it's just how this weird language works. Strange rules, but hey, rules are rules. And the same happens with Rust.

But in C, very often, they have to deal with these old “relax, I know what I'm doing, I'm older than C, I know how it works” guys, while in Rust these guys, as I have said, are expelled from the community instead.

I don't think this will change. Even if the number of Rust developers is not ⅓ of the number of C/C++ developers but closer to ⅒, it's pretty obvious that a C/C++-style disaster won't happen to Rust.

Planck's principle in action: "An important scientific innovation rarely makes its way by gradually winning over and converting its opponents: it rarely happens that Saul becomes Paul. What does happen is that its opponents gradually die out, and that the growing generation is familiarized with the ideas from the beginning."

I met some really old software guys when I was in college, and what I observe today reminds me of their tales about how structured programming arrived: the exact same refusal to accept a new idea, the insistence that “proper” design uses flowcharts on A1 (or A0, for complex cases) paper, that all these newfangled things like stacks or loops just make development difficult, and so on and so forth.

The only big question is whether this time Rust (and Rust-like languages) will actually win, or whether history will repeat itself the way it did when properly structured languages like Algol or Pascal had their initial success and then a half-baked newcomer came and took over (as C and C++ did).

Time will tell.

> I don't have any skin in this particular game, but I've been around long enough to see certain patterns, including the "we're smarter than those other guys so we'll be immune to their problems" hubris that *always* comes back to bite.

Nah. I don't think there's any chance of Rust making the same mistakes C and C++ did, but it certainly can make entirely new ones.

E.g., its approach to async programming… I'm still not convinced it's the right one and that it won't lead to a dead end.

It's a tale as old as time.

Posted Dec 5, 2022 16:09 UTC (Mon) by smoogen (subscriber, #97)

Most of these threads seem to mirror conversations I remember from the late 1980s, when obfuscated C programs were big and many of the people who are now old argued the same things about how the compiler should OR should not have allowed it. It also mirrors the arguments between the 1978 K&R C and the 1988 version. The fact that many compilers allowed some middle road between '78 and '88 until the early 2000s just kept the 'what does C mean?' arguments going even longer.

Development quote of the week

Posted Dec 5, 2022 12:02 UTC (Mon) by farnz (subscriber, #17727)

Given "If it works with one version of the compiler on a particular platform, it must also work with a later version of that compiler on that platform", what stops a compiler team deciding that supporting the behaviours of the previous version is too hard, and thus they're going to release a new compiler with the same CLI, that's not a "new version of the existing compiler"?

This is not a pure hypothetical - GCC 3 is not merely a "new version of GCC", it's a new compiler (the egcs GCC fork) that was adopted by GCC as the clear better outcome. If you set a rule like your proposed rule, what stops GCC21 being a new compiler, not version 21 of GCC?

Development quote of the week

Posted Dec 6, 2022 22:46 UTC (Tue) by anton (subscriber, #25547)

> what stops a compiler team deciding that supporting the behaviours of the previous version is too hard, and thus they're going to release a new compiler with the same CLI, that's not a "new version of the existing compiler"?

Self-respect.

But actually that's somewhat the situation we have with gcc (and probably clang) now, only the maintainers don't say explicitly that their compilers are not backwards-compatible (they certainly have declared bug reports invalid that clearly state that the code worked with earlier gcc versions), so some people think of switching to a newer version of gcc as an upgrade. It's not.

Even when starting with the same code base, a compiler can be backwards-incompatible (as demonstrated by some gcc versions newer than 3), and with a different code base it can be compatible (but that's hard). And actually, egcs was forked from the pre-gcc-2.8 code base.

Development quote of the week

Posted Dec 6, 2022 3:26 UTC (Tue) by mathstuf (subscriber, #69389)

> If you write "the preprocessor doesn't count", it's obvious that you are not interested in C as it is used in the real world.

I'm interested in *improving* things so that the compiler can *see* "this code is x86-bound, feel free to optimize appropriately" with proper attributes rather than code-masking performed by the preprocessor. Carrying "this code was selected based on a check of `defined(__x86_64__)`" through to the compiler is unlikely to be tenable, given how complicated some preprocessor checks are (and *their abstractions* used in various libraries).

Development quote of the week

Posted Dec 6, 2022 22:55 UTC (Tue) by anton (subscriber, #25547)

There are some programs that don't need configure or the like to be portable, but many use a lot of stuff coming out of configure, and I am very pessimistic that we can get rid of conditional compilation.

When you write "this code is x86-bound, feel free to optimize appropriately", what optimization do you have in mind?

Development quote of the week

Posted Dec 6, 2022 23:03 UTC (Tue) by mathstuf (subscriber, #69389)

> When you write "this code is x86-bound, feel free to optimize appropriately", what optimization do you have in mind?

I'm thinking that the optimizers can assume specific behavior for things instead of considering it UB. For example, left shift by too much can keep the same value (IIRC, ARM makes it 0). The programmer *intent* that this is target-specific is what is important here. Bare C code doing such a shift is still in the "this doesn't mean what you think it means, so I will assume that such Bad Things™ don't happen" category.
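Absent such an intent marker, the way to get one result on every target in today's C is to spell out the out-of-range case yourself. A minimal sketch (the helper name is made up for illustration) that pins a too-large left shift of a 32-bit value to 0 instead of leaving it undefined:

```c
#include <stdint.h>

/* x << n is undefined in C when n >= 32 for a 32-bit operand, so targets
   and optimization levels may disagree about the result. This helper makes
   the out-of-range case explicit and returns 0 for it on every target. */
uint32_t shl32_or_zero(uint32_t x, unsigned n)
{
    return n < 32 ? x << n : 0;
}
```

Choosing 0 here is arbitrary; returning `x` unchanged (the x86-like choice) would be just as well-defined, as long as the code says which one it wants.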

> There are some programs that don't need configure or the like to be portable, but many use a lot of stuff coming out of configure, and I am very pessimistic that we can get rid of conditional compilation.

I also think that conditional compilation is here to stay. However, it doesn't have to be a code-blind copy/paste mechanism. With `constexpr` instead of preprocessor symbols, it is possible to have something like D's `static if` or Rust's `cfg!()` mechanisms to hide code during compilation. This allows it to still be syntax-checked and formatted appropriately instead of being a wild west of sadness when some long-dormant branch with unbalanced curly braces finally gets activated.
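Plain C already supports a partial version of this: let the preprocessor only define a constant, and make the choice with an ordinary `if`, so both branches are parsed and type-checked on every target and the optimizer normally discards the dead one (the Linux kernel's `IS_ENABLED()` macro is used this way). A sketch under stated assumptions: `TARGET_HAS_FAST_POPCOUNT` is a made-up name, and `__builtin_popcount` is a GCC/Clang extension, so this assumes one of those compilers.

```c
#include <stdint.h>

/* The preprocessor still detects the target, but it only defines a
   constant. TARGET_HAS_FAST_POPCOUNT is a hypothetical name; GCC and
   Clang define __POPCNT__ when the POPCNT instruction is enabled. */
#if defined(__x86_64__) && defined(__POPCNT__)
enum { TARGET_HAS_FAST_POPCOUNT = 1 };
#else
enum { TARGET_HAS_FAST_POPCOUNT = 0 };
#endif

unsigned count_bits(uint32_t x)
{
    /* An ordinary if on a constant: both branches must compile on every
       target, and the branch not taken is normally removed as dead code. */
    if (TARGET_HAS_FAST_POPCOUNT)
        return (unsigned)__builtin_popcount(x);

    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

A typo in either branch is caught on every build, which is the property a code-blind `#ifdef` lacks.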

Development quote of the week

Posted Dec 7, 2022 0:29 UTC (Wed) by khim (subscriber, #9252)

> For example, left shift by too much can keep the same value (IIRC, ARM makes it 0).

It's a bit worse than that. ARM uses the low byte of the count register for the shift, which means that a shift by 128 is, indeed, zero, but a shift by 256 doesn't change anything (and doesn't touch the flags).
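To make the difference concrete, here is a sketch that models, in well-defined C, what the hardware does for a 32-bit left shift by a register count: the AArch32 low-byte behavior just described, and the masking of the count to 5 bits that x86 processors after the 8086 perform. The helper names are hypothetical, and these are models of instruction behavior only, not what C guarantees for `x << n`.

```c
#include <stdint.h>

/* Models of hardware shift-by-register semantics for a 32-bit operand.
   Neither is what a C compiler guarantees for x << n, which is undefined
   for n >= 32. */

/* x86 (after the 8086): the count is masked to its low 5 bits,
   so a shift by 32 leaves the value unchanged. */
uint32_t shl32_x86_model(uint32_t x, uint32_t n)
{
    return x << (n & 31u);
}

/* AArch32 LSL by register: only the low byte of the count is used,
   and any count from 32 to 255 shifts everything out. */
uint32_t shl32_arm32_model(uint32_t x, uint32_t n)
{
    uint32_t amount = n & 0xFFu;
    return amount < 32 ? x << amount : 0;
}
```

For a shift of 1 by 32 computed at run time, the two models give 1 and 0 respectively; divergence like this is one commonly cited reason C leaves the result undefined instead of picking one answer.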

Development quote of the week

Posted Dec 7, 2022 0:54 UTC (Wed) by khim (subscriber, #9252)

> For example, left shift by too much can keep the same value (IIRC, ARM makes it 0).

Note, BTW, that the very first x86 CPU, the 8086 (and 8088), behaves like ARM, not like all subsequent x86 CPUs.

That means Intel took advantage of this UB back when it was developing the 80186, forty years ago.

ARM also has similar cases: e.g., it has push and pop instructions which may push or pop from 1 to 16 registers as the result of one instruction. If you specify 0 registers, then some manufacturers treat it as a NOP, some treat it as UD, but it's also permitted to load a random set of registers from the stack, including the program counter!

So much for predictable hardware, huh? In fact, the document called "ARMv8 AArch32 UNPREDICTABLE behaviours" lists more than 50 of these.