
Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)


Posted Nov 1, 2023 17:34 UTC (Wed) by farnz (subscriber, #17727)
In reply to: Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack) by pizza
Parent article: Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Because we're already using -Wall -Wextra, and finding that they don't catch all the errors we care about, since many of the interesting UBs are "no diagnostic required", and the compiler doesn't issue a diagnostic even with -Wall -Wextra, whereas the Rust compiler errors out on semantically identical code without "unsafe". This means that UB creeps back into the C++ codebase over time, as developers believe that they're doing something safe, but they've in fact misunderstood C++, and the compiler doesn't stop us making that mistake; instead, it has to be caught at code review time, or by someone running other static analysis tools and determining that, in this case, it's not a false positive, it's a genuine UB.

What makes you think that the strategy of "nerd harder! stop making mistakes when writing C++" will start working?



Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 18:20 UTC (Wed) by pizza (subscriber, #46) [Link] (10 responses)

> Because we're already using -Wall -Wextra, and finding that they don't catch all the errors we care about

Dude, you keep moving the goal posts here.

I thought folks weren't using static analyzers because dealing with their (mostly legit!) voluminous output is too haaaard?

Again, how is "rewrite things in a much less forgiving language" -- which requires _considerably more_ short-medium-term effort -- going to fly?

> What makes you think that the strategy of "nerd harder! stop making mistakes when writing C++" will start working?

...Yet you keep claiming that "Nerd harder! Learn and use entirely new/different/tougher tools!" will work.

Seriously.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 18:37 UTC (Wed) by farnz (subscriber, #17727) [Link] (9 responses)

My goalposts are immobile - they're my lived experience.

My experience is that the voluminous output of static analysers is not mostly legit; it's mostly false positives. But the time taken to establish that they're false positives is huge, because you have to check each and every instance and work out what annotations are needed to convince the analyser that this instance is legitimate, without simply going for the "big hammer" of "trust me, I know what I'm doing" (which is the very thing we've agreed humans are bad at using, because virtually all C++ is "trust me" in this regard). This is why, IME, you get more than one problem per line of code on average when you start applying C++ static analysis tools; and getting people to rewrite all 10M lines of C++ in static analyser friendly C++ is a huge demand to make of them.

Compiler errors are a lot less problematic, because compiler authors take care not to add warnings that have a large number of false positives. But then you have a huge number of false negatives, because the compilers don't error on constructs that are conditionally UB unless they are completely confident that the condition that makes it UB holds true in all cases.

Rewriting things in a much less forgiving language helps, because the static analysis of the less forgiving language has orders of magnitude fewer false positives, and the annotations needed to help the static analyser understand what we're doing are correspondingly simpler to add in the cases where it gets it wrong. On top of that, it's easier to motivate people (social issues, again) to rewrite into a new, "fun" language than it is to get them to drop decades of misconceptions about "safe C++" and start writing static analyser friendly C++.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 19:19 UTC (Wed) by khim (subscriber, #9252) [Link] (8 responses)

> On top of that, it's easier to motivate people (social issues, again) to rewrite into a new, "fun" language than it is to get them to drop decades of misconceptions about "safe C++" and start writing static analyser friendly C++.

You don't really need a new, "fun" language for that, though. You need different community.

Look at what the Ada people did. They first introduced a stricter language, then, gradually, made it part of the core language, and then, eventually, made it work with pointers (by taking ideas from Rust, of course).

They basically implemented the Bjarne Stroustrup plan, if you think about it!

But why have they succeeded where C and C++ couldn't even move a tiny bit? Because the Ada community has always worked with the language, while the C and C++ communities still have a lot of people who preach the “portable assembler” myth.

But to have “a portable assembler” you need something like a primitive early C compiler or an equally primitive Forth compiler, where you can look at the code and predict what kind of assembler every line of code would produce!

If you start adding more and more as-if transformations, you start losing the ability to predict what assembler would be produced from a given source, and after a certain threshold the only approach that works is to work with a formally defined language (like Ada developers do, like Java developers do, like Rust developers do… and like many C and C++ developers don't).

And while this process is gradual, the outcome is inevitable: to reach something safe you need to, somehow, change the community, and the simplest and most robust way to do that is to start with a different community which doesn't try to talk about “a portable assembler”.

Note: we have already passed through a couple (or more?) of such transitions. First, when people like the Real Programmer Mel were replaced with people who used assemblers. Then again, when hardcore real programmers, who felt that jumping from the middle of one procedure into the middle of another was an OK thing to do, were replaced by the next generation. Then there were a couple of failed attempts to do similar transformations with OOP and managed languages, which failed because they imposed too many limitations on the execution environment, which meant they could only occupy certain niches but couldn't replace low-level libraries written in C/C++/FORTRAN (if you want to say that this attempt “hasn't failed”, then recall that literally everything was supposed to be implemented on top of the JVM in one alternate reality, and on top of the CLR in another).

Now we have another attempt. Only time will tell whether something that Rust discovered, almost by accident, will change the computing world as radically as the introduction of structured programming did, or whether this will be another niche solution like OOP and managed languages.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 20:01 UTC (Wed) by vadim (subscriber, #35271) [Link] (7 responses)

> But to have “a portable assembler” you need something like primitive early C compiler or, equally primitive, Forth compiler where you can look on the code and predict what kind of assembler every line of code would produce!

I think you're taking the analogy too far. When I think of "portable assembler" I imagine a compiler that targets the underlying architecture, rather than something like Java, which targets the JVM instead, and reflects the underlying architecture's behaviors.

So in a "portable assembler", "a+b" works out to the architecture's ADD instruction. Optimization is fine, you can do loop unrolling or subexpression elimination, but what you write still works the way the underlying machine does, so for instance, overflow still does whatever overflow does on that CPU.

Treating overflow or null dereferences as UB is a compiler thing, not an architecture thing. The architecture is perfectly capable of doing those things, and in some cases they're actually useful (e.g., in x86 real mode, 0x0 is the address of the interrupt vector table, and the first entry of RAM points to the divide-by-zero handler).

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 20:24 UTC (Wed) by mb (subscriber, #50428) [Link]

>I imagine a compiler that targets the underlying architecture

Such a thing probably doesn't exist and hasn't existed for many decades.
And there are good reasons for that. Among them: the number of required compiler variants would explode immediately otherwise. There are hundreds of target hardware variants in active use.

>Optimization is fine, [..], but what you write still works the way the underlying machine does

The majority of optimization steps don't even know what the "underlying" machine architecture is and they also don't know what source language the intermediate code came from.

>Treating overflow or null dereferences as UB is a compiler thing, not an architecture thing.

It very well is an "architecture thing". Look into the hardware manuals. Look at the ARM manual. You will find that many (probably most) instructions have some kinds of UB scenarios documented. You can't just issue random instructions and expect the instruction stream to have defined behavior. That is not how hardware works.

>The architecture is perfectly capable of doing those things

Yes, but "the architecture" does not define the other properties of a pointer, besides its address. These properties still are UB.

You can't ignore the C machine model, when programming C with an optimizing compiler. The "underlying architecture" doesn't help to resolve the UB in many if not most cases.
Only restricting the language would help. But that would create a language fork.

"Programming for the hardware" and "C is a portable assembler" are fundamentally broken ways to think about C.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 20:50 UTC (Wed) by khim (subscriber, #9252) [Link] (4 responses)

> When I think of "portable assembler" I imagine a compiler that targets the underlying architecture, rather than something like Java, which targets the JVM instead, and reflects the underlying architecture's behaviors.

Yeah, you are thinking about magic, O_PONIES and other such things. Which don't even exist. We have already established that.

> So in a "portable assembler", "a+b" works out to the architecture's ADD instruction.

Precisely. Which means that one couldn't replace it with LEA, right? Or could it?

> Optimization is fine, you can do loop unrolling or subexpression elimination

How would that work? Some programs would stop working if you do that.

Literally everything in C/C++ depends on the lack of UB. I can easily construct an example which would break if you did loop unrolling, but then we would endlessly discuss various things about how to do loop unrolling “properly”, thus let's discuss something simple. For example this function:

void foo(int x) {
  int y = x;
}

This function, “according to me”™, stores the value of x in a slot on the stack called y. And then you can use it from another function (works perfectly, as you can see, at least on clang/gcc/icc, but also on many other compilers if you disable optimizations).

Yet, somehow, most proponents of “portable assembler” rarely accept that interpretation, yet never offer a sensible reason to interpret it any differently.

Maybe you can do better? Just why can this function be converted (by subexpression elimination) to something that doesn't touch that sacred slot on the stack? Any assembler that I know would do that store, after all; why wouldn't a “portable assembler”?

> overflow still does whatever overflow does on that CPU.

That one also makes certain optimizations impossible, but it's not very interesting: unsigned types have always behaved in a predictable fashion, thus most compilers have code to support that mode; it's only a matter of enabling it for signed types, too.

> Treating overflow or null dereferences as UB is a compiler thing, not an architecture thing.

Sure, but these UBs are not very interesting; they even have flags to disable them. Violations of some other UBs (like attempts to access a variable outside of its lifetime, as could be done in early versions of FORTRAN) are much more interesting.

They very quickly lead to the need to pick one of two choices:

  1. Define what your program does in terms entirely unrelated to the machine code generated (and it doesn't matter whether bytecode is involved; C used bytecode before the JVM, after all), or
  2. Disable essentially all optimizations (which existing compilers already support with -O0 or similar switches)

Both alternatives are considered unacceptable by people demanding a “portable assembler”, but there is no other choice, really.

> The architecture is perfectly capable of doing those things, and in some cases they're actually useful (eg, on X86 real mode, 0x0 is the address of the interrupt vector table, and the first byte of RAM points to the divide by zero handler)

And that's why there are switches that enable these things. They are genuinely useful, but they don't change the fact that you are still writing code for a virtual machine which is entirely unrelated to what the hardware actually does (only connected via the language spec).

Once more: you may argue that making that spec too different from actual hardware would be unwise, and I would even agree, but you are still coding for that abstract machine, and not for actual hardware. Heck, the fact that C had a virtual machine in 1983, in real, practical, not academic, projects, while Java only got its start in 1991, tells us something, doesn't it?

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 3, 2023 17:47 UTC (Fri) by vadim (subscriber, #35271) [Link] (3 responses)

> Precisely. Which means that one couldn't replace it with LEA, right? Or could it?

Sure, why not?

> This function, “according to me”™ stores value of x in the slot on stack called y. And then you can use it from another function (works perfectly, as you can see, at least on clang/gcc/icc, but also on many other compilers if you disable optimizations).

> Yet, somehow, most proponents of “portable assembler” rarely accept that interpretation yet never offer a sensible reason to interpret it any differently.

Ah, I see. You're taking the "portable assembler" thing extremely literally. I take it very metaphorically, as an unreachable ideal. Because I think there can't be such a thing as a true "portable assembler". CPUs can be different enough that making a single language that accurately reflects both of them is impossible. That's why actual assembly is non-portable.

I suppose some other term would be less confusing to use, if there's one that fits.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 4, 2023 17:26 UTC (Sat) by khim (subscriber, #9252) [Link] (2 responses)

> I take it very metaphorically, as an unreachable ideal.

But how can code written for “an unreachable ideal” be used to produce something that can be trusted? How do you know what would or wouldn't work?

> I suppose some other term would be less confusing to use, if there's one that fits.

The problem is not with terms; the problem is with expectations. “Portable assembler” usually means “I know what assembler code would be generated and I can use that knowledge”. But the problem is that this makes complicated optimizing compilers neither possible nor feasible: if the optimizer does some transformation which the code writer couldn't imagine, then it's not a “portable assembler” anymore… but how may a compiler writer know which optimizations a compiler user may or may not imagine?

>> Precisely. Which means that one couldn't replace it with LEA, right? Or could it?

> Sure, why not?

Because in normal assembler I can scan the instructions, find that ADD instruction and change it to SUB. Or do something else equally tricky.

> CPUs can be different enough that making a single language that accurately reflects both of them is impossible.

Yes, but which tricks can survive the optimization passes and which tricks you shouldn't attempt?

One approach is for the list of “forbidden tricks” to be precisely the list of UBs. That approach works. Making the list of UBs more intuitive is worthwhile work, and if the developers that make compilers and the developers that use compilers talk to each other, then some consensus is possible (just take a look at the attempts of Rust developers to invent rules which can preserve certain useful programming idioms, with stacked borrows and tree borrows).

The alternate position, “tested and working programs should continue to work as intended by the programmer with future versions of the same C compiler”, is not a position at all; that's just pure wishful thinking.

Any change in the compiler, no matter how small, may break tested and working programs, simply because in C one can take the address of a function, convert it to char* and inspect it. And no, please don't tell me “that's so preposterous, no one would ever do that”. I, personally, wrote some tricky programs which disassembled certain functions and then used that information. And the Windows compiler literally prepares functions for such handling.

This works fine if you talk to the toolchain guys and they know what is happening. But to expect that it would be supported automatically, without any communication effort and without beyond-language-spec agreements, as the “we code for the hardware” folks want? Sorry, that's just not possible.

Because without a formal definition of the language we have no idea which transformations are valid and which are invalid, the only way to provide something resembling a “portable assembler” is to disable all optimizations.

IOW: changing the list of UBs may or may not be useful (it's easy to reduce the number of UBs by simply converting existing ones into a smaller number of even more convoluted ones), but attempts to insist that compilers should preserve the behavior of programs with UB… that leads literally nowhere.

Again: most compilers (with optimizations disabled) produce the same output for this tested and working program, and yet, somehow, anton and others like him ignore that (if they write any answer at all) or (even worse) propose to create a bazillion switches which make UB selectable (without explaining why they do not believe that the -fwrapv option closes the problem of integer overflow, but a bazillion similar switches added to the standard would).

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 4, 2023 19:19 UTC (Sat) by jem (subscriber, #24231) [Link] (1 responses)

The "portable assembler" term comes from its usage as the output format from various compilers. Compilers traditionally (at least on Unix-type operating systems) do not produce an object file directly, but they output an assembly file, which of course is processor specific. By instead producing a C source file as output, and relying on the C compilers on the target machines as a "portable assembler", you don't need to write a separate backend for each of the target machines.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 5, 2023 8:33 UTC (Sun) by khim (subscriber, #9252) [Link]

Maybe some, indeed, call C a “portable assembler” for this reason, but most O_PONIES lovers have a different meaning in mind: “I use C to create machine code only because using assembler is too tedious, thus I should be able to use any tricks that machine code can employ, and that's why ‘portable assembler’ is a good name”.

When faced with “insane” examples which optimizers have always broken (like the ones I cook up), they have many different reactions, but mostly of the “I don't write such insane code, thus I don't care” form.

Which, essentially, reduces the whole thing to the following, completely unconstructive definition: something may be called a “portable assembler” when it includes optimizations that work fine on my programs, and cannot be called a “portable assembler” when it includes optimizations that break my programs.

And as you may guess, that approach doesn't scale: how may compiler writers know which code you consider “insane” and which code you consider “sane” if there is no definition?

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 2, 2023 23:33 UTC (Thu) by anton (subscriber, #25547) [Link]

Several years ago I wrote up a position paper for benign C compilers, which may be what you are thinking of when you write "portable assembler".


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds