
Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 14:25 UTC (Wed) by pizza (subscriber, #46)
In reply to: Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack) by farnz
Parent article: Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

> and am describing how the sheer volume of warnings they produce is enough that you might as well rewrite from scratch in a different language rather than rewrite in the "approved" subset of C++.

There's no silver bullet to avoid doing tons of work; all you can do is expend large quantities of lead:

https://techcrunch.com/2011/10/25/lead-bullets

Your codebase has a f-ton of warnings? Nothing to be done except start fixing them. [1] After all, how can you expect to *successfully* rewrite the code in a different language if what it's supposed to be doing is already unclear? [2]

Rewrite-everything efforts rarely succeed in the real world. You need an incremental path.

As the saying goes, the way to eat an elephant is one bite at a time.

[1] While also changing your code acceptance policies to reject any commit requests that don't compile cleanly; that way you get incremental improvement over time.
[2] The traditional answer to this is "refer to the specs/documentation" [3]
[3] Which can be found in storage next to the spherical cows and O_PONIES



Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 15:10 UTC (Wed) by farnz (subscriber, #17727) [Link] (40 responses)

In practice, we know what the code is "meant" to do, it's just that the programmer is unclear about how to express that in fully-defined C++. Rewriting incrementally in Rust, with bindings between C++ and Rust so that we can avoid rewriting everything, means that we focus on the bits that are actually changing rapidly, and leave the thoroughly tested code for later.

End result tends to be a multi-language project - you still have the ancient FORTRAN code in there, because it works, and therefore doesn't need changing, along with some C, plenty of C++, and some Rust. Over time, the amount of legacy code in there falls because we get rid of the code that no longer meets requirements, but it's unlikely to go to zero (just as the FORTRAN hasn't gone completely).

This does depend on a decent FFI between the new language and the old; Rust has cxx.rs for C++ FFI, and the pair of cbindgen and bindgen for C FFI, which enables you to rewrite incrementally, only changing small amounts of code at a time. Given that I still work on codebases where there's comments about wanting to redo things once ANSI FORTRAN is available (and where the code was last modified in the 1960s), I suspect I'll never see a world where I have no legacy code at all.

But you can replace incrementally with a new language, reducing the amount of legacy code you have, as long as your new language allows for this (e.g. C# and Java do not if your legacy codebase includes FORTRAN).

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 16:24 UTC (Wed) by pizza (subscriber, #46) [Link] (39 responses)

> In practice, we know what the code is "meant" to do, it's just that the programmer is unclear about how to express that in fully-defined C++

uh.... no.

How are we to know what the code is "meant" to do when the programmer is unclear in their expression?

(If the code was sufficient to determine what was "meant" by the programmer, compilers could JustDoTheRightThing(tm) as a matter of course and this entire discussion would be moot)

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 16:47 UTC (Wed) by farnz (subscriber, #17727) [Link] (38 responses)

Because we, as human programmers, are able to read the C++, notice the lack of clarity and then take actions: talk to the original programmer (if still available), talk to the current product managers, even sometimes talk to customers; in general, we can go outside the code to determine what it's meant to do.

And it's all those things we can do that the compiler can't do that allow us to determine what's meant. Compilers don't go "hey, this is ambiguous, but only one meaning makes sense in the context of the product, so I'll take that meaning"; humans can do that.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 17:06 UTC (Wed) by pizza (subscriber, #46) [Link] (37 responses)

> Because we, as human programmers, are able to read the C++, notice the lack of clarity and then take actions: talk to the original programmer (if still available), talk to the current product managers, even sometimes talk to customers; in general, we can go outside the code to determine what it's meant to do.

In other words, the software, as written, is under-specified.

In order to successfully port software that contains said underspecified behavior, you're going to need to resolve those ambiguities. So why not resolve those ambiguities back into the original codebase?

If folks won't go through the effort of using even the most basic code-improvement tools they've had for years (eg turning on -Wall -Wextra and taking one bite out of the elephant at a time) what makes you think trying the much harder task of rewriting things in a completely different language is going to gain more traction?

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 17:18 UTC (Wed) by mb (subscriber, #50428) [Link] (13 responses)

>what makes you think trying the much harder task of rewriting things
>in a completely different language is going to gain more traction?

If the maintenance pain or public pressure is big enough, people do start to rewrite things.

We do already have many re-implementations in safe languages. That is not going to stop.
https://crates.io/crates/rustls

In cases where it doesn't matter, nobody is going to rewrite things, of course. And that is fine.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 18:28 UTC (Wed) by pizza (subscriber, #46) [Link] (12 responses)

> We do already have many re-implementations in safe languages. That is not going to stop.
> https://crates.io/crates/rustls

Funny, I can't find equivalents to my current employer's >100KLOC of bare-metal C, or my previous employer's >10MLOC of domain-specific C++ there.

Similarly, none of the code that only exists in the various >20yr-old F/OSS projects I'm involved with can be found there.

...where exactly are all of these magical reimplementations supposed to come from, again?

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 18:51 UTC (Wed) by mb (subscriber, #50428) [Link] (11 responses)

Nice try, but nobody said that.

I wrote:
> In cases where it doesn't matter, nobody is going to rewrite things, of course. And that is fine.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 19:01 UTC (Wed) by pizza (subscriber, #46) [Link] (10 responses)

> In cases where it doesn't matter, nobody is going to rewrite things, of course. And that is fine.

It doesn't matter.... until it does, and xkcd #2347 is demonstrated all over again in the form of yet another vulnerability-with-its-own-cutesy-domain.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 19:13 UTC (Wed) by mb (subscriber, #50428) [Link] (9 responses)

>It doesn't matter.... until it does

If you keep bombing your customers with security bugs in your application, sooner or later they will force you to rewrite things by cancelling the contract and buying something else.
Either that, or if your customer doesn't care, then it doesn't matter. Why rewrite it then?

People have learnt in the past decades that software always has vulnerabilities. They have learnt to live with that.
*But* people are currently learning that this is not written in stone. Software can have fewer or almost no vulnerabilities, if done right. That creates pressure to do it right, or to leave and let somebody else do it right.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 19:42 UTC (Wed) by pizza (subscriber, #46) [Link] (8 responses)

> That creates pressure to do it right or to leave and let somebody else do it right.

You're forgetting the third rail -- the cost of doing it right.

Give folks a choice, and Every. Single. Time. they will choose the cheaper-in-the-short-term option, even if you make it explicit that this will come back to bite them later.

If you bid the project doing things the "right way" you'll lose out to the (much) lower bidder who will cut as many corners as they can get away with.

So yeah, it's absolutely a culture problem. But it's the culture of the folks who control the budgets, not the minions on the ground trying to do the best they can with the shit sandwich they're stuck working with.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 22:56 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

It is my hope that the efforts underway at the US federal and EU levels along the lines of "you're responsible for the bugs you ship" start to electrify that third rail against low bidders who hope that they can just ignore it.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 2, 2023 12:05 UTC (Thu) by smurf (subscriber, #17840) [Link] (6 responses)

> Give folks a choice, and Every. Single. Time. they will choose the cheaper-in-the-short-term option

Surprise: as your code's complexity increases, there's going to be a point where starting off with a language that excludes an entire class of errors is cheaper, even if you factor in the time required to learn the new language and/or paradigm, and even if you have to go and rewrite or adapt entire libraries to the new paradigm/language.

There's ample real-world evidence for this. The Linux kernel isn't gaining the ability to support drivers written in Rust (and ultimately more) just because people have nothing better to do.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 2, 2023 12:20 UTC (Thu) by pizza (subscriber, #46) [Link] (5 responses)

> Surprise: as your code's complexity increases, there's going to be a point where starting off with a language that excludes an entire class of errors is cheaper,

I disagree, because you're conflating two separate points.

When *starting off* with something new, it makes complete sense to use Rust.

But we're not talking about something new here, we're talking about large existing codebases.

> There's ample real-world evidence for this.

...Is there, really? I see a lot of *complete replacements* of simpler stuff being rewritten, driven more by culture/licensing than technical merit (not unlike every other language ecosystem out there) but very little use within _existing_ codebases.

> The Linux kernel isn't gaining the ability to support drivers written in Rust (and ultimately more) just because people have nothing better to do.

I don't know about that -- From the outside, most research efforts are indistinguishable from "nothing better to do"

But more seriously, I'm not sure the Linux kernel is a good example of _anything_ any more. According to modern sensibilities, everything about it is wrong or impossible, including its development model (email? how quaint) and indeed its continued existence.

(The goal of the Rust-on-Linux folks is to eventually move *everything* to Rust, via the kernel-of-Theseus approach. Quite frankly nothing else makes sense, and anything else represents a massive duplication of effort and a much increased support burden. I don't have a problem with this, but I do wish they'd be more honest about it..)

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 2, 2023 12:59 UTC (Thu) by farnz (subscriber, #17727) [Link]

From inside the corporate firewall, I see bindings from existing C++ to Rust being used to allow you to replace C++ over time, in a ship-of-Theseus approach. Most of the code is still C++, but new features are done in Rust, and the C++ code is gradually being eroded from the top down; where the "surface" layer of C++ is buggy, it's replaced with a new Rust module, rather than with new C++.

Over time, this is having two good effects:

  1. The remaining C++ is the stuff that's been proven in the field, and is focused on by the best programmers in the company, because it forms the critical layers that underpin everything.
  2. The pace of feature development is increasing, because we're spending less time on "impossible" bugs in new features and bug fixes that got missed in code review.

And I don't expect us to ever end up with nothing but Rust - heck, there's FORTRAN 77 in there still, with documentation that says that we'll do something better when ANSI FORTRAN is available to replace FORTRAN IV - but over time, we will lose the buggy C++ that bites us every so often in the field.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 2, 2023 13:12 UTC (Thu) by Wol (subscriber, #4433) [Link]

> ...Is there, really? I see a lot of *complete replacements* of simpler stuff being rewritten, driven more by culture/licensing than technical merit (not unlike every other language ecosystem out there) but very little use within _existing_ codebases.

Yup.

And quite often the result is disaster. In the Pick world there are far too many horror-stories of management-driven migrations away, only for management to discover that the replacement is far less functional, the costs are ballooning, the customer base may be fleeing as their stuff stops working, and management "doubles down" on the sunk cost fallacy into bankruptcy.

Marketing people (who don't understand what they are talking about) sell silver bullets to management (who don't understand what they are talking about), and the technical people are left trying to pull rabbits out of hats. If you're talking about governments throwing other people's money at it, that *sometimes* works - sometimes the tech people just can't do the impossible - but for private businesses sheer survival (or, more often, failure to survive) determines the end result.

Cheers,
Wol

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 2, 2023 14:37 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (1 responses)

> But more seriously, I'm not sure the Linux kernel is a good example of _anything_ any more. According to modern sensibilities, everything about it is wrong or impossible, including its development model (email? how quaint) and indeed its continued existence.

The scale of coordination and development is certainly worth investigating and replicating. Far smaller projects with far fewer people involved aren't as smoothly run.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 2, 2023 15:24 UTC (Thu) by Wol (subscriber, #4433) [Link]

I think one MASSIVE reason behind the (continued) success of Linux is Linus. As a psychologist, he's been a pro at managing his workforce.

Most successful projects have one or two stars behind them, if we're lucky they continue to soar once the stars are gone ...

Cheers,
Wol

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 2, 2023 15:47 UTC (Thu) by kleptog (subscriber, #1183) [Link]

> ...Is there, really? I see a lot of *complete replacements* of simpler stuff being rewritten, driven more by culture/licensing than technical merit (not unlike every other language ecosystem out there) but very little use within _existing_ codebases.

Like others mentioned, I disagree. I've seen an uptick in Rust usage in places where people need something faster than Python, but don't want to wade into the swamp that is C++. The bits that are still C++ are the bits that work fine and don't need to be modified. Interestingly, it's not even the safety that attracts people, it's the module system.

If you want something not in the standard library, in C++ you need to install header files and libraries, probably via packages from the host OS, create Makefiles or whatever. Or learn something like CMake. Then you need to convince your buildbot to also do all those things in a reliable way.

Or you create a Cargo.toml file which lists all your dependencies and it Just Works(tm). The safety aspects of Rust just mean that they'll never have to learn how to become intimate with gdb to figure out where this invalid pointer came from. And my experience with junior developers is that by learning up front to think about object lifetimes they produce significantly better code.
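(For illustration, a hypothetical Cargo.toml of the sort described above; the project name and dependencies are made up:)

[package]
name = "faster-than-python"  # hypothetical project name
version = "0.1.0"
edition = "2021"

[dependencies]
# cargo fetches and builds these on every machine, including the buildbot;
# no system packages, hand-written Makefiles, or CMake required
serde = "1"
rayon = "1"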

I suppose they could also have chosen Go, but they didn't.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 17:34 UTC (Wed) by farnz (subscriber, #17727) [Link] (11 responses)

Because we're already using -Wall -Wextra, and finding that they don't catch all the errors we care about, since many of the interesting UBs are "no diagnostic required", and the compiler doesn't issue a diagnostic even with -Wall -Wextra, whereas the Rust compiler errors out on semantically identical code without "unsafe". This means that UB creeps back into the C++ codebase over time, as developers believe that they're doing something safe, but they've in fact misunderstood C++, and the compiler doesn't stop us making that mistake; instead, it has to be caught at code review time, or by someone running other static analysis tools and determining that, in this case, it's not a false positive, it's a genuine UB.
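(To make "no diagnostic required" concrete, a minimal sketch of my own, not farnz's example: gcc and clang accept this silently even with -Wall -Wextra, because the shift amount is not a compile-time constant.)

/* Compiles cleanly with -Wall -Wextra: no diagnostic required. */
int shift_left(int x, int n) {
  /* UB whenever n < 0 or n >= the bit width of int. On x86 the CPU
     happens to mask n to 5 bits, other targets do something else, and
     the optimizer is free to assume the UB case never happens at all. */
  return x << n;
}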

What makes you think that the strategy of "nerd harder! stop making mistakes when writing C++" will start working?

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 18:20 UTC (Wed) by pizza (subscriber, #46) [Link] (10 responses)

> Because we're already using -Wall -Wextra, and finding that they don't catch all the errors we care about

Dude, you keep moving the goal posts here.

I thought folks weren't using static analyzers because dealing with their (mostly legit!) voluminous output is too haaaard?

Again, how is "rewrite things in a much less forgiving language" -- which requires _considerably more_ short-medium-term effort -- going to fly?

> What makes you think that the strategy of "nerd harder! stop making mistakes when writing C++" will start working?

...Yet you keep claiming that "Nerd harder! Learn and use entirely new/different/tougher tools!" will work.

Seriously.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 18:37 UTC (Wed) by farnz (subscriber, #17727) [Link] (9 responses)

My goalposts are immobile - they're my lived experience.

My experience is the voluminous output of static analysers is not mostly legit - it's mostly false positives. But the time taken to establish that they're false positives is huge, because you have to check each and every instance and work out what annotations are needed to convince the analyser that this instance is legitimate, without simply going for the "big hammer" of "trust me, I know what I'm doing" (which is the very thing we've agreed humans are bad at using, because virtually all C++ is "trust me" in this regard). This is why, IME, you get more than one problem per line of code on average when you start applying C++ static analysis tools; and getting people to rewrite all 10M lines of C++ in static analyser friendly C++ is a huge demand to make of them.

Compiler errors are a lot less problematic, because compiler authors take care to not put warnings in that have a large number of false positives, but then you have a huge number of false negatives, because the compilers don't error on constructs that are conditionally UB unless they are completely confident that the condition that makes it UB holds true in all cases.

Rewriting things in a much less forgiving language helps, because the static analysis of the less forgiving language has orders of magnitude fewer false positives, and the annotations needed to help the static analyser understand what we're doing are correspondingly simpler to add in the cases where it gets it wrong. On top of that, it's easier to motivate people (social issues, again) to rewrite into a new, "fun" language than it is to get them to drop decades of misconceptions about "safe C++" and start writing static analyser friendly C++.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 19:19 UTC (Wed) by khim (subscriber, #9252) [Link] (8 responses)

> On top of that, it's easier to motivate people (social issues, again) to rewrite into a new, "fun" language than it is to get them to drop decades of misconceptions about "safe C++" and start writing static analyser friendly C++.

You don't really need a new, "fun" language for that, though. You need a different community.

Look at what the Ada people did. They first introduced a more strict language; then, gradually, made it part of the core language; and then, eventually, made it work with pointers (by taking ideas from Rust, of course).

They basically implemented the Bjarne Stroustrup plan, if you think about it!

But why have they succeeded where C and C++ couldn't move even a tiny bit? Because the Ada community has always worked with the language, while the C and C++ communities still have a lot of people who are preaching the “a portable assembler” myth.

But to have “a portable assembler” you need something like a primitive early C compiler or an equally primitive Forth compiler, where you can look at the code and predict what kind of assembler every line of code would produce!

If you start adding more and more as-if transformations, you start losing the ability to predict what assembler would be produced from a given source, and after a certain threshold the only approach that works is to work with a formally defined language (like Ada developers do, like Java developers do, like Rust developers do… and like many C and C++ developers don't).

And while this process is gradual, the outcome is inevitable: to reach something safe you need to, somehow, change the community, and the simplest and most robust way to do that is to start with a different community, one which doesn't try to talk about “a portable assembler”.

Note: we have already passed through a couple (or more?) of such transitions. First, when people like Real Programmer Mel were replaced by people who started using assemblers. Then another time, when hardcore real programmers who felt that jumping from the middle of one procedure into the middle of another was an OK thing to do were replaced by the next generation. Then there were a couple of failed attempts to do similar transformations with OOP and managed languages, which failed because they imposed too many limitations on the execution environments, which meant they could only occupy certain niches but couldn't replace low-level libraries written in C/C++/FORTRAN (if you want to say that this attempt “hasn't failed”, then recall that literally everything was supposed to be implemented on top of the JVM in one alternate reality and on top of the CLR in another one).

Now we have another attempt. Only time will tell whether something that Rust discovered, almost by accident, will change the computing world as radically as the introduction of structured programming did, or whether this will be another niche solution like OOP and managed languages.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 20:01 UTC (Wed) by vadim (subscriber, #35271) [Link] (7 responses)

> But to have “a portable assembler” you need something like primitive early C compiler or, equally primitive, Forth compiler where you can look on the code and predict what kind of assembler every line of code would produce!

I think you're taking the analogy too far. When I think of "portable assembler" I imagine a compiler that targets the underlying architecture and reflects its behaviors, rather than something like Java, which targets the JVM instead.

So in a "portable assembler", "a+b" works out to the architecture's ADD instruction. Optimization is fine, you can do loop unrolling or subexpression elimination, but what you write still works the way the underlying machine does, so for instance, overflow still does whatever overflow does on that CPU.

Treating overflow or null dereferences as UB is a compiler thing, not an architecture thing. The architecture is perfectly capable of doing those things, and in some cases they're actually useful (eg, in x86 real mode, 0x0 is the address of the interrupt vector table, and the first bytes of RAM hold the pointer to the divide-by-zero handler)

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 20:24 UTC (Wed) by mb (subscriber, #50428) [Link]

>I imagine a compiler that targets the underlying architecture

Such a thing probably doesn't exist, and hasn't existed for many decades.
And there are good reasons for that. Among them: the number of required compiler variants would otherwise explode immediately. There are hundreds of target hardware variants in active use.

>Optimization is fine, [..], but what you write still works the way the underlying machine does

The majority of optimization steps don't even know what the "underlying" machine architecture is and they also don't know what source language the intermediate code came from.

>Treating overflow or null dereferences as UB is a compiler thing, not an architecture thing.

It very well is an "architecture thing". Look into the hardware manuals. Look at the ARM manual. You will find that many (probably most) instructions have some kind of UB scenario documented. You can't just issue random instructions and expect the instruction stream to have defined behavior. That is not how hardware works.

>The architecture is perfectly capable of doing those things

Yes, but "the architecture" does not define the other properties of a pointer, besides its address. Those other properties are still undefined.

You can't ignore the C machine model when programming C with an optimizing compiler. The "underlying architecture" doesn't help to resolve the UB in many if not most cases.
Only restricting the language would help. But that would create a language fork.

"Programming for the hardware" and "C is a portable assembler" are fundamentally broken ways to think about C.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 20:50 UTC (Wed) by khim (subscriber, #9252) [Link] (4 responses)

> When I think of "portable assembler" I imagine a compiler that targets the underlying architecture, rather than something like Java, which targets the JVM instead, and reflects the underlying architecture's behaviors.

Yeah, you are thinking about magic, O_PONIES and other such things. Which don't even exist. We have already established that.

> So in a "portable assembler", "a+b" works out to the architecture's ADD instruction.

Precisely. Which means that one couldn't replace it with LEA, right? Or could it?

> Optimization is fine, you can do loop unrolling or subexpression elimination

How would that work? Some programs would stop working if you do that.

Literally everything in C/C++ depends on the lack of UB. I can easily construct an example which would break if you did loop unrolling, but then we would endlessly discuss various things about how to do loop unrolling “properly”, thus let's discuss something simple. For example, this function:

void foo(int x) {
  int y = x; /* a dead store: its only observable effect is writing x into foo's stack frame */
}

This function, “according to me”™, stores the value of x in the slot on the stack called y. And then you can use it from another function (works perfectly, as you can see, at least on clang/gcc/icc, but also on many other compilers, if you disable optimizations).
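(A minimal sketch of the kind of companion function meant here; the names are mine, and the whole thing is UB, so it only "works" at -O0 on compilers that happen to reuse the same stack slot:)

#include <stdio.h>

void foo(int x) {
  int y = x; /* dead store; at -O0 the value lands in foo's stack frame */
}

int peek(void) {
  int y; /* uninitialized read: UB, but at -O0 this slot often coincides
            with the one foo just wrote */
  return y;
}

int main(void) {
  foo(42);
  printf("%d\n", peek()); /* frequently prints 42 with optimizations off */
  return 0;
}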

Yet, somehow, most proponents of “portable assembler” rarely accept that interpretation, and never offer a sensible reason to interpret it any differently.

Maybe you can do better? Why can this function be converted (by subexpression elimination) into something that doesn't touch that sacred slot on the stack? Any assembler that I know of would touch it, after all; why wouldn't a “portable assembler”?

> overflow still does whatever overflow does on that CPU.

That one also makes certain optimizations impossible, but it's not too interesting: unsigned types have always behaved in a predictable fashion, thus most compilers have code to support that mode; it's only a matter of enabling it for signed types, too.

> Treating overflow or null dereferences as UB is a compiler thing, not an architecture thing.

Sure, but these UBs are not too interesting; they even have flags to disable them. Violations of some other UBs (like attempts to access a variable outside of its lifetime, as could be done in early versions of FORTRAN) are much more interesting.

They very quickly lead to the need to pick one of two choices:

  1. Define what your program does in terms entirely unrelated to the machine code generated (and it doesn't matter whether bytecode is involved; C used bytecode before the JVM, after all), or
  2. Disable essentially all optimizations (which existing compilers already support with -O0 or similar switches)

Both alternatives are considered unacceptable by people demanding “a portable assembler”, but there is no other choice, really.

> The architecture is perfectly capable of doing those things, and in some cases they're actually useful (eg, on X86 real mode, 0x0 is the address of the interrupt vector table, and the first byte of RAM points to the divide by zero handler)

And that's why there are switches that enable these things. They are genuinely useful, but they don't change the fact that you are still writing code for a virtual machine which is entirely unrelated to what the hardware actually does (only connected to it via the language spec).

Once more: you may argue that making that spec too different from the actual hardware would be unwise, and I would even agree, but you are still coding for that abstract machine, and not for the actual hardware. Heck, the fact that C had a virtual machine in 1983, in real, practical, not academic, projects, while Java only got its start in 1991, tells us something, doesn't it?

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 3, 2023 17:47 UTC (Fri) by vadim (subscriber, #35271) [Link] (3 responses)

> Precisely. Which means that one couldn't replace it with LEA, right? Or could it?

Sure, why not?

> This function, “according to me”™ stores value of x in the slot on stack called y. And then you can use it from another function (works perfectly, as you can see, at least on clang/gcc/icc, but also on many other compilers if you disable optimizations).

> Yet, somehow, most proponents of “portable assembler” rarely accept that interpretation yet never offer a sensible reason to interpret it any differently.

Ah, I see. You're taking the "portable assembler" thing extremely literally. I take it very metaphorically, as an unreachable ideal, because I think there can't be such a thing as a true "portable assembler". Two CPUs can be different enough that making a single language that accurately reflects both of them is impossible. That's why actual assembly is non-portable.

I suppose some other term would be less confusing to use, if there's one that fits.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 4, 2023 17:26 UTC (Sat) by khim (subscriber, #9252) [Link] (2 responses)

> I take it very metaphorically, as an unreachable ideal.

But how can code written for “an unreachable ideal” be used to produce something that can be trusted? How do you know what would or wouldn't work?

> I suppose some other term would be less confusing to use, if there's one that fits.

The problem is not with terms; the problem is with expectations. “Portable assembler” usually means “I know what assembler code would be generated and I can use that knowledge”. But the problem is that this makes complicated optimizing compilers neither possible nor feasible: if the optimizer does some transformation which the code writer couldn't imagine, then it's not a “portable assembler” anymore… but how can a compiler writer know which optimizations a compiler user may or may not imagine?

> > Precisely. Which means that one couldn't replace it with LEA, right? Or could it?
> Sure, why not?

Because in normal assembler I can scan the instructions, find that ADD instruction and change it to SUB. Or do something else equally tricky.

> CPUs can be different enough that making a single language that accurately reflects both of them is impossible.

Yes, but which tricks can survive the optimization passes, and which tricks shouldn't you attempt?

One approach is for the list of “forbidden tricks” to be precisely the list of UBs. That approach works. Making the list of UBs more intuitive is worthwhile work, and if the developers that make compilers and the developers that use compilers talk to each other, then some consensus is possible (just take a look at the attempts of Rust developers to invent rules which can preserve certain useful programming idioms, with stacked borrows and tree borrows).

The alternate position, that tested and working programs should continue to work as intended by the programmer with future versions of the same C compiler, is not a position at all; that's just pure wishful thinking.

Any change in the compiler, no matter how small, may break tested and working programs, simply because in C one can take the address of a function, convert it to a char*, and inspect it. And no, please don't tell me “that's so preposterous, no one would ever do that”. I, personally, wrote some tricky programs which disassembled certain functions and then used that information. And the Windows compiler literally prepares functions for such handling.
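(A minimal, hypothetical sketch of the kind of trick being described; the names are mine, and nothing in ISO C defines the result of reading a function's code bytes, so this "works" only by agreement with a particular toolchain:)

#include <stdio.h>

int add(int a, int b) { return a + b; }

int main(void) {
  /* Inspect the machine code of add() through a char pointer. ISO C
     does not even define converting a function pointer to void*, let
     alone reading the code bytes; this relies entirely on
     toolchain-specific behaviour. */
  const unsigned char *p = (const unsigned char *)(void *)add;
  for (int i = 0; i < 8; i++)
    printf("%02x ", p[i]);
  printf("\n");
  return 0;
}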

This works fine if you talk to the toolchain guys and they know what is happening. But expecting it to be supported automatically, without any communication effort and without the beyond-language-spec agreements the “we code for the hardware” folks want? Sorry, that's just not possible.

Because without a formal definition of the language we have no idea which transformations are valid and which are invalid, the only way to provide something resembling a “portable assembler” is to disable all optimizations.

IOW: changing the list of UBs may or may not be useful (it's easy to reduce the number of UBs by simply converting existing ones into a smaller number of even more convoluted ones), but attempts to insist that compilers should preserve the behavior of programs with UB… lead literally nowhere.

Again: most compilers (with optimizations disabled) produce the same output for this tested and working program, and yet, somehow, anton and others like him ignore that (if they write any answer at all) or (even worse) propose to create a bazillion switches which make UB selectable (without explaining why the -fwrapv option does not close the problem of integer overflow, but a bazillion similar switches added to the standard would).

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 4, 2023 19:19 UTC (Sat) by jem (subscriber, #24231) [Link] (1 responses)

The "portable assembler" term comes from its usage as the output format from various compilers. Compilers traditionally (at least on Unix-type operating systems) do not produce an object file directly, but they output an assembly file, which of course is processor specific. By instead producing a C source file as output, and relying on the C compilers on the target machines as a "portable assembler", you don't need to write a separate backend for each of the target machines.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 5, 2023 8:33 UTC (Sun) by khim (subscriber, #9252) [Link]

Maybe some, indeed, call C a “portable assembler” for this reason, but most O_PONIES lovers have a different meaning in mind: “I use C to create machine code only because using assembler is too tedious, thus I should be able to use any tricks that machine code can employ, and that's why ‘portable assembler’ is a good name”.

When faced with “insane” examples which optimizers have always broken (like the ones I cook up), they have many different reactions, but mostly of the “I don't write such insane code, thus I don't care” form.

Which, essentially, reduces the whole thing to the following, completely unconstructive definition: something may be called a “portable assembler” when it includes optimizations that work fine on my programs, and cannot be called a “portable assembler” when it includes optimizations that break my programs.

And as you may guess, that approach doesn't scale: how may compiler writers know which code you consider “insane” and which code you consider “sane” if there is no definition?

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 2, 2023 23:33 UTC (Thu) by anton (subscriber, #25547) [Link]

Several years ago I wrote up a position paper for benign C compilers, which may be what you are thinking of when you write "portable assembler".

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 18:40 UTC (Wed) by khim (subscriber, #9252) [Link] (10 responses)

> what makes you think trying the much harder task of rewriting things in a completely different language is going to gain more traction?

Because that's how human society works. Planck's principle: “A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.”

> So why not resolve those ambiguities back into the original codebase?

Because that would be done by the same people who created this whole mess in the first place. Just look at this discussion right here: @vadim is most definitely not interested in resolving those ambiguities back into the original codebase; he only wants to bend compilers to his will, somehow. And if you have different people with different wills in one project…

The main, basically unfixable, issue with C++ is social, not technical; that's why a social solution works: when you switch from C/C++ to Rust you are not just changing the language, you are changing the community, too. And that works.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 18:55 UTC (Wed) by vadim (subscriber, #35271) [Link] (6 responses)

> Because that would be done by the same people who created this whole mess in the first place. Just look on this discussion right here: @vadim is most definitely not interested in resolving those ambiguities back into the original codebase

I'm merely being pragmatic. I realize that C++ can't be turned into Rust. There's little point, because that'd mean changing it so radically that it'd require a full rewrite of everything anyway. At that point, might as well rewrite in Rust, which already exists and works.

So my compromise is to push compilers in a saner, more debuggable direction, even if that never results in reaching what Rust aims for.

> he only wants to bend compilers to his will, somehow. And if you have different people who have different wills in one project…

Compilers are already bending to my will, somewhat. What I want is -fwrapv, -fno-delete-null-pointer-checks (thanks to @foom for that one; I thought it did something different from what it does), and to keep adding more and more of those.

I also want a cultural change where additional UB is avoided in the future, and things like -fwrapv become the new default.
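(For concreteness, a minimal sketch of the pattern -fno-delete-null-pointer-checks addresses; the function is my illustration, not vadim's code. Because the dereference happens first, the compiler may assume p is non-null and silently delete the later check; the flag keeps it.)

int read_value(int *p) {
  int v = *p;    /* dereferencing p first implies "p != NULL"...        */
  if (p == NULL) /* ...so the optimizer may delete this check entirely, */
    return -1;   /* unless -fno-delete-null-pointer-checks is given     */
  return v;
}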

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 19:07 UTC (Wed) by mb (subscriber, #50428) [Link] (5 responses)

>I also want a cultural change where additional UB is avoided in the future, and things like -fwrapv become the new default.

These defaults basically never change in C compilers. It could break somebody's code.
The world has already agreed that signed-overflow UB is a bad idea these days. That's why this option exists and is in broad use.

But I also don't see the problem with putting these flags into your project's CFLAGS. That's easy enough.
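(E.g., a hypothetical Makefile fragment along those lines:)

# opt this project out of the UB-based behaviours discussed above
CFLAGS += -fwrapv -fno-delete-null-pointer-checks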

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 19:35 UTC (Wed) by pizza (subscriber, #46) [Link]

> These defaults basically never change in C compilers. It could break somebody's code.

Generally speaking... you're correct, but in GCC 5, the default changed from gnu89 to gnu11, and in GCC 8 to gnu17. This broke a lot of (old, barely-if-at-all-maintained) codebases that had neglected to specify which standard they utilized. This was easily rectified (eg by adding -std=gnu89 to your CFLAGS)

(For C this wasn't _that_ big of a deal, but C++11 also came with a significant ABI break, which greatly affected the common practice of using 3rd-party-supplied binary libraries)

On the other hand, the set of optimizations/checks enabled at different optimization levels (eg -O1/2/s/etc) usually changes with each compiler release, and that can lead to "breakages" in existing code. (One can quibble about how this doesn't actually count as "defaults", but it's still something folks have to deal with in the real world)

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 20:04 UTC (Wed) by vadim (subscriber, #35271) [Link] (3 responses)

> These defaults basically never change in C compilers. It could break somebody's code.

You can't break something that's currently declared UB. UB is UB, there are no rules.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 20:28 UTC (Wed) by mb (subscriber, #50428) [Link]

>You can't break something that's currently declared UB. UB is UB, there are no rules.

That's exactly why C optimizer developers think it's Ok to throw away security checks.

In reality, though, C/C++ programs are full of UB. The compiler is just not smart enough to break it, yet.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 2, 2023 23:59 UTC (Thu) by anton (subscriber, #25547) [Link] (1 responses)

> You can't break something that's currently declared UB.
If the UB lovers repeat their position often enough (and there is lots of that repetition in this discussion), you may find yourself adopting it even if it is incompatible with your other positions. Beware!

As for -fwrapv, making it the default is very unlikely to break existing, tested code for gcc, because gcc usually compiles code as if -fwrapv was given, and only deviates from that if it can detect a special case. No experienced programmer will bet on that special case being treated the same way after some program maintenance or the like.

A more likely reason for gcc not making -fwrapv the default is that it would require sign-extending int in array accesses on some 64-bit architectures in some code. In SPECint 2006, one of the benchmarks was slowed down by 7.2% by this sign extension if -fwrapv was enabled, resulting in <1% lower result for the whole benchmark suite (as reported by Wang et al.).
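(A sketch of the kind of indexing pattern where that cost shows up; my illustration, not the benchmark's actual code. With signed-overflow UB the compiler may evaluate i * cols + j directly in 64-bit registers; with -fwrapv it must produce the wrapped 32-bit result and sign-extend it before using it as an offset on some 64-bit targets.)

long sum2d(const long *a, int rows, int cols) {
  long s = 0;
  for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
      s += a[i * cols + j]; /* index expression computed in (signed) int */
  return s;
}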

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 3, 2023 11:36 UTC (Fri) by farnz (subscriber, #17727) [Link]

And this is where the culture thing shows up; turning on -fwrapv is clearly a win for safety, since it means that the behaviour of signed integer overflow matches what most developers think it "should" be. But because there's a benchmark on which turning it on is a significant regression in performance, the default is "off".

If there was a different culture around C and C++, then -fwrapv would be the default, and there would be a way to opt-out of it if you know that you don't depend on the behaviour of signed integer wrapping, and want the performance back.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 19:03 UTC (Wed) by pizza (subscriber, #46) [Link] (2 responses)

> The main, basically unfixable, issue with C++ is social, not technical, that's why social solution works: when you switch from C/C++ to Rust you are not just changing language, you are changing the community, too. And that works.

Sure, that works when you have a stick [1] to coerce others into doing what you want.

[1] or more accurately, funding for salaries.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 1, 2023 19:29 UTC (Wed) by khim (subscriber, #9252) [Link]

That problem will just automatically fix itself, and pretty soon (by historical timelines: around 5 to 10 years from now).

The majority of the software written today is made to solve problems that don't exist and is funded by money that is hallucinated into reality by the belief that one may get something for nothing.

After the collapse of western civilization such software would stop being written, which would free up enough resources to write things “properly” (that's not hard: just make the “no liability” disclaimer illegal, as it already is in most other industries).

But that discussion is way outside the scope of what we are discussing here. I agree that as long as software which works only by accident and can't be trusted is considered the norm, C and C++ will continue to thrive.

Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)

Posted Nov 2, 2023 10:41 UTC (Thu) by farnz (subscriber, #17727) [Link]

Sure, and if Open Source is going to stick to ancient languages because nobody's got a stick, then that's fine - it'll become the choice of people who want their systems to be buggy, crashy things.

Practically, though, I observe that skilled engineers don't want their code to be buggy; no-one skilled in the arts is writing UB in C or C++ because they don't care about bugs, they're writing UB in C or C++ because the set of rules you must follow to avoid UB in C or C++ are neither simple enough to stick around in memory, nor trivially machine-checked so that you get reminded when you break them, nor only applicable in a small subset of your program. As a matter of pride, those people are likely, over time, to switch to a language that makes it easier for them to write code that has no bugs (performance bugs, correctness bugs, any other sort of bug), because they don't want to be well-known for writing code that's buggy.

