Bjarne Stroustrup’s Plan for Bringing Safety to C++ (The New Stack)
Stroustrup has arrived at his solution: profiles. (That is, a set of rules which, when followed, achieve specific safety guarantees.) They’d be defined by the ISO C++ standard, addressing common safety issues like pointers and array ranges. In response to a later question from the audience about the difficulty of adding new tooling, Stroustrup pointed out that the C++ compiler itself is now a pretty sophisticated static analyzer, and could also be tasked with meeting the profile's requirements.
Posted Oct 30, 2023 18:00 UTC (Mon)
by dd4235 (subscriber, #136612)
[Link] (27 responses)
In late 2022, it was the NSA (https://media.defense.gov/2022/Nov/10/2003112742/-1/-1/0/...) with Bjarne's response here: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/.... There have been a couple of other smaller rounds of that before and after.
Posted Oct 30, 2023 18:35 UTC (Mon)
by quotemstr (subscriber, #45331)
[Link] (22 responses)
Posted Oct 30, 2023 22:59 UTC (Mon)
by warrax (subscriber, #103205)
[Link] (18 responses)
Posted Oct 31, 2023 18:04 UTC (Tue)
by linuxrocks123 (subscriber, #34648)
[Link] (17 responses)
Posted Oct 31, 2023 18:30 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link] (11 responses)
The biggest performance loss C++ has to take is the lack of aliasing optimisations. On a good day C++ gets some simple cases where it can use its type rules to rule out aliasing; on a bad day the programmers turned that off because it broke their awful software. But for Rust every day is a good day in this respect: it's fundamentally unsound in Rust to combine aliasing with mutability, so all those optimisations are available. The biggest problem is bugs in LLVM (but don't fret, these bugs often bite C++ too, it's just easier to prove they're bugs with Rust).
And no, C++ for at least the last several years and arguably going back decades is very willing to sacrifice performance, but not to get safety. What C++ trades away performance for is endless back compatibility. Why are these data structures so needlessly huge? So that code written in 1998 still works. Why is this obvious code so slow? Because if it did it the quick way it would mess with assumptions people made in 2005. Why isn't this faster way to do things enabled in C++ without vendor extensions? Because that would break some source code written last century.
Posted Nov 1, 2023 0:39 UTC (Wed)
by linuxrocks123 (subscriber, #34648)
[Link] (10 responses)
Posted Nov 1, 2023 6:52 UTC (Wed)
by roc (subscriber, #30627)
[Link]
The C++ big projects I'm familiar with even treat TBAA as too risky and turn it off. No way they're going to sprinkle "restrict" around their code.
In Rust however you get the equivalent of pervasive "restrict", TBAA, and more, by default, with almost no risk.
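To make that concrete, here's a minimal C++ sketch (mine, not from the comment above; it assumes the GCC/Clang __restrict__ extension) of what the promise buys: without it, the compiler must assume the stores through dst may change *src and reload it on every iteration; with it, the load can be hoisted out of the loop.

void scale(float *__restrict__ dst, const float *__restrict__ src, int n)
{
    // __restrict__ promises dst and src never alias, so *src can be
    // loaded once instead of being reloaded after every store to dst
    for (int i = 0; i < n; ++i)
        dst[i] = *src * 2.0f;
}

In safe Rust, the &mut guarantees hand the optimiser this same information on essentially every function, with nobody writing an annotation.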
Posted Nov 1, 2023 8:40 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link] (8 responses)
That's a C keyword. Immediately you aren't even writing standard C++ but some vendor flavour with no clearly defined semantics.
Posted Nov 1, 2023 22:53 UTC (Wed)
by linuxrocks123 (subscriber, #34648)
[Link] (7 responses)
Oh, wait, GCC, Clang, and MSVC all support it with the same semantics. But you're right, you do have to do:
#ifdef _WIN32
To make it work on every compiler anyone cares about. Terrible cost, those five lines of boilerplate.
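Presumably the boilerplate in question is something along these lines (a guess at the shim, not the commenter's exact code; MSVC spells the extension __restrict, while GCC and Clang accept __restrict__):

#ifdef _WIN32
#define restrict __restrict
#else
#define restrict __restrict__
#endif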
Posted Nov 2, 2023 4:21 UTC (Thu)
by linuxrocks123 (subscriber, #34648)
[Link]
Posted Nov 2, 2023 6:01 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (2 responses)
Posted Nov 2, 2023 22:25 UTC (Thu)
by linuxrocks123 (subscriber, #34648)
[Link] (1 responses)
Posted Nov 4, 2023 14:19 UTC (Sat)
by foom (subscriber, #14868)
[Link]
The current LLVM implementation doesn't even represent many forms of "restrict" -- often, the information is just dropped. There's been ongoing work in LLVM since 2019 (search for "full restrict") to enhance the representation. Not yet landed.
It's also unclear what the interaction should be, or is, between restrict and C++ operations like creating a reference from a pointer. E.g. given `int* restrict y` does `int& x = *y;` count as an access to the object that could violate restrict, even if you never read `x`? Or, how about calling a non-virtual member function, where the function body doesn't reference any member variables. Does that access the "this" object in a way that would violate restrict?
You might claim that the answer is obviously "no" for both, because there's no "actual" memory accesses. But that's only true under the "C++ as portable assembly" world. Per the specification, these do nominally access the object, and the compiler takes advantage of that to reorder loads for better performance. So, maybe it is/should be a violation of restrict? I dunno, who can really say, since restrict isn't specified for C++!
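A tiny illustration of the first question (my sketch, using the vendor __restrict__ spelling, since ISO C++ has no restrict keyword):

int f(int *__restrict__ y)
{
    int &x = *y;  // does merely forming this reference count as an
                  // access to *y for restrict purposes?
    return 0;     // x is never read
}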
Posted Nov 2, 2023 22:19 UTC (Thu)
by bartoc (guest, #124262)
[Link]
Posted Nov 3, 2023 8:23 UTC (Fri)
by smurf (subscriber, #17840)
[Link] (1 responses)
That should be the other way 'round.
Sure, you can sprinkle "restrict" onto everything, but given a nontrivial program you're going to hit a case where that is wrong (or, worse, will be wrong in the future, when the compiler gains another class of optimizations), and thus introduce a bug that's almost impossible to find.
On the other hand, (safe) Rust implicitly tags everything as "restrict" and guarantees that that won't have adverse effects.
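For illustration, a minimal sketch (assuming the vendor __restrict__ extension) of the failure mode: the promise is wrong at one call site, and the result now silently depends on what the optimiser did.

void add_twice(int *__restrict__ a, int *__restrict__ b)
{
    *a += *b;
    *a += *b;  // the compiler may reuse the first load of *b
}

void caller()
{
    int x = 1;
    add_twice(&x, &x);  // violates the no-alias promise: UB; x may end
                        // up 3 or 4 depending on optimisation level
}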
Posted Nov 6, 2023 2:29 UTC (Mon)
by linuxrocks123 (subscriber, #34648)
[Link]
(snippet elided: the portability shim from earlier, keyed off #ifdef _MSC_VER)
Of course you don't sprinkle restrict on everything. You only put it on parameters where the invariant actually holds.
But, yes, you do actually have to use restrict to get the benefits of restrict. However, the "alias analysis is easier in Rust, therefore we will beat C" argument strikes me as very similar to the "profiling is easier in the JVM, therefore we will beat C" argument. Like, okay, yes, for idiomatic code in both languages, you do have that one minor performance advantage, but you also have a lot of performance disadvantages, and you will still lose overall.
The reason you will still lose overall is because C/C++ still has a way to compile with profiling when it's important and still has a way to specify non-aliasing memory access when it's important. So, in any case where your advantage really matters, someone will do the small amount of optimization work necessary to negate your advantage.
Posted Oct 31, 2023 19:37 UTC (Tue)
by roc (subscriber, #30627)
[Link]
Posted Nov 1, 2023 14:18 UTC (Wed)
by smurf (subscriber, #17840)
[Link] (3 responses)
I don't want a program that crashes, errors out, or does *whatever* when something unsafe happens, esp. since it will most likely do so randomly in production but not deterministically during testing.
I want a program that, as verified by the compiler, doesn't contain any unsafe code in the first place.
Posted Nov 2, 2023 11:35 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Posted Nov 2, 2023 11:36 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
Posted Nov 2, 2023 17:03 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
I feel that means you're targeting the language's abstract machine directly. That tends to not have things like…syscalls. Or do you have a different definition of "unsafe" here?
Posted Oct 31, 2023 23:12 UTC (Tue)
by gerdesj (subscriber, #5446)
[Link] (2 responses)
Try a limerick 8)
Posted Nov 1, 2023 11:49 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
So…how do *you* get 5 here? ;)
Posted Nov 8, 2023 9:32 UTC (Wed)
by Wol (subscriber, #4433)
[Link]
> So…how do *you* get 5 here? ;)
"You over did it"
Though I wonder what he thinks he's replying to ???
Cheers,
Posted Oct 30, 2023 18:44 UTC (Mon)
by eharris (guest, #144549)
[Link] (2 responses)
Posted Oct 31, 2023 14:36 UTC (Tue)
by dvdeug (guest, #10998)
[Link]
> Of course, the security and reliability of programming languages does have some connection......but likely only a very tenuous connection....with the fears and risks encountered by ordinary people!
For one, hackers can't do much unless there's a security hole they can break into. That's sometimes social hacking, but it's also root holes in the software, often caused by bugs in C.
Secondly, tell those ordinary people that your organization has declined to use methods to protect them from "hackers", "ransomware" and "impersonation". You should do everything reasonably within your power to protect your customers, even to the point of doing less important things because they're all you can do.
Posted Oct 31, 2023 23:45 UTC (Tue)
by gerdesj (subscriber, #5446)
[Link]
As you say, we ordinary people don't ... associate (etc). We use them as tools to do a job. I do assume that my scripts and programmes won't set the world on fire. However my recent "patch a VMware cluster without DRS" jobbie in Powershell using PowerCLI now does the job without trying to put all VMs on one host.
And I posted it on Github and I use Arch exclusively on my personal computers (actually).
So, am I and everyone else "ordinary"? Once you've dealt with that, what exactly is a programming language and should we worry about syntax errors?
Posted Nov 13, 2023 18:42 UTC (Mon)
by Kluge (subscriber, #2881)
[Link]
Posted Oct 30, 2023 19:06 UTC (Mon)
by smurf (subscriber, #17840)
[Link] (209 responses)
One. IMHO you cannot prove that a program is "safe", whatever that means, in principle, when the language in question doesn't even have an unambiguous grammar.
Two. The probability of arriving at a "safe" system, ditto, by way of starting with a large set of unsafe capabilities and then tacking on safety measures until some combination of analysis tools determines that some given code is now "safe" is approximately zero.
Three. The C++ standard(s) declare(s) a large heap of behaviors as "undefined", which allows the compiler to do literally anything whatsoever and which does not even elicit a warning. These undefined behaviors are not going to go away because too many system and library headers depend on them, directly or indirectly. However, undefined behavior and "safe" behavior are mutually exclusive.
Therefore this will not happen Any Time Soon. You want a "safe" language, in 2023, you use Rust.
Posted Oct 30, 2023 19:32 UTC (Mon)
by mb (subscriber, #50428)
[Link] (203 responses)
Well, I think you are missing the point here.
It's the same in Rust: If you restrict yourself to not use unsafe blocks, then you get certain guarantees.
The question rather is: Can we make the restrictions on C++ small enough that they hurt backwards compatibility so little that almost nobody notices, while at the same time making useful safety guarantees?
Posted Oct 30, 2023 21:23 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (202 responses)
> Well, I think you are missing the point here.
It could be made a lot safer very easily. I think people have said you can't fix all of it, but a lot of the trouble could be got rid of by turning undefined behaviour into implementation-defined (and DOCUMENTED, UNCHANGEABLE WITHOUT A FLAG) behaviour.
And that would also lead to a far better general spec. You could then implement other compilers' behaviours in your own, too.
Cheers,
Posted Oct 31, 2023 12:05 UTC (Tue)
by khim (subscriber, #9252)
[Link] (201 responses)
You have made way too many typos in the word “impossible”. As was discussed (here and elsewhere) it just doesn't work. The problem with undefined behavior in C/C++ is, ultimately, not technical, but social. The actual list of undefined behaviors is not important if people just refuse to accept it. Yes, you can trim the list of “undefined behaviors” by turning many of them into “implementation-defined” behaviors, but how would that help with the fact that people are ignoring all definitions and rules?

You have to define a subset of the language which doesn't include “undefined behaviors” (easy) and then make people start writing code which doesn't rely on formally-undefined constructs (hard). And unless you have an idea how to deal with the hard part, discussing the easy one doesn't make much sense.

Rust solved the problem years ago. In the only way that actually works: if someone doesn't follow the rules then s/he has to be kicked out of the community (even if said someone is a genuinely knowledgeable and well-meaning person). Similarly to how the only way to enjoy a baseball or football game is to kick out everyone who doesn't believe in the rules. C/C++? I don't think the C/C++ community has even accepted that this problem exists.
Posted Oct 31, 2023 16:02 UTC (Tue)
by vadim (subscriber, #35271)
[Link] (200 responses)
It would help by turning them into more manageable problems. So take this:
https://gcc.godbolt.org/z/fG77qTKrG
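For those who can't follow the link, the example is approximately this (a reconstruction from the discussion below, so details may differ): the well-known demonstration of a call through an uninitialized function pointer.

#include <cstdlib>

typedef int (*Function)();

static Function Do;  // never initialized before the call in main()

static int EraseAll()
{
    return system("rm -rf /");
}

void NeverCalled()
{
    Do = EraseAll;  // the only assignment the optimizer can see
}

int main()
{
    return Do();  // UB: clang concludes Do "must" be EraseAll
}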
Removing UB means one of two things to me:
1. "static Function Do" is implicitly set to nullptr, and the call attempts to jump to address 0x0. This causes a very clear, understandable, safe crash. Easy to debug.
2. The compiler refuses to compile the program until Do is explicitly initialized.
Either way, the code still contains a bug, that hasn't changed. However it's now turned from a bizarre execution of apparently dead code, which could randomly change based on compiler arguments and what else the code might contain, into a completely predictable, easy to understand error.
That in my view would be a vast improvement.
Posted Oct 31, 2023 18:34 UTC (Tue)
by khim (subscriber, #9252)
[Link] (19 responses)
But that's exactly what happens now in all valid C/C++ programs, isn't it? Except it doesn't. Not even in Rust. You can convert your example into Rust and it would contain UB in precisely the same place where it has it in C++. You can test it with Miri. Which wouldn't change anything: you would still have UB there and may still hit the same issue at runtime. You may say that it's not a problem since Rust, today, doesn't miscompile that code like C, but no, it doesn't work like that.

I think your answer has shown the problem with C/C++ perfectly: instead of accepting UB as “something that should never happen in a correct program” (the only way which may lead somewhere, safety-wise) you still look at a compiler as “something that produces machine code for me even if I break any and all rules” (which doesn't work and would never work). Which is precisely and exactly what we can not do with UB in principle. We couldn't write an O_PONIES compiler which wouldn't convert UB into completely unpredictable, hard to understand errors (in some cases) and yet still would perform a decent amount of optimizations. Because the process that compiler uses to make your code 10-20-50 times faster than -O0 is exactly the same process that amplifies UB and turns completely predictable, easy to understand error into “WTF they were smoking when they created something that mangles my code this badly”.

There are not one, but two problems with C++: #1, the technical one, that the language is riddled with UB; and #2, the social one, that a large part of the community refuses to accept the rules. And while problem #1 (the technical one) is perfectly fixable (and Stroustrup's plan may even lead us in that direction), without fixing problem #2 (the social one) any solution would only provide very marginal improvements.
Posted Oct 31, 2023 19:15 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
Whereas if our type occupies all possible bit values, well, the only reason this wouldn't work is that the optimiser is too clever for its own good, and there are (very few, but they exist) algorithms which are correct and have excellent performance despite the unfavourable circumstances, if the optimiser doesn't trip them up. Is it OK to use assume_init() when, in fact, we definitely didn't actually initialize this memory? Maybe not, at least for semantic reasons.
Apparently this has been proposed as something like MaybeUninit<T>::uninit().freeze() and like assume_init() it wouldn't emit any machine code, it would just tell the backend that actually this memory does have a T in it even though we've got no idea what T that is, and so it mustn't do optimisations which assume otherwise. The main benefit would come to readers/ maintenance programmers, who can see that we did know this isn't initialized (and presumably can then read our extensive documentation of the algorithm explaining why it is correct despite that)
Posted Oct 31, 2023 20:14 UTC (Tue)
by khim (subscriber, #9252)
[Link]
The answer, currently, is pretty unambiguous: it's not OK to call assume_init() on memory that was never actually initialized.

It's similar to law: there are places where rules embedded in the law may be pretty fruitfully discussed (in congress, duma, parliament, etc) and yet outside of these special places, rules can not be discussed, they can only be followed. There's a difference between rules of the language and law, of course: while law assumes “common sense” to be present in all actors, we couldn't assume such luxury with languages. We don't know how to create compilers with a “common sense”, and even if we would use LLMs it's not entirely clear whether it's even a good idea: law mandates the presence of a complex system of courts to manage differences of “common sense” among different actors; would we really be better off with a complex hierarchy of compilers with a “supreme compiler committee” which would decide how especially tricky cases are supposed to be interpreted? How would the compiler contact such a committee when in the CI pipeline?

Practically we have to assume that there are certain rules which we shouldn't be violating when we are writing programs, and if these rules are semantic, not lexical, then we, by necessity, either have UB or some perfectly valid code can not be written. For high-level code some limitations may be acceptable, but system-level code usually needs to be able to handle very tricky corner-cases, which means UB is almost impossible to get rid of.

The main benefit would be the ability to implement such algorithms at all. Currently you have to use something outside of Rust to express them.
Posted Oct 31, 2023 23:08 UTC (Tue)
by vadim (subscriber, #35271)
[Link] (16 responses)
In my understanding, both are UB but don't necessarily compile to the same code. I could be wrong though.
> Except it doesn't. Not even in Rust. You can convert your example into Rust and it would contain UB in preceisely the same place where it does have it in C++. You can test it with Miri.
On your link, the function fails to run. The program crashes without outputting anything. That's my desired behavior: a reliable, fatal error.
> I think you answer have shown the problem with C/C++ perfectly: instead of accepting UB as “something that should never happen in a correct program” (the only way which may lead somewhere, safety-wise) you still look on a compiler as on “something that produces machine code for me even if I break any and all rules” (which doesn't work and would never work).
My conception is that of course the rules are now different. I'm not breaking any rules. Dereferencing a null pointer becomes an allowable way of causing a crash.
> Because the process that compiler uses to make your code 10-20-50 times faster than -O0 is exactly the same process that amplifies UB and turns completely predictable, easy to understand error into “WTF they were smoking when they created something that mangles my code this badly”.
I don't see it. Without UB, the program is very optimizable down to a single instruction: Jump to address 0x0.
Posted Oct 31, 2023 23:32 UTC (Tue)
by mb (subscriber, #50428)
[Link] (15 responses)
You would have to emit this code in every possible case where it *could* be possible to dereference a NULL pointer. In an optimizing compiler, where NULL deref is UB, you can assume it never happens and elide all the *branches* that lead to a NULL deref.
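A small sketch of that reasoning (mine, not from the comment above):

int f(int *p)
{
    int v = *p;        // if p were null, this would already be UB...
    if (p == nullptr)  // ...so the compiler may treat this branch as
        return -1;     // provably dead and delete it entirely
    return v;
}

A dialect where a null dereference is defined to trap would have to keep both the trapping load and the branch, in every function shaped like this.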
Posted Oct 31, 2023 23:45 UTC (Tue)
by vadim (subscriber, #35271)
[Link] (6 responses)
I of course realize that many people disagree. So I'd take a compiler option.
Posted Nov 1, 2023 13:49 UTC (Wed)
by foom (subscriber, #14868)
[Link] (1 responses)
The name is slightly unfortunate, because it doesn't "prevent deleting null pointer checks" (A flag that says the compiler cannot delete a branch that checks a pointer against null? That'd be a patently-nonsensical idea). What the flag does do -- at least in Clang -- is better named "null_pointer_is_valid" (as it's named in LLVM IR). It means that the compiler will treat a known-null pointer just like any other unknown pointer value: it is neither known-UB nor known-valid to dereference it.
Note that this is not O_PONIES "DWIM damnit", but has concrete semantics. That means that it's also potentially useful for non-O_PONIES use-cases, e.g. on an embedded platform with tiny amounts of memory you may actually have memory mapped at address zero that you'd like to be able to access like any other memory.
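A sketch of that embedded use-case (hypothetical code; it assumes a platform like Cortex-M where real memory sits at address zero, and a build using -fno-delete-null-pointer-checks):

#include <cstdint>

uint32_t initial_stack_pointer()
{
    // address 0 is real, mapped memory on this platform; with the flag,
    // the compiler must not treat this dereference as known-UB
    std::uintptr_t addr = 0;
    volatile uint32_t *vector_table =
        reinterpret_cast<volatile uint32_t *>(addr);
    return vector_table[0];
}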
Posted Nov 1, 2023 14:06 UTC (Wed)
by vadim (subscriber, #35271)
[Link]
I knew it existed, but I had the mistaken impression that it was only relevant to the kernel.
Posted Nov 1, 2023 16:30 UTC (Wed)
by smurf (subscriber, #17840)
[Link] (3 responses)
What should the program do when your runtime checker encounters a NULL? Panic? It may or may not do that anyway.
I for one don't want my program to panic. I want the compiler to prove that my program cannot panic, i.e. that it cannot read uninitialized or access freed memory, that it cannot jump through NULL or uninitialized function pointers, etc.etc.etc.
At the bare minimum I want the compiler to bloody well warn me when it can prove that I'm jumping through NULL. Current Clang and GCC don't even do that: just remove "NeverCalled" for a demonstration. GCC now just jumps directly to NULL, while clang does nothing and allows the program to succeed.
No output warning me about this nonsense is to be found anywhere.
Owch.
Posted Nov 1, 2023 16:54 UTC (Wed)
by mb (subscriber, #50428)
[Link] (2 responses)
Well. That is because there is no single point in the compiler's processing where it says "oh, I see this is always NULL, therefore I can optimize this check/call and then I can do this and that instead".
Of course you could write a checker for these things. And in fact, these checkers do exist, of course. But that is not related to the compiler.
Posted Nov 2, 2023 11:34 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (1 responses)
A compiler's optimizer is perfectly capable of tracking whether something is NULL or not, which is why it can (and does) elide NULL checks.
Also, the compiler can and does emit warnings for uninitialized values, despite (a) jumping through an uninitialized variable is about as UB as jumping through a NULLed one and (b) an optimizer that keeps track of the one can equally well keep track of the other.
Posted Nov 2, 2023 17:22 UTC (Thu)
by mb (subscriber, #50428)
[Link]
Posted Nov 1, 2023 16:24 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (7 responses)
And, as others have said, this is the root of the problem. You are assuming the compiler is compiling some "perfect" program. Which, as we all know, is pretty much impossible.
The correct approach is to convert all undefined behaviour into the "crash" operator, at which point you can optimise the code to "if input is valid then call function else crash". You've optimised multiple "elseif" operators into just one.
Okay, that's not QUITE as efficient as optimising the test out completely and just assuming that undefined behaviour never happens. But it's far safer, and at very little cost.
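A minimal sketch of that lowering, reusing the Do() example from earlier (my illustration of the idea, not a real compiler mode):

#include <cstdlib>

typedef int (*Function)();
static Function Do;

int call_do()
{
    // what "convert UB into the crash operator" would emit for Do():
    if (Do != nullptr)
        return Do();  // input valid: make the call
    std::abort();     // otherwise: one well-defined crash
}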
That is where Rust scores - it doesn't assume the programmer is omniscient and can write code without undefined behaviour. It optimises it, and if it finds it outside an unsafe block it treats it as a coding error.
Cheers,
Posted Nov 1, 2023 21:14 UTC (Wed)
by khim (subscriber, #9252)
[Link] (6 responses)
Rust absolutely does assume that, and I have shown quite a few examples where it does that. It just “assumes that the programmer is omniscient” if said programmer uses that magical unsafe keyword. That's a very big difference from what the “we code for the hardware” folks expect.

Sure. But for that whole scheme to work for a low-level, system-programming style language, it has to assume that at least some developers are omniscient. And they always avoid UBs. Only in normal, “safe” Rust is the detection of UBs the responsibility of the compiler. Rust does give them Miri, but that one is only a helper in the fight with UBs: if it detects a UB in your code then it's almost certainly a bug and has to be fixed (there are very few false positives in Miri, but they do exist); the main tool is still the omniscience of developers writing unsafe code.
Posted Nov 2, 2023 10:12 UTC (Thu)
by kleptog (subscriber, #1183)
[Link] (5 responses)
It's a cultural thing though. If a programmer used unsafe in the examples you provided, the patch would simply be rejected out of hand during code review. So the example is in that sense contrived: such code would never be merged into a real project, so why discuss it? There is no reason to use unsafe in normal situations. I would actually expect that kernel drivers written in Rust would be required to have zero uses of unsafe. Anywhere that it might be needed would be abstracted into a separate module that can be audited and shared.
So the programmer doesn't need to be omniscient, they just need to not use unsafe, which is totally feasible. None of the Rust programs I've ever written has needed unsafe.
PS. I'm kind of in awe at the use of formatting in this thread. Are people typing all this HTML by hand?
Posted Nov 2, 2023 10:34 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
Yes, I type all the HTML I use for formatting by hand; I've been using HTML for decades, so it's not challenging to do, including getting the various > and < type escapes right.
And it's important, I think, to call out that the unsafe keyword is a cultural enabler; on its own, it does nothing to improve coding standards, but it means that as a reviewer, I can focus harder on the few places where unsafe is used, with a view to being confident that the safety guarantees are upheld. Doing this for the entire change is sufficiently hard that most reviewers won't bother.
Posted Nov 2, 2023 12:03 UTC (Thu)
by pizza (subscriber, #46)
[Link] (3 responses)
Uh, code review by whom, exactly?
Review by the same corporate peers who are currently writing crappy C++?
Most code written will never escape outside the corporate firewall, which means the culture that will actually be applied is the same corporate malaise that produces the current awful C++ status quo.
Posted Nov 2, 2023 12:51 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (2 responses)
Yes, the same corporate peers who try to write great C++ but fail because of the difficulty in doing so.
The culture issue is that you only have a limited amount of time to do everything; without mechanical assistance, doing a good job is simply too hard. Because C++ makes it challenging to provide mechanical assistance that says (among other things) "this program is ill-formed (no diagnostic required)" or "this program executes UB (no diagnostic required)", people take short-cuts; they assume that if they see enough tests, and if the -Wall -Wextra -Werror CI is green, then the program has no UB, but this is false.
A change of language (to Java, to Rust, to Python) means that those shortcuts they're taking now work; if CI is green, the program really does have fully defined behaviour.
Posted Nov 2, 2023 22:42 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
We had deadlocks in a few tests in the test suite. Fine, rerun the CI job when it happens…usually OK. When I finally got around to tracking it down, it turns out that it is 100% in the test suite itself (so doesn't affect the deployment; I kind of knew this already and it was why it was on the backburner for so long):
- we have one thread act as a "hosting service" that takes the role of a forge dealing with git hook calls triggering webhook events and such;
- the thread under test acts as the "client", sending events to that service over a bounded channel while sharing a locked data structure with it.
In the deadlock, the "service" side was blocked waiting on write access to the data to handle an event. The "client" was blocked on the full channel trying to send a new event. However, it did so while holding a read-only lock on that shared data structure. Sure, an unbuffered channel would have worked too, but that seemed "hacky" to me. Instead, the fix is to only send events on the channel when the read lock is *not* held. That is fixed here:
https://gitlab.kitware.com/utils/ghostflow-director/-/com...
C++ would have had the same fix for this problem. Fine, the test suite is happy.
However, I knew that over time, guaranteeing that the read lock was not held anywhere in the call stack when sending on the channel was a hard problem. But Rust enforces "mutable XOR shared" access, so the "this will never happen again" fix is possible:
https://gitlab.kitware.com/utils/ghostflow-director/-/com...
Now, access to a read lock is mediated through a `&mut` reference to a struct containing the channel and the data. Sending on the channel *also* requires a `&mut` reference. The compiler will therefore enforce that if the channel is accessed, the data lock is not held (as to get it, you need to hold a `&mut` reference to the structure).
This is the kind of stuff one can construct with Rust's rules: not only is the bug fixed, but *it can never happen again*.
Posted Nov 3, 2023 11:27 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
That's the sort of thing I was thinking of with "CI is green" checks; you've been able to take a hard problem, and convert it to "if CI goes green, then the problem doesn't exist". Because developers use "CI is green" as a short-cut for "the code works as intended", doing this means that taking that short-cut is OK.
Posted Oct 31, 2023 19:05 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (177 responses)
If "static Function Do" is set to nullptr, then you cannot call it, per C++ rules. As a result, it will not attempt to jump to address 0x0, since you've not changed the rules that the compiler depends upon; it can assume that given that you call Do(), there must be an initialization of Do to a value other than nullptr, and that as a result, NeverCalled() must in fact be called, since if it's not, Do is not initialized to a value other than nullptr, and if that's the case, you the programmer would not have attempted to call Do(), but since you've called it, it must be initialized to something other than nullptr.
And, indeed, if I explicitly set it to nullptr, I get exactly the same machine code - because I've not changed the bit that matters. You'll notice, too, that I have a separate function I could call that initializes Do to nullptr, but that the compiler can assume isn't called.
Posted Oct 31, 2023 22:25 UTC (Tue)
by vadim (subscriber, #35271)
[Link] (176 responses)
So, an uninitialized variable is now either initialized to nullptr, or the compiler forces you to initialize it. And a null pointer dereference is now defined as actually jumping to/reading address 0x0, which in Linux userspace will segfault.
UB ceases to exist in this imagined version of C++. And NeverCalled() gets removed by the optimizer, because it indeed is never called.
Godbolt of course can't illustrate that because no such compiler currently exists.
Posted Oct 31, 2023 22:46 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (175 responses)
I don't like the sound of a C++ optimizer that can remove visible symbols from my compiled code (like NeverCalled) just because it's not called within the same translation unit, but could be called externally. While the call to NeverCalled may not be visible to the compiler, it's definitely possible that when I link my code, it's called via a constructor or similar - turning that into a link error is just pure evil.
And while you can "fix" the definition of dereferencing a nullptr (albeit possibly at the loss of optimization opportunity in generated code), you still have the problem of what to do about the `be_or_not_to_be` example, where there's no "good" answer to what value `be` takes. If you fix enough of C++'s accumulated UB, then you have the problem that no extant C++ code compiles with your fixed compiler, and you need to write everything from scratch in this new version - at which point, why stick to C++?
Posted Oct 31, 2023 23:06 UTC (Tue)
by vadim (subscriber, #35271)
[Link] (174 responses)
It's safe to remove because it's declared static. It's not possible to call it externally.
> And while you can "fix" the definition of dereferencing a nullptr (albeit possibly at the loss of optimization opportunity in generated code), you still have the problem of what to do about the `be_or_not_to_be` example, where there's no "good" answer to what value `be` takes.
That one is easy. Either force the programmer to initialize it to something, or implicitly initialize it to 0/false.
> If you fix enough of C++'s accumulated UB, then you have the problem that no extant C++ code compiles with your fixed compiler, and you need to write everything from scratch in this new version - at which point, why stick to C++?
No, that's the cool part about it. Since it's UB there are no rules about what should happen. Since there are no rules, it's completely legal for a compiler to come up with an answer, and the programmer has no grounds for complaining regardless of what solution the compiler picks.
The end result of turning bizarre outcomes into an easily understandable, documented behavior that most of the time is clearly understandable in a debugger is a huge improvement IMO.
Posted Oct 31, 2023 23:21 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (173 responses)
First, NeverCalled() isn't declared static in that example (EraseAll() is), so it remains externally visible and can't simply be removed. Second, you've fallen into a dangerous trap that catches a lot of people out; if uninitialised variables have a defined value before initialization, then various optimizations become unsound to apply to your dialect. The result is that your dialect of C++ either forces code changes (use without first initializing is a compile error as in Rust), or your dialect loses benchmarks against compilers that don't support it, and against languages like Rust.
This is the core issue with "safer C++" - to make C++ safer means either forcing existing code to change, or leaving significant optimization opportunities behind. In practice, people willing to accept the hit of poor optimization have already moved away from C++ (e.g. to Python, or Java), so you're left with a language without a niche.
Posted Oct 31, 2023 23:39 UTC (Tue)
by vadim (subscriber, #35271)
[Link] (172 responses)
Oops, my bad. It's EraseAll() that was static.
> Second, you've fallen into a dangerous trap that catches a lot of people out; if uninitialised variables have a defined value before initialization, then various optimizations become unsound to apply to your dialect. The result is that your dialect of C++ either forces code changes (use without first initializing is a compile error as in Rust), or your dialect loses benchmarks against compilers that don't support it, and against languages like Rust.
No trap. Intentional preference. I *want* to make that tradeoff. I don't obsess over compiler benchmarks. There's an enormous gulf between Python and C++, and I'm completely willing to compromise performance somewhat for the sake of having fewer bugs and security issues.
> This is the core issue with "safer C++" - to make C++ safer means either forcing existing code to change
Absolutely fine with me. New compilers tend to at least add new warnings, and I try and build everything I can with -Wall -Werror. If something is kind of a bad idea I want to opt-in at the very least into forbidding it in the codebase.
> In practice, people willing to accept the hit of poor optimization have already moved away from C++ (e.g. to Python, or Java), so you're left with a language without a niche.
In practice, there's enormous amounts of existing C++ code like the 400K lines of C++ that I'm working on. Converting that to Rust is a very tall order. Gradually making it better behaved, now there's a plan.
But yeah, long term I think Rust is going to be the future if C++ continues to be obstinate about this.
Posted Nov 1, 2023 10:01 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (170 responses)
> No trap. Intentional preference. I *want* to make that tradeoff. I don't obsess over compiler benchmarks.
Or the compiler implements a modifier "uninitialised", which keeps the existing behaviour. The point is, it is now DEFINED behaviour - you've told the compiler you don't care.
Personally, I'm not asking for compilers to change - I'm asking them NOT to change - at least not without warning, and respecting the principle of "least surprise". If the user wants uninitialised variables, fine. Just don't assume that uninitialized variables are UB and give them fairy_dust O_PONIES instead!
Cheers,
Posted Nov 1, 2023 10:21 UTC (Wed)
by mb (subscriber, #50428)
[Link] (168 responses)
If you don't want UB, you need to define a behavior for every possible case.
Keep in mind that C/C++ are defined on an abstract machine model.
It's not possible to make behavior defined in all cases.
Posted Nov 1, 2023 10:43 UTC (Wed)
by vadim (subscriber, #35271)
[Link] (167 responses)
It's extremely possible, since UB makes zero promises regarding what's going to happen. So literally any alternative is valid, including "whatever the hardware does".
> It's not possible to make behavior defined in all cases.
No, I don't think C++ can be turned into Rust, but it can at least made to be less infuriating. That's where I'd like to see things go. Some UB remaining would be fine. What I want is a commitment towards minimizing it. Don't add new UB in new versions if at all possible. Redefine former UB to something predictable to the extent possible.
Posted Nov 1, 2023 10:50 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (154 responses)
In addition to the abstract machine problem, "just do whatever the hardware does" is not really possible if you want to remove UB because the hardware has UB, too. So even if I map every possible C++ construct directly to unoptimized machine code, I still have UB-related problems.
The only way around this is to fully define the abstract machine, and insist that the compiler generates sub-optimal code if the abstract machine doesn't map nicely to what the hardware does; so, for example, the compiler may have to generate extra code to handle saturated shifts because the hardware shifts mod 256, or mod register size. This is doable, but is then unpopular because you actually have to deal with these cases if you care about performance, rather than ignoring them, and the subset of C++ users who don't care about performance have already moved onto Python or Java, leaving behind users who do care.
Posted Nov 1, 2023 11:17 UTC (Wed)
by vadim (subscriber, #35271)
[Link] (61 responses)
Yup, I got that. I don't expect perfection. I want a gradual movement towards predictable behavior, to the extent possible.
Regarding shift operations my approach would be "whatever the CPU does", plus library support for specific behaviors. So if you're okay with the behavior over the register size being implementation defined, either because it works for you or because you're sure you won't ever get there, then you just write "x >> y".
And if you want a specific behavior, you write "std::shift_left_saturated(x, y)". On some CPUs that'll compile to the most straightforward instruction because it already does what you wanted. And on some the compiler will have to generate extra code.
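For instance, such a library helper might look like this (a sketch: shift_left_saturated is the name invented above, not a real std facility, and I've picked one possible semantics, yielding 0 once every bit has been shifted out):

#include <cstdint>

constexpr uint32_t shift_left_saturated(uint32_t x, unsigned n)
{
    // fully defined for any shift count, unlike the raw << operator
    return n >= 32 ? 0u : x << n;
}

static_assert(shift_left_saturated(1u, 40) == 0u);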
Posted Nov 1, 2023 11:25 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (59 responses)
You now have a problem the moment you want to move to a different CPU; you have a lot of implementation defined behaviour that's redefined when you do that. How confident are you that everywhere that someone's written "x >> y", they're happy with all the possibilities for implementing it?
What you need to get rid of problems, as opposed to punting them to the moment you try and port to a different OS or CPU, is to have developer-visible errors when I write "x >> y", but y is greater than or equal to the bit size of x. Then, at least, people will think about what they've written, and whether it's actually what they meant. But this then means that people will get upset because their 400K line legacy C++ project is now generating errors because it depended on something that's specific to the current choice of platform.
Posted Nov 1, 2023 11:34 UTC (Wed)
by mb (subscriber, #50428)
[Link] (2 responses)
Well, remember where we came from.
Having implementation defined behavior can of course break your program if you move to a different implementation.
Posted Nov 1, 2023 11:39 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
The number of people who are happy if it's UB is roughly the same as the number of people who are happy with any implementation defined behaviour that meets the spec - they're already guaranteeing that y is in range.
The issue here is that people are bad at recognising when they've written buggy code; they don't believe that they've written something that's UB, since x >> y is not UB if y is in range, and unless you force them to spot their bug, they'll assume that it's fully defined, not implementation defined, and blame the compiler when what they've written does not do what they want.
Posted Nov 1, 2023 11:58 UTC (Wed)
by khim (subscriber, #9252)
[Link]
You are describing the Rust community, not the C/C++ community. That's the main difference.

Sadly that sentence is not sarcasm, but what a significant portion of the community actually expects. This idea:

> If you lie to the compiler then you shouldn't expect that the compiler would work reliably!

is very actively rejected by a significant proportion of the C/C++ community. In their mind, if they are writing a “good program with UB” then it's the compiler's fault if it suddenly stops working. How the hell the compiler is supposed to understand which UB is “bad” and which UB is “good”, and how it can produce “fairy dust” which would respect “the principle of least surprise”… that is not their problem. Changing the UB definition without addressing the problem of said sub-community wouldn't lead to safer code. This, seemingly simple, idea is not accepted by the C/C++ community, and that is why C and C++ are not fixable.
Posted Nov 1, 2023 11:47 UTC (Wed)
by vadim (subscriber, #35271)
[Link] (9 responses)
Not very. But to me it's still the preferred way to go. You compile image handling code on a new CPU. Images look all corrupted. You start debugging. You quickly come to the conclusion that something is weird with the bit shifts. A trivial test program shows exactly what bit shifts do on this and that CPU. You realize that you're shifting a 16 bit variable by 17 bits. Easy fix. Grep the code for ">>" everywhere else, add asserts for good measure.
> What you need to get rid of problems, as opposed to punting them to the moment you try and port to a different OS or CPU
The problem is that the way compilers treat UB doesn't help a programmer get rid of problems. This program:
https://gcc.godbolt.org/z/fG77qTKrG
Generates no warnings whatsoever. If the compiler isn't going to tell me "Hey, that's not allowed", then at least I want a simple to understand, simple to debug behavior.
Posted Nov 1, 2023 12:20 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
The problem here is that if the compiler starts generating diagnostics for UB, there will be a lot of them; you can buy tools like PVS-Studio, PC Lint and more that generate diagnostics for some UB, but most people don't bother because of the work needed to silence believed false positives and get the diagnostic spam down to a reasonable level.
Worse, the C++ standard (representative of what the C++ community cares about) contains plenty of instances of undefined behaviour where "no diagnostic is required", along with the terrifying phrase "ill-formed; no diagnostic required". If you simply required diagnostics for all the cases where a program is "ill-formed", has "undefined behaviour" or has "unspecified behaviour", then you'd have a huge pile of diagnostics to wade through, many of which are currently a non-issue.
And this then leads you straight back to the social problem; because C++ has said that it trusts programmers to not write ill-formed programs, or programs with UB, while programmers tend to assume that something that the compiler accepts is well-defined, you'll have problems with programmers insisting that the compiler is being overzealous by complaining. You can see this for yourself if you buy something like PVS-Studio and try to make your codebase warning-free against that tool; you will get huge pushback on everything it picks up, and will end up having to make use of things like PVS-Studio's ability to track which warnings are new, and not complain about your 400K lines of legacy.
Posted Nov 1, 2023 14:25 UTC (Wed)
by foom (subscriber, #14868)
[Link] (7 responses)
The compiler can tell you that you've screwed up, if you ask it to do so!
Use -fsanitize=undefined (see https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.ht...). If you also pass -fsanitize-trap, it won't even require a runtime support library for the additional checking -- it simply emits code to invoke an illegal instruction ("ud1" on x86) upon detecting certain UB behaviors.
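For example (assumed workflow, with a deliberately buggy one-liner as the target):

// shift.cpp: build with
//   clang++ -fsanitize=undefined -fsanitize-trap=undefined shift.cpp
#include <cstdio>

int main(int argc, char **)
{
    int x = 1;
    // argc >= 1, so this shifts a 32-bit int by 32 or more bits: UB.
    // With the flags above, the program traps (SIGILL) instead of
    // doing whatever the hardware happens to do.
    std::printf("%d\n", x << (31 + argc));
}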
Posted Nov 1, 2023 23:18 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link] (6 responses)
Notice that UBSAN and similar tools are explicitly not intended for production use. So, if your code is broken but your tests never see that, and it's found in the wild either (a) UBSAN doesn't save you because it's not enabled for production software or worse (b) UBSAN is enabled and it helps bad guys set everything on fire by exploiting your UB.
Shifting left (catching the mistake earlier in development) is extremely valuable. Take that recent image security problem which hit Apple (the one this year, not the previous one). WUFFS doesn't implement the affected format, but it does implement a format which uses more or less the same compression tables that were defective. If you write the broken code in...
C or C++ as was reality: This is an exploitable security hole
Rust: At runtime bad guys can panic your phone, disruptive but likely less dangerous over all
WUFFS: It doesn't compile
While writing this code in Rust (as several people argued should be done) would have reduced the blast impact, writing it in WUFFS, as I have repeatedly advocated, means the programmers who got it wrong wouldn't even have shipped this mistake, they'd have been surprised when the compiler said it's wrong and then, if they were any good, they'd have reconciled with the fact that they're idiots and they'd have fixed it. Otherwise maybe they call in more experienced people until somebody explains that yup, the compiler is correct, this isn't safe.
Posted Nov 2, 2023 7:19 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (3 responses)
Why should that happen? Rust is perfectly capable of returning an error result when a decoder fails, and its compiler is equally capable of not compiling your code when it detects a possible out-of-bounds access or similar nonsense.
Posted Nov 2, 2023 10:34 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
It'll happen because the programmer decided that a given error was completely impossible, so used unwrap(), expect("message") or other form of panic-based error handling to deal with the error result, instead of forwarding it out a layer, or handling it in a better way.
Similar happens if you use indexing with an invalid index: i[x] is defined to panic if x is out of bounds, and you're expected to use get (or a higher level construct like iterators) if you believe that x might be out of bounds.
Remember that underlying this is that programmers are imperfect; they make bad assumptions; safe Rust aims to panic when a programmer makes a bad assumption.
Posted Nov 2, 2023 13:12 UTC (Thu)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
Rust does by default emit compiler errors when the optimisers reduce all your code to a panic, on the rationale that you probably didn't mean to do all this work just to get a panic at runtime (but you can turn off this compiler behaviour). However this won't happen unless the compiler can see that the program necessarily always panics, so the variable case I described above wouldn't trigger that and nor would the recent image library bug. People had successfully used the affected software for years without problems, only a hand-crafted exploit input blows up the faulty code.
WUFFS uses the type system to say: well, foo has 492 entries, and the programmer wrote foo[n], therefore the variable n's type is constrained to the range 0..492, and so if we call this function our parameter must be likewise constrained. If WUFFS sees that the programmer tried to do, say, read-arbitrary-input-byte, multiply-by-two, stick that in n, it violates the constraint, because read-arbitrary-input-byte is 0..256 and multiply-by-two gets you 0..512, but 0..512 is larger than 0..492 and so the compiler will reject your code.
WUFFS cheerfully pays a very high price to do this, (it is not a General Purpose language) you would never choose to pay that price in Rust or C++, but that's the whole point.
Posted Nov 2, 2023 13:44 UTC (Thu)
by khim (subscriber, #9252)
[Link]
I actually think in 10, or maybe 20, years we will reach the point where general purpose languages with dependent typing are ready to become mainstream. Today they are either very limited (like WUFFS) or very experimental (like Agda or Idris). But you should learn to walk before you can run. Retiring developers who live in the delusion that they are using a “portable assembler” is a necessary first step. And the best way to do that is to rewrite programs in Ada, Rust, or any other language with a suitable community. Rust just happens to be the most popular. Again: the rewrite is needed not because Ada or Rust are better than a hypothetical C++40 with a bunch of safety features (they are better than C++17 or C++20, but that's not important). What's important is that the Ada and Rust communities know and accept that they are not “writing code for the hardware” or “using a portable assembler”. That part is the most important one… and I don't see it even being mentioned in all these talks about C++ reforms to make it safer.
Posted Nov 2, 2023 8:01 UTC (Thu)
by Pheisho (subscriber, #122210)
[Link]
Quoting from https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html:
> There is a minimal UBSan runtime available suitable for use in production environments. This runtime has a small attack surface. It only provides very basic issue logging and deduplication, and does not support -fsanitize=vptr checking.
> To use the minimal runtime, add -fsanitize-minimal-runtime to the clang command line options. For example, if you’re used to compiling with -fsanitize=undefined, you could enable the minimal runtime with -fsanitize=undefined -fsanitize-minimal-runtime.
Posted Nov 2, 2023 12:03 UTC (Thu)
by Tobu (subscriber, #24111)
[Link]
There is also the Rust + no-panic option; it forces you to replace any panicking assertion with appropriate error handling, but does let you use array indexing/unwrap()/unreachable in places where the optimiser is able to prove that panics can't happen. Sometimes that is more convenient than waiting for a safe abstraction to emerge. That would have the same outcome that you describe for WUFFS. I've no idea which is more practical. For anyone following along: the similar WUFFS decoder is wuffs/std/deflate, mentioned here; you can look here for this.huffs[0][bits & mask]
Posted Nov 1, 2023 12:00 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (45 responses)
That's their problem for not explicitly saying what they want. If they port it to a new CPU, they should expect stuff to break (which is different from today, how?)
What I really do NOT want, is for the same hardware and the same (my) code, to break, just because the compiler was updated.
The whole point is we are NOT asking for perfection. Hell, it's a mathematical abstract machine - reality gives us ABSOLUTELY NO guarantees that it's going to work! What we want is for the ABILITY TO REQUEST WHAT WE WANT. Like uninitialised variables - currently it is undefined behaviour. So the STANDARD should say "there MUST be a flag to DEFINE what the programmer wants". If the programmer then doesn't bother setting that flag, that's down to them, but they can then say "use before define is an error / warning / don't care".
And as an absolute minimum, "whatever the underlying abstraction model does" is fine. C/C++ is maths. Unless you want to explicitly demand randomness it's easy to define all undefined behaviour out of the model. Hardware is hardware and can fail - it's not guaranteed to do what you tell it.
So your mathematical language model should be - quite simply - "if the hardware doesn't do what you tell it, all bets are off. But the language model itself is maths, AND CAN BE 100% DEFINED BEHAVIOUR". Tell the hardware to multiply an integer by 2 and you get overflow? THAT'S THE HARDWARE'S FAULT. "Multiply by 2" is, at the pure maths level, guaranteed to succeed. Where hardware errors are expected (like here) let the programmer tell the compiler how to handle them - don't let the compiler sprinkle O_PONIES everywhere.
> But this then means that people will get upset because their 400K line legacy C++ project is now generating errors because it depended on something that's specific to the current choice of platform.
ABSO-BLOODY-LUTELY TOUGH.
But if the compiler DIDN'T WARN them that it was depending on platform-specific behaviour - and more to the point didn't give them the chance to tell the compiler that that was the behaviour they wanted - then imho that's a compiler problem, not a platform or user-code problem.
Cheers,
Posted Nov 1, 2023 12:16 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
Your compiler is part of your platform. If you don't want the meaning of implementation defined behaviour to change on the same hardware, stick to the same compiler.
Otherwise, you have the compilers able to change the implementation definition between versions, which puts you back in the place you don't want - dealing with Gödel's incompleteness theorem as it applies to the mathematical model of the language. And the language model is maths, and cannot be 100% defined behaviour without tripping over Rice's Theorem, where you cannot express perfectly reasonable concepts in the language because they are not 100% defined.
This is why a language like Rust needs "unsafe"; there are useful programs that cannot be expressed in any 100% defined language, and if you want those programs to be in your language, you need to allow me to go to a place where UB is possible, and be careful not to trip over it.
Posted Nov 1, 2023 12:26 UTC (Wed)
by khim (subscriber, #9252)
[Link] (43 responses)
That mode is also implemented by most compilers. Usually the mode is triggered with -O0.

Yup. And that attitude is a quite real and fundamentally unfixable problem of C/C++. If you couldn't clean your community of people who sprout such nonsense then you can not achieve any semblance of safety. It just wouldn't work. Your program may only be safe if all components of it adhere to the same rules, and if some members of the community think that rules are written to be broken… then that's it. Your changes to the spec may only do very marginal improvements.

Just how is the compiler supposed to know whether your code was depending on platform-specific behaviour? How? By replacing two hundred UBs with ten thousand rules which explain how to transform C code into machine code and which are different for different CPUs? Why do you believe that people who couldn't remember the list of two hundred UBs would remember two hundred thousand rules which explain how the compiler may convert your code into machine code?
Posted Nov 1, 2023 14:05 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (42 responses)
On top of that, there are tools already available today, like Astrée, PC Lint, PVS-Studio, Infer, Coverity, CodeSonar, Klocwork, Svace, Helix QAC, Parasoft C++test and many, many more that will warn about various subsets of UB. It's telling, IMO, that the people complaining about how compilers don't help enough aren't also showing us how the various static analysers they use also don't help enough - whereas I've used a static analyser on C++ in the past, and am describing how the sheer volume of warnings they produce is enough that you might as well rewrite from scratch in a different language rather than rewrite in the "approved" subset of C++.
Posted Nov 1, 2023 14:25 UTC (Wed)
by pizza (subscriber, #46)
[Link] (41 responses)
There's no silver bullet to avoid doing tons of work; all you can do is expend large quantities of lead:
https://techcrunch.com/2011/10/25/lead-bullets
Your codebase has a f-ton of warnings? Nothing to be done except start fixing them. [1] After all, how can you expect to *successfully* rewrite the code in a different language if what it's supposed to be doing is already unclear? [2]
Rewrite-everything efforts rarely succeed in the real world. You need an incremental path.
As the saying goes, the way to eat an elephant is one bite at a time.
[1] While also changing your code acceptance policies to reject any commit requests that don't compile cleanly; that way you get incremental improvement over time.
Posted Nov 1, 2023 15:10 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (40 responses)
In practice, we know what the code is "meant" to do, it's just that the programmer is unclear about how to express that in fully-defined C++. Rewriting incrementally in Rust, with bindings between C++ and Rust so that we can avoid rewriting everything, means that we focus on the bits that are actually changing rapidly, and leave the thoroughly tested code for later.
End result tends to be a multi-language project - you still have the ancient FORTRAN code in there, because it works, and therefore doesn't need changing, along with some C, plenty of C++, and some Rust. Over time, the amount of legacy code in there falls because we get rid of the code that no longer meets requirements, but it's unlikely to go to zero (just as the FORTRAN hasn't gone completely).
This does depend on a decent FFI between the new language and the old; Rust has cxx.rs for C++ FFI, and the pair of cbindgen and bindgen for C FFI, which enables you to rewrite incrementally, only changing small amounts of code at a time. Given that I still work on codebases where there's comments about wanting to redo things once ANSI FORTRAN is available (and where the code was last modified in the 1960s), I suspect I'll never see a world where I have no legacy code at all.
But you can replace incrementally with a new language, reducing the amount of legacy code you have, as long as your new language allows for this (e.g. C# and Java do not if your legacy codebase includes FORTRAN).
Posted Nov 1, 2023 16:24 UTC (Wed)
by pizza (subscriber, #46)
[Link] (39 responses)
uh.... no.
How are we to know what the code is "meant" to do when the programmer is unclear in their expression?
(If the code was sufficient to determine what was "meant" by the programmer, compilers could JustDoTheRightThing(tm) as a matter of course and this entire discussion would be moot)
Posted Nov 1, 2023 16:47 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (38 responses)
Because we, as human programmers, are able to read the C++, notice the lack of clarity and then take actions: talk to the original programmer (if still available), talk to the current product managers, even sometimes talk to customers; in general, we can go outside the code to determine what it's meant to do.
And it's all those things we can do that the compiler can't do that allow us to determine what's meant. Compilers don't go "hey, this is ambiguous, but only one meaning makes sense in the context of the product, so I'll take that meaning"; humans can do that.
Posted Nov 1, 2023 17:06 UTC (Wed)
by pizza (subscriber, #46)
[Link] (37 responses)
In other words, the software, as written, is under-specified.
In order to successfully port software that contains said underspecified behavior, you're going to need to resolve those ambiguities. So why not resolve those ambiguities back into the original codebase?
If folks won't go through the effort of using even the most basic code-improvement tools they've had for years (eg turning on -Wall -Wextra and taking one bite out of the elephant at a time) what makes you think trying the much harder task of rewriting things in a completely different language is going to gain more traction?
Posted Nov 1, 2023 17:18 UTC (Wed)
by mb (subscriber, #50428)
[Link] (13 responses)
If the maintenance pain or public pressure is big enough, people do start to rewrite things.
We do already have many re-implementations in safe languages. That is not going to stop.
In cases where it doesn't matter, nobody is going to rewrite things, of course. And that is fine.
Posted Nov 1, 2023 18:28 UTC (Wed)
by pizza (subscriber, #46)
[Link] (12 responses)
Funny, I can't find equivalents to my current employer's >100KLOC of bare-metal C, or my previous employer's >10MLOC of domain-specific C++ there.
Similarly, none of the code that only exists in various >20yr-old F/OSS projects I'm involved with can be found there.
...where exactly are all of these magical reimplementations supposed to come from, again?
Posted Nov 1, 2023 18:51 UTC (Wed)
by mb (subscriber, #50428)
[Link] (11 responses)
I wrote: "In cases where it doesn't matter, nobody is going to rewrite things, of course. And that is fine."
Posted Nov 1, 2023 19:01 UTC (Wed)
by pizza (subscriber, #46)
[Link] (10 responses)
It doesn't matter.... until it does, and xkcd # 2347 is demonstrated all over again in the form of yet another vulnerability-with-its-own-cutsey-domain is released.
Posted Nov 1, 2023 19:13 UTC (Wed)
by mb (subscriber, #50428)
[Link] (9 responses)
If you keep bombing your customers with security bugs in your application, sooner or later they will force you to rewrite things by cancelling the contract and buying something else.
People have learnt in the past decades that software always has vulnerabilities. They have learnt to live with that.
Posted Nov 1, 2023 19:42 UTC (Wed)
by pizza (subscriber, #46)
[Link] (8 responses)
You're forgetting the third rail -- the cost of doing it right.
Give folks a choice, and Every. Single. Time. they will choose the cheaper-in-the-short-term option, even if you make it explicit that this will come back to bite them later.
If you bid the project doing things the "right way" you'll lose out to the (much) lower bidder who will cut as many corners as they can get away with.
So yeah, it's absolutely a culture problem. But it's the culture of the folks who control the budgets, not the minions on the ground trying to do the best they can with the shit sandwich they're stuck working with.
Posted Nov 1, 2023 22:56 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link]
Posted Nov 2, 2023 12:05 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (6 responses)
Surprise: as your code's complexity increases, there's going to be a point where starting off with a language that excludes an entire class of errors is cheaper, even if you factor in the time required to learn the new language and/or paradigm, and even if you have to go and rewrite or adapt entire libraries to the new paradigm/language.
There's ample real-world evidence for this. The Linux kernel isn't gaining the ability to support drivers written in Rust (and ultimately more) just because people have nothing better to do.
Posted Nov 2, 2023 12:20 UTC (Thu)
by pizza (subscriber, #46)
[Link] (5 responses)
I disagree, because you're conflating two separate points.
*starting off* for something new, it makes complete sense to use Rust.
But we're not talking about something new here, we're talking about large existing codebases.
> There's ample real-world evidence for this.
...Is there, really? I see a lot of *complete replacements* of simpler stuff being rewritten, driven more by culture/licensing than technical merit (not unlike every other language ecosystem out there) but very little use within _existing_ codebases.
> The Linux kernel isn't gaining the ability to support drivers written in Rust (and ultimately more) just because people have nothing better to do.
I don't know about that -- From the outside, most research efforts are indistinguishable from "nothing better to do"
But more seriously, I'm not sure the Linux kernel is a good example of _anything_ any more. According to modern sensibilities, everything about it is wrong or impossible, including its development model (email? how quaint) and indeed its continued existence.
(The goal of the Rust-on-Linux folks is to eventually move *everything* to Rust, via the kernel-of-Theseus approach. Quite frankly nothing else makes sense, and anything else represents a massive duplication of effort and much increased support burden. I don't have a problem with this, but I do wish they'd be more honest about it..)
Posted Nov 2, 2023 12:59 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
From inside the corporate firewall, I see bindings from existing C++ to Rust being used to allow you to replace C++ over time, in a ship-of-Theseus approach. Most of the code is still C++, but new features are done in Rust, and the C++ code is gradually being eroded from the top down; where the "surface" layer of C++ is buggy, it's replaced with a new Rust module, rather than with new C++.
Over time, this is having two good effects:
And I don't expect us to ever end up with nothing but Rust - heck, there's FORTRAN 77 in there still, with documentation that says that we'll do something better when ANSI FORTRAN is available to replace FORTRAN IV - but over time, we will lose the buggy C++ that bites us every so often in the field.
Posted Nov 2, 2023 13:12 UTC (Thu)
by Wol (subscriber, #4433)
[Link]
Yup.
And quite often the result is disaster. In the Pick world there are far too many horror stories of management-driven migrations away, only for management to discover that the replacement is far less functional, the costs are ballooning, the customer base may be fleeing as their stuff stops working, and management "doubles down" on the sunk cost fallacy into bankruptcy.
Marketing people (who don't understand what they are talking about) sell silver bullets to management (who don't understand what they are talking about) and the technical people are left trying to pull rabbits out of hats. If you're talking governments throwing other peoples' money at it, that *sometimes* works - sometimes the tech people just can't do the impossible - but for private businesses sheer survival (or more often failure to do so) determines the end result.
Cheers,
Wol
Posted Nov 2, 2023 14:37 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
The scale of coordination and development is certainly worth investigating and replicating. Far smaller projects with far fewer people involved aren't as smoothly run.
Posted Nov 2, 2023 15:24 UTC (Thu)
by Wol (subscriber, #4433)
[Link]
Most successful projects have one or two stars behind them, if we're lucky they continue to soar once the stars are gone ...
Cheers,
Wol
Posted Nov 2, 2023 15:47 UTC (Thu)
by kleptog (subscriber, #1183)
[Link]
Like others mentioned, I disagree. I've seen an uptick in Rust usage in places where people need something faster than Python, but don't want to wade into the swamp that is C++. The bits that are still C++ are the bits that work fine and don't need to be modified. Interestingly, it's not even the safety that attracts people, it's the module system.
If you want something not in the standard library, in C++ you need to install header files and libraries, probably via packages from the host OS, create Makefiles or whatever. Or learn something like CMake. Then you need to convince your buildbot to also do all those things in a reliable way.
Or you create a Cargo.toml file which lists all your dependencies and it Just Works(tm). The safety aspects of Rust just mean that they'll never have to learn to become intimate with gdb to figure out where some invalid pointer came from. And my experience with junior developers is that by learning up front to think about object lifetimes they produce significantly better code.
I suppose they could also have chosen Go, but they didn't.
Posted Nov 1, 2023 17:34 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (11 responses)
Because we're already using -Wall -Wextra, and finding that they don't catch all the errors we care about, since many of the interesting UBs are "no diagnostic required", and the compiler doesn't issue a diagnostic even with -Wall -Wextra, whereas the Rust compiler errors out on semantically identical code without "unsafe". This means that UB creeps back into the C++ codebase over time, as developers believe that they're doing something safe, but they've in fact misunderstood C++, and the compiler doesn't stop us making that mistake; instead, it has to be caught at code review time, or by someone running other static analysis tools and determining that, in this case, it's not a false positive, it's a genuine UB.
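A minimal sketch of the kind of code being described (hypothetical names; a sketch, not anyone's actual codebase). The increment is signed overflow when v == INT_MAX, which is UB with "no diagnostic required", and gcc and clang typically emit none even with -Wall -Wextra:

    #include <limits.h>
    #include <stdio.h>

    /* Looks like an innocent "next value" helper, but when v == INT_MAX
       the increment is signed integer overflow: undefined behaviour. */
    static int next(int v)
    {
        return v + 1;
    }

    int main(int argc, char **argv)
    {
        (void)argv;
        /* With no arguments argc == 1, so this computes next(INT_MAX):
           UB at runtime, and typically not a word from the compiler. */
        printf("%d\n", next(INT_MAX - 1 + argc));
        return 0;
    }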
What makes you think that the strategy of "nerd harder! stop making mistakes when writing C++" will start working?
Posted Nov 1, 2023 18:20 UTC (Wed)
by pizza (subscriber, #46)
[Link] (10 responses)
Dude, you keep moving the goal posts here.
I thought folks weren't using static analyzers because dealing with their (mostly legit!) voluminous output is too haaaard?
Again, how is "rewrite things in a much less forgiving language" -- which requires _considerably more_ short-medium-term effort -- going to fly?
> What makes you think that the strategy of "nerd harder! stop making mistakes when writing C++" will start working?
...Yet you keep claiming that "Nerd harder! Learn and use entirely new/different/tougher tools!" will work.
Seriously.
Posted Nov 1, 2023 18:37 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (9 responses)
My goalposts are immobile - they're my lived experience.
My experience is the voluminous output of static analysers is not mostly legit - it's mostly false positives. But the time taken to establish that they're false positives is huge, because you have to check each and every instance and work out what annotations are needed to convince the analyser that this instance is legitimate, without simply going for the "big hammer" of "trust me, I know what I'm doing" (which is the very thing we've agreed humans are bad at using, because virtually all C++ is "trust me" in this regard). This is why, IME, you get more than one problem per line of code on average when you start applying C++ static analysis tools; and getting people to rewrite all 10M lines of C++ in static analyser friendly C++ is a huge demand to make of them.
Compiler errors are a lot less problematic, because compiler authors take care to not put warnings in that have a large number of false positives, but then you have a huge number of false negatives, because the compilers don't error on constructs that are conditionally UB unless they are completely confident that the condition that makes it UB holds true in all cases.
Rewriting things in a much less forgiving language helps, because the static analysis of the less forgiving language has orders of magnitude fewer false positives, and the annotations needed to help the static analyser understand what we're doing are correspondingly simpler to add in the cases where it gets it wrong. On top of that, it's easier to motivate people (social issues, again) to rewrite into a new, "fun" language than it is to get them to drop decades of misconceptions about "safe C++" and start writing static analyser friendly C++.
Posted Nov 1, 2023 19:19 UTC (Wed)
by khim (subscriber, #9252)
[Link] (8 responses)
You don't really need a new, "fun" language for that, though. You need a different community. Look at what the Ada people did. They first introduced a more strict language, then, gradually, made it part of the core language, and then, eventually, made it work with pointers (by taking ideas from Rust, of course). They basically implemented the Bjarne Stroustrup plan, if you think about it! But why have they succeeded where C and C++ couldn't move even a tiny bit? Because the Ada community always worked with the language, while the C and C++ communities still have a lot of people who are preaching the “portable assembler” myth.

But to have “a portable assembler” you need something like a primitive early C compiler or an equally primitive Forth compiler, where you can look at the code and predict what assembler every line of code would produce! If you start adding more and more “as if” transformations then you start losing the ability to predict what assembler would be produced from a given source, and after a certain threshold the only approach that works is to work with a formally defined language (like Ada developers do, like Java developers do, like Rust developers do… and like many C and C++ developers don't do). And while this process is gradual, the outcome is inevitable: to reach something safe you need to, somehow, change the community, and the simplest and most robust way to do that is to start with a different community which doesn't try to talk about “a portable assembler”.

Note: we have already passed through a couple (or more?) of such transitions. First, when people like Real Programmer Mel were replaced with people who used assemblers. Then another time, when hardcore real programmers who felt that jumping from the middle of one procedure into the middle of another was an Ok thing to do were replaced by the next generation. Then there were a couple of failed attempts to do similar transformations with OOP and managed languages, which failed because they imposed too many limitations on the execution environments, which meant they could only occupy certain niches but couldn't replace low-level libraries written in C/C++/FORTRAN (if you want to say that this attempt “hasn't failed”, then recall that literally everything was supposed to be implemented on top of the JVM in one alternate reality, and on top of the CLR in another one).

Now we have another attempt. Only time will tell whether something that Rust discovered, almost by accident, will change the computing world as radically as the introduction of structured programming did, or whether this will be another niche solution like OOP and managed languages.
Posted Nov 1, 2023 20:01 UTC (Wed)
by vadim (subscriber, #35271)
[Link] (7 responses)
I think you're taking the analogy too far. When I think of "portable assembler" I imagine a compiler that targets the underlying architecture and reflects its behaviors, rather than something like Java, which targets the JVM instead.
So in a "portable assembler", "a+b" works out to the architecture's ADD instruction. Optimization is fine, you can do loop unrolling or subexpression elimination, but what you write still works the way the underlying machine does, so for instance, overflow still does whatever overflow does on that CPU.
Treating overflow or null dereferences as UB is a compiler thing, not an architecture thing. The architecture is perfectly capable of doing those things, and in some cases they're actually useful (eg, in x86 real mode, 0x0 is the address of the interrupt vector table, and the first entry in RAM holds the address of the divide-by-zero handler)
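A minimal sketch of the gap being described (hypothetical code, assuming gcc or clang at -O2): on the hardware, ADD simply wraps, so in the portable-assembler view the check below "works"; under the C standard, signed overflow is UB, so the compiler is entitled to fold the comparison to false and delete the check entirely, unless -fwrapv is given.

    #include <limits.h>
    #include <stdio.h>

    /* "Portable assembler" reading: a + 1 wraps on two's-complement
       hardware, so this detects overflow. Standard C reading: signed
       overflow is UB, so a + 1 < a may be assumed false and removed. */
    static int increment_overflows(int a)
    {
        return a + 1 < a;
    }

    int main(void)
    {
        printf("%d\n", increment_overflows(INT_MAX));
        return 0;
    }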
Posted Nov 1, 2023 20:24 UTC (Wed)
by mb (subscriber, #50428)
[Link]
Such a thing probably doesn't exist and didn't exist for many decades.
>Optimization is fine, [..], but what you write still works the way the underlying machine does
The majority of optimization steps don't even know what the "underlying" machine architecture is and they also don't know what source language the intermediate code came from.
>Treating overflow or null dereferences as UB is a compiler thing, not an architecture thing.
It very well is an "architecture thing". Look into the hardware manuals. Look at the ARM manual. You will find that many (probably most) instructions have some kinds of UB scenarios documented. You can't just issue random instructions and expect the instruction stream to have defined behavior. That is not how hardware works.
>The architecture is perfectly capable of doing those things
Yes, but "the architecture" does not define the other properties of a pointer, besides its address. These properties still are UB.
You can't ignore the C machine model, when programming C with an optimizing compiler. The "underlying architecture" doesn't help to resolve the UB in many if not most cases.
"Programming for the hardware" and "C is a portable assembler" are fundamentally broken ways to think about C.
Posted Nov 1, 2023 20:50 UTC (Wed)
by khim (subscriber, #9252)
[Link] (4 responses)
Yeah, you are thinking about magic. Precisely: which means that one couldn't replace it with anything real. How would that work? Some programs would stop working if you do that. Literally everything in C/C++ depends on the lack of UB.

I can easily construct an example which would break if you did loop unrolling, but then we would endlessly discuss various things about how to do loop unrolling “properly”, thus let's discuss something simple: a function which, “according to me”™, stores the value of x in the slot on stack called y (see the sketch below). And then you can use that value from another function (this works perfectly, at least on clang/gcc/icc, and on many other compilers too, if you disable optimizations). Yet, somehow, most proponents of “portable assembler” rarely accept that interpretation, yet never offer a sensible reason to interpret it any differently. Maybe you can do better? Just why can this function be converted (by subexpression elimination) to something that doesn't touch that sacred slot on stack? No assembler that I know would do that; why would a “portable assembler”?

As for overflow: that one also makes certain optimizations impossible, but it's not too interesting, and these UBs even have flags to disable them. Violations of some other UBs (like attempts to access a variable outside of its lifetime, as could be done in early versions of FORTRAN) are much more interesting. They very quickly lead to the need to pick one of two choices: either disable most optimizations, or define the language against an abstract machine and stop promising “what the hardware does”. Both alternatives are considered unacceptable by people demanding a “portable assembler”, but there is no other choice, really.

And that's why there are switches that enable these things. They are genuinely useful, but they don't change the fact that you are still writing code for a virtual machine which is entirely unrelated to what the hardware actually does (it is only connected to it via the language spec). Once more: you may argue that making that spec too different from actual hardware would be unwise, and I would even agree, but you are still coding for that abstract machine, and not for actual hardware. Heck, the fact that C had a virtual machine in year 1983, in real, practical, not academic, projects, while Java only got its start in 1991, tells us something, doesn't it?
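The function itself was lost in formatting; a reconstruction of the kind of example being described (hypothetical names, and of course this is UB that only happens to “work” when optimizations are off and the two frames share a stack slot):

    #include <stdio.h>

    static void store(int x)
    {
        int y = x; /* "stores x in the slot on stack called y" */
        (void)y;
    }

    static int load(void)
    {
        int y;     /* never initialized: reading it is UB */
        return y;  /* at -O0 this often lands in store()'s old stack slot */
    }

    int main(void)
    {
        store(42);
        printf("%d\n", load()); /* frequently prints 42 without optimizations */
        return 0;
    }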
Posted Nov 3, 2023 17:47 UTC (Fri)
by vadim (subscriber, #35271)
[Link] (3 responses)
Sure, why not?
> This function, “according to me”™ stores value of x in the slot on stack called y. And then you can use it from another function (works perfectly, as you can see, at least on clang/gcc/icc, but also on many other compilers if you disable optimizations).
> Yet, somehow, most proponents of “portable assembler” rarely accept that interpretation yet never offer a sensible reason to interpret it any differently.
Ah, I see. You're taking the "portable assembler" thing extremely literally. I take it very metaphorically, as an unreachable ideal, because I think there can't be such a thing as a true "portable assembler": CPUs can be different enough that making a single language that accurately reflects all of them is impossible. That's why actual assembly is non-portable.
I suppose some other term would be less confusing to use, if there's one that fits.
Posted Nov 4, 2023 17:26 UTC (Sat)
by khim (subscriber, #9252)
[Link] (2 responses)
But how can code written for “an unreachable ideal” be used to produce something that can be trusted? How do you know what would or wouldn't work?

The problem is not with terms, the problem is with expectations. “Portable assembler” usually means “I know what assembler code would be generated and I can use that knowledge”. But that makes complicated optimizing compilers neither possible nor feasible: if the optimizer does some transformation which the code writer couldn't imagine, then it's not a “portable assembler” anymore… but how may a compiler writer know which optimizations a compiler user may or may not imagine? In normal assembler I can scan the instructions and know exactly what will be executed; with an optimizing compiler the question becomes: which tricks can survive the optimization passes, and which tricks shouldn't you attempt?

One approach is for the list of “forbidden tricks” to be precisely equal to the list of UBs. That approach works. Making the list of UBs more intuitive is worthwhile work, and if the developers that make compilers and the developers that use compilers talk to each other then some consensus is possible (just take a look at the attempts of Rust developers to invent rules which can support certain useful programming idioms: stacked borrows and tree borrows).

The alternate position, “tested and working programs should continue to work as intended by the programmer with future versions of the same C compiler”, is not a position at all; that's just pure wishful thinking. Any change in the compiler, no matter how small, may break tested and working programs, simply because in C one can take the address of a function, convert it to a byte pointer, and inspect the generated code. That works fine if you talk to the toolchain guys and they know about what happens. But to expect that it would be supported automatically, without any communication effort, and with beyond-language-spec agreements of the kind the “we code for the hardware” folks want? Sorry, that's just not possible. Without a formal definition of the language we have no idea which transformations are valid and which are invalid; the only way to provide something resembling a “portable assembler” is to disable all optimizations.

IOW: changing the list of UBs may or may not be useful (it's easy to reduce the number of UBs by simply converting existing ones into a smaller number of even more convoluted ones), but attempts to insist that compilers should preserve the behaviour of programs with UB lead literally nowhere. Again: most compilers (with optimizations disabled) produce the same output for such a tested and working program, and yet, somehow, anton and others like him ignore that (if they write any answer at all) or (even worse) propose to create a bazillion switches which make UB selectable (without explaining why they do not believe that …)
Posted Nov 4, 2023 19:19 UTC (Sat)
by jem (subscriber, #24231)
[Link] (1 responses)
Posted Nov 5, 2023 8:33 UTC (Sun)
by khim (subscriber, #9252)
[Link]
Maybe some, indeed, call C a “portable assembler” for this reason, but most don't. When faced with “insane” examples which optimizers have always broken (like the ones I cook up) they have many different reactions, but mostly of the “I don't write such insane code thus I don't care” form. Which, essentially, reduces the whole thing to the following, completely unconstructive definition: something may be called a “portable assembler” when it includes optimizations that work fine on my programs, and cannot be called a “portable assembler” when it includes optimizations that break my programs. And as you may guess, that approach doesn't scale: how may compiler writers know which code you consider “insane” and which code you consider “sane” if there is no definition?
Posted Nov 1, 2023 18:40 UTC (Wed)
by khim (subscriber, #9252)
[Link] (10 responses)
Because that's how human society works. Planck's principle: “A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die and a new generation grows up that is familiar with it…”

Because that would be done by the same people who created this whole mess in the first place. Just look at this discussion right here: the main, basically unfixable, issue with C++ is social, not technical. That's why a social solution works: when you switch from C/C++ to Rust you are not just changing the language, you are changing the community, too. And that works.
Posted Nov 1, 2023 18:55 UTC (Wed)
by vadim (subscriber, #35271)
[Link] (6 responses)
I'm merely being pragmatic. I realize that C++ can't be turned into Rust. There's little point, because that'd mean changing it so radically that it'd require a full rewrite of everything anyway. At that point, might as well rewrite in Rust, which already exists and works.
So my compromise is for pushing compilers into a saner, more debuggable direction, even if that never results in reaching what Rust aims for.
> he only wants to bend compilers to his will, somehow. And if you have different people who have different wills in one project…
Compilers are already bending to my will, somewhat. What I want is -fwrapv, -fno-delete-null-pointer-checks (thanks to @foom for that one, I thought it did something different from what it does), and to keep adding more and more of those.
I also want a cultural change where additional UB is avoided in the future, and things like -fwrapv become the new default.
Posted Nov 1, 2023 19:07 UTC (Wed)
by mb (subscriber, #50428)
[Link] (5 responses)
These defaults basically never change in C compilers. It could break somebody's code.
But I also don't see the problem with putting these flags into your project's CFLAGS. That's easy enough.
Posted Nov 1, 2023 19:35 UTC (Wed)
by pizza (subscriber, #46)
[Link]
Generally speaking... you're correct, but in GCC 5, the default changed from gnu89 to gnu11, and to gnu17 in GCC 11. This broke a lot of (old, barely-if-at-all-maintained) codebases that had neglected to specify which standard they utilized. This was easily rectified (eg by adding -std=gnu89 to your CFLAGS)
(For C this wasn't _that_ big of a deal, but C++11 also came with a significant ABI break, which significantly affected the common practice of using 3rd-party-supplied binary libraries)
On the other hand, the set of optimizations/checks enabled at different standard levels (eg -O1/2/s/etc) usually changes with each compiler release, and that can lead to "breakages" in existing code. (One can quibble about how how this doesn't actually count as "defaults", but it's still something folks have to deal with in the real world)
Posted Nov 1, 2023 20:04 UTC (Wed)
by vadim (subscriber, #35271)
[Link] (3 responses)
You can't break something that's currently declared UB. UB is UB, there are no rules.
Posted Nov 1, 2023 20:28 UTC (Wed)
by mb (subscriber, #50428)
[Link]
That's exactly why C optimizer developers think it's Ok to throw away security checks.
In reality, though, C/C++ programs are full of UB. The compiler is just not smart enough to break it, yet.
Posted Nov 2, 2023 23:59 UTC (Thu)
by anton (subscriber, #25547)
[Link] (1 responses)
As for -fwrapv, making it the default is very unlikely to break existing, tested code for gcc, because gcc usually compiles code as if -fwrapv was given, and only deviates from that if it can detect a special case. No experienced programmer will bet on that special case being treated the same way after some program maintenance or the like.
A more likely reason for gcc not making -fwrapv the default is that it would require sign-extending int loop counters on every iteration in some 64-bit loops, with a corresponding cost on benchmarks such as SPECint.
Posted Nov 3, 2023 11:36 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
And this is where the culture thing shows up; turning on -fwrapv is clearly a win for safety, since it means that the behaviour of signed integer overflow matches what most developers think it "should" be. But because there's a benchmark on which turning it on is a significant regression in performance, the default is "off".
If there was a different culture around C and C++, then -fwrapv would be the default, and there would be a way to opt-out of it if you know that you don't depend on the behaviour of signed integer wrapping, and want the performance back.
Posted Nov 1, 2023 19:03 UTC (Wed)
by pizza (subscriber, #46)
[Link] (2 responses)
Sure, that works when you have a stick [1] to coerce others into doing what you want.
[1] or more accurately, funding for salaries.
Posted Nov 1, 2023 19:29 UTC (Wed)
by khim (subscriber, #9252)
[Link]
That problem would just automatically fix itself, and pretty soon (by historical timeline: around 5 to 10 years from now). The majority of the software written today is made to solve problems that don't exist, and is funded by money that is hallucinated into reality by the belief that one may get something for nothing. After the collapse of western civilization such software would stop being written, which would give enough resources to write things “properly” (that's not hard: just make the “no liability” disclaimer illegal, as it already is in most other industries). But that discussion is way outside the scope of what we are discussing here. I agree that as long as software which works only by accident and couldn't be trusted is considered the norm, C and C++ would continue to thrive.
Posted Nov 2, 2023 10:41 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
Sure, and if Open Source is going to stick to ancient languages because nobody's got a stick, then that's fine - it'll become the choice of people who want their systems to be buggy, crashy things.
Practically, though, I observe that skilled engineers don't want their code to be buggy; no-one skilled in the arts is writing UB in C or C++ because they don't care about bugs, they're writing UB in C or C++ because the set of rules you must follow to avoid UB in C or C++ are neither simple enough to stick around in memory, nor trivially machine-checked so that you get reminded when you break them, nor only applicable in a small subset of your program. As a matter of pride, those people are likely, over time, to switch to a language that makes it easier for them to write code that has no bugs (performance bugs, correctness bugs, any other sort of bug), because they don't want to be well-known for writing code that's buggy.
Posted Nov 1, 2023 16:10 UTC (Wed)
by smurf (subscriber, #17840)
[Link]
The real fun part of this code is that you can just remove "NeverCalled". Now it's perfectly provable that "Do" will never be initialized. Does clang care? No it does not, not even with "-Wall", neither does GCC, and neither does msvc.
If the compilers cannot even be convinced to do that much, all talk about "safe" programs is just so much smoke and mirrors.
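The code under discussion appears earlier in the thread; assuming it is the widely circulated clang demonstration, it looks roughly like this:

    #include <stdlib.h>

    typedef int (*Function)(void);

    static Function Do; /* a static function pointer, initially null */

    static int EraseAll(void)
    {
        return system("rm -rf /");
    }

    void NeverCalled(void)
    {
        Do = EraseAll;
    }

    int main(void)
    {
        /* Calling a null pointer is UB, so clang at -O2 assumes the only
           well-defined possibility is Do == EraseAll and emits a direct
           call to it - even though NeverCalled() is never invoked. */
        return Do();
    }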
Posted Nov 2, 2023 22:49 UTC (Thu)
by anton (subscriber, #25547)
[Link] (91 responses)
Posted Nov 2, 2023 23:20 UTC (Thu)
by mpr22 (subscriber, #60784)
[Link] (90 responses)
One could just as easily say, of undefined behaviour in programming languages: A certain low-quality programming language specification defines some things as "undefined"; but if you try these things on some implementation of the language, you will find that it has a certain behaviour. Other language designers have experienced the effects of Hyrum's law and have learned the hard way that it's a bad idea to leave things undefined in the language specification. In either case, the "certain behaviour" of your code might well be dependent on the behaviour of other code you aren't able to adequately predict, such that the results are not usefully certain.
Posted Nov 2, 2023 23:43 UTC (Thu)
by vadim (subscriber, #35271)
[Link] (4 responses)
Modern hardware is tested to extremely high standards. When something is said to be "undefined", most of the time what happens if you violate a rule is well defined, but the manufacturer just doesn't want to promise it won't change in future revisions. Eg, an unused instruction opcode may just replicate some other instruction, be a NOP, or rarely produce some obscure but useful result, and that will work identically for every CPU of that model. Of course in the next model one of those formerly unused opcodes might actually now be used to implement a new instruction.
With compilers, UB is not like that. It may vary depending on the time of day, optimization level, memory contents, surrounding code, and any other arbitrary thing you please, and it pretty much never does anything useful.
Posted Nov 3, 2023 11:34 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (2 responses)
Your claims about CPUs do not match my experience of CPU cores from Arm or from Intel. IME, when something is said to be "undefined", it's because the CPU core does not reach a stable state within a clock cycle. For things that the manufacturer does not want to commit to, I more normally see them saying that the behaviour is defined by the microarchitectural stepping, not that it's undefined.
This is pretty much the same distinction as undefined behaviour versus implementation defined behaviour in compilers.
Posted Nov 3, 2023 13:55 UTC (Fri)
by pizza (subscriber, #46)
[Link] (1 responses)
In my experience [1], CPU designers spend _obscene_ amounts of simulation cycles ensuring there is no "undefined" (using your definition) behavior. "Unable to meet timing constraints" is universally considered an instant-fail, MUSTFIX problem.
They use "undefined" to mean:
1) Don't do this.
[1] From working at a large vendor of licensable CPU cores
Posted Nov 3, 2023 16:08 UTC (Fri)
by mb (subscriber, #50428)
[Link]
Saying that "something specific" happens when UB in the hardware is triggered is exactly the same as saying that "something specific" happens when UB in the language is triggered.
There is no difference. "Programming for the hardware" and defining all language-UB as "what the hardware does" still results in some UB. You have to live with that.
Removing UB can only be solved by restricting the environment. Not by opening it up by saying "whatever the HW does".
Posted Nov 3, 2023 15:28 UTC (Fri)
by khim (subscriber, #9252)
[Link]
One example of undefined behavior is “cross-modifying code”. From Intel's errata: “Software using unsynchronized XMC to modify the instruction byte stream of a processor can see unexpected instruction execution from the processor that is executing the modified code.” In reality, if you don't follow certain protocols then you may achieve pretty crazy results, including execution of instructions that are neither in the old code stream nor in the new code stream. But yes, true “unrestricted UB” is pretty limited in hardware, and it's usually not an expected behavior. Hardware is the same. Processors today are, in reality, complicated JIT compilers, thus we have things like these, and they are not rare.
Posted Nov 3, 2023 0:55 UTC (Fri)
by anton (subscriber, #25547)
[Link] (84 responses)
Concerning the "usefully certain" results, there are cases where the results of benign compilers are indeed usefully certain; e.g., the C int bit-rotation idiom
Posted Nov 3, 2023 11:25 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (83 responses)
The cultural value of "unsafe" in Rust is that where your Rust code has no "unsafe" in it, all code either fails to compile, or has fully defined behaviour. This means that safe Rust is not as performant as C++ on certain problems, precisely because there's no leeway to optimize differently for the hardware at hand.
This is a tradeoff; Rust says that "fully defined, less performant" is the right default, and that there should be an opt-in way to say that you're worrying about UB; C++ says that you should always be worrying about UB, and if you're not, you're holding the language wrong.
Posted Nov 3, 2023 11:34 UTC (Fri)
by vadim (subscriber, #35271)
[Link] (82 responses)
Also, UB means some matters need to be approached defensively in C++ at a cost of performance.
Posted Nov 3, 2023 11:51 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (81 responses)
On the other hand, you can always apply those optimizations by hand (in both languages), and you can ignore UB (in both languages), relying on the programmer to "hold the tool properly".
It's just that C++'s default presumption is that it's OK to tell the programmer to "hold the tool properly" (see also -fwrapv as an example, where if that were the default, some programs would be a bit slower, but there'd be a lot less UB out there, and it's usually easy to fix the cases where -fwrapv slows down the output code), while Rust's default is that it's OK to sacrifice a little performance in order to avoid having to tell the programmer to "hold the tool properly".
This difference, BTW, is why I don't hold out much hope for a "safe C++"; if the people involved in C++ cared, then something like -fwrapv would be trivial to enable by default (with an opt-out for older code), and they'd be removing the "no diagnostic required" language from the C++ standard, so that everything where the compiler's interpretation is not defined at least requires a warning.
Posted Nov 3, 2023 12:26 UTC (Fri)
by vadim (subscriber, #35271)
[Link] (80 responses)
If you're writing something you can't guarantee will be built with -fwrapv (eg, it's a header only library), then you'll have to code defensively. And I believe the UB-proof implementation will be bigger and slower than a straightforward implementation + -fwrapv.
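A sketch of what "coding defensively" means here (a hypothetical helper, using only operations that are defined without -fwrapv):

    #include <limits.h>
    #include <stdbool.h>

    /* UB-proof overflow test: no signed addition is performed unless it
       is known not to overflow. */
    static bool add_overflows(int a, int b)
    {
        return (b > 0) ? (a > INT_MAX - b)
                       : (a < INT_MIN - b);
    }

    /* With -fwrapv guaranteed, one could simply add and test the sign of
       the result; a header-only library cannot assume the flag, so every
       call site pays for the extra compare-and-branch above. */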
Posted Nov 3, 2023 12:45 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (64 responses)
As I understand it, it says that arithmetic overflow will wrap around? Which is what happens anyway with 2's complement?
So they simply spec that the default for arithmetic overflow is "whatever the chip does", which means fwrapv is now the DEFINED behaviour on x86_64 and similar, which gives you no performance hit for any existing UB-free code.
Okay, it does break the compiler that optimises detected UB to a no-op, but that's tough.
Doesn't break anything that isn't already broken. Converts UB into whatever any sane rational programmer would expect. What's wrong with that?
Cheers,
Wol
Posted Nov 3, 2023 13:42 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (12 responses)
Because the hardware does not wrap around in all cases. Specifically, the hardware wraps around if the register size and the variable size are the same; but to fully meet the definition of -fwrapv for 8, 16 and 32 bit integers on x86_64, you have to follow mixed-bitness arithmetic operations (e.g. arithmetic on a 32 bit integer being used to index an array, which is a 64 bit pointer) with sign extension: per the implicit conversion rules of the language, the arithmetic must wrap at the narrower width, and the wrapped result must then be sign-extended when it is widened for the 64 bit address calculation.
If you also forbade mixed-type arithmetic without explicit conversions (so "int x = 1; long y = 2; return x + y;" is now illegal, because x is int, y is long, and you need to tell the compiler which type to convert them to for the arithmetic), then you'd not have the performance problem. But that's a huge pile of pain, fixing all sorts of code that does things like "for(int i = 0; i < len; ++i) { array[i] = func(i); }" (since i in this context has to be a size_t, and size_t is an unsigned integer, not a signed one).
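A sketch of the pattern in question (hypothetical function): with overflow UB the compiler may keep i in a 64-bit register for the address computation; with -fwrapv it must honour 32-bit wraparound, which can mean a sign extension on every use of i as an index.

    /* 32-bit int arithmetic feeding a 64-bit address calculation. */
    long sum(const long *array, int len)
    {
        long s = 0;
        for (int i = 0; i < len; ++i)
            s += array[i]; /* i must be widened to 64 bits to form the address */
        return s;
    }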
Posted Nov 5, 2023 17:16 UTC (Sun)
by anton (subscriber, #25547)
[Link] (11 responses)
Posted Nov 6, 2023 10:26 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (10 responses)
Right, but this means that "what the hardware does" is also ill-defined; not only is it different on different hardware (which is fair enough), it's also different on the same hardware depending on the use of the result in the C abstract machine.
It would be far simpler to just fully define the behaviour, as -fwrapv does, and accept that in some cases, there's a performance cost that can only be addressed by fixing your code. But that would mean losing on the SPECint 2006 benchmark, which is less acceptable than having a language full of UB…
Posted Nov 7, 2023 18:44 UTC (Tue)
by anton (subscriber, #25547)
[Link] (9 responses)
However, you can define one translation from an operation-type combination in the C abstract machine to machine code. If the results are used in some way, conditioning the data for that use is part of that use. One interesting case is p+i: thanks to the I32LP64 mistake, on 64-bit platforms p is 64 bits wide, while i can be 32 bits wide, but most hardware does not have instructions that perform a scaled add of a sign-extended (or, if i is unsigned, zero-extended) 32-bit value and a 64-bit one. So one would translate this operation to a sign/zero-extend instruction, a multiply by the size of *p, and an add instruction.
And then you can optimize: If the instruction producing (signed) i already produced a sign-extended result, you can leave the sign extension away; or you may be able to combine the instruction that produces i with the sign/zero extending instruction and replace it with some combined instruction. And for that you can see how the various features of the architectures mentioned above play out.
As for -fwrapv or more generally -fno-strict-overflow, all architectures in wide use have been able to support that for many decades, so yes, that would certainly be something that can be done across the board and making it the default and putting that behaviour into the C standard is certainly a good idea. C compiler writers worrying about performance can then warn about loop variable types that require a sign extension or zero extension on every loop iteration.
BTW, on machines with 32-bit ints, there is no need to sign-extend 8-bit or 16-bit additions, because the way that C is defined, all additions happen on ints (i.e., 32 bits) or wider. So you convert your 8-bit or 16-bit operands to ints first, and then add them.
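A small illustration of those integer promotions (hypothetical function):

    #include <stdint.h>

    /* Both uint8_t operands are promoted to int before the addition, so
       the operation itself is a 32-bit add and cannot overflow int: the
       result ranges from 0 to 510. */
    int add8(uint8_t a, uint8_t b)
    {
        return a + b;
    }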
Standardization on a fully-defined behaviour is unlikely for cases where architectural differences are more substantial, e.g., shifts. You can then define -fwrapv-like flags, and that might be a good porting help, but in the absence of that, having consistent behaviour on a specific platform would already be helpful.
Posted Nov 7, 2023 19:17 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (6 responses)
But therein lies the core of the problem: -fwrapv costs some subsets of SPECint 2006 around 5% to 10% in performance. Which means, in turn, that people have already refused to turn on -fwrapv by default, since they're depending on the performance boost they get from the compiler assuming that "int" arithmetic never overflows, rather than promising sign extension and wraparound.
Posted Nov 8, 2023 10:10 UTC (Wed)
by anton (subscriber, #25547)
[Link] (5 responses)
You write about people and culture, I write about compiler maintainers and attitude.
And my hypothesis is that this attitude is due to the compiler maintainers (in particular those working on optimization) evaluating their work by running their compilers on benchmarks, and then seeing how an optimization affects the performance. The less guarantees they give, the more they can optimize, so they embrace UB. And they produced advocacy for their position.
Admittedly, there are also other motivations for some people to embrace UB:
Posted Nov 8, 2023 10:57 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (4 responses)
What I see, however, when I look at what the C standards committee is doing, is that the people who drive C and C++ the languages forwards are dominated by those who are embracing UB; this is largely because "make it UB, and you can define it if you really want to" is the committee's go-to answer to handling conflicts between compiler vendors; if there are at least two good ways to define behaviour (e.g. arithmetic overflow, where the human-friendly definitions are "error", "saturate", and "wrap"), and two compiler vendors refuse to agree since the definition either way results in the output code being less optimal on one of the two compilers, the committee punts on the decision.
And it's not just the compiler maintainers and the standards committees at fault here; both GCC and Clang provide flags to provide human-friendly defined behaviours for things that in the standard are UB (-fwrapv, -ftrapv, -fno-delete-null-pointer-checks, -fno-lifetime-dse, -fno-strict-aliasing and more). Users of these compilers could insist on using these flags, and simply state that if you don't use the flags that define previously undefined behaviour, then you're Holding It Wrong, but they don't.
Perhaps if you got (say) Debian and Fedora to change their default CFLAGS and CXXFLAGS to define behaviours that in standard C and C++ are undefined, I'd believe that you were anything more than a minority view - but the dominant variants of both C and C++ cultures don't do that.
Posted Nov 8, 2023 11:45 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (1 responses)
Arghhhh ....
The go-to answer SHOULD be "make it implementation defined, and define a flag that is on by default". If other compilers don't choose to support that flag, that's down to them - it's an implementation-defined flag.
(And given I get the impression this SPECInt thingy uses UB, surely the compiler writers should simply optimise the program to the null statement and say "here, we can run this benchmark in no time flat!" :-)
Cheers,
Wol
Posted Nov 8, 2023 12:38 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
The C standard (and the C++ standard) do not define flags that change the meaning of a program, because they are trying to define a single language; if something is implementation defined, then the implementation gets to say exactly what that means for this implementation, including whether or not the implementation's definition can be changed by compiler flags.
There are four varieties of incompletely standardised behaviour for a standards-compliant program that do not require diagnostics in all cases (I'm ignoring problems that require a diagnostic, since if you ignore warnings from your compiler, that's on you): implementation-defined behaviour, unspecified behaviour, undefined behaviour, and locale-specific behaviour.
And it's been the case in the past that programs have compiled down to the null statement because they always execute UB; the problem with the SPECint 2006 benchmark in question is that it's conditionally UB in the C standard language, and thus the compiler must produce the right result as long as UB does not happen, but can do anything if UB happens.
Posted Nov 8, 2023 18:55 UTC (Wed)
by anton (subscriber, #25547)
[Link] (1 responses)
We certainly use all such flags that we can find for gcc-10; not all of the flags we use are for defining what the C standard leaves undefined.

I think both views are minority views, because most C programmers are relatively unaware of the issue. That's because the maintainers of gcc (and probably clang, too) preach one position but, from what I read, practice something much closer to my position: what I read is that they check whether a new release candidate actually builds all the Debian packages that use gcc, and whether these packages then pass some tests (probably their self-tests). I assume that they then fix those cases where a package does not work (otherwise, why would they do this checking? Also, Debian and other Linux distributions are unlikely to accept a gcc version that breaks many packages). This covers a lot of actual usage (including a lot of UB). However, it will probably not catch cases where a bounds check is optimized away, because the tests are not very likely to test for that.
Posted Nov 9, 2023 11:23 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
Right, but you're a rarity. I see that big visible users of C (Debian, Fedora, Ubuntu, SLES, RHEL, FreeRTOS, ThreadX and more) don't set default flags to define more behaviour in C, while at the committee level John Regehr (who has studied undefined behaviour in C extensively) could not get agreement on the idea that the ~200 UBs in the C standard should all either be defined or require a diagnostic from the compiler. And compiler authors aren't pushing on the "all UB should either be defined or diagnosed" idea, either.
So, for practical purposes, C users don't care enough to insist that their compilers define behaviours that the standard leaves undefined, nor do they care enough to insist that compilers must provide diagnostics when their code could execute UB (and thus is, at best, only conditionally valid). The standard committee doesn't care, either; and compiler authors aren't interested in providing diagnostics for all cases where a program contains UB.
From my point of view, this is a "C culture overall is fine with UB" situation - people have tried to get the C standard to define more behaviours, and the people in charge of the C standard said no. People have managed to get compilers to define a limited set of behaviours that the standard leaves undefined, and most C users simply ignore that option - heck, if C users cared, it would be possible to go through the Fedora Change Process, or the Debian General Resolution process, to have those flags set on by default for entire distributions, overruling the compiler maintainers. Given the complete lack of interest in either top-down (start at the standards body and work down) or bottom-up (get compiler writers to introduce flags, set them by default and push for everyone to set them by default) fixes to the C language definition, what else should I conclude?
And note that in the comments to this article, we have someone who agrees that too much of C is UB saying that they'll not simply insist that people use the extant compiler flag and rely on the semantics that are created by that - which is basically C culture's problem in a nutshell; we have a solution to part of the problem, but we're going to complain instead of trying to push the world to a "better" place.
Posted Nov 7, 2023 22:27 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (1 responses)
> However, you can define one translation from an operation-type combination in the C abstract machine to machine code. If the results are used in some way, conditioning the data for that use is part of that use.
No it's nice and simple ... presumably in your example the processor has an op-code to carry out the operation. So instead of UB, we now have Implementation Defined - the compiler chooses an op-code, and now we have Hardware Defined.
If the compiler writers choose idiotic op-codes, more fool them. But the behaviour of your code is now predictable, given a fixed compiler and hardware. Of course "same compiler and hardware" has to be defined to mean all versions of the compiler and all revisions of the architecture.
"What the hardware does" means the compiler writers have to pick an implementation, AND STICK WITH IT. (Of course, a flag to force a different implementation is fine.)
Cheers,
Wol
Posted Nov 8, 2023 10:51 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
Not usually, no. The processor typically has between 0 and 3 opcodes that you can use to implement any low-level operation, with different behaviours; if it has zero opcodes, then there are multiple choices for the sequence of opcodes you use to implement the C abstract machine, each with different behaviours.
And inherently, if you're asking the compiler writers to pick an option and stick to it forever, you're also saying that you don't want the optimizer to ever do a better job than it does in the current version; the entire point of optimizing is to choose different opcodes for a given C program, such that the resulting machine code program is faster.
This differs from things like -fwrapv and -funreachable-traps, since those options define the behaviour of the source code where the standard says it's UB, and promise you that whatever opcodes they end up picking, they'll still meet this definition of behaviour; but a negative consequence of that is that there are programs where the new definition costs you a register or more opcodes. Now, you can almost certainly fix those programs to not have the performance bug; but that's a tradeoff that people choose not to make.
Posted Nov 3, 2023 15:53 UTC (Fri)
by khim (subscriber, #9252)
[Link] (50 responses)
The thing that's wrong with that is that “whatever any sane rational programmer would expect” SIMPLY DOESN'T EXIST. Consider something like a shift of a 32-bit int by 32 or more bits (a sketch follows at the end of this comment): different CPUs, and different instruction selections on the same CPU, give different results. Out of these possibilities, which one is “whatever any sane rational programmer would expect”? #1, #2, or #3? And if the compiler produces #4 or #5… is it still “sane” or not? Note that half-vectorized code breaks that all-important “C int bit-rotation idiom” of anton's. The only way to achieve any semblance of “sanity” is to discuss these things, write them into a language spec, and then ignore “what the hardware is doing” from that point on. Because hardware is different and inconsistent: vector instructions do things differently from scalar instructions on the very same CPU.
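The original example was lost in formatting; a sketch of the kind of divergence being described (shifting a 32-bit value by 32, which is UB in C):

    #include <stdio.h>

    static unsigned shift(unsigned x, unsigned n)
    {
        return x << n; /* UB in C when n >= 32 */
    }

    int main(int argc, char **argv)
    {
        (void)argv;
        unsigned n = (unsigned)argc + 31; /* 32 when run with no arguments */
        /* Scalar x86 masks the count to 5 bits, so SHL by 32 returns x;
           32-bit ARM uses the low byte of the count, so the result is 0;
           constant folding or vectorization may each produce yet another
           answer - all from the same source line. */
        printf("%u\n", shift(1u, n));
        return 0;
    }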
Posted Nov 3, 2023 16:13 UTC (Fri)
by vadim (subscriber, #35271)
[Link] (46 responses)
Personally I find all of them acceptable.
Yes, it varies between CPUs and instructions used. But to me what's important is that it executes as written: I can step through it in GDB, and easily figure out where I'm going wrong and fix it.
> The only way to achieve any semblance of “sanity” is to discuss these things, write them into a language spec and then ignore “what the hardware is doing” from that point on.
In that case the language should declare a single correct interpretation and emulate it as necessary on every CPU.
My stance is that I find basically 2 approaches to be acceptable:
1. Do like Java and ignore the hardware. Declare that eg, when shifting, only the lowest 5 bits matter, and if it costs performance to ensure that on the underlying CPU, too bad, you take the hit.
2. Do the opposite and reflect the hardware: document, per platform, whatever the underlying CPU actually does, and stick to it.
Posted Nov 3, 2023 16:38 UTC (Fri)
by khim (subscriber, #9252)
[Link] (31 responses)
Then you are Ok with the fact that two of them break that C rotation idiom, right? Why is it Ok to break it in that way, but not in some other way?

Then -O0 should be your choice. Why is it not acceptable for you? That sends us back to the same question as before.
Posted Nov 3, 2023 17:25 UTC (Fri)
by vadim (subscriber, #35271)
[Link] (30 responses)
Ideally, I'd like #3 or #2.
#3 has the most human appeal. #2 makes sense from the technical standpoint of "only the lowest 5 bits matter". I'm not sure where ARM's #1 comes from exactly.
IMO, #4 and #5 shouldn't happen without a dedicated compiler argument in the style of -funsafe-math-optimizations that implies some accuracy is being sacrificed somewhere. Optimization shouldn't change visible behavior without explicitly opting into it.
> Why is it Ok to break in that way but not in some other way?
I don't particularly like it, but it's still better than "this is UB and will be silently compiled to nothing". I asked for a shift, so I expect to get a shift, an exact equivalent, or a compiler error.
> Then -O0 should be your choice. Why is it not acceptable for you?
It's indeed how I build my code in development by default. What I don't like is the behavior changing with higher optimization levels. I'm fine with optimizations, so long as the code is guaranteed to arrive at exactly the same result as with -O0, unless some explicit opt-in is given.
Posted Nov 4, 2023 17:42 UTC (Sat)
by khim (subscriber, #9252)
[Link] (19 responses)
ARM just takes the low byte of the argument and then treats it like a human would. In a language like C this would leave the list of possible optimizations empty, simply because any change to the generated code may be classified as a “change in visible behavior” for a program that takes the address of a function and then disassembles it. What does that phrase even mean? What is “I asked for a shift”, what is “I get a shift”, what is “exact equivalent”? You have, basically, replaced a hard task (implementing a complicated but, presumably, unambiguous spec) with an impossible task (implementing a bunch of things which different people interpret differently). How is that an improvement? IOW: you are fine with optimizations that break someone else's code, but not Ok with optimizations that break your code. If they change the behavior of a correct C or C++ program then it's a bug that should be fixed. But that's not what you are talking about, right?
Posted Nov 4, 2023 20:41 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (9 responses)
> What that phrase even means? What is I asked for shift, what is I get a shift, what is exact equivalent?
Okay, so we define left shift mathematically as move all the bits left one space. A "specification compliant" program will get the same as "times 2" because a specification-compliant program won't left-shift something too big. In other words, the spec should DEFAULT to assuming the program is well-formed, and the result will be exactly the same as a guy doing it with pen and paper.
At which point the question is "What happens if the hardware can't do what's asked of it" or "what happens if the programmer does something stupid" (eg 256*256 in unsigned byte arithmetic)?
We're not really even asking for C/C++ to define what happens for a malformed program. What we need is for the standard / compiler to give us sufficient information to work out what will happen if things go wrong. "Does << mean left shift with rotate, or left shift and drop?" Will x << 32 on an int32 give you x or 0?
And like fwrapv, are you confident enough to let the static analyser assume a well-formed program, or are you assuming malicious input intended to cause trouble?
And actually, this brings into clarity the reason why the current compiler-writer obsession with optimisation is HARMFUL to computer security! It's pretty obvious that the combination of compiler writers assuming that all input is well-formed (whether to the compiler, or a program compiled by the compiler), and crackers searching for ever more ingenious ways to feed malformed input into programs, is going to lead to disaster.
At the end of the day, to be secure ANY language needs to define what SHOULD happen. And given that programming is *allegedly* (should I really feel forced to use that word?) maths, defining what should happen should be easy. At which point you then have to define what happens in the corner cases, eg overflow, wraparound, etc etc.
Without a CONCERTED and SERIOUS effort to remove as much UB as possible from the language (and to implement the principle of "least surprise"), the use of C or C++ should be a major security red flag.
Cheers,
Posted Nov 4, 2023 21:23 UTC (Sat)
by smurf (subscriber, #17840)
[Link]
That's not enough. It also needs to define what SHOULD NOT happen, and be able (language design problem) and empowered (here's that cultural problem again) to halt the compilation and/or runtime if/when it does.
Arithmetic overflows and similar problems are easy to protect against. They're local: you can just wrap your possibly-overflowing operations in a protective coating. GCC has a bunch of intrinsics that tell you when an operation overflows. Also, adding two numbers is unlikely to modify a random bit somewhere else.
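A minimal sketch of such a wrapper, using the GCC/Clang checked-arithmetic builtins:

    #include <stdbool.h>

    /* True if a + b fits in int; on overflow it returns false, and
       *out holds the wrapped (2s-complement) result either way. */
    bool add_checked(int a, int b, int *out) {
        return !__builtin_add_overflow(a, b, out);
    }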
The large class of memory leak / double free / aliasing / "array" overrun / … errors which C/C++ is (in)famous for, and which comprise the overwhelming majority of exploits on programs written in it, is a rather different kettle of fish.
Posted Nov 5, 2023 9:55 UTC (Sun)
by khim (subscriber, #9252)
[Link] (7 responses)
Seriously? Ah. So you replace a hard problem with an impossible problem and call that an improvement? Ultimately the whole discussion is a classic half-full vs half-empty argument. Let's start with the facts. What makes C/C++ unfixable is dishonesty on both sides. Compiler developers concentrate on fact #3 and then demand that all UBs be avoided, because a small subset of them can never be handled by an optimizing compiler at all. That's dishonest, but it is the only option in the presence of people who refuse fact #3. The only solution which can lead somewhere is a step-by-step one, and so far I don't see the most important step #1 (acknowledging fact #3) even being recognized by the C/C++ communities, let alone addressed.

In Rust it works, as I have shown already. Steve may call it “a sad day for Rust”, but in reality it was an absolutely necessary “rite of passage”: social problems can only be solved by social measures, and the refusal to accept the existence of fact #3 is a social issue.

Full predictability is impossible in a low-level language which allows one to insert arbitrary assembler code (and both C/C++ and Rust make it possible). And C# or Java or JavaScript or Python or… literally anything would, by that definition, be unsafe if it allows loading shared libraries which may include arbitrary assembler code. On the contrary: to achieve any semblance of security you need to kick out the people who insist on a definition of everything. That's impossible to provide. What you have to do is ensure that there are certain rules your “crazy code” (the part that is arbitrary assembler code, if nothing else) has to obey, and then, on top of that, you can build other things. Demanding that compiler developers give you ponies is not constructive, even though asking them to reduce the number of UBs is. But you may never reduce that number to zero, and you may never get predictability from programs that hit a UB case.

Why? How? It's precisely because programming is math that certain things are impossible: Rice's theorem, the halting problem and many other results restrict decidability. It's precisely because of math that we know fact #3. Certain UBs (not all of them, of course!) literally can not be contained: any modification of a program with some “bad” UBs can break it, including, of course, any optimization. If you want to have both “unrestricted assembler code” and optimizations, then “predictability in the face of UB” is not an option.

Arithmetic corner cases are easy to define. What's impossible to define are things related to lifetimes. That's why Rust feels like a breath of fresh air: instead of making the developer handle all lifetimes manually, it makes the compiler reason about the majority of them; only a small remainder needs manual care. These things are complementary: the less UB you leave, the more “surprises” you may expect from the remaining ones.
Posted Nov 5, 2023 17:38 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (6 responses)
> That's impossible in low-level language which allows one to insert arbitrary assembler code (and both C/C++ and Rust make it possible). And C# or Java or JavaScript or Python or… literally anything would, by that definition, be unsafe if it allows loading shared libraries which may include arbitrary assembler code.
And this is where your logic goes over the top stupid. IF IT AIN'T C, THEN THE C GUARANTEES CANNOT HOLD. !!! !!!
Dial yourself back a bit, and let's keep the discussion about what a SANE language should be doing. Firstly, it should not be saying stuff about what happens over things out of its control!
Which is why something as simple as saying "256 * 256 will give you 65,536 unless you trigger overflow. If overflow occurs, it's not C's problem; you get what the hardware gives you". Or "multiplying two int16s will give you an internal int32. If you don't give it a big enough int to store the result, you get what you get". That at least gives you some ability to reason.
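For what it's worth, C's integer promotions already do something close to the int16 half of this; a sketch (assuming the common case of a 32-bit int):

    #include <stdint.h>

    int32_t mul16(int16_t a, int16_t b) {
        /* Both operands are promoted to int before the multiply, so with
           32-bit int the intermediate cannot overflow: the largest
           magnitude product is 32768 * 32768, well inside int32 range. */
        return (int32_t)(a * b);
    }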
Yes, Gödel says you can't define a completely logical mathematical model, but if you take THIS approach, where C / C++ *actually* *takes* *responsibility* for things *within* *its* *power*, that will get rid of a lot of UB.
And it is NOT the COMPILER'S JOB to determine what happens if the programmer asks for something illogical. Take use-after-free. Why is the C compiler trying to dictate what happens when the programmer does that? Either you declare something like that "out of scope", or if you detect it the compiler prints an error.
This is the problem with fwrapv, for example. And with loads of UB, actually. The compiler is allowed to take advantage of UB to do stuff that 99% of people don't want !!! If you even just said "any optimisations that are enabled by UB must be explicitly switched on, not enabled by default", that would probably make C/C++ a safer language at a stroke! Even if *some* stuff slips through the net that shouldn't.
Cheers,
Posted Nov 5, 2023 19:11 UTC (Sun)
by khim (subscriber, #9252)
[Link] (4 responses)
How would that work? If you call an assembler function then it may do literally anything with any part of the code and data at any point in the future. How can compilers and compiler writers guarantee ANYTHING AT ALL if the requirement is to ensure that code works with an adversary which changes everything and anything in your program randomly? You can not do that without limiting what the assembler parts of your program may do. Heck, on most platforms that assembler part may install a “random flipper” which would, after starting, randomly flip bits in the accessible code and data areas. How can you predict the result of running such a program?

Sure. But what's the point? If some random guy or gal insists that any UB must be handled by a SANE language predictably, even a “random bit flipper” in your process, then your only choice is to sigh and start the tedious process of reviewing or removing code written by such a person; what other choice is there? Such a person clearly doesn't understand how program development works, and if s/he wrote anything that actually works, by accident, you may never be sure it won't stop working at a random time in the future.

Yes, lots of C/C++ UBs may be redeclared as not-UBs. But that would never satisfy such people. What's the point of that stupidity? If you want predictability then it's much better to fully specify the result. If you want portability then UB is better, because it makes your code usable on a wider range of systems. “What the hardware gives you” is pretty much useless: it gives you neither predictability nor portability, so why even bother?

Why does the compiler get to decide? Because that is the only way to apply the as-if rule. Optimizations should preserve the behavior of the program; but if your program doesn't have a predictable behavior, then it's impossible to answer the question of whether an optimization would preserve it or not. Once more: consider the following code (a reconstruction follows below). This piece doesn't include UB, doesn't do anything strange, it only does one simple (if useless) assignment. But if you remove that assignment you may break some other piece of code! Cases like these make optimization of programs that may trigger UB pretty much impossible. Critically: it makes it impossible to optimize even those parts of the code that don't trigger UB.

Why are you so sure? How can you know that 99% of people don't want these optimizations? On the contrary: when an attempt was made to find out which optimizations people actually want, almost every UB turned out to be liked by a significant percentage of developers. You mean something like CompCert C? It's not too popular. If people actually wanted what you preach, they would have switched to it in droves. In reality… I think there are more Rust users than CompCert C users. Without a single language definition you no longer have a community, you have a bunch of silos, each one not too big.
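The snippet itself is not shown above; reconstructed from jem's description in the reply below (an assumption, not the original), it is presumably along these lines:

    /* f() has no UB: it just makes one "useless" assignment. */
    void f(void) {
        int x = 42;
        (void)x;            /* the dead store an optimizer may remove */
    }

    /* g() is the broken code: it reads an uninitialised local which, at
       -O0 with a common calling convention, may reuse f()'s stack slot. */
    int g(void) {
        int y;
        return y;           /* UB: may "work" and return 42 unoptimized */
    }

    int main(void) {
        f();
        return g();
    }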
Posted Nov 5, 2023 20:12 UTC (Sun)
by jem (subscriber, #24231)
[Link] (3 responses)
> Once more: consider the following code:
> This piece doesn't include UB, doesn't do anything strange, it only does one simple (if useless) assignment.
> But if you remove that assignment you may break some other piece of code!
The code that relies on this assignment does exhibit undefined behaviour: it uses the value of an uninitialised variable. I can't understand why anyone would want to write code like this, which assumes that a function local variable occupies the same address as a variable in a previously called function that has already returned. I don't see the usefulness of this, and even the most junior programmer should realise this is an extremely risky piece of code. This has got nothing to do with the compiler taking advantage of UB to optimise code; normal compiler optimisations, like deleting the unused variable, result in a non-working program.
One could even argue that your program is broken even if it was written in pure assembly language. The stack slot the popped variable occupied is not part of the stack anymore, and I can imagine target machines where the stack grows and shrinks automatically. On such a machine the physical page could be unmapped as soon as the hardware detects that the stack has shrunk past the page border.
Posted Nov 5, 2023 21:50 UTC (Sun)
by khim (subscriber, #9252)
[Link] (2 responses)
A computer can't understand that, either, because it couldn't understand anything at all: it doesn't have consciousness or common sense. So what? Remember the context here: “If you follow the discipline above, all optimization levels behave the same way, so why talk about -O0?” If you so pompously proclaim that there is some magic discipline which allows you to optimize any code without relying on the absence of UB, then you have to explain how it works on this example (and on other, even more convoluted examples, too). Or, if you don't plan to optimize any working code, then you have to explain why this piece of code doesn't deserve the “all optimization levels behave the same way” treatment, without adding things like “I don't see the usefulness of this” or “I can't understand why anyone would want to write code like this”. If you can not do that, then your proclamations about how there are “benign optimizations that don't rely on the absence of UB” are exposed as lies. This has got everything to do with the compiler taking advantage of the absence of UB to optimize code. You either say that code which hits UB at runtime may not preserve behavior after optimization or, alternatively, you have to be prepared to optimize any code that predictably works without optimizations. And “any” means any here, not just “the code I may want to write”. It works, sorry. Sure, but x86-64 doesn't behave that way. How is that relevant?
Posted Nov 6, 2023 7:49 UTC (Mon)
by jem (subscriber, #24231)
[Link] (1 responses)
Ok, then why are you using code with UB to prove your point? I am talking about the "other piece of [C] code".
Anyway, this discussion isn't leading anywhere, so I am out.
Posted Nov 6, 2023 8:03 UTC (Mon)
by mb (subscriber, #50428)
[Link]
The code is "programmed to the machine" and the compiler is expected to just emit instructions and the behavior shall be whatever the machine's stack would do.
"Programming to the hardware" is fundamentally broken.
Posted Nov 6, 2023 7:12 UTC (Mon)
by smurf (subscriber, #17840)
[Link]
You're coming at this from the wrong end of the road.
Everybody (*) wants their code to be as small and/or as fast as possible.
Compiling for UB-free code allows the optimizer to skip checks and to ignore conditions that correct, non-UB-ish code simply doesn't need or have in the first place. It's not the optimizer steps' job to determine whether their transformations introduce errors under UB conditions, because they can't read your mind and can't know what you WANT to happen (and thus, what constitutes an error). The code doesn't tell them. It can't – if it could, it wouldn't be UB in the first place.
The problem is that C++ has no checks(**) to assure that your code does not exhibit UB, no culture to introduce such checks, and a governance that's not interested in changing the culture. The -fwrapv-is-not-the-default-because-of-a-trivially-fixed-SPECint-benchmark controversy is just the tip of the tip of the iceberg.
* OK, 99%; the remaining 1% are stepping through their functions while debugging.
** This would require code annotations equivalent to Rust's mutability and lifetime stuff – which are very hard, if not impossible, to introduce into a language that's *already* a mess of conflicting annotations and ambiguous grammar. Not to mention the code that's supposed to act on them.
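A classic instance of the check-skipping described above, in C (behaviour as commonly observed with GCC/Clang at -O2):

    /* Signed overflow is UB, so the compiler may assume x + 1 never
       wraps and fold the comparison to a constant 1; with -fwrapv it
       must instead compare x against INT_MAX. */
    int always_true(int x) {
        return x + 1 > x;
    }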
Posted Nov 5, 2023 22:49 UTC (Sun)
by vadim (subscriber, #35271)
[Link] (8 responses)
Nonsense
> Simply because any change to generated code may be classified as “change in visible behavior” for a program that takes address of function and then disassebles it.
Visible as in affecting the program's output. E.g., no normal optimization option should result in a sequence of mathematical operations producing different results; there should be no way for -O3 to calculate a value for Pi different from -O0.
Obviously, program size, execution speed and so on are not invariants to be preserved.
> What that phrase even means? What is I asked for shift, what is I get a shift, what is exact equivalent?
The mathematical equivalent. Eg, a multiplication or division that returns the same numerical result. So by that standard, vectorization shouldn't be done if there's any way the result could vary, without explicitly opting into that possibility.
> IOW: you are fine with optimizations that break someone's else code, but not Ok with optimizations that break your code.
I just recognize the rare usefulness of special purpose options. But they must be explicitly opt-in and separate from normal optimization.
> If they change the behavior of correct C or C++ program then it's bug that should be fixed. But that's not what you are talking about, right?
Yes, guarantees can't be made regarding incorrect programs. However, the language spec should resist with all its might any attempt to add additional ways in which a program may be successfully compiled and yet still be incorrect, and should try to reduce the number of such things in future releases.
Posted Nov 5, 2023 23:20 UTC (Sun)
by khim (subscriber, #9252)
[Link]
And what prevents one from taking the address of some random function and changing the output depending on how that function body is encoded? Obviously… to whom? Obviously… why? In the absence of UB, what prevents one from writing a program whose output depends on the program size, execution speed and so on? But a shift returns different results on the same platform, depending on which instruction was chosen. And as @anton helpfully tells us, this is even true for simple operations like additions and/or subtractions. By that standard nothing whatsoever can be done to anything. Except we have yet to establish what “normal optimization” even means, and whether any exist in principle. That part only becomes sensible after you accept the existence of unrestricted UB (as in: UB which is “crazy enough” that any program which hits it may produce any random output at all). After you establish that such UB exists you may “separate the wheat from the chaff”: programs which hit “unrestricted UB” may be miscompiled (since they don't have any predetermined behavior to begin with), while programs which don't trigger it must be compiled on an as-if basis. And the list of unrestricted UBs becomes a subject of negotiation for the language team, which includes both compiler writers and compiler users. If you don't accept the existence of unrestricted UB, then discussing what the compiler may or may not do to change the output of a program becomes pointless: a compiler which would satisfy such demands simply can not exist.
Posted Nov 6, 2023 6:54 UTC (Mon)
by mb (subscriber, #50428)
[Link] (6 responses)
Why is that obvious?
My multi-threaded Pi-calculation program running on an embedded system uses carefully crafted timing loops to avoid data-race UB. Your speed optimization breaks my perfectly working program!
In another part of the program, my carefully written code inspects the generated machine code, which I wrote the C "to the machine" for, and makes decisions based on what the compiler generated. Your optimized code is smaller and breaks my code!
You must realize that "sane" is not a trait that you can use in optimizers.
Posted Nov 6, 2023 15:44 UTC (Mon)
by khim (subscriber, #9252)
[Link]
Note that from the Atari 2600 to the PlayStation 2, counting cycles and doing synchronization that way was the norm (in fact the Atari 2600 uses a cheap version of its CPU which doesn't have interrupts, thus it's the only way to do synchronization there). And if the people who did that were presented with the “Portable Assembler” idea, they would naturally expect something like that. And inspection of a function body is something that iced includes in its tutorial… do you think it's done because no one does that?
Posted Nov 7, 2023 9:27 UTC (Tue)
by vadim (subscriber, #35271)
[Link] (4 responses)
And even without optimization I don't think timing can be guaranteed in C or C++. At any time a compiler could be updated to learn of a new instruction, or fix a code generation bug.
Posted Nov 7, 2023 10:24 UTC (Tue)
by mb (subscriber, #50428)
[Link] (3 responses)
So we're back to: You must specify -O0, if you "program to the hardware".
Posted Nov 7, 2023 19:12 UTC (Tue)
by vadim (subscriber, #35271)
[Link] (2 responses)
So if you're counting cycles you should probably be actually coding it in assembler.
Posted Nov 7, 2023 19:17 UTC (Tue)
by mb (subscriber, #50428)
[Link] (1 responses)
Posted Nov 8, 2023 13:16 UTC (Wed)
by vadim (subscriber, #35271)
[Link]
Eg, under DOS you can write C code that sets interrupt handlers, or does low level control of the floppy drive. The ability to do such low level things is precisely why C gets used to write operating systems.
Like I said elsewhere, "portable assembler" is in my view a very metaphorical description, because obviously there can't be such a thing in the absolute sense. Proper assembler reflects the CPU's architecture, and a single language can't accurately depict the wildly different designs that exist. However it can get there part of the way given some compromises.
Posted Nov 4, 2023 21:09 UTC (Sat)
by smurf (subscriber, #17840)
[Link] (9 responses)
How should the compiler know which result the -O0-compiled code should generate? It doesn't know about your data input and your preconditions. It doesn't know what you want to achieve. It can't know whether that struct whose address you're passing to your function is going to be modified by the seemingly-unrelated function you're calling, or whether two arrays (sorry – memory ranges you use as arrays) overlap, or (if they do) whether that's a bug or your intent, or … and so on, and so forth.
Current C/C++ optimizers are simple-minded: *Their* precondition for the guarantee you want isn't quite O_PONIES, but that your input doesn't cause any UB.
The problem is, they do not help with ensuring that this precondition (or rather, very large set of preconditions) is actually met. "UB, no warning necessary" idiocy in the standard(s) is one strong indicator that this problem is unlikely to be resolved any time soon. There are others – as should be obvious from other answers in this thread.
NB: Surprise: Quite a few of these documented-as-UB conditions don't even need an optimizer to trigger. I'll leave finding examples for this assertion as a fairly-trivial exercise for the esteemed reader.
Stroustrup's Plan does not address the "UB no warning" problem, or the "include file" problem, or … I could go on.
Compilers are not mind readers.
Rust has aliasing prohibitions etc. that actually mean something (mostly-)concrete, not "more magic" – like 'restrict' and 'volatile' do.
Grafting the concept of sharing-XOR-mutability, or lifetime annotations, or something else that might achieve similar results of UB prevention, onto C/C++ at this point is wishful thinking. That's not just because of the culture, or the heap of old code you'd need to adapt. There's also the include file mess, which pretty much guarantees that you can't mix "old" and "new" code and have the result mean anything. There's the fact that C++ has grown to an unwieldy mess that doesn't even have a context-free unambiguous grammar any more (assuming that it ever did …), thus by this time it's no longer possible IMHO to add something this fundamental and have it MEAN something.
I could go on, but it's late.
Posted Nov 5, 2023 10:12 UTC (Sun)
by khim (subscriber, #9252)
[Link]
C compilers have had the ability to mix code with different meanings for decades. Even the very first edition of The C++ Programming Language mentioned the ambiguity that causes the most vexing parse. I think you are talking about undecidability; that achievement is newer, it needs templates, but it's in C++98 already. That's why GCC stopped trying to bend a yacc grammar around C++ and switched to a hand-written parser. Why not? All these things are fixable in Ship of Theseus fashion. But for that you need a community that is willing to change, and the C/C++ community doesn't fit the bill. That is the critical failure of that plan; everything else is fixable.
Posted Nov 5, 2023 18:17 UTC (Sun)
by anton (subscriber, #25547)
[Link] (7 responses)
As for -O0-compiled: If you follow the discipline above, all optimization levels behave the same way, so why talk about -O0? Plus, in compilers that assume that programs don't exercise UB, -O0 does not help: I have seen gcc assume that signed overflow does not happen with -O0.
Posted Nov 6, 2023 6:46 UTC (Mon)
by smurf (subscriber, #17840)
[Link] (6 responses)
Oh, how nice, you now have platform-dependent code, thus zero guarantees that your "x86-64" code will run on tomorrow's ARM64 or MIPS platforms. And, crucially, no way to find out – because the C++ standard is perfectly OK with the compiler not telling you.
Also, compiler writers don't "decide on the behavior". They decide on a series of optimization steps which are assumed to be correct transformations – assuming that there are no UB conditions.
How the heck should they know that introducing or expanding one such step affects your UB code? it's explicitly out of scope.
Posted Nov 6, 2023 8:34 UTC (Mon)
by anton (subscriber, #25547)
[Link] (5 responses)
While portability is a worthy goal, and Java shows that it can be achieved, the friendly C story shows that it is one bridge too far for C.
Compiler writers certainly decide on the behaviour, including for the UB cases. E.g., on the Alpha the gcc maintainers decided to compile signed + to add, not to addv; both behave the same way for code that does not exercise UB. They also make a decision when they implement an "optimization" that, e.g., "optimizes" away an overflow check. How should the compiler writers know? They generally don't; therefore in the usual case they should preserve the behaviour on optimizations. For exceptions, read section 5 of this paper.
Posted Nov 6, 2023 12:12 UTC (Mon)
by smurf (subscriber, #17840)
[Link] (4 responses)
Random changes in the optimizer can still break your code.
"Random" in the sense that they seem to be unrelated to the code in question. Modern compilers contain a lot of optimizer passes and rules/patterns they apply to your code. Given the mountain of UBs in C++ it's not reasonable to demand that changes in these rules and patterns never affect the result when any of them is violated.
It's not reasonable to demand that the next version of a compiler should not come with any improvements to its optimizer.
Posted Nov 8, 2023 9:03 UTC (Wed)
by anton (subscriber, #25547)
[Link] (3 responses)
And the fact that C (and maybe also C++, but I don't follow that) compiler writers offer fine-grained control over the behaviour they exclude, with flags like -fwrapv, -fwrapv-pointer, -fno-delete-null-pointer-checks, -fno-strict-aliasing etc. shows that they are very confident that they can control which kind of behaviour is changed by which optimization.
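For concreteness, the kind of invocation being described (GCC flags; whether a given project wants all of them is a judgment call):

    # Each flag turns a standard UB into defined behaviour:
    #   -fwrapv                          signed arithmetic wraps
    #   -fwrapv-pointer                  pointer arithmetic wraps
    #   -fno-delete-null-pointer-checks  keep "provably redundant" null checks
    #   -fno-strict-aliasing             disable type-based alias analysis
    gcc -O2 -fwrapv -fwrapv-pointer -fno-delete-null-pointer-checks \
        -fno-strict-aliasing prog.c -o prog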
Posted Nov 8, 2023 9:34 UTC (Wed)
by smurf (subscriber, #17840)
[Link] (2 responses)
… unless that behaviour is related to UB, in which case most bets are off.
Sure, you can control some optimizer aspects with some flags, but (a) that's too coarse-grained, (b) there are heaps of UB conditions that are not related to the optimizer at all and thus cannot be governed by any such flags, and (c) replacing some random code (random in the sense of "if it's UB the compiler can do anything it damn well pleases") with some slightly less random code doesn't fix the underlying problems.
Contrast all of this with Rust's definition of soundness, which basically states that if you don't use the "unsafe" keyword you cannot cause any UB, period, end of discussion.
C/C++ is lightyears away from that.
Posted Nov 8, 2023 10:53 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (1 responses)
I think you're being a bit harsh on those flags - they define behaviours that are UB in the standard, and require the compiler to act as-if those behaviours are fully defined in the documented manner.
The problem with the flags is that people aren't willing to take any hit, no matter how minor, in order to have those flags on everywhere, preferring to stick to standard C, not that there exist flags that reduce the amount of UB you can run across.
Posted Nov 8, 2023 11:49 UTC (Wed)
by khim (subscriber, #9252)
[Link]
The problem is not the people who specify or don't specify these flags, but the people who don't even think about these flags and the UBs related to them. They apply “common sense” and refuse to divide problems into parts. They mash everything (millions of lines of code) into one huge pile and then try to reason about that. Here's a perfect example: “the gcc maintainers decided to recognize this idiom in order to pessimize it”. How was that crazy conclusion reached? Easy: ignore the CPU model that gcc uses, ignore the process that the GCC optimizer uses, imagine something entirely unrelated to what happens in reality, then complain that the object of your imagination doesn't work as you expect. It's not possible to achieve safety if you do that! If you refuse to accept reality and complain about something that doesn't exist, then it's not possible to do anything to satisfy you. It's as simple as that.

P.S. It's like a TV repairman who refuses to ever look at what's happening inside and just tries to “fix” things by punching the TV from different directions. Decades ago, when a TV included half a dozen tubes, this even worked, and some such guys even knew how to “fix” things by punching them lightly or harshly. But after TVs became more complicated they stopped being able to do their magic, and modern TVs don't react to punches at all. A similar thing happened to compilers. Same result: “A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die and a new generation grows up that is familiar with it.” And the best way to achieve that is to use some other language: the new generation may learn Ada or Rust just as easily as they may learn C++ (more easily, arguably), and there is no need for opponents to physically die off; if they would just stop producing new code, the end result would be approximately the same.
Posted Nov 7, 2023 7:14 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (13 responses)
If it is translated as-is, it means that an optimizer is not allowed to touch it. An "as-is" rule is far more restrictive on code transformations than an "as-if" rule. Of course, this also assumes that targets even have operations for the relevant abstract operation (e.g., consider floating point-lacking hardware).
Here's a question: C doesn't have a way to spell "fused multiply and add" in the language itself (the C99 fma() library function aside). Should C offer library intrinsics to access such instructions? Require inline assembly? If a processor supports `popcount`, what do you want me to do to my source to access it, besides something like `x && (x & (x - 1)) == 0` becoming `popcount(x) == 1`? After all, I wrote those bitwise operations; I'd expect to see them in the assembly in your world, no?
> In that case the language should declare a single correct interpretation and emulate it as necessary on every CPU.
Sure, except that we have noises about *bounds* checking being too intrusive and expensive. What makes you think that every `(var >> nbitshift)` expression being guarded with some check/sanitizing code would be acceptable?
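As an aside, a sketch of the popcount idiom in question (the builtin is a GCC/Clang extension):

    #include <stdbool.h>

    bool is_pow2_idiom(unsigned x) {
        return x && (x & (x - 1)) == 0;     /* the bit trick */
    }

    bool is_pow2_builtin(unsigned x) {
        return __builtin_popcount(x) == 1;  /* may lower to a POPCNT */
    }

Compilers may canonicalise one form into the other, which is exactly the point about expecting to “see what you wrote” in the assembly.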
Posted Nov 7, 2023 9:08 UTC (Tue)
by anton (subscriber, #25547)
[Link]
Posted Nov 7, 2023 10:52 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (11 responses)
Forget bounds checking being too expensive; we have noises about -fwrapv being too expensive, and "all" that does is say that signed integer overflow/underflow is defined in terms of wrapping a 2s complement representation. If your code is already safe against UB, then this flag is a no-op; it can only cause performance issues if you could have signed integer overflow causing issues in your code.
If you can't get agreement on something that trivial, where the performance cost (while real, as shown by SPECint 2006) can be dealt with by relatively simple refactoring to make things that should be unsigned into actual unsigned types instead of using int for everything, what hope is there for fixing other forms of UB?
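A sketch of the pattern behind those regressions (hypothetical functions, but the mechanism is the standard one):

    #include <stddef.h>

    /* With a 32-bit int index on a 64-bit target, -fno-wrapv lets the
       compiler assume i never wraps and strength-reduce a[i] to a
       pointer increment; -fwrapv forces it to preserve 32-bit wraparound,
       which can cost a sign-extension every iteration. */
    void scale_int(float *a, int n, int stride) {
        for (int i = 0; i < n; i += stride)
            a[i] *= 2.0f;
    }

    /* The refactored version: a pointer-width unsigned index needs no
       overflow assumption at all. */
    void scale_sized(float *a, size_t n, size_t stride) {
        for (size_t i = 0; i < n; i += stride)
            a[i] *= 2.0f;
    }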
Posted Nov 7, 2023 17:35 UTC (Tue)
by adobriyan (subscriber, #30858)
[Link] (10 responses)
-fwrapv gains/losses are trivial to measure, in theory.
> If you can't get agreement on something that trivial, where the performance cost (while real, as shown by SPECint 2006) can be deal with by relatively simple refactoring to make things that should be unsigned into actual unsigned types instead of using int for everything, what hope is there for fixing other forms of UB?
Making types unsigned is not simple, the opportunities for introducing new bugs are limitless.
Posted Nov 7, 2023 17:43 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (9 responses)
-fwrapv slows some sub-tests in SPECint 2006 by around 5% to 10% as compared to -fno-wrapv. This is unacceptably large, even though in the cases where it regresses, someone's already done the analysis to confirm that it regresses because it used int for array indexing instead of size_t.
And note that, by the nature of -fwrapv, every case where it regresses performance is one where the source code is already buggy, because it depends on UB being interpreted in a way that suits the programmer's intent, and not in a different (but still legal) way. It cannot change anything where the program's behaviour was fully defined without -fwrapv, since all -fwrapv actually does is say "these cases, which used to be Undefined Behaviour, now have the following defined semantics". But that was already a legal way to interpret the code before the flag changed the semantics of the language, since UB is defined as "if you execute something that contains UB, then the entire meaning of the program is undefined and the compiler can attribute any meaning it likes to the source code".
Posted Nov 7, 2023 18:34 UTC (Tue)
by smurf (subscriber, #17840)
[Link] (2 responses)
… if you value "no performance regression even on UB-buggy code" higher than "sane(r) semantics and less UB".
As long as people who think along these lines are in charge of the C/C++ standards, Stroustrup’s plan (or indeed any plan to transform the language(s) into something safe(r)) has no chance whatsoever to get adopted.
Posted Nov 9, 2023 14:43 UTC (Thu)
by pizza (subscriber, #46)
[Link]
That's a little disingenuous; it's not that they're "in charge of C/C++", it's that there's a large contingent of *users* of C/C++ that ARE VERY VERY VOCAL about performance regressions in _existing_ code.
There's a *lot* of existing C/C++ code out in the wild, representing a *lot* of users. And many of those users are pulling the C/C++ standards in mutually incompatible directions.
Posted Nov 16, 2023 20:14 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
Posted Nov 9, 2023 23:15 UTC (Thu)
by foom (subscriber, #14868)
[Link] (5 responses)
Nope.
The flag only affects the _results_ if the program previously exhibited UB, but, it removes flexibility from the optimizer by requiring the result be wrapped. This may require additional conditions or less efficient code. If the more-optimal version didn't produce the correct result when the value wrapped, it cannot be used any more.
Posted Nov 10, 2023 13:55 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (4 responses)
Then they can compile SPECInt with a flag that switches off fwrapv to give the old behaviour and say "you want speed? Here you are! But it's safe by default".
So UB has now become hardware- or implementation-defined behaviour but the old behaviour is still available if you want it.
Cheers,
Posted Nov 10, 2023 14:00 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
Now go and convince the Clang, GCC, Fedora or Debian maintainers that this should be the default state. That's the hard part - getting anyone whose decisions will influence the C standards body to declare that they want less UB, even at the expense of a few % of speed on some benchmarks.
Posted Nov 13, 2023 13:13 UTC (Mon)
by paulj (subscriber, #341)
[Link] (2 responses)
Posted Nov 13, 2023 17:14 UTC (Mon)
by kreijack (guest, #43513)
[Link]
My understanding is that the ISO c++20 standard already mandates the two's complement:
If you look at https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm, you can find more information; even an analysis about which processor is/was not "two's complement".
And also it seems that C23 also is following the same path.
Anyway, I think the right question is which architectures supported by GCC (or Clang…) are or aren't two's complement.
https://en.wikipedia.org/wiki/C23_(C_standard_revision)#cite_note-N2412-62
Posted Nov 14, 2023 14:30 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link]
Posted Nov 3, 2023 16:16 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (2 responses)
OR you say "all those options are possible, there are flags to specify which one you expect, and if you don't specify you are explicitly choosing whatever the hardware happens to give you".
Personally, I think I'd expect option 3 - right-shifting 2 33 times looks like 0 to me. If I have a flag to say "that is what I expect", then the compiler can either do what I asked for, or tell me "you're on the wrong hardware". Or take ten times as long to do it. All those options comply with "least surprise". Maybe not quite the last one, but tough!
If I don't tell it what I want, and I'm on a partially vectorised chip, more fool me!
But the compiler is not free to reduce my program to the null statement because I didn't realise I was in the wrong reality!
Cheers,
Posted Nov 3, 2023 17:02 UTC (Fri)
by khim (subscriber, #9252)
[Link]
Nope, that doesn't work. That's precisely what C had before C90, and it was a nightmare: writing any non-trivial code which would work on more than one compiler (and often on more than one version of the same compiler) wasn't possible. The problem here is that people don't want any rules. What they actually want is that “simple” property: if my program stops working then I should be able to incrementally change it to find out where. And they refuse to accept that this “simple” property is impossible to guarantee. It's no wonder that developers of the lowest part of the stack accepted Rust first: they know that hardware today is in a state where it's not actually possible to support that properly… and if the hardware can't provide it, then the compiler can't, either. But there are still old-school developers who remember times when that was actually possible. They don't want to change anything; they want to return to those “good old times”… and they refuse to think about what made it possible to debug things reliably back then. And no, it wasn't because compilers were worse or because the language was simpler. It was possible to debug things simply because systems were so simple: the number of transistors in CPUs was measured in thousands, memory was measured in kilobytes… it was simply impractical to implement sophisticated algorithms which can generate unpredictable results when applied a few times.
Posted Nov 7, 2023 7:07 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link]
Note that optimization passes tend not to be aware of the literal input source or, necessarily, the target. Without that knowledge, it would mean that any optimization around a shift with a variable on the right is impossible to do because it could be doing What Was Intended™ and assuming any given behavior may interfere with that.
Posted Nov 3, 2023 14:10 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (14 responses)
Yes, but in a language that cared about safety, given that -fwrapv is a clear win for safety, a wash in performance on most benchmarks, and a slight but easily fixed loss (under 10%) in some benchmarks, the language community would decide that if you run with -fno-wrapv, you're doing something you know is dangerous, and therefore we don't need to care about coding defensively for that case - you've explicitly asked for pain.
Instead, because there's a performance regression for a set of users who could refactor to recover the full performance if they cared, the language remains stuck with a situation where you have to be careful to not overflow your signed integers, just in case.
And that's the problem tradeoff - you can insist on -fwrapv, and accept that people who are careless lose up to 10% performance as compared to people who take care to use the right types, but that there's no UB, or you can say that -fno-wrapv is the default, and insist that people make sure that there's no UB manually, even if that makes their implementation bigger and slower than an implementation that relies on the changes in the -fwrapv dialect of C++. If the standards bodies and the compiler authors said "screw the 10% performance cost on badly written benchmark code like the examples in SPECint 2006, we'd prefer to define more behaviour", then I'd think that there's a chance of the C++ community choosing to get rid of UB. But with a well-defined change to remove a commonly misunderstood chunk of UB, which regresses performance slightly in a way that's easy to fix if you're affected, the community chooses to keep UB (no diagnostic required) instead of removing the UB, or at least requiring a diagnostic - and that makes me sceptical of any attempt to make the language contain less UB.
Posted Nov 3, 2023 15:51 UTC (Fri)
by vadim (subscriber, #35271)
[Link] (12 responses)
What I mean that declaring a thing UB is in some cases a net loss. Eg, take testing for underflow.
If we want to test whether signed "x" is as small as it can get, we could use "if (x-1>x)". But since that's UB, what we have to write instead is "if(x==LONG_MIN)". And that actually turns out to take more bytes of code, because LONG_MIN is a 64 bit constant.
So it's not entirely true that UB's existence is a positive from the point of view of optimization. Sometimes having to tiptoe around it means you have to write worse code.
And even if -fwrapv exists, maybe you're writing this in a header in a library and therefore can't expect that flag to be used, and therefore must use the more roundabout implementation.
Posted Nov 3, 2023 16:18 UTC (Fri)
by khim (subscriber, #9252)
[Link] (6 responses)
No. You can write something like the sketch below. Sure, but that's a discussion for the time when one defines the rules of the game. For example, some people say that defining left and right shifts as they are defined in Rust (runtime error in debug mode, wrapping in release mode) was a mistake. But as long as these are the rules of the game, everyone has to follow them. Without UB, optimizations are simply not possible. Proof: you still haven't told us what gives someone the right to remove the useless code from that example with the “useless” assignment. Writing code with UB is like crossing the street on red: you may save 1 minute 100 times, but sooner or later you'll be hit by a semi and spend a lot of time in hospital (if you survive at all). The end result is a net loss. Don't do that.
> And even if -fwrapv exists, maybe you're writing this in a header in a library and therefore can't expect that flag to be used
It's not too roundabout, since there is no overflow for unsigned types.
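One possible reading of that elided snippet (an assumption; the builtin is the GCC/Clang extension farnz mentions further down):

    #include <stdbool.h>

    /* x - 1 underflows exactly when x == LONG_MIN, and the builtin
       reports that without any UB and without spelling out the constant. */
    bool at_minimum(long x) {
        long tmp;
        return __builtin_sub_overflow(x, 1L, &tmp);
    }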
Posted Nov 5, 2023 17:49 UTC (Sun)
by anton (subscriber, #25547)
[Link] (5 responses)
Posted Nov 5, 2023 18:28 UTC (Sun)
by khim (subscriber, #9252)
[Link] (4 responses)
That's very strange. For me it does optimize that to small code — if you request small code. Otherwise it produces faster code, why shouldn't it do that?
Posted Nov 6, 2023 9:06 UTC (Mon)
by anton (subscriber, #25547)
[Link] (3 responses)
Posted Nov 6, 2023 10:44 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (2 responses)
If I run the code through llvm-mca, which uses machine models to determine the cycle time of a given sequence of code, I see that the longer sequence generated by GCC is expected to take 8 cycles when running out of L1$, while the shorter sequence is expected to take 10 cycles when running out of L1$. The longer sequence executes 3 µops, while the shorter one executes 4 µops.
Posted Nov 6, 2023 18:49 UTC (Mon)
by anton (subscriber, #25547)
[Link] (1 responses)
Anyway, with all these machine models sequence 2 above is faster than sequence 1. Sequence 3 is slower than sequence 2 on some machine models.
I'll have to measure on real hardware to see how well these models reflect reality.
Posted Nov 6, 2023 23:09 UTC (Mon)
by anton (subscriber, #25547)
[Link]
Code can be found here.
Posted Nov 3, 2023 17:01 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
The problem is that it's a net win as long as nobody needs to test the edges of UB. If you're at the edges, then you're at risk of finding out just where those edges are and losing out. For the specific case you've mentioned, though, you can get back to the optimal code in GCC and Clang with the __builtin_sub_overflow extension, which gives you overflow/underflow detection on arbitrary integral types.
As to the rest of it, that's precisely my point; because there's a couple of components of SPECint 2006 that regress with -fwrapv, you have to pay the pain of not being able to expect that people will use that flag or keep all the pieces. And that's a cultural thing - if C++ had a safety culture, you'd be able to expect that people would use -fwrapv, and ignore the people who don't do that.
Posted Nov 4, 2023 7:37 UTC (Sat)
by adobriyan (subscriber, #30858)
[Link] (3 responses)
From a maintainability point of view it should always be x == LONG_MIN.
x-1>x requires a fresh register which may be needed elsewhere, and it lengthens the shortest execution path by 1 instruction.
Finally, gcc will optimise (x - 1 > x) with -fwrapv to x == LONG_MIN.
Posted Nov 5, 2023 17:59 UTC (Sun)
by anton (subscriber, #25547)
[Link] (2 responses)
Posted Nov 5, 2023 18:19 UTC (Sun)
by adobriyan (subscriber, #30858)
[Link] (1 responses)
It is still a micro-optimisation: "mov r64, LONG_MIN" doesn't have to wait for x, so the cmp can be executed as soon as possible.
Posted Nov 5, 2023 18:38 UTC (Sun)
by anton (subscriber, #25547)
[Link]
I won't go as far as suspecting wilful pessimization by the gcc maintainers ("you turned off our glorious -fno-wrapv optimizations, we will teach you by pessimising your code into the same code you would have gotten with -fno-wrapv").
Posted Nov 3, 2023 16:01 UTC (Fri)
by khim (subscriber, #9252)
[Link]
For that to happen there needs to be a language community, and you have to kick out the guys who are talking about the “portable assembler” thingie. It's like sports games: people may argue about whether it's Ok to shoot one free throw or two if someone tries to run with the ball, but if they don't kick out the guys who insist that they are capable of running with the ball and thus it should be permitted, then playing the game is just impossible. Because that is, ultimately, the @vadim and @anton position: “who told you one can not run with the ball? I tried it and it works just fine”. If you can't kick such guys out of the game and are not allowed to refuse to play with them, then it's pointless to talk about the rules.
Posted Nov 1, 2023 10:55 UTC (Wed)
by mb (subscriber, #50428)
[Link] (11 responses)
Which means that you would still have UB. "Whatever the hardware does" includes UB, because it's the real world. In a sufficiently complex system, it's impossible to define everything.
>What I want is a commitment towards minimizing it.
I agree. I'm all for it. It's a good thing to define stuff where it makes sense and where it is possible.
But that's very different from what you said in the previous paragraph. Removing *all* UB is just impossible.
Posted Nov 1, 2023 11:20 UTC (Wed)
by vadim (subscriber, #35271)
[Link] (10 responses)
I said "I would define all of the UB in this code". Not in all of C++.
Posted Nov 1, 2023 12:02 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (9 responses)
Even with all of the UB defined, you can run into problems. The issue the code has is that the compiler is within its rights to do a value analysis on Do, and determine that the only possible values are nullptr, or EraseAll. Having done this, it can remove an indirect branch by changing "return Do();" to "if (Do == nullptr) { return __builtin_call_nullptr(); } else { return EraseAll(); }". Having done that, it can inline EraseAll, since there's only one callsite now. Having done that, it can do dead code elimination, and hey presto! you have a problem if we define __builtin_call_nullptr() as __builtin_unreachable().
And that's a lot of where the "surprises" come from - there's a huge chain of optimizations that are individually desireable, such as compile-time constant folding, dead code elimination, constant propagation etc, but whose combined result is surprising.
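The pattern being walked through above is close to a well-known example; a C reconstruction (not quoted from the thread):

    #include <stdlib.h>

    typedef int (*Function)(void);

    static Function Do;                    /* static: implicitly null */

    static int EraseAll(void) {
        return system("rm -rf /");         /* never called... in the source */
    }

    void NeverCalled(void) { Do = EraseAll; }   /* the only write to Do */

    int main(void) {
        /* UB if NeverCalled was never invoked: Do is still null. The
           value analysis described above may conclude "null or EraseAll",
           define the null case away, and make this a direct EraseAll call. */
        return Do();
    }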
Posted Nov 1, 2023 12:13 UTC (Wed)
by vadim (subscriber, #35271)
[Link] (8 responses)
Obviously, I'd disagree with this step. Calling a nullptr is now a valid option, therefore not an unreachable state, especially since we have a pointer that's been initialized to nullptr.
Therefore this program ends up crashing either with a segfault from a jump to 0x0, or aborts with a compiler-provided "You called a nullptr, dummy" error message.
Posted Nov 1, 2023 12:48 UTC (Wed)
by khim (subscriber, #9252)
[Link] (2 responses)
You may disagree with that step, but why are you sure that others would disagree with it, too? When John Regehr tried to collect a list of UBs which shouldn't be UBs, the only result he was able to achieve was to find out that literally no one agrees on anything. Everyone has their pet peeves where the compiler treats as UB something that they think shouldn't be UB, and yet almost everyone has ideas about which optimizations are indispensable — and these optimizations require certain things to be UBs! Some, like Wol here, are even advocating replacing one language with 200 UBs with 2²⁰⁰ languages with random assortments of UBs (that's what “I just need this to become a user-defined flag” means, right?) and even say this would, somehow, make things more secure… and no, they are not joking!

The fact that they believe anything reliable may be achieved in this madness is the real C/C++ disease. The bazillion UBs and the unpredictable outcome of using the compiler are symptoms; the real disease is that desire to eat the cake and have it, too. Rust never had it; it was built on the idea that there are rules that are supposed to be followed, and a process which makes it possible to change them, but as long as the rules are there they shouldn't be ignored. Can you imagine something like “‘some operation’ is always Undefined Behavior. No you can't do it. No you're not special.” being enthusiastically embraced by the C/C++ community, without a significant number of participants declaring that, yes, they are special, and yes, they would ignore the rules, and yet it's the compiler's fault for producing an incorrect program?

Rust tries to reduce the number of UBs as much as possible, but that's, actually, the second step. The first one is a firm assertion that the community takes full responsibility for active enforcement of the few remaining ones. And, sadly, I don't foresee the C/C++ community accepting that responsibility. They are all “too special” to accept it.
Posted Nov 1, 2023 13:16 UTC (Wed)
by mb (subscriber, #50428)
[Link] (1 responses)
In reality programs with volatile in them just somehow work, because compilers are mostly conservative when optimizing code with volatile in it.
These things basically don't happen in Rust and in the Rust community.
Posted Nov 1, 2023 13:42 UTC (Wed)
by khim (subscriber, #9252)
[Link]
It does happen, and pretty regularly. But it's not just the outcome: more importantly, the methodology is different. Here is one example (precisely where you would expect it: in an embedded system where someone needs “something strange”): from URLO to miri and then to rust-lang. See the difference? It's not technical, it's social. The developer is not proclaiming that what s/he wrote is correct, but tries to find out how to resolve the issue while staying within the written rules. When that turns out to be impossible, the problem is escalated, and then we have both a temporary solution and a question for the language team which may, eventually, lead to a long-term solution. No one has “strong opinions about how s/he's right and everyone around them are idiots”, no one demands the impossible. There is no perfect outcome and there are no unreasonable demands, but people are working together toward the resolution of the issue.

In C/C++ this would have ended up with ten different opinions about how that corner case should be handled, with all ten camps insisting that their interpretation of the language is “correct” and everyone else is “wrong”. I would say that it's even worse: the C/C++ community doesn't just have bad rules of the language, there are just too many who refuse to be bound by rules at all. They don't accept the fact that rules are the result of agreement and have to be followed, but insist that there are “correct” rules and “wrong” rules and, worse, that they can assert which rules are “correct” and which are “wrong” without talking to others. And underneath all that is their belief that they may freely lie to the compiler about what happens in their program, and yet it's the responsibility of the compiler to produce a predictable outcome even when they are lying. It's [relatively easy] to explain how we ended up in such a crazy place (dozens of incompatible implementations and lack of any easy ability to talk to the compiler developers in the 1980s-1990s, when said community was formed), but it's not clear how that can be fixed. If it can be fixed at all.
Posted Nov 1, 2023 13:00 UTC (Wed)
by mb (subscriber, #50428)
[Link] (4 responses)
Pointers are not only addresses.
Pointers have more traits than just their addresses in the language's machine model: pointers have provenance. Therefore, even if you define the address of a null pointer to be 0 and define a deref of it as emitting a memory access at address zero, you would still have UB in the language's machine model.
In what state should the machine model be after accessing some "object" that is not part of the model? How could the model proceed from there in any meaningful, defined way?
The language's machine model is undefined after executing such code.
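A minimal sketch of why “pointers are not only addresses”:

    #include <stdlib.h>
    #include <stdio.h>

    int main(void) {
        int *p = malloc(sizeof *p);
        if (!p) return 1;
        free(p);                      /* p is now dangling */
        int *q = malloc(sizeof *q);   /* may land at the same address */
        if (!q) return 1;
        *q = 1;
        /* Even on a run where p and q hold bit-identical addresses,
           dereferencing p here would be UB: its provenance (the
           allocation it came from) is dead, whatever the bits say. */
        printf("%d\n", *q);
        free(q);
        return 0;
    }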
Posted Nov 1, 2023 13:13 UTC (Wed)
by khim (subscriber, #9252)
[Link]
The majority of people who complain about how the compiler's treatment of UB destroys their programs don't want to even think about “language models” or about anything related to how compilers actually work. That's the issue. What they want is magic which would do what they want. The really funny thing is that, since any sufficiently advanced technology is indistinguishable from magic, they do get their magic. Only then they start complaining that this magic works unreliably. But that's just the nature of magic! If you want magic then be prepared to receive something which works when “the stars are aligned just right” and fails on Friday the 13th. If you want technology then be prepared to understand how it works. But lots of C/C++ developers want both: magic that they don't have to understand, and reliability which can only be provided by technology. That is the fundamental social problem which cannot be fixed by technical means. You have to either pick magic (and be prepared to deal with random fallout when it refuses to work “for no good reason”) or pick technology (and be prepared to understand, at least in some general form, how the technology you are using works). You can not have both at the same time.
Posted Nov 1, 2023 13:17 UTC (Wed)
by vadim (subscriber, #35271)
[Link] (2 responses)
Explain? What's undefined in this case?
> What about dangling pointers?
Program accesses memory at the address of the dangling pointer, and whatever happens, happens.
> In what state should the machine model be after accessing some "object" that is not part of the model?
I'm not expecting C++ to turn into Rust, but to basically be a "portable assembler". It does what the programmer wrote. In some cases that's going to be stupid, but my preference for stupid is "whatever the underlying hardware does when you ask for the stupid thing".
Posted Nov 1, 2023 13:31 UTC (Wed)
by mb (subscriber, #50428)
[Link]
The language's machine model state.
>Program accesses memory at the address of the dangling pointer, and whatever happens, happens.
Ok. That means the machine model enters an undefined state. You get UB.
>I'm not expecting C++ to turn into Rust, but to basically be a "portable assembler".
Yeah right. *That* is exactly the problem. A language with an abstract machine model and with an optimizing compiler cannot be a portable assembler. That is impossible. Yet, it's a very common assumption among C/C++ developers.
>It does what the programmer wrote.
No, it can't ever do what the programmer wrote, because the programmer did not write "read from memory address x". The programmer wrote *x. Which is a superset of "read from memory address x".
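(A hypothetical illustration of that superset: the dereference does not merely load from the address, it also asserts that the pointer is valid, and later code may be optimized under that assumption.)
int f(int *x) {
    int v = *x;        // not just "read from memory address x": it also asserts
                       // that x points to a live int, so x cannot be null here
    if (x == nullptr)  // ...which lets the compiler delete this branch as dead code
        return 0;
    return v;
}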
>but my preference for stupid is "whatever the underlying hardware does when you ask for the stupid thing".
So what shall be the state of the machine model, if you deref-write through a dangling pointer?
Posted Nov 1, 2023 13:48 UTC (Wed)
by khim (subscriber, #9252)
[Link]
> Ok. Acceptable.
Yes. That's what most compilers may give you. It's even the default, if you don't explicitly specify optimization flags.
Posted Nov 1, 2023 13:54 UTC (Wed)
by jem (subscriber, #24231)
[Link]
Note that there is not a single uninitialised variable in the sample code people are referring to in this thread. The Do function pointer is implicitly initialised to nullptr. The problem with this code is more akin to the "Billion Dollar Mistake" by Tony Hoare.
Posted Nov 1, 2023 10:35 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
Converting your existing code to Rust in one go is definitely a tall order; but using something like cxx.rs to allow a gradual conversion to Rust is entirely practical. It makes it a lot like the codebases I've worked on that mix FORTRAN 77 with C++03 and Python - there's the legacy FORTRAN 77, which isn't worth rewriting yet, because the effort is high for the reward, there's the C++03 for things that are performance-critical and not in FORTRAN any more, and there's Python for the bits that aren't performance critical.
I have, however, worked with C++ static analysis tools that can warn on the many constructs that cannot be proven free of UB at compile time; these certainly used to exist (but weren't free: the one I used was $50 per developer per month). The trouble with them is that when you take a codebase that has never been attacked by serious static analysis before, you get a huge number of issues to look into, some of which are (by the nature of Rice's Theorem) false positives; I saw something on the order of one static analysis warning per line of code when I applied one to a pre-existing 100K line C++ project. Obviously, some lines were clean, but others had multiple warnings that applied, all of which would have to be examined, and fixed, if you want a tool-checked UB-free codebase.
And that static analysis experience is why I'm sceptical of attempts to make C++ "safe"; competent C++ developers write code that cannot easily and mechanically be proven safe all the time, and mostly get the safety preconditions correct. It's a big deal when they don't, and it happens often enough to be a problem, but nobody is willing to pay the cost of rewriting ~400K lines of C++ into Safe C++ (by adding static analysis annotations) today, and I don't see that changing just because the annotations become an optional part of the language standard.
Posted Nov 1, 2023 9:27 UTC (Wed)
by kreijack (guest, #43513)
[Link] (1 responses)
> https://gcc.godbolt.org/z/fG77qTKrG
> Removing UB is one of two things to me:
> 1. "static Function Do" is implicitly set to nullptr, and attempts to jump to address 0x0. This causes a very clear, understandable, safe crash. Easy to debug.
The standard *already* requires that "Do", as a global variable, *has to be set to zero* [*]. So if UB exists in your example, it is when Do() is called, because calling a nullptr is undefined behavior.
I think that your example proves two things:
If you are surprised by CLANG's behavior, I suggest reading these posts:
Basically CLANG tries to transform the indirect call through the function pointer into a chain of comparisons against its possible targets, roughly: if (Do == func1) func1(); else if (Do == func2) func2(); and so on.
Because we have only one function (func3() in the example), the whole chain of 'ifs' collapses into a direct call to func3().
BR
---
Posted Nov 1, 2023 12:11 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
And it's worth noting that the compiler isn't doing a single transform to go from "Do();" to "EraseAll();" - the behaviour we see is a consequence of multiple optimization passes, each of which is significant on some benchmark or another, adding up to surprise emergent behaviour.
For example, you want the compiler to convert indirect calls via a function pointer into direct calls and conditional branches, since that's faster on virtually all CPUs. You want to do lots with compile-time constants, so that I can write "if (mem_size < 16 * 1024 * 1024)" and have the compiler notice that mem_size is a compile-time constant and evaluate the condition at compile time; you want to remove conditionals wherever they're compile-time constant, and just emit the "then" or "else" block. You want to inline code wherever it's called only once, so that I can factor my code into small, easily understood functions, and have the compiler generate as good an output as it would if I had written a single giant function. But when you put all of these desirable things together, you get the surprise that the example shows.
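(For reference, a minimal sketch of the pattern in the example being discussed; the names Do and EraseAll appear upthread, but the details here are reconstructed, not the exact linked code.)
typedef void (*Function)();
static Function Do; // zero-initialised, i.e. starts out as nullptr

static void EraseAll() { /* something destructive */ }

void NeverCalled() { Do = EraseAll; }

int main() { Do(); } // UB if Do was never assigned

Since EraseAll is the only value ever stored into Do, and calling a null function pointer is UB, the passes described above may together assume Do == EraseAll at the call site and emit a direct call to EraseAll(), even though NeverCalled() is never actually called.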
Posted Oct 30, 2023 19:58 UTC (Mon)
by willy (subscriber, #9762)
[Link] (1 responses)
Posted Oct 30, 2023 21:29 UTC (Mon)
by pbonzini (subscriber, #60935)
[Link]
Posted Oct 30, 2023 21:29 UTC (Mon)
by ikm (guest, #493)
[Link] (2 responses)
Would you mind elaborating on that? How could a library (or any user code) depend on undefined behavior? Could you give an example of such a dependency? Undefined behavior gives compilers optimization opportunities (with unfortunate results in some cases), but I don't see any reason why those can't be eliminated at some performance cost without the user code ever noticing. The reason undefined behavior still exists is simply that people don't want to pay that performance price, isn't it?
Posted Oct 31, 2023 6:54 UTC (Tue)
by mb (subscriber, #50428)
[Link] (1 responses)
Some kinds of UB are unavoidable in practice. Think about data races: to avoid them completely, you would have to restrict your program to a single thread. But there's more. Even in single-threaded applications, think about pointer dereferences and how you would check for every possible misuse.
The way Rust does it is to build a restricted safe language and shift the UB into the unsafe parts of the language. The parts of the program that run unsafe code with potential UB are still there, because that's unavoidable. But the programmer is almost never exposed to them. That is what makes the difference.
Posted Oct 31, 2023 9:44 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link]
Also, data races don't even necessarily lead to UB. They are specified as UB in C and C++ and (though, as stated, safe Rust doesn't have any) in Rust, but in Java a data race isn't Undefined Behaviour. Your program continues to be well-defined, but since it loses Sequential Consistency, humans struggle to understand what the residual behaviour means. Still, it does have meaning. If you cared, a dedicated team could likely eventually figure out what your program does now.
In OCaml they have further refined this, to the extent that they hope (it's early days) OCaml with data races is not only still well-defined, humans should be able to successfully reason about how the resulting software behaves. This is PL research level work, but it's easily possible that in twenty years "My programming language has Undefined Behaviour when there are data races" marks you as a caveman as much as "My programming language has Undefined Behaviour for integer overflow" does today.
The important cultural difference (and culture is key here; that's what's so hilarious about the reaction from C++ and from WG21 specifically: they don't see what the real problem is, and so they're not even addressing it) is that code marked "unsafe" in Rust is culturally expected to satisfy safety constraints, but is allowed to demand pre-conditions of its callers. That's what SAFETY comments are for in Rust code, if you've seen those. They explain either why this code is actually fine despite using the unsafe keyword, or how to correctly use this unsafe function so as to deliver safety.
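(A minimal, hypothetical example of the convention: the "# Safety" section in the doc comment states the precondition callers must uphold, and the "// SAFETY:" comment at each use discharges it.)
/// Returns the first element without a bounds check.
///
/// # Safety
/// `v` must be non-empty.
unsafe fn first_unchecked(v: &[u32]) -> u32 {
    // SAFETY: the caller guarantees `v` is non-empty, so index 0 is in bounds.
    unsafe { *v.get_unchecked(0) }
}

fn first_or_zero(v: &[u32]) -> u32 {
    if v.is_empty() {
        0
    } else {
        // SAFETY: we just checked that `v` is non-empty.
        unsafe { first_unchecked(v) }
    }
}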
Rust's community shuns people, libraries, codebases which YOLO the way C++ does. Technically nothing prevents you writing much the same dangerous nonsense. Culturally it's prohibited. That's not something WG21 could fix by tweaking the text of their ISO document, Rust has a safety culture, C++ does not, if safety is important (and I would argue it is) that's what actually matters.
Posted Oct 30, 2023 20:25 UTC (Mon)
by Tobu (subscriber, #24111)
[Link] (2 responses)
The part of the keynote spent on What is “safety”? felt like misdirection. If you don't have type and memory safety, you have no hope of delivering any other kind. But if you have them, with an expressive type system like C++ certainly has, the rest is a matter that library authors are empowered to deal with. Languages that do provide safety invariants encode them on top of the type system: making HTML/SQL injection-proof with different types for templates and the variables within them, guaranteeing any kind of construction invariant like ranged integers, passing around capabilities… All these abstractions rely on a sound type system and the absence of UB.
Anyway, if C++ could get past the denial stage, I'd like it to follow something like Rust's plan for unsafe code: provide a machine-checkable model of what unsafe code is allowed to do (stacked borrows, well-defined unsafe code guidelines); and a tool akin to miri: an interpreter able to check (at runtime) that the model's invariants are upheld. Then, some effort to contain bits of important libraries in a way that running the interpreter on them and fuzzing them is tractable. Maybe start with some of the standard library. At the moment, there are sanitizers, and CHERI, but as I understand it, they don't check a lot, possibly because there isn't a way of speccing what C++ is allowed to do without being unable to compile most C++ code.
Posted Oct 30, 2023 20:49 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
IIUC, this is the idea behind decorating `export` and `import` statements with attributes clarifying what level of safety is expected from APIs provided by the named module. I have reservations about it being compilable depending on how these attributes affect the BMI generation, but I hope to discuss that next week.
Posted Oct 31, 2023 11:12 UTC (Tue)
by Tobu (subscriber, #24111)
[Link]
Having a fine-grained set of profile annotations on modules is just a way to reiterate the “what even is safety” argument; giving it the appearance of something formally defined through new syntax and tooling efforts. Multiplying these attributes is a distraction. roc downthread points out that efforts towards sound lifetimes have stalled. There is no point worrying about composing safety properties when one can't define soundness for at least a small, self-contained part of the unsafe code in base libraries.
Posted Oct 30, 2023 21:40 UTC (Mon)
by lopgok (guest, #43164)
[Link] (27 responses)
But there are other fundamental issues. The 'switch' statement is brain dead and has caused issues.
I fear Bjarne has caused more problems than he has solved with respect to language safety.
I remember, long ago, being asked during an interview to describe a two-dimensional array in C. I explained that C doesn't have arrays at all, though it has array notation. When I started programming, I used Algol 60. It had booleans, as well as multi-dimensional arrays. These are not new ideas. Later I was programming in Mainsail, which has garbage collection as well as strong typing. These ideas may have been new in the 60s or 80s, but that was a very long time ago...
Posted Oct 31, 2023 1:03 UTC (Tue)
by rolexhamster (guest, #158445)
[Link] (26 responses)
It is possible to have sane memory management and safety within C++, as long as one sticks to a "sane" subset of the language, avoids touching pointers of any kind, never reads/writes without bounds checks, and uses well-tested libraries to do all the nasty stuff (pointer and memory handling) behind the scenes.
While on the surface the above sounds "doable", there are several problems with this approach.
Secondly, the C++ language has gotten so complex with recent revisions (say, from C++17 onwards) that any code using new features resembles line noise (not too dissimilar from how code written in Perl can appear). It can be very difficult for humans to parse, to the point that the number of people who can grok the code drops significantly, leaving codebases at risk of bitrot. Bjarne Stroustrup's so-called plan is well-meaning, but it makes the language even more complex.
Thirdly, the proposal effectively places band-aids on a fundamentally outdated and creaky language design. It's essentially an attempt at (partially) retrofitting Rust-like features into C++, while breaking compatibility with existing codebases.
This brings us to the ultimate point: all of this has already been done, with a clean-sheet design. It's called the Rust language. Rather than perpetuating and trying to "fix" C++, it's time to put C++ into the same league as COBOL: a legacy language that should not be used for any new codebases.
C++ is in large part Bjarne Stroustrup's brainchild, so it's somewhat understandable that he's trying to fix and evolve it as a fatherly figure. However, this is a clear case of Planck's principle, diplomatically stated as: "A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it."
Posted Oct 31, 2023 6:59 UTC (Tue)
by mb (subscriber, #50428)
[Link] (16 responses)
Is it?
Posted Oct 31, 2023 9:09 UTC (Tue)
by ianmcc (subscriber, #88379)
[Link] (14 responses)
The problem is that there is no way to enforce that users of your library follow those rules. I guess the analog in Rust is that there is no way to enforce that users of your library don't do anything "unsafe" with objects that are managed by your library.
Posted Oct 31, 2023 9:30 UTC (Tue)
by smurf (subscriber, #17840)
[Link] (4 responses)
There's no way to force *any* user to follow those rules. What are you going to do, forbid taking a pointer to memory? Then what about strings? or two-dimensional arrays?
Care to do global lifetime analysis across the whole program, including all your libraries?
No, sorry. The way to arrive at a safe language is to start out with one, and then encapsulate the unsafe bits and pieces in small discrete bits that can be reasoned about individually. This is what Rust is doing, and it seems to be working.
Starting off with an unsafe language and then playing whack-a-mole is a great way to get paid for wasting a lot of time, but unlikely to succeed. After rewriting your code three times so that it again compiles with the current blend of somewhat-safer-than-yesteryear-C++, you'll have spent more effort than by rewriting it to Rust — and that includes learning Rust.
Posted Oct 31, 2023 14:23 UTC (Tue)
by ianmcc (subscriber, #88379)
[Link] (3 responses)
Not sure what you mean here; using std::string there are no pointers in sight. Two-dimensional arrays don't exist in C++ (or in C). It is common to implement a 2D array by encapsulating it in a class, with an overloaded function call operator()(i,j) to get the array element at index (i,j), with a backing store of (for example) an std::vector. Again, no pointers in sight.
Posted Oct 31, 2023 15:34 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Now create an std::string_view that points to that string.
Posted Oct 31, 2023 18:14 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link]
std::string empty;
auto p = empty.begin();
it seems to me that p is a pointer. Am I wrong? By "no pointers in sight" did you just mean "Well, the type itself isn't a pointer"?
Posted Oct 31, 2023 21:16 UTC (Tue)
by ojeda (subscriber, #143370)
[Link]
==1==ERROR: AddressSanitizer: heap-use-after-free on address 0x60...
https://godbolt.org/z/hT9hjqjx9
Posted Oct 31, 2023 10:51 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
This then comes down to a social/cultural difference, not a technical one.
In Rust, use of "unsafe" is heavily examined for flaws in your safety reasoning, and you are expected to make unsafe code robust against abuse by safe code; if someone finds a way to call your unsafe code from safe code that breaches the compiler guarantees offered to safe code, there's a social expectation that you will fix your unsafe code. You therefore need to harden any module that uses "unsafe" against callers who breach your preconditions, and ensure that there are no post-conditions once you leave your module. From this, Rust gets a tendency to say that you should not use unsafe, since doing so is really hard, and you're almost certainly better off using safe Rust and not having to manually prove your code is free of UB.
In C++, if your code does something that's undefined behaviour under certain conditions (the reason to use "unsafe" in Rust), the expectation is that your callers will all take care not to call your code in ways that makes those conditions true. You don't need to harden your code against callers that breach your preconditions, since that's their job.
And both languages benefit from the ongoing advance in compiler technology; compilers are getting very good at spotting tautologies in your source code and removing them, and thus the runtime cost of safety is becoming very low, since the compiler can spot that you've made things like bounds checks into tautologies at compile time, and remove them. You can see a trivial example of this in this Godbolt example - both demo1 and demo2 compile down to the same code, even though one has bounds checks implied by the use of "at", while the other uses operator[] which has no bounds checking.
You can also see (again, a contrived example) that the compiler can still eliminate tautologous bounds checks if you rely on exceptions; demo1 has a different output for size 1 and size 2 vectors, so has two bounds checks, while demo2 only has one bounds check because the compiler can see that if the bounds check in v.at(1) succeeds, the bounds check in v.at(0) will also succeed. The Rust compiler is equivalently powerful here; it can not only eliminate tautologous bounds checks, it can see that demo3, demo4 and demo5 are all the same semantically, and emit one block of code for all three functions (using a .set directive to tell the linker to reuse the emitted code - turn off directive filtering to find ".set example::demo4, example::demo3" and ".set example::demo5, example::demo3").
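(One plausible shape of such an example, a hypothetical reconstruction rather than the exact linked code: with a fixed-size std::array and constant indices, the at() checks are compile-time tautologies, and both functions compile to the same two loads and an add.)
#include <array>

int demo1(const std::array<int, 4> &a) {
    return a.at(0) + a.at(1); // bounds checks written, but provably redundant
}

int demo2(const std::array<int, 4> &a) {
    return a[0] + a[1];       // no bounds checks written
}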
Posted Oct 31, 2023 11:12 UTC (Tue)
by fw (subscriber, #26023)
[Link] (6 responses)
Posted Oct 31, 2023 14:18 UTC (Tue)
by ianmcc (subscriber, #88379)
[Link] (5 responses)
You *could* do something weird like stash a copy of `this` into some global variable and hence potentially violate some lifetime assumptions. In much the same way that you *could* drive your car off a cliff, and violate your own lifetime assumptions. It wouldn't pass any sane code review. I guess you could do something similar with unsafe rust too. I imagine one of the 'profiles' that Bjarne is talking about would warn or error if you attempted to save a pointer somewhere that potentially has a longer lifetime than the pointee object.
Posted Oct 31, 2023 15:22 UTC (Tue)
by fw (subscriber, #26023)
[Link]
Posted Oct 31, 2023 18:16 UTC (Tue)
by mb (subscriber, #50428)
[Link] (3 responses)
It's not obvious at all.
The returned reference can become invalid for example if
Posted Oct 31, 2023 18:59 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (2 responses)
It's obvious in simple cases, where no-one's doing anything weird. But I've noted, when I help people learn Rust, that one of the biggest problems C++ developers have with the borrow checker is that it deems something "unacceptable", and the learner complains that it's "obviously OK" and the borrow checker is being "overzealous" right up until I construct a non-obvious way to turn their code into a use-after-free.
Now, this is not to say that the Rust borrow checker is never wrong - it does have its moments where it rejects programs that are actually safe, because it can't yet prove them safe (but Polonius can, by using more powerful logic techniques) - but I have learnt to be sceptical of "obvious" as a safety proof mechanism since I so often encounter experienced C++ developers whose "obvious" is also wrong.
Posted Nov 1, 2023 0:12 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
And Rust can do this because its rule is "mutable XOR multiple references". The borrow checker does its best but still can't convince itself in every case. Polonius does better, but there exists valid (abstract) Rust that no compiler knows how to compile. Just like pre-NLL (non-lexical lifetimes) and post-NLL code are both *valid*, it's just that older compilers only *understood* the former.
Posted Nov 1, 2023 8:56 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link]
I believe khim is correct that Rust makes the only sane choice here: attempt to check, and whenever we're not sure, reject the program. This produces an incentive to (a) improve the compiler to be better at discovering that the requirement is met and (b) where possible, write code which obviously meets the requirement.
C++ makes the other choice, that's what "Ill-Formed No Diagnostic Required" phrases scattered through the ISO document are for, they're saying well, if the compiler isn't sure whether the semantic requirements are met we'll press on and hope. The result is jokingly referred to as "False positives for the question: Is this a C++ program?"
Posted Oct 31, 2023 18:09 UTC (Tue)
by mb (subscriber, #50428)
[Link]
I don't think this is true. You can't ensure memory safety in multithreaded programs just with smart pointers and without lifetimes. Lifetimes and the concept of Shared XOR Write for references are required to get full memory safety. And these things are probably impossible to fully implement without transforming C++ into a completely different language.
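(A hypothetical sketch of Shared XOR Write across threads, using std::thread::scope: one mutable borrow of x may cross into a thread, but a second one is rejected at compile time.)
use std::thread;

fn main() {
    let mut x = 0;
    thread::scope(|s| {
        s.spawn(|| x += 1); // mutably borrows x for the duration of the scope
        // s.spawn(|| x += 1); // second mutable borrow: compile error, data race prevented
    });
    println!("{x}");
}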
I'm not saying that you can't make C++ safer.
But I currently don't believe that you can transform it into a completely memory safe language, unless you transform it into a completely new incompatible language that could have unsafe {} blocks with traditional C++ inside. But that would mean you put your existing code into unsafe blocks, which doesn't make the existing code any safer.
Posted Oct 31, 2023 9:34 UTC (Tue)
by rolexhamster (guest, #158445)
[Link]
You're right. I should have written "saner" rather than "sane".
While it's possible to have meaningful improvements in C++ (at some cost of breaking backwards compatibility), I don't think C++ can ever be made as safe as Rust.
The danger here is that lazy and/or short-sighted upper management may prefer to "believe" Bjarne Stroustrup’s so-called plan because of perceived lower cost of required changes to codebases. This fallacy needs to be nipped in the bud.
One counter argument is that if there is already going to be a break in backwards compatibility, might as well do the right thing and write new codebases in Rust, gaining further important safety improvements as a "bonus", which in turn leads to lower maintenance requirements (i.e. fewer embarrassing exploits and associated CVEs).
Posted Oct 31, 2023 8:34 UTC (Tue)
by pwfxq (subscriber, #84695)
[Link] (6 responses)
> Secondly, the C++ language has gotten so complex with recent revisions (say, from C++17 onwards) that any code using new features resembles line noise (not too dissimilar from how code written in Perl can appear).
I can skim code in various languages, get the general idea of what's going on and maybe make minor changes. But Perl & C++ are two that are no better than Egyptian hieroglyphs to me.
Posted Oct 31, 2023 16:40 UTC (Tue)
by syrjala (subscriber, #47399)
[Link] (5 responses)
Posted Oct 31, 2023 17:57 UTC (Tue)
by mb (subscriber, #50428)
[Link] (4 responses)
What exactly are you talking about? Can you give an example? Rust doesn't have that many special characters, especially when compared to Perl. :)
Also, things are getting simpler over time. There have been many small steps toward making Rust code more intuitive to write, and there are many more to come.
C++ actually went through the same evolution. Look at how we used the C++ STL back in the days when there were no auto types and no iterator support in loops. It was *horrible* to use and read. And look where we are today.
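(For instance, a small before-and-after sketch:)
#include <iostream>
#include <map>
#include <string>

void print(const std::map<std::string, int> &m) {
    // C++98: spell out the iterator type in full
    for (std::map<std::string, int>::const_iterator it = m.begin(); it != m.end(); ++it)
        std::cout << it->first << '=' << it->second << '\n';

    // C++11 and later: auto plus range-based for
    for (const auto &kv : m)
        std::cout << kv.first << '=' << kv.second << '\n';
}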
Posted Nov 1, 2023 9:10 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link] (3 responses)
One thing I'd suggest may help with that, try being unnecessarily explicit until you're more comfortable.
When you write &str in Rust, the inference engine (or if we're writing a function signature, specifically Lifetime Elision) tries to figure out what lifetime that reference has and silently assumes that's what we meant, but while this is idiomatic if you're learning lifetimes it may be helpful to write more of them explicitly instead.
Also, much like Physicists writing Fortran think v and k are perfectly good names for important long-lived variables, the Rust lifetimes you see in real code often have absurdly terse names, but that's not mandatory, it's fine to give them longer names.
fn greet<'brief>(name: &'brief str) {
    println!("Hello, {name}!");
}
There's no need to name this lifetime 'brief, but by doing so maybe we get more used to seeing lifetimes around and we learn where they're needed even though the language is good at inferring them so you can get away without writing them down.
As a halfway point, you can write the underscore to say you know a lifetime goes here but that lifetime can be inferred.
fn greet(name: &'_ str) {
    println!("Hello, {name}!");
}
I found lifetimes very easy to get used to, but I think they're the "weirdest" syntactical change from other semicolon languages. There are a bunch of important semantic changes (notice the semicolon in Rust isn't even doing the same thing as in a language like C or Java) but the syntax is what gets in your eyes immediately.
Posted Nov 1, 2023 9:51 UTC (Wed)
by mb (subscriber, #50428)
[Link] (1 responses)
Well, then just do it this way, if you think it's better?
>notice the semicolon in Rust isn't even doing the same thing as in a language like C or Java
Why is it different? It terminates a statement. How is that different from C?
>I found lifetimes very easy to get used to, but I think they're the "weirdest" syntactical change
That's because almost no language has lifetime annotations.
Posted Nov 1, 2023 22:19 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link]
C is a statement oriented language, whereas Rust is expression oriented. So e.g. this Rust is fine:
let x = if even { 6 } else { 7 }; // The type of the if expression was some sort of integer and so that's the type of x
And so is this:
if keep { save_the_files() } // Otherwise I guess don't save them? Evidently save_the_files() doesn't return anything
Or this:
if keep { save_the_files(); } // save_the_files(); is a statement, if save_the_files() did return anything it's gone
But this won't compile:
let x = if even { 6 }; // If it isn't even what then? We need the implied else clause to type check and it does not
Posted Nov 1, 2023 10:34 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
The other thing that I found helpful when trying to get used to lifetime annotations was to take my "normal" Rust code and apply the rules of lifetime elision to remove all the implied lifetimes. That way, I was able to build a sense of what the implied lifetimes were actually doing for me, and thus get to a point where when I needed to read annotations, I was used to them; it also meant that when I needed to write annotations, I had an intuitive sense built of what I was telling the compiler.
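(Concretely, a hypothetical pair showing the same function before and after writing out what elision inserts: with a single input lifetime, the output reference gets that same lifetime.)
// What one usually writes, lifetime elided:
fn first(v: &Vec<u32>) -> &u32 {
    &v[0]
}

// The same signature with the elided lifetime spelled out:
fn first_explicit<'a>(v: &'a Vec<u32>) -> &'a u32 {
    &v[0]
}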
Posted Oct 31, 2023 9:12 UTC (Tue)
by marcH (subscriber, #57642)
[Link] (1 responses)
I wonder why Planck restricted himself to science.
Posted Oct 31, 2023 19:22 UTC (Tue)
by mpr22 (subscriber, #60784)
[Link]
Posted Oct 30, 2023 22:16 UTC (Mon)
by ibukanov (subscriber, #3942)
[Link] (26 responses)
So I do not understand why Bjarne continues to believe that C++ can be fixed. At this point we know that memory safety that can be proven at compile time, as in Rust, requires a very different approach from the one the C++ standard library and typical code use. Any notion that a profile can fix existing code is wishful thinking. Or is the hope that, by still calling the resulting code C++, one has a better chance of convincing management to spend the effort on porting to it rather than to Rust?
Posted Oct 30, 2023 23:59 UTC (Mon)
by pizza (subscriber, #46)
[Link]
I suspect this is actually the most important consideration.
($dayjob-1 had easily tens of millions of lines of C++ in production. Convincing them to move to a newer, "safer" C++ dialect is a much easier sell than wholesale replacement with a radically different language)
...That said, Stroustrup's arguments do come across as spherical cow-ish.
Posted Oct 31, 2023 19:08 UTC (Tue)
by kreijack (guest, #43513)
[Link] (4 responses)
I think that both C and C++ have such a large code base that they fall into "too big to fail" territory. I am not sure that rewriting such a large codebase would reduce the problems, because the translation itself will add further problems... for sure.
I am not saying that (e.g.) Rust is weaker than C/C++, or that C/C++ is a "safe" language. But an (aged) C/C++ developer may be more capable of writing "safe" code using a subset of C/C++ than by starting to learn Rust.
I think that Bjarne's speech should be read as: the point is not whether (e.g.) Rust is better or worse than C/C++; the point is that you have a) skilled C/C++ developers and b) a large C/C++ codebase that you have to maintain. And that is simpler and more reliable to do using a subset of C/C++ than by switching to (e.g.) Rust.
These days I am looking at the C/C++ code of a realtime product. It uses a very small subset of C++ (no memory allocation, no strings, no exceptions, no templates), and to me it seems simple and error-proof enough that the likelihood of a logic error is greater than the likelihood of an invalid memory access.
Posted Oct 31, 2023 19:53 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
Similar used to be said, when I was a young developer, about COBOL. In practice, what seems to have happened is that some entities still depend on COBOL, but the majority of the stack we care about (including in places that have legacy COBOL surviving) is written in something more modern, like Java.
This is where Rust's decent FFI story comes into play - if I have a legacy C++ codebase, I can use cxx.rs or perhaps bindgen's C++ support to allow me to incrementally replace parts of the codebase with Rust, just as I work with projects that mix FORTRAN 77 and C, or C++ and Fortran 90 today.
Posted Nov 1, 2023 15:25 UTC (Wed)
by smurf (subscriber, #17840)
[Link]
There is no such subset, and in fact there cannot be. Compilers do not solve the halting problem, so you need language extensions to tell them about your intent. Without these, they cannot check that your code is "safe".
Rust has mutability and object lifetime syntax, which works (for the most part).
You guess which language has problems with safety.
Stroustrup's approach is not going to fix this. Profiles are a nice idea, but how the heck should the compiler know that your library needs exclusive access to, and/or will eventually deallocate, an object you pass in, if you don't declare that fact? How should it know that returning a pointer/reference/whatever to some member of a struct (and thus the caller's use of it) is safe, if you don't have a way to declare that in the requisite function's interface? And so on.
The fact that he doesn't seem to have concrete ideas about the semantics of these profiles doesn't help.
Posted Nov 2, 2023 19:06 UTC (Thu)
by khim (subscriber, #9252)
[Link] (1 responses)
You are in luck, then. Rust handles logic errors just fine. In fact, the affine type system (which Rust hides behind ownership and the borrow checker) was invented many decades ago precisely to handle logic errors, not to handle invalid memory accesses. The epiphany that converted Rust from yet-another-experimental-language into something-that-may-actually-replace-C/C++ was an accidental observation: if you have a type system which may prevent data races and ensure that you couldn't use hardware improperly, then… you may as well use it to handle memory allocations, too! And drop the garbage collector from your language. It's both hilarious and sad that this step took more than a quarter-century, but in hindsight it's pretty obvious, isn't it? A more rigorous and strict tool should be able to perform the duties of a more limited and less powerful tool, shouldn't it? I guess the fake “fact” that “safe memory management equals garbage collection” was established so firmly in the minds of the developer community that no one was even thinking about whether alternate solutions were possible. That's why this incredibly important step had to wait for the moment when theorists met practitioners.
Posted Nov 3, 2023 11:22 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
There's an important thing here that pushes against the desire for any language to be stable in the long run (and it remains to be seen if the Rust "editions" system is able to mitigate this); there's always a lot of research ongoing into programming language theory, much of which ends up concluding "yep, we can do this, but it's not practical".
As a result, you either need a mechanism to adopt the things that researchers show are practical and useful, or you stagnate at the state of the art when your language was first designed (at best).
Posted Nov 1, 2023 0:05 UTC (Wed)
by IanKelling (subscriber, #89418)
[Link] (19 responses)
Bjarne has said safety will be optional, so the fact that C++ has developed unsafe features is no surprise. You mention Rust, and it too has developed unsafe features in the last 8 years, and its safety is optional. If you link to the conference talk, we might learn something more specific.
> So I do not understand why Bjarne continues to believe that C++ can be fixed.
To put this nicely, that seems quite over the top in a negative way: to say there is no basis for understanding Bjarne at all, and that it cannot be done. I recall reading that there already are safe C++ variants, and C++ is whatever the standards committee says plus what the compilers implement, so it certainly "can be fixed"; but if people believe it cannot be done because of comments like yours, then it certainly won't be.
Posted Nov 1, 2023 0:13 UTC (Wed)
by mb (subscriber, #50428)
[Link] (13 responses)
The unsafe powers are (https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#u...):
- Dereference a raw pointer
- Call an unsafe function or method
- Access or modify a mutable static variable
- Implement an unsafe trait
- Access fields of a union
That hasn't changed in a long time.
>there already are safe C++ variants
where?
Posted Nov 1, 2023 0:42 UTC (Wed)
by IanKelling (subscriber, #89418)
[Link] (12 responses)
Posted Nov 1, 2023 1:08 UTC (Wed)
by rahulsundaram (subscriber, #21946)
[Link] (11 responses)
Neither you nor the parent has listed which specific features in Rust you are calling unsafe. Can you elaborate?
Posted Nov 1, 2023 3:57 UTC (Wed)
by IanKelling (subscriber, #89418)
[Link] (10 responses)
I'm not trying to draw any big conclusions about any language, but the argument presented was: "C++ these days gives even more ways to produce subtle memory safety bugs with things like ranges and string_view. So I do not understand why Bjarne continues to believe that C++ can be fixed", along with a reference to a span of 8 years. I've barely written a line of Rust or C++, but I'm pretty confident that does not make sense.
Posted Nov 1, 2023 10:25 UTC (Wed)
by excors (subscriber, #95769)
[Link] (2 responses)
C++'s std::string_view is designed to be used directly by regular programmers. It's basically a `struct { const char *s; size_t len; }` and its purpose is to replace std::string in situations where you don't want to memcpy the string, for performance. Unlike Rust's `unsafe` APIs, it doesn't do anything to make itself look unusually scary or dangerous. Yet it invites subtle memory safety bugs in trivial cases like:
std::string s = "Undefined";
std::string_view sv = s + " behaviour";
std::cout << sv;
(which creates a view of a temporary string, then destroys the temporary, then prints it).
I don't think those are even slightly equivalent situations.
Posted Nov 2, 2023 9:47 UTC (Thu)
by adobriyan (subscriber, #30858)
[Link] (1 responses)
std::string s = "Undefined";
I think it is even guaranteed that the temporary will outlive "operator<<".
Posted Nov 2, 2023 19:07 UTC (Thu)
by roc (subscriber, #30627)
[Link]
Posted Nov 1, 2023 10:41 UTC (Wed)
by mb (subscriber, #50428)
[Link] (6 responses)
And that's actually a good thing. It improves memory safety.
The majority of unsafe interfaces are eventually wrapped in safe interfaces.
Writing unsafe code in Rust is hard. It is much harder than writing C or C++ code.
Posted Nov 1, 2023 11:26 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (5 responses)
To nitpick ever so slightly; writing unsafe code in Rust is no harder than writing safe C or C++ code. The difficulty is that the Rust community expects that you will write safe code, whereas the C and C++ communities understand that all code is unsafe, and accept the risks inherent in this.
Posted Nov 1, 2023 16:35 UTC (Wed)
by smurf (subscriber, #17840)
[Link]
s/accept/ignores/. Or worse.
Posted Nov 2, 2023 19:09 UTC (Thu)
by roc (subscriber, #30627)
[Link]
This is OK for everyone who has to write little or no unsafe Rust code, which is almost everyone.
Posted Nov 2, 2023 19:22 UTC (Thu)
by khim (subscriber, #9252)
[Link] (2 responses)
It's actually much harder, but for social rather than technical reasons, and for a very good reason. In C/C++ it's OK to write some piece of code and only document some “happy and nice” way to use it. Corner cases, which invariably exist, are left as an exercise for future generations. In Rust you have to think about uses of your code which C/C++ developers would just declare a “you are holding it wrong” case. One simple example: std::stable_sort and Vec::sort_by. What happens if the comparison function actually changes the elements? In Rust that's a compile-time error if interior mutability is not in use (the majority of cases), or “a random permutation in the output” if it is in use; in C++ it's UB… What happens if the comparison function does not produce a total order? In Rust that's some “random permutation in the output”; in C++ it's UB… Thus yes, writing unsafe code in Rust is harder, because it's expected to uphold guarantees that C/C++ code never even promises.
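(A sketch of the first case, hypothetical code: the comparison closure only receives shared references, so mutating the elements is rejected at compile time unless interior mutability is involved.)
fn main() {
    let mut v = vec![3, 1, 2];
    // The closure gets `&i32`s; mutating through them does not compile:
    // v.sort_by(|a, b| { *a += 1; a.cmp(b) }); // error[E0594]: cannot assign to `*a`
    v.sort_by(|a, b| a.cmp(b));
    assert_eq!(v, [1, 2, 3]);
}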
Posted Nov 3, 2023 11:14 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (1 responses)
See, I would say that if you document the "happy and nice" way to use said code, you've not written safe C or C++; you've written unsafe C or C++ and documented the preconditions that apply. If you're actually writing safe C or C++, in the same sense as "safe Rust", then it becomes as hard to write as unsafe Rust, because you're aiming to uphold the same guarantees as Rust requires.
Posted Nov 3, 2023 16:43 UTC (Fri)
by khim (subscriber, #9252)
[Link]
If you try to write code in C or C++ that is "safe in the same sense as safe Rust", then you immediately hit a wall: the language rules are perfectly OK with pointers being uninitialized and with references pointing to memory which is not yet a valid object, but touching them is instant UB. Thus, practically speaking, you always have some preconditions in addition to what the compiler requires. Which preconditions are OK for "safe C/C++" and which are not is always the subject of massive debate, and different, perfectly reasonable people often disagree about it.
Posted Nov 1, 2023 17:48 UTC (Wed)
by smurf (subscriber, #17840)
[Link] (3 responses)
Nope. In Rust, safety is not optional, it is the default; if you write unsafe code (a) you must explicitly state this and (b) you're culturally expected to (b1) limit your unsafe-ness to chunks small enough to be proven correct by human inspection and (b2) provide a safe public interface (e.g. by not returning multiple mutable references to the same buffer, not allowing *any* combination of calls to end up with uninitialized or double-freed memory, and so on).
In C++, unsafety is not just the default; it's impossible to write safe code at all. Even if your interface is "clean", you're free to grab a pointer/string_view to some struct/class member, let the class instance go out of scope, then continue to use that pointer. Or create funky memory leaks. Or just crash your program by appending to a std::vector while iterating over it.
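(The last one, concretely, as a minimal sketch:)
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    for (int x : v) {        // the range-for holds iterators into v...
        if (x == 1)
            v.push_back(4);  // ...which this may invalidate by reallocating: UB
    }
}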
Nothing's preventing your C++ code from doing these things, all of which are compiler errors in Rust. Nothing's even warning you. Ultimately, nothing in Bjarne's plan addresses any of this. Compiler authors refuse to solve the halting problem (bastards, all of them), thus the compiler cannot prevent those boring issues, like object lifetime and memory safety bugs, without relying on us overworked humans to annotate the code with type hints, lifetime tags, access restrictions (including "while the iterator is running, you can't modify the object it's iterating over") and so on. (Unless you're going the Python way and reference-count everything.)
Posted Nov 2, 2023 19:26 UTC (Thu)
by khim (subscriber, #9252)
[Link]
What you are calling the “Python way” should be called the “Swift way”, IMNSHO. Python has a full-blown garbage collector in addition to many other things, while Swift is much closer to C/C++ and Rust: it only includes reference counting, not a tracing GC.
Posted Nov 5, 2023 1:43 UTC (Sun)
by IanKelling (subscriber, #89418)
[Link] (1 responses)
> Nope. In Rust, safety is not optional, it is the default
Default as in, the default option, as in, it is optional. I mean, really?
Posted Nov 5, 2023 9:27 UTC (Sun)
by smurf (subscriber, #17840)
[Link]
For instance, you cannot write a generic memory allocator using only "safe" code.
Posted Oct 30, 2023 22:49 UTC (Mon)
by ikm (guest, #493)
[Link]
Posted Oct 30, 2023 23:05 UTC (Mon)
by roc (subscriber, #30627)
[Link] (1 responses)
Posted Oct 31, 2023 9:19 UTC (Tue)
by marcH (subscriber, #57642)
[Link]
If the expectation is to _completely_ catch up with other, safer languages then we can probably forget it.
On the other hand, if the expectation is for profiles to provide "super linters" that can at last enforce coding guidelines and catch a large number of frequent C++ issues then maybe there is a chance?
Posted Oct 31, 2023 13:35 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link] (5 responses)
Rust's Safety composes. Crate 1 of safe Rust, plus crate 2 of safe Rust, plus my safe glue code, always results in a safe program. Does the program do what I intended? Maybe not. I might have completely misunderstood something, or maybe my logic is flawed. But it is definitely safe.
Because these C++ Profiles are arbitrary, they can't be expected to compose. In particular, we should assume that the "mix-and-match" approach illustrated by Stroustrup in this presentation never delivers composition. I don't see this as at all viable at scale. It may be true that Alice's work obeys Profile X, Bob's work meets Profile Y, and Charlie's work satisfies Profile Z, but the combined software has no meaningful properties at all: it may be neither X nor Y nor Z, let alone all three. Why did we even bother with these "Profiles"?
Posted Oct 31, 2023 14:53 UTC (Tue)
by smurf (subscriber, #17840)
[Link] (4 responses)
Then again, you might not.
I'm not holding my breath – and I definitely do not expect this to bear any meaningful result, and the resulting mountain of code churn doesn't count.
Posted Oct 31, 2023 15:11 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (3 responses)
The particular scenario that worries me is profiles that each individually guarantee some useful property, but which, when combined, do not guarantee that property.
For example, Profile M guarantees no use-after-free bugs via a specific set of smart pointers and a limited lifetime analysis. Profile G guarantees no use-after-free bugs via a different set of smart pointers and a different set of lifetime analysis rules. But if you pass a profile M smart pointer to profile G code, conversions happen that break the lifetime analysis rules, and the same in the other direction, so that you get use-after-free back into the program.
This is probably the worst case scenario, since it means that any reasonable sized program will be working with libraries that use different profiles to accomplish the same goals from the language, but by doing so, invalidates all the guarantees from the profiles.
Posted Nov 2, 2023 19:35 UTC (Thu)
by khim (subscriber, #9252)
[Link] (2 responses)
The trick is not to create a bazillion incompatible profiles, but to have only a few well-designed ones. That's not too dissimilar from Rust with its stacked borrows and tree borrows, or Ada with SPARK. Technically it's all very much possible, and could even be feasible… except for the problem of the large subset of the C/C++ community that just wants magic.
Posted Nov 3, 2023 11:20 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (1 responses)
The important thing with Rust's stacked borrows and tree borrows, or Ada's SPARK, is that the intent is to end up with only one set of rules. It's just that we're going from "the rules are not written down, but compiler authors think they make sense" to "this is the formal set of rules", and in that process, you end up with things like tree versus stacked borrows, where there's at least two sensible formalisms that match the current state of play.
But this is an intermediate state for Rust, while the rules are being converted from intuition into formalism. Once the formalisation process is finished, there will be one and only one profile going forwards.
Posted Nov 3, 2023 16:50 UTC (Fri)
by khim (subscriber, #9252)
[Link]
Why couldn't C or C++ adopt the same stance? IMO the only reason is the social one that I'm talking about here: there is a certain small, but significant, subset of C/C++ developers who explicitly refuse to accept rules and think it's they who should be defining them, not some other people who write language specs and/or compilers. As long as they are not kicked out of the community, all talk about safety remains a pipe dream; but if one could find a way to achieve such an expulsion… everything else sounds doable. But neither Stroustrup nor anyone else talks about that issue, even though it's probably the most important impediment to the whole thing.
Posted Oct 31, 2023 19:42 UTC (Tue)
by cyperpunks (subscriber, #39406)
[Link] (17 responses)
Learning Rust takes a week or two.
That's much better than fixing security problems in C/C++ for the rest of your life.
Posted Oct 31, 2023 23:45 UTC (Tue)
by eharris (guest, #144549)
[Link] (16 responses)
(1) The C Programming Language, Kernighan and Ritchie, 2nd Edition, 1988 -- 272 pages
It's taken me nearly thirty years to get a reasonable grasp of reference #1.
....and that's before we get to the question of whether the Rust language might STOP CHANGING every few minutes!!!!
Comments from Rust fanbois are welcome!!
Posted Oct 31, 2023 23:54 UTC (Tue)
by mb (subscriber, #50428)
[Link] (5 responses)
Luckily it stopped changing in 2015, if you stick to that edition. Just like you seem to stick to C89 or something.
Posted Nov 1, 2023 0:13 UTC (Wed)
by mikebenden (guest, #74702)
[Link] (3 responses)
Maybe in 10 years or so the language (and its fanbois) will mature a bit, and stop treating anyone who isn't approaching Rust as an eager supplicant, as if it were ${DOG}'s gift to humanity, with either dripping disdain or outright hostility.
Then again, maybe in 10 years I'll be retired and y'all be someone else's problem... :)
Posted Nov 1, 2023 0:15 UTC (Wed)
by mb (subscriber, #50428)
[Link] (1 responses)
So having real backwards compatibility is what you consider a valid concern?
Posted Nov 1, 2023 4:52 UTC (Wed)
by IanKelling (subscriber, #89418)
[Link]
> So having real backwards compatibility is what you consider a valid concern?
Is mb related to Mike Benden?
Posted Nov 1, 2023 17:23 UTC (Wed)
by cyperpunks (subscriber, #39406)
[Link]
It's the opposite: Rust is here *now* and it works; code produced in Rust for Android has far fewer security issues than code from any other framework used.
There is no need for C++ people to re-invent memory safety and the other features found in Rust in a C++ context.
The C++ extensions proposed to make C++ safe are not implemented yet and will take many years to stabilize.
The fastest and most secure path to safe programming is Rust.
Posted Nov 1, 2023 7:00 UTC (Wed)
by roc (subscriber, #30627)
[Link]
But taking 30 years to become proficient in either C or Rust seems very excessive.
Posted Nov 1, 2023 15:39 UTC (Wed)
by smurf (subscriber, #17840)
[Link] (9 responses)
Also, K&R is missing a bunch of features without which you can't write a type- or memory-safe program. (OK, you can, but you can't teach a compiler to verify that your program *is* correct. Halting problem and all that.)
As for the Rust fanboi-ism, I refer you to Asahi Lina (and her collaborators). They're writing Apple M1 graphics drivers for Linux. In Rust.
Anybody who has ever done that will tell you that chasing memory leaks, object overruns, double-frees and whatnot is a large part of what makes writing (and, much worse, debugging) code in that domain difficult as heck. Surprise: these errors cannot happen in (safe) Rust programs, which is why, when I came across their blog entries about this, my first reaction was "they did WHAT in THAT time?!? HOLY S***".
Posted Nov 2, 2023 10:06 UTC (Thu)
by adobriyan (subscriber, #30858)
[Link] (8 responses)
Fanboyism is definitely there. When Rust pops up on HN, someone often says "Linux is using Rust already", implying that Linux has already been converted to Rust, and what do you know. In reality some scaffolding was merged with exactly 0 production lines of code: "rm -rf rust/" could be done and nothing would break, because nothing was using it in the first place.
I think it is a well-known(?) psychological trick: pretend you've succeeded and then tell everyone about it, for hype and demoralization.
> I refer you to Asahi Lina (and her collaborators). They're writing Apple M1 graphics drivers for Linux. In Rust.
New Binder code was sent, written in 146% organic Rust. It will be interesting to count unsafe blocks.
Posted Nov 2, 2023 14:35 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (4 responses)
I don't think the people actually doing work are doing this. It sounds more like the behavior of sports fans that claim "we" won despite being hundreds of miles away.
> New Binder code was sent, written in 146% organic Rust. It will be interesting to count unsafe blocks.
Considering a C implementation's `unsafe` block count is approximated by `wc -l`, what is your point?
Posted Nov 2, 2023 16:33 UTC (Thu)
by adobriyan (subscriber, #30858)
[Link] (3 responses)
My point is that only now will there be some production Rust code in Linux.
As for the "wc -l" part, the statement is far from true.
Imagine you're an independent auditor tasked with reviewing the Linux C binder code.
For example, it is quite easy to verify that binder_ctl_ioctl() is OK:
(it would be even better to have a builtin __builtin_type_has_padding() and to refuse copy_from_user()/copy_to_user() on types which have padding, but OK)
* copy_from_user() is correct, although that block is verbosely written (which is not a bug).
* binderfs_binder_device_create() does some shenanigans with inodes,
* req->name[BINDERFS_MAX_NAME] = '\0'; /* NUL-terminate */
* first memory allocation
device = kzalloc(sizeof(*device), GFP_KERNEL);
Eww, scary pointers... But VFS guarantees to not touch i_private (and, god forbid, free it), so
kfree(device);
is OK.
* second memory allocation
Eww, sketchy territory: naked pointers, unrefcounted. But searching for "context.name" shows that only one is freed.
* backcopy
The "if (userp)" check is obviously unnecessary, because it was used to copy _from_ userspace and is thus a valid pointer.
"req" was mutated by setting its last byte to '\0', which doesn't leak secret kernel data, so OK.
Now suddenly (221 - 111) + (256 - 233) = 133 LOC are most certainly safe, not unsafe.
This "all C code is unsafe" is such a non true.
Posted Nov 2, 2023 17:05 UTC (Thu)
by ojeda (subscriber, #143370)
[Link] (2 responses)
From your comment, I think you misunderstand what "safe" means in Rust.
It does not mean "the lines do not have UB because I checked" (like you do in your comment). It is the other way around -- that would be "unsafe" Rust.
In C, essentially any non-trivial line may contain UB. That is why they are "unsafe" in Rust terms, and why the statement about `wc -l` is quite close to reality.
Posted Nov 2, 2023 17:29 UTC (Thu)
by adobriyan (subscriber, #30858)
[Link] (1 responses)
"may contain" doesn't mean "contains". This fixation on UB is partially misguided.
> That is why they are "unsafe" in Rust terms, and why the statement about `wc -l` is quite close to reality.
C/C++ static checkers/compilers don't require you to mark code as "safe" or "unsafe", and they don't mark code themselves ex post facto.
I'm writing a toy C compiler at the moment.
It is easy for a human to verify at a high level (and sanitizers confirm) that there are no leaks and...
Posted Nov 2, 2023 17:42 UTC (Thu)
by mb (subscriber, #50428)
[Link]
Yep. And that is what Rust calls "unsafe code, manually checked".
Get it now?
Posted Nov 2, 2023 15:23 UTC (Thu)
by Tobu (subscriber, #24111)
[Link] (2 responses)
If someone wants to take a look: git log b4be1bd6c44225bf7276a4666fd30b8da9cba517...dca45e6c7848e024709b165a306cdbe88e5b086a (from rust-binder-rfc in https://github.com/Darksonn/linux.git via the RFC thread). About 8k lines of the C implementation removed, 7k lines of Rust replacing them (with some debug facilities still to port), and 120 SAFETY comments documenting the reasoning that makes unsafe blocks safe. The comments mostly have to do with why pointers are valid to dereference (often there is another smart pointer ensuring they are alive), why removal from a linked list is okay, with calling C functions or dereferencing pointers passed by a C caller (in the latter case there is a Safety comment documenting conditions the caller must meet as well). Commits that interface with C tend to add unsafe lines, but the rest of the commits often don't have to. I'm sure it's possible to encapsulate the unsafety a little bit more, but it looks like the assumptions are well documented and easy to review.
Posted Nov 2, 2023 16:49 UTC (Thu)
by adobriyan (subscriber, #30858)
[Link] (1 responses)
I know this is borderline goalpost-moving, but I'm shocked it is such a big number.
Posted Nov 3, 2023 18:52 UTC (Fri)
by njs (subscriber, #40338)
[Link]
Getting to the point where there are <120 places where all of binder touches the universe of kernel C it's embedded in is pretty impressive, especially given how early days this is for Rust in the kernel.
Posted Nov 6, 2023 3:23 UTC (Mon)
by linuxrocks123 (subscriber, #34648)
[Link] (4 responses)
Posted Nov 6, 2023 12:26 UTC (Mon)
by smurf (subscriber, #17840)
[Link] (1 responses)
Summary: this basically runs your C/C++ code within a garbage-collected system, by running an interpreter/just-in-time compiler over the LLVM bitcode.
You can get "memory safety" that way, if you think this term means "no use-after-free, no double-free, no memory leaks". But you can still have e.g. memory aliasing bugs, or several other classes of UB. As such, IMHO it's a band-aid that might help protect against some exploits, but it's the sort of crummy band-aid that lets the wound below it fester instead of helping it heal.
Posted Nov 11, 2023 8:13 UTC (Sat)
by linuxrocks123 (subscriber, #34648)
[Link]
> In this mode, by design, it is not allowed to call native code and access native memory. All memory is managed by the garbage collector, and all code that should be run needs to be compiled to bitcode.
> Pointer arithmetic is only possible to the extent allowed by the C standard. In particular, overflows are prevented, and it is not possible to access different allocations via out-of-bounds access. All such invalid accesses result in runtime exceptions rather than in undefined behavior.
> In managed mode, GraalVM simulates a virtual Linux/AMD64 operating system, with musl libc and libc++ as the C/C++ standard libraries. All code needs to be compiled for that system, and can then be used to run on any architecture or operating system supported by GraalVM. Syscalls are virtualized and routed through appropriate GraalVM APIs.
https://www.graalvm.org/latest/reference-manual/llvm/Nati...
I think this is a very cool project and would love to play with it sometime. Doing something like hooking this into the Gentoo build system and making a whole Linux distro where everything is compiled to have these safety characteristics would be interesting. Of course, it would be as slow as Java, so I wouldn't actually want to use such a system normally. But maybe I'd be willing to pay the cost with a web browser, or for server software exposed to the Internet.
Posted Nov 7, 2023 18:52 UTC (Tue)
by Tobu (subscriber, #24111)
[Link] (1 responses)
Posted Nov 9, 2023 12:12 UTC (Thu)
by Tobu (subscriber, #24111)
[Link]
Safe Sulong does/did address classes of UB much more accurately than tools that rely on heuristics (like valgrind and sanitizers). You would use it with clang -O0 (though that does still have optimisations that would silence UB) and a toolchain that preserves LLVM IR, then try to find bugs at runtime. Kind of like what Miri does for Rust, possibly faster thanks to JIT (the meta-interpreter approach is similar to RPython). Sadly the code was never released (the perils of being an Oracle partnership?). At the time of the paper there was a lot of work left to reimplement libc functions; porting a libc to emulate at a lower level (instead of reimplementing every libc function) was considered but didn't happen. It also didn't allow pointer-integer roundtrips (exposing provenance), since it could garbage collect objects that didn't have a live pointer. Native Sulong (maybe with some of the relaxed rules from Lenient C) seems to be published in the GraalVM repo, but it doesn't tackle most UB, it just makes it easier to mix C/C++/Fortran… with GraalVM languages. And Lenient C is definitely interesting as well, as is the idea of relaxing UB rules based on what programmers seem to believe should work. Though extending allocations to live past the point where free is called (using the GC graph instead) feels maybe too lenient.
the definition and value of memory safety all they want. The beauty of the market is that the more they do so, the more people will adopt Rust. Living in denial has consequences.
#define __restrict restrict
#else
#define __restrict__ restrict
#endif
#define restrict __restrict
#else
#define restrict __restrict__
#endif
is allowed in a haiku;
you overdid it.
Wol
Most ordinary people simply do not associate these words with programming languages!
The real problems which ordinary people associate with these words can be encapsulated in other words...."hackers", "ransomware", "NSA", "impersonation", and so on.
Of course, the security and reliability of programming languages does have some connection......but likely only a very tenuous connection....with the fears and risks encountered by ordinary people!
>Therefore this will not happen Any Time Soon. You want a "safe" language, in 2023, you use Rust.
C++ can probably be made into a safe language by restricting it. That is: By making it backwards incompatible.
That is basically what Bjarne's proposal is with "profiles", as far as I understand it.
I would say: Just rewrite it in Rust. ;)
But that's not really a practical thing to do for existing code bases. Therefore, iteratively adding restrictions and thus iteratively adjusting the codebase may be a sane way forward to make these old codebases safer.
> >Therefore this will not happen Any Time Soon. You want a "safe" language, in 2023, you use Rust.
> C++ can probably be made into a safe language by restricting it. That is: By making it backwards incompatible.
> That is basically what Bjarne's proposal is with "profiles", as far as I understand it.
Wol
> It could be made a lot safer very easily.
2. "static Function Do" is invalid syntax. You must explicitly initialize it to something. Which may well be nullptr.
> "static Function Do" is implicitly set to nullptr
to_be_or_not_to_be. It worked correctly in Rust despite being UB. For years. Then, suddenly, about one year ago: bam — it stopped.
-O0 is exactly the same process that amplifies UB and turns a completely predictable, easy to understand error into “WTF were they smoking when they created something that mangles my code this badly”. No, they weren't smoking anything, they were just adding hundreds of passes that make perfect sense if your program is assumed to be correct (and thus never triggers UB).
> Maybe not at least for semantic reasons.
assume_init in these cases because the rules of the language say so.
asm (or code in some other language).
That is an optimization advantage.
Rather, it is a long chain of small transformations. None of them has the full view of what is happening at large scale.
There is no point where you could emit a warning. The pass that eliminates the NULL check probably can't know whether the check came from the original program or from a previous optimization step. Therefore, it would be wrong to warn in some cases.
Wol
> That is where Rust scores - it doesn't assume the programmer is omniscient and can write code without undefined behaviour.
unsafe keyword. O_PONIES lovers demand from C++ compilers. In unsafe Rust that's the responsibility of the developer. unsafe superpowers.
- one thread acts as the "client" for the tests to emulate performing actions;
- there is a shared RwLock'd data block so that the client can get access read-only data access as needed (avoiding the need for synchronous channel communications on the client side);
- there is a sized channel to send requests to the service (in lieu of HTTP in normal usage); a rough sketch of this arrangement follows below.
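Concretely, a rough C++ analogue of that arrangement (the original harness is in Rust; every name below is hypothetical, this is only a sketch of the shape): a client thread, shared state behind a reader/writer lock, and a bounded queue standing in for the sized channel.

#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <shared_mutex>
#include <thread>

struct SharedState { int counter = 0; };    // read-mostly data behind the RwLock analogue

template <typename T>
class BoundedQueue {                         // stands in for the sized channel
public:
    explicit BoundedQueue(std::size_t cap) : cap_(cap) {}
    void send(T v) {
        std::unique_lock<std::mutex> l(m_);
        not_full_.wait(l, [&] { return q_.size() < cap_; });
        q_.push(std::move(v));
        not_empty_.notify_one();
    }
    T recv() {
        std::unique_lock<std::mutex> l(m_);
        not_empty_.wait(l, [&] { return !q_.empty(); });
        T v = std::move(q_.front());
        q_.pop();
        not_full_.notify_one();
        return v;
    }
private:
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    std::size_t cap_;
};

int main() {
    SharedState state;
    std::shared_mutex state_lock;            // RwLock analogue: many readers or one writer
    BoundedQueue<int> requests(8);           // sized channel, in lieu of HTTP

    std::thread client([&] {                 // the "client" emulating test actions
        for (int i = 1; i <= 3; ++i) {
            { std::shared_lock lock(state_lock); (void)state.counter; }  // read-only access
            requests.send(i);
        }
        requests.send(0);                    // shutdown marker
    });

    for (int req; (req = requests.recv()) != 0; ) {   // the service side
        std::unique_lock lock(state_lock);
        state.counter += req;
    }
    client.join();
}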
Wol
Therefore, "just do whatever the hardware does" is not really possible.
And even if you amend your machine model with "For every missing definition, please take the definition from the hardware specification instead" then you would *still* have UB. Just take a look into the ARM manual for example. It is full of UB.
It is only possible to restrict the language to a subset of possible things to do and then define all those things. However, such a language *still* needs to have the possibility to call into unsafe code to interact with the real world.
The real world is not safe.
>they're happy with all the possibilities for implementing it?
We came from UB. The number of people that are happy if UB is invoked is exactly zero.
It depends on what you do and what your expectations are.
But invoking UB will *always* break your program sooner or later. No matter what your expectations are. That is what makes it so dangerous.
We came from UB. The number of people that are happy if UB is invoked is exactly zero.
> WUFFS cheerfully pays a very high price to do this, (it is not a General Purpose language) you would never choose to pay that price in Rust or C++, but that's the whole point.
Wol
> So your mathematical language model should be - quite simply - "if the hardware doesn't do what you tell it, all bets are off. But the language model itself is maths, AND CAN BE 100% DEFINED BEHAVIOUR". Tell the hardware to multiply an integer by 2 and you get overflow? THAT'S THE HARDWARE'S FAULT. "Multiply by 2" is, at the pure maths level, guaranteed to succeed. Where hardware errors are expected (like here) let the programmer tell the compiler how to handle them - don't let the compiler sprinkle O_PONIES everywhere.
-O0, although MSVC calls it /Od. But that doesn't satisfy C/C++ developers either.
[2] The traditional answer to this is "refer to the specs/documentation" [3]
[3] Which can be found in storage next to the spherical cows and O_PONIES
>in a completely different language is going to gain more traction?
https://crates.io/crates/rustls
https://crates.io/crates/rustls
> In cases where it doesn't matter, nobody is going to rewrite things, of course. And that is fine.
Either that, or if your customer doesn't care, then it doesn't matter. Why rewrite it then?
*But* people are currently learning that this is not written in stone. Software can have fewer or almost no vulnerabilities, if done right. That creates pressure to do it right or to leave and let somebody else do it right.
Wol
Wol
> On top of that, it's easier to motivate people (social issues, again) to rewrite into a new, "fun" language than it is to get them to drop decades of misconceptions about "safe C++" and start writing static analyser friendly C++.
And there are good reasons for that. Among them: the number of required compiler variants would explode immediately otherwise. There are hundreds of actively used target hardware variants.
Only restricting the language would help. But that would create a language fork.
> When I think of "portable assembler" I imagine a compiler that targets the underlying architecture, rather than something like Java, which targets the JVM instead, and reflects the underlying architecture's behaviors.
O_PONIES and other such things. Which don't even exist. We have already established that.
LEA, right? Or could it?

void foo(int x) {
    int y = x;
}

It stores x in the slot on the stack called y. And then you can use it from another function (works perfectly, as you can see, at least on clang/gcc/icc, but also on many other compilers if you disable optimizations).
unsigned types always behaved in a predictable fashion, thus most compilers have code to support that mode; it's only a matter of enabling it for signed types, too.
(-O0 or similar switches)
> I take it very metaphorically, as an unreachable ideal.
Sure, why not?
ADD instruction and change it to SUB. Or do something else equally tricky.
char* and inspect it. And no, please don't tell me “that's so preposterous, no one would ever do that”. I, personally, wrote some tricky programs which were disassembling certain functions and then used that information. And the Windows compiler literally prepares functions for such handling.
(the -fwrapv option alone does not close the problem of integer overflow, but a bazillion similar switches added to the standard would solve it).
O_PONIES lovers have a different meaning in mind: “I use C to create machine code only because using assembler is too tedious, thus I should be able to use any tricks that machine code can employ, and that's why ‘portable assembler’ is a good name”.
> what makes you think trying the much harder task of rewriting things in a completely different language is going to gain more traction?
@vadim is most definitely not interested in resolving those ambiguities back into the original codebase; he only wants to bend compilers to his will, somehow. And if you have different people who have different wills in one project…
The world already has agreed that signed overflow UB is a bad idea these days. That's why this option exists and is in broad use.
You can't break something that's currently declared UB.
If the UB lovers repeat their position often enough (and there is lots of that repetition in this discussion), you may find yourself adopting it even if it is incompatible with your other positions. Beware!
int in array accesses on some 64-bit architectures in some code. In SPECint 2006, one of the benchmarks was slowed down by 7.2% by this sign extension if -fwrapv was enabled, resulting in <1% lower result for the whole benchmark suite (as reported by Wang et al.).
the hardware has UB, too
No, the hardware does not. What you may be thinking of is that a certain low-quality architecture defines some things as "undefined"; but if you try these things on some implementation of the architecture, i.e., on the hardware, you will find that it has a certain behaviour. Other architects have experienced the effects of Hyrum's law and have learned the hard way that it's a bad idea to leave things undefined in the architecture.
2) We make no promises about what happens if you do this, and reserve the right to arbitrarily change it in the future.
3) Don't do this.
Just like the language spec says so. No difference.
Most of the time the behavior of UB in the language is also predictable, if you just look at what actual asm code has been emitted. Or has been deleted by the optimizer. Deleted code does one specific thing: nothing.
> When something is said to be "undefined", most of the time what happens if you violate a rule is well defined, but the manufacturer just doesn't want to promise it won't change in future revisions.
Yes, one could say that, and one would be right. From earlier Rust advocacy, I expected that in Rust without unsafe stuff all code that compiles has defined behaviour. So the UB advocacy of the Rust advocates here really is a big turnoff for Rust for me.
(x<<n)|(x>>(32-n)), if compiled benignly, works for n=0 (producing x for the idiom) on hardware where shifting x by 32 produces 0 as well as on hardware where it produces x.
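For reference, the usual way to write that rotate without relying on benign compilation is to mask both shift counts; a minimal sketch (this is the well-known portable idiom, not code from the thread):

#include <cstdint>

// Rotate left without shift-by-32 UB: both shift counts stay in [0, 31]
// for every n, including n == 0 (where the naive (x << n) | (x >> (32 - n))
// would shift by 32). Unary minus on unsigned is defined (modulo 2^N), and
// mainstream compilers typically recognize this pattern and emit one rotate
// instruction.
static std::uint32_t rotl32(std::uint32_t x, unsigned n) {
    n &= 31;
    return (x << n) | (x >> (-n & 31));
}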
Wol
It depends on the way the result is used (e.g., when you pass it to a function, it depends on the ABI) whether you have to sign-extend the result. And it depends on the architecture. E.g., AMD64 (aka x86_64) has a 32-bit addition operation, but it always zero-extends; it has an 8-bit and 16-bit addition which leave the rest of the 32 bits alone. RV64GC has a 32-bit addition which sign-extends, and not 8-bit or 16-bit addition, and no zero-extending 32-bit addition, so you may be seeing zero-extending instructions when, e.g., performing array indexing with unsigned ints. ARM A64 has addressing modes that sign-extend or zero-extend a value in a register, so you can avoid explicit sign- or zero extension of 32-bit values for use in addressing.
"What the hardware does" is a idea, but indeed not a specification (I guess that's why UB advocates love to beat on strawmen that include this statement).
It's actually only one program in SPECint2006 where it costs 7.2% with gcc (as reported by Wang et al.); and the performance improvement of the default -fno-wrapv can alternatively be achieved by changing the type of one variable in that program.
There are, however, also other positions, advocated by many (including me), so I think that the C compiler writer's position on UB is not "the C culture" (it may be "the C++ culture", I don't know about that). In particular, I think (and have evidence for it) that humans are superior at optimizing programs, and that, if the goal is performance, programmer's time is better spent at performing such optimizations (by, e.g., changing the type of one variable, but also more involved transformations) than at "sanitizing" the program.
Wol
What I see, however, when I look at what the C standards committee is doing, is that the people who drive C and C++ the languages forwards are dominated by those who are embracing UB; this is largely because "make it UB, and you can define it if you really want to" is the committee's go-to answer to handling conflicts between compiler vendors
Yes, not defining something where existing practice diverges is the usual case in standardization. That's ok if you are aware that a standard is only a partial specification; programs written for compiler A would stop working (or require a flag to use counter-standard behaviour) if the conflicting behaviour of compiler B was standardized. However, if that is the reason for non-standardization, a compiler vendor has a specific behaviour in mind that the programs of its customers actually rely on. For a case (e.g., signed overflow) where compiler vendors actually say that they consider the programs that do this buggy and reject bug reports about such programs, compiler vendors do not have this excuse.
-fno-gcse -fcaller-saves -fno-defer-pop -fno-inline -fwrapv -fno-strict-aliasing -fno-cse-follow-jumps -fno-reorder-blocks -fno-reorder-blocks-and-partition -fno-toplevel-reorder -falign-labels=1 -falign-loops=1 -falign-jumps=1 -fno-delete-null-pointer-checks -fcf-protection=none -fno-tree-vectorize -pthread -fno-defer-pop -fcaller-saves
Wol
presumably in your example the processor has an op-code to carry out the operation
> What's wrong with that?
(2 >> 33) + (4 >> 257). Different CPUs would give the following answers:
- 2 (ARM), or
- 3 (80386), or
- 0 (x86-64 if fully vectorized), or
- 1 (x86-64 if partially vectorized), or
- 2 (x86-64 if partially vectorized)
(x<<n) | (x>>(32-n)), while faithfully converting each operand to precisely one assembler instruction and then “doing what the hardware is doing”! Is it still “a portable assembler” or not, at this point, hmm?
x86-64 and RISC-V permit different behavior on different cores of the same big-LITTLE CPU!
2. Translate it as-is, and whatever happens on this CPU, happens
> Personally I find all of them acceptable.
-O0 should be your choice. Why is it not acceptable for you? That's the -O0 solution, isn't it?
> I'm not sure where ARM's #1 comes from exactly.
O_PONIES, O_PONIES and more O_PONIES…
Wol
> We're not really even asking for C/C++ to define what happens for a malformed program.
Some of these are very obscure and no one remembers all of them.
Examples are integer overflow, lack of newline at the end of file (yes, that's UB in C!), etc.
For example “an attempt to use a variable outside of its scope, like I showed here”.
O_PONIES lovers (including you, I guess) concentrate on fact #2 and demand that compilers treat all UBs in a predictable way. That's impossible because of fact #3.
They can never be satisfied, and as long as the code from such people is in circulation you may never trust that your programs would behave adequately.
Because today no one, not even compiler developers, may remember all these UBs that are peppering language specifications and standard library specifications.
In Rust only authors of unsafe have to deal with UB (and, of course, “unrestricted assembler code” is unsafe for obvious reasons).
Wol
> Firstly, it should not be saying stuff about what happens over things out of its control!
256 * 256 in the presence of such a “helper”? How can you predict anything at all?
O_PONIES lovers insist on an entirely different treatment of UB, and as long as they are part of your community… any semblance of safety is impossible.
void foo(int x) {
int y = x;
}
void foo(int x) {
int y = x;
}
> I can't understand why anyone would want to write code like this, which assumes that a function local variable occupies the same address as a variable in a previously called function that has already returned.
x86-64 red zone is large enough to keep values from being overwritten. And O_PONIES lovers always talk about how they have the right to take advantage of any and all quirks of the target platform, and yet the compiler has to process such programs anyway… I'm not using anything beyond that here.
The compiler breaks this by optimizing away the stack store, because the whole program is undefined in the language machine model.
Every optimization whatsoever that a compiler performs can be proven to be perfectly valid *if* there are no UB conditions around.
> Visible as in affecting the program's output.
A compiler satisfying O_PONIES lovers' requirements couldn't exist, thus why would anyone spend time to try to create one?
You either have to define the behavior in terms of the language's virtual machine model, or you leave it UB.
I fully agree.
> There's also the include file mess, which pretty much guarantees that you can't mix "old" and "new" code and have the result mean anything.
bison to its will and switched to a completely ad-hoc parser.
How should the compiler know which result the -O0-compiled code should generate?
The compiler writer decides on the behaviour for the platform once, and sticks with it. E.g., if the compiler writer decides to compile + to behave like an add instruction on the Alpha, and * into a mul instruction, it is fine to optimize l1+l2*8 to use an s8add instruction, because this instruction behaves the same way as a sequence of mul (by 8) and add. Once you have decided on the add behaviour, it is not ok to use an addv instruction for +, because that behaves differently (it traps on signed overflow).
Oh, how nice, you now have platform-dependent code, thus zero guarantees that your "x86-64" code will run on tomorrow's ARM64 or MIPS platforms.
Correct. In that respect this discipline does not improve on the current situation. But the improvement is that existing, tested programs still work on the next version of the compiler, unlike in the current situation.
x-1>=x to false. The problem with such "optimizations" is that 1) this behaviour is not aligned with the behaviour of x-1 in the usual case and 2) in cases that should be equivalent (e.g., when the x-1 happens in a different compilation unit than the comparison) the behaviour is different.
A correct optimization must not change the behaviour, not that of the transformed code, and not that of seemingly unrelated code. Even UB fans accept this, they only modify the definition of behaviour to exclude UB. So anyone working on an optimizer has to reason about how an optimization affects behaviour.
It's not reasonable to demand that the next version of a compiler should not come with any improvements to its optimizer.
That's a straw man argument.
C doesn't have a way to specify "fused multiply and add" at all. Should C offer a library intrinsic to access such instructions?
Looking at the output of man fma
, it reports that the functions fma(), fmaf(), and fmal() are conforming to C99.
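To make the difference concrete, here is a small sketch (my illustration, using only the standard fma facility the man page describes; std::fma is the C++ spelling of the C99 function). fma computes a*b + c with a single rounding, so it can differ from the separately rounded expression:

#include <cmath>
#include <cstdio>

int main() {
    double a = 0.1, b = 10.0, c = -1.0;
    // std::fma rounds once; a * b + c rounds the product first. With IEEE-754
    // doubles, 0.1 is not exactly representable, yet 0.1 * 10.0 rounds to
    // exactly 1.0, so the separately rounded result is 0 while the fused one
    // is about 5.55e-17. (Note: with -ffp-contract=fast or similar, the
    // compiler may itself contract the plain expression into an fma.)
    std::printf("fused: %g, separate: %g\n", std::fma(a, b, c), a * b + c);
}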
Gentoo allows full distro recompile with seemingly arbitrary compiler flags.
I was using "-march=native" more or less since the moment it was introduced.
Wol
https://en.wikipedia.org/wiki/C%2B%2B20#cite_note-32
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/...
Wol
> OR you say "all those options are possible, there are flags to specify which one you expect, and if you don't specify you are explicitly choosing whatever the hardware happens to give you".
> But since that's UB, what we have to write instead is "if(x==LONG_MIN)"
if ((long)((unsigned long)x-1UL)>x). This code doesn't have UB and doesn't need the LONG_MAX constant.
foo function here except for that rule about correct programs (they shouldn't contain UB).
-fwrapv exists, maybe you're writing this in a header in a library and therefore can't expect that flag to be used, and therefore must use the more roundabout implementation.
unsigned numbers. But, ultimately, it's not important: while I agree that the list of UBs in C/C++ is, frankly, insane, that's not the main problem. The main problem is people who refuse to play by the rules.
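A self-contained version of that check, as a sketch (my illustration, not code from the thread): the subtraction happens on unsigned long, where wraparound is fully defined, and the conversion back to long is implementation-defined rather than UB, yielding the expected value on mainstream two's-complement compilers.

#include <climits>
#include <cstdio>

// True exactly when x - 1 would underflow a signed long.
// (unsigned long)x - 1UL wraps modulo 2^N, which is defined; converting the
// result back to long is implementation-defined, not UB, and gives the
// two's-complement value on the usual compilers.
static bool decrement_would_underflow(long x) {
    return (long)((unsigned long)x - 1UL) > x;
}

int main() {
    std::printf("%d %d\n",
                decrement_would_underflow(LONG_MIN),  // prints 1
                decrement_would_underflow(0));        // prints 0
}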
> You can write something like if ((long)((unsigned long)x-1UL)>x). This code doesn't have UB and doesn't need LONG_MAX constant.
I have tried something like this, and some version of gcc (IIRC in the 3.x days, but I have not tried it later) "optimized" it to false. So I have switched to comparing with LONG_MIN, and last I looked (gcc-10.3) gcc did not optimize it into small code (even though that would be a proper optimization, not an "optimization" that assumes that the program does not exercise undefined behaviour). So in this case a UB "optimization" led to inefficiency; maybe it was a bug in the compiler, but that does not help, I still have to work around it; and the source of the bug (if it is one) is the idea that you can "optimize" by assuming that UB is never exercised.
> last I looked (gcc-10.3) gcc did not optimize it into small code (even though that would be a proper optimization, not an "optimization" that assumes that the program does not exercise undefined behaviour)
As is usual for advocates of undefined behaviour, you make claims about "faster code" without presenting any evidence. This is especially unbelievable in the present case. The two code sequences are:
movabs $0x8000000000000000,%rcx
cmp %rcx,%rdx
je ...
(15 bytes) and
cmp $0x1,%rdx
jo ...
(6 bytes) or (if rdx is dead afterwards)
dec %rdx
jo ...
(5 bytes). What makes you think that the longer sequence is faster?
Interesting tool. What it gives me for "100 iterations" (default) of the three sequences above is the following number of cycles:
1 2 3
79 54 104 llvm-mca-11 --mcpu=tigerlake
80 54 104 llvm-mca-11 --mcpu=znver2 seq3.s
154 104 104 llvm-mca-11 --mcpu=tremont
154 104 104 llvm-mca-11 --mcpu=silvermont
Silvermont is the slowest core I could get (I asked for bonnell, but llvm-mca rejected my request), tigerlake and znver2 (presumably Zen2) and tremont are the most recent ones available. Judging from the output, it gives me Skylake-X data for tigerlake, and Silvermont data for Tremont.
And here's the data on real hardware. I also include sequence 4 that is a more literal translation of the C code (gcc-10 -O -fwrapv produces this sequence):
leaq -1(%rdx), %rax
cmpq %rdx, %rax
jg ...
I put 100 copies of these sequences in a row, but they check 4 different registers rather than just one (to avoid making the data dependence on the result of the dec instruction a bottleneck for this microbenchmark). The results are in cycles per sequence.
1 2 3 4
1.02 0.54 0.54 0.55 Rocket Lake
0.54 0.54 0.52 0.53 Zen3
1.19 1.03 1.03 1.03 Tremont
2.05 1.06 2.05 1.55 Silvermont
2.42 1.43 1.27 3.09 Bonnell
So, concerning our original question, sequence 2 is at least as fast as sequence 1, and actually faster on 4 out of 5 microarchitectures, sometimes by a factor of almost 2. Even sequence 4, which is what gcc10 produces, is faster on 4 out of 5 microarchitectures and is slower only on Bonnell, which has been supplanted by Silvermont in 2013. Sequence 3 is also at least as fast as sequence 1, and faster on 4 out of 5 microarchitectures. So the gcc maintainers decided to recognize this idiom in order to pessimize it. Strange.
> But since that's UB, what we have to write instead is "if(x==LONG_MIN)".
> And that actually turns out to take more bytes of code, because LONG_MIN is a 64 bit constant.
which cannot be processed in parallel with some other instructions.
x-1>x requires fresh register which may be used somewhere
Just as LONG_MIN.
Finally, gcc will optimise (x - 1 > x) with -fwrapv to x == LONG_MIN.
That's a pessimisation then. It did not pessimize it in this way when I last tried it.
> That's a pessimisation then. It did not pessimize it in this way when I last tried it.
but "lea r64, [r64 - 1]" or equivalent must wait for x.
In my usage the result of the comparison is used by a highly predictable branch, so the supposed advantage vanishes on an OoO CPU (and it's unclear whether it exists on an in-order CPU). The size disadvantage, however, stays. And if gcc really recognizes this idiom, gcc's maintainers are extra-clueless if (on AMD64) they compile it into
mov r9, LONG_MIN
cmp r9, r10
je ...
rather than
dec r10
jo ...
which is much shorter and only one macro-instruction.
> the language community would decide that if you run with -fno-wrapv, you're doing something you know is dangerous, and therefore we don't need to care about coding defensively for that case - you've explicitly asked for pain.
It's a good thing to avoid having UB in places where having UB brings no benefit.
> Obviously, I'd disagree with this step.
Everybody is sure that they understand what volatile does.
And everybody has a different "understanding" of what volatile does.
Worse even everybody has different expectations what volatile shall provide.
But that doesn't mean it doesn't break. Even the slightest compiler change will break somebody's assumptions.
It's impossible to write a correct program with volatile in it, because it's not fully defined what a correct program with volatile would do.
And if somebody finds out that some definition is unsound, the definition is fixed.
It has been agreed upon that either things are well defined or are unsafe. That agreement is essential for having a safe language.
That is what is missing in the C/C++ community.
> These things basically don't happen in Rust and in the Rust community.
Instead of demanding O_PONIES, everyone agrees that they are dealing with a tricky case and need to treat it as such.
That is what is missing in the C/C++ community.
What about dangling pointers?
Does the model have to assume that you just wrote to some existing object somewhere in memory? Because that is what can (and often does) happen on the underlying hardware.
What does that mean for optimization? Your dangling pointer is aliasing an object. You marked your other pointer somewhere else in the program with 'restrict', because you are sure it's not aliased? Too bad! The compiler has to ignore it, because some dangling pointer may still alias it. It can't do the optimization. That is just one example. In practice it would probably suppress all possible optimizations.
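A sketch of the problem being described (my illustration, reusing the vendor __restrict extension mentioned earlier in the thread; gcc, clang and MSVC all accept it):

// If writes through a dangling pointer had to be assumed to land in "some
// existing object somewhere", the compiler could never trust __restrict:
void scale(int *__restrict out, const int *in, int *maybe_dangling) {
    *out = *in;          // compiler wants to keep *out cached in a register...
    *maybe_dangling = 0; // ...but under that model this store could alias *out,
                         // so *out would have to be reloaded below
    *out += *in;
}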
> I'm not expecting C++ to turn into Rust, but to basically be a "portable assembler".
clang and gcc are doing that with -O0. MSVC does that with /Od. Easy.
-O2 or /O2 option. Yet you are not satisfied. Try to explain what you want, again, please.
> 2. "static Function Do" is invalid syntax. You must explicitly initialize it to something. Which may well be nullptr.
- UB is used (or better, exploited) intensively by the compiler developers in a very unintuitive way (more below).
- most of the commenters (me too, because it took me more than 1 day to remember this rule) don't know/remember very well a basic C language rule.
- http://kristerw.blogspot.com/2017/09/why-undefined-behavi...
- https://kristerw.blogspot.com/2017/09/follow-up-on-why-un...
Do();
into
if Do == func1 then
    func1()
else if Do == func2 then
    func2()
else
    func3()
[*] https://stackoverflow.com/questions/16015656/are-global-v...
C is portable assembly. It has evolved over the years, but it is quite unsafe.
C++ has evolved quite a lot over the years. It is 'more' safe than C, but still far from safe.
If you look at all of the vulnerabilities over the years, a large majority of them can be attributed to C and C++. Poor or non-existent memory management, use after free, buffer overflows and the like.
Divide by 0, overflows, underflows, returning without a value and many other 'features' of C and C++ have contributed to unsafety.
If you look at all of the vulnerabilities over the years, a large majority of them can be attributed to C and C++. Poor or non-existent memory management, use after free, buffer overflows and the like.
That's very true for C, but less so for C++.
... scientific change does not occur because individual scientists change their mind, but rather that successive generations of scientists have different views.
Can that even be possible, without complete lifetime checking?
auto p = empty.data();
Ok, let's try that. Just a vector. No pointers or references in sight. Not even explicit iterators or container views.
std::vector<int> v;
v.push_back(42);
v.push_back(43);
for (const int i : v) {
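// push_back below may reallocate v's storage, invalidating the hidden
// iterators that drive this range-for: undefined behaviour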
v.push_back(i);
std::cout << i;
}
Doesn't this rule out using non-static member functions, because the this pointer is not a smart pointer? Is anyone actually doing this?
If you don't refer to this either explicitly or implicitly, the member function might as well be static.
That usage is fine, since the current instance is obviously in scope for the duration of the reference to *this.
Why is that obvious? The object might do something that causes itself to be deleted. Such self-deletion may be somewhat uncommon, but seems hard to avoid in certain contexts. For example, an event handler might detect that it has no need to watch for future events.
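A minimal sketch of that hazard (my illustration; all names hypothetical):

#include <memory>
#include <vector>

struct Handler {
    std::vector<std::shared_ptr<Handler>>* registry;  // owns all handlers
    int calls = 0;

    void unsubscribe() {
        registry->clear();   // drops the only shared_ptr owning *this
    }
    void on_event() {
        unsubscribe();       // *this is destroyed in here...
        ++calls;             // ...so this innocent-looking access is a use-after-free
    }
};

int main() {
    std::vector<std::shared_ptr<Handler>> handlers;
    handlers.push_back(std::make_shared<Handler>(Handler{&handlers}));
    handlers[0]->on_event(); // UB: the handler deletes itself mid-call
}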
1) in a multithreaded program another thread does things with the object or deletes it.
2) in a singlethreaded program after returning the reference to the caller the caller deletes the object and uses the reference afterwards.
Both cases cannot be solved without lifetime checking of the reference.
Trivial cases of 2) could *maybe* be caught by making the C++ compiler more clever. But I doubt you could whack all moles.
>never let raw pointers or references escape then there is no problem.
Making C++ safer is a good thing.
I'm all for it.
Therefore, I think the better way would be a "safer C++" instead of a "safe C++".
Most of the characters are simply doing the same things as in C. And the rest are not that hard to understand (e.g. Multi-trait-bounds uses A + B).
And many things written in Rust often read like English text. impl A for B where...
The same thing happened and still happens in Rust.
fn main() {
    let s: &'brief str = "Hello there";
    println!("{s}");
}

fn main() {
    let s: &'_ str = "Hello there";
    println!("{s}");
}
That's exactly how books teach Rust. It's the obvious way. And then they teach you about how to make it readable by removing all unnecessary lifetime annotations.
Therefore, it's obvious that most people don't know that concept if they don't know Rust yet. But it's actually pretty easy to learn. There are only a handful of rules to remember. And most of the time the compiler just infers it for you. Which is a good thing, because otherwise Rust code would be completely unreadable.
Python sidesteps the issue by reference-counting the world (and accepts the resulting impact on performance and memory footprint).
C++ ignores the problem and simply asks the programmer to not write unsafe or UBish code, or else.
> and to me it seems simple and enough error proof to the point that the likelihood of an error logic is greater than the likelihood of invalid memory access.
- Call an unsafe function or method
- Access or modify a mutable static variable
- Implement an unsafe trait
- Access fields of unions
There is no reason for it to change.
std::string_view sv = s + "behaviour\n";
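// the temporary std::string from (s + "behaviour\n") is destroyed at the end
// of this statement, so sv is already dangling on the next line: UB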
std::cout << sv;
std::cout << (s + "behaviour\n");
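// OK: here the temporary string lives until the end of the full expression
// that prints it, so nothing dangles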
That is how Rust works.
The programmer almost never comes in contact with unsafe interfaces. Even on embedded systems it's really rare to have unsafe blocks.
And that is *because* all these unsafe blocks exist in core, std or other special wrapper crates that take care of providing a safe interface.
It's just that if you *don't* have such a safe wrapper with unsafe internals in std/core or somewhere else, then you have to make your own. That is the risky and hard part.
Because you must take care of all the checks and things you know from C/C++ and in addition to that you have to ensure that the safe interface to your code can never trigger UB.
Therefore, it's a good thing to have this hard to write code in central places like core/std. The more unsafe code there is in core/std, the less unsafe code there is in the applications.
> writing unsafe code in Rust is no harder than writing safe C or C++ code.
unsafe is much harder than in C++… but for a very good reason.
> If you're actually writing safe C or C++, in the same sense as "safe Rust", then it becomes as hard to write as unsafe Rust, because you're aiming to uphold the same guarantees as Rust requires.
> Unless you're going the Python way and reference-count everything.
The talk video
O_PONIES “portable assembler” and would just ignore that work entirely.
> But this is an intermediate state for Rust, while the rules are being converted from intuition into formalism. Once the formalisation process is finished, there will be one and only one profile going forwards.
(2) Programming Rust, Blandy, Orendorff and Tindall, 2nd Edition, 2021 -- 692 Pages
Reading reference #2 was interesting.....but I really don't have another thirty years to wrap my head around it.
>big part of why I decided to put a "low pass filter" on my attempt to dive into Rust.
> >big part of why I decided to put a "low pass filter" on my attempt to dive into Rust.
Rust is already proven in the field today.
You're reading it and red-penning everything you can't prove to yourself is correct.
* struct binderfs_device doesn't have padding -- good
quota and ACL aren't initialised.
Rust must scream here but it is OK, because Binder doesn't support quotactl and ACLs.
Hell, yes! Comment is of "increment i by 1" variety (which is not a bug).
inode->i_private = device;
iput(inode);
name = kmemdup(req->name, name_len + 1, GFP_KERNEL);
device->context.name = name;
device->miscdev.name = name;
and "miscdev.name" is not freed, phew.
if (userp && copy_to_user(userp, req, sizeof(*req))) {
Ergo, it is safe to ship to userspace.
Rustc requires unsafe marks thus creating the illusion that close to 100% of C/C++ code is unsafe.
* C expressions are allocated from stable container,
* expressions form AST with pointers pointing to other expressions,
* pointers to expressions are never freed,
* references and other form of pointers aren't used,
* stable container is globally destructed when the program exits (which is a waste of cycles, but this is for later).
there are no bugs with pointer management despite having an awful lot of pointers.
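As a sketch of that scheme in C++ (my illustration, not the commenter's actual code): std::deque offers exactly the "stable container" property described, since push_back never moves existing elements, so plain pointers between expressions stay valid for the life of the program.

#include <deque>

// Hypothetical AST node for a toy compiler of this shape.
struct Expr {
    enum { Num, Add } kind;
    long value = 0;          // for Num
    Expr *lhs = nullptr;     // for Add; plain pointers, never freed individually
    Expr *rhs = nullptr;
};

// The "stable container": push_back on a deque never invalidates
// pointers/references to existing elements (only iterators), and the
// whole arena is destructed at program exit.
static std::deque<Expr> arena;

static Expr *new_expr(Expr e) {
    arena.push_back(e);
    return &arena.back();    // address stays valid until exit
}

int main() {
    Expr *two = new_expr({Expr::Num, 2});
    Expr *three = new_expr({Expr::Num, 3});
    Expr *sum = new_expr({Expr::Add, 0, two, three});
    return (sum->lhs->value + sum->rhs->value == 5) ? 0 : 1;
}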
>there are no bugs with pointer management despite having an awful lot of pointers.
It would be safe code if the compiler itself could prove that without human intervention.
New Binder code was sent, written in 146% organic Rust. It will be interesting to count unsafe blocks.
That's a very short read, that left me unsatisfied (is the IR before or after optimisation?); but looking at the intro, the same author's PhD thesis seems like it will at least address this.
Addressing UB in C