
FreeBSD considers Rust in the base system

By Joe Brockmeier
August 19, 2024

The FreeBSD Project is, for the second time this year, engaging in a long-running discussion about the possibility of including Rust in its base system. The sequel to the first discussion included some work by Alan Somers to show what it might look like to use Rust code in the base tree. Support for Rust code does not appear much closer to being included in FreeBSD's base system, but the conversation has been enlightening.

Base versus ports

Unlike Linux, the FreeBSD operating system kernel and user space are developed together as the base system, maintained in the FreeBSD source tree (often referred to as "src"). This means, for the purposes of discussing using Rust as a language for the FreeBSD kernel or other programs/utilities in the base system, the Rust toolchain would need to be present in base as well. Currently, the languages supported for FreeBSD's base system are assembly, C, C++, Lua, and shell scripts written for sh. In the distant past, Perl was also part of the base system, but was removed in 2002 prior to FreeBSD 5.0.

FreeBSD also has a ports collection for third-party software that is not maintained as part of FreeBSD itself. This includes everything from the Apache HTTP Server to Xwayland. Rust is already present in the ports system, as are many applications written in the Rust language. A search on FreshPorts, which lists new packages in the ports collection, turns up more than 500 ports written in Rust.

A dream of Rust

But Rust is not allowed in the base system. Somers noted this fact with disappointment in a discussion about a commit to fusefs tests in January. Enji Cooper asked Somers why he had not used smart pointers in the change, to which Somers said "it's because I'm not real great with C++". He added that he had stopped trying to improve his C++ skills in 2016, and had focused on Rust instead:

Even when I wrote these tests in 2019, I strongly considered using Rust instead of C++. In the end, the only thing that forced me to use C++ is because I wanted them to live in the base system, rather than in ports.

Somers said that he dreamed about the day when Rust is allowed in the base system, and mentioned several projects he would have done differently if it were allowed. Warner Losh replied that it would require some visible success stories for the FreeBSD community to consider inclusion in base. Rust, he said, "has a lot of logistical issues since it isn't quite supported in llvm out of the box". (By default, Rust uses its own fork of LLVM.) He suggested adding test cases in base that could be run by installing Rust from ports prior to building "to show it's possible and to raise awareness of rust's viability and to shake out the inevitable growing pains that this will necessarily [cause]".

Brooks Davis also suggested bringing in Rust code that would use an external toolchain rather than having the toolchain in base. "There are a bunch of bootstrapping and reproducibility issues to consider, but I think the fixes mostly lie outside the src tree."

The case for (and against) Rust

On January 20, Somers made his case to the freebsd-hackers mailing list on the costs and benefits of including Rust code in FreeBSD's base system, which opened the first lengthy discussion around Rust. He summarized the cost as "it would double our build times", but the benefit would be that "some tools would become easier to write, or even become possible". Losh reiterated his suggestion to start with adding better tests in Rust. That would allow the project to get an external toolchain working to learn if Rust "actually fits in and if we can keep up the infrastructure" or not.

Dimitry Andric wrote that it might be possible to build Rust using FreeBSD's base version of LLVM, but he said that the discussion was going in the wrong direction. The upstream build systems for LLVM and Rust require too many dependencies. Trying to support such tools "is getting more and more complicated all the time". He wanted to know why the project was spending time trying to build toolchain components in the base system at all. He argued, instead, that the focus should be on removing toolchains from the base system. "Let new toolchain components for base live in ports, please." In other words, software written in Rust could live in base, but its toolchain would stay in ports. That sentiment was shared by a number of other participants in the discussion.

Alexander Leidinger asked what kind of impact using Rust would have on backward compatibility. Would it be possible to compile FreeBSD release x.0 in two years on version x.2, for example, as it is with C/C++ code in base? Somers said the short answer is yes. The longer answer, he said, was that the Rust language has editions, loosely analogous to C++ standard revisions, that are released every three years. Compiler releases are capable of building the latest edition, and all previous editions:

So if we were to bring Rust code into the base system, we would probably want to settle on a single edition per stable branch. Then we would be able to compile it forever.
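
For illustration, an edition is a per-crate compiler setting rather than a separate toolchain; a newer compiler on a later branch keeps building older-edition code. A minimal sketch (the file name is hypothetical; the flag and Cargo.toml key are real):

    // hello.rs -- built with an explicitly pinned edition:
    //   rustc --edition 2021 hello.rs
    // (in a Cargo project: `edition = "2021"` in Cargo.toml)
    // A newer compiler still accepts this unchanged, because each
    // compiler release can build all previous editions.
    fn main() {
        println!("hello from a pinned edition");
    }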

Some participants were not convinced that code written in Rust should be allowed in base, even if its toolchain lived outside base. Cy Schubert complained that the language was still evolving quickly. Poul-Henning Kamp said that the pro-Rust argument simply boiled down to "'all the cool kids do it'". He argued that FreeBSD developers should "quietly and gradually look more and more to C++ for our 'advanced needs'":

I also propose, that next time somebody advocates for importing some "all the cool kids are doing it language" or other, we refuse to even look at their proposal, until they have proven their skill in, and dedication to, the language, by faithfully reimplementing cvsup in it, and documented how and why it is a better language for that, than Modula-3 was.

Bertrand Petit agreed with Kamp, and said that adding Rust to base should be avoided at all costs. However, he suggested that if using Rust needs something in base "such as kernel interfaces, linking facilities, etc." to work properly in the ports system, it should be provided in base.

After the discussion had simmered a bit, Somers replied with answers to some of the questions about Rust in base. He said that the comparisons of Rust to Perl were missing the mark. The crucial difference is that Rust is suitable for systems programming, while the others were not. "Rust isn't quite as low-level as C, but it's in about the same position as C++." To Kamp's assertion that developers should just use C++, Somers said that he was far more productive in Rust and his code had fewer bugs, too. He said he had used C++ professionally for 11 years, but was more skilled in Rust after six months. The problem, he said, was C++. "In general, it feels like C++ has a cumbersome mix of low-level and high-level features."

Out of the blue

Ultimately, the discussion trailed off in early February without any concrete plan of adopting Rust. The thread was re-awakened by Shawn Webb on July 31. Webb replied to Somers' original email with the news that the Defense Advanced Research Projects Agency (DARPA) is investigating a program to automate rewriting C code to Rust, called Translating All C to Rust (TRACTOR).

Losh said that he was still waiting for someone to take him up on the suggestion to do build-system integration for Rust tests. "Since the Rust advocates can't get even this basic step done for review, it's going to be impossible to have Rust in the base." Webb replied that he would be willing to find time in September to work on build system integration, if Losh was willing to mentor him, which Losh agreed to do.

Konstantin Belousov replied that it would be better to focus on what useful things could be implemented in Rust, rather than how to integrate code into the build. That caused Kamp to interject with a history lesson about the failures of importing Perl into FreeBSD's base system.

The choice to bring Perl into the base system, he said, was based on arguments identical to those being made for Rust. The project overlooked the fact that Perl was more than a programming language: it was an ecosystem with a "rapidly exploding number of Perl Modules". The goals of rewriting things in Perl went unrealized, he said, once developers realized that FreeBSD base only offered Perl the language and not Perl the ecosystem:

Having Perl in the tree was a net loss, and a big loss, because it created a version gap between "real Perl" and "freebsd Perl", a gap which kept growing larger over time as [enthusiasm] for Perl in the tree evaporated.

Adding Rust to FreeBSD will be the /exact/ same thing!

That left two options, Kamp said. The first would be a FreeBSD Rust intended only for use in base, without the benefit of the Rust ecosystem; the second would be to find a way to allow FreeBSD Rust to "play nice with both Rust in ports and the Rust ecosystem outside ports". He was pessimistic about the second option being possible at all; even if it were, "it is almost guaranteed to be a waste of our time and energy" and would revert to the first option in a few years.

A third option, he said, would be to work on a distribution of FreeBSD based on packages instead of the base/ports system it has now. That could allow FreeBSD to have the benefit of the Rust ecosystem, or Python, or C++, etc.

A demo

On August 4, Somers posted a link to a repository forked from the FreeBSD src tree with examples of new programs written from scratch, old programs rewritten in Rust with new features, and libraries written in Rust. Somers also noted several features that his demo did not include, such as kernel modules ("those are too hard"), integrating the Rust build system with Make, or cdylib libraries (those are Rust libraries intended to be linked into C/C++ programs). He invited anyone with questions about what it would look like to include Rust in base to examine his demo branch.
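
For readers unfamiliar with the term, here is a minimal sketch of a cdylib (the function name is made up): the crate compiles to a shared library exposing a plain C ABI that C/C++ programs can link against.

    // Built as a C-compatible shared library, for example with:
    //   rustc --crate-type=cdylib add.rs
    // (or `crate-type = ["cdylib"]` under [lib] in Cargo.toml)

    /// Exported with an unmangled symbol name and the C calling
    /// convention, so C or C++ code can declare and call it directly.
    #[no_mangle]
    pub extern "C" fn demo_add(a: i32, b: i32) -> i32 {
        a.wrapping_add(b)
    }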

Kamp replied that Somers's demo was awesome, but asked if it was worth the effort when compared to the idea of a package-based version of FreeBSD where Rust things could be built without all the extra effort. That sparked a back-and-forth about the difficulties of maintaining tests separately from fast-moving features in the kernel. Ultimately, Kamp said that he understood the problem: "I've been there myself with code I have maintained for customers." The problem of maintaining Rust code separately from kernel code only impacts a few dozen developers. "Adding Rust to src would inconvenience /everybody/, every time they do a 'make buildworld'." The solution, he said, is not to add Rust to src, but to "get rid of the 'Src is the holy ivory tower, everything else is barbarians' mentality" that has caused FreeBSD trouble over the years.

Into the black

Once again, the discussion trailed off without any firm resolution. No doubt the topic will come up again, perhaps later this year if Webb and Losh dig into Rust build-system integration. Rust may yet find its way into FreeBSD's base system, unless Kamp's vision of a package-based FreeBSD comes to pass and makes the distinction irrelevant.




Sigh, not seeing the forest…

Posted Aug 19, 2024 16:50 UTC (Mon) by iustin (subscriber, #102433) [Link] (20 responses)

This part is sad:

> … and said that adding Rust to base should be avoided at all costs.

I am not familiar with FreeBSD's build system at all, so I don't know what would make it so difficult to integrate Rust. But Linux can do it, and yes, all the "cool kids are doing it" indeed, but for the _right_ reasons. Rust is indeed a superb mix: an advanced programming language designed for low-level code.

I hope they change their mind. Considering more C++ instead of Rust? Sigh.

Sigh, not seeing the forest…

Posted Aug 19, 2024 22:09 UTC (Mon) by rsidd (subscriber, #2582) [Link] (2 responses)

Linux can do it, but Rust doesn't live in the Linux kernel source tree. The FreeBSD philosophy has been that everything required to build the base system lives in /src (it used to include GCC, and now includes LLVM), so if a Rust tool (even a userspace tool) is included in /src, the Rust toolchain must be, too. But that causes problems when you want a newer or more powerful Rust (a problem FreeBSD has faced before, with Perl in the base system). Some are suggesting that Rust can stay in ports while code requiring Rust can be merged in base. That would be a break from the past. Others (PHK) say they should work to make src smaller and move as much as possible to packages. This is how Linux works, and I think it has proven to be a good idea, but there will be resistance to that too.

Sigh, not seeing the forest…

Posted Aug 21, 2024 14:59 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (1 responses)

Is/was `/src` bootstrappable or did it still require an existing toolchain to get the /src-hosted toolchain going?

Sigh, not seeing the forest…

Posted Aug 22, 2024 11:25 UTC (Thu) by rsidd (subscriber, #2582) [Link]

As I remember from years ago, "make buildworld" uses your existing toolchain to bootstrap the new toolchain (i.e. build it, then rebuild it with the built version), and then it uses the new toolchain to build everything else.

Sigh, not seeing the forest…

Posted Aug 19, 2024 22:43 UTC (Mon) by jasone (subscriber, #2423) [Link] (16 responses)

Imagine your Linux-based operating system had its included C compiler dictated by whatever the Linux kernel developers were using to compile the kernel version in your system. And then same thing for whatever Rust compiler they were using. That's very close to what happens with FreeBSD. It's a high tax to pay all around.

Sigh, not seeing the forest…

Posted Aug 19, 2024 23:49 UTC (Mon) by khim (subscriber, #9252) [Link] (15 responses)

If they are doing things that way then adding Rust is, indeed, more-or-less useless.

Look at the rustc compatibility map: if you look at popular crates (ones that are actively maintained and used), more than half require Rust 1.76+. Rust 1.76 was released in February. This year, yes.

If you try to stick with some fixed version of Rust, you will find yourself limited to ancient versions of the most popular crates, and if you make it hard to bring crates into the codebase (most likely, if the article we are discussing is close to correct), then you quickly end up with a “FreeBSD Rust” that's entirely disconnected from “everyone else's” Rust.

People may tolerate that in the kernel (which is special enough that you probably don't want to bring in random crates just because someone thinks they are cool), but in userspace? This would be pure misery for everyone involved: every attempt to ask about anything on public forums outside of FreeBSD would be met with a shrug and a “well, you created these problems for yourselves, you can find a way to resolve them” answer 90% of the time.

Sigh, not seeing the forest…

Posted Aug 20, 2024 8:03 UTC (Tue) by gspr (guest, #91542) [Link] (14 responses)

> Look rustc compatibility map: if you look on popular crates (that are actively maintained and used) then more than half require Rust 1.76+. Rust 1.76 was released in February. This year, yes.

More than half of those require rustc 1.76+ *in order to build their latest version!* Roll those crates back a few versions, and you'll see a very different picture.

I do also find the Rust "use the latest version of every crate, or bust" mentality problematic, but I've had success doing Rust development on Debian stable with only Rust crates (and rustc) as shipped by Debian stable.

Sigh, not seeing the forest…

Posted Aug 20, 2024 8:39 UTC (Tue) by Vorpal (guest, #136011) [Link] (13 responses)

Why though? I argue that the LTS approach is actually wrong. In theory you get stability and fewer bugs. In practice, I have used Debian stable, Ubuntu LTS, and Arch Linux, and Arch has been the most stable of them all (with the few bugs I have encountered getting fixed promptly).

If you use Ubuntu and something breaks due to a backported fix you are out of luck until the next LTS comes around (unless you have a commercial support contract I guess, maybe?).

Arch Linux has also been the most stable when it comes to things like laptop suspend/resume working and not bugging out.

This suggests to me that LTS/stable releases don't actually work in practice. They end up less well tested than upstream, and thus buggier. Now, I'm not suggesting that you should run the absolute bleeding edge (on Debian, run testing, NOT unstable). And if you have something critical that depends on things working, you should have automated tests before you deploy. In fact, you should have automated tests anyway, or you risk breakage when updating from one stable version to the next.

Really, the only reason I see for staying on old versions (other than temporarily, until a specific bug is fixed) is if you legally need certification (e.g. a certified compiler for software in medical devices, automotive, etc.). And at that point you had better have really good automated testing as well. (And follow a bunch of relevant standards for how you code, etc. I work on safety-critical systems for my day job; it can be fairly involved.)

Sigh, not seeing the forest…

Posted Aug 20, 2024 8:54 UTC (Tue) by gspr (guest, #91542) [Link] (9 responses)

I guess we're deviating from the actual topic (what version of rustc is required for the most popular crates), but the digression is interesting too, so I'll answer in general terms:

I work in research. I need my computer for that. I do *like* maintaining computer systems (that is, for example why I participate in the Debian project as a DD). However, the older I get, the less time I have for it. So then, in order to be able to do my work, I need my equipment to be *predictable* when I start the day. It's not really relevant whether the latest and greatest breaks or not (my younger self definitely didn't experience much breakage at all when living like that), it's more about whether things change from under me. Can I start exactly where I left off yesterday, or do I need to adapt to the world having changed?

The world upending completely every two years or so feels like just about the right pace for me. A one year cycle would be OK too. The main point is that I've come to love and appreciate the stability(*) of classical, slow-cadence, distro releases (and all the software they ship). I use the word stability in a very Debian-y sense here: not stable as in does-not-crash, but rather stable as in does-not-change(-for-good-or-for-bad).

So I guess that makes me a weirdo who loves the hippest programming language in town *and* simultaneously the good old ways of classical distros. (I'm even a weirdo who'd love to see something like ABI stability and dynamic linking in Rust, but that of course faces many technical hurdles.)

Sigh, not seeing the forest…

Posted Aug 20, 2024 8:57 UTC (Tue) by gspr (guest, #91542) [Link]

I'll add: Of course one sometimes needs access to the latest and greatest. Especially in research. While I don't at all embrace the "containerize everything" fad, I do very much appreciate that container systems let me spin up isolated bleeding-edge environments on top of my stable base system whenever a research project relies on a version-released-yesterday neural network, or something like that. Best of all worlds.

Sigh, not seeing the forest…

Posted Aug 20, 2024 9:16 UTC (Tue) by khim (subscriber, #9252) [Link] (6 responses)

> The world upending completely every two years or so feels like just about the right pace for me. A one year cycle would be OK too.

But are you sure it's because that's how older people feel or because that's the only alternative available?

I know lots of older people who were given Chromebooks or Chromeboxes. And those devices update themselves every 6 weeks. Most of these users are pretty happy.

The trick is, of course, that they don't ever need to know or care that it updates itself.

> I use the word stability in a very Debian-y sense here: not stable as in does-not-crash, but rather stable as in does-not-change(-for-good-or-for-bad).

Yes. That's what elders violently assert they want but what they actually need is, most likely, stable as in does-not-crash.

Yearly Android and iOS releases invariably lead to more complaints than Chrome and ChromeOS releases every 6 weeks.

People tolerate a certain amount of breakage and a certain amount of instability. They are not robots. The trick is to keep that amount below a certain threshold. And not to provide LTS releases.

Of course it only works if you don't go and break things just because it's a new release and you feel you have the right to do that, but instead change things slowly and gradually.

But that actually happens pretty much automatically in the “use the latest version of every crate, or bust” world: major breakage is confined to major, semver-incompatible releases; minor releases need, at most, two or three lines of code changed (usually in places where you used something explicitly unsupported).

Sigh, not seeing the forest…

Posted Aug 20, 2024 10:45 UTC (Tue) by gspr (guest, #91542) [Link] (1 responses)

>> The world upending completely every two years or so feels like just about the right pace for me. A one year cycle would be OK too.

> But are you sure it's because that's how older people feel or because that's the only alternative available?

Yes. I'm the "older person" in my own anecdote. I'm not very old, but older than I was a decade ago when I very much enjoyed having the latest version of everything. I can't say for sure whether my change in attitude comes from age. But I guess that's not really all that important – what's important is the very real change in attitude.

Sigh, not seeing the forest…

Posted Aug 20, 2024 11:14 UTC (Tue) by khim (subscriber, #9252) [Link]

> I'm not very old, but older than I was a decade ago when I very much enjoyed having the latest version of everything.

So, from what I can see, you have never tested the approach that you are criticising.

Rust's approach is not to push disruptive changes on you, but to have a few versions available: one that is only getting fixes and is backward-compatible, and one that is getting new features.

You are expected to always pick the bugfixes but only upgrade to next major version when you have time.

In practice that's closer to Android model than to LTS or Debian testing.

Sigh, not seeing the forest…

Posted Aug 20, 2024 14:10 UTC (Tue) by Wol (subscriber, #4433) [Link] (3 responses)

> > I use the word stability in a very Debian-y sense here: not stable as in does-not-crash, but rather stable as in does-not-change(-for-good-or-for-bad).

> Yes. That's what elders violently assert they want but what they actually need is, most likely, stable as in does-not-crash.

Why can't they BOTH be important? What matters is stability, as in "if it worked yesterday, it has to work THE SAME WAY today".

And yes I DO LIVE THIS EVERY DAY!

As someone providing support to a disabled wife, and her elderly parents, my BIGGEST problem is change. That's why we upgraded directly from XP to Windows 8.1. My wife pretty much didn't touch 7 or 8.0, because *I* couldn't face the pain of upgrading her, until her XP computer became pretty much unusable because of its age ...

Cheers,
Wol

Sigh, not seeing the forest…

Posted Aug 20, 2024 14:45 UTC (Tue) by Vorpal (guest, #136011) [Link] (2 responses)

How painful was that upgrade once you actually did it though?

As for getting both, that would be nice. Doesn't seem to happen in practice though.

Sigh, not seeing the forest…

Posted Aug 20, 2024 20:28 UTC (Tue) by ringerc (subscriber, #3071) [Link]

Trouble is you don't know how painful the upgrade will be, and rollback is usually infeasible.

I disk image systems before upgrades but most people can't do that.

I have little time and can't afford to spend 2 days tracking down why my updated GPU driver causes the main app I use to overlay fuzzy squares on everything. If it works I minimize change until I have some contingency time in hand. Since home desktop and laptop systems are effectively all unique (hw+sw+config combo) there's always a chance of mess.

Since my last Ubuntu update, for example, I'm having some frustrating issues with both Firefox and with Electron-based apps experiencing periodic display glitches. No time to deep-dive on it at the moment, so I put up with it. With the update before that, my laptop display server started freezing when unplugged from the external display. The one before that, it would periodically fail to suspend or resume... and this is pretty well supported hardware on a widely used distro. Admittedly the nVidia GPU is a significant part of the problem, one I'm usually able to avoid but wasn't with this model, but still.

The Windows systems I have to maintain are even worse, because they aggressively force significant updates. They're infrequently used, so when I do boot them every few months they're often rendered pretty much unusable by immediately-forced updates...

Sigh, not seeing the forest…

Posted Aug 20, 2024 20:43 UTC (Tue) by Wol (subscriber, #4433) [Link]

> How painful was that upgrade once you actually did it though?

Pretty painful. Not as bad as 7 or 8.0 would have been, however, I don't think. Precisely because 8.1 brought back a lot of the XP look-n-feel.

Mind you, it's now degenerated to "supporting anything is painful", as the in-laws can no longer cope with technology full stop, and my wife struggles all the time. That's why change is so painful now - if things don't change at least she stands a fighting chance of remembering what to do, if it changes - no chance!

Cheers,
Wol

Sigh, not seeing the forest…

Posted Aug 20, 2024 10:21 UTC (Tue) by Vorpal (guest, #136011) [Link]

Fair enough; I would rather take a laptop that properly wakes from suspend-to-RAM every time (I have never seen that work reliably on Ubuntu or Debian; it is 99.9% reliable on Arch). I guess it all depends on what stability metrics you care about.

And even on Arch, most updates don't "upend the world". Yes, sometimes they do (KDE 6), but that is rare, because upstreams don't put out such major releases very often. I guess the question then is: do you want those 3-4 times per year (each small and confined to specific subsystems), or do you want to accumulate a bunch and get them all in one go?

Really though, I can't think of any major update in the last year except for KDE 6. Yes, there have been 2-3 updates where I hit minor bugs afterward (quickly reported and quickly fixed, and none of them a showstopper), and one where I got a fairly annoying bug (Bluetooth defaulted to off in the login manager; easy to fix in the config file once I figured that out, and on a laptop it is an inconvenience, not a showstopper).

LTS versus staying on latest

Posted Aug 20, 2024 9:58 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

I worked at a place where we transitioned from LTS to staying on latest while I was there; there were two impacts of this change:

  1. The bug count went down overall, but the specific bugs we were facing kept changing. This was acceptable for us, since we could work with upstream to fix the ones we cared about, and it didn't matter that the exact set of bugs varied with time, only that we had no important bugs.
  2. Upstream were much more willing and able to work with us on the bugs we cared about when we were using their latest code than they were with LTS code. As a corollary of this, we got much more support for fixing regressions, because upstream were much more willing to revert the improvement that caused a regression for us while they found a version that fixed the regression than they were when we came in via LTSes (where the attitude was often "well, if we simply revert while we find a fix, we cause a regression for this other user").

Based on this experience, I would say that LTSes are the right approach when it's critical that there are no regressions but you're happy to live with the existing set of bugs, while staying up to date is the right approach when you can handle a regression but are unhappy having to live with known bugs.

LTS versus staying on latest

Posted Aug 20, 2024 20:31 UTC (Tue) by ringerc (subscriber, #3071) [Link] (1 responses)

Right. LTS is about predictability more than runtime stability.

Especially when you are part of a chain of delivery it can really help.

LTS versus staying on latest

Posted Aug 20, 2024 20:43 UTC (Tue) by farnz (subscriber, #17727) [Link]

And specifically LTS is about having a known set of bugs; you may have a bug that means that one app overlays little fuzzy squares on everything, but it's consistently present, and you're not going to exchange that bug for one where the app crashes after 72 hours of continuous use. The idea is that you can live indefinitely with the bugs in the ".0" release of the LTS, but you cannot cope with the introduction of a new bug or with a regression.

Decouple the actions

Posted Aug 19, 2024 17:10 UTC (Mon) by jengelh (guest, #33263) [Link] (33 responses)

>DARPA is investigating a program to automate rewriting C code to Rust, called Translating All C to Rust (TRACTOR)

and then just translate the output back to C as a second step. That way, you get the fixes from the tool, without a forced language switch right now. (You can still switch to a higher-level language at a later point in time.)

Decouple the actions

Posted Aug 19, 2024 23:35 UTC (Mon) by khim (subscriber, #9252) [Link]

You would get nothing, most likely. DARPA is known to fund many projects with the understanding that 99% of them will fail, but if even one in a hundred succeeds, that would be a radical breakthrough that may change lots of things.

If there turns out to be anything to speak about later, then we can discuss what to do with it, but for now it's better to treat it as something destined to fail.

Decouple the actions

Posted Aug 20, 2024 5:26 UTC (Tue) by dvdeug (guest, #10998) [Link] (27 responses)

It is going to be extremely challenging to produce a useful tool that generates fast code. Compilers that use another language as an intermediate target are likely to be hacky when starting from C; a C compiler targeting Rust might virtually emulate the memory system of C inside Rust, adding any memory safety through an extra level of indirection. (Or it could just make everything unsafe Rust code, which allows using the Rust compiler but doesn't add any memory safety.)

A program that rewrites C in Rust is likely to reject perfectly correct C; "printf (string, x);" is legal and valid (given correct runtime contents of string), but cannot be converted to Rust without a printf function, and that would just be converting bad code to worse. It's not going to be able to magically trace tangled webs of pointers, so there's going to be extra indirection (killing speed), unsafe Rust code, or outright rejection of the code.
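
To make the point concrete, a short sketch (not from the thread): Rust's formatting macros insist on a compile-time format-string literal, so C's runtime-format-string idiom has no direct safe equivalent.

    fn main() {
        let string = String::from("x = {}"); // format known only at runtime
        let x = 42;
        // println!(string, x); // rejected: the format argument must be a
        //                      // string literal, not a runtime value
        println!("x = {}", x);  // fine: the literal is checked at compile time
        let _ = string;
    }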

Any way you cut it, a good tool that rewrote C into Rust would be useful for automating the majority of the conversion. But it's not going to turn unsafe C into safe Rust without some intervention.

Decouple the actions

Posted Aug 20, 2024 6:04 UTC (Tue) by roc (subscriber, #30627) [Link] (26 responses)

The goal of TRACTOR and similar efforts is to produce *idiomatic* Rust code, not some kind of C virtual machine written in Rust. If it can be done, it will need advanced technology like LLMs, formal verification, and stuff like that. It can't be a simple mechanical transformation.

Decouple the actions

Posted Aug 20, 2024 7:52 UTC (Tue) by taladar (subscriber, #68407) [Link] (25 responses)

LLMs can't even reliably write idiomatic, correct code from the easiest prompts; what makes you think they can do so from something as messy as C code?

Decouple the actions

Posted Aug 20, 2024 8:54 UTC (Tue) by khim (subscriber, #9252) [Link] (12 responses)

Because one thing that LLMs do pretty damn impressively well is human language translation.

Maybe not as good as best human translators but comparable to average.

And it's not entirely clear why translation of artificial programming languages should be radically worse.

Success is not guaranteed, but that's why it's DARPA project and not Google or Microsoft project.

P.S. If I understand correctly, the idea is to translate idiomatic C code into idiomatic Rust code, which may not be 100% correct. After that, humans would do some testing, fix stupid corner cases where the C code did something crazy (and the conversion to Rust exposed that), etc. The end result is, hopefully, better code. I'm skeptical, at this point, but wouldn't say it's 100% guaranteed to fail.

Decouple the actions

Posted Aug 20, 2024 14:39 UTC (Tue) by yeltsin (guest, #171611) [Link] (11 responses)

LLMs, at least all the publicly available ones I've tried, make terrible translators. The machine is much better at it nowadays than it was in the days of Magic Goody, but it still routinely produces inane sentences and "misunderstands" context (I'm using quotes because they don't have a real understanding of what they're working with even in the best case), and completely breaks down when faced with idioms and slang.

My friend is a translator who works for a local representative of a large Chinese company. They process massive amounts of technical documentation (originally written in Mandarin Chinese), and the only way to handle it is to rely on machine translation. They've picked the best solution they could find (it's something commercial; I haven't bothered to check which, but I can ask him if needed), and its output is unusable without heavy manual editing. He has sent me some examples over the years, before and after his edits, and the original "translation" is often impossible to understand for someone not trained in deciphering it. Truly impossible; it's not just a bad choice of words here and there.

That's what he's doing all day. It's still a helpful tool (or they wouldn't be using it) in the sense that it's better than nothing, but it has a looooooong way to go to get to the level of an *average* translator who can at least produce something coherent that gets the point across, even if it's not written in Shakespearean style.

But they're much better at English grammar than I am, that I'm ready to admit.

Decouple the actions

Posted Aug 20, 2024 14:54 UTC (Tue) by khim (subscriber, #9252) [Link] (10 responses)

> It's still a helpful tool (or they wouldn't be using it) in the sense that it's better than nothing, bit it has a looooooong way to get to get to the level of an *average* translator who can at least produce something coherent that gets the point across, even if it's not written in Shakespearean style.

Are you sure? I remember what documentation received from Chinese companies looked like before the advent of LLMs, and that description was true for the majority of it:

> the original "translation" is often impossible to understand for someone not trained in deciphering it.

I just had to accept the fact that this is what an average human translator produces.

P.S. I have no idea what causes that effect, BTW: I know that documentation translated from Russian or German was never that bad (in either direction), with humans or with LLMs, but something about Chinese just makes it impossible to translate adequately without someone who fully understands what the sentences actually mean; regular translators without knowledge of the specific technology always produced garbage, and LLMs still do.

Decouple the actions

Posted Aug 20, 2024 20:33 UTC (Tue) by Wol (subscriber, #4433) [Link] (9 responses)

> P.S. I have no idea what causes that effect, BTW: I know that documentation translated from Russian or German was never that bad (in both directions), both with humans and LLMs, but something in Chineese just makes it impossible to translate adequately without someone who fully understands what these sentences actually mean, regular translators without knowledge of specific technology always produced garbage and LLMs still do that.

It's presumably because English, German, and Russian are all Indo-European languages, and as such are all descended from a fairly recent common ancestor. I remember reading something about "four waves of languages", and Indo-European belongs to wave 4. I believe Hungarian and Finnish are wave 2 languages, and while their vocabulary is completely different, they share a similar grammatical structure.

I think Gaelic, Basque, Catalan might be wave 3. Where Chinese, Japanese etc fit I don't have a clue.

But the point is that the structure of European languages is similar, so it's mostly a case of translating the individual words, being aware of idiom, and converting a crude translation to a good one isn't that much work. Start translating into a language from a completely different group, and many concepts may become completely untranslatable. Language shapes your view of the world just as much as your view of the world shapes language. A good example is "bourgeois, bürgerlich, middle-class". Three words, three languages, the same basic concept, but each word is unique to its language, and while a naive translator might think they are the same, all three mean something rather different from one another. Indeed, I don't even know whether any of those has an exact translation into either of the other languages.

Imagine that in three closely related languages. Now extend that to massively more different languages ...

(I see that - slightly differently - all the time at work. I have a Polish, a Chinese, and an Indian colleague. Because these languages all differ in the sounds they use, I have difficulty hearing them clearly, and they have difficulty hearing me clearly.)

Cheers,
Wol

Decouple the actions

Posted Aug 20, 2024 22:26 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (1 responses)

It's not just individual vocabulary words, either. Chinese grammar does many things that are utterly alien to Indo-European languages, like verb stacking, aspect markers in lieu of tenses, the use of classifiers and absence of determiners in noun phrases, and a near-total lack of inflection (the latter is, oddly enough, almost a feature of English, but English does a little inflection here and there, mostly for verb conjugation and plurals). That's not even getting into the fact that "Chinese" is not one language, it is a whole family of (closely-related) languages, which all do things slightly differently from each other.

(For the curious: I found https://en.wikipedia.org/wiki/Chinese_grammar a helpful starting point, but I must admit that I don't know Chinese myself, so I have no idea how accurate it is.)

Decouple the actions

Posted Aug 21, 2024 0:21 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Yup. This is correct. One thing that tripped up many translation systems was the lack of grammatical tenses in Chinese. In English, every verb has to have a tense; it's unavoidable. Not so in Chinese: you have to get the tense (past, present, future) from the surrounding context.

Another thing that especially trips documentation writers is passive voice. It's rarely used in Chinese, unless talking about something serious ("he was hit by a car" type serious). A sentence like "once a job is processed" is difficult to translate word-for-word.

Decouple the actions

Posted Aug 21, 2024 6:46 UTC (Wed) by viro (subscriber, #7872) [Link] (3 responses)

Excuse me, but... what the hell have you been smoking? Gaelic is a Celtic language, which is a branch of Indo-European. Catalan is a direct descendant of Vulgar Latin. Which is to say, also I-E, Italic branch. It's about as far from Spanish as the language of Robbie Burns is from BBC English. Definitely _way_ closer than English and Dutch are to each other.

Decouple the actions

Posted Aug 21, 2024 10:46 UTC (Wed) by Wol (subscriber, #4433) [Link] (2 responses)

Fair enough - it's clear from what I wrote that I didn't actually know. Which is why I was deliberately - and clearly - vague.

And actually - from what you say - I suspect the language of Rabbie Burns may be *further* from BBC English than Catalan from Spanish (I was under the impression that Catalan might have been a pre-existing language swallowed up into Spain. Bit like Welsh and English). Rabbie spoke Scots (aka "the language of the Angles"), while BBC is English (aka "the language of the Saxons").

And where does Basque fit into all this?

Cheers,
Wol

Decouple the actions

Posted Aug 21, 2024 15:24 UTC (Wed) by anselm (subscriber, #2796) [Link]

Rabbie spoke Scots (aka "the language of the Angles"), while BBC is English (aka "the language of the Saxons").

Robert Burns is actually not a great example of Scots because in many cases he would tone down his Scots to be rather more like English (he had an audience to consider, after all). The Scots language, as distinct from other varieties of English in Britain, became its own thing only by the 15th century, when “Angles vs. Saxons” hadn't been an issue for close to a millennium or so.

And where does Basque fit into all this?

Basque is really an outlier because it is the only surviving language in Europe that is not somehow related to some other language. The general thinking is that early Basque developed before Indo-European languages (such as Celtic or Romance languages) reached the area. Basque has now assimilated various words from its neighbours but the grammar is still considerably different from Indo-European languages.

Decouple the actions

Posted Aug 21, 2024 16:38 UTC (Wed) by viro (subscriber, #7872) [Link]

Catalan is a language being swallowed up - it's just that it is closely related to the language that swallows it. So's Scots to English (and the divergence times are similar - 15th century or so). Basque is a very different beast - it's not I-E at all, but more to the point, its grammar is different enough to make things interesting. It's not about the common ancestry - we *know* that there have been many deep grammatical reworks among the I-E languages, to the point that reconstructing the grammar of their common ancestor is pretty much hopeless; Basque grammar has features outside the observed range for attested I-E languages. No idea how much headache that causes for automatic translation, though...

Decouple the actions - languages

Posted Aug 22, 2024 3:43 UTC (Thu) by kenmoffat (subscriber, #4807) [Link] (2 responses)

I had not come across this wave theory. But a quick look at Wikipedia suggests that if Indo-European languages are wave 4, then Catalan is definitely wave 4 (a derivative of vernacular Latin, like many other languages), and while Gaelic is derived from insular (Ireland, Britain) languages, it is still Indo-European, though possibly related to Iberian or Gallic tongues. But certainly Finnish is not Indo-European, so probably wave 2, and Basque (currently regarded as an isolate) is maybe wave 1 (as in "before the known movements of languages" - or should that be "wave zero"?).

But your point that all of these have similar sentence structures, whereas Chinese dialects/languages (which of the two they are is a political choice) are completely different, is very true. As is the difficulty of hearing correctly - e.g. in Han (or ex-Han languages such as Korean), sounds which we English speakers can distinguish, such as l, n, and r, are hard for speakers of those languages to differentiate, and I'm certain the corollary is true for certain of their sounds which we did not learn as children.

Decouple the actions - languages

Posted Aug 22, 2024 3:48 UTC (Thu) by kenmoffat (subscriber, #4807) [Link] (1 responses)

And I now see that other people have expressed this much better than I could. <sigh/>

Decouple the actions - languages

Posted Aug 23, 2024 10:05 UTC (Fri) by Wol (subscriber, #4433) [Link]

Well, you picked up on my sounds comment, which I don't think anyone else did ...

Cheers,
Wol

Decouple the actions

Posted Aug 20, 2024 9:01 UTC (Tue) by moltonel (guest, #45207) [Link] (9 responses)

An LLM on its own can't, but an LLM coupled with a verifier could. This is what Google did for its AI that answers international math olympiad questions.

Writing a "is this Rust code equivalent to that C code" verifier remains a challenge, but perhaps a bit more approachable.

Decouple the actions

Posted Aug 20, 2024 11:00 UTC (Tue) by khim (subscriber, #9252) [Link]

> Writing a "is this Rust code equivalent to that C code" verifier remains a challenge, but perhaps a bit more approachable.

We can't even answer the much simpler question “is this machine code equivalent to that C code”, because of all these “we code for the hardware” guys!

But then, if we are not aiming for 100% correctness but for 99% correctness, and assume a human would clean up the final version… that task may actually be feasible. Still skeptical, but we'll see how well it goes.

Decouple the actions

Posted Aug 20, 2024 13:50 UTC (Tue) by Vorpal (guest, #136011) [Link] (7 responses)

> Writing a "is this Rust code equivalent to that C code" verifier remains a challenge, but perhaps a bit more approachable.

That is in fact undecidable. It follows from Rice's theorem, which is a generalization of the halting problem.

You can of course get a yes/no/don't-know answer (that is how static analysis manages to work at all). Even so, it will be a very tricky problem.

Decouple the actions

Posted Aug 20, 2024 15:10 UTC (Tue) by moltonel (guest, #45207) [Link] (2 responses)

It is undecidable in the general case. But we're not giving the verifier a finished converted codebase, there's a back and forth between the verifier and the generator (some kind of Generative Adversarial Network). The LLM learns to only generate snippets that the verifier can prove are equivalent. Having enough verifiable transforms to convert a real C project is still a tall order, but it's not impossible.
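
A toy sketch of that loop, with every function and constant hypothetical: the generator proposes a candidate translation, and the verifier either accepts it or returns a counterexample that becomes a hint for the next attempt.

    const MAX_ATTEMPTS: usize = 8;

    // Stubs standing in for the genuinely hard parts (hypothetical):
    fn llm_generate(_c: &str, _hint: &str) -> String { String::new() }
    fn prove_equivalent(_c: &str, _rust: &str) -> Result<(), String> {
        Err("unproven".to_string())
    }

    // Generate-and-verify: only translations the prover accepts survive.
    fn translate(c_source: &str) -> Option<String> {
        let mut hint = String::new();
        for _ in 0..MAX_ATTEMPTS {
            let candidate = llm_generate(c_source, &hint);
            match prove_equivalent(c_source, &candidate) {
                Ok(()) => return Some(candidate),
                Err(counterexample) => hint = counterexample,
            }
        }
        None // give up; leave this unit for a human to port
    }

    fn main() {
        assert!(translate("int f(void) { return 0; }").is_none());
    }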

Decouple the actions

Posted Aug 20, 2024 15:38 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (1 responses)

One big practical problem here is that the prover is going to prefer the well-defined parts of the system, but there's a good chance the meat of the solution is in the parts that are poorly defined in C, where a prover just rejects them as unclear, making it impossible to say whether the Rust does the same thing.

Example: suppose we've got code which, if there are fewer than two payments, returns immediately and, if not, ignores the largest and smallest payments, finding the mean of the others. The verifier may conclude that this "mean finding" code, the actual meat of the solution, is poorly defined, because it's not sure the divisor in the mean calculation is non-zero. What if there were exactly two payments? So all this work, the stuff we were trying to do, is abandoned, and our Rust just answers the trivial "are there some payments?" question and leaves the actual work as TBD.
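
In Rust, that example might look like the sketch below (hypothetical function, not from the thread); the case the prover worries about is exactly two payments, where nothing remains after dropping the largest and smallest and the divisor would be zero.

    // Mean of the payments with the largest and smallest ignored.
    fn trimmed_mean(payments: &[f64]) -> Option<f64> {
        if payments.len() < 2 {
            return None; // the trivial early return
        }
        let remaining = payments.len() - 2; // drop largest and smallest
        if remaining == 0 {
            return None; // exactly two payments: the divisor would be zero
        }
        let sum: f64 = payments.iter().sum();
        let max = payments.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        let min = payments.iter().cloned().fold(f64::INFINITY, f64::min);
        Some((sum - max - min) / remaining as f64)
    }

    fn main() {
        assert_eq!(trimmed_mean(&[1.0, 10.0]), None); // the tricky case
        assert_eq!(trimmed_mean(&[1.0, 2.0, 10.0]), Some(2.0));
    }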

There are way too many narrow contracts in C. It's basically idiomatic in C to write functions with narrow contracts. In the stdlib these are spelled out (if anybody reads them), but in other people's code they're just silently narrow, and you get to find that out the hard way. I think a prover will just reject all these cases as maybe undefined.

Decouple the actions

Posted Aug 21, 2024 15:54 UTC (Wed) by mrugiero (guest, #153040) [Link]

Exactly. You don't need to solve the halting problem, you just need to cover a substantial enough area to work for most practical problems. It's the same with termination and the eBPF verifier, just restrict what your problem space to what you know you can solve or add some heuristic to abort in cases where you think maybe you'll never finish, etc.

Decouple the actions

Posted Aug 20, 2024 15:29 UTC (Tue) by roc (subscriber, #30627) [Link] (3 responses)

If the LLM produces Rust code *and* a formal proof that the Rust code is equivalent to the input C code, then it's very easily decidable whether the proof is valid.

That's a big "if" of course. No-one's saying this is easy or even actually feasible. This is research.

Decouple the actions

Posted Aug 20, 2024 16:02 UTC (Tue) by leromarinvit (subscriber, #56850) [Link] (2 responses)

If you can prove that some safe Rust program is equivalent to a given C program, that means the C program can't have any memory-safety issues to begin with. How many large C code bases (the ones you'd want to convert to Rust) are there for which this can be said with any reasonable certainty?

If you have such a (memory safety) bug-free C program, then yes, you could use the Rust code generated by such a hypothetical converter to make introducing bugs in the future harder. But if you want to convert to Rust to weed out existing, uncaught issues, doesn't demanding a formal proof of equivalence make the problem literally impossible to solve?

Decouple the actions

Posted Aug 21, 2024 11:52 UTC (Wed) by roc (subscriber, #30627) [Link] (1 responses)

> If you can prove that any safe Rust program is equivalent to a given C program, that means the C program can't have any memory safety issues to begin with. How many large C code bases (the ones you'd want to convert to Rust) are there for which this can be said with any reasonable certainty?

It's more nuanced than that because you can refine the notion of "equivalent". You could, for example, use "equivalent, assuming that the C program never triggers undefined behavior" (i.e. the notion of equivalence that C compilers use when applying optimizations).

Or you could analyze the complete system and prove that for all the cases that can happen in practice, some C function is equivalent to some Rust function, even though in a different context it might behave differently and do something unsafe.

Definition of "equivalent" for TRACTOR

Posted Aug 21, 2024 12:40 UTC (Wed) by farnz (subscriber, #17727) [Link]

And one of the reasons that TRACTOR is a research proposal, and not something that "just" needs implementation work, is that the definition of "equivalent" that DARPA are aiming for is "what a Rust programmer would write, given the same specification that caused a C programmer to write this C code".

So, assuming DARPA fund a research team to see where they get to from here, part of the project will be to define what "equivalent" actually means - there will be some things that are technically UB where TRACTOR is expected to work out that a human programmer would expect it to be meaningful (and what that meaning is), for example.

The TRACTOR project as a whole is not a small idea - it's a big and far-reaching project that's likely to come up with a negative result - "this can't be done usefully because" - but where if it does come up with a positive result - "here is a machine that gets you all the way from a typical C codebase to a high quality Rust codebase, ready for a Rust programmer to refactor" - it's a huge deal.

LLMs to translate C to Rust

Posted Aug 20, 2024 9:41 UTC (Tue) by farnz (subscriber, #17727) [Link] (1 responses)

LLMs originate in machine translation, and are very good in that space; as the prompt gets larger and more detailed, LLMs get better at producing idiomatic language in their outputs. From an LLM's perspective, "the easiest prompt" is a complete worked example of what acceptable output could be in one language; a short simple prompt is hard from an LLM's perspective. It happens that they can produce a plausible output from a small prompt, but that's a lucky accident, and not their core competency.

So, given this thing about the nature of LLMs, it's plausible that LLMs can't do "write me a version of Twenty Questions in Rust", but can do "given the following implementation of Twenty Questions in C, give me Rust code that's equivalent".

And TRACTOR is quite clearly a "big bet" project (like Silent Talk before it), where there's no expectation that the project will succeed, but if it does the pay-off is huge. In that sense, it's DARPA buying a lottery ticket; the chances of winning are tiny, but the impact of a win, if they get one, would be huge.

LLMs to translate C to Rust

Posted Aug 21, 2024 7:59 UTC (Wed) by taladar (subscriber, #68407) [Link]

My definition of "the easiest prompt" would be one that gives the LLM the information in exactly the way it needs, short of spelling out, byte for byte, what output you want it to produce. Honestly, even with the latter, most LLMs don't produce the correct output reliably.

Decouple the actions

Posted Aug 20, 2024 6:07 UTC (Tue) by roc (subscriber, #30627) [Link] (3 responses)

Translating idiomatic Rust code to C would give you horrible C code. It would actually be a language switch, to a language that is totally unfamiliar and no-one wants to touch.

Decouple the actions

Posted Aug 21, 2024 15:59 UTC (Wed) by mrugiero (guest, #153040) [Link] (2 responses)

It would be easier to do in a mechanical way, though. Pretty much all Rust semantics map to valid C semantics AFAICT; it's the other way around that doesn't hold. Mechanically converting something safe to something that allows but does not enforce safety is easier than converting something unsafe to something that does not allow unsafety.

Big exception would be async code, which would need to be hardcoded in the runtime first to properly be translated.

Decouple the actions

Posted Aug 22, 2024 9:56 UTC (Thu) by roc (subscriber, #30627) [Link] (1 responses)

It would have negative value because the resulting Rust code would be no safer than the C code, and much worse to maintain than the C code.

Decouple the actions

Posted Aug 22, 2024 11:09 UTC (Thu) by farnz (subscriber, #17727) [Link]

I'm missing something; the proposed direction is to start with hand-written Rust code, and to translate that to C, which can then be compiled by a compiler in FreeBSD's "base system". The Rust code is therefore likely to be more maintainable than the C code, because the Rust is written to be maintainable, while the C code is output by a translator that takes in Rust and produces C with the same semantics for all defined behaviours of the Rust code.

The usual issues with such a scheme are twofold:

  1. C doesn't actually define everything that a compiler needs to know about a platform; for example, what is the alignment of uint64_t? In the i386 ELF psABI, it's 4, while in the x86-64 ELF psABI, it's 8. This sort of thing matters when the compiler is expected to rearrange data structures for optimal access; if, in Rust, I define struct Foo { kind: u8, data: u64, user: u16 } then the compiler is free to reorder the fields to minimise padding (e.g. put kind at the end), whereas a C compiler is expected to lay them out in the order given and add more padding.
  2. Most use cases for this sort of translation are because you're working on a niche platform whose C compiler is not up to the standards of Clang or GCC, and you thus can't actually depend on it not having code generation bugs when you feed it standard C. As a result, the translation tool ends up having to be distorted heavily to create "safe" C for this compiler.

The second doesn't apply here, but if someone was actually interested in writing such a tool, writing a "C" backend for LLVM would probably be a good approach - and would expose all the problems of point 1 to the tool author.
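
Point 1 is easy to see in a small sketch (not from the thread; sizes assume the x86-64 ELF psABI, where u64 is 8-byte aligned):

    // Default representation: the compiler may reorder fields to cut
    // padding (typically 16 bytes here, though the layout is unspecified).
    struct Foo {
        kind: u8,
        data: u64,
        user: u16,
    }

    // C layout: fields stay in declared order, so padding goes after
    // `kind` and `user`, for 24 bytes in total.
    #[repr(C)]
    struct CFoo {
        kind: u8,
        data: u64,
        user: u16,
    }

    fn main() {
        println!("Foo:  {} bytes", std::mem::size_of::<Foo>());
        println!("CFoo: {} bytes", std::mem::size_of::<CFoo>());
    }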

editions

Posted Aug 19, 2024 18:47 UTC (Mon) by josh (subscriber, #17465) [Link] (10 responses)

It's not clear what the link was between new versions of rust building all old edition code and needing to select a single edition for all code in base. Nothing would go wrong if some code was still in an older edition; it would still compile.

editions

Posted Aug 19, 2024 20:30 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (8 responses)

Giving the benefit of the doubt, a single Edition does make it slightly simpler to teach Rust.

If you say all our code is Rust 2021 Edition, nobody needs to know anything that is gone in the 2021 edition, and yet they also needn't learn anything that's new only in the 2024 edition (but they do still need to learn things that are new in that period but aren't part of an edition; those are just new).

In principle somebody maintaining 2015 Edition code ought to know Box<Goose> in that code might mean what we'd write today as Box<dyn Goose>. You can't do that in a modern edition, but in 2015 that's how people would have spelled that type and they were used to it.

Likewise, your 2021 Edition code knows arrays are IntoIterator -- everybody (with a modern Rust compiler) can iterate over arrays of course, but 2018 or 2015 edition code pretends to have no idea why that works, so as to ensure compatibility with another old-timey thing people used to write back when arrays really weren't IntoIterator but references to the whole array as a slice were.
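
Both edition differences fit in a few lines of 2021-edition code (Goose is the hypothetical trait from above; the array shim is real):

    trait Goose { fn honk(&self); }

    // The 2015 edition accepted the bare `Box<Goose>`; later editions
    // require (and the 2021 edition enforces) the `dyn` keyword:
    fn make_noise(g: Box<dyn Goose>) {
        g.honk();
    }

    fn main() {
        // Arrays implement IntoIterator by value since Rust 1.53, but in
        // 2015/2018 edition code the method call `arr.into_iter()` still
        // resolves to iterating over `&arr` (yielding references) so old
        // code keeps compiling; the 2021 edition drops that shim.
        let arr = [1, 2, 3];
        for x in arr.into_iter() {
            let _: i32 = x; // by value under the 2021 edition
        }
        let _ = make_noise; // keep the example free of unused warnings
    }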

In practice what I'd expect is that FreeBSD would want a MSRV promise for this work. Maybe FreeBSD 14.1 promises Rust 1.79, while FreeBSD 14.0 says only Rust 1.70. People writing tools for such a system would judge whether they should be conservative (which might mean a lot more work) or not.

On the whole though, I see in this conversation something I see a lot in discussions with C++ programmers: they seem to imagine Rust's frequent release schedule means everything about the language is in flux, while at the same time considering C++'s lack of stability to be inconsequential. I notice that at no point does the discussion mention _which_ C++ language is in FreeBSD's base system. C++ 98? C++ 14? C++ 23? "Whatever Clang does -shrug-"?

C++ ships a new language version every three years. Although in practice these languages are very similar, as desired by the users, there is actually no formal promise of such commonality, and particularly on large, complex codebases, where people might value stability over a long span of time, C++ just doesn't always meet that expectation - it didn't promise to, so no promise was broken.

Herb Sutter for example is proud of adding the <=> "spaceship" operator to C++ 20. This operator serves a similar purpose to Rust's PartialOrd trait. You implement it once, you get all the comparison operators at once, they're guaranteed consistent, that's a win for new C++ software. But of course there *was* already C++ code out there which had weird definitions of existing comparison operators and some of that code "broke" in the new C++ language either because of Herb's feature or because of knock-on effects in the stdlib. Maybe the code was already technically wrong (most C++ is) but for its owners the situation is that C++ 20 broke it.

editions

Posted Aug 19, 2024 20:38 UTC (Mon) by josh (subscriber, #17465) [Link] (1 responses)

I definitely agree that using a single edition is easier for usability and learnability. It just didn't seem like that was the case being made; there seemed to be some impression that it was *needed* for compatibility.

And yes, it does seem like people are assuming Rust will have breakage on upgrades on par with other languages they have experience with, while not allowing for the possibility that a language can do better about compatibility.

editions

Posted Aug 20, 2024 0:17 UTC (Tue) by khim (subscriber, #9252) [Link]

> And yes, it does seem like people are assuming Rust will have breakage on upgrades on par with other languages they have experience with, while not allowing for the possibility that a language can do better about compatibility.

What they want is the exact opposite: picking one fixed (and thus, eventually, old) version of LLVM and Rust and using it for the duration of a given FreeBSD release.

And for some unfathomable reason they think Rust editions would save them. They wouldn't.

Rust just doesn't work that way.

editions

Posted Aug 20, 2024 0:13 UTC (Tue) by khim (subscriber, #9252) [Link]

> Giving the benefit of the doubt, a single Edition does make it slightly simpler to teach Rust.

Exactly. Slightly simpler. Not by much.

I strongly suspect the whole discussion is based on the assumption that Rust editions are like C++ versions: you commit to one version and use the features that are in there, and then, three years later, get another version with a bunch of new features.

Nothing could be further from the truth: 99% of features in Rust are added outside of any edition. GATs were added in Rust 1.65 and async fn in traits became available in Rust 1.75… these are major features that significantly change how you write code in Rust.

Editions are strictly for breaking changes and are usually limited to syntax sugar; all the major changes happen between editions.
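
To make the 1.75 example concrete, here's a minimal sketch (Fetch and Local are hypothetical names); it compiles under any edition, because the feature arrived with a compiler release, not an edition:

    // async fn in traits: stable since Rust 1.75, usable from any edition.
    trait Fetch {
        async fn fetch(&self) -> String;
    }

    struct Local;
    impl Fetch for Local {
        async fn fetch(&self) -> String { "data".to_owned() }
    }

    fn main() {
        // Constructing the future is enough to show this compiles;
        // actually polling it would need an executor.
        let _future = Local.fetch();
    }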

> I notice that a no point does the discussion mention _which_ C++ language is in FreeBSD's base system. C++ 98? C++ 14? C++ 23? "Whatever Clang does -shrug-" ?

It's because it doesn't matter as much. There are lots of codebases still using C++17 these days, and the ones using C++20 are considered “very advanced”, in spite of the fact that C++23 is already a thing.

This works fine with FreeBSD's approach: if people accept patches to keep supporting a five-year-old standard (and they do!) then FreeBSD may just use a “recent enough” C++ that's supported by whatever version of LLVM is in base, and that's enough.

This wouldn't work with Rust. Not at all. And editions wouldn't help.

> C++ ships new language versions every three years.

What's important is not how often new versions of C++ are shipped but how quickly third-party code stops supporting old versions.

In the C++ world it takes years for “fast” and “advanced” codebases to stop supporting old standards (LLVM itself still requires that all its code be compilable with C++17, e.g.); in Rust it takes months (same as with Perl).

editions

Posted Aug 21, 2024 16:05 UTC (Wed) by mrugiero (guest, #153040) [Link] (4 responses)

One caveat that I see is missing from the editions discussion is that for this kind of project where stability is the most valued asset you should stick to at most current-1: the current edition is technically live and, while always backwards compatible, it's not necessarily forward compatible, as extensions are allowed to some degree. The same is true for the stdlib. You may find yourself in the situation where you happily coded locally with Rust 1.80 and used a feature that 1.78 doesn't have, but base uses 1.78, so you broke the build, even though both used the same allowed edition. Luckily all prior editions are already frozen because new development happens in current.

editions

Posted Aug 22, 2024 8:21 UTC (Thu) by taladar (subscriber, #68407) [Link]

That is because Rust is not designed to use an old compiler. It is designed for the new compiler to always be able to handle your old code.

editions

Posted Aug 22, 2024 12:26 UTC (Thu) by khim (subscriber, #9252) [Link]

> One caveat that I see is missing from the editions discussion is that for this kind of project where stability is the most valued asset you should stick to at most current-1: the current edition is technically live and, while always backwards compatible, it's not necessarily forward compatible, as extensions are allowed to some degree.

That's true for all editions, including the very first, Rust 2015. One example: Rust 2015 got NLL in 2022, one year after Rust 2021 got it and four years after Rust 2018 got it — which means, ironically enough, that if you want to backport some code written for Rust 2015 in 2024, it may be easier to port it to Rust 2018 than to stay with Rust 2015, if you want to use an old compiler.

The only way to stay with an old version of the Rust compiler is to, essentially, fork everything and stay on your own little island.

> Luckily all prior editions are already frozen because new development happens in current.

I recommend you find whoever told you this blatant lie and kick them. All prior editions have to stay compatible with all prior code (with some small caveats), but they are not frozen and they are not meant to be frozen.

Edition compatibility over time

Posted Aug 22, 2024 12:33 UTC (Thu) by farnz (subscriber, #17727) [Link]

> One caveat that I see is missing from the editions discussion is that for this kind of project where stability is the most valued asset you should stick to at most current-1: the current edition is technically live and, while always backwards compatible, it's not necessarily forward compatible, as extensions are allowed to some degree

This is also not completely true; the old edition is also technically live: while it's not permitted to change the meaning of code that compiled under old compilers except across an edition boundary, it is permissible for a new compiler to accept code under an old edition that an old compiler rejected. So I can write Rust 2015 code with compiler 1.80 that will not be accepted by compiler 1.50, for example, because the new compiler accepts more code than the old one did (borrow checker changes, and changes to core and std, to name two examples).

editions

Posted Aug 22, 2024 13:12 UTC (Thu) by laarmen (subscriber, #63948) [Link]

> Luckily all prior editions are already frozen because new development happens in current.

The scenario you describe could very well still happen even using an older edition. Even if your Rust code uses the 2018 edition, the 1.80 compiler will happily let you use stdlib APIs introduced in the 1.80 dev cycle. Editions are about incompatible changes to the syntax, changes to the default imports, etc. They're not about freezing the language (in the wider sense) in time.
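
A concrete sketch, assuming a crate pinned to the 2018 edition: std::sync::LazyLock stabilized in Rust 1.80, so this builds with rustc 1.80 and fails with 1.79, edition notwithstanding:

    // Edition 2018 code, but the API below requires rustc >= 1.80.
    use std::sync::LazyLock;

    static GREETING: LazyLock<String> = LazyLock::new(|| "hello".to_owned());

    fn main() { println!("{}", *GREETING); }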

editions

Posted Aug 20, 2024 3:47 UTC (Tue) by himi (subscriber, #340) [Link]

My reading of this was that they wanted to set a *maximum* edition on Rust code that could go in the base - so that developers could target 2024 and anything older, but not 2027 (until they updated the requirements).

I suspect a minimum supported version would work better, given the way features stabilise in Rust currently - the only advantage of specifying an edition I can think of is that it might make it easier to integrate with other implementations like gcc-rs . . . though I guess we won't really know how that might work until we have actual practically useful alternative implementations.

Rust backwards compatibility

Posted Aug 19, 2024 20:37 UTC (Mon) by lukegb (subscriber, #106233) [Link] (18 responses)

Is it true that Rust editions solve the problem? My impression was that breakages can and do happen, and are not (yet, anyway) regarded by the Rust maintainers or Rust policy as important enough to fix before release, or even to warn about in one release before breaking things in the next - the most recent example being https://github.com/rust-lang/rust/issues/127343, which broke older versions of the time crate and thus anything depending on it. This requires active work from downstream maintainers to update their lockfiles to fix, and does not appear to be tied to editions.

Obviously the time crate regression itself is out of scope here if we're considering that FreeBSD base would not have any dependencies on crates from crates.io, but it's also worth noting that any FreeBSD base breakage that would have resulted from this would have been completely invisible to the Rust regression tooling because it's not on crates.io.

Rust backwards compatibility

Posted Aug 19, 2024 20:52 UTC (Mon) by josh (subscriber, #17465) [Link] (1 responses)

The time crate breakage was an unusually awful failure, and we're working on several process and language improvements to make sure it doesn't happen in the future. (Short version: a case where only one possible type could be the answer became a case where more than one could be, causing a type inference failure, and we're working on proactively catching and avoiding cases like that.)
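
For the curious, a sketch of that failure mode, assuming I have the 1.80 details right (it added FromIterator impls for Box<str>); this is illustrative, not the actual time code:

    fn main() {
        // Fully annotated: fine on any recent compiler.
        let chars: Box<[char]> = "abc".chars().collect();
        println!("{chars:?}");

        // The fragile pattern: Box<_> asks inference to pick the impl.
        // While Box<[char]> was the only FromIterator<char> candidate,
        // that worked; Rust 1.80 added impl FromIterator<char> for
        // Box<str>, so there are now two candidates and lines like the
        // following stopped compiling with a type-annotation error:
        //
        //     let b: Box<_> = "abc".chars().collect();
    }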

That said, one huge advantage of the FreeBSD way of doing things over most other software distributions is that they have all the code in one place and can patch all software in the same commit that upgrades tools.

Rust backwards compatibility

Posted Aug 21, 2024 9:46 UTC (Wed) by dev-ardi (guest, #172609) [Link]

> we

Out of curiosity, which of the many Joshes in the project are you?

Rust backwards compatibility

Posted Aug 19, 2024 21:09 UTC (Mon) by atnot (subscriber, #124910) [Link] (11 responses)

It's incredible how every time something regrettable happens in Rust, the same Palantir/"Big Macro" guy shows up and makes the worst possible decision, and yet he seemingly remains untouchable.

Rust backwards compatibility

Posted Aug 20, 2024 5:35 UTC (Tue) by roc (subscriber, #30627) [Link]

You mean dtolnay? Well, he is responsible for serde, which is one of the things I like most about Rust, so he deserves a lot of slack. But I agree with you, I've disagreed strongly with some of his recent decisions.

Rust backwards compatibility

Posted Aug 20, 2024 6:17 UTC (Tue) by ralfj (subscriber, #172874) [Link] (9 responses)

It's also incredible how many amazing things he's done for the Rust ecosystem. Just compare "Most downloaded" at https://crates.io/ with https://crates.io/users/dtolnay?sort=recent-downloads : the top 3 most downloaded crates are by him, and that doesn't even include serde!

He's doing a lot of work, and making some mistakes. If you only look at what happens in Rust when there's an outcry, of course you will only see his mistakes.

Rust backwards compatibility

Posted Aug 20, 2024 10:19 UTC (Tue) by khim (subscriber, #9252) [Link]

That's the trouble with extremely talented people: they are so accustomed to being right when their opponents are wrong that, in rare cases where they are actually wrong, it takes absolutely ridiculous effort to convince them that yes, this time they are wrong for real.

I guess one just has to accept that our shortcomings are a continuation of our strengths, and spend that effort when needed.

Rust backwards compatibility

Posted Aug 20, 2024 13:25 UTC (Tue) by atnot (subscriber, #124910) [Link] (7 responses)

Personally, when I look at that list, the thing I see is a bunch of macro hacks that would, in an ideal world, not exist in the first place, or at minimum not be treated as the perfect permanent solutions they are. Much effort goes into improving compile times too, but these giant macro crates, which account for a large fraction of clean build times, are untouchable.

It wouldn't be this harsh if this was completely unrelated to him and out of his control. Just the best we could do. But that's not the case, because what happened instead is that he deliberately, secretly sabotaged the addition of reflection to the language out of an unknown mixture of racism and powertripping at the thought of his kingdom of macros becoming slightly less relevant. And then spent more than half a year having other, lovely people with more integrity than him (some of which I consider friends) take the fall for him.

So instead of a great feature which everyone could use right there in the language and would, not coincidentally, make a lot of these things so trivial to implement they wouldn't even need to be a library, we get this. And all everyone ever says you can't criticize the guy because he's selling the "top 10 crates" cure to the situation he created.

If this sounds bitter, yeah. Correct.

Rust backwards compatibility

Posted Aug 20, 2024 14:16 UTC (Tue) by intelfx (subscriber, #130118) [Link] (6 responses)

> what happened instead is that he deliberately, secretly sabotaged the addition of reflection to the language out of an unknown mixture of racism and powertripping at the thought of his kingdom of macros becoming slightly less relevant

Sounds spicy. Can I read about this somewhere in more detail?

Rust backwards compatibility

Posted Aug 20, 2024 16:01 UTC (Tue) by khim (subscriber, #9252) [Link] (5 responses)

I think this is the best you may find. The question of who exactly did what and when to whom is still not answered (and I'm not sure it can be answered, at this point).

Rust backwards compatibility

Posted Aug 20, 2024 16:56 UTC (Tue) by intelfx (subscriber, #130118) [Link] (2 responses)

The RustConf 2023 keynote mess and "sabotaging the addition of reflection" is the same thing?

I must be missing additional context.

Rust backwards compatibility

Posted Aug 20, 2024 17:16 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

The keynote was about ongoing work on the way to add reflection to Rust. And that work was stopped after the keynote fiasco.

Whether or not that was deliberate sabotage, the end result is that no one works on reflection in Rust anymore.

Rust backwards compatibility

Posted Aug 20, 2024 17:18 UTC (Tue) by intelfx (subscriber, #130118) [Link]

With the help of your and atnot's replies, I've finally connected the dots.

That's a damn shame.

Rust backwards compatibility

Posted Aug 20, 2024 16:58 UTC (Tue) by atnot (subscriber, #124910) [Link] (1 responses)

Worth noting that this is from before the revelation of the degree to which dtolnay was involved, and his disastrous response. It also spends a lot of time dissecting things that are not relevant to anyone affected, like the exact mechanisms of how it happened and who said what, when, and with what motives.

Here's my TL;DR of the timeline:
- thephd et al. get strongly encouraged by a wide array of people to work on reflection
- They receive a sudden reversal of some decision out of the blue and nobody wants to be responsible for it or tell them what it's about. They smell a rat and demand a technical explanation of why the work was not considered up to scratch.
- Receiving no such explanation they nope out, correctly detecting the telltale signs of someone pulling strings against them behind the scenes (a thing any black person living in the US will be keenly attuned to)
- Big drama, everyone blames someone else, it was nobody's fault, it was actually legitimate concern, etc. (fasterthanlime post happens here)
- Lots of people step down from various roles as a result of letting this happen and/or protecting the person who did it, which nobody names.
- Much later, word gets out that dtolnay was in fact pulling strings behind the scenes and just let everyone take the fall for him. He responds to this with a bizarre github gist full of verifiable bs. And then reaches out to thephd asking for how to resolve this. They, once again, demand a technical critique of their work.
- Dtolnay can't offer it. thephd reaffirms that they will not return to Rust until such a technical critique is made. This has not happened even after david's name was known, evidencing thephd's hunch that such a critique never existed. (https://cohost.org/ThePhD/post/7169013-weird-question)

And that's where things remain today.

(But if you have way, way, too much time on your hands here's the most recent thing I know of that attempts to summarize what happened: https://dragon.style/@pyrex/111005018693053136)

Rust backwards compatibility

Posted Aug 25, 2024 23:09 UTC (Sun) by marcH (subscriber, #57642) [Link]

> (But if you have way, way, too much time on your hands here's the most recent thing I know of that attempts to summarize what happened: https://dragon.style/@pyrex/111005018693053136)

Summary of the summary:

1. People pull strings to win. Sad, but unfortunately business as usual. Even the best language in the world is still affected by politics and non-technical issues.

2. The person who pulled the strings was racist because... the "victim"[*] is black and the evil guy is white. Wow. If you found anything tangible that I missed then please correct me. If not, that's sheer and obvious defamation.

Ironically, the racism accusation comes after a truckload of other, more substantiated accusations that are ten times more than enough to explain what is supposed to have happened.

There is nothing tangible to prove that the guy is _not_ racist either. I naively assumed that in such a void, everyone had a right not to be called racist but maybe I don't spend enough time on this new social media thing.

[*] quotes because that term is normally used for bigger life problems than being rejected from Rust.

Rust backwards compatibility

Posted Aug 19, 2024 21:57 UTC (Mon) by roc (subscriber, #30627) [Link] (3 responses)

The libs team really dropped the ball there IMHO.

In the past I don't think this kind of regression would have been tolerated. Maybe something has changed.

Rust backwards compatibility

Posted Aug 19, 2024 22:17 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (2 responses)

As far as I can tell, this breakage was technically permitted under RFC 1105: https://rust-lang.github.io/rfcs/1105-api-evolution.html

Note that "technically permitted" is not necessarily the same thing as "a good idea in practice." The RFC itself warns about this distinction. What appears to be happening now is that the Rust folks are reassessing whether the RFC is a good enough standard, or if there should be more elaborate rules for cases like this. Which is the normal and healthy thing you would expect to see a language's developers do in a situation like this.

Tangent: IMHO the existence of documents like this also demonstrates why SemVer's use of MUST instead of SHOULD was misguided. If you purport to forbid doing something that is not actually required for your standard to "work," then people are just going to introduce more definitions to explain why they're not really doing the thing you thought you prohibited. SemVer's operative MUSTs (i.e. the ones that people actually care about, not the obvious "don't use irregular numbering that makes no sense" MUSTs) are de facto SHOULDs in the context of Rust, and in practice, I suspect that most other projects which purport to follow SemVer also treat those MUSTs as very strong SHOULDs. But at least Rust is honest about doing so (and clearly defines the boundaries of doing so).

Rust backwards compatibility

Posted Aug 21, 2024 16:28 UTC (Wed) by mrugiero (guest, #153040) [Link] (1 responses)

Worse than MUSTs being SHOULDs is that effectively most of the ecosystem is zerover[0], which of course is a joke name for the phenomenon of never releasing 1.0 to avoid having to comply with semver, which in turn makes most of the ecosystem unusable for anything long-term.

Rust backwards compatibility

Posted Aug 22, 2024 12:45 UTC (Thu) by khim (subscriber, #9252) [Link]

The real funny thing is that not releasing version 1.0 in the Rust world doesn't free you from the confines of SemVer: cargo just assumes that if the major version of some crate is zero, then the next number is the actual major version.

That means you can rely on version 0.5.13 of SQLx being compatible with version 0.5.0 of SQLx, but version 0.8.0 of SQLx may require changes to your code.
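
In Cargo.toml terms, a sketch (SQLx used only because it's the example above):

    [dependencies]
    # Caret semantics treat the first non-zero component as the de-facto
    # major version: "0.5" means ">=0.5.0, <0.6.0", so 0.5.13 is an
    # acceptable upgrade but 0.8.0 is not.
    sqlx = "0.5"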

It's almost like the difference between two major versions; in fact, on a technical level it's exactly the same. The difference lies in the social contract: crates with a major version larger than zero usually promise to support a given version for some time, while crates with major version zero usually ask you to change your code if you want bugfixes. IOW: if you use an old version of a zero-version crate then your code won't suddenly break when a new version is released, but if you want to file a bug then you need to upgrade to the latest version first.

That's closer to the difference between normal and LTS kernels than to anarchy without any SemVer support.

Memory safety is still considered nice-to-have?

Posted Aug 19, 2024 20:41 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (20 responses)

> Poul-Henning Kamp said that the pro-Rust argument simply boiled down to "'all the cool kids do it'".

I must admit that I'm shocked people are still talking about Rust in this way in 2024. Memory safety is not OOP, or any of the other fads that have pervaded the software engineering landscape from time to time. Memory safety is mandatory. You have to care about it, or your language has to care about it for you. There is no other way. Rust provides a unique zero-cost abstraction which enables the vast majority of a program to run at C++-like speeds without needing to care about memory safety at all. That is the argument for using it: It provides a specific, tangible benefit that is not available in most (if not all) other programming languages, and which is useful to every nontrivial program that allocates memory. Rust has a number of other, less interesting properties that also make it a good language, but borrow checking is the reason that Rust is special, and not just another "C but without all the warts" language.

There is certainly room to argue over exactly how *strong* an argument in favor of Rust that is, and how it applies to FreeBSD's circumstances in particular. But reading the whole of Kamp's email, it does not sound as if he is seriously engaging with that argument in the first place.

As for the suggestion that people be asked to reimplement some large piece of software before being taken seriously: I really want to believe that everyone in these discussions is arguing in good faith. I do not want to believe that Kamp is seriously advocating for an institutionalized hazing ritual before people may be taken seriously in this space. But I really struggle to see what kind of useful data he would expect to get out of such an effort, or in what other way it would be useful to require people to do such a significant amount of work before writing a proposal.

It makes some sense to ask to see an implementation of something useful that is not the language's own stdlib or compiler, in order to better understand how the language looks in practice (and how much it depends on external libraries, an entirely valid point that Kamp also brings up). But you don't then need to demand that the person making the proposal also be the person who authored said implementation - the benefit of that is really marginal at best, and it's severely limiting. So I'm really at a loss for what Kamp is getting at or why he thinks this is a good idea.

Memory safety is still considered nice-to-have?

Posted Aug 19, 2024 23:15 UTC (Mon) by jlarocco (subscriber, #168049) [Link] (17 responses)

> Memory safety is mandatory.

I agree, but you seem to have missed the part where the dev in the article decided not to use C++ smart pointers "because I'm not real great with C++".

It's absurd to advocate for Rust, while at the same time actively avoiding memory safety features in C++ to support the use of Rust. There seems to be a double standard.

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 0:38 UTC (Tue) by khim (subscriber, #9252) [Link] (11 responses)

> There seems to be a double standard.

I don't see how.

> It's absurd to advocate for Rust, while at the same time actively avoiding memory safety features in C++ to support the use of Rust.

What do you call “memory safety features in C++”? ASAN or MSAN? I hope Somers used them.

But smart pointers? They are not memory safety features! They are convenience features! You can have both dangling pointers and double frees even with unique_ptr (the closest thing C++ has to a “safety feature”), and string_view or span make these bugs easier to hit than raw C-style pointers do! And don't even start on optional and visit!

C++ doesn't really have any safety features, only convenience features with a “you are holding it wrong” refrain, which require extensive knowledge about how temporary variables work, how things like mandatory RVO work, and so on, to be able to use them without causing a complete meltdown of your program.

And I'm speaking as someone who used C++ for more than 20 years and is still actively using it.

For someone who stopped actively following it years ago the “let me stick to something I know 100%” approach is not a bad choice.

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 4:26 UTC (Tue) by jlarocco (subscriber, #168049) [Link] (10 responses)

> > There seems to be a double standard.

> I don't see how.

I'm assuming Rust code that went out of the way to use "unsafe" would be called out in review and not allowed. So why allow C++ code using "raw" pointers?

> But smart pointers? They are not memory safety features! They are convenience features! You can have both dangling pointers and double free even with unique_ptr (the closest thing C++ have to “safety feature”) and string_view or span make these easier than raw C-style pointers! And don't even start on optional and visit!

I took "smart pointer" to mean std::shared_ptr, which is nothing like unique_ptr, and much more like Rust's borrow checker.

> For someone who stopped actively following it years ago the “let me stick to something I know 100%” approach is not a bad choice.

If that's what they want to do good for them, but that doesn't mean they get to commit crap code.

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 6:11 UTC (Tue) by roc (subscriber, #30627) [Link]

> So why allow C++ code using "raw" pointers?

Because "this" is a raw pointer in C++. So are all C++ references, effectively --- there is nothing to stop the referenced object going away, leaving behind a dangling reference.

Another big use of raw pointers or references is out-parameters to functions. No-one's going to use smart pointers there.

The dialect of C++ that doesn't use "this" or references anywhere would be a very strange one indeed. No-one does that.

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 8:58 UTC (Tue) by excors (subscriber, #95769) [Link] (2 responses)

> I took "smart pointer" to mean std::shared_ptr, which is nothing like unique_ptr, and much more like Rust's borrow checker.

I think it's the other way round. std::shared_ptr<T> is Rust's Arc<T>: thread-safe reference counting, with significant run-time overhead. std::unique_ptr<T> is a crude version of Rust's T: single ownership with move semantics, almost zero run-time overhead, and you can borrow a non-owning reference to the owned object (T& or const T& in C++, &mut T or &T in Rust). (But std::unique_ptr is worse because it only supports heap-allocated objects, not stack-allocated; and it has a little more run-time overhead, checking for nullptr to see if it still owns the object, whereas Rust tracks ownership at compile-time; and unique_ptr gives lots of opportunities to violate memory safety, though it's still much safer than raw pointers.)

std::shared_ptr and std::unique_ptr are both typically called smart pointers, including by the C++ specification itself (section 20.3). And the context of the FreeBSD post is code where objects are allocated and freed within a single function, so std::unique_ptr would be much more appropriate (or maybe std::vector since they're mostly arrays).

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 18:09 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (1 responses)

I wouldn't characterize unique_ptr as equivalent to T. It is closer to Box<Option<T>>, but it implicitly calls Option::unwrap_unchecked() when you dereference it. Which would be horrible by Rust standards, but is very meh by C++ standards.

Memory safety is still considered nice-to-have?

Posted Aug 21, 2024 11:13 UTC (Wed) by excors (subscriber, #95769) [Link]

Technically that's correct, but I meant (but didn't really say) that idiomatically, I think you'd typically use std::unique_ptr<T> in the same places you'd simply use T in Rust.

E.g. in C++ you may avoid using a raw T because it has an expensive copy constructor and no move constructor, or because you suspect it might and its documentation doesn't provide a clear answer, or you're writing generic code so you can't know in advance. It's safer to wrap everything in pointers to avoid surprising performance issues, whereas in Rust you know it's always possible and cheap to move any T. In particular I think it's fairly common to use std::vector<std::unique_ptr<T>> to ensure there's no unwanted copying when the vector grows, whereas in Rust you'd just use Vec<T>.

std::unique_ptr<T> can be used in things like the pimpl idiom, i.e. a public class's member variables where the definition of T is private and should not be accessible to users of the class. You don't need to do that in Rust, since a public struct can have fields with private types and the visibility rules keep the implementation hidden. (That does mean the private type becomes part of the public ABI, but Rust doesn't provide ABI stability anyway.)

std::unique_ptr<T> can be used for members with delayed initialisation, where you don't want to (or can't) call the default constructor of T, so you leave the unique_ptr empty until it's initialised later. In Rust you can use Option<T> instead, which admittedly isn't T but it's much closer to T than to Box<T>, since there's no heap allocation.

std::unique_ptr<T> can be used for local variables when T is too large to comfortably go on the stack. In Rust, tough luck - when you create a new Box<T>, the compiler's probably going to construct a temporary T on the stack before memcpying it into the heap. You'll need to make sure your environment has large stacks, and then you might as well just keep T on the stack and skip the boxing. (And avoid defining large structs; e.g. never use large statically-sized arrays, use Vec instead.)

There are cases where you'd still want a singly-owned Box<T>, like polymorphic types (Box<dyn Trait>) or recursive types (binary trees etc), but I suspect that's rare compared to all the other uses for std::unique_ptr that shouldn't be translated into a Box.

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 9:47 UTC (Tue) by khim (subscriber, #9252) [Link]

> I took "smart pointer" to mean std::shared_ptr, which is nothing like unique_ptr, and much more like Rust's borrow checker.

Nope. You either have no idea how Rust works or are mixing up Rust and Swift.

Swift does automatic reference counting, Rust doesn't. Rust's references are much closer to std::unique_ptr than to anything else, in the sense that Rust guarantees that when you are modifying something it's safe to do so: all other observers that may be affected by your changes are quiescent.

std::unique_ptr does more-or-less the same thing, but with a bad twist: if you do something wrong you don't get a compile-time error, you get a runtime error — and only if you are lucky. That's why it's a convenience feature and not a memory safety feature!

> If that's what they want to do good for them, but that doesn't mean they get to commit crap code.

Sure, but chances are high that if someone uses things they know and understand, the end result will be better than if they used something they don't really understand.

The situation with Rust is radically different: an attempt to abuse something leads to compile-time errors (there are certain soundness holes where you can actually write code that the compiler accepts and that is not correct, but it's very hard to hit these by accident… that's why Rust's features are explicitly safety features but very explicitly not security features; that's something else again).

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 12:37 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (4 responses)

There are several comments here attempting to map C++ std::unique_ptr and std::shared_ptr to Rust and I don't think any of them really get it right.

std::unique_ptr<Goose> is Box<Goose>. This Goose lives on the heap somewhere, when we're done with it we need to destroy that heap allocation as well as whatever (if anything) should happen to the Goose. Both these types provide mechanisms to get the actual raw pointer out, and to convert a raw pointer back into the type, they're very parallel. It's easier to see what's going on when you Box::leak(a_goose) but the same API exists for std::unique_ptr just with a less explanatory name.
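
A sketch of that parallel from the Rust side, with Box::into_raw/from_raw playing the role of unique_ptr's release() and raw-pointer constructor:

    fn main() {
        let goose = Box::new(String::from("honk"));
        // Ownership leaves the smart pointer and travels via a raw
        // pointer, just like unique_ptr::release()...
        let raw: *mut String = Box::into_raw(goose);
        // ...and is reclaimed (along with the duty to free) the way a
        // unique_ptr can be constructed from a raw pointer.
        let goose = unsafe { Box::from_raw(raw) };
        println!("{goose}");
    }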

std::shared_ptr<Goose> is Arc<Goose> or in a few cases on some toolchains, magically Rc<Goose> instead -- hope the tooling was correct if that happened. There's a Goose, on the heap, and when we "duplicate" it we just get the same Goose again but we're counting how many distinct references to it exist, and once there aren't any we can get rid of the Goose. Again the APIs are parallel, std::weak_ptr is Weak<T> the most substantial difference is that C++ provides a mechanism to decide explicitly whether the control block is part of the same memory allocation. If it is, even though the std::shared_ptr<Goose> is "gone" we can't free the memory because our control block is in there, and that's needed for any remaining std::weak_ptr, but on the other hand if it isn't then we're doing a separate allocation for an item and its associated control block.

It is a problem that C++ programmers see "more modern" as safer, when it maybe isn't. For example Arc<Goose> doesn't allow multiple mutable references, but std::shared_ptr<Goose> is fine with that, and of course although programmers in C++ know they're supposed to never make mistakes they're only human, two references to the same Goose might get modified without a happens-before and now we've got a data race, it's Undefined Behaviour.
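
A minimal sketch of that last difference:

    use std::sync::Arc;

    fn main() {
        let goose = Arc::new(String::from("honk"));
        let also_goose = Arc::clone(&goose);
        // Shared reads are fine; mutation through an Arc is refused at
        // compile time unless you prove uniqueness (Arc::get_mut) or add
        // a lock, so two handles can't race on the same Goose:
        // also_goose.push_str("!"); // error: cannot borrow as mutable
        assert_eq!(*goose, *also_goose);
    }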

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 12:57 UTC (Tue) by khim (subscriber, #9252) [Link] (2 responses)

> It is a problem that C++ programmers see "more modern" as safer, when it maybe isn't.

Modern C++ absolutely is safer, but the big problem is that modern C++ is not a collection of modern features. Instead of being something in the language, it's more something outside of the language: the Core Guidelines, or a collection of Abseil tips, or things like that.

With some support from the language, sure, but mostly documents for humans.

And they are pretty big.

Learning them may improve the robustness of your code, but blindly attempting to use the tools designed to support these guides may easily lead to the opposite.

That's the issue: unlike Rust tools, which are designed to work with ignorant users, C++ tools are very much not designed for that.

Rust tools wouldn't stand up to malicious intent (that's why they are not security tools, but safety tools), but they handle ignorance just fine. C++ doesn't handle it well.

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 15:59 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (1 responses)

For example, std::span and std::string_view are modern. But, unlike Rust's &[T] and &str, they are of course very dangerous: in Rust these are borrows, and so they're checked, while in C++ they are not checked.

You could argue they aren't _less_ safe, but even where that's true, they aren't _more_ safe; they're just more modern.

For string_view in particular we know people took code that was wasteful but correct (copying strings needlessly) and converted it to code that's fragile or outright wrong through use of string_views whose underlying string might vanish while they're in use.
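
A sketch of what the checking buys you in the Rust spelling of the same idea:

    fn main() {
        let owned = String::from("temporary");
        let view: &str = &owned; // a borrow, not an unchecked view
        println!("{view}");      // fine: owned is still alive here
        drop(owned);
        // Uncommenting the next line keeps the borrow live across the
        // drop, and the compiler rejects the whole thing (error[E0505]:
        // cannot move out of `owned` because it is borrowed); the
        // string_view equivalent would just silently dangle.
        // println!("{view}");
    }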

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 16:22 UTC (Tue) by khim (subscriber, #9252) [Link]

> You could argue they aren't _less_ safe, but even where that's true, they aren't _more_ safe, they're just more modern

No. They are safer and faster. They combine two previously unrelated parts (the pointer and the length) and they don't rely on zero-termination. These things already give you more safety than the C counterparts.

They are not memory safe, true, but they are safer.

The danger is, of course, in thinking: oh, these are modern facilities, surely they should make everything memory-safe if not abused? And no, they don't give you that kind of safety.

> For string_view in particular we know people took code that was wasteful but correct (copying strings needlessly) and converted it to code that's fragile or outright wrong through use of string_views whose underlying string might vanish while they're in use.

Yes. But if you tried to convert that code into a pile of raw pointers, chances are you would screw everything up even more badly.

I would say the problem is not that “Modern C++” isn't safer than “old C++” or C, but that it's sold as if its improvements in safety were comparable to what Rust offers, when in reality they are marginal at best.

Memory safety is still considered nice-to-have?

Posted Aug 21, 2024 3:12 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

Mentioned in another comment, repeating+expanding here for visibility and clarification:

C++ smart pointers are not directly analogous to Rust smart pointers, because C++ smart pointers have an empty state, and Rust smart pointers don't. This is required for C++ move semantics* to work. So all C++ smart pointers are de facto equivalent to Rust Ptr<Option<T>> (or perhaps Option<Ptr<T>>) rather than Ptr<T> (for the appropriate value of Ptr), but because C++ is unsafe, they all implicitly call unwrap_unchecked() (or as_ref().unwrap_unchecked(), etc.) every time you dereference them.

* In C++, moving an object has no effect on the original object's lifetime. A move constructor is like a copy constructor, except that it is allowed/expected to take ownership of the original object's resources. The original continues to exist until whenever it would otherwise be destroyed, and then its destructor is run if it has one. Because we don't want unique_ptr to double-free every time it is moved, we have to put it in an empty state where it won't free anything. In Rust, moving an object causes the original to magically vanish at the moment the move completes, without running the destructor at all. Well, actually there's no magic, because Rust is a systems language, so this is all well-specified and you can do it by hand in unsafe Rust. Moving an object is implemented as if by memcpy, and once the byte copy completes, the original object is "garbage" (i.e. you must not access it or otherwise interact with it, not even to run its destructor), and you may then deallocate or reuse its memory if you so desire. You can see a worked example of this in the section of the Rustonomicon where they implement Vec. In safe Rust, the compiler takes care of these details automatically, and causes it to look like the object has "magically vanished" (by the devilishly clever method of throwing a compile error if you try to interact with it after moving from it).
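
A sketch of the compile error doing that work:

    fn main() {
        let original = String::from("hello");
        // The move is a plain bitwise copy of the String's (ptr, len,
        // cap) header; no constructor runs, and original is not reset
        // to an empty state the way a moved-from unique_ptr is.
        let moved = original;
        println!("{moved}");
        // The compiler simply forbids touching the moved-from value:
        // println!("{original}"); // error[E0382]: borrow of moved value
    }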

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 0:42 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (4 responses)

> It's absurd to advocate for Rust, while at the same time actively avoiding memory safety features in C++ to support the use of Rust. There seems to be a double standard.

That is not what Somers said. He said he avoided some C++ features because he was unfamiliar with them, not because he wanted to use Rust instead. He did also say that he would have used Rust instead if possible, but that's an entirely different thing to intentionally sabotaging FreeBSD's C++ in order to make it compare unfavorably to Rust (which is what you appear to be describing). I don't think anyone has credibly accused him of doing that.

One could, I suppose, contend that a volunteer is wrong to choose to spend his time learning Rust instead of C++, or that it is somehow "unfair" for someone to prefer learning one over the other. Personally, I am not willing to make arguments of that form, because it strikes me as presumptuous. Volunteers may decide how to spend their own time.

I would like to interpret your argument as something more sensible, but I'm struggling to think of anything else it could mean.

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 4:34 UTC (Tue) by jlarocco (subscriber, #168049) [Link] (3 responses)

I didn't accuse him of sabotaging FreeBSD's C++. My point was that there are less extreme ways to get memory safety than by jumping to Rust, but he hadn't bothered to use them.

There are smart pointers (std::shared_ptr, std::unique_ptr, etc.), ASAN, Valgrind, Coverity, linters, and so on, that help make C++ safer but don't require an entirely new language and compiler infrastructure.

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 8:07 UTC (Tue) by taladar (subscriber, #68407) [Link]

And all of those ways required years of time investment into what can at this point be considered a dead end approach to memory safety.

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 8:34 UTC (Tue) by farnz (subscriber, #17727) [Link]

Note that he didn't write C++ specifically; he used a few C++ features in his C code, but the languages he knows well enough to be confident with are C and Rust.

This means that for him to learn to write memory-safe C++ would involve him learning C++; what's the incentive to learn and get good at C++ if you're happy with Rust, competent in C, and only use C++ in areas where C's a bit lacking but Rust is not available?

Memory safety is still considered nice-to-have?

Posted Aug 20, 2024 10:05 UTC (Tue) by moltonel (guest, #45207) [Link]

> there are less extreme ways to get memory safety than by jumping to Rust, but he hadn't bothered to use them.

From the PoV of an individual developer, getting memory safety in C++ is a more extreme endeavor than getting it in Rust. C++ requires a lot more knowledge, tooling, and diligence to get to the same safety level as Rust's baseline. You can't blame Somers for investing more skills in a language that gets more things done.

Memory safety is still considered nice-to-have?

Posted Aug 21, 2024 16:31 UTC (Wed) by mrugiero (guest, #153040) [Link] (1 responses)

> Poul-Henning Kamp said that the pro-Rust argument simply boiled down to "'all the cool kids do it'".

Yeah, this part angered me a bit. This was just condescending elitism with no argument behind it. There were some quite reasonable arguments against it later on, but this wasn't one.

Memory safety is still considered nice-to-have?

Posted Aug 25, 2024 16:51 UTC (Sun) by marcH (subscriber, #57642) [Link]

> Yeah, this part angered me a bit. [...] There were some quite reasonable arguments against it later on, ...

The most "angered" person should be PHK himself for sabotaging all his reasonable arguments with such an introduction.

Glad I am not using FreeBSD

Posted Aug 20, 2024 8:10 UTC (Tue) by taladar (subscriber, #68407) [Link] (23 responses)

> I also propose, that next time somebody advocates for importing some "all the cool kids are doing it language" or other, we refuse to even look at their proposal, until they have proven their skill in, and dedication to, the language, by faithfully reimplementing cvsup in it, and documented how and why it is a better language for that, than Modula-3 was.

Reading this I am so glad I am not using FreeBSD anywhere if this is the level of discourse of important people in the project.

Also, why would anyone want to reimplement some ancient CVS related tool? Presumably that is still in some way important to FreeBSD?

Glad I am not using FreeBSD

Posted Aug 20, 2024 10:53 UTC (Tue) by khim (subscriber, #9252) [Link] (22 responses)

I think you are missing a lot of context here; that's why the overreaction.

> Also, why would anyone want to reimplement some ancient CVS related tool? Presumably that is still in some way important to FreeBSD?

I'll correct it for you: Also, why would anyone want to reimplement ~~some ancient CVS related tool~~ the central tool which is used to manage FreeBSD code?

Does that question even need answers?

> Presumably that is still in some way important to FreeBSD?

From the FreeBSD wiki: FreeBSD uses a mixture of CVS and Perforce for managing the various source trees and projects; CVS (extended with cvsup) is the "authoritative" revision control system, and contains four complete and independent repositories (src, ports, projects, doc), but its limitations regarding heavily branched independent development are significant (emphasis mine).

Asking someone to reimplement a pretty important tool that's written, for some unfathomable reason, in Modula-3, of all things, sounds like a pretty reasonable request.

There are plenty of people looking for projects to rewrite in Rust (as a learning exercise); asking one of them to rewrite CVSup before making a bigger commitment would be a good idea. Maybe it's even possible to convince some company to fund such an exercise.

Glad I am not using FreeBSD

Posted Aug 20, 2024 11:13 UTC (Tue) by Vorpal (guest, #136011) [Link]

> I'll correct for you: Also, why would anyone want to reimplement some ancient CVS related tool central tool which is used to manage FreeBSD code?
>
> Does that question even need answers?

It does raise the question as to why FreeBSD would still be relying on CVS in this day and age...

That said, Modula-3 seems to be a saner language than C or C++ (I say this as a professional C++ developer who has worked in safety-critical hard real-time for over a decade, and am now a huge fan of Rust).

Looking at the examples on Wikipedia for Modula-3 it reminds me of a mix of Pascal/Ada with the upper case of Fortran. I prefer something a bit less verbose and more functional personally, but it sounds like it is at least somewhat memory safe.

At the time it was probably a sensible choice (the other options would have been Ada or a scripting language, I guess?). Now it suffers from the lack of an ecosystem, which means fewer people who understand the code, and fewer available libraries (meaning you have to write everything yourself).

Glad I am not using FreeBSD

Posted Aug 20, 2024 13:37 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (1 responses)

I think there are a few sensible precautions to take before anybody commits to the considerable work of rewriting CVSup in Rust:

- Is there "buy in"? Do the users of CVSup want Rust CVSup, or does it just go in the pile of "Tasks we set for Rust developers as a prank" when you're finished?

- Is the present behaviour of CVSup well characterised, documented, maybe there are even unit tests so that we can tell whether our Rust CVSup is in fact a working replacement ?

- Is there something CVSup doesn't do that a Rust CVSup could fix? Or equally something it does do that it shouldn't?

My guess is that in fact these are merely "top bants" and in practice there is no actual interest in a Rust CVSup so this exercise would be futile.

Notice the linked Perforce page just tells you FreeBSD hasn't used Perforce for years. This is legacy documentation; most of FreeBSD's documentation can be described as "somebody wrote this, nobody is in charge of making sure it's still correct, and we don't keep records of what it's about or why it was written. Good luck". Maybe CVSup is exciting new software just introduced, maybe it is being replaced with a C equivalent for platform-compat reasons. Maybe both those stories are long obsolete.

Glad I am not using FreeBSD

Posted Aug 21, 2024 17:34 UTC (Wed) by mrugiero (guest, #153040) [Link]

You need go no further than asking "is it likely that the first thing included in base written in <language> will be one of the most core tools?" to know it is just meant as a disincentive. It's meant to raise the barrier to entry, quite the opposite of what Rust is about. That is, condescending gatekeeping.

Glad I am not using FreeBSD

Posted Aug 20, 2024 14:14 UTC (Tue) by HJVT (guest, #172982) [Link] (2 responses)

> Asking to reimplement pretty important tool that's written, for some unfathomable reason, in a Modula-3, of all things, sounds like a pretty reasonable request.

That's not the request, though. The request is for someone interested in getting language A adopted in base to reimplement a tool written in an obscure language B that has not seen continued development in nearly 15 years, nor has ever seen significant adoption.
So it would seem absolutely fair to assume that barely anybody knows language B, which means the request is actually to learn that language to a fairly competent level. Which will somehow demonstrate the value of adopting language A?

Glad I am not using FreeBSD

Posted Aug 20, 2024 14:25 UTC (Tue) by khim (subscriber, #9252) [Link]

Have you actually tried to rewrite something? I mean: ever in your life?

Sure, you need to know two languages, but most of the time you have to be an expert only in the target language; you need only a very rough understanding of the source language.

Otherwise these projects that replace COBOL with Java would have been entirely impossible (and in reality only half of them fail).

Glad I am not using FreeBSD

Posted Aug 21, 2024 17:38 UTC (Wed) by mrugiero (guest, #153040) [Link]

Playing Devil's advocate, the underlying message must be something along the lines of "we don't know if Rust will be around 10 years from now; let's see you deal with our previous mistakes of early adoption, so you know what we'd be signing up for in case Rust becomes an old obscure language as well". But that argument is already old, IMO. If there is ONE thing for which "all the cool kids do it" is actually a fair argument, it is the chances of the language still being around in 10 years. When "all the cool kids do it", you create tons of future legacy code that someone will have to deal with, which means someone will maintain a toolchain for it.

Glad I am not using FreeBSD

Posted Aug 20, 2024 14:39 UTC (Tue) by a12l (guest, #144384) [Link] (4 responses)

> From the FreeBSD wiki: FreeBSD uses a mixture of CVS and Perforce for managing the various source trees and projects; CVS (extended with cvsup) is the "authoritative" revision control system, and contains four complete and independent repositories (src, ports, projects, doc), but its limitations regarding heavily branched independent development are significant (emphasis mine).

That article is there for historical reasons (as noted in the banner at the top), and not actually relevant nowadays. FreeBSD has migrated from CVS --> SVN --> Git, the latest migration done around 2020.

Glad I am not using FreeBSD

Posted Aug 20, 2024 14:44 UTC (Tue) by khim (subscriber, #9252) [Link] (3 responses)

If that's true then I have to agree with taladar and repeat after him: I am so glad I am not using FreeBSD anywhere.

Glad I am not using FreeBSD

Posted Aug 20, 2024 15:01 UTC (Tue) by shawn.webb (subscriber, #118686) [Link] (2 responses)

What's wrong with FreeBSD using git? How do you stay away from any project that uses git?

Glad I am not using FreeBSD

Posted Aug 20, 2024 15:23 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

Nothing is wrong with using Git. But asking for a rewrite of a tool that you are no longer using is just dishonest.

Glad I am not using FreeBSD

Posted Aug 21, 2024 7:16 UTC (Wed) by viro (subscriber, #7872) [Link]

The Modula-3 toolchain had been a recurring nightmare for maintainers. Having a critical (at the time) tool written in it was a painful mistake; the real solution was to switch away from CVS and be done with both cvsup and Modula-3 support. An intermediate step was "fuck that m3 shite, let's rewrite the parts of cvsup we really need in C, so the entire system wouldn't be a hostage of that horror" (csup). I suspect phk's point is not so much a literal requirement for rust-in-core-system advocates as a reminder of the last time somebody decided that "it's such a nice language" was a sufficient argument for making the system depend on an unstable toolchain.

They paid _hard_ for that mistake with m3. Granted, the Rust toolchain is nowhere near that horror (look it up yourself - it really could be used as an object lesson in how not to do language toolchains), but the cvsup story must've left very painful scars.

Glad I am not using FreeBSD

Posted Aug 21, 2024 8:06 UTC (Wed) by taladar (subscriber, #68407) [Link] (10 responses)

Personally I wouldn't use any project that still uses CVS. Granted, CVS is slightly better than Subversion but it is still an extremely bad version control system compared to even the less popular modern ones.

Glad I am not using FreeBSD

Posted Aug 21, 2024 8:46 UTC (Wed) by chris_se (subscriber, #99706) [Link] (9 responses)

> CVS is slightly better than Subversion

Huh? Personally, I found SVN to be a huge step up from CVS. Back in the day (pre git) I switched over to it from CVS before even SVN 1.0, because it was so much better in my eyes. Granted, I've been using git exclusively for everything for a long time now, and I don't want to look back. And don't get me wrong: I do have lots of criticisms for SVN - but I'm utterly baffled by the statement that CVS is better than SVN. Could you elaborate what CVS does better than SVN in your eyes?

Glad I am not using FreeBSD

Posted Aug 21, 2024 10:46 UTC (Wed) by paulj (subscriber, #341) [Link] (4 responses)

Subversion was not very stable, at least over the timeframe when it was the only serious alternative to CVS. The easier-to-set-up backend (I forget the name) in particular had some corruption bugs. My vague memory is that by the time it was reliable enough to trust, there were other and better alternatives (distributed SCMs, that is).

Glad I am not using FreeBSD

Posted Aug 21, 2024 15:16 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

The initial backend used BerkeleyDB, and it sometimes became "wedged", requiring manual intervention. They fairly quickly followed it by FSFS (filesystem over filesystem) that used immutable files to represent revisions. This backend was (and still is) rock-solid. That was still years before git.

Glad I am not using FreeBSD

Posted Aug 21, 2024 15:50 UTC (Wed) by paulj (subscriber, #341) [Link] (2 responses)

Yeah, BDB.

Googling suggests FSFS came out in SVN 1.1, on 2004-09-29. Anyone burned by the BDB implementation in SVN was, obviously, going to wait a while to see how FSFS panned out before jumping in. At that point, the DSCM movement was already underway, with monotone and Darcs (evolving from Arch) in existence, giving further pause to anyone considering switching SCMs. Less than eight months later we got git (and Mercurial soon after).

There was a window in 2002-03 or so when SVN looked like it might be the answer to "what should replace CVS?". But it was buggy. By the time FSFS was available and trustworthy, it was too late.

Glad I am not using FreeBSD

Posted Aug 22, 2024 7:18 UTC (Thu) by chris_se (subscriber, #99706) [Link] (1 responses)

I apparently got really lucky. I remember using SVN quite a lot in the early 2000s (starting ~2001 or so), and while the FSFS transition was nice because it was a lot faster than BDB, I never really had issues with the BDB backend (at least as far as I can recall 20 years after the fact).

Additionally, while git was started in 2005, I remember looking at it in early 2006 (or so) and being immediately put off, because the user interface back then was atrocious (IMHO). I don't remember exactly when, but I only looked at git again somewhere in 2008 or 2009, when the user interface had already gotten much better.

So I had at least seven years of mostly good experiences with SVN back then. Of course, in hindsight there are a lot of shortcomings in its design, and through the lens of today I'd probably characterize my experience working with SVN very differently. But at the time I didn't have much to complain about. I definitely wouldn't want to go back from git.

I can definitely see the perspective you shared as to why, for historical reasons, you didn't seriously consider SVN as a successor to CVS. But I'm still confused by the statement I replied to initially, that CVS is slightly better than SVN (present tense). It might have been true in 2003, when FSFS didn't yet exist and BDB was buggy, but nowadays?

Glad I am not using FreeBSD

Posted Aug 22, 2024 11:42 UTC (Thu) by anselm (subscriber, #2796) [Link]

> But I'm still confused by the statement I replied to initially, that CVS is slightly better than SVN (present tense).

AFAIR the main advantage of SVN compared to CVS was that SVN would let you rename directories.
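
For example (hypothetical paths, obviously):

    # SVN: rename is a first-class, history-preserving operation
    svn move lib src/lib
    svn commit -m "Move lib under src"

    # CVS had no rename at all; the usual workaround severed the
    # file's history
    mv lib/foo.c src/lib/foo.c
    cvs remove lib/foo.c
    cvs add src/lib/foo.c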

I personally used SVN for a bit but then moved over to Arch and eventually to Mercurial (which IMHO is way underrated). At work these days we're using Git, but for me, that needs Magit to make it halfway bearable.

Glad I am not using FreeBSD

Posted Aug 22, 2024 8:04 UTC (Thu) by taladar (subscriber, #68407) [Link] (3 responses)

CVS repositories are much easier to import into something modern like Git, because Subversion did that stupid thing where it tried to make everything a path. As a result, you can end up with multiple revisions for a single tag, and with projects layering their own conventions on top of that low-level filesystem-path abstraction that do not map cleanly onto branches or tags.
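
To illustrate with a made-up repository (the URLs are hypothetical, but the pattern was ubiquitous):

    # An SVN "tag" is just a cheap copy to a path named by convention...
    svn copy http://example.org/repo/trunk \
             http://example.org/repo/tags/1.0 -m "Tag 1.0"

    # ...and nothing stops later commits from landing inside it, so an
    # importer can find several revisions for what Git would treat as
    # one immutable tag
    svn checkout http://example.org/repo/tags/1.0 wc
    cd wc && echo fix >> README && svn commit -m "Hotfix inside the tag"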

Glad I am not using FreeBSD

Posted Aug 26, 2024 7:49 UTC (Mon) by marcH (subscriber, #57642) [Link] (2 responses)

Yes: Subversion "quietly" got rid of... tags and branches! Quite extraordinary for a version control system. Even more extraordinary: it took me a year or two to realize it and describe it here:

https://en.wikipedia.org/wiki/Apache_Subversion#Subversio...

Glad I am not using FreeBSD

Posted Aug 26, 2024 9:30 UTC (Mon) by kleptog (subscriber, #1183) [Link] (1 responses)

Yeah, SVN made it very easy to branch (just copy, simple, right?), but merging was a clusterf*ck. At one point they added "merge tracking" to keep track of which revisions had been merged from where, but it never worked well for me.
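
For reference, the merge-tracking mechanics (SVN 1.5 and later) looked roughly like this; the revision numbers here are made up:

    # In a working copy of the branch, pull in trunk changes;
    # SVN records what was merged from where...
    svn merge ^/trunk
    svn commit -m "Sync branch with trunk"

    # ...as revision ranges in a versioned property
    svn propget svn:mergeinfo .
    # /trunk:1680-1790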

I think it was Linus who, in a discussion about Git, noted that the one thing a revision control system really needs to be good at is merging, and that SVN failed at it miserably.

git-svn was the only thing that made it usable for me. Then I could have local branches without dealing with the server.
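
Roughly this, for anyone who never had the pleasure (repository URL hypothetical):

    # Clone an SVN repository laid out as trunk/branches/tags
    git svn clone --stdlayout http://example.org/repo project
    cd project

    # Branch and commit entirely locally, no server round-trips...
    git checkout -b feature
    git commit -am "Work in progress; the server is none the wiser"

    # ...and only talk to the SVN server to sync up
    git svn rebase     # fetch and replay new SVN revisions
    git svn dcommit    # push local commits back as SVN commits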

Glad I am not using FreeBSD

Posted Aug 26, 2024 15:13 UTC (Mon) by marcH (subscriber, #57642) [Link]

I first learned git via... git-svn, purely because I was so sick of SVN (and not just because of the lack of branches).


Copyright © 2024, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds