Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 18:35 UTC (Thu) by mss (subscriber, #138799)
In reply to: Rust in the Linux kernel (Google security blog) by Cyberax
Parent article: Rust in the Linux kernel (Google security blog)

> but safe C++ requires a very particular style of coding. It's often incompatible with high performance (e.g. returning std::shared_ptr)

Rust memory safety isn't free either, right?

By the way, one would normally use a std::unique_ptr smart pointer template (which does not do reference counting).
std::shared_ptr is only required when the object has to be kept in multiple places at the same time (and then reference counting is necessary).
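
A minimal sketch of the distinction drawn above, assuming a hypothetical `Widget` type and helper `adopt()` (neither comes from the thread): `std::unique_ptr` transfers sole ownership by move and keeps no count, while copying a `std::shared_ptr` bumps a reference count.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Widget {  // hypothetical payload type, for illustration only
    int value;
};

// Takes over ownership; the caller's pointer is left empty after the move.
std::unique_ptr<Widget> adopt(std::unique_ptr<Widget> w) {
    w->value += 1;
    return w;
}

void ownership_demo() {
    // Sole ownership: no reference count is kept anywhere.
    auto u = std::make_unique<Widget>(Widget{41});
    auto v = adopt(std::move(u));
    assert(u == nullptr);  // ownership has moved out of u
    assert(v->value == 42);

    // Shared ownership: each copy increments a reference count.
    auto s1 = std::make_shared<Widget>(Widget{7});
    auto s2 = s1;
    assert(s1.use_count() == 2);
    s2.reset();
    assert(s1.use_count() == 1);
}
```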

> has very glaring issues (exception handling).

Pretty much every large C++ project does not use exceptions so there is no requirement for kernel to use them, either.

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 18:42 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (35 responses)

> Rust memory safety isn't free either, right?
It actually is in a fairly large number of cases. In many cases it's faster than C++ because safe C++ would require suboptimal constructs.

> By the way, one would normally use a std::unique_ptr smart pointer template (which does not do reference counting). std::shared_ptr is only required when the object has to be kept in multiple places at the same time (and then reference counting is necessary).
Another major issue is structures that contain pointers. In C++ you have to pessimise and use shared_ptr if you want the structure to be useful in a generic case because there's no way to tell "this object's lifetime is now controlled by the parent structure". Neither is it possible to hand off ownership from a shared_ptr back to unique_ptr.
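
The one-way nature of that conversion can be sketched as follows. `Node` and `share()` are hypothetical names used only for illustration; the point is that the standard library converts an rvalue `std::unique_ptr` into a `std::shared_ptr`, but `std::shared_ptr` has no `release()` member to go back.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Node {  // hypothetical type, for illustration only
    int id;
};

std::shared_ptr<Node> share(std::unique_ptr<Node> owner) {
    // unique_ptr -> shared_ptr: supported conversion from an rvalue.
    std::shared_ptr<Node> shared = std::move(owner);
    return shared;
}

void conversion_demo() {
    auto shared = share(std::make_unique<Node>(Node{3}));
    assert(shared->id == 3);
    assert(shared.use_count() == 1);

    // Even as the only owner, shared_ptr offers no release(), so the object
    // cannot be handed back to a unique_ptr; the closest option is a copy.
    auto copy = std::make_unique<Node>(*shared);
    assert(copy->id == 3);
    assert(copy.get() != shared.get());
}
```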

> Pretty much every large C++ project does not use exceptions so there is no requirement for kernel to use them, either.
Exactly. And this is a problem.

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 19:35 UTC (Thu) by mss (subscriber, #138799) [Link] (21 responses)

> In many cases it's faster than C++ because safe C++ would require suboptimal constructs.

That's a very general statement.

> In C++ you have to pessimise and use shared_ptr if you want the structure to be useful in a generic case because there's no way to tell
> "this object's lifetime is now controlled by the parent structure".

This is normally controlled by object (class) hierarchy.

Usually, when a class needs to store a pointer to an object it is because it has been transferred ownership of that object.
Then unique_ptr is enough.

Sometimes a class needs access to something in one of its parent classes.
Then either a raw pointer to that object should be stored (by the child class constructor, to force proper construction order at the parent class) or, alternatively, the child class should store a (raw) pointer to its parent object which implements a getter method for that thing.
In that case, one has to think whether using inheritance instead of composition would make more sense there.

A shared_ptr is necessary only in the case a class wants to "pin" an object of unspecified lifetime (with respect to its own lifetime).
But at that point reference counting is pretty much always necessary anyway.

Blindly using shared_ptr everywhere, besides the performance impact, is a circular reference bug waiting to happen.
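
To illustrate the circular-reference hazard, here is a small sketch with hypothetical `StrongNode` and `WeakNode` types and a global counter introduced only for the demonstration. Two objects that hold `std::shared_ptr`s to each other are never destroyed; making the back reference a `std::weak_ptr` avoids that.

```cpp
#include <cassert>
#include <memory>
#include <utility>

static int live_nodes = 0;  // objects constructed but not yet destroyed

struct StrongNode {
    std::shared_ptr<StrongNode> peer;  // owning link
    StrongNode() { ++live_nodes; }
    ~StrongNode() { --live_nodes; }
};

struct WeakNode {
    std::weak_ptr<WeakNode> peer;  // non-owning link
    WeakNode() { ++live_nodes; }
    ~WeakNode() { --live_nodes; }
};

// Returns how many nodes are still alive after both handles went out of scope.
int leaked_with_strong_cycle() {
    live_nodes = 0;
    StrongNode* raw = nullptr;
    {
        auto a = std::make_shared<StrongNode>();
        auto b = std::make_shared<StrongNode>();
        a->peer = b;
        b->peer = a;  // cycle: each keeps the other alive
        raw = a.get();
    }
    const int alive = live_nodes;
    // Break the cycle by hand so that the demo itself does not leak.
    { auto cut = std::move(raw->peer); }
    return alive;
}

int leaked_with_weak_link() {
    live_nodes = 0;
    {
        auto a = std::make_shared<WeakNode>();
        auto b = std::make_shared<WeakNode>();
        a->peer = b;
        b->peer = a;  // no ownership, so no cycle
    }
    return live_nodes;
}

void cycle_demo() {
    assert(leaked_with_strong_cycle() == 2);
    assert(leaked_with_weak_link() == 0);
}
```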

> > Pretty much every large C++ project does not use exceptions so there is no requirement for kernel to use them, either.
> Exactly. And this is a problem.

Are you suggesting that kernel *should* make use of C++ exceptions?

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 20:17 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (20 responses)

> This is normally controlled by object (class) hierarchy.
Nope. C++ does offer a somewhat streamlined model for ownership of data (copy/move constructors), but it fails utterly with flexible ownership. There are third-party tools (e.g. boost::pointer_container) but they are all kinda awkward.

And sorry, but inheritance and OOP have nothing to do with that.

> A shared_ptr is necessary only in the case a class wants to "pin" an object of unspecified lifetime (with respect to its own lifetime). But at that point reference counting is pretty much always necessary anyway.
It can be avoided in Rust most of the time.

> Are you suggesting that kernel *should* make use of C++ exceptions?
No, that C++ should be redesigned to actually be fully usable without exceptions.

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 21:36 UTC (Thu) by bvanassche (subscriber, #90104) [Link] (19 responses)

>> Are you suggesting that kernel *should* make use of C++ exceptions?
> No, that C++ should be redesigned to actually be fully usable without exceptions.

I think that C++ is fully usable without exceptions. A great and efficient alternative for exceptions and error codes is available in the Abseil library. See also Abseil Status.

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 22:41 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (18 responses)

Note that this means giving up on some core bits (`new`) and needing a new algorithms library (because the STL algorithms' only "please stop, we found a problem" is an exception). Ranges might help with this, but I've not used them myself.

Rust in the Linux kernel (Google security blog)

Posted Apr 22, 2021 7:32 UTC (Thu) by ncm (guest, #165) [Link] (17 responses)

There are in fact no algorithms in <algorithm> that rely on throwing exceptions. Users may provide callbacks that throw exceptions, but that is wholly voluntary and anyway, in my experience, never necessary.

We don't need falsehoods here.

Rust in the Linux kernel (Google security blog)

Posted Apr 22, 2021 11:36 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

> There are in fact no algorithms in <algorithm> that rely on throwing exceptions.
That's false. Plenty of algorithms in <algorithm> rely on exceptions. E.g. std::copy_n with a back_inserter will throw on out-of-memory.

Rust in the Linux kernel (Google security blog)

Posted Apr 23, 2021 17:06 UTC (Fri) by ncm (guest, #165) [Link] (4 responses)

std::back_inserter is not an algorithm.

If you choose to specialize std::copy_n on it—as for any callback—exceptions are your responsibility. Normally, if one does not want exceptions from std::back_inserter, one calls reserve() on the container, first, to establish enough storage, which is anyway more efficient; or, for nodal containers, construct it with an allocator and make sure the allocator had enough in reserve.
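
A rough sketch of the reserve-first pattern described above, using a hypothetical helper `append_n()`. It only shows that no reallocation happens during the copy once capacity is in place. The `reserve()` call is still an allocation that can fail, and element copy constructors can throw for types other than `int`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Appends the first n elements of src to dst, reserving space up front so
// that push_back (called through back_inserter) does not need to reallocate.
void append_n(const std::vector<int>& src, std::size_t n, std::vector<int>& dst) {
    n = std::min(n, src.size());
    dst.reserve(dst.size() + n);  // the one place where allocation can fail
    std::copy_n(src.begin(), n, std::back_inserter(dst));
}

void reserve_demo() {
    const std::vector<int> src{1, 2, 3, 4};
    std::vector<int> dst;
    dst.reserve(src.size());
    const int* before = dst.data();
    append_n(src, src.size(), dst);
    assert(dst == src);
    assert(dst.data() == before);  // storage was not reallocated by the copy
}
```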

Taking responsibility for what your code does is something we call programming.

Falsehoods do not improve the discourse here.

Rust in the Linux kernel (Google security blog)

Posted Apr 23, 2021 18:55 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

> Normally, if one does not want exceptions from std::back_inserter, one calls reserve() on the container, first, to establish enough storage, which is anyway more efficient; or, for nodal containers, construct it with an allocator and make sure the allocator had enough in reserve.
"reserve" requires an upfront knowledge of the resulting size. But even knowing the resulting size is not enough, because copy constructor itself might throw (e.g. copying a string and getting an OOM).

C++ absolutely relies on exceptions and working around them results in a lot of ugly code.

> Taking responsibility for what your code does is something we call programming.
If everybody actually took responsibility for what they write, C/C++ programmers would be in prison, serving life sentences for reckless endangerment.

Rust in the Linux kernel (Google security blog)

Posted Apr 23, 2021 19:40 UTC (Fri) by mss (subscriber, #138799) [Link] (2 responses)

> C++ absolutely relies on exceptions and working around them results in a lot of ugly code.

It's STL that you are talking about, not the core C++ language.

This distinction is important when considering C++ for kernel usage - we would probably only use parts of STL,
after adapting it for this environment.

> If everybody actually took responsibility for what they write, C/C++ programmers would be in prison, serving life sentences for reckless endangerment.

That kind of opinion doesn't help keep the discussion technical and civil.

Rust in the Linux kernel (Google security blog)

Posted Apr 23, 2021 19:45 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> It's STL that you are talking about, not the core C++ language.
I disagree. Core C++ also relies on exceptions. It's the only way to return errors from constructors, for example. With copy constructors being the worst offenders.

It's fair to say that it's possible to work around that issue, but then you'll lose most of the STL (including std::string). And your code will quite often not be idiomatic C++.

> That kind of opinion doesn't help keeping the discussion technical and civil.
Yet it's true. And I say that as a C++ developer.

I believe that writing in C++ is absolutely irresponsible at this point, and the whole safety culture needs to change.

Rust in the Linux kernel (Google security blog)

Posted Apr 23, 2021 19:59 UTC (Fri) by mss (subscriber, #138799) [Link]

> It's the only way to return errors from constructors, for example.

Many C++ projects manage to use this language without resorting to exceptions.

It depends on the class, but it is often possible to initialize the object to a dummy state in case of an unexpected error in a constructor
(like a NULL pointer in the case of a smart pointer template, an empty string for a string template, etc.).
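
A minimal sketch of the "dummy state" approach, assuming a hypothetical `Ratio` class invented for this example. The constructor never throws; invalid input leaves the object in a recognisable invalid state that callers must check. A factory that returns a null pointer on failure is shown as an alternative.

```cpp
#include <cassert>
#include <memory>
#include <new>

// Hypothetical type: a ratio whose denominator must be non-zero.
class Ratio {
public:
    // Non-throwing constructor: bad input produces a dummy (invalid) object.
    Ratio(int num, int den) noexcept
        : num_(den != 0 ? num : 0), den_(den), ok_(den != 0) {}

    bool valid() const noexcept { return ok_; }
    double value() const noexcept {
        return ok_ ? static_cast<double>(num_) / den_ : 0.0;
    }

    // Alternative: a factory that reports failure as a null pointer.
    static std::unique_ptr<Ratio> create(int num, int den) noexcept {
        if (den == 0) return nullptr;
        return std::unique_ptr<Ratio>(new (std::nothrow) Ratio(num, den));
    }

private:
    int num_;
    int den_;
    bool ok_;
};

void constructor_demo() {
    Ratio good(1, 2);
    Ratio bad(1, 0);
    assert(good.valid() && good.value() == 0.5);
    assert(!bad.valid());  // the caller has to remember this check
    assert(Ratio::create(1, 0) == nullptr);
    assert(Ratio::create(3, 4)->value() == 0.75);
}
```

The trade-off is that nothing forces the caller to test `valid()`, which is part of the disagreement in this thread.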

Rust in the Linux kernel (Google security blog)

Posted Apr 22, 2021 13:27 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (10 responses)

I didn't say *rely*. I said that if you have errors that can occur while processing the iterator range, you need to throw an exception to *abort the algorithm*. They support gathering all the errors, but if you're just caring about *any* error, stopping the algorithm from doing useless work is pretty important.

As a concrete example, say I have a fallible function call that I want to call on all elements within an iterator range, bailing on the first problem and returning the good values into an output iterator. What <algorithm> do I use for this? AFAIK, all you can do is throw an exception to escape the algorithm. Basically, I want the behavior of this open-coded loop using `std::transform`. AFAIK, turning the `break` into a `throw` is all you have.

std::vector<T> output;
output.reserve(xs.size());
bool err = false;
for (auto const& x : xs) {
  if (!validate(x)) {
    err = true; // probably richer in real code
    break;
  }
  output.emplace_back(process(x));
}
if (err) {
  // handle
}

In Rust, I have:

let res = xs.iter().map(|x| if !validate(x) { Err(()) } else { Ok(process(x)) }).collect::<Result<Vec<_>, ()>>();
// res Ok/Err status gives me all results or the first error (tossing good results that have already been made)

Rust in the Linux kernel (Google security blog)

Posted Apr 22, 2021 16:39 UTC (Thu) by hummassa (subscriber, #307) [Link] (5 responses)

C++20 has take_while, and your example would be like

auto output = xs | take_while(&validate) | transform(&process);

One could argue, though, if your Err() indicates a real error condition, throwing would be the right thing to do™ and that there are lightweight (zero-overhead) implementations of exceptions for freestanding C++ programs (e.g. kernels).

Rust in the Linux kernel (Google security blog)

Posted Apr 22, 2021 16:54 UTC (Thu) by hummassa (subscriber, #307) [Link]

Oh, and obviously, you can always implement kernel::outcome and do

auto output = xs | transform([](const auto& x){ return validate(x) ? process(x) : ValidationError(x); }) |
  collect_or_first_error();

Rust in the Linux kernel (Google security blog)

Posted Apr 22, 2021 17:54 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (3 responses)

Ah, that is ranges (which I'm not that well-versed in). That does indeed vastly improve things. Unfortunately such things aren't available until C++20, but that's just how it is sometimes (most of the C++ projects I work on are only *considering* C++14 right now due to their compiler support matrix).

Rust in the Linux kernel (Google security blog)

Posted Apr 22, 2021 21:22 UTC (Thu) by hummassa (subscriber, #307) [Link] (1 responses)

The following questions arise: Isn't a recent Rust compiler the minimum to compile Rust modules? Why should C++ be different?

Rust in the Linux kernel (Google security blog)

Posted Apr 23, 2021 0:46 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Various popular ones? Sure. Given that the kernel is likely in `no_std` land, the minimum is probably way lower than "typical" Rust is likely to have.

Rust in the Linux kernel (Google security blog)

Posted Apr 22, 2021 21:24 UTC (Thu) by hummassa (subscriber, #307) [Link]

(Ah, and ranges are a library-only update, so, no problem in implementing them even in C++11 for a freestanding kernel toolchain)

Rust in the Linux kernel (Google security blog)

Posted Apr 23, 2021 17:24 UTC (Fri) by ncm (guest, #165) [Link] (3 responses)

I have used std::find_if to walk a sequence and stop short, with no exceptions thrown or caught. It is quite easy and clean with a lambda.
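
One way the `std::find_if` approach might look, sketched with hypothetical stand-ins for the `validate` and `process` functions from earlier in the thread. It needs two passes over the input, so it assumes a multi-pass range such as a `std::vector`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-ins for the thread's validate/process functions.
bool validate(int x) { return x >= 0; }
int process(int x) { return x * 2; }

// Finds the first invalid element, then transforms everything before it.
// Returns true if every element was valid. No exceptions are involved.
bool process_valid_prefix(const std::vector<int>& xs, std::vector<int>& out) {
    const auto bad = std::find_if(xs.begin(), xs.end(),
                                  [](int x) { return !validate(x); });
    out.reserve(out.size() +
                static_cast<std::size_t>(std::distance(xs.begin(), bad)));
    std::transform(xs.begin(), bad, std::back_inserter(out), process);
    return bad == xs.end();
}

void find_if_demo() {
    const std::vector<int> xs{1, 2, -3, 4};
    std::vector<int> out;
    assert(!process_valid_prefix(xs, out));  // stopped at -3
    const std::vector<int> expected{2, 4};
    assert(out == expected);
}
```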

And, of course, it is very, very easy to code your own variations, typically needing not more than one or two lines in the body. (Such an algorithm used very frequently should be proposed for the library in an upcoming Standard.)

This is a practice we call programming. It is not fundamentally different from in other languages, but in C++ is generally more fun.

When you have had to reach so far off your center of gravity to try to support an argument, and fail anyway, it is usually better to concede the point.

Rust in the Linux kernel (Google security blog)

Posted Apr 23, 2021 18:15 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

I guess I could have made the example more robust. But if `validate` is a condition on the transformation result, what is the recommended way? Say the body was:

{ auto transform = process(x); if (is_valid(transform)) { v.emplace_back(std::move(transform)); } else { break; } }

I suspect it needs something like Rust's `try_for_each` kind of algorithms, but I don't know for sure. Maybe std::outcome or whatever it ends up being will help here, but I suspect it'll just be "use ranges".
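
For the case where validity depends on the transformation result, a hand-written single-pass algorithm is one option. The sketch below uses a hypothetical `try_transform` (not a standard algorithm) and made-up `process`/`is_valid` functions. It stops at the first rejected result without throwing and also works for single-pass input iterators.

```cpp
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-ins for process/is_valid from the comment above.
int process(int x) { return x * x; }
bool is_valid(int y) { return y < 50; }

// Applies fn to each element and writes the result to out while ok(result)
// holds. Stops at the first rejected result and returns its position
// (last if every result was accepted).
template <class InIt, class OutIt, class Fn, class Ok>
InIt try_transform(InIt first, InIt last, OutIt out, Fn fn, Ok ok) {
    for (; first != last; ++first) {
        auto result = fn(*first);
        if (!ok(result)) break;
        *out++ = std::move(result);
    }
    return first;
}

void try_transform_demo() {
    const std::vector<int> xs{1, 5, 9, 2};
    std::vector<int> v;
    const auto stop = try_transform(xs.begin(), xs.end(),
                                    std::back_inserter(v), process, is_valid);
    const std::vector<int> expected{1, 25};
    assert(v == expected);           // 9 * 9 = 81 was rejected
    assert(stop == xs.begin() + 2);  // points at the offending element
}
```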

> but in C++ is generally more fun.

To each their own. But I also find Python not all that *fun* to work in, so tastes are all over the place.

> When you have had to reach so far off your center of gravity to try to support an argument, and fail anyway, it is usually better to concede the point.

Ranges just haven't been in my purview. Yes, they probably (based on what I'm seeing at least) do make things as easy to compose and work with as Rust's Iterator trait. But, C++ is C++ and getting everyone to move to the new version is difficult (and bringing in dependencies to polyfill isn't easy either). I'm glad that there's something new, but I think my point stands that if you have error handling you want in <algorithm> you're either stuck with iterating multiple times (which might not work if your iterator isn't over a view, but instead consumes like a stream iterator) or throwing an exception for early termination (or open-coding which I guess works too, but I find named algorithms easier to read).

Rust in the Linux kernel (Google security blog)

Posted Apr 24, 2021 23:46 UTC (Sat) by flussence (guest, #85566) [Link] (1 responses)

Then please do us all a favour and concede. This entire thread tree was painful to read and reminds me of the OpenOffice ones.

If this kind of broken-record conversational sandbagging is representative of the C++ community then I'm glad the language is being pushed out.

Rust in the Linux kernel (Google security blog)

Posted Apr 25, 2021 16:59 UTC (Sun) by corbet (editor, #1) [Link]

Perhaps everybody involved should let this conversation go at this point...I don't see it shedding any real light if it goes any further.

Rust in the Linux kernel (Google security blog)

Posted Apr 18, 2021 8:26 UTC (Sun) by ncm (guest, #165) [Link] (12 responses)

It does not improve the dialog on LWN to post falsehoods about a language you clearly know very little about.

In fact shared_ptr is rarely used in well-designed C++ programs, and not used at all in most. It would never be used at all in kernel code. (Overuse of shared_ptr is called Java Disease. It has proven curable.)

FUD about imaginary C++ inefficiencies is similarly unwelcome. The common experience is that C++ code is faster than C code doing the same work, for a variety of reasons, among them that the higher-level description gives the compiler more implicit knowledge of what is going on, and libraries can capture more of semantics including optimizations. (This is the reason that C++ is used in HPC, to the exclusion of other languages.)

Linus's ignorant, childish ranting on the topic is a long-standing embarrassment. He is completely wrong on his facts, and you are too.

Rust in the Linux kernel (Google security blog)

Posted Apr 18, 2021 9:01 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (9 responses)

> It does not improve the dialog on LWN to post falsehoods about a language you clearly know very little about.
I've been writing C++ since the mid-90s, starting from 16-bit Borland and then graduating to 32-bit Watcom with a DOS extender. I then moved on to gcc (including the nonexistent 2.96 version) and afterwards had to struggle with VC6 and its horrible standard incompatibilities. I struggled quite a bit with metaprogramming in Boost (hey, I even used Boost.Spirit!) with C++03, and it was very frustrating but fun. I did take a break for a bit but returned to hard C++ two years ago.

But yeah, I don't know anything about C++.

> In fact shared_ptr is rarely used in well-designed C++ programs, and not used at all in most. It would never be used at all in kernel code. (Overuse of shared_ptr is called Java Disease. It has proven curable.)
Really? The number of refcounted objects in the kernel is pretty huge. Linux also doesn't have a lot of generic code that has to work with arbitrary data.

C++ doesn't really have good solutions for that. E.g.: "void doSomething(const std::vector<std::unique_ptr<Obj>> &foo);" won't work in a generic enough way because one of the callers might need to use "std::vector<Obj*>". Eventually every complex C++ library either falls back to std::shared_ptr or it mandates that objects use intrusive reference counting (usually via a common base class).

> (This is the reason that C++ is used in HPC, to the exclusion of other languages.)
Not really.

Rust in the Linux kernel (Google security blog)

Posted Apr 18, 2021 13:14 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

> void doSomething(const std::vector<std::unique_ptr<Obj>> &foo);

Well, the problem is that the function is saying it needs a specific container of a specific type. Instead, the C++ way is to take an iterator pair over a type which behaves as you expect. Granted, without ranges, doing an in-iteration ".get() on each member" is a PITA, but the API is just bad here to begin with. It doesn't support std::set, or the values of a std::map either.

Where you'd use Iterator in Rust, one should use iterator pairs (or the new Container concept which probably supports ranges way better) in C++.
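
A sketch of the iterator-pair style, assuming a hypothetical `Obj` type and function `total_weight()`. Because the function is a template over the iterator type, the same code accepts a `std::vector<std::unique_ptr<Obj>>` and a `std::vector<Obj*>`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Obj {  // hypothetical element type, for illustration only
    int weight;
};

// Works with any iterator range whose elements dereference to an Obj:
// raw pointers, std::unique_ptr<Obj>, std::shared_ptr<Obj>, ...
template <class It>
int total_weight(It first, It last) {
    int sum = 0;
    for (; first != last; ++first) {
        const Obj& obj = **first;  // *first is the pointer-like element
        sum += obj.weight;
    }
    return sum;
}

void iterator_pair_demo() {
    std::vector<std::unique_ptr<Obj>> owned;
    owned.push_back(std::make_unique<Obj>(Obj{2}));
    owned.push_back(std::make_unique<Obj>(Obj{3}));

    std::vector<Obj*> borrowed;
    for (const auto& p : owned) borrowed.push_back(p.get());

    assert(total_weight(owned.begin(), owned.end()) == 5);
    assert(total_weight(borrowed.begin(), borrowed.end()) == 5);
}
```

The values of a `std::map` would still need an adapter, since dereferencing its iterator yields a key/value pair rather than a pointer.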

> > (This is the reason that C++ is used in HPC, to the exclusion of other languages.)
> Not really.

Agreed. Fortran is still *really* important here because, for number crunching, its aliasing rules are way more sensible than C's and allow for far better optimization.

Rust in the Linux kernel (Google security blog)

Posted Apr 22, 2021 7:26 UTC (Thu) by ncm (guest, #165) [Link] (7 responses)

It is remarkable to have used C++ for so long, yet learned so little.

It is far from true that "every complex C++ library either falls back to std::shared_ptr or it mandates that objects use intrusive reference counting." I rarely, today, see any such library, and use none, although they were common enough back in the '90s. So, another falsehood.

The kernel over-uses refcounting for reasons that often relate to its very weak implementation language. In C++ kernel code, obeying ref-counting rules could and would be automated, but in no case via use of std::shared_ptr, or operators new/delete. std::unique_ptr is adaptable enough to be useful in kernel code, so would be used, with exactly zero overhead vs. existing C code.
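
As a rough illustration of automating get/put rules with `std::unique_ptr`, the sketch below wraps a hypothetical C-style refcounted `device` struct (the names `device_get`, `device_put` and `device_ref` are invented here and are not kernel APIs). With a stateless deleter the handle is pointer-sized on mainstream standard libraries, although the standard does not strictly guarantee it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style refcounted object with explicit get/put helpers.
struct device {
    int refs;
};
inline device* device_get(device* d) { ++d->refs; return d; }
inline void device_put(device* d) { --d->refs; }

// Stateless deleter: destroying the handle drops exactly one reference.
struct device_put_deleter {
    void operator()(device* d) const noexcept { device_put(d); }
};
using device_ref = std::unique_ptr<device, device_put_deleter>;

// Holds on common implementations because the empty deleter takes no space.
static_assert(sizeof(device_ref) == sizeof(device*), "handle is pointer-sized");

// Takes a new reference and returns a handle that releases it automatically.
device_ref take_ref(device& d) { return device_ref(device_get(&d)); }

void refcount_demo() {
    device dev{1};
    {
        device_ref r = take_ref(dev);
        assert(dev.refs == 2);
    }  // device_put runs here, with no explicit call
    assert(dev.refs == 1);
}
```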

An easy way to use C++ in the kernel, today, is via eBPF; LLVM is happy to generate eBPF originating from C++ sources. It causes no problems of any kind.

C++ is, in fact, the preferred language for simulations at LLNL and Los Alamos, for reasons. Even when, at the lowest levels ("BLAS") Fortran code runs array operations, they are most often in service of C++ programs. There are sound reasons why both labs, along with others (e.g. CERN, Argonne), participate very actively at ISO Standard C++ meetings.

Rust in the Linux kernel (Google security blog)

Posted Apr 22, 2021 10:50 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> I rarely, today, see any such library, and use none, although they were common enough back in the '90s. So, another falsehood.
Well, there are really not that many recent complex C++ libraries. But I can point you to Qt and TensorFlow; both are refcount-happy.

> In C++ kernel code, obeying ref-counting rules could and would be automated, but in no case via use of std::shared_ptr, or operators new/delete
I don't really see any difference. C++ code will inevitably result in refcounting slowly expanding everywhere. Pure C makes refcounting explicit, so people tend to avoid it. C++ requires discipline to avoid refcounting.

> An easy way to use C++ in the kernel, today, is via eBPF; LLVM is happy to generate eBPF originating from C++ sources. It causes no problems of any kind.
We both know that eBPF C++ does not support even a small fraction of the total C++ functionality. In particular, no dynamic allocation.

> C++ is, in fact, the preferred language for simulations at LLNL and Los Alamos, for reasons. Even when, at the lowest levels ("BLAS") Fortran code runs array operations, they are most often in service of C++ programs. There are sound reasons why both labs, along with others (e.g. CERN, Argonne), participate very actively at ISO Standard C++ meetings.
C++ is used because these code bases are ancient, and go back to times when C++ was pretty much the only choice for non-Fortran code.

But even in HPC right now Python is pushing away C++.

Rust in the Linux kernel (Google security blog)

Posted Apr 23, 2021 17:43 UTC (Fri) by ncm (guest, #165) [Link] (5 responses)

> Well, there are really not that many recent complex C++ libraries.

> C++ code will inevitably result in refcounting slowly expanding everywhere.

> eBPF C++ does not support even a small fraction of the total C++ functionality.

When you need to trot out one obvious falsehood after another to try to hold up your argument, that tells us all we really need to know.

> C++ is used because these code bases are ancient

Ancient codebases do not rely on refining features going into a new Standard. The intense interest in new Standards evident at these labs is inconsistent with your claim.

So, another falsehood. Really, best to stop digging. We understand that you have not kept up.

Rust in the Linux kernel (Google security blog)

Posted Apr 23, 2021 18:52 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

> When you need to trot out one obvious falsehood after another to try to hold up your argument, that tells us all we really need to know.
You seem to have lost track completely. For example, how do I allocate memory in eBPF programs? Never mind using unbounded loops and recursion.

> Ancient codebases do not rely on refining features going into a new Standard. The intense interest in new Standards evident at these labs is inconsistent with your claim.
Incorrect. I worked with ROOT and it successfully migrated over time from pre-03 C++ to C++17.

Rust in the Linux kernel (Google security blog)

Posted Apr 28, 2021 5:32 UTC (Wed) by ncm (guest, #165) [Link] (3 responses)

You will need a rather better argument to convince anyone that eBPF is not useful.

Rust in the Linux kernel (Google security blog)

Posted Apr 28, 2021 8:02 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

Whut? Are you a Markov chain based random phrase generator?

Rust in the Linux kernel (Google security blog)

Posted Apr 30, 2021 1:30 UTC (Fri) by ncm (guest, #165) [Link] (1 responses)

Really, stop digging.

Rust in the Linux kernel (Google security blog)

Posted May 4, 2021 14:07 UTC (Tue) by nye (subscriber, #51576) [Link]

This is very confusing. I think maybe your last few replies have been posted to the wrong article by mistake?

Rust in the Linux kernel (Google security blog)

Posted Apr 18, 2021 18:14 UTC (Sun) by ssmith32 (subscriber, #72404) [Link] (1 responses)

I have disagreed with Cyberax before, but to say he doesn't know c++ or contribute to the discussion is silly, at least based on my experience.

In fact this exact discussion is helpful to me, as I'm escaping my day to day of Java by resurrecting an old c++ program written before any of this _ptr stuff, and have been trying to bring it up to date, so I can then... replace as much as is reasonable with Rust 😄

c++ is a big, old language. Calling someone ignorant because they don't use it how you expect is not helpful.

Rust in the Linux kernel (Google security blog)

Posted Apr 22, 2021 7:37 UTC (Thu) by ncm (guest, #165) [Link]

In general, when somebody needs to lie to make his case, the most reasonable conclusion is that he has no case to make. We would need good reasons to reconsider such a conclusion.