Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 18:49 UTC (Thu) by matthias (subscriber, #94967)
In reply to: Rust in the Linux kernel (Google security blog) by mss
Parent article: Rust in the Linux kernel (Google security blog)

> The solution here is simple: just use C++ instead of Rust.
That is no solution.
> It is supported on practically every arch, has well-optimized compilers (been there for decades), smart (including reference-counted) pointers, RAII so bottom half of a function doesn't need to be a chain of "goto" labels for error handling, templates to replace ugly preprocessor-generated code with something actually manageable and much more...

Yes, just that the syntax for using smart pointers (which are the only pointers allowed in safe code) is very very ugly. Also, I do not see how safety is enforced. I am no expert in C++, but from what I have seen, the constructor of a smart pointer takes a standard C pointer as argument. How is one supposed to create a smart pointer without relating to unsafe code? Are there any implicit checks that throw a compile time error if I initialize a smart pointer with a possibly invalid pointer?

Also, from the explanations that I have seen, the smart pointers are there to enforce that the destructor is called (and memory freed) once the pointer goes out of scope. What is with the reverse problem, the object goes out of scope while the pointer is still there. How is this handled?

>Additional bonus is that C++ code can call C code directly at first, OOP wrappers can be introduced later.

That is in principle possible with rust, but it would be a very bad idea. Every call into the C code is unsafe. Therefore safe abstractions should be generated. If you want to use C++ instead of rust for safe development, you should also never directly call into C code without a safe abstraction.

> Rust memory safety is no longer guaranteed when the code calls an unsafe function. Which is pretty much every kernel API right now.

Yes, therefore safe abstractions should be added. These are safe to use as long as the underlying C-code has no bugs. Of course, if there are bugs in the C-code then all bets are off, regardless of which language calls into the code.

> From the linked article:
> > for every unsafe function, the developer must document the requirements that need to be satisfied by callers to ensure that its usage is safe;
> > additionally, for every call to unsafe functions (or usage of unsafe constructs like dereferencing a raw pointer),
> > the developer must document the justification for why it is safe to do so.
> You can easily do such manual tagging with C++, too.

The tagging (with unsafe keyword) is enforced by the compiler. If you do not tag, this is a compile time error. Can you do this with a C++ compiler? I mean emitting a compile time error for all unsafe code that is not tagged as such? Dereferencing a standard C pointer obviously has to count as unsafe code.

What the article says is that additionally to the tagging, there has to be extensive documentation specifying exactly when and why the interface is safe to use. This is just good rust practice.

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 19:01 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

> How is one supposed to create a smart pointer without relating to unsafe code?
Modern C++ allows to forward constructor arguments. E.g.:

struct TestObject {
TestObject(int arg);
};
std::shared_ptr<TestObject> obj = std::make_shared<TestObject>(42);

> What is with the reverse problem, the object goes out of scope while the pointer is still there. How is this handled?
This shouldn't happen. Your smart pointer should encapsulate the ownership.

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 19:17 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> This shouldn't happen. Your smart pointer should encapsulate the ownership.

smrtptr.get() is a thing. As long as that is callable and not banned via static analysis, you lose the lifetime "encapsulation" the smart pointers are supposed to provide. Unfortunately, `std::optional<T&>` is busted in C++ and is supposed to either be spelled `std::optional<std::reference_wrapper<T>>` or T*. Guess which one everyone is going to use?
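
The point about get() can be sketched as follows. This is a minimal illustration, not code from the thread: TestObject follows the earlier example, while raw_pointer_outlives_object() and the weak_ptr observer are names made up here. The raw pointer is never dereferenced after the object dies, because that would be undefined behaviour; the sketch only shows that the compiler does not object to the pointer surviving.

```cpp
#include <cassert>
#include <memory>

struct TestObject {
    explicit TestObject(int arg) : value(arg) {}
    int value;
};

// Hypothetical helper: returns true if the object has been destroyed
// while a raw pointer obtained through get() is still lying around.
bool raw_pointer_outlives_object() {
    TestObject* raw = nullptr;
    std::weak_ptr<TestObject> observer;
    {
        auto obj = std::make_shared<TestObject>(42);
        raw = obj.get();           // escapes the smart pointer's ownership
        observer = obj;            // observes the lifetime without owning
        assert(raw->value == 42);  // fine: obj is still alive here
    }  // last shared_ptr goes away, the TestObject is destroyed
    // `raw` still holds the old address. Dereferencing it now would be
    // undefined behaviour, yet nothing in the language rejects it.
    return observer.expired();
}
```

Nothing stops the caller from using the stale pointer after the inner scope ends; only review or static analysis would catch it.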

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 19:19 UTC (Thu) by matthias (subscriber, #94967) [Link] (2 responses)

> > How is one supposed to create a smart pointer without relating to unsafe code?
> Modern C++ allows to forward constructor arguments. E.g.:
> struct TestObject {
> TestObject(int arg);
> };
> std::shared_ptr<TestObject> obj = std::make_shared<TestObject>(42);

OK. This looks as if it should work. Not exactly what I call beautiful code. Probably unavoidable if one wants to stay compatible with old code.

> > What is with the reverse problem, the object goes out of scope while the pointer is still there. How is this handled?
> This shouldn't happen. Your smart pointer should encapsulate the ownership.

How does this work if the object is on the stack? Or is this not allowed to use pointers to objects on the stack?

Or if I have a composite object, how can I create a smart pointer to a sub-object, e.g. to pass it in a function call? Probably this can be done with reference counted pointers, but reference counting has a runtime overhead. In rust one can simply create a pointer to a sub-object and use it. Of course this is only allowed if it is clear by scope that the object always outlives the pointer to the sub-object. And this works without any runtime overhead. The compiler will verify at compile time that the lifetimes work out (or throw an error).

Rust in the Linux kernel (Google security blog)

Posted Apr 19, 2021 14:47 UTC (Mon) by nye (subscriber, #51576) [Link]

> > std::shared_ptr<TestObject> obj = std::make_shared<TestObject>(42);
> OK. This looks as if it should work. Not exactly what I call beautiful code. Probably unavoidable if one wants to stay compatible with old code.

This could be spelled "auto obj = std::make_shared<TestObject>(42);", if that makes you any happier.

> > > What is with the reverse problem, the object goes out of scope while the pointer is still there. How is this handled?
> > This shouldn't happen. Your smart pointer should encapsulate the ownership.
> How does this work if the object is on the stack? Or is this not allowed to use pointers to objects on the stack?

I don't think many people would dispute that C++ is a language which provides you with as many foot cannons as you could possibly want, so indeed it's "allowed" in the sense that it will compile, and in principle it could be done safely in certain circumstances - although I'd argue that in those cases it's actually worse than using a raw pointer thanks to the illusion of safety that it offers.

I wouldn't expect it to pass even the most cursory code review though, because creating a shared_ptr/unique_ptr by passing in a raw pointer to some existing object stands out much like an "unsafe" block in Rust. Actually more so, because I'm not sure there are any circumstances where it's unavoidable, unlike "unsafe". The same is true of using "new" or "delete" by the way, let alone "malloc" or "free": for a pure C++ project I consider them a very strong code smell.

Granted there will be some cases where it's avoidable but you want to do it anyway, eg because you have a strong performance reason to prefer a stack object over a heap allocation. Or maybe where you're implementing a glue layer to some legacy code, although then I'd probably expect it to be used with the return value of some legacy function, so at least you know it's not a stack object (or that function is already completely broken). Much like "unsafe", these should hopefully be cases that are infrequent enough that they can be given special attention.

> Or if I have a composite object, how can I create a smart pointer to a sub-object, e.g. to pass it in a function call? Probably this can be done with reference counted pointers, but reference counting has a runtime overhead. In rust one can simply create a pointer to a sub-object and use it. Of course this is only allowed if it is clear by scope that the object always outlives the pointer to the sub-object. And this works without any runtime overhead. The compiler will verify at compile time that the lifetimes work out (or throw an error).

Certainly it's not as easy to get it right as it is in Rust - this is IMO where we start getting into the areas where the benefits of Rust begin to outweigh the cost of learning a new language. For me personally this is one of the things that's helped persuade me that it's worth switching. (Rust's enums are another major one.)

To me Rust feels a lot like a successor to C++ specifically, in that it seems to take a lot of ideas from C++ and do them right from the start[0]. The development of C++ is a history of following numerous blind alleys over a few decades, where all of those dead-end paths need to be left open for compatibility. It now provides you with the route to do it right, typically with a lot less pain than ten or twenty years ago, let alone thirty - but also with the route to do it wrong, and the best you can hope for is a compiler warning that tells you "this path takes you to your doom - are you sure that's what you want?". Rust says "actually we're going to brick up most of those paths, and the rest are going to be behind a big iron gate with a padlock labelled 'unsafe'". This is very attractive because it means you can spend more time thinking about what you're doing and less time obsessively checking the map.

[0] In particular, it's the only language I've tried in which the approach to RAII seems to me to be obviously correct.

Rust in the Linux kernel (Google security blog)

Posted Apr 20, 2021 8:40 UTC (Tue) by jamesh (guest, #1159) [Link]

> Or if I have a composite object, how can I create a smart pointer to a sub-object, e.g. to pass it in a function call?

If I have a "shared_ptr<T> parent" reference to a composite object, I can create a shared pointer ref to a sub-object with "shared_ptr<U>(parent, parent->child)". This new shared_ptr will use the same reference count bookkeeping data as the parent object.

However, if I expected my function to only be called with objects managed with unique ownership, I could use "const unique_ptr<T>&" as the argument type. That prevents copying or moving the variable, so the function cannot hold on to it past returning (assuming it doesn't bypass the smart pointer with get()).
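
Both techniques can be sketched like this. Parent, Child, child_of() and read_value() are hypothetical names chosen for illustration. Note that the aliasing constructor takes a pointer to the sub-object, so for a member that is stored by value the second argument is written &parent->child.

```cpp
#include <cassert>
#include <memory>

struct Child { int value = 0; };
struct Parent { Child child; };

// Aliasing constructor: the result points at the member but shares the
// reference count of the whole Parent, so the Parent stays alive.
std::shared_ptr<Child> child_of(const std::shared_ptr<Parent>& parent) {
    return std::shared_ptr<Child>(parent, &parent->child);
}

// A const reference to a unique_ptr can be used to reach the object,
// but it can be neither copied nor moved from, so ownership stays put.
int read_value(const std::unique_ptr<Parent>& p) {
    assert(p != nullptr);
    return p->child.value;
}
```

The aliasing pointer still pays for reference counting at run time, which is the overhead the question was about.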

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 20:21 UTC (Thu) by mss (subscriber, #138799) [Link] (20 responses)

> Yes, just that the syntax for using smart pointers (which are the only pointers allowed in safe code) is very very ugly.

I guess beauty is in the eye of the beholder.
By the way, {unique,shared,weak}_ptr are STL things, not core C++ language constructs.
Kernel implementation is free to implement its own templates here.

> What is with the reverse problem, the object goes out of scope while the pointer is still there.
> How does this work if the object is on the stack? Or is this not allowed to use pointers to objects on the stack?

Smart pointers manage dynamically allocated objects, like the ones allocated by kmalloc() or malloc() or operator new(),
not the ones with automatic storage duration (that's the ones that are created on stack and destroyed when they go out of scope).

> I have a composite object, how can I create a smart pointer to a sub-object, e.g. to pass it in a function call?

If the function never stores the pointer for use after it has returned then it should just accept a raw pointer.

Otherwise, smart pointers are tied to the memory allocation / deallocation calls - if the composite object is deallocated as one object then it should be managed by one smart pointer.
If it consists of multiple allocations for subobjects then it might make sense to manage them separately.

> How is one supposed to create a smart pointer without relating to unsafe code?

I totally disagree that code should not have any raw pointers.

Smart pointers & co. are just tools for writing better and safer code.
They do not need to be used where there is close to zero possibility of making a mistake (they have a cost, too).

For example, let's consider the code below (when exceptions aren't used):
> Foo *foo = new Foo;
> if (!foo)
> return false;
> foo->action(bar);
> delete foo;

I don't see a point of introducing a wrapper for "foo" here - even when the compiler is smart enough to optimize it out it is still a compile time cost.

> Are there any implicit checks that throw a compile time error if I initialize a smart pointer with a possibly invalid pointer?

How do you define a "possibly invalid pointer"?
If raw pointers are disallowed then how do you access, for example, MMIO registers at constant memory address?
Pointers to these by definition aren't given out by an allocation function.

After all, we are talking about reducing the number of (accidental) bugs here, not combating actively malicious behavior.

> These are safe to use as long as the underlying C-code has no bugs.
> Of course, if there are bugs in the C-code then all bets are off, regardless of which language calls into the code.

That was my point when I have said that:
> Rust memory safety is no longer guaranteed when the code calls an unsafe function. Which is pretty much every kernel API right now.

And due to, for example, the above thing about register access, you'll eventually end up calling some unsafe function.

> The tagging (with unsafe keyword) is enforced by the compiler. If you do not tag, this is a compile time error.

So the compiler enforcement is kind of all-or-nothing: either the function is untagged and it is 100% safe or the function is tagged and it is unsafe.
It would be better if it actually checked more specific requirements for the input or output parameters (something sparse and other checkers currently do).

> Dereferencing a standard C pointer obviously has to count as unsafe code.

As I wrote above, you will never escape the necessity of having raw pointers, and using smart pointers where they are clearly unnecessary has some costs.

> What the article says is that additionally to the tagging, there has to be extensive documentation specifying exactly when and why the interface is safe to use.
> This is just good rust practice.

That's great!
It would definitely be a good C++ practice, too, to do so.

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 20:34 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (7 responses)

> If the function never stores the pointer for use after it has returned then it should just accept a raw pointer.

No, you take a reference. Raw pointers always have `nullptr` which is a valid value. References do not (unless one has already performed UB).

> For example, let's consider the code below (when exceptions aren't used): (snipped)

`new` never returns `nullptr`. It throws an exception. Freestanding might change that, but that's not standard (yet).
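
In hosted standard C++ the null check in the snipped example only does something with the nothrow form of new, since the plain form reports failure by throwing std::bad_alloc. A minimal sketch, where Foo::action() and run_action() are placeholders and not anything from the kernel:

```cpp
#include <cassert>
#include <new>

struct Foo {
    int action(int bar) { return bar + 1; }
};

// Hypothetical helper: returns -1 if the allocation fails, otherwise
// the result of action().
int run_action(int bar) {
    Foo* foo = new (std::nothrow) Foo;  // nothrow form: nullptr on failure
    if (!foo)
        return -1;
    int result = foo->action(bar);
    delete foo;
    return result;
}
```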

> That was my point when I have said that:
> > Rust memory safety is no longer guaranteed when the code calls an unsafe function. Which is pretty much every kernel API right now.

The thing is that *all* of the C++ code is unsafe and needs auditing. Rust needs auditing at the boundaries of unsafe blocks. A *much* more doable task. unsafe blocks are not "I get to do whatever I want". They are "I am doing something I know is safe, but I cannot convince the compiler of that". One can still get it wrong, but, again, the review burden is *way* lower.

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 20:59 UTC (Thu) by mss (subscriber, #138799) [Link] (6 responses)

> > If the function never stores the pointer for use after it has returned then it should just accept a raw pointer.
> No, you take a reference. Raw pointers always have `nullptr` which is a valid value. References do not (unless one has already performed UB).

For const reference parameters you are right.
Non-const reference function parameters are generally frowned upon.

Also, sometimes you also want to act differently on a NULL pointer, using it for example for making some of the parameters optional.

> `new` never returns `nullptr`. It throws an exception. Freestanding might change that, but that's not standard (yet).

That's why I have said "(when exceptions aren't used)".
We'll need a special operator new for kernel anyway to pass things like GFP_KERNEL.

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 21:10 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> Also, sometimes you also want to act differently on a NULL pointer, using it for example for making some of the parameters optional.

As I said elsewhere in this thread, you *really* want `std::optional<T&>`, but that has an awful functional spelling or T*.
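
The two spellings can be put side by side. This is a sketch that assumes C++17; scaled_ptr() and scaled_opt() are made-up names for the same function with an optional parameter.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Spelling 1: a raw pointer, where nullptr means "not supplied".
int scaled_ptr(int value, const int* factor) {
    return factor ? value * *factor : value;
}

// Spelling 2: an optional holding a reference wrapper.
int scaled_opt(int value,
               std::optional<std::reference_wrapper<const int>> factor) {
    return factor ? value * factor->get() : value;
}
```

A caller writes scaled_ptr(5, &f) or scaled_opt(5, std::cref(f)), and nullptr or std::nullopt for the missing case. The second form says what it means in the signature, but it is easy to see why the pointer tends to win.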

Rust in the Linux kernel (Google security blog)

Posted Apr 19, 2021 2:03 UTC (Mon) by khim (subscriber, #9252) [Link] (4 responses)

> Non-const reference function parameters are generally frowned upon.

Where have you found that idea? References are very much encouraged for output arguments. Google Style Guide was the only popular exception and even it says today: non-optional output and input/output parameters should usually be references (which cannot be null).

The funny fact: over the last 10 years my code has become much closer to Google-standard code… not because I have changed my habits, but because Google rewrote its Style Guide.

> Also, sometimes you also want to act differently on a NULL pointer, using it for example for making some of the parameters optional.

That use-case is valid and in fact Google changed their rules about references specifically to allow that: if you see that function accepts pointer — then you know it's safe to pass nullptr, too.

> We'll need a special operator new for kernel anyway to pass things like GFP_KERNEL.

Thankfully, overloading operator new has been supported since C++98 (early C++ used assignment to this in the constructor instead, but that idea was shot down very early).
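
A user-space sketch of what such an overload could look like. AllocFlags is a hypothetical stand-in for flags like GFP_KERNEL, Device is a made-up type, and std::malloc stands in for whatever allocator a kernel would actually call. Because the allocation function is declared noexcept, a null return makes the new-expression yield nullptr instead of throwing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for kernel allocation flags such as GFP_KERNEL.
enum class AllocFlags { Normal, Atomic };

struct Device {
    int id = 0;

    // Placement-style operator new that receives the extra argument.
    static void* operator new(std::size_t size, AllocFlags flags) noexcept {
        last_flags() = flags;      // recorded only so the demo is observable
        return std::malloc(size);  // assumption: a kernel would use its own allocator
    }
    // Matching placement delete, called if the constructor throws.
    static void operator delete(void* p, AllocFlags) noexcept { std::free(p); }
    // Usual delete, called by a plain `delete ptr`.
    static void operator delete(void* p) noexcept { std::free(p); }

    static AllocFlags& last_flags() {
        static AllocFlags flags = AllocFlags::Normal;
        return flags;
    }
};

// Hypothetical helper: allocates with the given flags and frees again.
bool allocate_and_free(AllocFlags flags) {
    Device* d = new (flags) Device;  // flags are passed to operator new
    if (!d)
        return false;
    bool seen = (Device::last_flags() == flags);
    delete d;
    return seen;
}
```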

Rust in the Linux kernel (Google security blog)

Posted Apr 19, 2021 8:57 UTC (Mon) by zlynx (guest, #2285) [Link] (1 responses)

I have stopped using output reference parameters in any new functions I write in C++. I return a tuple instead. Or suck it up, do some extra typing, and return a nice struct with named members.

If they are return values, then make them return values! It was always and forever a kind of stupid hack to use pointers (and later, references) to "return" things.
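
For comparison, here are both styles for a function with two results. DivResult, divide() and divide_out() are illustrative names only, and the structured binding in the usage note assumes C++17.

```cpp
#include <cassert>

struct DivResult {
    int quotient;
    int remainder;
};

// Output-parameter style: results come back through references.
void divide_out(int a, int b, int& quotient, int& remainder) {
    quotient = a / b;
    remainder = a % b;
}

// Return-value style: results come back as a struct with named members.
DivResult divide(int a, int b) {
    return {a / b, a % b};
}
```

With the second form a caller can write auto [q, r] = divide(7, 2); and never declare an uninitialized variable just to have somewhere to put the result.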

Rust in the Linux kernel (Google security blog)

Posted Apr 19, 2021 10:35 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

> I return a tuple instead. Or suck it up, do some extra typing, and return a nice struct with named members.

This is, generally, my preferred behavior, but in *really* performance-sensitive code (or if the allocator is special in some places), you might want the caller to allocate the memory for use in the function. This is something usually borne out by usage patterns or project requirements; not something I would do during the design phase on a whim. C++ RVO and/or move operators can probably help, but ensuring you have a move isn't always the easiest thing.

Such input parameters do make STL algorithms hard to parallelize since one either needs to pre-allocate nThreads objects for the threads to use or otherwise handle the multiple writer problem.

Rust in the Linux kernel (Google security blog)

Posted Apr 19, 2021 13:30 UTC (Mon) by mss (subscriber, #138799) [Link] (1 responses)

> Where have you found that idea? References are very much encouraged for output arguments.
> Google Style Guide was the only popular exception and even it says today: non-optional output and input/output parameters should usually be references
> (which cannot be null).

Google Style Guide has a lot of traction.
This is a recent change, as far as I can see it was done less than a year ago.

The idea is (was?) that if you have a "do_something(foo);" and "foo" isn't a pointer then in the absence of non-const reference function arguments "do_something()" won't modify "foo" since it will operate either on a copy of "foo" or a const reference to it.

Rust in the Linux kernel (Google security blog)

Posted Apr 19, 2021 13:49 UTC (Mon) by khim (subscriber, #9252) [Link]

Yes, that was the idea. But it was a really weak signal: even if do_something couldn't modify foo it can still modify objects which are referenced by foo.

And C++11 and auto made the question of whether foo is a pointer much less obvious.

Another thing that C++11 did was rvalue references and C++17 added mandatory RVO which made move-only types much more feasible.

Google contemplated the change for a few years but since this, basically, separated them from everyone else in the C++ world… they finally did that.

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 21:32 UTC (Thu) by matthias (subscriber, #94967) [Link] (10 responses)

> > Yes, just that the syntax for using smart pointers (which are the only pointers allowed in safe code) is very very ugly.
> I guess beauty is in the eye of the beholder.
> By the way, {unique,shared,weak}_ptr are STL things, not core C++ language constructs.

What are the core C++ language constructs for memory safe code?

> Kernel implementation is free to implement its own templates here.
> > What is with the reverse problem, the object goes out of scope while the pointer is still there.
> > How does this work if the object is on the stack? Or is this not allowed to use pointers to objects on the stack?
> Smart pointers manage dynamically allocated objects, like the ones allocated by kmalloc() or malloc() or operator new(),
> not the ones with automatic storage duration (that's the ones that are created on stack and destroyed when they go out of scope).

Then how is one supposed to use pointers to objects on the stack? This is no problem at all in rust. The compiler will check that the pointer does not outlive the stack frame.

> > I have a composite object, how can I create a smart pointer to a sub-object, e.g. to pass it in a function call?
> If the function never stores the pointer for use after it has returned then it should just accept a raw pointer.

How can a function that accepts a raw pointer be sure that the pointer points to valid data? This is inherently unsafe design.

> Otherwise, smart pointers are tied to the memory allocation / deallocation calls - if the composite object is deallocated as one object then it should be managed by one smart pointer.
> If it consists of multiple allocations for subobjects then it might make sense to manage them separately.

No, I am talking about a big object managed by one pointer, but now I want to pass a pointer to a subobject to some function. In rust, I can just take a pointer to the subobject, as long as the lifetime of the pointer to the subobject is contained in the lifetime of the whole object.

> > How is one supposed to create a smart pointer without relating to unsafe code?
> I totally disagree that code should not have any raw pointers.
> Smart pointers & co. are just tools for writing better and safer code.
> They do not need to be used where there is close to zero possibility of making a mistake (they have a cost, too).
> For example, let's consider the code below (when exceptions aren't used):
> > Foo *foo = new Foo;
> > if (!foo)
> > return false;
> > foo->action(bar);
> > delete foo;

So this code is memory safe because you verified it to be memory safe. This is not the idea of memory safe languages. How complex should code be such that you think that one should actually verify correctness? People will disagree about the complexity that can be checked by humans. And usually programmers are overconfident in producing correct code. So you will end up with programmers using unsafe raw pointers and memory safety violations. And these programmers will say, this code is obviously safe. There is no need for smart pointers, just like you just did with this example code. If you leave it up to the programmers whether they use the safety features, then bugs are inevitable.

> I don't see a point of introducing a wrapper for "foo" here - even when the compiler is smart enough to optimize it out it is still a compile time cost.

And a compile time cost is fine. This compile time cost is just checking that the programmer did not make a mistake. And this check should be there because programmers (like all humans) are fallible.

> > Are there any implicit checks that throw a compile time error if I initialize a smart pointer with a possibly invalid pointer?
> How do you define a "possibly invalid pointer"?

Any pointer where the compiler cannot verify at compile time that it points to valid data. Such accesses should be avoided if possible. Otherwise they have to be clearly tagged.

> If raw pointers are disallowed then how do you access, for example, MMIO registers at constant memory address?
> Pointers to these by definition aren't given out by an allocation function.

This is one of the few things where a raw pointer will be necessary. Accesses to these pointers should get a safe abstraction. Then you will only have a few lines that have to be verified by humans.

> After all, we are talking about reducing the number of (accidental) bugs here, not combating actively malicious behavior.

In rust, the goals are much more than that. You want to directly see where the critical code is that can possibly invalidate your memory safety assumptions. All such code has to be flagged. And should be extensively documented.

> > These are safe to use as long as the underlying C-code has no bugs.
> > Of course, if there are bugs in the C-code then all bets are off, regardless of which language calls into the code.
> That was my point when I have said that:
> > Rust memory safety is no longer guaranteed when the code calls an unsafe function. Which is pretty much every kernel API right now.

You probably did not get the point of why a memory safe language is being introduced to the kernel. Of course, the kernel will not be magically memory safe if 0.1% of the code is rust. The point is that the new code, i.e., the code written in rust, cannot introduce any new memory safety bugs. The wrappers to call C-code are very small. These are safe functions that do an unsafe call. The meaning is: the person who implemented the wrapper has verified the wrapper to be correct. These functions are usually just a few lines. The rust code using the wrappers can then be implemented in safe rust, i.e., without using the unsafe keyword. Thus all this code cannot on its own introduce memory safety violations. This is very different with C++, where there are no checks at all that the code is memory safe. In C++ you just have some aids that make it a bit less likely that the programmer makes an error. In safe rust this is verified.

> And due to, for example, the above thing about register access, you'll eventually end up calling some unsafe function.

Yes, eventually. The function doing the actual register access will usually have just a few lines. All you have to do to ensure memory safety is verify the correctness of these few lines. It does not matter at all what happens outside of this function that does the actual access and provides a safe interface to the outside world.
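
The shape of such a wrapper can be sketched in C++ for comparison, with the caveat that a C++ compiler does not enforce the boundary the way the unsafe keyword does. Register32 is a hypothetical class, and the test below points it at an ordinary variable instead of a real MMIO address.

```cpp
#include <cassert>
#include <cstdint>

// Wraps one 32-bit register. All raw-pointer accesses live in this class,
// so these few lines are what a human has to verify.
class Register32 {
public:
    // Assumption: whoever constructs this guarantees that `addr` is a
    // valid, mapped register for the lifetime of the object.
    explicit Register32(volatile std::uint32_t* addr) : addr_(addr) {}

    std::uint32_t read() const { return *addr_; }
    void write(std::uint32_t value) { *addr_ = value; }
    void set_bits(std::uint32_t mask) { *addr_ = *addr_ | mask; }

private:
    volatile std::uint32_t* addr_;
};
```

Callers only see read(), write() and set_bits(). In Rust the equivalent pointer accesses would sit inside unsafe blocks and the rest of the driver could stay in safe code; here that discipline is a convention rather than something the compiler checks.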

> > The tagging (with unsafe keyword) is enforced by the compiler. If you do not tag, this is a compile time error.
> So the compiler enforcement is kind of all-or-nothing: either the function is untagged and it is 100% safe or the function is tagged and it is unsafe.
> It would be better if it actually checked more specific requirements for the input or output parameters (something sparse and other checkers currently do).

What usually is done is to create a function doing a safe abstraction. All requirements of this function are checked. Inside the function there will be an unsafe keyword, which wraps the part of the code the compiler cannot verify to be correct. This code has to be manually verified. And of course you are free to also use checkers for this. All accesses to the safe interface are verified by the compiler to be memory safe.

> > Dereferencing a standard C pointer obviously has to count as unsafe code.
> As I wrote above, you will never escape the necessity of having raw pointers, and using smart pointers where they are clearly unnecessary has some costs.

There are only very few places where raw pointers are needed. In the kernel, there might be a few more than in normal user space code. And just using raw pointers everywhere in the code because you would lose a fraction of a second in compile time cannot be the answer. If this were an excuse for using raw pointers, then we would always have memory safety bugs because most programmers do not see where the code gets too complex to be manually verified.

> > What the article says is that additionally to the tagging, there has to be extensive documentation specifying exactly when and why the interface is safe to use.
> > This is just good rust practice.
> That's great!
> It would definitely be a good C++ practice, too, to do so.

Yes, but totally infeasible. Your little example would already need documentation why it is safe to dereference a raw pointer. And if you would allow small portions of unsafe code without documentation, then people will start doing larger portions of unsafe code without documentation. Where is the border?

And I definitely do not buy your argument at all, that just because somewhere inside the code you need a small section of unsafe code, all the code is unsafe. There is a clear benefit that only these small sections need to be verified. In C or C++ it is totally impossible to verify the memory safety of the code. All code is possibly unsafe and therefore all code needs to be verified. There is no means to actually see where problems could be. In rust you will end up with just a tiny fraction of the code tagged as unsafe. You can easily find these with grep.

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 22:17 UTC (Thu) by mss (subscriber, #138799) [Link] (9 responses)

> Then how is one supposed to use pointers to objects on the stack? This is no problem at all in rust. The compiler will check that the pointer does not outlive the stack frame.

Automatic storage duration is normally used only for objects that are unnecessary after they go out of scope.
It's a possibility to use such storage duration, not a hard requirement.

> How can a function that accepts a raw pointer be sure that the pointer points to valid data? This is inherently unsafe design.

It's the caller's responsibility to not provide any invalid pointers to the function.

> No, I am talking about a big object managed by one pointer, but now I want to pass a pointer to a subobject to some function.
> In rust, I can just take a pointer to the subobject, as long as the lifetime of the pointer to the subobject is contained in the lifetime of the whole object.

It depends what do you specifically mean by a subobject.
If it's like a class field then normally one does not provide a direct pointer to the field to an unrelated code but implements a specific interface and provides it instead.

> How complex should code be such that you think that one should actually verify correctness?
> People will disagree about the complexity that can be checked by humans.

There is no "scientifically-correct" answer to that question, since it's mostly an individual opinion.
One can simply identify obviously-correct code (like in my example) and treat any disagreement over this as evidence that the code is not obviously-correct.

> Then you will only have a few lines that have to be verified by humans.

"A few lines" of unsafe code in an OS kernel?
Even the current C code in the Linux kernel is not standard-compliant and that's already a rather low bar...
And this is done for a reason (performance).

> And if you would allow small portions of unsafe code without documentation,
> then people will start doing larger portions of unsafe code without documentation. Where is the border?

As I have said above, simply institute a document-on-doubt-of-obviousness policy.

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 23:00 UTC (Thu) by matthias (subscriber, #94967) [Link] (8 responses)

> > How can a function that accepts a raw pointer be sure that the pointer points to valid data? This is inherently unsafe design.
> It's the caller's responsibility to not provide any invalid pointers to the function.
Of course it is. And it should be the compiler's responsibility to check this. However, this is impossible with raw pointers. It is not even clear from the function signature whether a null pointer is correct or not. In some functions this is fine and has defined semantics, in others it is not. This is what I mean by inherently unsafe, the compiler cannot check correctness. Every such function call would need documentation of why it is safe.
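To illustrate what "clear from the function signature" can look like, here is a minimal Rust sketch. The `Device` type and both function names are made up for this example and do not come from any kernel API.

```rust
/// Hypothetical device record, used only for illustration.
struct Device {
    id: u32,
}

/// `&Device` can never be null, so the signature alone says
/// that a valid device is required.
fn device_id(dev: &Device) -> u32 {
    dev.id
}

/// `Option<&Device>` says that "no device" is a legal argument,
/// and the compiler forces the callee to handle that case.
fn device_id_or_default(dev: Option<&Device>) -> u32 {
    match dev {
        Some(d) => d.id,
        None => 0, // assumed default for the absent case
    }
}
```

A caller writes `device_id(&dev)` or `device_id_or_default(None)`; passing "nothing" to the first function simply does not type-check.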

> > No, I am talking about a big object managed by one pointer, but now I want to pass a pointer to a subobject to some function.
> > In rust, I can just take a pointer to the subobject, as long as the lifetime of the pointer to the subobject is contained in the lifetime of the whole object.
> It depends on what specifically you mean by a subobject.
> If it's like a class field then normally one does not provide a direct pointer to the field to unrelated code but implements a specific interface and provides it instead.

So, one is not supposed to use getters for fields of an object?

> > How complex should code be such that you think that one should actually verify correctness?
> > People will disagree about the complexity that can be checked by humans.
> There is no "scientifically-correct" answer to that question, since it's mostly an individual opinion.
> One can simply identify obviously-correct code (like in my example) and treat any disagreement over this as evidence that the code is not obviously-correct.

And rust has a pretty clear opinion on this. A dereference of a raw pointer is never obviously correct. This rather strict judgement of course cannot work in C++. By rust standards, almost every line of C++ is not obviously correct.

> > Then you will only have a few lines that have to be verified by humans.
> "A few lines" of unsafe code in an OS kernel?
> Even the current C code in the Linux kernel is not standard-compliant and that's already a rather low bar...
> And this is done for a reason (performance).

Of course, in the kernel it will be a few lines more. Still, it would be far less than with C or C++ code. Most of the kernel can be written in safe rust. It is mostly the low-level primitives that would need unsafe code. Calling these primitives can be safe code. Today all of the kernel code is unsafe code.
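A minimal sketch of that split, assuming a made-up primitive (`first_or_zero`) and a made-up caller (`sum_of_firsts`); neither is real kernel code. Only the `unsafe` block needs human review, and it is easy to find with grep.

```rust
/// Hypothetical low-level primitive: reads the first element through
/// a raw pointer. The `unsafe` block is the only part to audit by hand.
fn first_or_zero(data: &[u32]) -> u32 {
    if data.is_empty() {
        return 0;
    }
    // SAFETY: the slice is non-empty, so `as_ptr()` points to at least
    // one valid, initialized `u32`.
    unsafe { *data.as_ptr() }
}

/// Safe caller: no `unsafe` here, so the compiler checks all of it.
fn sum_of_firsts(rows: &[Vec<u32>]) -> u32 {
    rows.iter().map(|r| first_or_zero(r)).sum()
}
```

The primitive exposes a safe signature, so code built on top of it never has to mention `unsafe` at all.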

Rust in the Linux kernel (Google security blog)

Posted Apr 15, 2021 23:19 UTC (Thu) by mss (subscriber, #138799) [Link] (7 responses)

> Of course it is. And it should be the compiler's responsibility to check this.
> However, this is impossible with raw pointers. It is not even clear from the function signature whether a null pointer is correct or not.
> In some functions this is fine and has defined semantics, in others it is not.
> This is what I mean by inherently unsafe, the compiler cannot check correctness.

GCC has a "nonnull" attribute that does exactly that - warns if you passed a NULL pointer as that function argument.
Microsoft SAL annotations allow even more.

In practice, this is not much of a problem, since one assumes that a function does not allow NULL pointers as parameters unless specifically described as such.
And in order to use a new function one has to read its docs anyway to learn what exactly it does.

> So, one is not supposed to use getters for fields of an object?

An interface will provide getters.

> Most of the kernel can be written in safe rust. It is mostly the low-level primitives that would need unsafe code. Calling these primitives can be safe code.

That's mostly an opinion-type statement, other commenters have already stated there is a runtime cost of Rust code,
so anywhere performance is the most important consideration it isn't likely the best choice.

Once again, I would like to say that I am not against having Rust in kernel per se.

Rust in the Linux kernel (Google security blog)

Posted Apr 16, 2021 10:56 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (4 responses)

> That's mostly an opinion-type statement, other commenters have already stated there is a runtime cost of Rust code, so anywhere performance is the most important consideration it isn't likely the best choice.

Versus C++? Unlikely. C? More plausible, but it really depends on the context. It's definitely not something one can make a blanket statement about. Rust compiles down to machine code just like them. If you're referring to bounds checks, that is done in debug builds by default only. Do you have links to these claims or can you be more specific?

Rust in the Linux kernel (Google security blog)

Posted Apr 16, 2021 12:05 UTC (Fri) by matthias (subscriber, #94967) [Link]

Bound checks are enabled in release builds. They are crucial for memory safety. What is usually only done in debug builds is integer overflow checking.

In many cases bound checks are necessary (and explicitly added in C-code). And if you do manual bound checking (e.g. to handle errors) the rust compiler will usually see this and optimize away the automatic bound checks.

If it is really crucial for performance and one is sure that the index cannot be out of bounds, then bound checks can of course be omitted for some crucial code path.
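A small sketch of these cases, with function names invented for the example. `get`, `get_unchecked` and `checked_add` are standard library methods; everything else is an assumption made for illustration.

```rust
/// `get` performs the bounds check and returns `None` instead of
/// panicking when the index is out of range.
fn checked_read(data: &[u8], i: usize) -> Option<u8> {
    data.get(i).copied()
}

/// A manual check before indexing: the compiler can usually see that
/// the automatic check in `data[i]` is redundant and remove it.
fn read_or_zero(data: &[u8], i: usize) -> u8 {
    if i < data.len() { data[i] } else { 0 }
}

/// Opting out on a hot path: the proof obligation moves to the author.
fn sum_first_n(data: &[u8], n: usize) -> u32 {
    assert!(n <= data.len());
    let mut sum = 0u32;
    for i in 0..n {
        // SAFETY: `i < n` and `n <= data.len()` by the assert above.
        sum += u32::from(unsafe { *data.get_unchecked(i) });
    }
    sum
}

/// Explicit overflow handling behaves the same in debug and release.
fn add_checked(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}
```

Plain `a + b` on integers is the case where debug and release builds differ by default, which is why the explicit `checked_add` is shown here.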

Rust in the Linux kernel (Google security blog)

Posted Apr 17, 2021 12:41 UTC (Sat) by jezuch (subscriber, #52988) [Link] (2 responses)

Interestingly, what people find out with Rust is that safe code is easier to optimize by the compiler, because the compiler can prove more things about it. I've seen stories where people thought that they could make the code faster by reimplementing a small section using unsafe code (because it allows more low-level bit-twiddling etc.) and found out that it was actually slower.

One thing in particular is that with the ownership model you can always say what is aliased to what. As a result, the compiler wants to tell LLVM very often that this particular thing is not aliased. It wants to, but it can't, because doing so exposed so many bugs in LLVM, which are never hit with C and C++, where you have to assume that everything can be aliased always.

As I understand it, Rust is not the fastest game in town mostly because the bitcode it generates is so lousy.

Rust in the Linux kernel (Google security blog)

Posted Apr 17, 2021 12:58 UTC (Sat) by mathstuf (subscriber, #69389) [Link] (1 responses)

> One thing in particular is that with the ownership model you can always say what is aliased to what. As a result, the compiler wants to tell LLVM very often that this particular thing is not aliased. It wants to, but it can't, because doing so exposed so many bugs in LLVM, which are never hit with C and C++, where you have to assume that everything can be aliased always.

Indeed. That issue[1] got closed recently though, so there's hope.

> As I understand it, Rust is not the fastest game in town mostly because the bitcode it generates is so lousy.

I thought it was that the bitcode was noisy. As more optimizations move to MIR, LLVM gets less bitcode to compile and can do other optimizations. Or maybe that's related to compiler performance more than runtime performance.

[1]https://github.com/rust-lang/rust/issues/54878

Rust in the Linux kernel (Google security blog)

Posted Apr 19, 2021 12:45 UTC (Mon) by jezuch (subscriber, #52988) [Link]

> Indeed. That issue[1] got closed recently though, so there's hope.

Oh cool! After only 2.5 years! :D

Rust in the Linux kernel (Google security blog)

Posted Apr 16, 2021 20:10 UTC (Fri) by dezgeg (subscriber, #92243) [Link] (1 responses)

> GCC has a "nonnull" attribute that does exactly that - warns if you passed a NULL pointer as that function argument.
> Microsoft SAL annotations allow even more.
> In practice, this is not much of a problem, since one assumes that a function does not allow NULL pointers as parameters unless specifically described as such.
> And in order to use a new function one has to read its docs anyway to learn what exactly it does.

A much bigger problem than pointer parameters is pointer return values - will the return value be a) always valid b) NULL on error c) ERR_PTR on error? Result/Option would help greatly.
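As a rough sketch of how the three cases could show up in Rust signatures: a plain `&T` for (a), `Option` for (b), and `Result` for (c). The `Table` and `Errno` types and the error value are invented for this example.

```rust
/// Hypothetical error code type standing in for a kernel errno.
#[derive(Debug, PartialEq)]
struct Errno(i32);

/// Hypothetical table of named entries.
struct Table {
    entries: Vec<(String, u64)>,
}

impl Table {
    /// Case (b), "NULL on error": absence is visible in the type.
    fn find(&self, name: &str) -> Option<&u64> {
        self.entries.iter().find(|(n, _)| n == name).map(|(_, v)| v)
    }

    /// Case (c), "ERR_PTR on error": the error code travels in `Err`.
    fn lookup(&self, name: &str) -> Result<&u64, Errno> {
        self.find(name).ok_or(Errno(2)) // 2 is an assumed "not found" code
    }
}
```

In both cases the caller cannot reach the value without first handling the failure branch, so the a/b/c question is answered by the compiler rather than by the docs.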

Rust in the Linux kernel (Google security blog)

Posted Apr 18, 2021 13:15 UTC (Sun) by matthias (subscriber, #94967) [Link]

> A much bigger problem than pointer parameters is pointer return values - will the return value be a) always valid b) NULL on error c) ERR_PTR on error? Result/Option would help greatly.

You missed the question: How long will the pointer be valid? We need a lifetime analysis to answer this question and to ensure that the returned pointer is not used after the object has vanished.
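A minimal sketch of what that analysis buys, using a made-up `Registry` type. The returned reference is tied to the borrow of `self`, so the compiler knows how long it may be used.

```rust
/// Hypothetical registry that owns its entries.
struct Registry {
    names: Vec<String>,
}

impl Registry {
    /// The returned `&str` borrows from `self`: it cannot outlive the
    /// registry, and the registry cannot be modified while it is held.
    fn first(&self) -> Option<&str> {
        self.names.first().map(|s| s.as_str())
    }
}

// Rejected by the borrow checker, because `reg` is dropped while
// `dangling` still refers into it:
//
// let dangling;
// {
//     let reg = Registry { names: vec!["eth0".to_string()] };
//     dangling = reg.first();
// }
// println!("{:?}", dangling);
```

The commented-out part is the use-after-free pattern; in C it compiles, here it is a compile-time error.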

Rust in the Linux kernel (Google security blog)

Posted Apr 16, 2021 0:26 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> If the function never stores the pointer for use after it has returned then it should just accept a raw pointer.
The problem is that sometimes you might want to do something like this:

struct Wrapped;
struct Wrapper { Wrapped *obj; };
Wrapper *make_wrapped(Wrapped *some);

And there's always the issue of what kind of a pointer one should use inside the Wrapper object.
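For comparison, a minimal sketch of the same shape in Rust, assuming the wrapper only borrows the wrapped object. The `value` field is invented so that the example has something to read; the other names mirror the C++ snippet above.

```rust
/// Stand-in for the wrapped object; the field is an assumption.
struct Wrapped {
    value: i32,
}

/// Borrowing wrapper: the lifetime `'a` records that a `Wrapper`
/// must not outlive the `Wrapped` it points to.
struct Wrapper<'a> {
    obj: &'a Wrapped,
}

/// The elided lifetime ties the result to the argument, so the
/// compiler rejects any use of the wrapper after `some` is gone.
fn make_wrapped(some: &Wrapped) -> Wrapper<'_> {
    Wrapper { obj: some }
}
```

If the wrapper should own or share the object instead, the field would be a `Box<Wrapped>` or `Rc<Wrapped>`, which is the same design question as the choice of pointer type in the C++ version.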