Unsafety can be subtle

Posted Sep 18, 2024 23:26 UTC (Wed) by khim (subscriber, #9252)
In reply to: Unsafety can be subtle by NYKevin
Parent article: A discussion of Rust safety documentation

> Because the reference was returned by slice::get_unchecked, and slice::get_unchecked is documented to have the same semantic meaning as C pointer arithmetic (but with the added constraint that it is illegal to construct a "one past the end" pointer, unlike in C where that is legal).

Where is it documented that way? The official documentation is pretty clear: it returns a reference to an element or subslice, without doing bounds checking. And "subslice" there doesn't mean any random subslice, but specifically a subslice specified by something like a Range. That's not how it was used in the example under discussion, so we can ignore that case.

Nowhere does it tell us that you get permission to access anything outside of that single element (or subslice).
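For reference, the two forms that documentation describes can be sketched like this (a minimal example assuming a `&[u8]` slice; `element_and_subslice` is a hypothetical helper): a `usize` index yields a `&u8`, one element, while a `Range` yields a `&[u8]` subslice covering exactly that range. In both cases the caller, not the compiler, is responsible for the bounds.

```rust
/// Hypothetical helper: returns the element at index 2 and the subslice 1..3
/// of `v`, both obtained without bounds checks.
fn element_and_subslice(v: &[u8]) -> (&u8, &[u8]) {
    // The unchecked calls below rely on this check; without it they would be UB on short input.
    assert!(v.len() >= 3, "example needs at least 3 elements");
    // SAFETY: index 2 and range 1..3 are in bounds because len >= 3.
    unsafe {
        let elem: &u8 = v.get_unchecked(2); // one element
        let sub: &[u8] = v.get_unchecked(1..3); // exactly elements 1 and 2
        (elem, sub)
    }
}
```

With `[10, 20, 30, 40]` this returns a reference to `30` and the subslice `[20, 30]`.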

> C pointer arithmetic is generally understood to return a pointer with provenance over the whole array, so if you're going to change that rule, you really ought to do so explicitly.

If Rust were a new version of C, then sure. But Rust is not a new version of C, and, in general, in Rust you don't get permission to access anything that lies outside of the object that a function gives you; and, most importantly, raw pointers aren't even discussed anywhere near that point.

Why should a function from a language that is not C, one that returns something C doesn't have, in the fashion typical for that language, include a warning along the lines of “hey, if you know C, then remember that this function works like a Rust function and not like a C function”? That would be very strange. Obnoxious and repetitive, too (remember that most Rust developers don't actually know C; they come from a Java or Python background).

> Note also that a completely literal interpretation of your argument would equally well apply to slice::as_ptr()

Sure, but the result would be different, because that function is different: it returns a raw pointer to the slice’s buffer.

There is a big difference between a reference to an element or subslice and a pointer to the slice’s buffer.

An element is, well, an element. Singular. One byte. A buffer is not one element; it's the content of the entire slice.
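To make the “one element” versus “whole buffer” distinction concrete: the two pointers below have the same address, so comparing them says nothing about what each one permits; the disagreement here is precisely about that permission. A sketch with hypothetical helper names; note that it only offsets the pointer obtained from `as_ptr()`, which is the uncontroversial case.

```rust
/// Reads `slice[i]` through the pointer returned by `as_ptr()`, which the
/// documentation describes as a pointer to the slice's buffer.
fn read_via_buffer(slice: &[u32], i: usize) -> u32 {
    assert!(i < slice.len());
    let buf: *const u32 = slice.as_ptr();
    // SAFETY: `buf` points to the start of the slice's buffer and `i` is in bounds.
    unsafe { *buf.add(i) }
}

/// Checks that a reference to element 0 and the buffer pointer share an address.
fn same_address(slice: &[u32]) -> bool {
    assert!(!slice.is_empty());
    let buf: *const u32 = slice.as_ptr();
    // SAFETY: index 0 is in bounds because the slice is non-empty.
    let first_ref: &u32 = unsafe { slice.get_unchecked(0) };
    let first: *const u32 = first_ref; // &u32 coerces to *const u32
    buf == first
}
```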

> Nothing in either function's documentation clarifies that there is a meaningful distinction.

You couldn't tell the difference between “one” and “many”? That's explained pretty early in most schools. Maybe you forgot?

> So what's so special about get_unchecked() that makes it different, exactly?

Well… the fact that it does a different thing?

> Unfortunately, in this case, we have the option of saying that it is indeed legal to do pointer arithmetic on pointers derived from get_unchecked(),

Whoa, whoa, whoa. Of course you can do pointer arithmetic, as long as you don't go beyond the boundaries of that one single byte you have been given access to!

What else would you expect if you got a reference to one byte and not a reference to the whole buffer?
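Pointer arithmetic that stays inside a single element can be sketched as follows (a hypothetical helper, assuming `u32` elements so that “one element” is four bytes rather than one): the pointer is derived from a reference to one element, and every offset stays within that element's bytes. The result matches `u32::to_ne_bytes`.

```rust
/// Hypothetical helper: copies out the bytes of the single element `slice[i]`
/// through a byte pointer derived from a reference to that element.
fn element_bytes(slice: &[u32], i: usize) -> [u8; 4] {
    assert!(i < slice.len());
    // SAFETY: `i` is in bounds.
    let elem: &u32 = unsafe { slice.get_unchecked(i) };
    let p = elem as *const u32 as *const u8;
    let mut out = [0u8; 4];
    for (k, byte) in out.iter_mut().enumerate() {
        // SAFETY: k ranges over 0..4, i.e. only the 4 bytes of `*elem`;
        // `u8` has alignment 1, so every offset is suitably aligned.
        *byte = unsafe { *p.add(k) };
    }
    out
}
```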

> or we have the option of saying that it is not possible to use as_ptr_range() for anything other than accessing the first element.

Whoa, whoa, whoa. Why would that be the case? One function returns a reference to one single element (or subslice), another returns a pointer to the buffer, and the third returns two raw pointers spanning the slice.

Why would they behave identically if they return references and/or pointers that deal with different entities? How would that ever be logical?
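For comparison, `as_ptr_range()` hands out a start and an end pointer that span the whole slice, so walking from one to the other stays within what that function gives you. A minimal sketch (`sum_via_range` is hypothetical; in real code the safe `slice.iter()` would be the obvious choice):

```rust
/// Sums a slice by walking from the start pointer to the end pointer
/// returned by `as_ptr_range()`.
fn sum_via_range(slice: &[u32]) -> u64 {
    let range = slice.as_ptr_range();
    let mut p = range.start;
    let mut total = 0u64;
    while p != range.end {
        // SAFETY: `start <= p < end`, so `p` points to an element of the slice.
        total += u64::from(unsafe { *p });
        // SAFETY: moving one element forward stays in the buffer or lands exactly on `end`.
        p = unsafe { p.add(1) };
    }
    total
}
```

For example, `sum_via_range(&[1, 2, 3, 4])` returns 10.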

> If it is the second one (stricter provenance for references than for raw pointers), what exactly are those rules, and where can I read about them?

Why would you need any such rules? If your pointer or reference points to one element, then you can touch that element and nothing else. If your pointer or reference points to the entire buffer, then the whole buffer is fair game.

It's really as simple as that: there is no radical difference between pointers and references in the [naïve] approach to their provenance (the experimental models relax the restrictions rather than adding new ones), but these functions return references (or pointers) to different objects, so why should they behave identically?
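The same rule applied to a subslice: if you ask `get_unchecked` for a range, a pointer taken from the returned subslice may be offset across that subslice, and nothing beyond it. A sketch with a hypothetical `sum_subslice` helper:

```rust
/// Hypothetical helper: sums `v[start..end]` through a pointer taken from the
/// subslice that `get_unchecked(start..end)` returns.
fn sum_subslice(v: &[u32], start: usize, end: usize) -> u64 {
    assert!(start <= end && end <= v.len());
    // SAFETY: the assert guarantees `start..end` lies within `v`.
    let sub: &[u32] = unsafe { v.get_unchecked(start..end) };
    let p: *const u32 = sub.as_ptr();
    let mut total = 0u64;
    for k in 0..sub.len() {
        // SAFETY: `k < sub.len()`, so every read stays inside `sub`.
        total += u64::from(unsafe { *p.add(k) });
    }
    total
}
```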

Unsafety can be subtle

Posted Sep 19, 2024 0:03 UTC (Thu) by atnot (subscriber, #124910)

Not to be the pot calling the kettle black, but I would greatly appreciate it if you could stop having the same near-identical heated argument under every C- or Rust-related post, and presumably so would others who haven't muted you yet.

Unsafety can be subtle

Posted Sep 19, 2024 3:34 UTC (Thu) by NYKevin (subscriber, #129325)

> If Rust were a new version of C, then sure. But Rust is not a new version of C, and, in general, in Rust you don't get permission to access anything that lies outside of the object that a function gives you; and, most importantly, raw pointers aren't even discussed anywhere near that point.

Rust cannot escape the shadow of C, as you can see from the functions as_ptr() and friends, since they exist primarily for interoperation with C (as_ptr_range() says this explicitly). It is quite unreasonable to assume that people writing unsafe Rust have no preconceptions from C whatsoever.

> There is a big difference between a reference to an element or subslice and a pointer to the slice’s buffer.

The bytes that make up the element are also the bytes that make up the buffer, so it is perfectly reasonable to interpret the two as synonymous when reading documentation casually. If this is really an important distinction, then it needs an entire section of the 'nomicon, not just a couple of stray words in std::ptr and then a minor difference in phrasing here or there in other parts of std. This is not a game of Clue: nobody is going to carefully scrutinize every word of doc.rust-lang.org trying to figure out whether Professor Plum did it with the misaligned pointer in the .text section.

Either you tell everyone what the rules are in a highly explicit and unambiguous way, or some programmers will misunderstand them, and then compiler writers will once again be stuck having to stick flags all over their shiny new optimizer because it breaks some legacy code that was wrong when it was written.

Unsafety can be subtle

Posted Sep 19, 2024 13:18 UTC (Thu) by khim (subscriber, #9252)

> It is quite unreasonable to assume that people writing unsafe Rust have no preconceptions from C whatsoever.

Yet it would be even more unreasonable, some would even say preposterous, to assume that everyone who uses Rust, even unsafe Rust, is a C programmer.

People who use Rust and have never used C do exist, and there will be more of them over time.

I wouldn't be surprised to find out that they already outnumber the Rust users who know C.

The desire to bring Rust into the kernel is driven, in large part, by the desire to bring precisely these people into the mix.

> The bytes that make up the element are also the bytes that make up the buffer, so it is perfectly reasonable to interpret the two as synonymous when reading documentation casually.

That's not true for the majority of languages. If we only include popular languages, then it's essentially true only for C and C++. C#, Go, Java, JavaScript, Python… nowhere would you find such an equivalence. Most of these languages don't give you access to part of an object at all, but where that ability exists, you get to touch that part and nothing else.

> Rust cannot escape the shadow of C, as you can see by the the functions as_ptr() and friends, since they exist primarily for interoperation with C (as_ptr_range() says this explicitly).

That's true to some degree, but you can write lots of apps in Rust for years and never need to know about that dark corner of the language. Pushing its existence into the description of various perfectly normal “safe” functions would definitely be strange.

Just as strange as saying that you have to go and learn C before you attempt Rust.

Remember that Rust wasn't even originally imagined as a replacement for C and C++: it was not developed for that audience, and such work is still not its primary focus.

Rust developers are ready to accommodate certain requests to help these people, but “fuck everyone without C experience and make their life miserable for the sake of 10% of Rust users who are using it as a C replacement” doesn't sound like a reasonable request.

> Either you tell everyone what the rules are in a highly explicit and unambiguous way, or some programmers will misunderstand them,

Nope. It doesn't work like that. “I will ignore any and all rules that you may place on me” is an attitude problem, not a documentation problem. And the only solution is social. The Rust community does well there, so there's a chance.

Unsafe Rust is hard. You just have to accept it. Yes, in an ideal world it would be easy. Yes, people know it's a mess and try to fix that. But right now the story is: you have to be careful.

Sticking with strict provenance is your best bet right now (after you have looked for a safe, sound wrapper around the unsafety and couldn't find one suitable for your use case, of course), if you can do that. These are very strict, yet sensible, rules, and they allow 99% of code that needs unsafe to be written. As for the remaining 1%… they are still working on it.
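One simple, strict-provenance-friendly habit is to derive new pointers from existing ones (`add`, `offset`, `wrapping_add`) rather than casting a pointer to `usize`, doing integer math, and casting back. A minimal sketch with a hypothetical `middle` helper; recent Rust releases also provide `addr`, `with_addr` and `map_addr` for code that really has to manipulate address bits, which this example doesn't need.

```rust
/// Hypothetical helper: returns the middle element of `slice`, or `None` if it is empty.
fn middle(slice: &[u32]) -> Option<u32> {
    if slice.is_empty() {
        return None;
    }
    let base: *const u32 = slice.as_ptr();
    // Derive the new pointer from `base` itself, so it keeps `base`'s provenance.
    // An integer round trip such as `(base as usize + 4 * k) as *const u32`
    // is the pattern strict provenance asks you to avoid.
    // SAFETY: `len / 2 < len` for a non-empty slice, so the offset is in bounds.
    let mid = unsafe { base.add(slice.len() / 2) };
    // SAFETY: `mid` points to an initialized element of `slice`.
    Some(unsafe { *mid })
}
```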

Complaining that something known to be underdocumented and hard is underdocumented and hard won't help you if you have no constructive ideas about how to change that situation.

> and then compiler writers will once again be stuck having to stick flags all over their shiny new optimizer because it breaks some legacy code that was wrong when it was written.

Not if they proactively kick out people who write code that ignores the rules.

It's one thing to say “hey, I don't know how to implement this kind of design while staying in the boundaries outlined by the strict provenance approach”. Such stories (if justified and without any simple solution that would make them strict-provenance compliant) are always welcome on URLO and IRLO (users.rust-lang.org and internals.rust-lang.org), or even just on the bug tracker.

But the first line of defense should always be an attempt to avoid unsafe code in the first place! The second line is to use a sound API. The third line is strict provenance. And only after all these possibilities are exhausted do you go to these forums in search of a solution.
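The first two lines of defense can be illustrated with a hypothetical task: add the right half of a slice into the left half. It needs two mutable views of one buffer at the same time, which is where people tend to reach for raw pointers, but the standard library's `split_at_mut` already wraps that unsafety in a sound, safe API. A sketch (`fold_halves` is hypothetical):

```rust
/// Hypothetical task: add each element of the right half of `v` into the
/// corresponding element of the left half (for odd lengths, the extra
/// element at the end is ignored).
fn fold_halves(v: &mut [i32]) {
    let mid = v.len() / 2;
    // Two non-overlapping mutable views of one buffer, obtained safely.
    let (left, right) = v.split_at_mut(mid);
    for (l, r) in left.iter_mut().zip(right.iter()) {
        *l += *r;
    }
}
```

No `unsafe` appears in this code, so there are no provenance questions to get wrong.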

You tell us that “nobody is going to carefully scrutinize every word of doc.rust-lang.org trying to figure out whether Professor Plum did it with the misaligned pointer in the .text section”, but that's exactly how fully conforming C code is supposed to be written! There you already have to look at subtle nuances in the wording of the C standard, defect reports, and other such things… Rust just makes it easier, because there are various forums where you can ask the Rust language developers for clarification.

Unfortunately, “we code for the hardware” people don't even stop to think about whether it's possible to do something while staying within the boundaries, but that's a social problem and cannot be solved by technical means.

Technology can only make conformance easier. The Rust language developers are trying to help with documentation and tooling (Miri is a must-have for anyone who develops unsafe code in Rust), but, ultimately, only the developer can be responsible for conforming to the rules. The part where the compiler does its magic, and where you don't have to carefully scrutinize every word, exists on the other side of unsafe!

Unsafety can be subtle

Posted Sep 19, 2024 23:03 UTC (Thu) by NYKevin (subscriber, #129325)

> Rust developers are ready to accommodate certain requests to help these people, but “fuck everyone without C experience and make their life miserable for the sake of 10% of Rust users who are using it as a C replacement” doesn't sound like a reasonable request.

Please stop making quotes up and attributing them to other people. It is extremely rude.

> Nope. It doesn't work like that. “I will ignore any and all rules that you may place on me” is an attitude problem, not a documentation problem.

Ibid.

> It's one thing to say “hey, I don't know how to implement this kind of design while staying in the boundaries outlined by the strict provenance approach”. Such stories (if justified and without any simple solution that would make them strict-provenance compliant) are always welcome on URLO and IRLO (users.rust-lang.org and internals.rust-lang.org), or even just on the bug tracker.

I literally just told you a story like that, and you responded by making up a bunch of things I didn't say, and then ridiculing those things.

Since this conversation is clearly going nowhere, I'm going to bow out.

Unsafety can be subtle

Posted Sep 20, 2024 0:51 UTC (Fri) by khim (subscriber, #9252)

> I literally just told you a story like that, and you responded by making up a bunch of things I didn't say,

How is that different from you making up a piece of documentation that never existed, writing “slice::get_unchecked is documented to have the same semantic meaning as C pointer arithmetic”, and then using that, “a bunch of things [others] didn't say”, as justification for your assertions?

I even gave you the benefit of the doubt and asked just where it was documented that way.

And in the end, instead of showing us something like that, you decided to blame the documentation for your inability to read: “it is perfectly reasonable to interpret the two as synonymous when reading documentation casually”.

And when I ridiculed it, the editor only had to say this: “This is an on-topic discussion, but please remember to keep things polite”.

And yes, I agree, I wasn't polite, but I, at least, stick to the facts.

> and then ridiculing those things.

You want to say that only you can lie (even you admitted that one was a lie: “but literally nobody follows the strict provenance rules anyway, since they're explicitly marked as experimental and non-normative”) and then use these lies to misrepresent things? Others can't do that?

Let me quote myself, for a change (you haven't misquoted me, just ignored what I said): Rust developers would like to hear “hey, I don't know how to implement this kind of design while staying in the boundaries outlined by the strict provenance approach” stories. Not stories related to someone's inability to read or understand the documentation (documentation writers do accept patches for such cases, but, as noted, it's not clear how to change the documentation so as to force someone to actually try to read and understand it… I'm not even sure that's possible at all)!

And you tell me that we are discussing exactly such a story here. But that's a big fat lie: the author of the original example even admitted that the code under discussion was brought into compliance with Stacked Borrows (and the whole example we are discussing here really, really doesn't need any pointers, and thus Stacked Borrows or other such horrors; if you want to eliminate bounds checking, then a call to one unsafe function with literally zero arguments is absolutely enough).

Thus it isn't an example of a design that is limited by strict provenance. Not even remotely close to it. Sorry.
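The comment doesn't name the zero-argument function; one plausible candidate is `std::hint::unreachable_unchecked()`. A sketch under that assumption, with a hypothetical `sum_at` helper: the caller promises every index is in bounds, the out-of-bounds branch is declared unreachable, and plain safe indexing follows, so no raw pointers are involved. Whether the bounds check actually disappears is up to the optimizer.

```rust
use std::hint::unreachable_unchecked;

/// Hypothetical helper: sums `v[i]` for every index `i` in `idx`.
///
/// # Safety
/// Every index in `idx` must be less than `v.len()`.
unsafe fn sum_at(v: &[u64], idx: &[usize]) -> u64 {
    let mut total = 0u64;
    for &i in idx {
        if i >= v.len() {
            // SAFETY: the caller promised this branch is never taken; telling the
            // compiler so may let it drop the bounds check on `v[i]` below.
            unsafe { unreachable_unchecked() }
        }
        // Ordinary safe indexing: no raw pointers anywhere.
        total += v[i];
    }
    total
}
```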

Unsafety can be subtle

Posted Sep 20, 2024 11:20 UTC (Fri) by daroc (editor, #160859)

Let's stop discussing this here. It doesn't look like you're likely to change NYKevin's mind, and it seems like this comment doesn't really contribute any new technical information. Sometimes the best thing you can say is just "I still don't agree with you, but thanks for having this discussion with me."

Unsafety can be subtle

Posted Sep 19, 2024 11:53 UTC (Thu) by daroc (editor, #160859)

This is an on-topic discussion, but please remember to keep things polite. Specifically:

> You couldn't tell the difference between “one” and “many”? That's explained pretty early in most schools. Maybe you forgot?

This is not polite or respectful, and doesn't add anything interesting to your technical point. Please avoid insulting other commenters.

Unsafety can be subtle

Posted Sep 21, 2024 6:45 UTC (Sat) by ralfj (subscriber, #172874)

khim, I understand you are passionate about Rust-for-Linux and want to see that project succeed, but I think unnecessarily aggressive comments like this are hurting the cause. It is not necessary to repeat the point ("one element" vs "entire buffer") so many times in your answer, nor is it appropriate to suggest people forgot basic things taught early in school. Such a comment will not manage to convince anyone of your position, it will just make people uninterested in engaging with you. Please re-think your communication strategy and try to be a little more empathetic towards the position of others. :)

Kind regards,
Ralf