Unsafety can be subtle
Posted Sep 17, 2024 21:39 UTC (Tue) by apoelstra (subscriber, #75205)
Parent article: A discussion of Rust safety documentation
I think the notion that "in C you need to reason about your entire program as though it were all marked unsafe" is a bit misleading and unfair toward C. C has only one kind of pointer type with semantics that are much easier to understand and reason about than those for Rust references.
When I first started working in Rust, because of this sort of messaging, I had the notion that "unsafe Rust is basically just C". In fact, unsafe Rust is much more subtle and dangerous than C. The reason is that Rust references have much stronger requirements on them than C pointers, and Rust references are what you get when you use the & or &mut operators.
Unlike with C pointers, it is illegal to start with a Rust reference and then offset it outside of the bounds of the original object. (In C you can do this as long as the object came from within an array.) Rust references may never be null or dangling; they may not point one past the end of an array; they may not point to uninitialized memory (except through the special `MaybeUninit` type). If you create such a reference, this is immediate UB.
Mutable Rust references may not alias any other reference. If you violate this rule by constructing a bad reference this is immediate UB.
As an example of the sort of problem this can cause, I recently learned of a bug in some unsafe Rust code of mine. It looked like this:
fn bad_load_int_le(buf: &[u8; 32], i: usize) -> u32 {
    debug_assert!(i + mem::size_of::<u32>() <= buf.len());
    let mut data = 0u32;
    ptr::copy_nonoverlapping(
        buf.get_unchecked(i),
        &mut data as *mut _ as *mut u8,
        mem::size_of::<u32>(),
    );
    data.to_le()
}
The C equivalent of this code is pretty straightforward: you take a buffer, assert that it's big enough, memcpy some data into a uint32_t, and then swap the bytes.
But in Rust, the call to buf.get_unchecked(i) returns a &u8 pointing to a single byte of the buffer, and then ptr::copy_nonoverlapping reads past this single byte, and that's UB right there. All the stuff that a reviewer might worry about -- the buffer being too small, pointers being insufficiently aligned, casts between different pointer types, etc -- are all fine. The badness is that the `get_unchecked` method returns a reference instead of a pointer, something that isn't even visible in the code as written.
(This was found by MIRI, an interpreter for Rust's "MIR" intermediate language which is designed to detect violations of the pointer rules, so I am fairly confident that this is actually undefined behavior, and not just extra paranoia.)
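One way to repair the function above, sketched under the assumption that the caller guarantees `i + 4 <= 32`: derive the source pointer from the whole array with `as_ptr()` and offset it with `add`, so that no single-byte reference is ever created. The name `load_int_le` and the `unsafe fn` signature are illustrative choices, not the code that was actually deployed.

```rust
use std::{mem, ptr};

/// Reads a little-endian `u32` starting at byte offset `i`.
///
/// # Safety
/// The caller must guarantee `i + 4 <= buf.len()`.
unsafe fn load_int_le(buf: &[u8; 32], i: usize) -> u32 {
    debug_assert!(i + mem::size_of::<u32>() <= buf.len());
    let mut data = 0u32;
    unsafe {
        // `as_ptr()` yields a raw pointer derived from the whole array, so
        // reading four bytes from `add(i)` stays inside the memory that
        // pointer is allowed to access.
        ptr::copy_nonoverlapping(
            buf.as_ptr().add(i),
            &mut data as *mut u32 as *mut u8,
            mem::size_of::<u32>(),
        );
    }
    // The bytes were copied in memory order; interpret them as little-endian.
    u32::from_le(data)
}
```

The only change that matters is the first argument to `ptr::copy_nonoverlapping`: it is now a raw pointer into the array rather than a `&u8` coerced to a pointer.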
Posted Sep 18, 2024 2:26 UTC (Wed)
by cplaplante (subscriber, #107196)
[Link] (15 responses)
Posted Sep 18, 2024 2:56 UTC (Wed)
by intelfx (subscriber, #130118)
[Link] (4 responses)
You can't have a library function for everything you would ever need. If the language doesn't lend itself to writing unsafe/unrestricted/low-level code without excessive accidental complexity, it might indeed be a problem.
Posted Sep 18, 2024 10:31 UTC (Wed)
by excors (subscriber, #95769)
[Link] (3 responses)
Even without a built-in function, you can implement it easily without `unsafe`:
On x86, when not inlined, that compiles into a bounds check and a single `mov` instruction. As a bonus, the bounds check will apply in release mode too (unlike the original example's `debug_assert!()`), and avoids the danger of integer overflow (unlike the original `i + mem::size_of::<u32>()`), so it's no longer possible for a caller to trigger UB and the code is actually safe.
Or if you don't like doing the arithmetic yourself, you can implement it with `unsafe` but without any tricky references or pointers:
(which I think is what `from_le_bytes` does internally), where `transmute` is basically a byte-wise move between two arbitrary types of the same size. The documentation says `transmute` is "incredibly unsafe", but I think none of its dangers apply when simply converting integers and arrays of integers.
Correctly handling references and pointers in `unsafe` does seem pretty difficult, but you can do a surprising amount in safe or unsafe code without that. I think it involves a significant shift from the C mindset where you naturally think in terms of pointers and bytes - in Rust that should be seen as a last resort when you've exhausted all other possibilities, it shouldn't be the default way of approaching a problem.
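A fully safe version along the lines described at the start of this comment could look like the sketch below. It is an illustration only and not necessarily the snippet originally posted; the name `load_int_le_checked` is made up, and it uses `u32::from_le_bytes` rather than hand-written shifts.

```rust
use std::convert::TryInto;

/// Reads a little-endian `u32` at byte offset `i`.
/// Panics (in debug and release builds alike) if fewer than four bytes remain.
fn load_int_le_checked(buf: &[u8; 32], i: usize) -> u32 {
    // Slicing in two steps never computes `i + 4`, so there is no integer
    // overflow to worry about.
    let bytes: [u8; 4] = buf[i..][..4]
        .try_into()
        .expect("slice has exactly four bytes");
    u32::from_le_bytes(bytes)
}
```

The bounds checks here are ordinary slice-indexing checks, so an out-of-range `i` produces a panic instead of undefined behavior.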
Posted Sep 19, 2024 12:55 UTC (Thu)
by apoelstra (subscriber, #75205)
[Link] (2 responses)
This isn't a bonus. The point of the original code was to avoid the bounds check and associated panic path.
>you can implement it with `unsafe` but without any tricky references or pointers:
Aside from your new code now having `try_into` and an explicit panic path, it also illustrates my comment perfectly :). You have replaced code which "narrowly" uses unsafe to do an unchecked array index with code that uses `transmute`, which has pretty much unlimited chaos potential! In this case your code appears to be correct, but I don't think "avoid pointers, always reach for transmute first" is a reasonable policy for authors of unsafe code.
Posted Sep 20, 2024 0:38 UTC (Fri)
by khim (subscriber, #9252)
[Link] (1 responses)
Why? Generally the advice is to use the safest approach if possible and while, technically,
That's still a good change. Heck, C++20 even added their own version of transmute citing precisely the pitfalls of correct use of memcpy (C/C++ version of
P.S. I wanted to avoid discussion about why your approach to the whole thing is more-or-less ridiculous by assuming that you just wanted to show how it's not easy to use pointers in Rust, but if your goal is to just read an integer then the Rust documentation, of course, includes a pretty good example which explains how to do what you want to do properly, without any unsafe code here. And if you want to avoid the panic there, then the simplest way is to just literally tell the compiler that you know what you are doing:
That's it. It would be compiled into one instruction and there is really no need to invoke horrors of
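A sketch of what "telling the compiler that you know what you are doing" could look like, assuming the hint is expressed with `std::hint::unreachable_unchecked`. This is an illustration and not necessarily the snippet the commenter had in mind; the function name and the `i <= 28` contract are assumptions.

```rust
use std::convert::TryInto;
use std::hint::unreachable_unchecked;

/// Reads a little-endian `u32` at byte offset `i`.
///
/// # Safety
/// The caller must guarantee `i <= 28`.
unsafe fn load_int_le_hinted(buf: &[u8; 32], i: usize) -> u32 {
    if i > 28 {
        // Promise to the compiler that this branch is never taken.
        // Actually reaching it is undefined behavior.
        unsafe { unreachable_unchecked() }
    }
    // Ordinary safe indexing and conversion; given the hint above, the
    // optimizer is free to remove the checks, though that is not guaranteed.
    let bytes: [u8; 4] = buf[i..i + 4].try_into().expect("four bytes");
    u32::from_le_bytes(bytes)
}
```

All of the data handling stays in safe code; the single `unsafe` operation is the promise about `i`.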
Posted Sep 20, 2024 23:02 UTC (Fri)
by apoelstra (subscriber, #75205)
[Link]
But I will add my voice to several others asking that you take a kinder tone and offer others the benefit of the doubt (or even, just stop assuming the worst). It makes these technical insights much harder to read and less likely to be noticed.
Posted Sep 19, 2024 12:50 UTC (Thu)
by apoelstra (subscriber, #75205)
[Link] (9 responses)
Posted Sep 19, 2024 13:46 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (8 responses)
Note that you can use try_from to convert a slice of the correct size into an array. Assuming that you've verified that the slice is big enough, this can't actually fail - and you'd expect the compiler to optimize out the failure path as a result.
And, indeed, when I go that route (and yes, this is slightly obscure), I get the optimal form that cannot panic at runtime:
The string "Q1234Qload_int_leQ32Z32Z" is in there as something that's unlikely to appear in the compiled binary by mistake; you can thus have release-mode CI grep for that string and fail if it appears, because you want the failure path to be resolved at compile time; if it does appear, you know where to check.
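The sentinel-string idea can be sketched as follows. This is a hypothetical reconstruction rather than the code originally posted: the `Option` return type and the use of `checked_add` and `get` are assumptions made to keep the example free of panics on bad input.

```rust
use std::convert::TryFrom;

/// Reads a little-endian `u32` at byte offset `i`, or returns `None` if the
/// four bytes do not fit inside `buf`.
pub fn load_int_le(buf: &[u8; 32], i: usize) -> Option<u32> {
    let end = i.checked_add(4)?;
    let slice = buf.get(i..end)?;
    // `slice` always has length 4 here, so this `expect` cannot fire. If the
    // optimizer can see that, the message string is dropped from the binary;
    // if the string still shows up in a release build, the failure path was
    // not resolved at compile time.
    let bytes = <[u8; 4]>::try_from(slice).expect("Q1234Qload_int_leQ32Z32Z");
    Some(u32::from_le_bytes(bytes))
}
```

A release-mode CI step could then search the built artifact for `Q1234Qload_int_leQ32Z32Z` and fail when it is found.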
Posted Sep 19, 2024 17:02 UTC (Thu)
by atnot (subscriber, #124910)
[Link]
I do sometimes wish there was a function to return an array directly in the standard library and save that line of code, but honestly there's enough utility crates like bytemuck, binrw etc. that you'll probably want to be using anyway if you're doing a lot of reading bytes into filetypes.
Posted Oct 4, 2024 8:19 UTC (Fri)
by geert (subscriber, #98403)
[Link] (6 responses)
Posted Oct 4, 2024 10:42 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (5 responses)
You're right - that's the other fun of using "clever" code, in that you can confuse someone trying to refactor it, and the compiler won't complain.
A better translation would be this Godbolt link, which makes the use of unsafe very clearly about not panicking on error, and uses safe functionality for all of the data transformation. It also makes it clear that this function should be marked as unsafe, since as a precondition it requires buf to be long enough.
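The shape described here, with safe functions doing the data transformation and `unsafe` used only to drop the failure path, might look like the sketch below. It is not the code behind the link; the name `load_int_le_unchecked` and the choice of `Option::unwrap_unchecked` are assumptions.

```rust
use std::convert::TryInto;

/// Reads a little-endian `u32` at byte offset `i`.
///
/// # Safety
/// The caller must guarantee that `buf` holds at least four bytes starting
/// at offset `i`.
pub unsafe fn load_int_le_unchecked(buf: &[u8], i: usize) -> u32 {
    // Safe, checked operations perform the slicing and the conversion.
    let bytes: Option<[u8; 4]> = buf
        .get(i..)
        .and_then(|tail| tail.get(..4))
        .and_then(|s| s.try_into().ok());
    // The only unsafe step: promise that the checks above never fail.
    let bytes = unsafe { bytes.unwrap_unchecked() };
    u32::from_le_bytes(bytes)
}
```

Marking the function itself `unsafe` puts the length precondition in the signature, so callers have to acknowledge it.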
Posted Oct 4, 2024 11:41 UTC (Fri)
by geert (subscriber, #98403)
[Link] (1 responses)
Posted Oct 4, 2024 11:55 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
The reason there's so much code for one asm instruction is that most of the code is concerned with things that asm doesn't care about - asm doesn't care if you're reading out of bounds (that's a CPU fault at worst, and a program bug at best), or if you're asking it to read from the "wrong" location, or a whole host of other things that programming languages care about.
And it's similar in size to the dance you need to do with standard C to get the same instruction.
Posted Oct 4, 2024 12:08 UTC (Fri)
by pizza (subscriber, #46)
[Link] (2 responses)
So... you're saying that the job _isn't_ done when the Rust compiler stops complaining, and that the resultant code might not actually _work_ even though the compiler "proved" it to be "correct"?
(Yeah, yeah, I know you're not one of the obnoxious evangelists making that idiotic claim, but I've been piled upon here multiple times for making this point..)
Posted Oct 4, 2024 12:21 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
As with anything, it depends what you require the compiler to prove. My experience is that Rust code is more likely to use enums, structs and other such type-system features to make it impossible to represent impossible states, rather than clever code that allows for this sort of bug to slip past.
In other words, the problem here is clever code where you're trying to beat the optimizer, rather than trusting that the optimizer will get it right. And my experience of cheap coders (offshore outsourcing body shops) is that they trust the optimizer instead of trying to be clever, so they'll do the right thing in Rust; I have, however, had a pile-on for expressing that in the past.
Posted Oct 4, 2024 13:12 UTC (Fri)
by Wol (subscriber, #4433)
[Link]
But isn't that *always* true? Just because the *maths* is correct, doesn't mean the *logic* is sound.
I have a production system that uses bell-curve statistics on a skewed distribution. It mostly works because the data is USUALLY a strong peak with a weak tail on one side. But feed it a double-peak, or a strong tail, and the results will be rubbish. That's a risk we've chosen to accept. The mathematical model is sound. It's just the fit to reality isn't quite right ...
Cheers,
Posted Sep 18, 2024 8:29 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (19 responses)
Perhaps I'm missing something, but that does not make sense to me. ptr::copy_nonoverlapping takes its first argument as a raw pointer, not a reference. The &u8 is legal when it is constructed (it points to the ith element of buf, which must have been initialized before the function was called or else it was already UB to begin with), and it is then implicitly converted to a raw pointer to match the type expected by ptr::copy_nonoverlapping, before any dereference occurs. At that point, the rules for a raw pointer are more or less the same as C, minus the strict aliasing rule (the assumption is that, if you're using raw pointers, you're already off the beaten path and so you may also be doing some type punning).
Under the strict provenance rules documented in std::ptr, this is technically not allowed, because the conversion to &u8 shrinks the provenance of the pointer, and you're not allowed to get it back just by converting to a raw pointer. But literally nobody follows the strict provenance rules anyway, since they're explicitly marked as experimental and non-normative.
Posted Sep 18, 2024 10:29 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
Note, too, that the point of the strict provenance rules is that they're sufficient but not necessary; if you comply with strict provenance, you're going to be compliant with whatever the real rules are, but you can comply with the real rules without meeting strict provenance rules. The long-term goal is to give you simple rules (strict provenance) for simple cases, and a tower of steadily more complex (and powerful) rules for more complicated cases, where you start at the top with strict provenance, and pick up the specific rules from lower tiers of the tower as needed to make your code safe.
And if the tower is well-constructed, you'll be able to pick and choose which level of the tower you're working at; you can say "all of this is fine under strict provenance" and deal with a big chunk of unsafe code using simple rules, and only have to use the rules from lower in the tower for the stuff that isn't OK at higher levels of the tower, making your correctness proof easier to write.
Posted Sep 18, 2024 10:42 UTC (Wed)
by khim (subscriber, #9252)
[Link] (17 responses)
That's quite an assertion. I know a lot of people who follow strict provenance rules, even if they don't know them. They just try to make Miri, with default settings, happy and this automatically makes you follow the strict provenance rules.
But I think that sentence very nicely highlights the difference between C and Rust. And no, the difference is not in the fact that Rust's
Nope. Rust doesn't do TBAA because it doesn't need to! Even if you are using pointers you still have to ensure that rules for references that exist somewhere outside of your code are not violated.
C and C++ use TBAA as a poor man's substitute for Rust's reference access rules. This works extremely poorly (as in: we know that rules for pointers are more complicated than existing standards describe them – but we still have no idea what the exact rules for pointers in C are). There are dozens of proposals, none of them implemented: one, two, three…
And that's the core difference: rules that C places on you are insanely complicated and unknowable (when rules are not actually written and the only way to know for sure is to show your code to a dozen compiler developers from different companies… you can be sure that it is much closer to “nobody does that, anyway”), while rules that Rust places on you are knowable and understandable, if not finalized (the two main ones are stacked borrows and tree borrows), but, more importantly, they are actually used and followed by real developers (even if in the form of “making Miri happy”).
And that is the core difference: complicated rules of C are, mostly, just ignored by developers (and kernel developers even have an excuse: Linus is famous enough and the kernel is important enough that the approach of “we violated all your rules, but don't you dare to miscompile our code” mostly works), while Rust rules are followed!
Of course if you ignore rules in one case and [try to] follow them in the other case then it doesn't actually matter how complicated rules are in the first case and how simple they are in the second case: following the rules would always be harder than ignoring them!
Posted Sep 18, 2024 18:37 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (16 responses)
The other problem is that you have to read these rules extremely carefully to even notice that this is illegal in the first place, because they're almost exclusively focused on the usize -> ptr conversion, and barely acknowledge that "shrinking provenance" is a thing that can happen through other operations. I still cannot find a single, authoritative source which explicitly documents all of the operations that can shrink provenance, and I'm not 100% convinced that it has even been formalized in the first place.
Posted Sep 18, 2024 21:37 UTC (Wed)
by khim (subscriber, #9252)
[Link] (14 responses)
How is this even relevant? There is no “shrinkage of provenance” in that example. You have a reference that's pointing to one, single byte. That means that you get to access just one, single, byte and nothing more.
That's direct consequence of refusal to adopt insanity of C/C++ TBAA: if we no longer consider memory as typed where different pieces of memory turn into differently typed objects magically, then boundaries of objects are now parts of pointers and references, for how else can we reason about them?
This automatically and immediately makes this example illegal. Which is good for everyone concerned: given the fact that out-of-object accesses are the core issue with 90% of all exploits we don't want to declare such accesses “normal”.
Then you couldn't do many of the things that these “non-normative and experimental” models give you. You are left, essentially, with this: The result of casting a reference to a pointer is valid for as long as the underlying object is live and no reference (just raw pointers) is used to access the same memory. All other usages (in particular the potential ability to go from a single byte to something larger) need these “non-normative and experimental” models because without them all these operations fall under the “the precise rules for validity are not determined yet” clause.
This reading, most definitely, doesn't give you the ability to access anything outside of that one, single, byte that you received from
Posted Sep 18, 2024 22:39 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (13 responses)
Because the reference was returned by slice::get_unchecked, and slice::get_unchecked is documented to have the same semantic meaning as C pointer arithmetic (but with the added constraint that it is illegal to construct a "one past the end" pointer, unlike in C where that is legal). C pointer arithmetic is generally understood to return a pointer with provenance over the whole array, so if you're going to change that rule, you really ought to do so explicitly.
Note also that a completely literal interpretation of your argument would equally well apply to slice::as_ptr(), because the return type of that function is *const T, not some nonexistent unsafe slice type (and so a pedantically literal interpretation is that you may only access the first element of the slice through said pointer, because it is a pointer to T, not a pointer to unsafe-slice-ish of T).
In fact, I cannot find a solid reason to allow one and not the other. Nothing in either function's documentation clarifies that there is a meaningful distinction. At the same time, it is obvious (by the existence of e.g. as_ptr_range()) that this is at least meant to be allowed in the case of as_ptr(). So what's so special about get_unchecked() that makes it different, exactly?
> That's direct consequence of refusal to adopt insanity of C/C++ TBAA: if we no longer consider memory as typed where different pieces of memory turn into differently typed objects magically, then boundaries of objects are now parts of pointers and references, for how else can we reason about them?
It sounds as if you're saying that the rules for pointers are the same as the rules for references, which I agree makes logical sense. Unfortunately, in this case, we have the option of saying that it is indeed legal to do pointer arithmetic on pointers derived from get_unchecked(), we have the option of saying that references have stricter provenance rules than raw pointers, or we have the option of saying that it is not possible to use as_ptr_range() for anything other than accessing the first element. The latter is obviously ridiculous and untenable, so which of the first two is your interpretation of this case? If it is the second one (stricter provenance for references than for raw pointers), what exactly are those rules, and where can I read about them?
Posted Sep 18, 2024 23:26 UTC (Wed)
by khim (subscriber, #9252)
[Link] (8 responses)
Where is it documented that way? Official documentation is pretty clear: returns a reference to an element or subslice, without doing bounds checking. And subslice there means not any random subslice but specifically the subslice specified by something like Range. That's not how it was used in the discussed example, thus we can ignore that case. Nowhere does it tell us that you get permission to access anything outside of that single element (or slice).
If Rust would have been a new version of C then sure. But Rust is not a new version of C and, in general, in Rust, you don't get the permission to access something that lies outside of the object that a function gives you and, most importantly, nowhere near that point are raw pointers even discussed.
Why should a function not from C, that returns something that C doesn't support and in a fashion that is typical for that language, include a warning in the form of “hey, if you know about C then remember that this function works like a Rust function and not like a C function”? That would be really very strange. Obnoxious and repetitive (remember that most Rust developers don't, actually, know C, they come from a Java or Python background).
Sure, but the result would be different. Because that one is different: it returns a raw pointer to the slice’s buffer. There are big difference between a reference to an element or subslice and pointer to the slice’s buffer. Element is, well, element. Singular. One byte. Buffer is not one element, it's the content of the entire slice. You couldn't tell the difference between “one” and “many”? That's explained pretty early in most schools. Maybe you forgot?
Well… the fact that it does a different thing?
Whoa, whoa, whoa. Of course you can do pointer arithmetic – as long as you don't go beyond the boundaries of that one, single, byte that you have got access to! What else do you expect if you have got a reference to one byte and not a reference to the whole buffer?
Whoa, whoa, whoa. Why would that be the case?
One function returns a reference to one, single, element (or subslice), another function returns a pointer to the buffer and the third one returns two raw pointers spanning the slice. Why would they behave identically if they return references and/or pointers that deal with different entities? How is that ever logical?
Why would you need any such rules? If your pointer or reference is pointing to one element, then you can touch that element and nothing else. If your pointer or reference is pointing to the entire buffer then the whole buffer is fair game. It's really as simple as that: there is no radical difference between pointer and reference in the [naïve] approach to their provenance (experimental models relax the restrictions, they don't add new ones), but these functions return references (or pointers) to different objects, why should they behave identically?
Posted Sep 19, 2024 0:03 UTC (Thu)
by atnot (subscriber, #124910)
[Link]
Posted Sep 19, 2024 3:34 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (4 responses)
Rust cannot escape the shadow of C, as you can see by the functions as_ptr() and friends, since they exist primarily for interoperation with C (as_ptr_range() says this explicitly). It is quite unreasonable to assume that people writing unsafe Rust have no preconceptions from C whatsoever.
> There are big difference between a reference to an element or subslice and pointer to the slice’s buffer.
The bytes that make up the element are also the bytes that make up the buffer, so it is perfectly reasonable to interpret the two as synonymous when reading documentation casually. If this is really an important distinction, then it needs an entire section of the 'nomicon, not just a couple of stray words in std::ptr and then a minor difference in phrasing here or there in other parts of std. This is not a game of Clue - nobody is going to carefully scrutinize every word of docs.rust.org trying to figure out whether Professor Plum did it with the misaligned pointer in the .text section.
Either you tell everyone what the rules are in a highly explicit and unambiguous way, or some programmers will misunderstand them, and then compiler writers will once again be stuck having to stick flags all over their shiny new optimizer because it breaks some legacy code that was wrong when it was written.
Posted Sep 19, 2024 13:18 UTC (Thu)
by khim (subscriber, #9252)
[Link] (3 responses)
Yet it would be even more unreasonable, some would even say preposterous, to assume that everyone who uses Rust and even
People who are using Rust and who never used C do exist and there will be more of them over time. I wouldn't be surprised to find out that there are more of them already than Rust users that know C. The desire to bring Rust into the kernel is driven, in large part, by the desire to bring precisely these people into the mix.
That's not true for the majority of the languages. If we only include popular languages then it's, essentially, true only for C and C++. C#, Go, Java, JavaScript, Python… nowhere would you find such an equation. Most of these languages don't give you access to a part of the object, but if such an ability is there then you get to touch that part and nothing else.
That's true, to some degree, but you can write lots of apps for years in Rust and never hit the need to know that dark corner of the language. Pushing its existence into the description of various perfectly normal, “safe” functions would definitely be strange. Equally as strange as saying that you have to go and learn C before you would attempt Rust. Remember that Rust wasn't even imagined as a replacement for C and C++, it was originally not developed for that audience and such work is still not its primary focus. Rust developers are ready to accommodate certain requests to help these people, but “fuck everyone without C experience and make their life miserable for the sake of 10% of Rust users who are using it as C replacement” doesn't sound like a reasonable request.
Nope. This doesn't work like this. “I will ignore any and all rules that you may place on me” is an attitude problem, not documentation problem. And the only solution is social. The Rust community does well there thus there's a chance.
Sticking with strict provenance is your best bet right now (after you have looked for a safe, sound wrapper around unsafety and couldn't find a suitable one for your use case, of course). If you can do that. These are very strict, yet sensible, rules and they allow 99% of code that needs
Complaining about something that is known to be underdocumented and hard, and just saying that it's underdocumented and hard, wouldn't help you if you have no constructive ideas about how to change that situation.
Not if they would proactively kick out people who are writing code that ignores the rules. It's one thing to say “hey, I don't know how to implement this kind of design while staying in the boundaries outlined by the strict provenance approach”. Such stories (if justified and without any simple solution that would make them strict-provenance compliant) are always welcome on the URLO and IRLO or even just the bugtracker. But the first line of defense should always be an attempt to not use the
You tell us: nobody is going to carefully scrutinize every word of docs.rust.org trying to figure out whether Professor Plum did it with the misaligned pointer in the .text section – but that's exactly how fully conforming C code is supposed to be written! You already have to look at subtle nuances of wording of the C standard and defect reports and other such things… Rust just makes it easier because there are various forums where you can ask Rust language developers for clarification.
Unfortunately “we code for the hardware” people don't even stop to think about whether it's possible to do something while staying within boundaries – but that's a social problem and can not be solved by technical means. Technology may just make conformance easier. And Rust language developers are trying to help with documentation and tooling (Miri is a must-have for someone who develops
Posted Sep 19, 2024 23:03 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
Please stop making quotes up and attributing them to other people. It is extremely rude.
> Nope. This doesn't work like this. “I will ignore any and all rules that you may place on me” is an attitude problem, not documentation problem.
Ibid.
> It's one thing to say “hey, I don't know how to implement this kind of design while staying in the boundaries outlined by the strict provenance approach”. Such stories (if justified and without any simple solution that would make them strict-provenance compliant) are always welcome on the URLO and IRLO or even just the bugtracker.
I literally just told you a story like that, and you responded by making up a bunch of things I didn't say, and then ridiculing those things.
Since this conversation is clearly going nowhere, I'm going to bow out.
Posted Sep 20, 2024 0:51 UTC (Fri)
by khim (subscriber, #9252)
[Link] (1 responses)
How is it different from how you make up a piece of documentation that never existed and write “
I have even given you the benefit of the doubt and asked just where it was documented that way. And in the end, instead of showing us something like that you decided to blame the documentation for your inability to read: “it is perfectly reasonable to interpret the two as synonymous when reading documentation casually”. And when I ridiculed it the editor only had to say this: “This is an on-topic discussion, but please remember to keep things polite”.
And yes, I agree, I wasn't polite, but I, at least, stick to the facts. You want to say that only you can lie (that one even you admitted as a lie: “but literally nobody follows the strict provenance rules anyway, since they're explicitly marked as experimental and non-normative”) and then use these lies to misrepresent things? Others couldn't do that?
Let me quote myself, for a change (you haven't misquoted me, just ignored what I say): Rust developers would like to hear “hey, I don't know how to implement this kind of design while staying in the boundaries outlined by the strict provenance approach” stories. Not stories related to someone's inability to read or understand the documentation (documentation writers do accept patches for such cases, but, as noted, it's not clear how to change the documentation and force someone to actually try to read and understand it… I'm not even sure that's possible at all)!
And you tell me that we are discussing exactly such a story here. But that's a big fat lie: the author of the original example even admitted that the original code that was discussed was brought to compliance with stacked borrows (and the whole example that we are discussing here really-really doesn't need any pointers and thus stacked borrows or other such horrors, if you want to eliminate bounds checking then call to one
Thus I don't see an example of a design that is limited by strict provenance. Not even remotely close to it. Sorry.
Posted Sep 20, 2024 11:20 UTC (Fri)
by daroc (editor, #160859)
[Link]
Posted Sep 19, 2024 11:53 UTC (Thu)
by daroc (editor, #160859)
[Link]
> You couldn't tell the difference between “one” and “many”? That's explained pretty early in most schools. Maybe you forgot?
This is not polite or respectful, and doesn't add anything interesting to your technical point. Please avoid insulting other commenters.
Posted Sep 21, 2024 6:45 UTC (Sat)
by ralfj (subscriber, #172874)
[Link]
Kind regards,
Posted Sep 19, 2024 13:42 UTC (Thu)
by apoelstra (subscriber, #75205)
[Link] (3 responses)
To restate khim's post much more briefly: the difference is that as_ptr returns a raw pointer while get_unchecked returns a reference.
I agree that it'd be helpful to say this in the docs (as well as to mention that you can call get_unchecked with a range if you want access to more than one element, which is how I ended up fixing my original code). I don't even think you need to mention C.
Though having said that, `get_unchecked` is a very old method and one of the classic "here's a case where you'd need to use unsafe" methods, so I'm skeptical that many users will carefully (or even casually) read the docs.
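The range-based fix mentioned above can be sketched like this, assuming the caller guarantees `i + 4 <= 32`. The function name is illustrative and the body is a reconstruction, not the exact code that was committed.

```rust
use std::{mem, ptr};

/// Reads a little-endian `u32` starting at byte offset `i`.
///
/// # Safety
/// The caller must guarantee `i + 4 <= buf.len()`.
unsafe fn load_int_le_range(buf: &[u8; 32], i: usize) -> u32 {
    let mut data = 0u32;
    unsafe {
        // Passing a range makes `get_unchecked` return a `&[u8]` covering all
        // four bytes, so the pointer taken from it may read all of them.
        let src: &[u8] = buf.get_unchecked(i..i + mem::size_of::<u32>());
        ptr::copy_nonoverlapping(
            src.as_ptr(),
            &mut data as *mut u32 as *mut u8,
            src.len(),
        );
    }
    // The bytes were copied in memory order; interpret them as little-endian.
    u32::from_le(data)
}
```

Compared with `buf.get_unchecked(i)`, the only difference is that the reference now spans the bytes that are actually read.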
Posted Sep 19, 2024 14:20 UTC (Thu)
by khim (subscriber, #9252)
[Link] (2 responses)
Note that your situation is a bit unusual because you immediately decided to skip all the range checking without testing and without verifying that these checks are bottlenecks worth eliminating. (If they are easy for a human to eliminate, then the compiler can often eliminate them too, after inlining, which means these checks are not necessarily performance bottlenecks. And where they are performance bottlenecks they are often desirable, because if the compiler couldn't prove that they are not needed then chances are high that this isn't actually guaranteed by the structure of your code, either.) That's why you missed all these explanations (which are there for get, but not for its range-check-skipping sibling). That's not how a slice is normally supposed to be used! I guess the documentation was just not written with people in mind who would try to remove as many guardrails as possible as early as possible. Are you even sure that, in your case, the use of get_unchecked actually brings measurable performance wins over the use of get?
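For comparison, here is a sketch of the same load written with the checked `get` and a range, with no `unsafe` at all. The function name and the `Option` return type are assumptions made for illustration; whether the bounds check costs anything measurable would have to be benchmarked in the real code:

```rust
/// Loads a little-endian u32 from `buf` at byte offset `i`.
/// Returns `None` when the four bytes are not fully inside `buf`.
/// (Hypothetical helper for illustration only.)
pub fn load_int_le_checked(buf: &[u8], i: usize) -> Option<u32> {
    // checked_add avoids overflow for very large `i`.
    let end = i.checked_add(4)?;
    // `get` with a range returns None instead of panicking when out of bounds.
    let slice = buf.get(i..end)?;
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(slice);
    Some(u32::from_le_bytes(bytes))
}
```

If the caller already knows the index is in range, the compiler will often remove the check after inlining, which is the point made above.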
Posted Sep 19, 2024 19:09 UTC (Thu)
by apoelstra (subscriber, #75205)
[Link] (1 responses)
This is an incredible inference from example code I posted to illustrate pointer provenance rules and which has never actually been deployed in the form I described.
Posted Sep 19, 2024 19:35 UTC (Thu)
by khim (subscriber, #9252)
[Link]
That's not an inference from the example code, but more of an inference from your comments. You are using Forgive me, but I fail to see how one would end up in that situation unless one has decided a priori that one is not interested in writing normal, safe code and would rather remove Rust-provided guardrails as much as possible. Note: I'm not even saying that it's wrong to write code that way. Maybe, for the kernel's needs, it's even the best way, and you haven't forgotten to erect a manual check after you removed the normal Rust-provided check there. But that definitely puts you into a different usage mode: normally it's expected that a Rust developer would stick to the normal, “safe” interface as much as possible and would only remove guardrails where that's truly justified. Going down that “normal” road it's almost impossible to miss the detailed specification that explains how the get function works and how it is supposed to be used; there are examples with a single argument and a range, etc.
Posted Sep 19, 2024 9:11 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
You're right to say that the rules for provenance haven't yet been fully formalised, and I suspect that the operation you're talking about is in that set.
Right now, the "real" rules for pointer provenance are "whatever LLVM happens to do on this machine at this version, given that LLVM doesn't want to do anything that will shock C or C++ programmers". Strict provenance is a set of rules that are guaranteed to be stricter than whatever the final rules will be; the intention is that the final formalization of provenance rules will end up being equivalent to "strict provenance, except that these things banned by strict provenance are actually allowed in the final rules".
It'll probably not be written that way, though; there will be strict provenance rules (as today), and then a set of rules that are verified to be a superset of strict provenance rules. There may even be multiple tiers of this, where each tier is both less strict and harder to verify your code against than the tier above.
The idea remains, though, that if your code is compliant with strict provenance rules, it will definitely be compliant with the final rules for pointer provenance; whether those be PNVI-ae-udi from draft N2676 for the C language, or something with more subtleties in it. And there's thus a promise from the Rust compiler developers that they'll avoid breaking unsafe code that complies with strict provenance, even though the final provenance rules haven't yet been determined.
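As a rough illustration of what "staying within strict provenance" tends to look like in practice, the sketch below derives an aligned pointer by offsetting the original pointer rather than round-tripping through an integer. The function name is hypothetical, and this is only one example of the style, not a statement of the formal rules:

```rust
use std::mem;

/// Reads the first 4-byte-aligned, native-endian u32 inside `buf`, if there is one.
/// (Hypothetical helper for illustration only.)
pub fn first_aligned_u32(buf: &[u8]) -> Option<u32> {
    let base: *const u8 = buf.as_ptr();
    // Outside strict provenance (shown only as a comment):
    //     let addr = (base as usize + 3) & !3;
    //     let p = addr as *const u32; // integer-to-pointer cast
    //
    // Offsetting the original pointer keeps its provenance over all of `buf`.
    let offset = base.align_offset(mem::align_of::<u32>());
    // align_offset may return usize::MAX if no offset can be computed;
    // checked_add turns that case into None.
    let end = offset.checked_add(mem::size_of::<u32>())?;
    if end > buf.len() {
        return None;
    }
    // SAFETY: offset..end is in bounds of `buf`, the pointer is aligned for
    // u32, and it was derived from a pointer to the whole slice.
    Some(unsafe { base.add(offset).cast::<u32>().read() })
}
```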
Posted Sep 18, 2024 9:35 UTC (Wed)
by donald.buczek (subscriber, #112892)
[Link] (6 responses)
Can you identify a concrete rule which is violated?
- https://doc.rust-lang.org/std/ptr/fn.copy_nonoverlapping....
I can't.
Posted Sep 18, 2024 11:09 UTC (Wed)
by khim (subscriber, #9252)
[Link] (5 responses)
The problem is not with “miri checks”; the problem is with the stacked borrows model itself. Can we please stop the language lawyering? That's how C turned into a disaster; there is no need to repeat that story. The idea here is to use conservative rules as long as they work, and to talk with the language developers about complicated cases where these conservative rules can't be satisfied.
Currently there are two sets of rules: stacked borrows and tree borrows. They are a bit different and each supports some code that the other rejects. Miri supports both. Try to run your code under them and see if Miri accepts it. It would be nice to write down somewhere which model you are using, but please don't dismiss them. There are known issues, sure (e.g. Rust doesn't provide any way to attach pointer provenance to function pointers and yet Miri would complain about these issues). Discussion is still ongoing, but since that's an important case where even the documentation recommends things that Miri currently rejects, you may assume it's OK to use it, for now. The main problem with Of course, in that same sense the whole of C/C++ is not finalized with respect to what can and cannot be done with pointers, but in Rust people try to create sane rules… that's harder than ignoring problems and doing nothing, of course, but still… if you have legitimate design patterns that can't be reconciled with Miri in both modes, talk to Ralf, file bugs, etc. Don't dismiss these issues by saying “oh, well, these are just some experiments, it's OK to ignore them”. P.S. Frankly, I suspect it could be easier to use tree borrows with Rust-in-Kernel, because it allows more tricks that expand pointer validity while the cases it rejects are mostly irrelevant for the kernel, but to know for sure we need to run the code with both and compare.
Posted Sep 18, 2024 11:51 UTC (Wed)
by pizza (subscriber, #46)
[Link] (4 responses)
Language lawyering is how "problems" with your language are identified [1]. How those "problems" are handled/resolved is another matter entirely.
[1] Strongly enforcing a still-growing set of highly opinionated rules is the *entire point* of Rust.
Posted Sep 18, 2024 12:19 UTC (Wed)
by khim (subscriber, #9252)
[Link] (3 responses)
Sure. That's how you cause dissent and doubt. Why is that needed here? No. The “entire point” of Rust is to make programs safer. Highly opinionated rules (as well as any other rules) help as long as they are followed. And language lawyering (an attempt to use the documentation as holy gospel while forcibly ignoring things that exist outside of it) is not how rules are followed, but the exact opposite: it's a way to dig extra rules out of the exact same text. It's a useful activity only as long as you are planning to improve the documentation itself: to show that the documentation contradicts the implementation or is deficient for some other reason (e.g. if it has self-contradictions or something that can't be supported), etc. When you are doing that to find an excuse to do or not do something… this just splinters the community, because different people may “discover” different things in the exact same “holy” text.
Posted Sep 18, 2024 12:35 UTC (Wed)
by pizza (subscriber, #46)
[Link] (2 responses)
...And what do you do if the rules are ambiguous or do not match reality?
while (<undesirable behavior occurs>) {
Posted Sep 18, 2024 13:03 UTC (Wed)
by khim (subscriber, #9252)
[Link] (1 responses)
The important thing is what you don't do. You don't say “can you identify a concrete rule which is violated?” This asserts that only documentation and rules included in it matters and everything outside of it just simply doesn't exist or could be ignored.
Posted Sep 19, 2024 9:01 UTC (Thu)
by donald.buczek (subscriber, #112892)
[Link]
I don't understand why you have to devalue your otherwise interesting answers by using unnecessary fighting words.
> You don't say “can you identify a concrete rule which is violated?”
This interpretation of my question is wrong.
Posted Sep 20, 2024 2:48 UTC (Fri)
by comex (subscriber, #71521)
[Link] (1 responses)
It is undefined behavior according to Stacked Borrows. But it's not undefined behavior [1] according to Tree Borrows, which is an alternative model that Rust may end up officially adopting instead of Stacked Borrows. Ralf Jung has also explored the idea of modifying Stacked Borrows to make it legal [2]. See also the canonical issue report about this pattern [3].
Broadly speaking, this pattern is something that would be nice to make legal at least for immutable references / reads, because there aren't any current optimizations, or highly-desired future optimizations, that would break it.
The other pattern you cited - aliasing with mutable references - is different. It needs to be UB in _some_ form, because it's used to justify an important optimization: marking function parameters of reference type with LLVM's `noalias` attribute, equivalent to C `restrict`. Since most C code does not use `restrict`, this is and will remain an area where Rust unsafe is harder to reason about than C.
That said, Tree Borrows affects this pattern as well, and at least reduces the amount of UB compared to Stacked Borrows. Specifically, if you create an aliasing mutable reference but don't actually write through it (at least not to the same bytes that you later read from), then you're probably okay under Tree Borrows. [5]
(For pedantry's sake, I should mention that even Stacked Borrows' rule is less strict than how you phrased it, but only slightly.)
[1] https://github.com/BoxyUwU/rust-quiz/issues/11
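To make the mutable-aliasing point concrete, here is a small sketch of the usual safe way to obtain two mutable references into one buffer. The function name is made up for illustration; `split_at_mut` is the standard library call that guarantees the two halves do not overlap, which is the property the `noalias` optimization relies on:

```rust
/// Adds each element of the first half of `data` to the matching element
/// of the second half. (Hypothetical helper for illustration only.)
pub fn add_first_half_to_second(data: &mut [u32]) {
    let mid = data.len() / 2;
    // split_at_mut returns two `&mut` slices that cannot overlap, so both
    // may be used at the same time without violating the aliasing rules.
    let (left, right) = data.split_at_mut(mid);
    for (dst, src) in right.iter_mut().zip(left.iter()) {
        *dst += *src;
    }
}
```

Producing two overlapping `&mut` slices by hand from raw pointers is the pattern that has to stay UB in some form.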
Posted Sep 21, 2024 6:22 UTC (Sat)
by ralfj (subscriber, #172874)
[Link]
Unfortunately those modifications wouldn't make this particular code legal. I don't know any way to achieve that without going to something like Tree Borrows.
Posted Sep 21, 2024 6:18 UTC (Sat)
by ralfj (subscriber, #172874)
[Link]
= help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
It is hard to find a set of rules for these aliasing requirements that is precise, can be automatically checked, and allows the desired optimizations. We don't have the final answer yet. That's why there are no strong warnings against this kind of code in the docs: It is deep inside the "gray area", and it is not something we actually already use for optimizations. Restricting references to only access the field/element they point to is something I strongly think we should relax (and that's what the discussion at https://github.com/rust-lang/unsafe-code-guidelines/issue... is about). This is especially true for your case, where the reference points into an array, for the reasons you give -- pointer arithmetic inside arrays is legal even in C, so we should avoid such pitfalls in that space as much as we can. (Unlike C, Rust also permits pointer arithmetic in structs if you use raw pointers instead of references.) But it's hard to make Stacked Borrows accept this code without making Stacked Borrows accept *too much* code.
Nobody forces you to make your code compatible with Stacked Borrows, it is an *experimental* model after all. But if you want to be as sure as you can currently be that your code is fine with whatever the future aliasing model ends up being, then it is a good idea to do the extra work and fix Stacked Borrows errors. Think of it like adding a safety margin -- we don't know where the edge between "UB" and "okay" will be, so it is better to err on the side of staying a bit conservative and avoiding the gray area. If you are okay with living a bit more "on the edge" and an increased risk of having to adjust your code in the future as the aliasing model solidifies, you can use "-Zmiri-tree-borrows" to switch to a different, even more experimental aliasing model that *will* accept your code. This model should still be able to catch all UB that is exploited by today's optimizer (but the optimizer can change in the future, and we're not ready yet to commit to a hard set of rules here).
> so I am fairly confident that this is actually undefined behavior, and not just extra paranoia
It is Undefined Behavior under *experimental* rules, as the error clearly indicates. The rules are deliberately conservative, so there is some extra paranoia involved here. We don't currently instruct the optimizer to actually make use of this particular UB, but there is other Stacked Borrows UB that we *do* use in the optimizer, and a bunch of Stacked Borrows UB that we *may want to* use in the optimizer. It's unclear how to tell these apart, hence this error.
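The remark about pointer arithmetic in structs can be sketched as follows. The `Pair` type and the function name are assumptions made up for this example. The raw pointer is derived from a reference to the whole struct, so offsetting it to reach another field stays inside the memory it is allowed to access:

```rust
use std::mem;

/// Two u32 fields with a fixed, C-compatible layout.
#[repr(C)]
pub struct Pair {
    pub a: u32,
    pub b: u32,
}

/// Reads `b` by offsetting a raw pointer to the whole struct.
/// (Hypothetical helper for illustration only.)
pub fn read_b_via_offset(pair: &Pair) -> u32 {
    // Derived from `&Pair`, so this pointer may access every byte of the struct.
    // Starting from `&pair.a` instead would give a pointer restricted to `a`
    // under Stacked Borrows.
    let base = pair as *const Pair as *const u8;
    // With #[repr(C)] and two u32 fields, `b` sits directly after `a`.
    let offset_of_b = mem::size_of::<u32>();
    // SAFETY: the offset is inside `Pair` and correctly aligned for u32.
    unsafe { base.add(offset_of_b).cast::<u32>().read() }
}
```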
Posted Oct 6, 2024 18:13 UTC (Sun)
by hkBst (guest, #173872)
[Link]
```rust
fn bad_load_int_le(buf: &[u8; 32], i: usize) -> u32 {
// this lets you use Miri
[1]:https://play.rust-lang.org/?version=stable&mode=debug...
fn load_int_le(buf: &[u8; 32], i: usize) -> u32 {
    buf[i .. i+4].iter().rfold(0, |n, &b| (n << 8) + (b as u32))
}

fn load_int_le(buf: &[u8; 32], i: usize) -> u32 {
    let b: [u8; 4] = buf[i .. i+4].try_into().unwrap(); // convert to a fixed-size array type
    u32::from_le(unsafe { mem::transmute(b) })
}
> In this case your code appears to be correct, but I don't think "avoid pointers, always reach for transmute first" is a reasonable policy for authors of unsafe code.
`transmute` has pretty much unlimited chaos potential, but it has fewer failure modes than `transmute_copy` (the compiler can at least verify that the sizes of the objects match) and, of course, `transmute_copy` has fewer failure modes than `copy_nonoverlapping` (in addition to all the failure modes that `transmute_copy` has, you now have to deal with the possibility of the two objects overlapping, the pointer being invalid, etc).
`copy_nonoverlapping`).
pub fn read_le_u32(input: &[u8]) -> u32 {
    if input.len() < 4 {
        unsafe { std::hint::unreachable_unchecked() }
    }
    let (int_bytes, _) = input.split_at(std::mem::size_of::<u32>());
    u32::from_le_bytes(int_bytes.try_into().unwrap())
}
`transmute`, `transmute_copy` or `copy_nonoverlapping` if you really just want to eliminate the bounds check.
Slice to `[u8;4]`
use std::convert::TryInto as _;
pub fn load_int_le(buf: &[u8; 32], i: usize) -> u32 {
    let slice = &buf[i..i + 4]; // the four bytes starting at `i`, not always the first four
    u32::from_le_bytes(slice.try_into().expect("load_int_le slicing issue Q1234Qload_int_leQ32Z32Z"))
}
About the new version: Wow, writing that much code to generate a function with a single asm instruction...
Wol
Strict Provenance experiments
> But literally nobody follows the strict provenance rules anyway, since they're explicitly marked as experimental and non-normative.
`unsafe` have too many rules. Nope. In reality it has a much smaller number of rules than C. And that makes it harder to write `unsafe` code, ironically enough.
> I still cannot find a single, authoritative source which explicitly documents all of the operations that can shrink provenance, and I'm not 100% convinced that it has even been formalized in the first place.
get_unchecked
> Because the reference was returned by slice::get_unchecked, and slice::get_unchecked is documented to have the same semantic meaning as C pointer arithmetic (but with the added constraint that it is illegal to construct a "one past the end" pointer, unlike in C where that is legal).
> It is quite unreasonable to assume that people writing unsafe Rust have no preconceptions from C whatsoever.
`unsafe` Rust is a C programmer.
`unsafe` to be written. And the remaining 1%… they are still working on it.
`unsafe` code in the first place! And the second line is to use a sound API. And the third line is strict provenance. And only after all these possibilities are exhausted do you go to these forums in search of a solution.
`unsafe` code in Rust), but, ultimately, only the developer can be responsible for conformance with the rules. The part where the compiler does its magic and where you don't have to carefully scrutinize every word exists on the other side of `unsafe`!
> I literally just told you a story like that, and you responded by making up a bunch of things I didn't say,
`slice::get_unchecked` is documented to have the same semantic meaning as C pointer arithmetic” then use that “a bunch of things [others] didn't say” as justification for your assertions?
`unsafe` function with literally zero arguments is absolutely enough).
Ralf
> Though having said that, `get_unchecked` is a very old method and one of the classic "here's a case where you'd need to use unsafe" methods, so I'm skeptical that many users will carefully (or even casually) read the docs.
`get_unchecked` is, actually, the range-check-skipping version of get, and it even explicitly tells you: “For a safe alternative see get” (with a clickable link!), which then tells you that you may access either one element or a range of elements with this method.
> This is an incredible inference from example code I posted to illustrate pointer provenance rules and which has never actually been deployed in the form I described.
`unsafe` function to do some manipulations that are usually done with normal, safe code, in an unsafe way, by using a function whose sole purpose is to save a tiny amount of execution time by bypassing the usual safety checks. And then you add a `debug_assert` that, essentially, restores, in debug builds, the error checking that Rust programs normally use unconditionally. And then you complain that the function you used to bypass the normal safety checks has poor documentation, while asking for clarifications that are already present in the sibling version of that same function, which is already referred to as the “safe alternative” there and which is one click away. Which means that you have probably never read the documentation for the safe variant: how else would you not notice that this variant already addresses all your concerns?
Provenance rules not yet formalised
- https://doc.rust-lang.org/std/ptr/index.html#safety
> Are you sure, this is UB and not just a case which is not yet handled by miri's stacked borrow checks?
`unsafe` Rust is not that its rules are too complicated but that these rules are not finalized!
> Language lawyering is how "problems" with your language are identified
    if (<behavior allowed by spec>) {
        if (!<we care>) {
            break; // The typical answer of C language stewards
        } else {
            // bug in specification
            fix_spec(); // The typical answer of Rust language stewards
        }
    } else {
        // bug in compiler/tooling
        fix_implementation();
    }
}
> ...And what do you do if the rules are ambiguous or do not match reality?
> This asserts that only documentation and rules included in it matters and everything outside of it just simply doesn't exist or could be ignored.
[2] https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Making.20Stacked.20Borrows.20better.20with.20ugly.20hacks
[3] https://github.com/rust-lang/unsafe-code-guidelines/issue...
[4] https://github.com/rust-lang/unsafe-code-guidelines/issue...
[5] https://perso.crans.org/vanille/treebor/protectors.html#n...
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/... for further information
Why this code possibly has undefined behavior using strict provenance
use std::{mem, ptr};

fn bad_load_int_le(buf: &[u8; 32], i: usize) -> u32 {
    assert!(i + mem::size_of::<u32>() <= buf.len());
    let mut data = 0u32;
    unsafe {
        ptr::copy_nonoverlapping(
            buf.get_unchecked(i), // provenance problem: the reference covers only the first byte
            //buf.get_unchecked(i..i + mem::size_of::<u32>()).as_ptr(), // references all four bytes
            //buf.as_ptr().add(i), // pointer derived from the entire buf
            &mut data as *mut _ as *mut u8,
            mem::size_of::<u32>(),
        );
    }
    data.to_le()
}

fn main() {
    let data = [0u8; 32];
    bad_load_int_le(&data, 0);
}
```
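The third option in the comments of that listing (`buf.as_ptr().add(i)`) can also be written without `copy_nonoverlapping`. The following is a sketch under the assumption that a hard bounds assertion is acceptable; the function name is hypothetical. `as_ptr()` yields a raw pointer for the whole buffer, so reading four bytes at offset `i` stays within what that pointer may access:

```rust
/// Loads a little-endian u32 from `buf` at byte offset `i`, using a raw
/// pointer derived from the whole buffer.
/// (Hypothetical helper for illustration only.)
pub fn load_int_le_raw(buf: &[u8; 32], i: usize) -> u32 {
    // Hard check so that the four-byte read below is always in bounds.
    assert!(i <= buf.len() - 4);
    // SAFETY: i..i + 4 is in bounds, and [u8; 4] has alignment 1, so the
    // read is valid at any byte offset.
    let bytes: [u8; 4] = unsafe { buf.as_ptr().add(i).cast::<[u8; 4]>().read() };
    u32::from_le_bytes(bytes)
}
```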