Beingessner: Rust's Unsafe Pointer Types Need An Overhaul [LWN.net]

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 16:23 UTC (Mon) by smoogen (subscriber, #97) [Link]

Going from whenever I deal with pointers for too long, I feel this is a good quote of the week (from teh end of the blog)
```
I think about unsafe pointers in Rust a lot.

I wrote this all in one sitting and I really need dinner.

Head empty only pointers.
```

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 19:01 UTC (Mon) by wtarreau (subscriber, #51152) [Link] (51 responses)

Seems like the language is starting to face the real world of computers, in which absolutely nothing distinguishes a pointer from an integer, both are used and stored interchangeably in registers, it's only how they are _used_ that makes us consider one is arbitrarily a pointer or an integer. On x86 a number of instructions even support indirect memory accesses involving [reg1+reg2*scale+disp]. One would think that reg1 is always the pointer, reg2 an index and disp a relative displacement, but it can be used in any form, including as indexes from local variables whose address is known (hence the pointer in disp), and reg1/reg2 can easily be exchanged when the scale is 1.

IMHO that's what "volatile" and "register" address in C (and not in the most elegant way, admittedly). "volatile" may be aliased by anything and will always be reloaded when read. "register" may never be aliased at all and the compiler will happily optimise their accesses.

Ideally we'd need a simplified mechanism in a language to indicate that certain pointers may alias only their own type, nothing at all or everything, and that they may be aliased by the same factors. with this, developers could choose their constructs without having to worry about what the compiler does behind (exactly like they do in assembly). Having to pretend that something is a register to prevent it from being aliased is annoying and limited since you cannot take its pointer to pass it anywhere. But if we could say "this never aliases anything" some constructs could be more easily optimized. Maybe some scopes would be useful (sort of aliasing barriers for certain variables).

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 19:11 UTC (Mon) by acarno (subscriber, #123476) [Link] (6 responses)

So... basically an Ada access type?

http://www.ada-auth.org/standards/rm12_w_tc1/html/RM-3-10...

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 12:42 UTC (Tue) by wtarreau (subscriber, #51152) [Link] (5 responses)

Interesting, thanks for the link, I wasn't aware that Ada did that. It would be interesting to compare evolutions of all such languages and the classes of bugs or the complexities in developing certain classes of programs.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 22:52 UTC (Wed) by acarno (subscriber, #123476) [Link] (4 responses)

As someone who writes Ada on a daily basis, it's entertaining to read folks who fawn over Rust, since Ada offers a lot of the same benefits. That's not to say I think Ada is better than Rust, just that it was very much ahead of its time as a language.

Now if only the initial round of tooling had been better back in the 80s, it may not have suffered from its reputation as resource-intensive and unwieldy. ¯\_ (ツ)_/¯

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 23:01 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

The last time I looked at Ada, it seemed like pointers ("access types") were an afterthought there. Is there a good documentation about something like borrow checker but for Ada?

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 25, 2022 16:58 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (2 responses)

There's some Ada proof assistant-like project named SPARK that I read an article about the beginnings of Rust' borrow checker influencing its design or progress. I may be mis-remembering details as to what SPARK is or how influential Rust has been on it.

https://blog.adacore.com/using-pointers-in-spark

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 25, 2022 19:12 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

I admittedly hasn't looked into SPARK within the last 5 years, but way back then it didn't support non-static allocations. It was actually a "feature" there - bounded resource usage (because it can't be dynamic).

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 25, 2022 19:55 UTC (Fri) by ms_43 (subscriber, #99293) [Link]

See this FOSDEM 2021 talk by Claire Dross:

Proving heap-manipulating programs with SPARK
The SPARK open-source proof tool for Ada now supports verifying pointer-based algorithms thanks to an ownership policy inspired by Rust

https://archive.fosdem.org/2021/schedule/event/safety_ope...

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 19:31 UTC (Mon) by pm215 (subscriber, #98099) [Link]

The parts of the article that talk about CHERI, on the other hand, are dealing with the real world of a computer that absolutely *is* distinguishing a pointer (which has the untamperable metadata saying it's a valid pointer) from an integer (which doesn't, even if the 64 bits of 'value' are the same, and will fault if you attempt to use it as a pointer)...

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 20:37 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

> "volatile" may be aliased by anything and will always be reloaded when read.

Officially, volatile gets you exactly nothing in terms of the abstract machine semantics and the strict aliasing rule. In practice, if all of the aliasing variables are volatile, it's unlikely that most "reasonable" compilers will have issues, but it's still UB and so the entire code path is still considered poisoned. It's possible that the compiler assumes, for example, that the code path in which the aliased write happens is never executed while the other alias exists, and therefore makes incorrect simplifying assumptions about the overall flow of control.

The purpose of volatile is to control memory-mapped I/O and other hardware that does "magic" stuff to your memory/address space. It is not to enact an end-run-around the strict aliasing rule.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 21:05 UTC (Mon) by walters (subscriber, #7396) [Link]

See e.g. https://www.ralfj.de/blog/2020/12/14/provenance.html which talks about this in the context of C and LLVM.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 22:40 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (39 responses)

The sentence "IMHO that's what..." trails off without ever reaching any conclusion, but perhaps that's appropriate since whatever you were going for here, no they don't.

It's true that you can't alias variables with storage class register because you can't take their address, although this was a relatively late addition (K&R C does not have this rule), but there are no rules about aliasing volatile at all, neither allowing nor forbidding.

The standard says that register is merely a hint [which today your compiler almost certainly ignores], that it be a good idea to put this variable in a register and serves no other purpose despite the restriction.

Meanwhile volatile serves only one clear purpose, you can use it to perform explicit stores and loads from a memory address, likely because you are doing MMIO. This is tricky to get right but since C doesn't provide any intrinsics for this purpose it's the only way that's even a little bit portable. All other uses of volatile are platform specific (in the good cases) or just voodoo / cargo cult C, sprinkled on by people who are hoping maybe the bug goes away if they write volatile in more places.

> But if we could say "this never aliases anything" some constructs could be more easily optimized.

Which is why (safe) Rust gets to go very fast. But attempting to retro-fit this to a language like C is impractical.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 22:53 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (8 responses)

>> But if we could say "this never aliases anything" some constructs could be more easily optimized.
Which is why (safe) Rust gets to go very fast. But attempting to retro-fit this to a language like C is impractical.

To be fair, C does have the restrict keyword. But that's more or less the opposite of register or Rust's borrow checker (i.e. instead of the type system preventing aliasing from happening and promising the programmer that it has done so, the programmer prevents aliasing from happening and promises the type system that they have done so), and this arguably makes it less useful in more complicated cases.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 9:21 UTC (Tue) by roc (subscriber, #30627) [Link] (2 responses)

'restrict' seems like a nightmare in practice. You'll need a whole new sanitizer to detect restrict violations, and you'll still have restrict violations in untested code paths. Violations of 'restrict' will be painful to track down.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 13:04 UTC (Tue) by immibis (subscriber, #105511) [Link]

Everything in C is like that. I assume that C programmers are accustomed to thinking very slowly and carefully about their programs.

Who am I kidding.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 19:19 UTC (Wed) by bartoc (guest, #124262) [Link]

I mean sort of, but its the default in Fortran and that doesn't cause all that many problems (though the kind of programs that tend to be written in Fortran don’t alias much anyways

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 12:53 UTC (Tue) by wtarreau (subscriber, #51152) [Link] (4 responses)

> To be fair, C does have the restrict keyword. But that's more or less the opposite of register or Rust's borrow checker

Kevin, could you please explain me "restrict" ? I've started to see it a few years ago in includes and man pages, and all info I've read on it were incomprehensible to me. I've always been interested in strong typing (and am using const a lot). I'd like to know if "restrict" may bring me anything at all or if I shouldn't care.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 14:05 UTC (Tue) by farnz (subscriber, #17727) [Link] (3 responses)

"restrict" is a promise from the programmer to the compiler, which is why it's a pain to understand.

Using the following names for exposition:


int foo[16];
int * foo_ptr = &foo[0];
int * restrict foo_restrict = foo;

With foo_ptr, the programmer makes no promises about aliasing. There can be a second pointer to any element of foo, and you can use foo[2] and *(foo_ptr + 2) interchangeably.

"restrict" makes a promise to the compiler about using overlapping names, and hence a promise that no aliasing is used for as long as the "restrict" pointer is alive. For as long as foo_restrict is alive, you promise not to access foo directly, or via foo_ptr, and you promise that if you use *(foo_restrict + 4), you have not accessed foo[4] any other way since foo_restrict was initialized, and that you will not access it any other way (e.g. via foo[4], or *(foo_ptr + 4)) until the lifetime of foo_restrict ends.

The usual concrete example is memcpy versus memmove; the inputs to memcpy are "restrict" pointers, because if you do memcpy(foo, bar, 16 * sizeof(foo[0]));, you promise the compiler that until memcpy returns, *(foo + 0) through *(foo + 15) cannot be accessed via *(bar + offset). memmove, on the other hand, permits that overlap, so its input pointers cannot be marked restrict.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 14:55 UTC (Tue) by wtarreau (subscriber, #51152) [Link] (2 responses)

Many thanks. That's exactly the explanation I was missing, and your memcpy() vs memmove() example is point on!
I think I'm seeing a few cases where that could help, especially when some asm() statements are used and the
compiler cannot figure that some values cannot have changed there. At least now I know what to look for and
how to experiment.
Thank you!

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 4:11 UTC (Wed) by marcH (subscriber, #57642) [Link] (1 responses)

I don't see much in this thread that wasn't already on https://en.wikipedia.org/wiki/Restrict (and probably elsewhere) but if you do then please go and edit that page.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 10:34 UTC (Wed) by wtarreau (subscriber, #51152) [Link]

It just turns out that wikipedia is not exactly the first place that comes to my mind when searching for the definition of a language keyword :-) But indeed it seems clear enough there as well.

What I previously found was this: https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Restricted-... but it wasn't very clear to me. Of course there's nothing wrong in it, it's just that when the use cases are unclear to you they can remain unclear after reading the doc.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 23:29 UTC (Mon) by jlombera (guest, #155698) [Link] (26 responses)

> Meanwhile volatile serves only one clear purpose, you can use it to perform explicit stores and loads from a memory address, likely because you are doing MMIO. This is tricky to get right but since C doesn't provide any intrinsics for this purpose it's the only way that's even a little bit portable. All other uses of volatile are platform specific (in the good cases) or just voodoo / cargo cult C, sprinkled on by people who are hoping maybe the bug goes away if they write volatile in more places.

When accessing/modifying shared memory between processes/threads, volatile is some time the right thing to do to ensure stores/loads to/from memory. Thus it's not limited to MMIO.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 1:26 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (25 responses)

The Right Thing™ in your scenario is atomics. Asking for volatile and expecting atomics is mostly not dangerous on Windows, using Microsoft's C++ compiler.

On other platforms, with other compilers, you get what you asked for, not what you expected. Maybe you get lucky and maybe you don't. Maybe if you get unlucky you can write "volatile" in a few extra places and now it works. Voodoo.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 1:47 UTC (Tue) by jlombera (guest, #155698) [Link] (24 responses)

I don't see how this is atomics. Sure you might want/need atomics/synchronizations in addition to volatile is some cases, but the use of volatile in this case is to ask the compiler not to optimize access to certain memory, always go to memory.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 3:34 UTC (Tue) by mrugiero (guest, #153040) [Link] (23 responses)

You probably still need atomics because "go to memory" in pretty much all non-microcontroller hardware really means "go to cache", and if you're sharing with other threads/processes you really want to synchronize the data, not just makes sure it gets out of the register. Your C program typically doesn't know about nor can control how cache gets flushed and sync'd. You may not notice the problem in x86 because MESI solves it behind the scenes, but that's not universal.
For MMIO it works because the OS can mark a page as cache disabled so it goes straight to "memory" (which really is a mapped device).

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 4:16 UTC (Tue) by jlombera (guest, #155698) [Link] (22 responses)

> Your C program typically doesn't know about nor can control how cache gets flushed and sync'd.

The processor knows, though. All the compiler needs to do is not to register-optimize and emit memory access instructions instead, the processor takes care of maintaining cache coherence.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 4:45 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (16 responses)

To the best of my understanding, this is somewhat true on x86, and not at all true on basically any (modern) architecture other than x86 (e.g. ARM).

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 5:02 UTC (Tue) by jlombera (guest, #155698) [Link] (15 responses)

I can assure you "volatile" works as described on at least X64, IA64, SPARC, IBM Z (don't remember the actual name of the arch) using different compilers (GCC, Clang, MSVC, ICC, ACC, Solaris Studio, XLC).

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 11:29 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (14 responses)

"Volatile" is a C qualifier but the CPU doesn't run C it runs machine code. And so your volatile storage qualifier is long gone by the time the CPU is running your program.

You can take a look at a toy example with Godbolt, and see for yourself what the compiler actually tells your CPU to do, on x86 (and x86-64) you get Acquire/Release semantics (but not full consistency) "for free" (in fact everybody is paying for these semantics, all the time on this platform so it's only "free" the same way the ice is "free" with a $5 coke in a restaurant) but on other platforms if you don't see the CPU being told to do this work it's not doing the work. Maybe you get away with it, and maybe you don't. You are gambling every time.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 16:14 UTC (Tue) by jlombera (guest, #155698) [Link] (12 responses)

>"Volatile" is a C qualifier but the CPU doesn't run C it runs machine code. And so your volatile storage qualifier is long gone by the time the CPU is running your program.

Again, "volatile" is to ask the *compiler* not to optimize access to memory in the generated binary code.

>You can take a look at a toy example with Godbolt, and see for yourself what the compiler actually tells your CPU to do

Sure, feel free to play with this (very contrived) example in godbolt.org (sorry for the formatting, I couldn't fine a way to make this work as a plain text comment):

```
void f(volatile int *x_p) {
while (!*x_p)
;
}

void g(int *x_p) {
while (!*x_p)
;
}
```

Feel free to play with different compilers, optimization levels, even different archs. You'll see that in every case, in the loop in f() *x_p is read from memory in every iteration, whereas for g(), different kind of optimizations are performed.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 17:13 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (8 responses)

> You'll see that in every case, in the loop in f() *x_p is read from memory in every iteration, whereas for g(), different kind of optimizations are performed.

Nobody is disputing that. We are telling you that the compiler will fail to emit acquire/release memory barrier instructions on non-x86 platforms, and without those, you get no cross-thread guarantees.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 25, 2022 21:33 UTC (Fri) by ncm (guest, #165) [Link] (7 responses)

A quibble here: Apple M1 is said to provide memory/cache semantics compatible with x86, and thus stronger than other-ARM or POWER.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 26, 2022 22:13 UTC (Sat) by foom (subscriber, #14868) [Link] (6 responses)

Apple only enables the x86-like TSO mode in their CPU when running emulated x86 code. It uses the usual ARM memory ordering guarantees otherwise.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 28, 2022 17:13 UTC (Mon) by MrWim (subscriber, #47432) [Link]

I wonder what the performance impact of this being enabled is?

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 29, 2022 10:58 UTC (Tue) by nye (subscriber, #51576) [Link] (4 responses)

I find it fascinating that they've designed their hardware specifically to make it easier to emulate a different architecture. Is this sort of thing done a lot?

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 29, 2022 11:46 UTC (Tue) by excors (subscriber, #95769) [Link] (1 responses)

Maybe Nvidia's Denver counts? From early descriptions, it sounds like it was designed with a native ISA that closely matches its microarchitecture (in-order, very superscalar, etc), and it emulates the ARM ISA via both a hardware ARM instruction decoder (low latency but poor utilisation of the unusual microarchitecture) and a firmware-based dynamic ARM-to-native translator (high latency but can heavily optimise hot loops). And it would have also emulated x86 if not for patent licensing issues.

Denver reportedly provides sequential consistency (https://threedots.ovh/blog/2021/02/cpus-with-sequential-c...) so it's even stronger than x86, regardless of what mode it's running in - maybe that made sense when they were originally planning to support x86, because it'd be cheaper than emulating x86 on a weaker model, and simpler than exactly implementing x86's weird TSO model.

On the other hand, that doesn't sound terribly different to all modern CPUs, where the ARM/x86/etc instructions are heavily translated into microarchitecture-specific micro-ops - you could view them all as essentially emulating ARM/x86, and their microarchitectures are designed to do that emulation efficiently. The main difference is just that they're designed for a single ISA (or a related pair like A32 and A64), whereas Denver and Apple M1 were designed for both ARM and x86 (even if Denver only ever shipped as ARM).

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 30, 2022 9:53 UTC (Wed) by nye (subscriber, #51576) [Link]

Quite different but very interesting, especially with the firmware caching translated code in main memory; thanks for the pointer. Apparently they had some former Transmeta employees on the team, which makes sense.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Apr 4, 2022 2:41 UTC (Mon) by ncm (guest, #165) [Link] (1 responses)

It is absolutely the norm, going back to the early 1960s.

The current Intel chips emulate all previous Pentiums, both 64 and 32 bits, which emulated the 486, which emulated the 386, which emulated the 286, which emulated the 8086, which was jimmied to emulate 8080, which emulated 8008. The Z-80 also emulated 8080. IBM z emulates 370, which emulates 360, which could emulate 1401, which I think emulated 701. The DEC Alpha was able to emulate x86 faster than any native x86 of the time could run.

So, emulating is almost the norm. There are said to be businesses running programs from the '60s on machines emulating five levels deep.

But that Apple turns off x86-style bus ordering when not emulating suggests they can run faster without it. It is usually said that Intel avoids a speed penalty for their memory model by throwing way more transistors at the problem than seems reasonable to other makers. So, this looks complicated.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Apr 11, 2022 16:22 UTC (Mon) by flussence (guest, #85566) [Link]

Most of that's right, but it's worth noting new Intel chips are actually starting to pull up the 80xx compatibility ladder behind them. While OSes have long since grown past the need for 40-year-old quirks (and userspace emulation is more than good enough to not care any more), the BIOS was still a major holdout until recently.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 18:27 UTC (Tue) by khim (subscriber, #9252) [Link] (2 responses)

> Again, "volatile" is to ask the *compiler* not to optimize access to memory in the generated binary code.

And that's not enough. Not even on x86. You know that, right?

Intel 8086 included lock prefix from the very beginning! And you can not force compiler to use it with volatile. End of story.

Yes, with C89 you had no choice but to use assembler with some volatile sprinkled here and there. C11 offers atomics which provide much more concise and usable semantic.

Don't use volatile except in kernel, please. It's not needed and harmful.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 20:13 UTC (Tue) by jlombera (guest, #155698) [Link] (1 responses)

Sorry, I think we just keep talking about different things. You keep bringing atomics, reordering and serialization issues to the discussion. I already conceded those might be required in addition (or instead of) to "volatile" depending on what you are trying to achieve (e.g. serialization), but those are not the use case of "volatile".

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 0:20 UTC (Wed) by excors (subscriber, #95769) [Link]

Could you give specific examples of use cases with shared memory between threads where volatile is sufficient?

The main thing I can think of is e.g. performance counters, where one thread updates them and another thread periodically reads them, and where you don't care if it reads slightly stale values (but not worse than a few usecs) or reads each counter in an unpredictable order. In that case, you do need something like volatile (to ensure the first thread doesn't hold the counter in a register for many usecs) but you don't need any further synchronisation guarantees. You also need atomic reads/writes, which I don't think volatile guarantees, but in practice it's probably okay if it's an aligned word-sized value.

Probably you could also do a simple form of mailboxes, where the producer thread does "while (m != 0) {}; m = 42;" and the consumer thread does "while (m == 0) {}; do_work(m); m = 0;", where (I think? but not certain) there are hopefully enough implicit dependencies that it will always behave as expected on any CPU. (But that won't work if you want to share more than a single word, because the mailbox message won't be synchronised with any other memory access.)

Those seem very niche cases, though. And you can easily do them with C++/C11 atomics using memory_order_relaxed (which adds no synchronisation barriers but does guarantee atomicity, like a more well-behaved volatile). I'm not aware of any drawbacks of memory_order_relaxed over volatile, and the benefit is it can be combined with acquire/release accesses (to the same variable or to others) for cases where synchronisation is important (which is nearly all cases involving shared memory).

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 4:17 UTC (Wed) by marcH (subscriber, #57642) [Link]

> so it's only "free" the same way the ice is "free" with a $5 coke in a restaurant

BTW https://queue.acm.org/detail.cfm?id=3212479

> The cache coherency protocol is one of the hardest parts of a modern CPU to make both fast and correct. Most of the complexity involved comes from supporting a language in which data is expected to be both shared and mutable as a matter of course.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 9:05 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (2 responses)

> the processor takes care of maintaining cache coherence

The CPU doesn't promise sequential consistency, because doing so would make it (much) slower. So now your program doesn't have sequential consistency. This is inherently a very difficult environment in which to write programs at all, but C and C++ don't bother you with that trouble because both languages have the same rule about sequential consistency: If your program doesn't exhibit sequential consistency it instead has Undefined Behaviour and they wash their hands of you entirely.

Again, you can write "volatile" on some more variables and maybe you get lucky and on the CPU you're working with the extra spills cause a cache flush, or forces an extra wait cycle somewhere and it happens to mask the bug. And then maybe somebody buys a CPU with more L1 cache, or a different cache policy and now the mysterious bug is back. You are using the wrong tool for the job.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 15:45 UTC (Tue) by jlombera (guest, #155698) [Link] (1 responses)

Sure, you might need to resort to explicit memory fencing when you want to ensure sequential consistency, but this is not what volatile is about.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Apr 4, 2022 2:54 UTC (Mon) by ncm (guest, #165) [Link]

Being absolutely clear: "volatile" is completely useless for synchronizing between threads. If you have been relying on volatile for synchronization, you are shipping bugs.

Unless you are pinging hardware registers, "volatile" is doing absolutely nothing for you, no matter how it looks to you in Godbolt. To synchronize between threads, you need memory fences, which are expressed as "atomic" types in C and C++. If you are using atomics, there is exactly zero value in adding "volatile". If you are not using atomics and not pinging hardware registers, there is exactly zero value in adding "volatile". "Volatile" appearing in code not part of an OS driver reliably indicates bugs.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 22:38 UTC (Tue) by edeloget (subscriber, #88392) [Link]

> > Your C program typically doesn't know about nor can control how cache gets flushed and sync'd.

> The processor knows, though. All the compiler needs to do is not to register-optimize and emit
> memory access instructions instead, the processor takes care of maintaining cache coherence.

That would not work.

First of all, the obvious: it would fail on the multi-processor (physically separated) case because one thing a processor knows is not always known by the other guy. That would require N-to-N communication between the processors - and it would be a nightmare on systems where you have many nodes (up to 6x64 cores) that can share the same physical memory, such as the Chinese Sunway supercomputer.

And then it would be reeeeeeeeaaaaaaaaaaly slow. The reason why cache works this way is that the processor don't even try to find out if the underlying memory has changed (on a load) or if it should change (on a store). Because it does not check anything, it's fast. If you start to factor in multiple checks then you'll hit a performance wall quite soon.

That's exactly why we do this only when performing an atomic operation: we are willing to pay the performance cost in exchange for the information. This is not something we want to do on every load or store. And that's exactly why processors don't do it unless we explicitely tell them to do it. The application (either the OS or a user space program) knows when it shall make an atomic load or store. The processor cannot know it in advance and unless you make an explicit pledge the compiler cannot know either.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 25, 2022 14:11 UTC (Fri) by darthscsi (guest, #8111) [Link]

> processor takes care of maintaining cache coherence

The problem here is cache coherence is a broad term and what different architectures do is vastly different. Multithreaded or multiprocess code expose the cache coherence semantics of the architecture to the programmer. This is not true for single threaded coded. Programmers, especially x86 programmers, try to extrapolate what cache coherence does from single threaded execution, which is not a correct thing to do. Volatile effective forces the compiler to make loads and stores (but doesn't order volatile and non-volatile accesses, which is why old code had the iconic empty asm block marked as writing all memory), but it does nothing to protect you from the architecture's memory model. x86 has a strong memory model, which has fewer classes of behavior which don't match single threaded code, if you constrain the compiler, but that doesn't mean they don't exist, the ISA has locked prefixes for a number of instructions for a reason.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 8:47 UTC (Tue) by metan (subscriber, #74107) [Link] (1 responses)

Actually volatile must be used for any global variable modified from a signal handler. The most common pattern is to have volatile sig_atomic_t global flag set by the signal handler. I bet this is the most common use for volatile in userspace.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 13:25 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

Fair. AIUI POSIX promises only that this will work for volatile sig_atomic_t and there's no promise this will work for variables with another type even though in practice int will work on real hardware.

This all pre-dates a formal memory model, but it is promised in POSIX and so you are indeed welcome to rely on it on a POSIX system. Like making errno work the way the standard says it should, on modern systems this involves a considerable amount of extra lifting for your compiler and C library, but that work is done and so yes you might as well rely on it.

There's a lot of low-level code out there actually banging on MMIO far from any POSIX system and MMIO is, in fact by my understanding where volatile starts out (first C compilers are too naive to eliminate duplicate stores/ loads, as the optimiser improves it elides enough apparently useless loads and stores that now the device driver doesn't work, volatile qualifier tells the compiler not to optimise the loads and stores and now the device drivers work properly again), so if I was a betting man I might take the other side of your bet.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 12:51 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

> The standard says that register is merely a hint [which today your compiler almost certainly ignores], that it be a good idea to put this variable in a register and serves no other purpose despite the restriction.

I know but there are few cases where it's still used. Trying to get the pointer from a variable declared as register will be instantly refused (which is great). Declaring a global variable with register (you're forced to indicate what register) will allow the compiler to optimize some operations because it knows the variable cannot change.
But I agree these are almost exceptions to the general rule that the compiler doesn't care much anymore.

> Meanwhile volatile serves only one clear purpose, you can use it to perform explicit stores and loads from a memory address, likely because you are doing MMIO.

The primary usage is for signals, even before MMIO. Userland code needs to use volatile and is certainly not fiddling with MMIO in general.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 18:00 UTC (Tue) by njs (subscriber, #40338) [Link]

> Seems like the language is starting to face the real world of computers, in which absolutely nothing distinguishes a pointer from an integer, both are used and stored interchangeably in registers, it's only how they are _used_ that makes us consider one is arbitrarily a pointer or an integer.

I think it's exactly the opposite: at the language level, C/C++/unsafe Rust all say that you're allowed to convert back and forth between integers and pointers, because the language designers had the same intuition you did – that's how the machine actually works, so it'll be fine.

But there are two problems:

- that's not actually how all machines work (like this exotic CHERI thing, or old-school segmented architectures)

- more importantly, even on common ISAs like x86 and ARM, it turns out that if you want a decent compiler, your front end needs to target a higher-level virtual machine where pointers *aren't* just integers. Of course they'll eventually get lowered to integers, but if you do that too early then it destroys your ability to do optimizations. So the status quo right now is that all compilers *actually* treat integers and pointers as fundamentally different, and they do it using a bunch of ad hoc heuristics that were never written down and the compiler engineers have been gradually realizing are actually incoherent and busted, even if they *mostly* work in practice.

So the problem is: how do we change the language and the compiler so that the code is efficient *and* the compiler rigorously implements the language semantics *and* the language semantics are understandable without a phd. And this means the language semantics need to treat pointers and integers as fundamentally different, while still giving enough tools to do all the weird pointer tricks you need in real systems.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 2:45 UTC (Tue) by atnot (subscriber, #124910) [Link] (16 responses)

I think the fact that it is even feasible to discuss changes like this is testament to the benefits a somewhat more restrictive but also more rigorously defined system can have. It's very hard to imagine, for example, the likes of C pointers being reworked to give them better formal properties.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 5:04 UTC (Tue) by jhoblitt (subscriber, #77733) [Link] (4 responses)

It is also relevant that there are not billions of lines of Rust code which have been "working" for decades. Languages tend to become ossified as the user base grows.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 13:01 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (3 responses)

Avoiding this problem as much as possible is the goal of Rust's Editions system, even for Unsafe, and that's called out in this article. Rust has seen the last of big sweeping changes like garbage collection, but Editions enable it to be more agile than languages like C or C++ without the trauma associated with something like Python 3.x

Importantly Editions respect time's arrow. You write new code the new way, and your old code is unchanged. 10M lines or 10B lines doesn't matter, you aren't required to touch any of it. But Editions change how people think about what's possible and that means both that more adventurous changes are considered (knowing Editions might make the change practicable), and so often changes which wouldn't have been conceived at all without editions, ultimately turn out not to be incompatible and so the benefits accrue to everybody, not just on new editions.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 16:40 UTC (Tue) by jhoblitt (subscriber, #77733) [Link] (2 responses)

Supporting multiple different language versions is not a Rust innovation. I don't know what language can claim the original invention but perl 5 had `use 5.024_001;` in the 90s.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 19:01 UTC (Tue) by khim (subscriber, #9252) [Link]

Technically nothing in Rust is Rust innovation and most ideas it uses were already old when it was conceived. Heck, it was presented to the world with words technology from the past come to save the future from itself!

But most compileable “mainstream” language are based on ideas so ancient that even these, pretty old and well-tested ideas are looking like some kind of radical revelation to C/C++/ObjectPascal/etc developers (Swift took some of these ideas, though).

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 24, 2022 7:28 UTC (Thu) by roc (subscriber, #30627) [Link]

Editions aren't quite the same as multiple language versions. A significant amount of effort is required to ensure that a module using edition X links with and works correctly with any module using edition Y. This somewhat constrains what changes are allowed between editions, but it's a super important property.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 7:16 UTC (Tue) by wahern (subscriber, #37304) [Link] (3 responses)

> It's very hard to imagine, for example, the likes of C pointers being reworked to give them better formal properties.

The very topic of discussion is literally evidence to the contrary. C already has intptr_t precisely to avoid the very problem Rust now has. (Relatedly, intptr_t is *optional* in Standard C, understanding that there might be some systems where data pointer to integer conversions aren't supportable; and Standard C doesn't support function pointer to integer conversions at all.) Moreover, FreeBSD has already been ported to CHERI, so claims that "real-world" C code is too riddled with non-standard pointer to integer conversions isn't very persuasive, particularly relative to Rust.

IIRC, porting the entire POSIX API to CHERI required only two significant changes: dlsym and signals. Both are areas where POSIX (much like Rust) required assumptions that Standard C doesn't. There was some ugliness related to memcpy, but Rust takes memcpy abuse to an entirely different level.

While C is far from the ideal language for a memory capability system, it certainly was more prepared for it than Rust. It's not surprising, though, as Rust was largely designed to workaround the lack of ABI- or ISA-enforced memory protections, whereas that possibility has always been at the back of the minds of C committee members. If you assume those things aren't on the horizon (and it's still not a given will see commercial success, let alone ubiquity), playing fast-and-loose with pointer types under the hood is an easy simplification. If common platforms implemented CHERI or something like it, Rust probably would have never been conceived in the first place.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 17:24 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (1 responses)

The point which you claim has been contradicted is that it's hard to imagine - "The likes of C pointers being reworked to give them better formal properties"

But you don't present any evidence of such a reworking, only that people have managed to run some C software on CHERI which of course you'd expect since CHERI has been under development for some time explicitly to run C software. Here's an excerpt from Cambridge's description of CHERI, "The CHERI memory-protection features allow historically memory-unsafe programming languages such as C and C++ to be adapted to provide strong, compatible, and efficient protection against many currently widely exploited vulnerabilities". Nothing in there about formal properties, no proposals to the ISO committee, instead they are being pragmatic, what choice do they have after decades of C programmers resolute disinterest.

> If common platforms implemented CHERI or something like it, Rust probably would have never been conceived in the first place.

If "something like it" managed to make the vague semantics of C's logical machine better match the reality of a modern computer by adjusting the computer instead, perhaps you'd even be right. Maybe if this had happened in the 1990s, the elevator Graydon was annoyed by in 2005 would have actually worked.

CHERI is a long way from this fantasy, many grave C problems are orthogonal to CHERI but are completely solved in (safe) Rust. Which doesn't make CHERI a bad idea, it just highlights that Graydon's problem wasn't something a lot the lines of "there's this one thing about C I don't like, so I guess I will write an entirely new programming language" but rather that systematically none of the useful lessons of past decades of programming language theory had been adopted into systems programming languages people actually use.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 19:06 UTC (Tue) by excors (subscriber, #95769) [Link]

> Nothing in there about formal properties, no proposals to the ISO committee, instead they are being pragmatic, what choice do they have after decades of C programmers resolute disinterest.

They appear to have plenty about formal properties at https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/ch... , including papers like "Exploring C Semantics and Pointer Provenance" which adds CHERI-like semantics to a subset of C, based on (if I understand it correctly) the Cerberus tool which carefully transforms C programs into a 'Core' language that makes the memory model explicit and has well-defined operational semantics. The Core code can then be analysed for pointer provenance violations etc. And https://www.cl.cam.ac.uk/~pes20/cerberus/ lists many proposals submitted to ISO by that research group. (Of course they're still a long way from a complete semantics for C, despite working on this for well over a decade with many PhDs, so it's far from a solved (or even solvable) problem in general.)

(Hmm, actually Cerberus seems to be somewhat more relaxed than CHERI, because it doesn't require you to use intptr_t. See https://cerberus.cl.cam.ac.uk/?short/2eaa24 , select "Model > Integer provenance (PVI)", "Search > Random", and it complains of undefined behaviour when dereferencing the pointer, because it gets understandably confused about provenance. But comment out lines 9-10 (which are a noop in regular C) and it works okay, because it can still track provenance through the cast to long and back. If you step "Forward" enough times then you can see the allocation number associated with each pointer.)

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 19:30 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

C arguably solves this problem by ignoring it. The standard already says it's illegal to even evaluate an invalid not-null not-one-past-the-end pointer, regardless of whether you dereference it, and integer types are similarly allowed to have trap representations, so you just declare "all the weird CHERI stuff" to be UB or implementation details that "portable" code can't rely on, and then you've "ported" C to CHERI by making CHERI look like just another architecture. The most you get out of this is "If your program otherwise would have performed certain very specific kinds of UB, on CHERI it will *probably* trap at runtime instead." "Probably" because who knows what the optimizer will do. Contrast this with the much stronger guarantees you get out of safe Rust, where it will fail at compile time, every time.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 13:37 UTC (Tue) by uecker (guest, #157556) [Link] (6 responses)

There is a proposed technical specification which formulates this
for C and comes with precise formal semantics.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2676.pdf

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 19:05 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (3 responses)

That paper does indeed acknowledge the necessity of provenance in C (and C++). But what they're doing is patching the hole in these standards - a hole made about twenty years ago, whereas the Rust article is about something a little different - that hole never existed in Rust. In _unsafe_ Rust the nomicon tells you already that if you violate the provenance rules that's undefined behaviour. Unlike safe Rust, but just like C, unsafe Rust doesn't _prevent_ you from picking some random bytes and insisting they're definitely a pointer to something, however you now have incoherent nonsense, don't do that (on CHERI you fault immediately in unsafe Rust or C if you do this).

Aria proposes that usize - Rust's built in unsigned integer type that's typically 64-bit on a modern computer - should formally be the same size as an address _not_ the same size as a pointer as it is defined today. As a side effect, something desirable (to me at least, but I believe others too) falls out, while we're acknowledging that a pointer isn't just an integer with intent, we abolish the (ab)use of as casts to turn one into the other. Instead the programmer is expected to write what they meant, e.g. ptr.with_addr(address) gets you a pointer (maybe 129-bits) made from an address (maybe 64-bits) plus your promise that what you are doing is OK. Did you lie? Same rules as before, now your program is meaningless.

The C proposal can't go around adding methods to pointers, not least because C doesn't have methods and if it did it wouldn't have them on pointers, it just changes the formal semantics of the language to acknowledge the practical need for provenance. Existing correct C will remain correct, the TS just says why it's correct (or rather, why other seemingly reasonable C that doesn't work is not correct).

Also I expect that the committee will nod wisely and say that they don't have time to take this up right now, but please bring it back again next time, which is roughly what it has been doing since at least 2016, if your plan is to wait for them to fix C rather than learn a new language, don't figure on that happening any time soon.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 13:26 UTC (Wed) by uecker (guest, #157556) [Link] (2 responses)

I do not understand why say you the hole does not exist in unsafe Rust. The article seems to indicate there are still problems in unsafe Rust of the same nature as we have addressed in the TS for C.

C could of course as easily add new way to combine a pointer with an address using some other syntax than a method. The problem with this is that it would break existing code (which for Rust somehow seems OK). The other problem is that the in most cases where you need to convert an integer to a pointer you do not have the pointer available, so you simply can not use ptr.with_addr(address). If you had the pointer you could also just to ptr + offset which is the same as ptr + (addr - base_addr), so I do not see how ptr.with_addr(address) solves the same problem.

C has a lot issues, but it also has many advantages: Widely supported, long-term stability, many existing tools, fast compilation, low complexity, emerging formal semantics, etc. And yes, it will take a long time fixing its many issues. It will also take a long time before Rust is ready (the long compile times and lack of stability rule it out for me at this time) and it is already too complex for my taste.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 15:10 UTC (Wed) by farnz (subscriber, #17727) [Link]

There are two important differences between Rust and C that make breaking changes in handling raw pointers more palatable on the Rust side:

Use of raw pointers in Rust is already gated behind unsafe, and Rust style says to keep the use of unsafe to a minimum. This allows you to use data in papers like this ACM paper on Unsafe Rust to judge the maximum blast radius of changes - the data supports the idea that at most 25% of Rust code could be affected (as around 75% of published crates contain no Unsafe Rust), and that 3% of Rust code would be a good estimate of the amount of code affected by changes to raw pointers (about 10% of Unsafe Rust deals with raw pointers). In contrast, because of the nature of C, it's harder to tell how much C code is likely to be affected by any given change to pointer semantics.
Rust's module system allows you to have maintained legacy code in an older edition and modern Rust in the same binary - I can link Rust 2015 code with Rust 2027 code, and the compiler will give the Rust 2015 code the semantics that went with Rust 2015, while giving me modern semantics for Rust 2027 code. #include in C means that I can't clearly delineate code that has modern semantics from code that doesn't, because some code has to have the "right" semantics whether it's #included into a compilation unit that has C99 semantics or whether it's #included into a compilation unit that has C27 semantics.

Of these, I think the former is the hard one to overcome; fixing the latter is something that can be done by a sufficiently smart C standard committee and compiler implementation team, while the former is about gathering statistics easily on which code might be affected.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 19:10 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

I assume from your username combined with the fact you phrase it as "We have addressed" you are Martin Uecker. I don't agree that the problems Aria talks about are "of the same nature".

In C the "hole" is because the committee admits that some things (take an arbitrary integer you got down the back of the sofa, cast that address into a pointer, dereference it) are not in fact required to work (even though the address was somehow "correct"), but hasn't (unless your work gets into the actual standards text) fixed the standards document to explain provenance so why these aren't correct is left unstated. It is not possible to understand why some example C doesn't actually do what you might intuitively expect based on the standards document today.

Rust has the notion of provenance already, and the nomicon (the guide for unsafe Rust) tells you about this danger. You can't go around casting integers you found down the back of the sofa into pointers (in unsafe Rust, as in C, you aren't *prevented* from doing this, but that doesn't mean it will work). So that hole is fixed. But this also throws a lot of babies out with the bath water. Aria shows some examples where it seems reasonable to suppose you didn't break the rules, but for that to be true usize on CHERI ends up 128-bits which is pretty crazy, and also examples where it's at least very difficult to reason about whether you broke the rules, which is scarcely better than there not being any rules.

The proposals aim to fix this so that you can more often write code that everybody (including the compiler) agrees does not break the rules, and actually does what you intended.

Of course Aria's idea wouldn't break existing code, unsafe Rust from today (say Rust 2021) would potentially be invalid on CHERI hardware with 64-bit usize if it does naughty things with pointers, but that's not worsened by this change. It works today on typical hardware, and that would stay true. Instead new code (for say, Rust 2027 edition) in this hypothetical uses the new API and trying to write things the old way in that edition doesn't work.

In most cases (and I don't see any reason this should be an exception) you can optionally do things the new way in old Rust editions, but doing them the old way in new editions is either warned against, or outright forbidden (won't compile) as appropriate. As I wrote above this respects time's arrow, your old code stays working.

As to why you'd want ptr.with_addr(address) yes you're correct that this is similar to using the existing pointer offset methods, and yes the whole point is to preserve provenance. Rust already considers provenance in the pointer API (including the offset API), that's not an innovation here. If all you have is an address, you don't have any provenance and so the compiler can't reason about this pointer. I would anticipate that software which wanted to have store some raw addresses and then later conjure valid pointers (thus with provenance) from them would keep an appropriate neutral pointer around which could supply suitable provenance via ptr.with_addr(address). If you just mean why add APIs which are convenient rather than necessary, that's normal in Rust, Rust thought "ready or not".contains("ready") was a reasonable expression from 1.0, while it took C++ until 2020 to admit that programmers want contains(x) even though it is logically equivalent to find(x) != npos.

Though maybe I misunderstood badly - Does the N2676 document propose that somehow C will be able to just conjure correct, provenance intact pointers from only an address? Where does the provenance come from?

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 19:10 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

It's one of half-dozen attempts. It's not yet fully finished and it's not clear if it even can be finished and adopted.

Rust's issue lies precisely with the fact that there are no C or C++ memory model which can be used to write code which would then be actually compiled (yes, there are some memory model specified by standard, but we know that compilers are happy to break certain valid programs based on that memory model, examples are actually in the proposal you are linking).

If there would have been some memory model which would have matched what the actual compilers are doing unsafe Rust would have just used that. But there are nothing, just a DR260 resolutions which prompts compiler developers to develop something and include it in the standard… and lots of handwaving.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 13:26 UTC (Wed) by uecker (guest, #157556) [Link]

It is an attempt which addresses provenance and has been agreed on to become a technical specification. If compilers support it (they already do to a large extend - modulo some known optimizer bugs and some cases where they already do not follow the standard) and no serious objections come up this will likely be adopted to be the C semantics for provenance.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 20:12 UTC (Wed) by JoeBuck (subscriber, #2330) [Link] (2 responses)

Instead of treating a tagged pointer as conversion to and from an integer, it seems it would be better to treat it as a pair with a pointer and a tag and treat the way these two are encoded into one field as a storage optimization the analysis doesn't have to care about. Then we no longer have to consider that any integer can alias any pointer. We just have pointers and tagged pointers. If a pointer must be aligned to a multiple of 2 and you have one tag bit, it can be free.

This would be similar to the optimization that uses just a pointer for Option<ptr>.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 20:51 UTC (Wed) by excors (subscriber, #95769) [Link] (1 responses)

Does that mean the implementation of tagged pointers would have to be baked into the language/compiler, instead of being up to the application? That sounds excessively restrictive for a systems language, since applications may want to implement it very differently - e.g. some JavaScript engines store all values (doubles, integers, pointers, etc) as doubles, packing the tag bits and pointer values into the NaN range. (NaNs have a 51-bit payload, and x86-64 has 48-bit virtual address space (and I think only 47-bit if you exclude kernel addresses, and less if you require alignment), so there's enough space for whole pointers plus several tag bits.) (https://anniecherkaev.com/the-secret-life-of-nan)

There are also other use cases where you might want to do something funny with pointers, like XOR linked lists (a doubly linked list where you store 'prev XOR next' as a single word per node, which makes iteration trickier but saves memory).

I think Rust needs to provide the low-level primitives for applications to safely implement those things however they want.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 24, 2022 21:56 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

I think it would help to distinguish between safe and unsafe Rust here.

Safe Rust already has a notion of "some bit patterns are not used by this type" and can therefore do "clever" things like packing different types together to make a type-safe untagged union. This has all of the advantages of a tagged union or Option/Either type, including compile-time type checking, but with no (or very little) overhead. Most of the time, you can and should try to make do with that, because you want the compiler to yell at you if you forget a case. From the perspective of unsafe Rust, even this might not be flexible enough, and so there does need to be an escape hatch where you can smuggle pointers around in other values, to some extent. But as the article acknowledges, the rules for this are a great deal more restrictive than just "make sure you end up with the same physical bit pattern you started with" - CHERI will trap if you try to do that, and Rust isn't really in a position to prevent CHERI from doing that in the general case.