The perils of pinning

Posted Sep 18, 2022 18:37 UTC (Sun) by Wol (subscriber, #4433)
In reply to: The perils of pinning by foom
Parent article: The perils of pinning

> The point is that you _cannot_ do most of the stuff Linux does in C.

The point is NO formal language will let you do most of the stuff Linux does. (Of course assembler will let you, because assembler has no formal model.)

Linux is an operating system. It is MEANT to talk to other devices, with other controllers, that do their own thing. NO formal language can cope with other systems doing things behind its back.

So basically, the less "unsafe" code there is the better, because safe code means the compiler has proven that the code will work as intended (barring programmer "sillies"). But at the end of the day, there will still need to be a lot of "unsafe" code, because ...

If your code is reading from the receive buffer of a network card, you really do not want a language that assumes memory only changes when you write to it! What was that about GCC assuming reading from uninitialised memory is UB and can be optimised away? Bang goes your networking!

Cheers,
Wol

The perils of pinning

Posted Sep 19, 2022 9:02 UTC (Mon) by tialaramex (subscriber, #21167)

We can't ever say the code will work as intended because it's easy for a human programmer to write code that's simply not what they intended. I wrote a loop the other day which would read the next URL, add what was found there to a list, each time checking if the next URL is empty. Except, I forgot to write the part where it updates that next URL based on what was read, so it's actually an infinite loop. My (pair) partner sat with me, wondering what was taking so long for I'd guess two minutes before we realised what I did wrong.
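A minimal sketch of that kind of loop in Rust, with the forgotten step included. The Page type, collect_all, and the in-memory map standing in for real fetches are all hypothetical names made up for illustration, not the code from that day:

```rust
use std::collections::HashMap;

/// One page of results plus the URL of the following page (empty when done).
/// Hypothetical type, invented for this sketch.
pub struct Page {
    pub items: Vec<u32>,
    pub next: String,
}

/// Walk the chain of pages starting at `start`, collecting every item.
/// The map is a stand-in for whatever would really fetch a URL.
pub fn collect_all(start: &str, pages: &HashMap<String, Page>) -> Vec<u32> {
    let mut found = Vec::new();
    let mut next_url = start.to_string();
    while !next_url.is_empty() {
        let page = match pages.get(&next_url) {
            Some(p) => p,
            None => break, // unknown URL: stop rather than spin
        };
        found.extend_from_slice(&page.items);
        // The step that is easy to forget. Without it the condition above
        // never changes and the loop runs forever; the compiler is perfectly
        // happy either way.
        next_url = page.next.clone();
    }
    found
}

/// Two fake pages chained together.
pub fn demo_pages() -> Vec<u32> {
    let mut pages = HashMap::new();
    pages.insert("a".to_string(), Page { items: vec![1, 2], next: "b".to_string() });
    pages.insert("b".to_string(), Page { items: vec![3], next: String::new() });
    collect_all("a", &pages)
}
```

Delete the next_url assignment and this still compiles cleanly, which is the point: "safe" does not mean "does what I meant".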

Rust actually reflects what your machine code can do for that network card receive buffer. You can say look, just perform actual fetches for all the "memory".

let recvd = unsafe { std::ptr::read_volatile::<NetworkBuffer>(recv_buffer) }; /* Rust will bit-wise copy the values in the buffer. */

This function is unsafe, because it fetches some arbitrary memory, so clearly you can blow up the world (e.g. an unaligned read on an architecture where those are forbidden, or just pointing it out of bounds). But it's very well defined: if recv_buffer actually points at a NetworkBuffer-sized blob of suitably aligned and addressable "memory" we can load values from, it will emit actual loads for those values and not try to cache them or assume it knows their value or whatever.
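To make that concrete, here is a rough sketch, assuming a made-up NetworkBuffer layout (the len and payload fields are invented) and using ordinary memory as a stand-in for the card's receive buffer. Only std::ptr::read_volatile is the real API; snapshot and demo_snapshot are hypothetical helpers:

```rust
use std::ptr;

/// Hypothetical layout of a receive buffer; the fields are invented for illustration.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NetworkBuffer {
    pub len: u32,
    pub payload: [u8; 8],
}

/// Copy the whole buffer out of (pretend) device memory with a volatile read.
///
/// # Safety
/// `recv_buffer` must be non-null, aligned for `NetworkBuffer`, and point at
/// readable memory of at least `size_of::<NetworkBuffer>()` bytes.
pub unsafe fn snapshot(recv_buffer: *const NetworkBuffer) -> NetworkBuffer {
    // The compiler has to emit this read; it may not elide it or reuse an
    // earlier value it thinks it already knows.
    unsafe { ptr::read_volatile(recv_buffer) }
}

/// Stand-in for real hardware: a local value plays the part of the NIC buffer.
pub fn demo_snapshot() -> NetworkBuffer {
    let fake_device = NetworkBuffer { len: 3, payload: [1, 2, 3, 0, 0, 0, 0, 0] };
    // SAFETY: the pointer comes from a live, properly aligned local value.
    unsafe { snapshot(&fake_device as *const NetworkBuffer) }
}
```

After the call, everything else works on the returned copy, which is ordinary memory that nothing else can change behind your back.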

That's a contrast with C, which has us pretend recv_buffer is just pointing at an actual NetworkBuffer with a "volatile" qualifier, so that we can go around doing operations on it, even though in practice that's a bad idea and the only thing we ought to do is copy it somewhere. On a good day that's all the C (or worse, C++) does with a volatile; on a bad day you need to guess whether the programmer knew what's really going on or whether the code you're reading is full of unintentional races.

There was an effort to get C++ to move towards intrinsics for fetch/store, like Rust, rather than C's volatile qualifier hack. But this got some very angry pushback, and I anticipate there will not be any further attempts.

The perils of pinning

Posted Sep 19, 2022 18:16 UTC (Mon) by kreijack (guest, #43513)

> let recvd = unsafe { std::ptr::read_volatile::<NetworkBuffer>(recv_buffer) }; /* Rust will bit-wise copy the values in the buffer. */

> That's a contrast from C which has us actually pretend recv_buffer is just pointing
> at an actual NetworkBuffer with a "volatile" qualifier and so then we can go around
> doing operations to it, even though in practice that's a bad idea and the only thing
> we ought to do is copy it somewhere. On a good day that's all the C (or worse C++)
> does with a volatile, on a bad day you need to guess whether the programmer knew
> what's really going on or whether the code you're reading is full of unintentional races.

I don't think I understood your sentence. But my understanding is that you can write a
function in C

read_volatile(recv_buffer)

that copies the data to a "volatile" buffer. IIRC, when a pointer is passed to a function
the compiler stops making any assumptions about the pointed-to data.

The perils of pinning

Posted Sep 20, 2022 8:40 UTC (Tue) by farnz (subscriber, #17727)

I'm going to stick to C syntax throughout, since I think that's easier for kernel programmers to follow than Rust.

In C, you might have code like:

struct NetworkBuffer {
    struct IpHeader ip_hdr;
    union {
        struct TcpSegment tcp;
        struct UdpSegment udp;
    };
};

volatile struct NetworkBuffer *buf;

It's then tempting (but often wrong) to write code that does things like buf->ip_hdr.src_addr. That isn't actually a good idea, because every such access is a separate volatile read of storage that can change between one read and the next.

Rust doesn't have a volatile qualifier on storage that changes all codegen accessing that storage. Instead it has the equivalent of void * memcpy_volatile(void * dest, volatile void * src, size_t count); (but using generics from Rust's type system to replace void * and size_t count). Because your only dependable operation on the buffer is to copy out of the shared space that can change underneath you at no notice (imagine that, for example, NetworkBuffer actually lives in memory on the far side of the PCIe bus, on the NIC itself), that's what you'll do.
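For anyone who does want to see the Rust side, here is a rough sketch of the "copy out is the only operation" idea. Shared<T>, copy_out and demo_copy_out are hypothetical names for illustration, the structs are cut down (no TCP/UDP union, and the header fields are invented), and ordinary memory stands in for the NIC. The only real API used is std::ptr::read_volatile, whose generic parameter does the job of void * plus size_t count:

```rust
use std::cell::UnsafeCell;
use std::ptr;

/// Cut-down, invented header layout for illustration.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct IpHeader {
    pub src_addr: u32,
    pub dst_addr: u32,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct NetworkBuffer {
    pub ip_hdr: IpHeader,
}

/// Hypothetical wrapper for storage that may change underneath us.
/// It deliberately offers no field access, only a whole-value copy.
#[repr(transparent)]
pub struct Shared<T: Copy>(UnsafeCell<T>);

impl<T: Copy> Shared<T> {
    pub fn new(value: T) -> Self {
        Shared(UnsafeCell::new(value))
    }

    /// Volatile copy of the entire value into local, private storage.
    /// `T` supplies both the type and the byte count.
    pub fn copy_out(&self) -> T {
        // SAFETY: the pointer comes from our own UnsafeCell, so it is
        // non-null, aligned, and valid for reads of `T`.
        unsafe { ptr::read_volatile(self.0.get()) }
    }
}

pub fn demo_copy_out() -> u32 {
    let shared = Shared::new(NetworkBuffer {
        ip_hdr: IpHeader { src_addr: 0x0a00_0001, dst_addr: 0x0a00_0002 },
    });
    // There is no way to write the equivalent of buf->ip_hdr.src_addr on
    // `shared` itself; fields are only reachable on the local copy.
    let local = shared.copy_out();
    local.ip_hdr.src_addr
}
```

Because the wrapper exposes nothing but copy_out, the tempting field-by-field access on the shared storage simply doesn't compile, and everything downstream works on a snapshot that can't change.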

The perils of pinning

Posted Sep 19, 2022 18:11 UTC (Mon) by kreijack (guest, #43513)

> The point is NO formal language will let you do most of the stuff Linux does.
> (Of course assembler will let you, because assembler has no formal model.)

I think that C++ does...

The perils of pinning

Posted Sep 21, 2022 11:24 UTC (Wed) by tialaramex (subscriber, #21167)

C++, like C, relies on an abstract machine model. The abstract machine doesn't have the attributes of the real machines Linux is written for.

Sometimes the abstract machine's differences from a real machine just have performance consequences. For example, most doubly linked list operations look really clever in the abstract machine and have reasonable performance there, but they perform poorly on an actual computer you can buy today because of caches.

But often there are simply practical differences: the abstract model entirely lacks something the real machine has. A higher-level C++ application needn't care, but the Linux kernel does. For some things Linux relies on inline assembler; the same thing works in Rust, and it's the same trick kernels written in C++ use. In other places Linux relies on semantics which are not offered by the formal language, and which likewise are not offered in ISO C++, but which do happen to work in the chosen compiler.