|
|
Log in / Subscribe / Register

DeVault: Announcing the Hare programming language

DeVault: Announcing the Hare programming language

Posted May 10, 2022 17:00 UTC (Tue) by khim (subscriber, #9252)
In reply to: DeVault: Announcing the Hare programming language by farnz
Parent article: DeVault: Announcing the Hare programming language

> Both are valid readings of this source under the rules set by C89, because C89 states explicitly that the only thing expected to change outside of explicit changes done by C code are things marked as volatile.

No. Calling fread and fwrite can certainly change things, too.

> But in this case, the get_zeroed_page and release_page pair change the machine state in a fairly dramatic way, but in a way that's not visible to C code - changing PTEs, for example.

Yes and no. Change is dramatic, sure. But it's most definitely visible to C code.

By necessity such things have to either be implemented with volatile or by calling system routine (which must be added to the list of functions like fread and fwrite as system extension, or else you couldn't use them them).

Place where you pass your pointer to the invlpg would be place where compiler would know that object may suddenly change value.

> In practice, nobody has bothered being that neat, and we accept that there's a whole world of murky, underdefined behaviour where the real hardware changes things that affect the behaviour of the C abstract machine, but it happens that C compilers have not yet exploited that.

In practice people who are doing these things have to use volatile at some point in the kernel, or else it just wouldn't work. Thus I don't see what you are trying to prove.

The fact that real OSes have to expand list of “special” functions which may do crazy things? It's obvious. In practice your functions are called mmap and munmap and they should be treated by compiler similarly to read and write: compiler either have to know what they are doing or it should assume they may touch and change any object they can legitimately refer given their arguments.

> I find it very tough, within the C89 language, to find anything that motivates the position that *ptr1 != test given that ptr2 == ptr1 and *ptr2 != test - it's instead explicitly undefined behaviour.

No. You couldn't do things like change to PTEs in a fully portable C89 program. It's pointless to talk about such programs since they don't exist.

The only way to do it is via asm and/or call to system routine which, by necessity, needs extensions to C89 standard to be usable. In both cases everything is fully defined.


to post comments

DeVault: Announcing the Hare programming language

Posted May 10, 2022 18:10 UTC (Tue) by farnz (subscriber, #17727) [Link] (1 responses)

fread and fwrite are poor examples, because they are C code defined in terms of the state change they make to the abstract machine, and with a QoI requirement that the same state change happens to the real machine. Indeed, everything that's defined in C89 has its impact on the abstract machine fully defined by the spec; the only get-out is that volatile marks something where all reads and writes through it must be visible in the real machine in program order.

But note that this is a very minimal promise; the only thing happening in the real machine that I can reason about in C89 is the program order of accesses to volatiles. Nothing else that happens in the abstract machine is guaranteed to be visible outside it - everything else is left to the implementation's discretion.

And no, the state change is not visible inside the C89 abstract machine; if I write through a volatile pointer to a PTE, the implementation must ensure that my write happens in the real machine as well as the abstract machine, but it does not have to assume that anything has changed in the abstract machine. That, in turn, means that it may not know that ptr1 now has changed in the "real" machine, because it's not volatile and thus changes in the real machine are irrelevant.

And I absolutely can change a PTE without assembly or a system routine, using plain C code; all I need is something that gives me the address of the PTE I want to change. Now, depending on the architecture, that almost certainly is not enough to guarantee an instant change - e.g. on x86, the processor can use old or new value of the PTE until the TLB is flushed, and I can force a TLB flush with invlpg to get deterministic behaviour - but I can bring the program into a non-deterministic state without calling into an assembly block or running a system routine, as long as I have the address of a PTE.

And there's no "list of system routines" in C89; the behaviour of fread, fwrite and other such functions is fully defined in the abstract machine by the spec, with a QoI requirement to have their behaviour be reflected in the "real" machine. By invoking the idea of a "list of system routines", you're extending the language beyond C89.

You're making the same mistake a lot of people make, of assuming that the behaviour of compilers in the early 1990s and earlier reflected the specification at the time, and wasn't just a consequence of limited compiler technology. If compilers really did implement C89 to the letter of the specification, then much of what makes C useful wouldn't be possible; provenance is not something that's new, but rather an attempt to let people do all the tricks like bit-stuffing into aligned pointers (which is technically UB in C89) while still allowing the compiler to reason about the meaning of your code in a way compatible with the C89 specification.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 19:01 UTC (Tue) by khim (subscriber, #9252) [Link]

> By invoking the idea of a "list of system routines", you're extending the language beyond C89.

Which is the only way to have PTEs in C code.

> And I absolutely can change a PTE without assembly or a system routine, using plain C code; all I need is something that gives me the address of the PTE I want to change.

If you have such an address then you have to extend the language somehow. Or, alternatively, don't touch it.

> By invoking the idea of a "list of system routines", you're extending the language beyond C89.

Of course. Because it's impossible to write C89 program which changes the PTEs, such a concept just couldn't exist in it. You have to extend the language to cover that usecase.

> If compilers really did implement C89 to the letter of the specification

…then such compilers would have been as popular as ISO 7185. Means: no one would have cared about these and no one would have mentioned their existence.

> If compilers really did implement C89 to the letter of the specification, then much of what makes C useful wouldn't be possible

Yes. But some programs would still be possible. Programs which do tricks with pointers would work just fine, programs which touch PTEs wouldn't.

> provenance is not something that's new, but rather an attempt to let people do all the tricks like bit-stuffing into aligned pointers (which is technically UB in C89)

Citation needed. Badly. Because, I would repeat once more, in C89 (not in C99 and newer) the notion of “pointers which have the same bitpattern yet different” doesn't exist. If you add a few bits to the pointer converted to integer and then clear these same bits you would get the exact same pointer — guaranteed. The fact that these bits are lowest bits of converted integer is implementation-specific thing, you can imagine a case where these would live as top bits, e.g. So yet, that requires some implementation-specific extension. But pretty mild and limited.

Yes, provenance is an attempt to deal with the idea of C99+ that some pointers may be equal to others yet, somehow, still different — but that's not allowed in C89. If two pointers are equal then they are either both null pointers, or both point to the same object, end of story.

Sure, it makes some optimizations hard and/or impossible. So what? This just means that you cannot do such optimizations in C89 mode. Offer -fno-provenance option, enable it for -std=c89, done.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds