Quotes of the week

Posted Jun 2, 2021 13:59 UTC (Wed) by khim (subscriber, #9252)
In reply to: Quotes of the week by patha
Parent article: Quotes of the week

> Well, to turn something that is "unhandled" into "handled", I assume you need some sort of specification how to handle it.

What about the case where you turn something that was “handled” into “unhandled”? Let's consider a concrete example. This code:

#include <stdio.h>
#include <stdlib.h>

int main() {
    int *p = (int*)malloc(sizeof(int));
    int *q = (int*)realloc(p, sizeof(int));
    if (p == q) {
        *p = 1;
        *q = 2;
        printf("%d %d\n", *p, *q);
    }
}

Note that C89 quite explicitly allows that code and says the only possible output is “2 2”. This is because realloc there does the following: “The realloc function changes the size of the object pointed to by ptr to the size specified by size.” It's still the same object, both pointers point to it, so why would they behave differently?

C99 changed that. Now realloc works differently: “The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size.”

Why would that be important? We compared pointers, thus they should behave identically, right? No. There is a decision of the WG14 committee which says literally the following: “after much discussion, the UK C Panel came to a number of conclusions as to what it would be desirable for the Standard to mean”, followed by a short explanation of how the standard should be changed to make that program illegal.

Note: they haven't said that the standard actually means that today. Nope. Provenance insanity is not yet part of any standard. Not C99, not C18 and not even C++20! Yet compiler writers think they are entitled to apply these rules (which are, apparently, still an area of research, because compiler writers couldn't yet invent a usable set of rules which you can use to write correct programs) to old C89 programs.

Nice, huh?

> Well, a C compiler basically implements C.

Except today it's not true. C compiler writers implement basically whatever they want to implement and reserve the right to retroactively change the rules of the language, without providing options which may bring back the old behavior (-fno-builtin-realloc works today, but apparently there is no guarantee that it will work in the future).

> That would be an option, but I assume the smoothest path forward is to continue proposing "C language extensions/options", like -fno-strict-aliasing and -fno-strict-overflow to GCC, for specific instances of undefined ("unhandled") behavior in the C language. If considered useful enough to be the default option for the whole C community, it can then be brought up to the C committee.

I think at this point it's basically pointless. When I explicitly asked some clang developers about something like a -fno-provenance option the answer was: “provenance is something LLVM *violently* believes in, at the level of alloca, malloc, and similar intrinsics scribbling provenance information all over LLVM's internal representations. Even if you could turn it off, I doubt it would fix all of your miscompilations, since this is a fundamental building block of LLVM's IR. Like I said before: although provenance is not defined by either standard, it is a real and valid emergent property that every compiler vendor ever agrees on.”

Note the “not defined by either standard” yet “real and valid emergent property” part. I think after an answer like that… it's essentially pointless to bring anything to the C committee. What's the point if said committee would pick something they like, not something that makes it, you know, possible to write anything in that language?

We are more-or-less stuck with GCC extensions for the foreseeable future and I think it's a good idea to adopt Linus's stance. Essentially: “I couldn't forbid you to use clang but I don't consider ‘clang miscompiles that code’ a valid reason for any change in any project”.

This is unfortunate because the only language which tries to address these issues in a practically usable way, Rust, is basically tied to LLVM currently. gccrs looks quite active nowadays, though, thus there's hope. But C and C++… they should be declared “unfit for any purpose”, sadly. Certain specific implementations probably can be used, maybe, but there is zero hope of getting sane cross-compiler treatment. That ship has sailed.

Quotes of the week

Posted Jun 2, 2021 19:17 UTC (Wed) by Wol (subscriber, #4433)

> Note that C89 quite explicitly allows that code and says the only possible output is “2 2”. This is because realloc there does the following: “The realloc function changes the size of the object pointed to by ptr to the size specified by size.” It's still the same object, both pointers point to it, so why would they behave differently?

> C99 changed that. Now realloc works differently: “The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size.”

But anything that relies on your interpretation is inherently broken ...

int *p = (int*)malloc(sizeof(int));
int *q = (int*)realloc(p, 16 * sizeof(int));

If malloc has only allocated a block 64 bytes in size for p and all the metadata it needs to manage it, it is just not possible to resize it such that q == p. Either your definition of realloc is correct and it has to return a failure (q == null), or it has to allocate a larger amount of space elsewhere and move the contents.

So regardless of whether it's correct, using your interpretation, and using realloc to grow the allocated space, is a pretty stupid idea if you assume it's "just going to work". Most people assume that malloc/realloc won't return a failure. Under your interpretation, it would be a common event.
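A minimal sketch of growing a buffer so that both outcomes described above (a moved block or a failure) are handled. The helper name grow_ints is hypothetical and not from the thread; it assumes the caller owns a block obtained from malloc or realloc and, after a successful call, uses only the pointer that realloc returned.

```c
#include <stdlib.h>

/* Hypothetical helper (illustration only): grow an int buffer to new_count
 * elements. On success *buf holds whatever realloc returned, which may or
 * may not be the old address. On failure *buf is untouched and still valid,
 * because realloc does not free the old block when it returns NULL. */
int grow_ints(int **buf, size_t new_count)
{
    /* Reject zero (realloc with size 0 is not portable) and overflow. */
    if (new_count == 0 || new_count > (size_t)-1 / sizeof(int))
        return -1;

    int *tmp = realloc(*buf, new_count * sizeof(int));
    if (tmp == NULL)
        return -1;          /* the old block is still owned by the caller */

    *buf = tmp;             /* from here on, use only the new pointer */
    return 0;
}
```

Assigning the result to a temporary first avoids losing the only pointer to the old block when realloc fails.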

Cheers,
Wol

Quotes of the week

Posted Jun 2, 2021 20:39 UTC (Wed) by khim (subscriber, #9252)

> But anything that relies on your interpretation is inherently broken ...

How and why? From the same C89 standard: “the realloc function returns either a null pointer or a pointer to the possibly moved allocated space.”

Yes, the object can be moved by realloc and, as you have correctly noted, sometimes it has to be moved, sure. But we have established that it hasn't happened in our program with a simple check: if (p == q). If that's the same object and if it wasn't moved, what makes it possible to treat a pointer to it as “invalid”?
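For comparison, here is a sketch of one way to find out whether the block stayed in place without reading the old pointer after the call at all. The function name resize_in_place_check is hypothetical; the sketch assumes uintptr_t is available (it is optional in the standard) and that the caller accesses the block only through the returned pointer. It sidesteps the dispute rather than settling it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (illustration only): resize a block of ints and report
 * whether the address stayed the same. The old address is converted to an
 * integer before realloc is called, so the old pointer value is never read
 * afterwards. */
int *resize_in_place_check(int *p, size_t count, bool *stayed_put)
{
    if (count == 0 || count > (size_t)-1 / sizeof(int))
        return NULL;                    /* nothing was changed, p stays valid */

    uintptr_t before = (uintptr_t)p;    /* taken while p is certainly valid */

    int *q = realloc(p, count * sizeof(int));
    if (q == NULL)
        return NULL;                    /* p is still valid and still owned */

    *stayed_put = ((uintptr_t)q == before);
    return q;                           /* all later accesses go through q */
}
```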

> Either your definition of realloc is correct and it has to return a failure (q == null), or it has to allocate a larger amount of space elsewhere and move the contents.

Of course realloc can move the content. But all standards quite explicitly say that the object can be left in place. Even C18 says the following: “the realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object has not been allocated.”

This note (it's part of the standard and has been there since the invention in UNIX, although the phrasing changed over time) just begs one to special-case that situation, right? But nope: apparently “provenance rules” (which are, once again, not part of C99, not part of C11, not part of C18, not part of any existing C++ standard and, apparently, are not yet even finalized) give the compiler the right to ~~screw the programmer~~ optimize that code.

One can, of course, play weasel-words with the phrase “the same value as a pointer to the old object” because the standard defines “the same value” in the following way: “Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.”
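The uncontroversial cases in that definition can be sketched as follows. The function and struct names are made up for illustration. The last case in the quoted text (one array that happens to follow another in memory) depends on how the implementation lays out objects, so it is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

struct pair { int first; int second; };

/* Both operands are null pointers. */
bool null_pointers_equal(void)
{
    int *a = NULL;
    int *b = NULL;
    return a == b;
}

/* A pointer to an object and a pointer to the subobject at its beginning. */
bool object_and_first_member_equal(void)
{
    struct pair s = {1, 2};
    return (void *)&s == (void *)&s.first;
}

/* Two pointers one past the last element of the same array object.
 * Such pointers may be formed and compared, but not dereferenced. */
bool one_past_end_equal(void)
{
    int arr[4] = {0};
    int *end_a = arr + 4;
    int *end_b = &arr[3] + 1;
    return end_a == end_b;
}
```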

And yes, I have heard from one of the clang developers that, indeed, after a call to realloc the old pointer becomes a pointer to one past the end of a zero-sized array object! And using it is thus “undefined behavior”.

I couldn't even properly answer anything to such a reading of the standard, since I'm not Linus and I don't know enough English swear words.

I guess at some point these rules would be finalized and would become, retroactively, part of C89/C99/C11/C18/C++98/C++11/C++14/C++17/C++20… but I cannot see how someone can write any code in a language whose rules can be retroactively changed half a century after they were written.

Quotes of the week

Posted Jun 3, 2021 12:55 UTC (Thu) by HelloWorld (guest, #56129)

Why do you say the rules were retroactively changed? I don't think there ever was a C standard that defined the behaviour of that program.

Quotes of the week

Posted Jun 3, 2021 13:43 UTC (Thu) by khim (subscriber, #9252)

According to all existing C standards it's a well-defined program, unless you subscribe to the ridiculous notion that realloc somehow turns an existing pointer to an existing object (which is not even part of any array) into a pointer one-past-the-end of an array (a notion specifically invented to ~~screw the programmer~~ optimize code better). In fact, in that same provenance proposal this is noted quite explicitly (page 17, “pointer equality comparison and provenance”, where they talk about how that part should be changed to make existing, fully standard-compliant programs invalid; all to make the language “better”, of course).

The situation with C++ is a bit more complicated. C++ has the notion of “pointer safety” which could have invalidated that program on some compilers… except all existing compilers use relaxed pointer safety. So that part couldn't justify what they do.

If you want to say that the program is not guaranteed to print "2 2" then you are absolutely correct: empty output is also a valid possibility (if realloc actually moves the object). But this program, when compiled with a compiler which correctly implements the existing standards, couldn't print "1 2", which is what it does on most compilers (clang/msvc/icc; only gcc produces the correct result). And the problem here is not that compilers are buggy (all software may have bugs) but that the prevailing attitude is “we screwed you up?… and we haven't yet added enough “undefined behaviors” to the standard to justify miscompilation of a perfectly valid program?… oh, that's too bad, let us add more “undefined behaviors” to the standard and apply them retroactively to make it possible to ~~screw you up more~~ optimize better… and no, we wouldn't give you the flag to make it possible to correctly compile correct programs”.

Quotes of the week

Posted Jun 3, 2021 16:30 UTC (Thu) by excors (subscriber, #95769)

Would you expect

int *p = malloc(sizeof(int));
free(p);
int *q = malloc(sizeof(int));
if (p == q) {
  *p = 1;
  *q = 2;
  printf("%d %d\n", *p, *q);
}

to have well-defined behaviour?

I'd expect no - 'p' and 'q' are intuitively pointers to different objects, regardless of whether they happen to be numerically equal, and one of those objects is being accessed after it was freed.

C89 (or at least a version I can find online) says "The pointer returned [by calloc/malloc/realloc] [...] may be assigned to a pointer to any type of object and then used to access such an object in the space allocated (until the space is explicitly freed or reallocated). [...] The value of a pointer that refers to freed space is indeterminate". So I think that largely matches my expectation - the value of 'p' is indeterminate after the 'free', so the 'p == q' is undefined behaviour, and the '*p' is undefined behaviour (because the space has been explicitly freed so 'p' can't be used to access the object any more).

...except if the second malloc returns the same "space" as the first malloc, is 'p' still indeterminate now that the space it was referring to is no longer free? Maybe the 'p == q' is okay (though the '*p' is still undefined because it lost its ability to access the object after the space was first freed).

Anyway, when you replace the free+malloc with realloc, it sounds like you expect that to be well-defined behaviour in C89?

I think the problem is ambiguity with terms like "object" and "space" and "reallocated". C89 indicates realloc() returns the same object with a new size (though in other places it seems to directly contradict that and says it's a new object). But can that be the same object in the same space as the original malloc(), or different space, or logically different space but the pointers might happen to be equal? If you call realloc() with the object's current size, has it been reallocated for the purposes of "until the space is explicitly freed or reallocated"?

I don't think it's fair to blame C99 for changing the semantics here - C89 seems very unclear and ambiguous, and C99 was the first time it was actually specified semi-properly (by having a more precise definition of the lifetime of an object with the term "deallocated" (instead of "freed or reallocated"), and saying realloc() always "deallocates" the old object).

Quotes of the week

Posted Jun 3, 2021 18:56 UTC (Thu) by khim (subscriber, #9252)

> Would you expect

int *p = malloc(sizeof(int));
free(p);
int *q = malloc(sizeof(int));
if (p == q) {
  *p = 1;
  *q = 2;
  printf("%d %d\n", *p, *q);
}

to have well-defined behaviour?

Good eye! You have caught a very important mistake: I used a pointer which I wasn't supposed to use. The correct way to write that code would be this:

int *p = malloc(sizeof(int));
free(p);
int *q = malloc(sizeof(int));
if (memcmp(&p, &q, sizeof(p)) == 0) {
  *p = 1;
  *q = 2;
  printf("%d %d\n", *p, *q);
}

Unfortunately that doesn't change anything: you still get the same “1 2” answer from the usual culprits.

> ...except if the second malloc returns the same "space" as the first malloc, is 'p' still indeterminate now that the space it was referring to is no longer free? Maybe the 'p == q' is okay (though the '*p' is still undefined because it lost its ability to access the object after the space was first freed).

Actually it's the other way around: p == q is not OK (but you can use memcmp instead), while *p is fine (if you used memcmp). It wouldn't be fine only if you introduce “pointer provenance”, which is not part of any existing C and/or C++ standard and most definitely not part of C89 (as I have noted elsewhere, C++ has an optional feature similar to “pointer provenance”, but only in “strict pointer safety” mode, which none of the existing compilers support). Otherwise the only way for two different pointers to be equal is to have them point to the same object or, as a special corner case, to have one of them point one-past-the-end of an array (and there are no arrays in this example).
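A small sketch of what the memcmp comparison does: it compares the stored bytes of the two pointer objects instead of evaluating the pointer values. The helper name same_pointer_bits is hypothetical. The sketch shows only the comparison itself; whether a dereference is allowed after such a check is exactly what is being argued about here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (illustration only): true when two int pointers have
 * identical object representations. The arguments are the addresses of the
 * pointer variables, so only their bytes are inspected and the pointer
 * values themselves are never evaluated. */
bool same_pointer_bits(int *const *a, int *const *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

Identical bytes mean identical addresses on mainstream platforms, but the standard does not promise the converse: equal pointers are allowed to have different representations.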

The C committee was actually asked about more-or-less that issue (what should happen when one reads the value of a pointer passed to free) and the result was: “after much discussion, the UK C Panel came to a number of conclusions as to what it would be desirable for the Standard to mean”. Note: they haven't said that the standard makes it invalid now, no. They said that the fact that this program is well-defined is a problem and that the standard needs to be changed to make currently valid programs invalid.

> I don't think it's fair to blame C99 for changing the semantics here - C89 seems very unclear and ambiguous, and C99 was the first time it was actually specified semi-properly (by having a more precise definition of the lifetime of an object with the term "deallocated" (instead of "freed or reallocated"), and saying realloc() always "deallocates" the old object).

I wouldn't call that “semi-properly”. The history here is the following: when C89 was developed there was already a strong push from compiler writers about the need to ~~screw the programmer~~ enable optimizations. And a nice (for compiler writers) scheme was added to the standard draft. Unfortunately it turned out to be almost impossible to use it for writing real programs (as Dennis Ritchie noticed). And a language which you couldn't actually write programs in is not very useful. Thus the C committee ripped out the most egregious parts and only left some small remnants of that attempt in C89.

In C99, C++11 and all subsequent standards these attempts were repeated. It's unclear how many useful optimizations these attempts enabled, but they made it almost impossible to write correct programs in C/C++. Sooner or later you end up with some kind of security check which the compiler would happily remove to “optimize” your code.

This quest is not yet finished; we still have no adequate set of rules there (apparently transformations which are derived from these rules and used in LLVM can easily turn a correct program into an incorrect one, and that doesn't usually happen simply because these transformations are applied in a certain order… that would have been really hilarious if it weren't so sad), yet programmers were supposed to write programs, back in 1990, which adhere to rules which would be finalized, maybe (there is hope, but no promises right now), around 2030. Does that sound realistic to you?

Quotes of the week

Posted Jun 3, 2021 15:00 UTC (Thu) by khim (subscriber, #9252)

After reading comments from others I realized that I haven't actually explained what the issue with that code is.

The issue here is not that realloc may return a different pointer and then the program prints nothing. Sure, it's allowed to do that according to C89, K&R or any other standard.

No, the issue here is that existing compilers make it print 1 2 instead of 2 2, and that may never happen according to C89.

Quotes of the week

Posted Jun 3, 2021 14:16 UTC (Thu) by rschroev (subscriber, #4164)

Do you mean that the example program should always output "2 2\n"? Or do you mean that it should either generate output or not, and if it does output something it should output "2 2\n"? I'm asking because I'm not sure how to interpret 'the only possible output is "2 2"'.

In the first case:

I think you're wrong: C89 doesn't guarantee that p == q. The realloc function can legally move the object, even in C89. Look what it says about realloc's return value (at least in the draft of the standard -- I don't have access to the officially released version):

> The realloc function returns either a null pointer or a pointer to the possibly moved allocated space.

In the second case:

I agree, the program looks legal to me and should output either nothing or "2 2\n". Are you saying that behavior is under threat somehow?

Quotes of the week

Posted Jun 3, 2021 14:39 UTC (Thu) by khim (subscriber, #9252)

> I'm asking because I'm not sure how to interpret 'the only possible output is "2 2"'.

Ah, sorry. I actually forgot about the fact that realloc can, according to the standard, actually return a different address here. In practice all existing implementations return the same one.

Yes, a correctly compiled program may produce empty output here if, e.g., your realloc just always calls malloc and copies the content. That's not an issue.
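A sketch of such an always-moving implementation, to show why empty output is legitimate. The name moving_realloc and the extra old_size parameter are assumptions made for illustration: the real realloc knows the old size from the allocator's own bookkeeping and takes only two arguments.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical realloc-like function (illustration only) that never resizes
 * in place. The new block is allocated while the old one is still live, so
 * the returned address always differs from the old one. */
void *moving_realloc(void *old, size_t old_size, size_t new_size)
{
    void *fresh = malloc(new_size);
    if (fresh == NULL)
        return NULL;                /* old block stays valid, as with realloc */

    if (old != NULL) {
        /* Copy the part of the contents that fits into the new block. */
        memcpy(fresh, old, old_size < new_size ? old_size : new_size);
        free(old);
    }
    return fresh;
}
```

With an implementation like this the p == q branch of the example program is never taken, so the program prints nothing.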

The issue is: the actual output (that is: 1 2) is clearly invalid.

> I agree, the program looks legal to me and should output either nothing or "2 2\n". Are you saying that behavior is under threat somehow?

Clang, MSVC and ICC all produce "1 2\n" output and they all claim that they can do that because they plan to add a new set of undefined behaviors to the C2x standard.

Once again: they miscompile a perfectly valid C89 program in strict C89 mode today and claim that it's fine because it, apparently, violates a set of rules which they plan to add to C2x (and which is not yet even finalized).

And they explicitly refuse to provide any flags which can make it work (although -fno-builtin-realloc works, it's extremely unintuitive and non-obvious).

Quotes of the week

Posted Jun 3, 2021 15:13 UTC (Thu) by rschroev (subscriber, #4164)

OK, thanks, that makes it clear what you're talking about. I agree this is bad.

And now I see that you made that clarification already elsewhere in the thread. Sorry, my bad.