C23 reference links
Posted Jul 24, 2024 5:52 UTC (Wed) by mb (subscriber, #50428)
In reply to: C23 reference links by khim
Parent article: GNU C Library 2.40 released
You could never use zero-sized realloc in cross-platform code, because all platforms behave differently.
Just don't use zero-sized realloc and you get portable code.
Before and after the C23 change.
>UB is, essentially, a permission to take any program where it happens and turn it into a pile of goo…
Unless the platform defines it. Which at least Posix does.
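One way to follow the "just don't use zero-sized realloc" advice is to route every resize through a small wrapper that never lets a size of 0 reach realloc. The sketch below is only an illustration: the name resize_buffer and its return convention are assumptions, not something from this thread or from any library.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: resize the buffer that *pp points to.
 * A size of 0 is handled explicitly (free and set to NULL), so realloc()
 * is never called with a zero size.
 * Returns 0 on success, -1 on allocation failure (*pp is left untouched).
 */
int resize_buffer(void **pp, size_t new_size)
{
    if (new_size == 0) {
        free(*pp);          /* free(NULL) is a no-op, so *pp may be NULL */
        *pp = NULL;
        return 0;
    }

    void *tmp = realloc(*pp, new_size);
    if (tmp == NULL)
        return -1;          /* the old block is still valid and still owned by the caller */

    *pp = tmp;
    return 0;
}
```

With this convention a NULL pointer always means "no buffer", and an allocation failure is reported through the return value instead of being confused with the zero-size case.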
Posted Jul 24, 2024 8:37 UTC (Wed)
by khim (subscriber, #9252)
[Link] (9 responses)
Why should they behave differently? C23 could easily specify one supported behaviour and make the others behave the same way.

Just don't ever make any mistakes, never violate any of the hundreds of rules that the standard specifies, and everything will be just peachy? We have tried that for half a century. It doesn't work. Humans are not made to work that way!

Nope. It doesn't do that. POSIX, these days, very explicitly says that this volume of POSIX.1-2017 defers to the ISO C standard. Which means that yes, eventually, POSIX will be redefined to ensure that compilers can break users' programs willy-nilly.

We have precedent, after all. Originally POSIX was saying that realloc was changing the size of the memory object, which, naturally, would leave pointers valid if it's not moving them. A few years later the note that made these programs valid was still there, but text saying that this volume of IEEE Std 1003.1-2001 defers to the ISO C standard was added, and compilers started breaking valid programs. Years later, the actual text of POSIX was changed to ensure that program-breaking compilers are not invalid.

Given that history I assert that C23 intentionally and explicitly changed the standard to break previously valid programs. Including, yes, the ability to use 0 size on POSIX. Note that “POSIX defers to the ISO C standard” was enough to justify one kind of breakage WRT realloc-related code, why wouldn't it be enough to justify another kind?
Posted Jul 24, 2024 16:56 UTC (Wed)
by mb (subscriber, #50428)
[Link] (8 responses)
They *do* behave differently. That's what the reality looks like.
>C23 may easily specify one way that is supported and make others behave in the same way.
Changing the behavior would very likely break existing non-portable programs.
There are only two realistic options: Make it implementation defined or undefined.
Posted Jul 24, 2024 17:18 UTC (Wed)
by khim (subscriber, #9252)
[Link] (7 responses)
None of these programs are C23-compliant, anyway, so that's a moot point. Or keep it like it was in C17. If developers may cope with two decades of non-decision for important matters then why couldn't they live without sane resolution for some fringe and not too interesting case? And between bad, worse and worst choices… worst was picked — why exactly?
It's really one of the craziest cases of “we have to do something… this is something… then let's do it!”
Posted Jul 24, 2024 19:42 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (6 responses)
Keeping it like it was in C17 is making it undefined behaviour, just with obfuscated language to hide that from people who don't spend all their time following through the implications of standardese. If the C17 definition was OK, then simplifying the language used to define it but keeping the definition the same should also be OK.
It would have been better to redraft the C17 language so that you couldn't select the set of options that make it UB, but that's not what the committee chose to do - in large part (as far as I can tell as a non-attendee) because any attempt to change the allowed set of options upset people who felt that other people were Wrong to implement realloc(_, 0) the way they did. Rather than continue trying to find some way to make it work for two groups who refuse to agree, the wording just got simplified to remove the long chain of logic leading to it being UB, since the two conflicting groups can just as easily define it downstream if they care.
Posted Jul 24, 2024 20:48 UTC (Wed)
by khim (subscriber, #9252)
[Link] (5 responses)
Tell me what gives a compiler writer the right to turn this function:

int foo(int x) {
    int y = x * 2;
    realloc(malloc(1), 0);
    return y;
}

into an empty sequence of instructions (without even ret at the end). We may go from there.

Only if you explain how the optimization described above is allowed. I couldn't see how C17 may allow that. C23, most definitely, makes it OK.

And, instead, they have picked an approach which would make every single one of these people question the sanity of the people who pushed this approach into the language. Is it really an improvement?

Except we already know where that leads: no one would bother to do anything, but compiler writers, a few years down the road, would interpret the wording in the most ruinous way possible.

That decision would be funny if not for the past experience with this exact function.
Posted Jul 24, 2024 22:40 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (4 responses)
It's a nasty little mistake in drafting, in paragraph 3 of 7.22.3.5, and clearly a defect. However, the standard only imposes requirements on realloc if memory for the new object is not allocated. It is permissible for realloc to allocate a new object, but not to allocate memory for that new object (since it's zero sized); at this point, the general escape hatch in 3.4.3 is open, since the standard imposes no requirements here, and you've got UB on a technicality. It then returns a non-null pointer, since it allocated an object, and you're in the bad place since the standard doesn't say what to do with the old object.
Now, this is clearly a drafting error and should be fixed by removing "memory" - but, AFAICT, people got angry at the idea of fixing the drafting error without also fixing the required behaviour to match their idea of the "only correct" way to define this. And rather than deal with that, the standards committee chose to simply open the escape hatch wide and say "screw you all".
I personally think this was the wrong decision, but it's the decision the committee made. And it's reflective of the current direction of travel of C - rather than make hard decisions for the benefit of users of the language (since the smaller the area covered by undefined behaviour, the easier it is to work within the language), they've chosen to punt a hard problem at the users and rely on "real programmers not making mistakes". I doubt history will be kind to the C committee over this, since it's clearly a case of "it's easier for us to do this, even if it makes life harder for language users".
Posted Jul 25, 2024 8:42 UTC (Thu)
by khim (subscriber, #9252)
[Link] (3 responses)
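An object is a “region of data storage in the execution environment”; if you don't have storage then you can't have an object. It's unclear whether zero-sized objects are even possible in C17 (it makes it impossible to have zero-sized arrays or structs, but doesn't say whether zero-sized objects can be obtained from realloc), but I don't see how the definition of object as “region of data storage in the execution environment” permits you to allocate one without also allocating memory (maybe zero bytes of memory!) for it. Sorry. No “region of data storage in the execution environment”, no object. It's as simple as that.

The worst you can expect from C17, AFAICS, is that realloc would allocate one single zero-sized region of storage for objects and would hand out different pointers to that same single region with all-identical bits. That's a bit of a strange situation, sure, but it's still not UB. In Rust that's perfectly normal, defined (and often used!) behavior; I don't see how and why C should be different.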
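The model described in this comment (one zero-sized region, with every zero-size request receiving a pointer to it) can be sketched in user code. The names zs_malloc, zs_realloc, zs_free and the sentinel zero_sized_region are hypothetical; this only illustrates the idea and is not how any particular libc implements realloc.

```c
#include <stdlib.h>

/* Hypothetical sentinel standing in for "the one zero-sized region". */
static char zero_sized_region;

/* Zero-size requests all get the same non-null pointer, which must never be dereferenced. */
void *zs_malloc(size_t n)
{
    return n ? malloc(n) : (void *)&zero_sized_region;
}

/* The sentinel was never obtained from malloc(), so it must not reach free(). */
void zs_free(void *p)
{
    if (p != (void *)&zero_sized_region)
        free(p);
}

void *zs_realloc(void *p, size_t n)
{
    if (n == 0) {
        zs_free(p);                 /* release the old block, if any */
        return &zero_sized_region;  /* fully defined result for size 0 */
    }
    if (p == (void *)&zero_sized_region)
        p = NULL;                   /* realloc(NULL, n) behaves like malloc(n) */
    return realloc(p, n);
}
```

Under these assumptions a zero-size result is always non-null, compares equal to every other zero-size result, and can be passed back to zs_realloc or zs_free safely.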
Posted Jul 25, 2024 15:11 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (2 responses)
C is different because the standard says that unless it explicitly defines something, that thing is UB. And in this case, by talking about memory in this one clause (something which, unlike a region of storage, is undefined), it infects the whole clause with UB if you're sufficiently motivated to find UB in a program.
If they'd not thrust that word "memory" in, this would be a full definition; but by adding an undefined term, the licence to treat the whole clause as UB is introduced.
Posted Jul 25, 2024 15:37 UTC (Thu)
by khim (subscriber, #9252)
[Link] (1 responses)
Yes, but that doesn't include terms. For unknown terms it defers to ISO/IEC 2382. Are you sure it's not defined? I suspect it's common enough term than ISO/IEC 2382, somehow. No, unknown terms like that. They may be interpreted differently, because not all combinations of ISO standard terms have meaning, but that would be a defect in the standard, it wouldn't, suddenly, lead to UB. For something to be UB it shouldn't be defined at all, not defined in a strange and/or undecipherable manner. Otherwise you may pick some term which is not defined by C standard but used extensively enough (e.g. syntax is mentioned many times, but never defined) and then claim that standard doesn't define anything at all.
Posted Jul 25, 2024 15:45 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
All I can tell you is that this is the argument given by acquaintances on the committee - syntax is defined by ISO 2382, but memory is not, which is why there's wiggle-room for saying that the behaviour is undefined.
I, personally, think this is sucky, because you shouldn't be stretching to find UB whenever possible, but should be trying to reduce it to the minimum reasonable scope, but that's not current C culture.
Posted Jul 24, 2024 11:05 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (10 responses)
Just declare one version (preferably the "0 returns an invalid pointer you can safely free") as the "blessed" version, and require that the others are available with an option switch. Those people who "realloc(ptr,0)" can now declare the behaviour they want, and those who don't can ignore it.
As I've said, imho "ptr=realloc(ptr,0)" is good defensive programming if it returns null, and if it returns a value that will cause the program to barf if de-referenced - even better? Why not encourage defensive programming, not make it even harder ...
We should be getting rid of UB, and defining a "blessed" behaviour with the option of asking for the "old" behaviour is far better than creating new UB.
Cheers,
Posted Jul 24, 2024 11:40 UTC (Wed)
by khim (subscriber, #9252)
[Link] (9 responses)
We couldn't remove all UBs from low-level, “everything is possible” languages (e.g. if you don't say that reading uninitialized memory is UB then you are asked to support really crazy stuff, like doing something with the bug report when someone notices that on Windows 8 the contents of the EBX register no longer contained a copy of the executable’s instance handle when the executable entry point was called), but they should be justified!

Yup. The response to the “we have found out that implementations are doing different things in some cases” issue should be “can we make them do the same thing, instead?” and not “ooh, yes, that's a problem, let's make millions of developers remember that and keep avoiding this landmine till the end of eternity”.

I guess it's the broken windows theory applied to language development: if your language has no UBs, or has a mere dozen of them with each thoroughly justified, then adding a new one becomes “a really big deal”. But when your language includes, literally, hundreds of them… it becomes a knee-jerk reaction: “some compiler does things differently from all the others… let's name that difference UB and make developers cater to that one, too… they are, obviously, superhumans if they can track all these hundreds of existing UBs in their heads, they wouldn't mind another one”.

P.S. Originally the position of the C and C++ standards was “normal people shouldn't use that standard directly, that's a document just for the compiler writers and users should consult their compiler documentation instead”, and in that world the stance that C23 applied to realloc was sane and justified. But that has changed: other documents (like the POSIX specification I already cited) directly reference the C standard, compiler developers directly use it in their work and they expect users to follow the rules of the standard, too! The C and C++ ISO committees may not like that change, but it has already happened! Either they will adapt or they will perish (when C and C++ are replaced with something saner, be it Carbon, Rust or Zig). Because the current situation with the C/C++ standards is completely insane for documents that are supposed to be used directly by developers.
Posted Jul 24, 2024 13:53 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (8 responses)
In which case, again it shouldn't be UB. It should be "we defer to the standard for your platform, be it Posix, Windows or whatever". In which case, as far as C/C++ is concerned it may be undefined, but it explicitly says "to find your definition, go ..." ie it's DSE (Defined Somewhere Else).
Cheers,
Posted Jul 24, 2024 14:38 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (6 responses)
UB is defined as "behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements. Undefined behavior may also be expected when this International Standard omits the description of any explicit definition of behavior. [Note: permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."
"behaving … in a documented manner characteristic of the environment" is the statement that another standard can define things that C leaves as UB, just expressed in ISO standardese.
If you're saying it shouldn't be UB, then it either needs to be implementation-defined behaviour, or unspecified behaviour; but unspecified has the problem that "usually, the range of possible behaviors is delineated by this International Standard".
Posted Jul 24, 2024 14:58 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (5 responses)
That sounds like my DSE. If so, why not say it? Given that realloc (and especially in its form realloc(ptr,0) ) is documented in many places as "well formed code", most of the description of UB does not apply. To declare previously well-formed code as UB is dangerous! To explicitly defer instead to a different standard makes much more sense.
It may still be "here be dragons", but it's tame dragons, not wild ones. And if it requires flags to enable the programmer to specify the behaviour, they are not merely tame, but tamed.
Cheers,
Posted Jul 24, 2024 15:23 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
Because that's how ISO standards work; any standard that is referred to by an ISO standard must be included in the references for that standard. So, to refer to the platform standard from the ISO C standard requires that all the platform standards (including version) that you're referencing are in the reference list for the ISO C standard. This avoids the problem where a standard refers to a document that you can't even identify in order to purchase - if an ISO standard references a document, then a unique identifier for that document at the referenced version is present in the references list.
Further, if the dependent standard is updated to a newer version, the reference remains to the older version; you have to issue a new version of the depending standard with updated references to update to the newer standard. Wording like that used by the C standard escapes this, since now the platform standard depends on the C standard, rather than the C standard depending on the platform standard.
Posted Jul 24, 2024 15:29 UTC (Wed)
by khim (subscriber, #9252)
[Link] (3 responses)
No. That's your DSE plus carte blanche to do anything else, too! What would a lazy implementer do if one option gives it lots of work and no bonuses while the other gives less work and faster results on benchmarks that can be used by the marketing team, hmm?

Not true. POSIX, in particular, defers to the ISO C standard in that regard. And it incorporates it by reference, which means that when C23 is ratified (expected to happen this year) suddenly, on all POSIX platforms, realloc(ptr,0) would stop being defined.

Neato, isn't it? That's what happens when different people stop talking to each other.

P.S. Of course compilers wouldn't start breaking existing programs the day after ratification of C23. It would take some time before someone realises that these programs that call realloc(ptr,0) were always non-portable and since 2024 they are also, formally, broken, so why not treat them as non-existent and optimize well-behaving programs (do such programs even exist?) better.
Posted Jul 24, 2024 21:54 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (2 responses)
> And it incorporates it by reference which means that when C23 would be ratified (expected to happen this year) suddenly, on all POSIX platforms, realloc(ptr,0) would stop being defined.
I thought you said that the original C standard deferred to POSIX et al? And that's never been officially changed?
> Neato, isn't it?
It's brilliant :-)
> That's what happens when different people stop talking to each other.
Reality disappearing down the event horizon of a bathtub ...
Cheers,
Posted Jul 24, 2024 22:44 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (1 responses)
In older POSIX standards, the ISO C standard was brought in by reference, and then POSIX imposes requirements on top of ISO C. That's the way round that ISO envisages its standards being used (ISO provide a base, a higher level standard tightens up requirements in ways that work for a specific use case), but POSIX is imposing fewer and fewer requirements on top of ISO over time, which is why this is now a problem.
The ideal case would be for POSIX to impose requirements on realloc that aren't in conflict with ISO; if POSIX said "realloc(ptr, 0); must be the same as free(ptr); return malloc(0);", for example, that would not conflict with C17 or C23, but would tighten up the behaviour and make it defined in a reasonably sane way.
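The rule suggested above, "realloc(ptr, 0) must be the same as free(ptr); return malloc(0);", can be written out as a wrapper. This is only a sketch of that proposed semantics: the function name realloc_free_then_malloc0 is made up for illustration, and no version of POSIX is claimed to specify it.

```c
#include <stdlib.h>

/*
 * Hypothetical wrapper illustrating the proposed rule:
 *   realloc(ptr, 0)  ==  free(ptr); return malloc(0);
 * Non-zero sizes are passed straight through to realloc().
 */
void *realloc_free_then_malloc0(void *ptr, size_t size)
{
    if (size == 0) {
        free(ptr);          /* the old object is always released */
        return malloc(0);   /* implementation-defined: NULL or a pointer that may only be freed */
    }
    return realloc(ptr, size);
}
```

Whatever malloc(0) returns on a given platform, the result can be handed to free, and the old pointer is never left in an ambiguous state.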
Posted Jul 25, 2024 8:01 UTC (Thu)
by Wol (subscriber, #4433)
[Link]
Cheers,
Posted Jul 24, 2024 15:14 UTC (Wed)
by khim (subscriber, #9252)
[Link]
Why? It's said quite explicitly in the rationale: “Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior.”

But that's precisely the role UB is also supposed to serve! Yes, it's conflating two roles, but in a world where the standard is only ever read by compiler (and platform) developers and everyone else relies on what the compiler (and platform) documentation says, it makes sense. And as you saw, POSIX (in its 1997 version) actually does that: it defines how realloc works and thus some programs which C++ standard rejects are accepted by this version of POSIX.

But at some point platform developers became lazy and stopped doing that; instead the 2004 version says: this volume of IEEE Std 1003.1-2001 defers to the ISO C standard. Same with all the later versions.

And this is when things went downhill: the C and C++ standards are still developed on the assumption that there is some mythical implementer that may and would “augment the language” with sane definitions for some of these same 200+ UBs… but “platform developers” see no reason to do that! It was funny in 2011, sad in 2020; today it just looks like complete and utter denial of the current reality.
> In which case, as far as C/C++ is concerned it may be undefined, but it explicitly says "to find your definition, go ..." ie it's DSE (Defined Somewhere Else).
That's called implementation-defined behavior, only the standard committee added an additional rule for themselves not to use it for things that they couldn't adequately explain. This turned DSE (Defined Somewhere Else) into “defined with all possible options listed”, which was supposed to be a good thing… except it made the committee put a bazillion things that they couldn't define into the “undefined behavior” bucket. Again: not a big difference in a world where implementers use the standard as a base and then decide what they would define and what they would ignore… catastrophic difference in our world where implementers just say “look in the standard for the definitions, we don't have resources to define these things”.
> You could never use zero-sized realloc in cross-platform code, because all platforms behave differently.
realloc was changing the size of the memory object, which, naturally, would leave pointers valid if it's not moving them.realloc-related code, why wouldn't it be enough to justify another kind?C23 reference links
> Changing the behavior would very likely break existing non-portable programs.
C17 definition of realloc(_, 0)
> Keeping it like it was in C17 is making it undefined behaviour, just with obfuscated language to hide that from people who don't spend all their time following through the implications of standardese.
int foo(int x) {
int y = x * 2;
realloc(malloc(1), 0);
return y;
}
ret at the end). We may go from there.realloc(malloc(1), 0) as UB under C17 rules
> It is permissible for realloc to allocate a new object, but not to allocate memory for that new object (since it's zero sized)
realloc(malloc(1), 0) as UB under C17 rules
realloc), but I don't see how the definition of object as “regions of data storage in the execution environment” permits you to allocate one without also allocating memory (maybe zero bytes of memory!) for it. Sorry. No “region of data storage in the execution environment”, no object. It's as simple as that.realloc would allocate one, single, zero-sized region of storage for objects and would hand over different pointers to that same single region with all-identical bits.realloc(malloc(1), 0) as UB under C17 rules
> C is different because the standard says that unless it explicitly defines something, that thing is UB.
> Just don't use zero-sized realloc and you get portable code.
> Before and after the C23 change.
Wol
> We should be getting rid of UB
realloc was sane and justified. But that have changed: other documents (like POSIX specification I already cited) directly reference C standard, compiler developers directly use it in their work and they expect users to follow rules of the standard, too! C and C++ ISO committee may not like that change, but it have already happened! Either they would adapt or they would perish (when C and C++ would be replaced with something saner, be it Carbon, Rust or Zig). Because current situation with C/C++ standards is completely insane for the documents that are supposed to be used directly by developers.C23 reference links
Wol
Wol
> That sounds like my DSE.
realloc(ptr,0) would stop being defined.realloc(ptr,0) were always non-portable and since 2024 they are also, formally, broken so why not treat them as non-existing and not optimize well-behaving program (do such programs even exist?) better.C23 reference links
Wol
Relationship between POSIX and C
Wol
> In which case, again it shouldn't be UB.
realloc works and thus some programs which C++ standard rejects are accepted by this version of POSIX.
