C23 reference links

Posted Jul 24, 2024 11:40 UTC (Wed) by khim (subscriber, #9252)
In reply to: C23 reference links by Wol
Parent article: GNU C Library 2.40 released

> We should be getting rid of UB

We couldn't remove all UBs from the low-level, “everything is possible” languages (e.g. if you don't say that reading uninitialized memory is UB then you are asked to support really crazy stuff, like doing something with the bug report when someone notices that on Windows 8 the contents of the EBX register no longer contained a copy of the executable’s instance handle when the executable entry point was called), but they should be justified!

> and defining a "blessed" behaviour with the option of asking for the "old" behaviour is far better than creating new UB.

Yup. The response to the “we have found out that implementations are doing different things in some cases” issue should be “can we make them do the same thing, instead?” and not “ooh, yes, that's a problem, let's make millions of developers remember that and keep avoiding this landmine till the end of eternity”.

I guess it's broken windows theory applied to language development: if your language has no UBs or it has a mere dozen of them, with each thoroughly justified, then adding a new one becomes “a really big deal”. But when your language includes, literally, hundreds of them… it becomes a knee-jerk reaction: “some compiler does things differently from all the others… let's name that difference UB and make developers cater to that one, too… they are, obviously, superhumans if they can track all these hundreds of existing UBs in their head, they wouldn't mind another one”.

P.S. Originally the position of C and C++ standard was “normal people shouldn't use that standard directly, that's a document just for the compiler writers and users should consult their compiler documentation instead”, and in that world the stance that C23 applied to realloc was sane and justified. But that has changed: other documents (like the POSIX specification I already cited) directly reference the C standard, compiler developers directly use it in their work and they expect users to follow the rules of the standard, too! The C and C++ ISO committee may not like that change, but it has already happened! Either they will adapt or they will perish (when C and C++ are replaced with something saner, be it Carbon, Rust or Zig). Because the current situation with the C/C++ standards is completely insane for documents that are supposed to be used directly by developers.


C23 reference links

Posted Jul 24, 2024 13:53 UTC (Wed) by Wol (subscriber, #4433) [Link] (8 responses)

> P.S. Originally the position of C and C++ standard was “normal people shouldn't use that standard directly, that's a document just for the compiler writers and users should consult their compiler documentation instead”

In which case, again it shouldn't be UB. It should be "we defer to the standard for your platform, be it Posix, Windows or whatever". In which case, as far as C/C++ is concerned it may be undefined, but it explicitly says "to find your definition, go ..." ie it's DSE (Defined Somewhere Else).

Cheers,
Wol

C23 reference links

Posted Jul 24, 2024 14:38 UTC (Wed) by farnz (subscriber, #17727) [Link] (6 responses)

UB is defined as "behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements. Undefined behavior may also be expected when this International Standard omits the description of any explicit definition of behavior. [Note: permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."

"behaving … in a documented manner characteristic of the environment" is the statement that another standard can define things that C leaves as UB, just expressed in ISO standardese.

If you're saying it shouldn't be UB, then it either needs to be implementation-defined behaviour, or unspecified behaviour; but unspecified has the problem that "usually, the range of possible behaviors is delineated by this International Standard".

C23 reference links

Posted Jul 24, 2024 14:58 UTC (Wed) by Wol (subscriber, #4433) [Link] (5 responses)

> to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."

That sounds like my DSE. If so, why not say it? Given that realloc (and especially in its form realloc(ptr,0) ) is documented in many places as "well formed code", most of the description of UB does not apply. To declare previously well-formed code as UB is dangerous! To explicitly defer instead to a different standard makes much more sense.

It may still be "here be dragons", but it's tame dragons, not wild ones. And if it requires flags to enable the programmer to specify the behaviour, they are not merely tame, but tamed.

Cheers,
Wol

C23 reference links

Posted Jul 24, 2024 15:23 UTC (Wed) by farnz (subscriber, #17727) [Link]

Because that's how ISO standards work; any standard that is referred to by an ISO standard must be included in the references for that standard. So, to refer to the platform standard from the ISO C standard requires that all the platform standards (including version) that you're referencing are in the reference list for the ISO C standard. This avoids the problem where a standard refers to a document that you can't even identify in order to purchase - if an ISO standard references a document, then a unique identifier for the referenced document is present in the references list.

Further, if the dependent standard is updated to a newer version, the reference remains to the older version; you have to issue a new version of the depending standard with updated references to update to the newer standard. Wording like that used by the C standard escapes this, since now the platform standard depends on the C standard, rather than the C standard depending on the platform standard.

C23 reference links

Posted Jul 24, 2024 15:29 UTC (Wed) by khim (subscriber, #9252) [Link] (3 responses)

> That sounds like my DSE.

No. That's your DSE plus carte blanche to do anything else, too!

What would a lazy implementer do if one option gives it lots of work and no bonuses and the other one gives less work and faster results on benchmarks that can be used by the marketing team, hmm?

> Given that realloc (and especially in its form realloc(ptr,0) ) is documented in many places as "well formed code", most of the description of UB does not apply.

Not true. POSIX, in particular, defers to the ISO C standard in that regard.

And it incorporates it by reference which means that when C23 would be ratified (expected to happen this year) suddenly, on all POSIX platforms, realloc(ptr,0) would stop being defined.

Neato, isn't it?

That's what happens when different people stop talking to each other.

P.S. Of course compilers wouldn't start breaking existing programs on the next day after ratification of C23. It would take some time before someone realises that these programs that call realloc(ptr,0) were always non-portable and since 2024 they are also, formally, broken, so why not treat them as non-existent and optimize well-behaved programs (do such programs even exist?) better?
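
For code that wants to stay clear of this corner regardless of which standard ends up governing it, one option is simply never to pass a zero size to realloc. The following is a minimal sketch under that assumption; the helper name resize_buffer and its return convention are hypothetical, not taken from ISO C or POSIX.

```c
#include <stdlib.h>

/* Hypothetical helper: resize a heap buffer without ever calling
 * realloc(ptr, 0). Returns 0 on success, -1 on allocation failure.
 * On failure the original buffer is left untouched. */
int resize_buffer(void **buf, size_t new_size)
{
    if (new_size == 0) {
        /* Handle the zero-size case explicitly instead of relying
         * on whatever realloc(ptr, 0) does on this platform. */
        free(*buf);          /* free(NULL) is a no-op */
        *buf = NULL;
        return 0;
    }

    void *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return -1;           /* old block is still valid */

    *buf = tmp;
    return 0;
}
```

A caller would hold the buffer as a `void *` initialised to NULL and pass its address; after a resize to zero the pointer is NULL again, so a later `free` or another resize remains safe.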

C23 reference links

Posted Jul 24, 2024 21:54 UTC (Wed) by Wol (subscriber, #4433) [Link] (2 responses)

> Not true. POSIX, in particular, defers to the ISO C standard in that regard.

> And it incorporates it by reference which means that when C23 would be ratified (expected to happen this year) suddenly, on all POSIX platforms, realloc(ptr,0) would stop being defined.

I thought you said that the original C standard deferred to POSIX et al? And that's never been officially changed?

> Neato, isn't it?

It's brilliant :-)

> That's what happens when different people stop talking to each other.

Reality disappearing down the event horizon of a bathtub ...

Cheers,
Wol

Relationship between POSIX and C

Posted Jul 24, 2024 22:44 UTC (Wed) by farnz (subscriber, #17727) [Link] (1 responses)

In older POSIX standards, the ISO C standard was brought in by reference, and then POSIX imposes requirements on top of ISO C. That's the way round that ISO envisages its standards being used (ISO provide a base, a higher level standard tightens up requirements in ways that work for a specific use case), but POSIX is imposing fewer and fewer requirements on top of ISO over time, which is why this is now a problem.

The ideal case would be for POSIX to impose requirements on realloc that aren't in conflict with ISO; if POSIX said "realloc(ptr, 0); must be the same as free(ptr); return malloc(0);", for example, that would not conflict with C17 or C23, but would tighten up the behaviour and make it defined in a reasonably sane way.
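
The suggested rule can be sketched as a thin wrapper. This is only an illustration of the semantics described above, not text from any POSIX draft; the function name realloc_sketch is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical wrapper illustrating the rule
 * "realloc(ptr, 0) behaves like free(ptr); return malloc(0);".
 * For any non-zero size it defers to the platform realloc. */
void *realloc_sketch(void *ptr, size_t size)
{
    if (size == 0) {
        free(ptr);           /* free(NULL) is a no-op */
        /* malloc(0) is implementation-defined: it returns either
         * NULL or a unique pointer that may be passed to free. */
        return malloc(0);
    }
    return realloc(ptr, size);
}
```

With this shape the old block is always released when the size is zero, and the result is always something that can be handed to free, whichever of the two malloc(0) behaviours the platform picks.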

Relationship between POSIX and C

Posted Jul 25, 2024 8:01 UTC (Thu) by Wol (subscriber, #4433) [Link]

And who's got the balls to do such an eminently sensible thing? :-(

Cheers,
Wol

C23 reference links

Posted Jul 24, 2024 15:14 UTC (Wed) by khim (subscriber, #9252) [Link]

> In which case, again it shouldn't be UB.

Why? It's said quite explicitly in the rationale: Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior.

> It should be "we defer to the standard for your platform, be it Posix, Windows or whatever".

But that's precisely what role UB is supposed to also serve! Yes, it's conflating two roles, but in a world where standard is only ever read by compiler (and platform) developers and everyone else relies on what compiler (and platform) documentation says it makes sense.

And as you saw, POSIX (in its 1997 version) actually does that: it defines how realloc works and thus some programs which the C++ standard rejects are accepted by this version of POSIX.

But at some point platform developers have become lazy and stopped doing that, instead 2004 version says: this volume of IEEE Std 1003.1-2001 defers to the ISO C standard. Same with all the later versions.

And this is when things went downhill: the C and C++ standards are still developed on the assumption that there is some mythical implementer that may and would “augment the language” with sane definitions for some of these same 200+ UBs… but “platform developers” see no reason to do that!

It was funny in 2011, sad in 2020, today it just looks like a complete and utter denial of the current reality.

> In which case, as far as C/C++ is concerned it may be undefined, but it explicitly says "to find your definition, go ..." ie it's DSE (Defined Somewhere Else).

That's called implementation-defined behavior, only the standard committee added an additional rule for themselves not to use it for things that they couldn't adequately explain. This turned DSE (Defined Somewhere Else) into “defined with all possible options listed”, which was supposed to be a good thing… except it made the committee put a bazillion things that they couldn't define into the “undefined behavior” bucket.

Again: not a big difference in a world where implementers use the standard as a base and then decide what they would define and what they would ignore… a catastrophic difference in our world where implementers just say “look in the standard for the definitions, we don't have resources to define these things”.

