LWN: Comments on "realloc() and the oversize importance of zero-size objects" https://lwn.net/Articles/995196/ This is a special feed containing comments posted to the individual LWN article titled "realloc() and the oversize importance of zero-size objects". Fri, 05 Sep 2025 00:58:54 +0000 leave established API alone https://lwn.net/Articles/997419/ https://lwn.net/Articles/997419/ fest3er <div class="FormattedComment"> «This only breaks existing code.»<br> <p> Correct. I've had to go through and fix a bunch of old C++ code because newer standards changed syntax. C, known to be fraught with pointer errors, should work to tighten behavior; that is, the standard should be changed to minimize the negative effects of undefined pointer actions/operations even if it means that existing code will have to be corrected. It would likely involve a syntax change for well-written code, but might just require significant reprogramming for programs that employ diabolical creativity.<br> </div> Thu, 07 Nov 2024 23:33:46 +0000 Ignoring undefined behaviour https://lwn.net/Articles/997417/ https://lwn.net/Articles/997417/ fest3er <div class="FormattedComment"> «Why oh why would the existence of the standard make the set of things that can be usefully described as C implementations LARGER than it was before?»<br> <p> Probably for the same reason drag racers, sled pullers and other motorsports movers and shakers work to ensure that there are some rules that are subtly ambiguous: that ambiguity can, and often does, give them a competitive advantage even though that advantage goes against the intent of the rules. In other words, some people like ambiguous rules because it lets them flex their creativity. Alas, they forget that unfettered creativity often results in broken software.<br> </div> Thu, 07 Nov 2024 23:14:47 +0000 Why not define the behavior? 
https://lwn.net/Articles/996804/ https://lwn.net/Articles/996804/ Bluehorn <div class="FormattedComment"> I don't get it. If an API was misused before, I'd rather disallow what could be misinterpreted and ask people to use another API.<br> <p> IMHO the only sane way forward would be to narrow the API of realloc, rejecting zero-size objects.<br> So realloc(something, 0) should just set errno to EINVAL and return a NULL pointer. This may crash some applications, but that's easier to fix than invoking undefined behavior, making the code "run on my machine".<br> <p> Full disclosure: I used realloc to resize string buffers, and was expecting realloc(something, 0) to be equivalent to free. Obviously I did not check for errors in that code, so a failed realloc call to increase the size would lead to a memory leak (and probably a write to the zero address).<br> <p> </div> Sun, 03 Nov 2024 20:09:51 +0000 Single call API for the heap https://lwn.net/Articles/996639/ https://lwn.net/Articles/996639/ paulj <div class="FormattedComment"> The Jenkins hash is easy to pull from the Linux source code, and more than good enough for non-security-sensitive, performance-oriented contexts.<br> </div> Fri, 01 Nov 2024 12:04:43 +0000 Ignoring undefined behaviour https://lwn.net/Articles/996631/ https://lwn.net/Articles/996631/ mathstuf <div class="FormattedComment"> <span class="QuotedText">&gt; It is unlikely that one or a few volunteers can change the course of these big projects in a way that conflicts with the value system of the paid participants.</span><br> <p> FWIW, LLVM has a (new) community code ownership policy[1] and is actively seeking[2] community members to participate. 
You're unlikely to change minds if you conflict with the paying entities about something, but it is possible to offer arguments[3] that end up nudging things in the right direction[4] (even if there's a lack of explicit acknowledgement).<br> <p> [1] <a href="https://discourse.llvm.org/t/rfc-proposing-changes-to-the-community-code-ownership-policy/80714">https://discourse.llvm.org/t/rfc-proposing-changes-to-the...</a><br> [2] <a href="https://discourse.llvm.org/t/calling-all-volunteers/82817">https://discourse.llvm.org/t/calling-all-volunteers/82817</a><br> [3] <a href="https://github.com/bazelbuild/bazel/pull/19940#issuecomment-2066929706">https://github.com/bazelbuild/bazel/pull/19940#issuecomme...</a><br> [4] <a href="https://github.com/bazelbuild/bazel/pull/19940#issuecomment-2082133204">https://github.com/bazelbuild/bazel/pull/19940#issuecomme...</a><br> </div> Fri, 01 Nov 2024 09:22:09 +0000 Realloc freed the memory long before the C99 standard. https://lwn.net/Articles/996620/ https://lwn.net/Articles/996620/ kelnos <div class="FormattedComment"> A long long time ago, the first time I ever read that, I thought it meant that it would return the single same pointer value every time you called malloc() with a size of 0. But that value was also guaranteed to never be returned for a non-zero-sized allocation. To me the second bit was the "unique" part. And so returning NULL could be a conformant thing to do. As could (as someone else suggested) returning (void *)-1 every time.<br> <p> I know no one actually interprets or implements it that way, but to me, that's still a valid reading of the spec.<br> </div> Fri, 01 Nov 2024 00:50:52 +0000 Honestly kind of irrelevant https://lwn.net/Articles/996561/ https://lwn.net/Articles/996561/ NYKevin <div class="FormattedComment"> Because you don't need to. The developer can just do this:<br> <p> #define malloc_s(size) ((size) == 0 ? (void*)NULL : malloc(size))<br> #define realloc_s(ptr, size) ((size) == 0 ? (free(ptr), (void*)NULL) : realloc((ptr), (size)))<br>
<p> And then the compiler can optimize that back into a malloc/realloc call easily enough, if it knows that malloc/realloc have the appropriate semantics on a given platform.<br> <p> Getting unique non-dereferenceable pointers is harder, and IMHO should not be done, because (as I've explained elsewhere in the thread) it is cheaper to have a global u64 and atomically increment it (malloc takes a lock in almost any sensible implementation). If you somehow manage to allocate over 16 quintillion numbers in this fashion (so that it wraps), then you probably should be using UUIDs instead of 64-bit integers in the first place.<br> </div> Thu, 31 Oct 2024 18:10:43 +0000 Ignoring undefined behaviour https://lwn.net/Articles/996560/ https://lwn.net/Articles/996560/ NYKevin <div class="FormattedComment"> OK, fine, let's analyze them as two groups:<br> <p> 1. The GCC/Clang people have publicly stated, in many different fora, for many years, that they will interpret UB as license to do whatever they want. GCC and Clang are also the two most popular compilers in practical use.<br> 2. The committee ignores (1) and continues to designate things as UB which probably should not be treated in this manner, and then insists that they are not at fault when the compiler writers do exactly what they publicly said they were going to do.<br> <p> That's hardly better.<br> </div> Thu, 31 Oct 2024 18:02:02 +0000 Ignoring undefined behaviour https://lwn.net/Articles/996547/ https://lwn.net/Articles/996547/ anton Some of us actually <a href="http://git.savannah.gnu.org/cgit/gforth.git/">work on</a> and <a href="https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2024.14">publish about</a> compilers (albeit not C compilers), so your slander is just that. 
<p>Concerning your suggestion to get involved with LLVM and GCC maintenance, these are big projects with a lot of paid-for participants who have agreed on certain goals and evaluation methods, and these agreements have led to the current situation. It is unlikely that one or a few volunteers can change the course of these big projects in a way that conflicts with the value system of the paid participants. Even a <a href="http://www.sable.mcgill.ca/publications/papers/2008-2/paper.pdf">contribution</a> that did not conflict with that value system was ignored (and the authors of that contribution also presented at a GCC Developer's summit). Thu, 31 Oct 2024 18:00:41 +0000 Single call API for the heap https://lwn.net/Articles/996557/ https://lwn.net/Articles/996557/ NYKevin <div class="FormattedComment"> I suppose my real concern is that some people are going to use the modulus operator as their "hashing algorithm," under the (false) assumption that malloc(0) is required to give uniformly-distributed pointers (and because C does not come with a hash table implementation in its stdlib, so whatever hash implementation you come up with is probably not going to be a best-in-class implementation).<br> </div> Thu, 31 Oct 2024 17:56:57 +0000 Ignoring undefined behaviour https://lwn.net/Articles/996543/ https://lwn.net/Articles/996543/ anton I have looked at three different POSIX versions (<a href="https://pubs.opengroup.org/onlinepubs/009696899/functions/realloc.html">2004</a>, <a href="https://pubs.opengroup.org/onlinepubs/9699919799.2013edition/functions/realloc.html">2013</a>, <a href="https://pubs.opengroup.org/onlinepubs/9799919799/functions/realloc.html">2024</a>), and they all change the wording and sometimes the behaviour of realloc() (the <a href="https://pubs.opengroup.org/onlinepubs/009696899/functions/realloc.html">2004 variant of realloc(ptr,0)</a> free()s the object pointed to by a non-NULL ptr; the others allow several implementations). 
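<p>As an illustration of what pinning down one variant could look like: the sketch below assumes a program wants "free the object and return NULL" for a zero size, regardless of what the platform's realloc() does. The name realloc_free_on_zero is hypothetical and does not come from POSIX or ISO C.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of any standard: it never passes a zero
 * size to realloc(), so the result does not depend on which variant of
 * realloc(ptr, 0) the platform implements. */
static void *realloc_free_on_zero(void *ptr, size_t size)
{
    if (size == 0) {
        free(ptr);      /* free(NULL) is defined to do nothing */
        return NULL;    /* NULL here means "freed", not "out of memory" */
    }
    return realloc(ptr, size);
}
```

With this choice a NULL result is only an error when the requested size was non-zero, so callers have to check the size before treating NULL as an allocation failure.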
<p>I don't see that a tighter definition (e.g., the 2004 one) in POSIX would be in conflict with a looser (e.g., implementation-defined, or one of several options, including the POSIX 2004 one) definition in a C standard. Thu, 31 Oct 2024 16:56:57 +0000 Ignoring undefined behaviour https://lwn.net/Articles/996541/ https://lwn.net/Articles/996541/ anton It says that, in case of conflict (i.e., both standards define the behaviour in different ways), the C standard prevails. But if the C standard does not define a behaviour, and POSIX defines it, there is no conflict, and the POSIX definition is the one a POSIX-compliant system has to implement. <p>If you want to promote an interpretation where undefined behaviour in the C standard prevails over defined behaviour in POSIX, that's obviously absurd: for all the functions defined in the C standard, the POSIX definition would just be redundant; and for all the functions that are not defined in the C standard, but are defined in POSIX (e.g., read()), that interpretation would mean that these functions are undefined. I.e., with that interpretation all the definitions of C functions in POSIX would be superfluous. Thu, 31 Oct 2024 16:45:35 +0000 Single call API for the heap https://lwn.net/Articles/996533/ https://lwn.net/Articles/996533/ anton I have never read anything about "good hash inputs". Normally the idea is that a good hash function distributes the inputs into the buckets like a random choice of buckets or better (perfect hashing), for inputs of any characteristics (of course each hash function has input sets that produce worst-case behaviour, but for good hash functions trivial patterns do not form such sets). <p>I expect that you mean that "good hash inputs" do ok even for bad hash functions. The solution to this problem is to use good hash functions, not to produce "good hash inputs" from realloc(...,0). 
There is no guarantee of "good hash inputs" for any other stuff that you might throw at the hash function, including other uses of realloc(). Thu, 31 Oct 2024 15:56:29 +0000 Ignoring undefined behaviour https://lwn.net/Articles/996456/ https://lwn.net/Articles/996456/ milesrout <div class="FormattedComment"> Note that this means that, despite being released in 2024, the 8th edition of POSIX still refers to "C17" (the 2018 edition of ISO C) and not to C23, despite 2024 being after 2023. That is, at least in part, because, as far as I know, the official standard for C23 still hasn't been released! It is expected to be finalised this year but who knows? It could well end up being ISO/IEC 9899:2025, which would be pretty funny. <br> </div> Thu, 31 Oct 2024 08:24:31 +0000 Ignoring undefined behaviour https://lwn.net/Articles/996455/ https://lwn.net/Articles/996455/ milesrout <div class="FormattedComment"> <span class="QuotedText">&gt;And since actual version of ISO C standard is not mentioned anywhere (presumably to ensure that possible bugfixes are automatically picked)… that means that by ratifying that change they have, effectively, added undefined behavior to most existing versions of POSIX specification.</span><br> <p> This is not true. 
POSIX.1-2024 says in Chapter 1 of Volume 2 (<a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap01.html">https://pubs.opengroup.org/onlinepubs/9699919799/function...</a>):<br> <p> <span class="QuotedText">&gt;This volume of POSIX.1-2024 is aligned with the following standards, except where stated otherwise:</span><br> <span class="QuotedText">&gt;ISO C (C17)</span><br> <span class="QuotedText">&gt; ISO/IEC 9899:2018, Programming Languages — C.</span><br> <span class="QuotedText">&gt;Parts of the ISO/IEC 9899:2018 standard (hereinafter referred to as the ISO C standard) are referenced to describe requirements also mandated by this volume of POSIX.1-2024.</span><br> <p> And earlier versions have similar language:<br> <p> <span class="QuotedText">&gt;Great care has been taken to ensure that this volume of POSIX.1-2017 is fully aligned with the following standards:</span><br> <span class="QuotedText">&gt;ISO C (1999)</span><br> <span class="QuotedText">&gt; ISO/IEC 9899:1999, Programming Languages - C, including ISO/IEC 9899:1999/Cor.1:2001(E), ISO/IEC 9899:1999/Cor.2:2004(E), and ISO/&gt;IEC 9899:1999/Cor.3.</span><br> <span class="QuotedText">&gt;Parts of the ISO/IEC 9899:1999 standard (hereinafter referred to as the ISO C standard) are referenced to describe requirements also mandated by this volume of POSIX.1-2017.</span><br> <p> Note the 'hereinafter referred to as the ISO C standard' bit.<br> </div> Thu, 31 Oct 2024 08:20:19 +0000 Ignoring undefined behaviour https://lwn.net/Articles/996415/ https://lwn.net/Articles/996415/ milesrout <div class="FormattedComment"> <span class="QuotedText">&gt;Making something UB is a very strong statement, though: it's saying that any existing code that does that thing is simply wrong (perhaps retroactively!), and whenever that existing code is recompiled, compilers are allowed (perhaps even encouraged) to emit binaries that only do what the developer presumably intended in situations where the UB cannot 
happen.</span><br> <p> This is NOT true. It is disinformation spread by malicious compiler developers.<br> <p> Code with behaviour that is not explicitly defined by the C standard is not necessarily portable to arbitrary implementations of C. There is no reason at all to think that its behaviour is not defined by something other than the C standard, such as:<br> <p> - the compiler documentation<br> - basic common sense<br> - obvious authorial intent<br> - platform documentation<br> - historical precedent<br> - behaviour implied by the standard<br> - behaviour so obvious to the standard's authors it didn't occur to them that it needed to be specified<br> - the fact that an implementation treating said code as erroneous would be useless and so no reasonable implementation will do so<br> - etc.<br> <p> It is of course possible under one reading of the standard for a malicious implementation of C++ to treat an empty infinite loop (eg. "for(;;);") as illegal and to ignore the loop, replacing it with a no-op. Such an implementation is a thought experiment. Its possible existence does not justify the claim that the standard "says" that such a loop is "simply wrong" (it does not and it is not), nor does the standard "encourage" compilers to miscompile code like this. Of course Mr Compiler can do whatever he pleases, and emit whatever machine code he likes. He can call himself a C++ compiler. But he is a useless one and no programmer that cares about correctness will use him.<br> <p> Similarly it is POSSIBLE to create a program that you claim is a C implementation that treats "realloc(p, 0)" as erroneous, without a diagnostic, and which miscompiles it. But such an implementation is just useless crap. That the GNU libc people are even considering this is very sad. GNU used to be about being better than the standard, where the standard was silent, GNU programs would be designed to do something sensible. 
GCC COULD miscompile code that used variable names more than 6 bytes long too (or whatever the stupid limit in the standard is). It doesn't, because what a stupid decision that would be. Half the point of GNU originally was to write decent implementations of standard utilities that used dynamic allocation to avoid those sorts of arbitrary limits, to go beyond the bare minimum "standard" and to build good useful software.<br> <p> Nowhere else in the world do we accept this kind of malicious compliance with the (very stretched) black letter of the rules while totally ignoring its objects, its context, and its plain and natural meaning. <br> <p> Nobody would create an intentionally useless and malicious alternative implementation of any programming language WITHOUT a standard (except as a joke). Why, as soon as SOME behaviour is written down, do people start to act like anything not written down is free to be implemented in as stupid a way as possible, *and should be*, and that any code that would run afoul of such a (mis)compiler is "simply wrong"?<br> <p> Again, nobody would write a C compiler with these sorts of miscompilation bugs (falsely claimed to be "optimisations" - optimisations cannot turn correct code into incorrect code so they cannot be called this) if the standard didnt exist. Why oh why would the existence of the standard make the set of things that can be usefully described as C implementations LARGER than it was before?<br> </div> Thu, 31 Oct 2024 06:05:20 +0000 Zero is a number just like any other number https://lwn.net/Articles/996398/ https://lwn.net/Articles/996398/ milesrout <div class="FormattedComment"> there is no rule that you have to use malloc when you write C. Plenty of people use "sane" allocators in C. <br> <p> this has nothing to do with the fact p = realloc(p,...) is erroneous. Of course it is wrong! It is obviously nonsensical rubbish code with any allocator. 
If you cant get this basic stuff right then i dont think a "sane" allocator would save you from writing hundreds of other serious bugs in your program<br> </div> Thu, 31 Oct 2024 05:33:51 +0000 Special allocator for zero-sized blocks https://lwn.net/Articles/996360/ https://lwn.net/Articles/996360/ dgm <div class="FormattedComment"> I'm not sure I understand you correctly, but if you mean that the cost of updating 25+ years' worth of code is not worth what is, in essence, an aesthetic change, then I have to concur.<br> <p> </div> Wed, 30 Oct 2024 19:52:44 +0000 Opportunity for GENSYM https://lwn.net/Articles/996341/ https://lwn.net/Articles/996341/ jreiser <div class="FormattedComment"> realloc(ptr, 0) is related to malloc(0). Experienced app implementors might realize that some uses of malloc(0) are equivalent to (GENSYM) in Lisp: create a new atom that is unique for the remaining duration of the process.<br> <p> In the C language, a tentative definition such as one of:<br> void *GENSYM(void) { return malloc(0); }<br> void *GENSYM(void) { return malloc(1); }<br> void *GENSYM(void) { return malloc(sizeof(void *)); }<br> allows taking advantage of simpler management than general malloc(). Allocate sequentially from larger block(s) at known address(es) (such as a page on 16-bit, aligned megabyte on 32-bit, aligned 4GiB on 64-bit), use a bitmap to track, etc. Interposing such a definition, intercepting malloc(0), can be useful in an existing system. 
Being able to distinguish (by address) a GENSYM from a general malloc block can have advantages in debugging, user interface, or even logical flow.<br> <p> In an environment that has inter-operable GENSYM and malloc, realloc(ptr, 0) can be easy to understand and implement: { free(ptr); return GENSYM(); }<br> <p> </div> Wed, 30 Oct 2024 17:20:08 +0000 leave established API alone https://lwn.net/Articles/996280/ https://lwn.net/Articles/996280/ vadim <div class="FormattedComment"> There are edge cases where outright dying instantly in the allocator may not be desirable. Eg, take a program that allocates 1 GB for something big like a VM or a database, and then the error handler allocates memory to report a useful error.<br> <p> You can expect that if you can't allocate 1GB, you probably still can get 1K for a string.<br> <p> There also may be tasks that can back off on allocation, like if the 1GB is for a cache, perhaps 512MB is also fine.<br> </div> Wed, 30 Oct 2024 11:52:12 +0000 Honestly kind of irrelevant https://lwn.net/Articles/996176/ https://lwn.net/Articles/996176/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; To me this is kind of irrelevant. If you want to write portable code then you have to cope with the fact that malloc(0) and realloc(ptr, 0) don't react the same on all platforms - and just avoid them altogether. To me any C code relying on these kind of specific behaviors (be that of glibc or any other platform) has a strong smell.</span><br> <p> I know this sort-of goes against the grain of being a standard, but why not say "here are two competing versions, you have to provide both (with a compiler switch), but we take no position on the default".<br> <p> Khim's point about ISO saying Posix can do what it likes, when Posix explicitly defers to ISO, does imho appear daft. 
And if you want ISO C to be portable, specifying the two dominant (Posix and MS) implementations and saying "support both" seems to be the right way to go ... whether others agree is another matter ...<br> <p> Cheers,<br> Wol<br> </div> Tue, 29 Oct 2024 10:10:26 +0000 Ignoring undefined behaviour https://lwn.net/Articles/996163/ https://lwn.net/Articles/996163/ farnz <p>I disagree deeply; the people I see arguing that compilers should only care about the ISO standard are a disjoint group from those who say that the standard should expect downstreams to define more behaviour than ISO does. It makes a huge difference, because it's a minority group, who just happen to have positions of power w.r.t. open source C compilers - note that many of the proprietary C compilers <em>don't</em> make the same arguments around "ISO says it's OK" - and your position is a lot like saying that because you hold an opinion, Google as your employer must agree, and that it's unhelpful to view your opinions as separate from Google's. <p>In practice, if enough of the people who care were to get involved with LLVM and GCC maintenance, and write and enforce documentation for what LLVM and GCC do when ISO says "UB", "IFNDR", "US", and similar terms for "ISO doesn't have a view here", it'd stop being an issue. But that would involve a bunch of people who aren't interested in compilers taking control of compiler projects, and this is an underlying weakness of open source - only people who are genuinely interested in something tend to take control of that thing. Tue, 29 Oct 2024 09:22:11 +0000 Honestly kind of irrelevant https://lwn.net/Articles/996158/ https://lwn.net/Articles/996158/ chris_se <div class="FormattedComment"> To me this is kind of irrelevant. If you want to write portable code then you have to cope with the fact that malloc(0) and realloc(ptr, 0) don't react the same on all platforms - and just avoid them altogether. 
To me any C code relying on these kind of specific behaviors (be that of glibc or any other platform) has a strong smell.<br> <p> Even if the code is used in non-portable programs that are designed for just a single operating system, I still don't like them. There are some implementation-defined behaviors (such as relying on two's complement or 8-bit byte sizes) where it can make a lot of sense to rely on those, because not assuming that would make the code a **lot** more complicated for no benefit at all in many cases. (And platforms that don't use e.g. two's complement are extremely rare nowadays.) But avoiding malloc(0) or realloc(ptr, 0) is just 1-2 more lines of code. And if it's handled explicitly it is 100% clear what the code intends to do. In the absence of explicit handling that's not the case, and then one doesn't immediately know whether the code actually relies on that behavior, or whether there's a bug in there and the author didn't even think about that corner case.<br> <p> I personally would leave the current behavior as-is, because I don't see any benefit of changing it, because in my opinion nobody should ever write new code that relies on these kinds of specifics. But I also don't really care if they do decide to change that behavior.<br> </div> Tue, 29 Oct 2024 08:38:00 +0000 Ignoring undefined behaviour https://lwn.net/Articles/996099/ https://lwn.net/Articles/996099/ NYKevin <div class="FormattedComment"> Compiler maintainers are the main (if not only) group of people who tell the standards committee what they do and do not want to see in the next iteration of the standard. 
They may technically be two different groups of people, but they move and act in concert, and I do not think that distinguishing between them adds much expository value to our model of why the standards committee does things.<br> </div> Mon, 28 Oct 2024 19:27:58 +0000 glibc changes like this are fine https://lwn.net/Articles/996002/ https://lwn.net/Articles/996002/ guillemj <div class="FormattedComment"> <span class="QuotedText">&gt; In case anyone is wondering whether it is up-to-date: I honestly have no idea. It is probably outdated, because Debian, but I do not know off the top of my head how to confirm that. The colophon says it's from Linux man-pages version 5.10 and has a URL pointing to <a href="https://www.kernel.org/doc/man-pages/">https://www.kernel.org/doc/man-pages/</a>, which in turn tells me nothing at all about the current version of that project.</span><br> <p> From the upstream link you provided you can either get to the latest version from git or from its online manuals, from the top links "bar":<br> <p> <a href="https://man7.org/linux/man-pages/man3/malloc.3.html">https://man7.org/linux/man-pages/man3/malloc.3.html</a><br> <p> Where there is a mention of the non-portable behavior, which I assumed would be there given that Alejandro used to maintain the manpages project until recently. 
The Debian version you mention is from _oldstable_, you can see where the various versions are provided for each release here for example:<br> <p> <a href="https://tracker.debian.org/pkg/manpages">https://tracker.debian.org/pkg/manpages</a><br> <p> The version in _stable_ seems to already have the note:<br> <p> <a href="https://manpages.debian.org/bookworm/manpages-dev/realloc.3.en.html">https://manpages.debian.org/bookworm/manpages-dev/realloc...</a><br> </div> Mon, 28 Oct 2024 10:58:54 +0000 Ignoring undefined behaviour https://lwn.net/Articles/995996/ https://lwn.net/Articles/995996/ farnz <p>I've never seen the first position from the C standards committee, only from compiler maintainers. The committee has consistently held the second position, IME, with an extension of "and a downstream standard such as POSIX can specify things that we don't; after all, POSIX already specifies that <tt>CHAR_BIT</tt> must be 8". Mon, 28 Oct 2024 08:43:55 +0000 Possible solution https://lwn.net/Articles/995983/ https://lwn.net/Articles/995983/ ianmcc <div class="FormattedComment"> In C++, p &lt; q is undefined behavior for unrelated pointers p,q. But std::less&lt;T*&gt;(p,q) is well-defined, and must give a strict total ordering. Gcc had a bug on this, since fixed: <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78420">https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78420</a><br> </div> Mon, 28 Oct 2024 02:45:22 +0000 glibc changes like this are fine https://lwn.net/Articles/995968/ https://lwn.net/Articles/995968/ NYKevin <div class="FormattedComment"> Oh, and I forgot:<br> <p> 3. Hyrum's law. 
Most programmers do not read the C specification, they read (at most) the man page, which (on my WSL installation of Debian), says that realloc(ptr, 0) is equivalent to malloc(0) if ptr == NULL and free(ptr) otherwise (the return value section does mention that realloc(ptr, 0) may return a non-NULL pointer, but it does not distinguish between the ptr == NULL case and the ptr != NULL case, so the most straightforward interpretation is that it is talking about the former). There is no mention of portability anywhere in the document that I could find (the document claims that realloc is "conforming to" various versions of both the POSIX and C standards, without any note of this extension). Some programmers will not even go that far, they just write some code and test if it works as expected. Then the code silently picks up a glibc dependency that nobody explicitly knows about. This is not hypothetical - there is at least one comment in this thread from a developer who actually did take such a dependency and had to patch their code for musl compatibility.<br> <p> In case anyone is wondering whether it is up-to-date: I honestly have no idea. It is probably outdated, because Debian, but I do not know off the top of my head how to confirm that. The colophon says it's from Linux man-pages version 5.10 and has a URL pointing to <a href="https://www.kernel.org/doc/man-pages/">https://www.kernel.org/doc/man-pages/</a>, which in turn tells me nothing at all about the current version of that project.<br> </div> Sun, 27 Oct 2024 22:05:11 +0000 Ignoring undefined behaviour https://lwn.net/Articles/995963/ https://lwn.net/Articles/995963/ NYKevin <div class="FormattedComment"> I have to say that I have gotten very exhausted with the C standards committee's motte-and-bailey fallacy surrounding UB. They really need to pick one of these positions and hold it consistently:<br> <p> * Programs that perform UB are wrong. 
Compilers can and should optimize under the assumption that UB never occurs.<br> * Programs that perform UB are not wrong, UB just means that the standards committee does not know or care exactly what will happen in those cases due to portability considerations. Non-normatively, an implementation which makes nasal-demon optimizations is a poor quality implementation despite being technically standards-conforming.<br> <p> Right now, the committee wants to have its cake and eat it too, but the resulting constellation of UB does not make coherent sense as a whole.<br> </div> Sun, 27 Oct 2024 18:10:02 +0000 glibc changes like this are fine https://lwn.net/Articles/995945/ https://lwn.net/Articles/995945/ NYKevin <div class="FormattedComment"> It's not that easy.<br> <p> 1. Stable systems eventually do need to upgrade to the next version of everything. This is already uncomfortably close to being a major flag day for those distros. Changing the behavior of something as basic as realloc would push it further in that direction.<br> 2. There are many intermediates between an unstable distro where everything is breaking all the time and (e.g.) RHEL. 
Debian Testing, for example, is well known to be a reasonably well-behaved release channel in practice, and even Sid is reportedly not that bad (personally, I would never use either of those for serious purposes, but different organizations and use cases have different needs).<br> </div> Sun, 27 Oct 2024 07:08:43 +0000 glibc changes like this are fine https://lwn.net/Articles/995938/ https://lwn.net/Articles/995938/ marcH <div class="FormattedComment"> <span class="QuotedText">&gt; This is, it seems, a topic about which some people, at least, have strong feelings.</span><br> <p> 50+ comments here already!<br> <p> <span class="QuotedText">&gt; there are almost certainly programs that rely on the current behavior</span><br> <p> Apparently just non-portable programs that have a hard dependency on glibc, so such breakage would be fine because users:<br> 1. either use a stable / LTS GNU/Linux distribution that is not going to perform a major glibc upgrade (this change should obviously not be part of a minor release, only in a major one)<br> 2. or they use a rolling / fast-paced / development one where everything keeps breaking anyway for a gazillion of other reasons.<br> <p> To be fairer: _how_ it would break in case 2. matters. Discussed above already.<br> <p> </div> Sun, 27 Oct 2024 02:12:21 +0000 Call abort()? https://lwn.net/Articles/995897/ https://lwn.net/Articles/995897/ DemiMarie Could <code>realloc(p, 0)</code> just call <code>abort()</code>? I guess if UBSAN is on, but otherwise it would be too backwards-incompatible… Sat, 26 Oct 2024 05:53:54 +0000 leave established API alone https://lwn.net/Articles/995890/ https://lwn.net/Articles/995890/ abartlet <div class="FormattedComment"> This is where I sit, to change such a long-defined behaviour is fraught with risk. 
<br> </div> Sat, 26 Oct 2024 04:10:33 +0000 leave established API alone https://lwn.net/Articles/995850/ https://lwn.net/Articles/995850/ RogerOdle <div class="FormattedComment"> Please do not fix existing problems by changing a long-established API. This only breaks existing code. Treat it like the kernel does: do not break userspace. Instead, define a new sane API and provide it as an alternative.<br> <p> Personally, I like two varieties:<br> <p> 1) never return a bad value, AKA do it or die, AKA die early because I can't do anything anyway. OK to scream OUCH!!! if out of memory. Really, if you are out of memory then you are screwed anyway. If you thought you were allocating something other than 0 bytes, it is best to know as soon as possible.<br> <p> 2) return NULL and set errno=EINVAL on size==0. User's problem, not the system's. I can keep going or do a controlled shutdown as the case may be.<br> <p> </div> Fri, 25 Oct 2024 20:49:49 +0000 Ignoring undefined behaviour https://lwn.net/Articles/995806/ https://lwn.net/Articles/995806/ khim <font class="QuotedText">&gt; I suspect that is the real reason they did decide that it is undefined behaviour.</font> <p>There's a rationale and it quite explicitly says what they <b>wanted to achieve</b>: <a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf">Classifying a call to realloc with a size of 0 as undefined behavior would allow POSIX to define the otherwise undefined behavior however they please.</a></p> <p>Their intentions were admirable (at least we know for a fact that no one did anything out of sheer malice), but what they have <b>actually achieved</b> is sheer lunacy. 
POSIX, for more than two decades, <a href="https://pubs.opengroup.org/onlinepubs/009696899/functions/realloc.html">has declared realloc semantics like this</a>: <blockquote>blah blah blah <b>This volume of IEEE Std 1003.1-2001 defers to the ISO C standard.</b> blah blah blah (<i>emphasis in the original</i>)</blockquote> <p>And since the actual version of the ISO C standard is not mentioned anywhere (presumably to ensure that possible bugfixes are automatically picked up)… that means that by ratifying that change they have, effectively, added undefined behavior to most existing versions of the POSIX specification.</p> Fri, 25 Oct 2024 18:45:54 +0000 Zero is a number just like any other number https://lwn.net/Articles/995805/ https://lwn.net/Articles/995805/ mb <div class="FormattedComment"> <span class="QuotedText">&gt;but if you want not to allocate memory, but to free it and that fails</span><br> <p> I commented on something else.<br> <p> I was just saying that ptr = realloc(ptr, ...) was a bad pattern, because it's wrong for all cases *except* the free/zero case (maybe; implementation defined; if not UB).<br> </div> Fri, 25 Oct 2024 18:30:56 +0000 Zero is a number just like any other number https://lwn.net/Articles/995803/ https://lwn.net/Articles/995803/ khim <font class="QuotedText">&gt; If you overwrite your only pointer with the NULL return, you have leaked the original allocation.</font> <p>Sure, but if you want not to allocate memory, but to free it and that fails… what are the mitigations? 
At this point I would assume that the allocator would just stop the program, because that's probably the best response it could give.</p> <p>And in some alternate reality where this behavior is mandated… and <code>realloc</code> was required to behave like that… and all C courses would have taught you to use that ability… yes, in such a world, use of <code>realloc</code> would have been justified.</p> <p>I would argue that <b>even in that world</b> it would have been a bad design, but familiarity with the pattern would have made it justifiable.</p> <p>But inventing and pushing a <b>new</b> convention like that in our world? That's just… I don't even know words strong enough to describe what I think about that idea.</p> Fri, 25 Oct 2024 18:27:24 +0000 Remember that realloc can fail https://lwn.net/Articles/995791/ https://lwn.net/Articles/995791/ NYKevin <div class="FormattedComment"> Of course, the counterpoint to my argument is that the developer can just do this:<br> <p> #define realloc_s(ptr, size) ((size) == 0 ? (free(ptr), (void *)NULL) : realloc((ptr), (size)))<br> <p> Now realloc_s is guaranteed to have the "correct" behavior for all pointers, regardless of what the implementation decides to do, and so one could argue that realloc() may as well do something else instead. But IMHO that way lies madness, because you could special-case any random stdlib function in this way, so where do we draw the line?<br> </div> Fri, 25 Oct 2024 17:22:46 +0000 Remember that realloc can fail https://lwn.net/Articles/995790/ https://lwn.net/Articles/995790/ NYKevin <div class="FormattedComment"> This is one of the reasons that realloc(..., 0) was specified as UB: It is not practical to write a portable program that correctly uses it, unless you're content to leak memory like a sieve on some platforms.<br> <p> IMHO if you're going to insist on allowing realloc(..., 0) at all, it should be an infallible operation. That leaves us with two general categories of behavior that make sense:<br> <p> 1. 
Return a pointer that can be passed to free() or realloc(), and may or may not be unique as the implementation sees fit, but probably can't be unique in the general case.<br> 2. Return NULL and free the pointer.<br> <p> Note that (2) is simply a special case of (1) where the pointer happens to be NULL - free(NULL) is specified as a no-op, and realloc(NULL, ...) is specified as equivalent to malloc(), so NULL is entirely equivalent to a zero-sized allocation of arbitrary type. As such, I see no point in bothering with (1) in the first place.<br> <p> If the developer wants a source of unique numbers, they can do that by atomically incrementing a global u64. There is no logical reason to use malloc() or realloc() for such a purpose, since those functions take actual locks and are much more expensive. If the developer wants a lot of numbers and is concerned that a u64 will wrap, then they can go to the bother of using UUIDs (seeing as 2^64 is unbelievably vast, I very much doubt the average program really needs to do this, but it's obviously the correct approach in such cases).<br> </div> Fri, 25 Oct 2024 17:13:17 +0000 Ignoring undefined behaviour https://lwn.net/Articles/995775/ https://lwn.net/Articles/995775/ excors <div class="FormattedComment"> <span class="QuotedText">&gt; I suspect that is the real reason they did decide that it is undefined behaviour.</span><br> <p> I think that's more than just a suspicion: the C23 change proposal at <a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf">https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf</a> says:<br> <p> <span class="QuotedText">&gt; Classifying a call to realloc with a size of 0 as undefined behavior would allow POSIX to define the otherwise undefined behavior however they please.</span><br> <p> so it was explicitly intended for POSIX to continue to require specific behaviour, and code that only needs portability across POSIX systems (not any arbitrary ISO C conforming implementation) 
could continue to rely on that. It wasn't intended to break compatibility with existing code, it was just an editorial change to clean up their previous attempt to specify it as implementation-defined behaviour which turned out to be ambiguous and internally inconsistent.<br> </div> Fri, 25 Oct 2024 14:25:53 +0000
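
Two points raised in the comments above can be sketched in C: never passing a size of 0 to realloc() (the idea behind the realloc_s macro), and never overwriting the only pointer to an allocation with realloc()'s return value before checking it. This is a minimal sketch, not code from any commenter or library; the names resize_or_free and grow_buffer are hypothetical and chosen only for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: treats size 0 as "free and return NULL",
 * so realloc(ptr, 0) is never actually called. */
static void *resize_or_free(void *ptr, size_t size)
{
    if (size == 0) {
        free(ptr);              /* free(NULL) is a no-op */
        return NULL;
    }
    return realloc(ptr, size);  /* may return NULL; ptr then stays valid */
}

/* Hypothetical helper: resizes *buf to new_size bytes.
 * Returns 0 on success and -1 on allocation failure.
 * On failure *buf is left untouched, so the old block is not leaked. */
static int grow_buffer(char **buf, size_t new_size)
{
    char *tmp = resize_or_free(*buf, new_size);
    if (tmp == NULL && new_size != 0)
        return -1;              /* allocation failed; *buf still owns the old block */
    *buf = tmp;                 /* NULL here only when new_size == 0 (freed) */
    return 0;
}
```

A caller would write `if (grow_buffer(&buf, n) != 0) { /* handle failure, buf still valid */ }` rather than `buf = realloc(buf, n);`. The size is checked before the call, so the sketch does not depend on what a given implementation does for a zero-size request.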