LWN: Comments on "Quotes of the week" https://lwn.net/Articles/856822/ This is a special feed containing comments posted to the individual LWN article titled "Quotes of the week". en-us Thu, 11 Sep 2025 02:42:21 +0000 Thu, 11 Sep 2025 02:42:21 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Quotes of the week https://lwn.net/Articles/858485/ https://lwn.net/Articles/858485/ excors <div class="FormattedComment"> <font class="QuotedText">&gt; Anyway, it works in both compilers when the executable is not position-independent, which is fairly common in the embedded systems where this sort of trick is most commonly used. When you are referring to objects at fixed memory locations you usually don&#x27;t want your code to be moved around in memory arbitrarily.</font><br> <p> One exception is when you&#x27;re doing dual-slot firmware upgrades, which seems fairly common in embedded devices (e.g. <a href="https://mcuboot.com/design.html">https://mcuboot.com/design.html</a> in direct-xip mode). Boot the application from partition 0, download an updated application image into partition 1, boot from partition 1; and if the new application fails then the bootloader can revert to partition 0, else continue running from partition 1 until the next update (which gets installed into partition 0).<br> <p> In that case it&#x27;s pretty helpful to have position-independent code that can run the same from either partition. (Otherwise you have to compile the application twice at different base addresses, and figure out which one to download based on which slot is currently in use, and if you&#x27;re &#x27;downloading&#x27; directly from a second chip (instead of from the internet) then it doubles the flash usage on the second chip, etc, so that&#x27;s a pain). But only references to the .text and .rodata sections should be PC-relative - any references to flash outside of the application image (e.g. 
if you have a filesystem that&#x27;s persistent across firmware upgrades), and any references to RAM or to hardware registers, need to remain PC-independent.<br> <p> It appears that&#x27;s impossible in GCC (<a href="https://answers.launchpad.net/gcc-arm-embedded/+question/585437">https://answers.launchpad.net/gcc-arm-embedded/+question/...</a>), though apparently it&#x27;s supported by -fropi in Clang on ARM (only).<br> <p> It&#x27;s a shame that C (the language and the compilers) has such inadequate and inconsistent support for relatively basic systems programming stuff like this.<br> </div> Sun, 06 Jun 2021 08:42:39 +0000 Quotes of the week https://lwn.net/Articles/858477/ https://lwn.net/Articles/858477/ nybble41 <div class="FormattedComment"> <font class="QuotedText">&gt; Your example works because you don&#x27;t compile the executable PIE.</font><br> <p> I stand corrected. I switched from GCC to Clang to use the -mpie-copy-relocations option, which seemed like it was intended to address this issue, but didn&#x27;t realize that GCC and Clang have different defaults for the PIE settings and attributed the different result to the option rather than the compiler. Anyway, it works in both compilers when the executable is not position-independent, which is fairly common in the embedded systems where this sort of trick is most commonly used. When you are referring to objects at fixed memory locations you usually don&#x27;t want your code to be moved around in memory arbitrarily.<br> <p> One thing that did work was linking the entire program as a shared library (including main()) with a trivial front-end that just links the shared library, the CRT entry point, and the script defining the absolute symbols. The symbol is undefined in the library, which I suppose forces the access to occur via the GOT and prevents &quot;optimization&quot; by the linker. 
ASLR was enabled for both parts (confirmed in gdb with &quot;set disable-randomization off&quot;) and it still printed the correct address for &amp;SPECIAL_OBJECT. However, this requires the program to be split into two separate files.<br> <p> <font class="QuotedText">&gt; …but then the issue is that the linker will optimise the SPECIAL_OBJECT reference from a GOT load into &quot;lea SPECIAL_OBJECT(%rip),...&quot;…</font><br> <p> Frankly this seems like a bug to me. The linker knows that the symbol has an absolute address since it&#x27;s in the special SHN_ABS section, which is defined to be non-relocatable, and ought to refrain from rewriting a valid GOT load into something that can never give the correct address. There should at least be a setting to disable this &quot;optimization&quot;, but I wasn&#x27;t able to find any mention of it in the documentation.<br> </div> Sun, 06 Jun 2021 03:06:48 +0000 Quotes of the week https://lwn.net/Articles/858366/ https://lwn.net/Articles/858366/ Vipketsh <div class="FormattedComment"> I don&#x27;t think you can do that with PIE. Your example works because you don&#x27;t compile the executable PIE. Try this:<br> <p> $ clang -mpie-copy-relocations -o ldtest ldtest.c ldtest.ld -fpie -pie<br> $ ./ldtest<br> &amp;SPECIAL_OBJECT = 0x55cb19531000<br> <p> As you say, the compiler assumes that SPECIAL_OBJECT will be a normal data variable that can be referenced PC-relative.<br> With the -fpic option you can get the compiler to do something semi-sane that would work, namely load the symbol value from the GOT, but then the issue is that the linker will optimise the SPECIAL_OBJECT reference from a GOT load into &quot;lea SPECIAL_OBJECT(%rip),...&quot; since SPECIAL_OBJECT is guaranteed to bind locally.
Few, if any, architectures other than x86 do this sort of optimisation in the linker.<br> <p> Since all this is due to the deep dark voodoo of linking, GCC behaves exactly the same.<br> </div> Fri, 04 Jun 2021 17:32:01 +0000 Quotes of the week https://lwn.net/Articles/858334/ https://lwn.net/Articles/858334/ nybble41 <div class="FormattedComment"> <font class="QuotedText">&gt; [excors] I think (intptr_t)(T*)0 is implementation-defined, you won&#x27;t necessarily get 0 back.</font><br> <p> This is correct. (_Bool)(T*)0 is guaranteed to evaluate to 0 due to special rules for _Bool, but all other integer conversions are implementation-defined.<br> <p> <font class="QuotedText">&gt; [excors] But (T*)(intptr_t)(T*)0 will give you a null pointer.</font><br> <p> Yes, because (T*)0 is a null pointer and conversion to (only) intptr_t or uintptr_t and back without modification is defined to give back the original pointer. <br> <p> <font class="QuotedText">&gt;&gt; [mpr22] Does the standard require that a pointer set to NULL contains the same bit pattern as an intptr_t set to 0, or just that if you compare a pointer set to NULL to an intptr_t set to zero, they are equal?</font><br> <p> <font class="QuotedText">&gt; [excors] You can&#x27;t directly compare a pointer and an intptr_t, you have to convert them both to pointers or both to arithmetic types, in which case it depends on those conversion rules.</font><br> <p> The result would be implementation-defined since &quot;an intptr_t set to zero&quot; is not an &quot;integer constant expression with the value 0&quot; and thus does not qualify for the special treatment of (T*)0 being defined as a null pointer.
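A minimal C sketch of the conversion rules described above, assuming a hosted C99 implementation that provides the optional intptr_t type; check_null_pointer_rules is a hypothetical helper name. It uses void * because the standard states the intptr_t round-trip guarantee for pointers to void. The asserts encode the guaranteed cases, and the return value reports the implementation-defined one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: asserts the guaranteed null-pointer rules and
   returns whether a runtime zero also converts to a null pointer here. */
bool check_null_pointer_rules(void)
{
    void *p = NULL;

    /* (intptr_t)0 is still an integer constant expression with value 0,
       so (void *)(intptr_t)0 is a null pointer constant. */
    assert(p == (void *)(intptr_t)0);

    /* void * -> intptr_t -> void * must compare equal to the original,
       even though the intermediate integer value is implementation-defined. */
    intptr_t bits = (intptr_t)p;
    assert((void *)bits == p);

    /* A zero held in a variable is not a null pointer constant, so this
       conversion is implementation-defined; mainstream compilers yield a
       null pointer, but the standard does not promise it. */
    volatile intptr_t n = 0;
    return (void *)n == NULL;
}
```

On common x86-64 GCC and Clang builds the function returns true, but only the two asserted comparisons are guaranteed by the standard.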
In other words, &quot;T *p = NULL; p == (T*)(intptr_t)0&quot; should evaluate to 1, but &quot;T *p = NULL; intptr_t n = 0; p == (T*)n&quot; is implementation-defined since &quot;n&quot;, as a variable rather than a constant expression, does not necessarily convert to a null pointer even if its integer value is 0 at runtime.<br> <p> <font class="QuotedText">&gt; …everyone just uses e.g. &quot;(void *)0x12340000&quot; to create a pointer to memory at 0x12340000…</font><br> <p> Another option, relying on the ABI rather than integer-to-pointer conversions, is to declare the memory as an external object and define the address in a linker script. A useful trick here is that you can pass plain text files with an unrecognized extension to GCC or Clang (or ld/lld) and they will be treated as supplementary linker scripts, which allows for an easy way to define extra symbols:<br> <p> $ cat &gt; ldtest.ld<br> SPECIAL_OBJECT = 0x12340000;<br> <p> $ cat &gt; ldtest.c<br> #include &lt;stdint.h&gt;<br> #include &lt;stdio.h&gt;<br> extern volatile uint32_t SPECIAL_OBJECT;<br> int main(void) { printf(&quot;&amp;SPECIAL_OBJECT = %p\n&quot;, (void*)&amp;SPECIAL_OBJECT); return 0; }<br> <p> $ clang -mpie-copy-relocations -o ldtest ldtest.c ldtest.ld<br> <p> $ ./ldtest<br> &amp;SPECIAL_OBJECT = 0x12340000<br> <p> Assuming the address is valid, SPECIAL_OBJECT can be treated as an ordinary variable and used in expressions, assignments, etc.<br> <p> The Clang-specific -mpie-copy-relocations option is needed when compiling position-independent executables (for ASLR) to make the compiler generate correct code for accessing an absolute, non-relocatable external symbol; otherwise an offset would be added to the address of SPECIAL_OBJECT at runtime under the assumption that it is relative to the address of the executable code.
If using GCC you can disable PIE mode with &quot;gcc -fno-pie -no-pie&quot; to get the correct result, but this will also disable ASLR.<br> </div> Fri, 04 Jun 2021 16:27:51 +0000 Quotes of the week https://lwn.net/Articles/858276/ https://lwn.net/Articles/858276/ khim <font class="QuotedText">&gt; (Alternatively: Does the standard require that a pointer set to NULL contains the same bit pattern as an intptr_t set to 0, or just that if you compare a pointer set to NULL to an intptr_t set to zero, they are equal?)</font> <p>But neither of your examples looked at the bit pattern! And yes, the standard guarantees that both would print 0.</p> <p>What it <b>doesn't</b> guarantee is that the following would print 0: <pre> void * myptr = NULL; intptr_t myintptr; memcpy(&amp;myintptr, &amp;myptr, sizeof(myptr)); printf("%ld", (long) myintptr); </pre> <p>This makes it possible to implement NULL as a “shifted pointer” (one that holds not the address itself but, e.g., the address with one bit flipped). But this is rarely a good idea, since accessing memory then becomes problematic, especially on underpowered microcontrollers.</p> Fri, 04 Jun 2021 08:58:40 +0000 Quotes of the week https://lwn.net/Articles/858214/ https://lwn.net/Articles/858214/ excors <div class="FormattedComment"> I believe your first example is an aliasing violation (therefore undefined behaviour). If you fixed it by doing &quot;memcpy(&amp;myintptr, &amp;myptr, sizeof(void*));&quot;, then the byte representation of pointers is unspecified so you could get anything.<br> <p> What C99 specifically requires is that (T*)0 is a null pointer, which is equal to every other null pointer (of any type), and unequal to a pointer to any object or function. I think (intptr_t)(T*)0 is implementation-defined, you won&#x27;t necessarily get 0 back. But (T*)(intptr_t)(T*)0 will give you a null pointer.<br> <p> (I think I was wrong earlier when I said they &quot;convert to 0&quot;.
They only convert *from* 0.)<br> <p> You can&#x27;t directly compare a pointer and an intptr_t, you have to convert them both to pointers or both to arithmetic types, in which case it depends on those conversion rules.<br> <p> So technically you could have a compiler which has different representations for null pointers vs pointers to memory at 0x00000000. And there&#x27;s no standard way to construct a pointer to memory at a fixed location; that&#x27;s always an implementation-defined thing, so the implementation could provide a way that&#x27;s different from how you construct a null pointer.<br> <p> But I&#x27;m not aware of any mainstream compilers/architectures that do that - everyone just uses e.g. &quot;(void *)0x12340000&quot; to create a pointer to memory at 0x12340000, and if you do the same for 0x0 then you&#x27;ll get a null pointer.<br> </div> Thu, 03 Jun 2021 19:03:34 +0000 Quotes of the week https://lwn.net/Articles/858189/ https://lwn.net/Articles/858189/ khim <font class="QuotedText">&gt; Would you expect <pre> int *p = malloc(sizeof(int)); free(p); int *q = malloc(sizeof(int)); if (p == q) { *p = 1; *q = 2; printf("%d %d\n", *p, *q); } </pre> to have well-defined behaviour?</font> <p>Good eye! You have caught a very important mistake: I used a pointer which I wasn't supposed to use. 
The correct way to write that code would be this: <pre> int *p = malloc(sizeof(int)); free(p); int *q = malloc(sizeof(int)); if (memcmp(&amp;p, &amp;q, sizeof(p)) == 0) { *p = 1; *q = 2; printf("%d %d\n", *p, *q); } </pre> <p>Unfortunately that doesn't change anything: you still get the same “1 2” answer <a href="https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAM1QDsCBlZAQwBtMQBGAFlJvoCqAZ0wAFAB4gA5AAYppAFZdSrZrVAB9LclIj2yAnjqVMtdAGFUrAK4BbWstPoAMnlqYAcnYBGmYiABOUgAHVCFCI1pLG3tlUPDDOld3L1tffyC9TANIhgJmYgJouwdOXUx9RNo8goJkzx8/EAAOXXzC4tiyoXa6twa0puaASl1Ua2JkDikAUgAmAGY3ZBssAGoZhfMe/FQAOgRN7BmZAEF5pdoV63XN7YJ0VjxvA6OT88Xl1cwNrZ7iNzAV4LY5nd5uAhrWzMNwQYYbADsACF3ms0WsIWsAFTBX4AETWEAhWOG0NYrFQyAg4QAXphUFQifRhsNNiizuiMfRsQBHfGE4nDYiYNgUqnBUhrWn0xkQlls1HovBUQm2TC2ZC2YIQeYANglGzmup5kulDIgwRZ%2BM2BJk8JmyMVnLROP5nAVHOdLr5NrWcw9py9aOCAPojPm/rmAFZ0IaYzMo%2BZaBHJTjUzzWQt2YH0Q68e88%2B8pKNWNIo/IHLJ5KhpOYpeNJj8Lpx5ARpHIWaQANZcGQyFTSbjyWwgKP9ytyUg1qTyIQgfttqujOCwGCIFCoLV4dhkCgQNBbncoFZqYCcOZ9vjbgh%2BOcQbzt%2BTeNwFACe0hbpAPavoAHlaKw75VqQWDQuo7CPiBeDCjkABumBzsBmDiNk1g3pBEIVB%2B8hPN4xBvpYWDYaQBAAiOUgtqM/CMCwEE8HwdAEMIYiSMBShlKo6ggFoGg6Lhc6QKMqDBFUiEALS/nMaxiTsNrIM0ATScEmDoGohjILOFTZFUJhmJ0pSkE49SpOkcRhBEdD6WZCSRMZjT%2BN0Wk5HQNQdFYJTKFkznVL0dmDA5bS1FZ3S%2Bf0JlNJwoxCA2UxcCWZYVpB07iM0upibq3BrCe6hrOeewyPlhK4IQJCGgsZRrJYh5%2BGVkWVa2j6dggIpYP4cLdiAF4DlIQ6kCOY6kBO1bSLO84kY1pAruu4wEMEaHkJQB7BNuEWGfgRABdRTBsBw9HUcxEiQQA7vhwTYfFUjloNSXSM2axHYQCBrClaUZVlnHALlcz5flDVLqMzXMK1lCjD2XWlj1iXAdOo0LhNU0QEgS0rbui2bstR7IHgyDIBecycOeV6sDexB3g%2BwHPrQb7Ed%2BpgEP%2BgGQaBp4QcB%2BAwYY8GIZOyGoehwGYRDn64fhxCvoR0yfqReDkZRDH0NtdG8PtIiHWxnUqKe3HaCozwCe1wmidIElSTJDxyQpSkqWp2OaZUkS6RY7ldIZZh%2BaZZTxBZUTOwZXtVO7q1eVUrlFL7nlOSHoUpPZnm9MFgWFIHDlRTFu0XVdQ1TtIL3pZl2PIH6Mj43snB7FJEDFRttWSlVGM1RcczwuYf0dqDnX9hDvVZzDuhjYuHaTWuiMgDNc0EAt%2B7oyjGsqSVm2MYru3K4xB2sZOJ3MGdFHFt1mc3bMixSQ9BBPbnb0F0XJdl3MrfFu34ODlDk693OcNLkP8Aj8jO6Tz/TQ41YJwAIGgFhzCJiTMmkFKbUx3l%2BTcP56YASApOZm4FpioOgtpPAXNIK82QGhDB8hBbERFgRDAktWxkXOvLGi
O0uAr0EKrde8glALE1lxHifE9bwCEiJSIc4AD0klyj22MBAJwCcjJhVjp7cyVQE7%2B1sjI/yEcxE%2BSCuHRy6jQ7JzjpomIBkei1D0ZFMYExYpmIhvvaGOdUpQiELBQusFgGlxkISAA6gASQ8NgeEld1qlWbLXaeO4yoLGbnfTsYNO5P2urYmcfd35t26rfPqo5xwHyiRdBYz9hqJIHvfUg8FSYO24EAA%3D%3D%3D">from the usual culprits</a>.</p> <font class="QuotedText">&gt; ...except if the second malloc returns the same "space" as the first malloc, is 'p' still indeterminate now that the space it was referring to is no longer free? Maybe the 'p == q' is okay (though the '*p' is still undefined because it lost its ability to access the object after the space was first freed). </font> <p>Actually it's the other way around: <code>p == q</code> is not Ok (but you can use <code>memcmp</code> instead), while <code>*p</code> is fine (if you used <code>memcmp</code>). It wouldn't be fine only if you introduce “pointers provenance”. Which is not part of any existing C and/or C++ standard and most definitely not part of C89 (as I have noted elsewhere C++ have <b>optional</b> feature similar to “pointers provenance”, but only in “strict pointer safety” mode which none of the existing compilers support). Otherwise the only way for two different pointers to be equal is to have them point to the same object or, as special corner-case, to have one of them point to one-past-the-end-of-an-array (and there are no arrays in this example). <p>C committee was actually asked about more-or-less that issue (what should happen when one reads value of a pointer passed to <code>free</code>) and the result: <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm">after much discussion, the UK C Panel came to a number of conclusions as to what it would be desirable for the Standard to mean</a>. Note: they <b>haven't said</b> that standard makes it invalid <b>now</b>, no. They said that the fact that this program is well-defined is a problem and <b>standard needs to be changed</b>. 
That change would make currently valid programs invalid.</p> <font class="QuotedText">&gt; I don't think it's fair to blame C99 for changing the semantics here - C89 seems very unclear and ambiguous, and C99 was the first time it was actually specified semi-properly (by having a more precise definition of the lifetime of an object with the term "deallocated" (instead of "freed or reallocated"), and saying realloc() always "deallocates" the old object).</font> <p>I wouldn't call that “semi-properly”. The history here is the following: when C89 was developed there was <b>already</b> a strong push from compiler writers to <s>screw the programmer</s> enable optimizations. A nice (for compiler writers) scheme was added to the standard draft. Unfortunately it turned out to be almost impossible to use it for writing real programs (as <a href="https://groups.google.com/g/comp.lang.c/c/K0Cz2s9il3E/m/YDyo_xaRG5kJ">Dennis Ritchie noticed</a>), and a language which you can't actually write programs in is not very useful. Thus the C committee ripped out the most egregious parts and only left some small remnants of that attempt in C89.</p> <p>In C99, C++11 and all subsequent standards these attempts were repeated. It's unclear how many useful optimizations these attempts enabled, but they made it almost impossible to write correct programs in C/C++.
Sooner or later you end up with some kind of security check which the compiler would happily remove to “optimize” your code.</p> <p>This quest is not yet finished: we <b>still</b> have no adequate set of rules there (<a href="https://www.ralfj.de/blog/2020/12/14/provenance.html">apparently transformations which are derived from these rules and used in LLVM can easily turn a correct program into an incorrect one, and that usually doesn't happen only because these transformations are applied in a certain order</a>… that would have been really hilarious if it weren't so sad), yet programmers were supposed to write programs, back in 1990, which adhere to rules which would be finalized maybe (there is hope, but no promises right now) around 2030. Does <b>that</b> sound realistic to you?</p> Thu, 03 Jun 2021 18:56:53 +0000 Quotes of the week https://lwn.net/Articles/858200/ https://lwn.net/Articles/858200/ mpr22 <P>Does the standard require that </P> <p><tt>void * myptr = NULL;<br/> intptr_t myintptr = *(intptr_t *) &amp;myptr;<br/> printf("%ld", (long) myintptr);</tt></p> <p>prints "0", or just that</p> <tt>void *myptr = NULL;<br/> if(myptr == 0) { printf("myptr is equal to zero.\n"); }</tt> <p>prints "myptr is equal to zero"?</p> <p>(Alternatively: Does the standard require that a pointer set to NULL contains the same bit pattern as an intptr_t set to 0, or just that if you compare a pointer set to NULL to an intptr_t set to zero, they are equal?)</p> Thu, 03 Jun 2021 17:34:30 +0000 Quotes of the week https://lwn.net/Articles/858190/ https://lwn.net/Articles/858190/ excors <div class="FormattedComment"> <font class="QuotedText">&gt; My favorite example: dereferencing a NULL pointer is undefined behaviour.
It can not be described in any other way since there exist many microcontrollers without an MMU where address 0 is mapped to some memory which may contain a data structure and the behaviour of overwriting some unknown state can not be defined.</font><br> <p> Somewhat related to that, I once found a vulnerability in some microcontroller bootloader code. (Well, I found several, but only one is relevant). It tried to compute the hash of the application image that was stored in flash, then verify the signature of that hash, before booting the application. The hashing function returned a pointer to a buffer containing the computed hash - but on error (e.g. invalid fields in a header structure) it returned NULL. The rest of the code didn&#x27;t bother checking for NULL, so it happily verified the signature over the &#x27;hash&#x27; stored at location 0x00000000 (which was simply the start of flash), then booted the application. An attacker could extract the hash and signature from a valid signed application, write the hash to 0x00000000, copy the signature, and then write an arbitrary unsigned application image with an invalid header (to trigger the NULL return) and the bootloader would think it&#x27;s valid and boot it.<br> <p> Null pointers are particularly dangerous on systems like that.<br> <p> On the other hand, sometimes those microcontrollers *want* to access the data stored at location 0x0. I&#x27;m not sure that&#x27;s technically allowed by C - if I&#x27;m reading it right, null pointers are required to be distinct from any pointer to an actual object, but also they&#x27;re required to convert to 0, which implies you can&#x27;t have an object in memory at 0x0. 
That&#x27;s not ideal when your hardware is storing useful information there but you can&#x27;t trust your C compiler to let you read it.<br> </div> Thu, 03 Jun 2021 17:09:39 +0000 Quotes of the week https://lwn.net/Articles/858186/ https://lwn.net/Articles/858186/ excors <p>Would you expect</p> <pre>int *p = malloc(sizeof(int)); free(p); int *q = malloc(sizeof(int)); if (p == q) { *p = 1; *q = 2; printf("%d %d\n", *p, *q); }</pre> <p>to have well-defined behaviour?</p> <p>I'd expect no - 'p' and 'q' are intuitively pointers to different objects, regardless of whether they happen to be numerically equal, and one of those objects is being accessed after it was freed.</p> <p>C89 (or at least a version I can find online) says "The pointer returned [by calloc/malloc/realloc] [...] may be assigned to a pointer to any type of object and then used to access such an object in the space allocated (until the space is explicitly freed or reallocated). [...] The value of a pointer that refers to freed space is indeterminate". So I think that largely matches my expectation - the value of 'p' is indeterminate after the 'free', so the 'p == q' is undefined behaviour, and the '*p' is undefined behaviour (because the space has been explicitly freed so 'p' can't be used to access the object any more).</p> <p>...except if the second malloc returns the same "space" as the first malloc, is 'p' still indeterminate now that the space it was referring to is no longer free? Maybe the 'p == q' is okay (though the '*p' is still undefined because it lost its ability to access the object after the space was first freed).</p> <p>Anyway, when you replace the free+malloc with realloc, it sounds like you expect that to be well-defined behaviour in C89?</p> <p>I think the problem is ambiguity with terms like "object" and "space" and "reallocated". C89 indicates realloc() returns the same object with a new size (though in other places it seems to directly contradict that and says it's a new object). 
But can that be the same object in the same space as the original malloc(), or different space, or logically different space but the pointers might happen to be equal? If you call realloc() with the object's current size, has it been reallocated for the purposes of "until the space is explicitly freed or reallocated"?</p> <p>I don't think it's fair to blame C99 for changing the semantics here - C89 seems very unclear and ambiguous, and C99 was the first time it was actually specified semi-properly (by having a more precise definition of the lifetime of an object with the term "deallocated" (instead of "freed or reallocated"), and saying realloc() always "deallocates" the old object).</p> Thu, 03 Jun 2021 16:30:04 +0000 Quotes of the week https://lwn.net/Articles/858181/ https://lwn.net/Articles/858181/ Vipketsh <div class="FormattedComment"> To take your print_digit(int i) analogy further, if all implementations print, say, &quot;?&quot; on i&gt;=10 and tons of code starts to rely on that, the de-facto standard is that print_digit(int i) prints &quot;?&quot; for i&gt;=10. It doesn&#x27;t matter at all what some document says. If the implementation is then changed to print &quot;HelloWorld!&quot; in the case of i&gt;=10, all the users whose code now breaks are rightfully annoyed. No amount of lawyering about standards, documentation and such matters. The situation with C and undefined behaviour is not much different. As an example: for a long time two&#x27;s complement wrapping of signed integers was the norm for compilers and code relied on this, until at some point compiler writers started to lawyer about the standard and undefined behaviour.<br> <p> This isn&#x27;t a new situation either.
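A minimal C sketch of the signed-overflow case mentioned above, assuming plain int addition; add_would_overflow and checked_add are hypothetical names. The commented-out line shows the kind of wrap-dependent check that older code relied on, and the functions perform the same check without ever overflowing.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Wrap-dependent anti-pattern, deliberately not compiled here:
 *     if (b > 0 && a + b < a) { ... overflow ... }
 * Signed overflow is undefined behaviour, so an optimizer may assume
 * a + b cannot wrap and treat `a + b < a` as `b < 0`, removing the check. */

/* Well-defined alternative: compare against the limits before adding,
   so no signed overflow can ever occur. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;  /* a + b > INT_MAX */
    return a < INT_MIN - b;      /* a + b < INT_MIN (b <= 0 here) */
}

/* Hypothetical helper: stores a + b in *out and returns true,
   or leaves *out untouched and returns false if the sum would overflow. */
bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if (add_would_overflow(a, b))
        return false;
    *out = a + b;
    return true;
}
```

GCC and Clang also provide __builtin_add_overflow(a, b, &amp;result) as a compiler extension for the same check.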
It has happened countless times in the past and is occurring today: the Linux kernel&#x27;s user-space interface takes this stance (relied-on behaviour is the de-facto standard), and the same thing happened (to an extent) with the glibc memcpy and aliasing pointers.<br> <p> I don&#x27;t agree with khim&#x27;s stance that the only correct way forward is to go and explicitly change the standard. If the predominant implementations become sane instead of doing all the stupid things supposedly allowed by the standard, the standards body will have two options: either become irrelevant and let others define what C is, or throw out their garbage and document the de-facto behaviour, as has happened with HTML for example. It works the other way too: if the predominant implementations implement crazy things while the standard becomes sane, it doesn&#x27;t actually help anyone.<br> <p> This whole dance around undefined behaviour started because compiler writers forgot the reasons behind C being the way it is and started to ignore de-facto user expectations. Instead, the compiler authors decided that if a program is not in 100% conformance with the strictest reading of the standard, any insane output is valid. Yodaiken&#x27;s articles explain this pretty well.<br> <p> My favorite example: dereferencing a NULL pointer is undefined behaviour. It can not be described in any other way since there exist many microcontrollers without an MMU where address 0 is mapped to some memory which may contain a data structure and the behaviour of overwriting some unknown state can not be defined. Hence any definition for dereferencing a NULL pointer would require a compiler to insert if(&lt;is null pointer&gt;) do_something() before every single load and store on these machines. Clearly this is not reasonable. On the other hand, on anything with an MMU, the user expectation is that a NULL pointer dereference turns into a SIGSEGV.
Instead of understanding this reasoning, compiler writers decided that undefined behaviour in this case means that the compiler can do whatever it wants to the program after the dereference.<br> <p> <p> </div> Thu, 03 Jun 2021 16:08:34 +0000 Quotes of the week https://lwn.net/Articles/858178/ https://lwn.net/Articles/858178/ rschroev <div class="FormattedComment"> OK, thanks, that makes it clear what you&#x27;re talking about. I agree this is bad. <br> <p> And now I see that you made that clarification already elsewhere in the thread. Sorry, my bad.<br> </div> Thu, 03 Jun 2021 15:13:59 +0000 Quotes of the week https://lwn.net/Articles/858179/ https://lwn.net/Articles/858179/ khim <p>After reading comments from others I realized that I hadn't actually explained what the issue with that code is.</p> <p>The issue here is not that <code>realloc</code> may return a different pointer, in which case the program prints nothing. Sure, it's allowed to do that according to C89, K&amp;R or any other standard.</p> <p>No, the issue here is that existing compilers <a
href="https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(fontScale:14,fontUsePx:'0',j:1,lang:___c,selection:(endColumn:2,endLineNumber:11,positionColumn:2,positionLineNumber:11,selectionStartColumn:2,selectionStartLineNumber:11,startColumn:2,startLineNumber:11),source:'%23include+%3Cstdio.h%3E%0A%23include+%3Cstdlib.h%3E%0Aint+main()+%7B%0A++++int+*p+%3D+(int*)malloc(sizeof(int))%3B%0A++++int+*q+%3D+(int*)realloc(p,+sizeof(int))%3B%0A++++if+(p+%3D%3D+q)+%7B%0A++++++++*p+%3D+1%3B%0A++++++++*q+%3D+2%3B%0A++++++++printf(%22%25d+%25d%5Cn%22,+*p,+*q)%3B%0A++++%7D%0A%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:100,l:'4',m:50,n:'0',o:'',s:0,t:'0'),(g:!((g:!((h:compiler,i:(compiler:cclang1200,filters:(b:'0',binary:'0',commentOnly:'0',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'1',trim:'0'),fontScale:14,fontUsePx:'0',j:1,lang:___c,libs:!(),options:'-O2+-std%3Dc89',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'x86-64+clang+12.0.0+(Editor+%231,+Compiler+%231)+C',t:'0')),header:(),k:49.99999999999999,l:'4',m:50,n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compiler:1,editor:1,fontScale:14,fontUsePx:'0',wrap:'1'),l:'5',n:'0',o:'%231+with+x86-64+clang+12.0.0',t:'0')),k:49.99999999999999,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',m:50,n:'0',o:'',t:'0')),l:'3',n:'0',o:'',t:'0')),version:4">make it print 1 2 instead of 2 2</a> — and <b>that</b> may never happen according to C89.</p> Thu, 03 Jun 2021 15:00:05 +0000 Quotes of the week https://lwn.net/Articles/858159/ https://lwn.net/Articles/858159/ khim <font class="QuotedText">&gt; I'm asking because I'm not sure how to interpret 'the only possible output is "2 2"'.</font> <p>Ah, sorry. I actually forgot about the fact that <code>realloc</code> can, according to the standard, actually return a different address here. 
In practice all existing implementations return the same one.</p> <p>Yes, correctly compiled program may return empty output here if, e.g. you <code>realloc</code> just always calls <code>malloc</code> and copies the content. That's not an issue.</p> <p>The issue is: <b>actual output</b> (that is: <code>1 2</code>) is clearly invalid.</p> <font class="QuotedText">&gt; I agree, the program looks legal to me and should output either nothing or "2 2\n". Are you saying that behavior is under threat somehow?</font> <p>Clang, MSVC and ICC <a href="https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAM1QDsCBlZAQwBtMQBGAFlJvoCqAZ0wAFAB4gA5AAYppAFZdSrZrVAB9LclIj2yAnjqVMtdAGFUrAK4BbWstPoAMnlqYAcnYBGmYlwBmUgAHVCFCI1pLG3tlUPDDOld3L1tff04gvUwDSIYCZmICaLsHTl1MfUTafMKCZM8fP0DdAqKS2PKhNvq3RrTmzIBKXVRrYmQOKQBSACYAt2QbLABqaYDzbvxUADoEdexpmQBBOYXaJetV9c2CdFY8bz2Do9OTtwIV22Y3CCG1gDsACFXiswSsPisAFTBNYBAAiKwgHyhQ2%2BrFYqGQEHCAC9MKgqMj6EMhusQSdwRD6NCAI5wxHEgio4iYNiY7HBUgrPEEokfUnk0HgvBUJGw9bwyUrWn/abA4VUsEwhkrThCylK5X06WzDXHLVg4LED5EuZ62YAVnQayt6GmlvMtHN3Jhrtl%2Bqp8qlJ29rykI1Y0kt8gcsnkqGk5h5YwmmFtAU48gI0jkpNIAGsuDIZCppNx5LYQJbc2G5KRI1J5EIQLmU%2BGRnBYDBEChULZgnh2GQKBA0B2u81kEs1MBOLMc3wuwQ/DWIN5U/JvG5CgBPaRJ0j92ymAgAeVorHX4dIWG%2B6nYi9PeFZuQAbpgayfMOIctYZ1ePpUN/IHt5iGulhYD%2BpAECaRZSEmIz8IwLCXjwfB0AQwhiJIJ5KOUqjqCAWgaDof41pAIyoME1RPgAtHuswrORWySsgAAcACcNHBJg6BqIYyDVpUOTVCYZgdGUpBOA0qTpHEYQRHQQmSQkkRiU0GQVFUeQ9LJXS8bkdC1EUikDMp3R1BprR1PpEmcCMQixpMXCBsGoZXpW4gMQAbORrncCsI7qGqsw7DIAVIrghAkAm5QrJYA7duF/zmMmi7pggbJYP4fyZiAE55lIBakEWJakGWEbSNWtagYlpBNq2YwEME77kJQ/adt2jj4EQykwUwbAcAhMEoRIV4AO4AcEP72VIIaFU50hnJwKyDYQCArC57med5WHAH5AUBQlDYjMlzCpZQIxZllQY5Y5J6VqVdYVVVEBIE1g49o17bNUOeDDhOsycOOU6sDOxBzguJ7LrQa4gduu4HkeV5nqOl4nvgt6GA%2BT7li%2Bb4fieX7nZuf4AcQq5AVMm5gXgEFQYh9BdfBvB9SIA3oZlKijjh2gqI8hHpSRZHSJR1G0Xc9HMax7GcZ9PGqcYEBOCZol9OJgwhFJ1QmfE0m0OZyvZNpNTqVYpTKLr1S6b0KRKcbBsxMJRl6YrluWaM4y2U752TUVFbSCtHleZ9yArN9nA7MH1EQCF7XhdyUXvcQCazHFu1pidmW5uduWe9duhlfWaaVS2D0gDVdUEA1fZvc9LPsaFHVIbTPX00h
/VoeWw3MKNkEBtlHvTTM8zUQtBBLT7a3%2B4HMg/SHOyzEnAYp2d%2BaXeWWc1rdDb5/AhdPd2Zfb0OSycExGgBLM/2A8DV5gxDndbu2O70DDx7lvDF5TM/N58XgaNXpjyDvm/8hcYgQJoBDApNkzgTGtTWC3UuCN0EIzFu8glBBA2uzPCnNvDc2IqRSINYAD0VEVKfxlnLQ2nQRJmG1spDWatyHCVoQpB2BljZaVNtbI2mlpb6zMswiypl2j0Ktrwi2LCnbWRdj1caPcrrezcl8IQd4A53kPiHGQSIADqABJDw2B/jhzamFWa0cK4xTOAERO5U9oZQXhdKasiqzZzXsnbKM88rFlLL3We6ZzoBCXsVRxuc56kAfEDSIIBuBAA%3D%3D%3D">all produce "1 2\n" output</a> and they all claim that they can do that because <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2577.pdf">they plan to add new set of undefined hehaviors</a> to the C2x standard.</p> <p>Once again: they miscompile perfectly valid C89 program in a strict C89 mode <b>today</b> and claim that it's fine because it, apparently, violates set of rules which they <b>plan to add</b> to C2x (and which is not yet even finalized).</p> <p>And the explicitly <b>refuse</b> to provide any flags which can make it work (although <a href="https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAM1QDsCBlZAQwBtMQBGAFlJvoCqAZ0wAFAB4gA5AAYppAFZdSrZrVAB9LclIj2yAnjqVMtdAGFUrAK4BbWstPoAMnlqYAcnYBGmYlwBmUgAHVCFCI1pLG3tlUPDDOld3L1tff04gvUwDSIYCZmICaLsHTl1MfUTafMKCZM8fP0DdAqKS2PKhNvq3RrTmzIBKXVRrYmQOKQBSACYAt2QbLABqaYDzbvxUADoEdexpmQBBOYXaJetV9c2CdFY8bz2Do9OTtwIV22Y3CCG1gDsACFXiswSsPisAFTBNYBAAiKwgHyhQ2%2BrFYqGQEHCAC9MKgqMj6EMhusQSdwRD6NCAI5wxHEgio4iYNiY7HBUgrPEEokfUnk0HgvBUJGw9bwyUrWn/abA4VUsEwhkrThCylK5X06WzDXHLVg4LED5EuZ62YAVnQayt6GmlvMtHN3Jhrtl%2Bqp8qlJ29rykI1Y0kt8gcsnkqGk5h5YwmmFtAU48gI0jkpNIAGsuDIZCppNx5LYQJbc2G5KRI1J5EIQLmU%2BGRnBYDBEChULZgnh2GQKBA0B2u81kEs1MBOLMc3wuwQ/DWIN5U/JvG5CgBPaRJ0j92ymAgAeVorHX4dIWG%2B6nYi9PeFZuQAbpgayfMOIctYZ1ePpUN/IHt5iGulhYD%2BpAECaRZSEmIz8IwLCXjwfB0AQwhiJIJ5KOUqjqCAWgaDof41pAIyoME1RPgAtHuswrORWySsgAAcACcNHBJg6BqIYyA0VQtCoOR3jWNObjkay7JYtWlQ5NUJhmB0ZSkE4DSpOkcRhBEdDyWpCSRMpTQZBUVR5D0WldFJuR0LURR6QMBndHUpmtHUNmqZwIxCLGkxcIGwahlelbiAxABs5FBdwKwjuoaqzDsMixUiuCECQCblCslgDt2KX/OYyaLumCBslg/h/JmIBWnmUgFqQRYlqQZYRtI1a1qBeWkE2rZjAQ
wTvuQlD9p23aOPgRAGTBTBsBwCEwShEhXgA7gBwQ/j5UghnV/nSGcnArHNhAICsgUhWFEVYcA0WxbFuUNiMBXMEVlAjFm5VBpVfknpWTV1q17UQEg/WDj2fXtgNQ54MOE6zJw45TqwM7EHOC4nsutBriB267geR5Xmeo6Xie%2BC3oYD5PuWL5vh%2BJ5fi9m5/gBxCrkBUybmBeAQVBiH0ON8G8NNIizehZUqKOOHaCojyESVJFkdIlHUbRdz0cxrHsZxYM8XxAlCbDIliRiEmGdJkSyRYVilI4ZguYMITqdUjnxBptCW3Z5nVFZxSm50BsWTUPRO8o9ntB7CkB70KT6d5ozjF5bkVWt9UVtIh2heFYPcRDnA7Bn1EQIlI0pdy6Ug8QCazNlV1po9ZWWhVVXxx9ujNfWaZtS2v0gJ13UEL1fbAwDgvsUlo1IVzk080hM1oeWC3MEtkEBrHb3lpWZzUbtBD7Unx2pys6eZzsszlwGlfPfmi8NVWDdfddFUH9VxalhtF9N0fFUBGfCdP3lIwPvDRvcEAA%3D%3D">-fno-builtin-realloc</a> works, but it's extremely unituitive and non-obvious Thu, 03 Jun 2021 14:39:42 +0000 Quotes of the week https://lwn.net/Articles/858153/ https://lwn.net/Articles/858153/ rschroev <div class="FormattedComment"> <font class="QuotedText">&gt; Note that C89 quite explicitly allows that code and says the only possible output is “2 2”. This is because realloc there does the following: The realloc function changes the size of the object pointed to by ptr to the size specified by size. It&#x27;s still the same object, both pointers point to it, so why would they behave differently?</font><br> <p> Do you mean that the example program should always output &quot;2 2\n&quot;? Or do you mean that it should either generate output or not, and if it does output something it should output &quot;2 2\n&quot;? I&#x27;m asking because I&#x27;m not sure how to interpret &#x27;the only possible output is &quot;2 2&quot;&#x27;.<br> <p> In the first case:<br> <p> I think you&#x27;re wrong: C89 doesn&#x27;t guarantee that p == q. The realloc function can legally move the object, even in C89. 
Look at what it says about realloc&#x27;s return value (at least in the draft of the standard -- I don&#x27;t have access to the officially released version): <br> <p> <font class="QuotedText">&gt; The realloc function returns either a null pointer or a pointer to the possibly moved allocated space.</font><br> <p> In the second case:<br> <p> I agree, the program looks legal to me and should output either nothing or &quot;2 2\n&quot;. Are you saying that behavior is under threat somehow?<br> </div> Thu, 03 Jun 2021 14:16:20 +0000 Quotes of the week https://lwn.net/Articles/858088/ https://lwn.net/Articles/858088/ khim <p>According to <b>all</b> existing C standards it's a well-defined program, as long as you don't subscribe to the ridiculous notion that <code>realloc</code> somehow turns an existing pointer to an existing object (which is not even part of any array) into a pointer one past the end of an array (a notion specifically invented to <s>screw the programmer</s> optimize code better). In fact, that same <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2577.pdf">provenance proposal</a> notes this quite explicitly (page 17, “pointer equality comparison and provenance”, where they talk about how that part should be changed <b>to make existing programs which are fully standard-compliant invalid</b> — all to make the language “better”, of course).</p> <p>The situation with C++ is a bit more complicated.
C++ have the notion of “pointer safety” which <a href="http://eel.is/c++draft/basic.stc.dynamic.safety">could have invalidated that program</a> on some compilers… except all existing compilers <a href="https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAM1QDsCBlZAQwBtMQBGAFlJvoCqAZ0wAFAB4gA5AAYppAFZdSrZrVDIApACYAQjt2kR7ZATx1KmWugDCqVgFcAtrWVX0AGTy1MAOWcARpjEILwADqhChOa0do4uyhFRZnRePv5OQSG8xpimMQwEzMQEcc6unEaYJim0hcUEaX6BwaFGRSVlCZVCHY3ezZmt3ACURqgOxMgcUjoAzN7IjlgA1JpzNua9xJjMTuvYmjIAgvOLy5hrG06YTiQAngdHx2sn3gQrTszeECPPmgB2fQnFagla9dAgFATD7rGxwtbabSiVDvYLg5hUTAEe4gRHadbAl5goQAd0IyAQKwgEKhwGxAH0ImjiAyhJjsfdfiM1kDnmCBSsWCJwQRISBmfRgmyOTiods8KY8SSxVC0A5YRsETptArTJoAKw2Wg6wkrAI7ZgAa0J/MFoOFl1pEtRUtZ7KxcolOyxxB2kNF4vVmvhG3xYR9wX9huNprmunNlpt8bt9sdgahkoI0o9nKhO1U4kwAdBzuDV1DNnxBeYRfQMZNSLNFt2yaJAsBABF/gDO1IxqxpAb5K5ZPJUNJ4XoDOCJlNLvNOPICNI5CMxlauDIZCppNx5E4QAad6O5KQJ1J5EIQDuV2OxnBYDBENCnGE8OwyBQIGg3x/WsAnBAXwH7ZsQ14QAEq7yAE3jFLiUhLqQv43PQADytCsAhZ5YF86jsNBpD4Ds%2BQAG6YNeY6kJgRbIBqMxIWig6IfIrB4Ba8F2Fg0hIQQxB4IeLFjPwjAsARPB8HQBDCGIkhUUolSqOoKAGAYKjsdekBjKgYS1JRAC0aHaFe1R5LUljWF0FTUdYTQZFkiSRNEdBWY5yQxHZLQhD0pn5HQ9SdPY5TKLkfl1H0nlDN57QNK5PQRQM9mtJwYxCHO0xcAOQ4joRF7iAAHAAbPphXcCswDIMgKxAQAdJw1K4IQJCInMlQrHYf6fi1KXtapejLtB66kAguxYCEvykJu2gyDV25zfNC28Mx%2B6kIex6kKe47SFeN6kHea6kI%2BL6/u%2Bn7kJQJ3/iESxqIB007lQoHBBBUFUbBtDwTx8goVYBAYVhhG4bdBFUcRZl4ORlFnjReT0V95BSsxSFsRxxD3FxDHLvxglLsJUlMGwHASSJMkSIRSjaCot0qdOejqQEmkTTpenSIZxlVDUMQWbYQXdDZniJV5bnObEvPWUkIuRQ5Pmc/5fRxRz4Nyw0UvJTFgXxNZvQq4LUWZeMkwZSlu5SMOG25dIBXFaVQpKcA1XaLNs0NfgRDEN1pDtagnXovM2g8lO%2Bj9Xtg1jCNzBjZQG4gNNs0LfH25LXuOVUReO23oNh3PhASCXWd35560N3qJw90gawYEvYR72fSxyHe6hf2Ydh8hA/hMw4XgJFmJDhEw3R2bw0x8Mo8QnEYJje3Y19eP0AT4m8CTIhk/JIBzFTylaLThgo4z2m6TEBlGSs%2Bn6RCKPrJ2KNb0HugmbLrgQO4CvuKr0US7UCsfx5uvS4rYUBVKGLEKvlaiALfiFeWwD4o63SELY2aVDZEyyqbFOZ48pFRKmVYu9tS5OxkC7Jq7tFyew6qdX22g5hzADn1O%2BId7zR1jgnBOScpArU2uebaRhdr7X7FneAOdXzkK/Bdb2wjpCkWQGEMIDJSKcAAJwMm0PlBkEAABqDBOCFQZPIkYDJxClXkI9Cuz1KCvTPDXNG8MfroWboDW4wMO6ty7uDXu
VF%2B5wzrsPOuo9x7cTrnxASM9JJzzEkTReUlSZyTPEoJa1Mb5qV3vAfeLMAD0Rl/7mSfpZaB/MIGVG/i5XJhTaD5MyQUKBmsQEP3Ab/NW2sNbBRgSUfJqV0rIJNmbDheUbZOCEJIlYcj5E1WUdSDRWiaq6MIW7D2XsfbEMoaMXq28BoMOGqNVoE0pozWYfHVhK1DycG3ObVOXDrwZ3vPw2AucxFXXOj%2BW5n4QCKmQNNbQpcHpPXAmY6ucErF1xsU3AGoMHHt0Bi4siFE%2B60U8YxRGI92JjzRhjeGgScZ8JEvPcJITpLL2ifIJQBoN4aFofTPe54D50CPuzUKWTn65NfnU9%2BTlP7FJZT/OBesZZK3CrFXJtKKmwMGH/BpQCqnNP6Jy6WbSkH62Yl0i2UgrZYJWC8lYbzOAjOdhARqMySFzOES1A0NCVn0LXIwnZuz5r7LQVtS83CLnmpNuzNaJ5FWrKdcxOYtrOH2t4UNci4EubcCAA%3D%3D">use relaxed pointer safety</a>. So that part couldn't justify what they do. <p>If you want to say that program is not guaranteed to print "2 2" then you are absolutely correct: empty output it's <b>also</b> a valid possibility (if <code>realloc</code> would actually move object). But this program, when compiled with a compiler which correctly implements <b>existing standards</b> couldn't print "1 2" — which it does on most compilers (clang/msvc/icc, only gcc produces correct result). And problem here is not with the fact that compilers are buggy (all software may have bugs) but with the fact that prevailing attitude here is “we screwed you up?… and we haven't yet added enough “undefined behaviors” to the standard to justify miscompilation of a perfectly valid program?… oh, that's too bad, let us add more “undefined behaviors” to the standard and apply them retroactively to make it possible to <s>screw you up more</s> optimize better… and no, we wouldn't give you the flag to make it possible to correctly compile correct programs”. Thu, 03 Jun 2021 13:43:45 +0000 Quotes of the week https://lwn.net/Articles/858086/ https://lwn.net/Articles/858086/ HelloWorld <div class="FormattedComment"> Why do you say the rules were retroactively changed? I don&#x27;t think there ever was a C standard that defined the behaviour of that program. 
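<p>Whatever the standards turn out to mean, code can stay out of this dispute by never using the old pointer after a successful <code>realloc</code>, even when it compares equal to the new one. A minimal sketch, assuming C99; <code>grow_ints</code> is a hypothetical helper, not part of any library:</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: resize an int buffer to new_count elements.
 * On success, returns the new buffer; the caller must treat `buf` as
 * dead from then on, even if the two pointers happen to compare equal.
 * On failure, returns NULL and `buf` remains valid and owned by the caller. */
int *grow_ints(int *buf, size_t new_count)
{
    int *tmp;

    /* realloc(p, 0) is best avoided: its behavior varies between implementations */
    assert(new_count > 0);

    /* Refuse sizes whose byte count would overflow size_t. */
    if (new_count > SIZE_MAX / sizeof *buf)
        return NULL;

    tmp = realloc(buf, new_count * sizeof *buf);
    return tmp;   /* never read or write through buf after this point */
}
```

<p>A caller would write <code>int *bigger = grow_ints(buf, n); if (bigger != NULL) buf = bigger;</code> and from then on access the data only through <code>buf</code>. Since the old value is simply overwritten and never compared or dereferenced, the question of which of two "equal" pointers is still valid never arises.</p>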
<br> </div> Thu, 03 Jun 2021 12:55:46 +0000 Quotes of the week https://lwn.net/Articles/858033/ https://lwn.net/Articles/858033/ khim <font class="QuotedText">&gt; But anything that relies on your interpretation is inherently broken ...</font> <p>How and why? From the same C89 standard: <i>the <code>realloc</code> function returns either a null pointer or a pointer to the possibly moved allocated space.</i></p> <p>Yes, the object <b>can be moved</b> by <code>realloc</code> and, as you have correctly noted, <b>sometimes</b> it has to be moved, sure. But we have established with a simple check, <code>if (p == q)</code>, that this hasn't happened in our program. If that's the same object and it wasn't moved — what makes it possible to treat a pointer to it as “invalid”?</p> <font class="QuotedText">&gt; Either your definition of realloc is correct and it has to return a failure (q == null), or it has to allocate a larger amount of space elsewhere and move the contents.</font> <p>Of course <code>realloc</code> can move the contents. But <b>all</b> standards quite explicitly say that the object can be left in place. Even C18 says the following: <i>the <code>realloc</code> function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object has not been allocated.</i></p> <p>This note (it's part of the standard and goes back to realloc's origins in UNIX, although the phrasing changed over time) practically begs one to special-case that situation, right?
But nope: apparently “provenance rules” (which are, once again, <b>not</b> part of C99, <b>not</b> part of C11, <b>not</b> part of C18, not part of <b>any</b> existing C++ standard and, apparently, are not even finalized yet) give the compiler the right to <s>screw the programmer</s> optimize that code.</p> <p>One can, of course, play word games with the phrase “the same value as a pointer to the old object”, because the standard defines “the same value” in the following way: <i>Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.</i> <p>And yes, I have heard from one of the clang developers that, indeed, after a call to <code>realloc</code> the old pointer becomes a pointer one past the end of a zero-sized array object! And using it is thus “undefined behavior”. <p>I couldn't even properly respond to such a reading of the standard, since I'm not Linus and I don't know enough English swear words.</p> <p>I guess at some point these rules will be finalized and will become, retroactively, part of C89/C99/C11/C18/C++98/C++11/C++14/C++17/C++20… but I cannot see how anyone can write code in a language whose rules can be changed retroactively half a century after they were written.</p> Wed, 02 Jun 2021 20:39:47 +0000 Quotes of the week https://lwn.net/Articles/858022/ https://lwn.net/Articles/858022/ Wol <div class="FormattedComment"> <font class="QuotedText">&gt; Note that C89 quite explicitly allows that code and says the only possible output is “2 2”.
This is because realloc there does the following: The realloc function changes the size of the object pointed to by ptr to the size specified by size. It&#x27;s still the same object, both pointers point to it, so why would they behave differently?</font><br> <p> <font class="QuotedText">&gt; C99 changed that. Now realloc works differently: The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. </font><br> <p> But anything that relies on your interpretation is inherently broken ...<br> <p> int *p = (int*)malloc(sizeof(int));<br> int *q = (int*)realloc(p, 16 * sizeof(int));<br> <p> If malloc has allocated only a 64-byte block for p, including all the metadata it needs to manage it, it is simply not possible to resize it such that q == p. Either your definition of realloc is correct and it has to return a failure (q == null), or it has to allocate a larger amount of space elsewhere and move the contents.<br> <p> So regardless of whether it&#x27;s correct, relying on your interpretation when using realloc to grow the allocated space is a pretty stupid idea if you assume it&#x27;s &quot;just going to work&quot;. Most people assume that malloc/realloc won&#x27;t return a failure. Under your interpretation, failure would be a common event.<br> <p> Cheers,<br> Wol<br> </div> Wed, 02 Jun 2021 19:17:39 +0000 Quotes of the week https://lwn.net/Articles/857926/ https://lwn.net/Articles/857926/ khim <font class="QuotedText">&gt; Well, to turn something that is "unhandled" into "handled", I assume you need some sort of specification how to handle it.</font> <p>What about the case where you turn something that <b>was</b> “handled” into “unhandled”? Let's consider a concrete example.
<a href="https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(fontScale:14,fontUsePx:'0',j:1,lang:___c,selection:(endColumn:2,endLineNumber:11,positionColumn:2,positionLineNumber:11,selectionStartColumn:2,selectionStartLineNumber:11,startColumn:2,startLineNumber:11),source:'%23include+%3Cstdio.h%3E%0A%23include+%3Cstdlib.h%3E%0Aint+main()+%7B%0A++++int+*p+%3D+(int*)malloc(sizeof(int))%3B%0A++++int+*q+%3D+(int*)realloc(p,+sizeof(int))%3B%0A++++if+(p+%3D%3D+q)+%7B%0A++++++++*p+%3D+1%3B%0A++++++++*q+%3D+2%3B%0A++++++++printf(%22%25d+%25d%5Cn%22,+*p,+*q)%3B%0A++++%7D%0A%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:100,l:'4',m:50,n:'0',o:'',s:0,t:'0'),(g:!((g:!((h:compiler,i:(compiler:cclang1200,filters:(b:'0',binary:'0',commentOnly:'0',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'1',trim:'0'),fontScale:14,fontUsePx:'0',j:1,lang:___c,libs:!(),options:'-O2+-std%3Dc89',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'x86-64+clang+12.0.0+(Editor+%231,+Compiler+%231)+C',t:'0')),header:(),k:49.99999999999999,l:'4',m:50,n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compiler:1,editor:1,fontScale:14,fontUsePx:'0',wrap:'1'),l:'5',n:'0',o:'%231+with+x86-64+clang+12.0.0',t:'0')),k:49.99999999999999,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',m:50,n:'0',o:'',t:'0')),l:'3',n:'0',o:'',t:'0')),version:4">This code</a>: <pre> #include &lt;stdio.h&gt; #include &lt;stdlib.h&gt; int main() { int *p = (int*)malloc(sizeof(int)); int *q = (int*)realloc(p, sizeof(int)); if (p == q) { *p = 1; *q = 2; printf("%d %d\n", *p, *q); } }</pre>Note that C89 quite explicitly <b>allows</b> that code and says the only possible output is “2 2”. This is because <code>realloc</code> there does the following: <i>The <code>realloc</code> function changes the size of the object pointed to by <code>ptr</code> to the size specified by <code>size</code></i>. 
It's still the same object, both pointers point to it, so why would they behave differently? <p>C99 changed that. Now realloc works differently: <i>The <code>realloc</code> function deallocates the old object pointed to by <code>ptr</code> and returns a pointer to a new object that has the size specified by <code>size</code>.</i> <p>Why would that be important? We compared the pointers, so they should behave identically, right? No. There is a <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm">decision</a> of the WG14 committee which says literally the following: <i>after much discussion, the UK C Panel came to a number of conclusions as to what it would be desirable for the Standard to mean</i> — followed by a short explanation of <b>how the standard should be changed to make that program illegal</b>. <p>Note: they <b>haven't said</b> that the standard actually means that <b>today</b>. Nope. <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2577.pdf">Provenance insanity</a> is not yet part of <b>any standard</b>. Not C99, not C18 and not even C++20! Yet compiler writers think they are entitled to apply these rules (which are, apparently, an <b>area of research</b>, because compiler writers <b>still</b> <a href="https://www.ralfj.de/blog/2020/12/14/provenance.html">haven't managed to come up with a usable set of rules which you can use to write correct programs</a>) to old C89 programs. <p>Nice, huh? <p><font class="QuotedText">&gt; Well, a C compiler basically implements C.</font> <p>Except that today this isn't true. C compiler writers implement basically whatever they want to implement and reserve the right to <b>retroactively</b> change the rules of the language.
<b>Without</b> providing options which may bring back old behavior (<code>-fno-builtin-realloc</code> <a href="https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(fontScale:14,fontUsePx:'0',j:1,lang:___c,selection:(endColumn:2,endLineNumber:11,positionColumn:2,positionLineNumber:11,selectionStartColumn:2,selectionStartLineNumber:11,startColumn:2,startLineNumber:11),source:'%23include+%3Cstdio.h%3E%0A%23include+%3Cstdlib.h%3E%0Aint+main()+%7B%0A++++int+*p+%3D+(int*)malloc(sizeof(int))%3B%0A++++int+*q+%3D+(int*)realloc(p,+sizeof(int))%3B%0A++++if+(p+%3D%3D+q)+%7B%0A++++++++*p+%3D+1%3B%0A++++++++*q+%3D+2%3B%0A++++++++printf(%22%25d+%25d%5Cn%22,+*p,+*q)%3B%0A++++%7D%0A%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:100,l:'4',m:50,n:'0',o:'',s:0,t:'0'),(g:!((g:!((h:compiler,i:(compiler:cclang1200,filters:(b:'0',binary:'0',commentOnly:'0',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'1',trim:'0'),fontScale:14,fontUsePx:'0',j:1,lang:___c,libs:!(),options:'-O2+-std%3Dc89+-fno-builtin-realloc',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'x86-64+clang+12.0.0+(Editor+%231,+Compiler+%231)+C',t:'0')),header:(),k:49.99999999999999,l:'4',m:50,n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compiler:1,editor:1,fontScale:14,fontUsePx:'0',wrap:'1'),l:'5',n:'0',o:'%231+with+x86-64+clang+12.0.0',t:'0')),k:49.99999999999999,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',m:50,n:'0',o:'',t:'0')),l:'3',n:'0',o:'',t:'0')),version:4">works today</a>, but apparently there are no guarantee that it would work in the future). <p><font class="QuotedText">&gt; That would be an option, but I assume the smoothest path forward is to continue proposing "C language extensions/options", like -fno-strict-aliasing and -fno-strict-overflow to GCC, for specific instances of undefined ("unhandled") behavior in the C language. 
If considered useful enough to be the default option for the whole C community, it can then be brought up to the C committee.</font> <p>I think at this point it's basically pointless. When I explicitly asked some clang developers about something like a <code>-fno-provenance</code> option, the answer was: <i>provenance is something LLVM *violently* believes in, at the level of alloca, malloc, and similar intrinsics scribbling provenance information all over LLVM's internal representions. Even if you could turn it off, I doubt it would fix all of your miscompilations, since this is a fundamental building block of LLVM's IR. Like I said before: although provenance is not defined by either standard, it is a real and valid emergent property that every compiler vendor ever agrees on.</i> <p>Note the <i>not defined by either standard yet real and valid emergent property</i> part. After an answer like that… I think it's essentially pointless to bring <b>anything</b> to the C committee. What's the point if said committee picks something they like, not something that makes it, you know, possible to write anything in that language?</p> <p>We are more or less stuck with GCC extensions for the foreseeable future, and I think it's a good idea to adopt Linus's stance. Essentially: “I can't forbid you to use clang, but I don't consider “clang miscompiles that code” a valid reason for any change in any project”.</p> <p>This is unfortunate because the only language which tries to address these issues in a practically usable way, Rust, is basically tied to LLVM at the moment. gccrs <a href="https://github.com/Rust-GCC/gccrs/commits/master">looks quite active nowadays</a>, though, so there's hope. But C and C++… they should be declared “unfit for any purpose”, sadly. Certain specific implementations can probably be used, maybe, but there is zero hope of getting sane cross-compiler treatment.
That ship has sailed.</p> Wed, 02 Jun 2021 13:59:04 +0000 Quotes of the week https://lwn.net/Articles/857703/ https://lwn.net/Articles/857703/ patha <div class="FormattedComment"> <font class="QuotedText">&gt; but it lies in a different place than you think. You can read that article to understand Yodaiken&#x27;s position better.</font><br> <p> Sorry, I wasn&#x27;t able to understand much more by reading that article.<br> <p> <font class="QuotedText">&gt; He doesn&#x27;t argue about definition of “undefined behavior” in standard as much as he laments the fact that compilers today tend to interpret undefined behavior as, well, undefined behavior. Always. Unconditionally. To be avoided. Unconditionally.</font><br> <p> Well, to turn something that is &quot;unhandled&quot; into &quot;handled&quot;, I assume you need some sort of specification how to handle it. This also needs to be specified separately for each instance (the devil is in the details).<br> <p> <font class="QuotedText">&gt; And that is the gist of the problem: compiler writers insist that C standard should be the only thing that defines semantic of the program.</font><br> <p> Well, a C compiler basically implements C. If you want it to implement something else, you develop an optional change to the language, such as the -fno-strict-aliasing and -fno-strict-overflow options in GCC.<br> <p> <font class="QuotedText">&gt; The way to reconcile these camps is not to complain and whine — but to change the standard.</font><br> <p> That would be an option, but I assume the smoothest path forward is to continue proposing &quot;C language extensions/options&quot;, like -fno-strict-aliasing and -fno-strict-overflow to GCC, for specific instances of undefined (&quot;unhandled&quot;) behavior in the C language. If considered useful enough to be the default option for the whole C community, it can then be brought up to the C committee.
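<p>The two alternative 'print_digit' specifications mentioned in this thread show why each case needs its own decision: even "modulo 10 arithmetic" leaves a detail open, namely negative input, because C99's <code>%</code> truncates toward zero (so <code>-3 % 10 == -3</code>). A minimal sketch, assuming C99; the helper names are hypothetical, and mapping negative input into 0..9 is one possible reading of the modulo specification, not the only one:</p>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant 1: "for input outside the range 0 to 9, modulo 10
 * arithmetic is applied". C99 '%' truncates toward zero, so -3 % 10 == -3;
 * negative remainders are shifted into 0..9 here (one possible choice). */
char digit_char_mod10(int i)
{
    int r = i % 10;            /* r is in -9..9 (also fine for INT_MIN) */
    if (r < 0)
        r += 10;               /* r is now in 0..9 */
    assert(r >= 0 && r <= 9);
    return (char)('0' + r);
}

/* Hypothetical variant 2: "for input outside the range 0 to 9, '?' is printed". */
char digit_char_or_question(int i)
{
    return (i >= 0 && i <= 9) ? (char)('0' + i) : '?';
}

/* print_digit with variant 1 chosen as the specified behavior. */
void print_digit(int i)
{
    putchar(digit_char_mod10(i));
}
```

<p>Each helper gives a defined result for every <code>int</code>, so neither leaves any input "unhandled"; choosing between them is exactly the kind of per-case specification described above.</p>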
<br> <p> <p> </div> Mon, 31 May 2021 15:36:15 +0000 Quotes of the week https://lwn.net/Articles/857696/ https://lwn.net/Articles/857696/ khim <font class="QuotedText">&gt; I assume the main issue here is a misunderstanding of what undefined behavior actually is.</font> <p>Indeed - but it lies in a different place than you think. You can read <a href="https://www.yodaiken.com/2021/05/21/your-computer-is-a-fast-pdp-11-and-more-on-c-the-c-standard-and-computer-architecture/">that article</a> to understand Yodaiken's position better.</p> <p>He doesn't argue about definition of “undefined behavior” in standard as much as he laments the fact that compilers today tend to interpret undefined behavior as, well, undefined behavior. Always. Unconditionally. To be avoided. Unconditionally. And <b>that</b> drives him (and people like him… including Linus, unfortunately) nuts.</p> <p><font class="QuotedText">&gt; In our case with the 'print_digit' function, we may remove the undefined behavior by for example change it to: <blockquote> // Print a digit to stdout. For input outside the range 0 to 9, modulo 10 arithmetic is applied. void print_digit(int i) </blockquote></font> <p>Yup. You can do that. In fact, people <a href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html">have done that</a>: they defined the <code>SIGSEGV</code> signal and made sure it's triggered when you try to access a <code>NULL</code> pointer. The behavior has become defined now, right? Nope: <a href="https://kristerw.blogspot.com/2017/09/why-undefined-behavior-may-call-never.html">the compiler doesn't think so</a>.</p> <p>And <b>that</b> is the gist of the problem: compiler writers insist that C standard should be <b>the only</b> thing that defines semantic of the program.
Other guys (Yodaiken… Linus… and lots of other people who, coincidentally, <b>don't</b> write compilers) argue that it's a completely unrealistic idea.</p> So we have two positions: <ol><li>C was explicitly designed to omit certain pieces, so they are declared as “undefined behavior” in the standard — yet they are defined by the execution environment, so we may expect “constructive interpretation” from the compiler.</li> <li>If something is important enough for the compiler to <b>actually care</b> — it should be defined in the C standard <b>itself</b>. It's already hard to write a compiler which can work with dozens of hardware platforms and two standards (each hundreds of pages long). Adding an unknown (and unbounded!) number of things to that list is just not realistic.</li> </ol> <p>Guess which position wins when people with stance #1 don't write the compilers and people with stance #2 actually do?</p> <p>The way to reconcile these camps is <b>not</b> to complain and whine — but to <b>change the standard</b>.</p> <p>When 2-3 guys were developing a compiler, it made some sense not to include <b>everything</b> in the definition of the language. But today… when literally <b>thousands</b> of people are involved in developing these standards and compilers… If something is important enough for the compiler to treat it as anything but “undefined behavior”, then said something is important enough to be explicitly written in the standard. End of discussion.</p> <p>Apparently Yodaiken actually tried to change the standard — but not with proposals to turn some “undefined behaviors” into other kinds (unspecified, implementation-defined, simply well-defined), rather with an attempt to push “constructive interpretation”. No wonder the committee was not impressed.
C (and C++… and CPUs… and OSes…) has just become too complicated for stance #1 to make any sense.</p> Mon, 31 May 2021 14:12:40 +0000 Quotes of the week https://lwn.net/Articles/857687/ https://lwn.net/Articles/857687/ patha <p>I assume the main issue here is a misunderstanding of what undefined behavior actually is. Basically, you may view it as the "unhandled" cases in the C semantics. The blog post seems to assume that compiler writers currently have to actively work out semantics for undefined behavior, and that a "constructive interpretation" would just slightly change how compiler writers should work out those semantics. I think this is a fundamental misunderstanding. Undefined cases are basically unhandled, rather than interpreted in any way. Going from "unhandled" to "handled" is therefore a fundamental change, rather than a minor change in interpretation. <p>The notion of undefined behavior should be interpreted in the context of section 5.1.2.3, Program execution, of <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf">the standard</a> (excerpt): <blockquote> <p>The semantic descriptions in this document describe the behavior of an abstract machine in which issues of optimization are irrelevant. <p>In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or through volatile access to an object). <p>The least requirements on a conforming implementation are: <ul> <li>Volatile accesses to objects are evaluated strictly according to the rules of the abstract machine. <li>At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
<li>The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input. </ul> <p>This is the <i>observable behavior</i> of the program. </blockquote> <p>If some C construct lacks a "semantic description" – that is, it is "unhandled" by the standard (although the C standard does currently try to point out cases of undefined behavior explicitly) – then C language implementations (e.g. a compiler + hardware realization) can also leave these C constructs "unhandled". In my view, this is basically the origin of "undefined behavior". <p>Let's take a simple example (outside the domain of the C standard and compiler implementations) of a C function: <blockquote> // Print a digit to stdout. Only 0 to 9 have defined behavior.<br> void print_digit(int i) </blockquote> <p>The comment documenting the function implies that values of i outside the range 0..9 may be left unhandled by the implementation. That is, if 10 is passed as an argument, the behavior is undefined, and the function is allowed to, for example, crash or print "Hello world!" (though in practice the latter is unlikely). This also implies that the actual behavior for out-of-range cases may change when a new version of the implementation is released. The actual behavior is uninteresting from the implementor's point of view and may even be unknown (inferring it would require a non-trivial amount of reverse engineering). <p>The Linux kernel turns on some GCC options, such as -fno-strict-aliasing and -fno-strict-overflow, to actually change the C semantics of some C constructs into something else.
(In my opinion these options should have been listed at <a href="https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html">Options Controlling C Dialect</a> rather than <a href="https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html">Options That Control Optimization</a> and <a href="https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html">Options for Code Generation Conventions</a> in the GCC documentation.) Similarly, we can turn other cases of undefined behavior into defined behavior by actually changing the C semantics. However, this needs to be done case by case, rather than by trying to somehow turn all instances of undefined behavior into defined behavior through some kind of default interpretation of undefined behavior (which I would say is impossible). <p>In our case with the 'print_digit' function, we may remove the undefined behavior by for example change it to: <blockquote> // Print a digit to stdout. For input outside the range 0 to 9, modulo 10 arithmetic is applied.<br> void print_digit(int i) </blockquote> <p>But other changes are also possible. For example: <blockquote> // Print a digit to stdout. For input outside the range 0 to 9, '?' is printed.<br> void print_digit(int i) </blockquote> <p>For both our simple 'print_digit' function and the C standard or C compilers, the actual benefits of leaving something as "undefined behavior" are basically the same: <ul> <li>It may simplify the implementation. <li>It may make the execution more efficient. </ul> <p>The drawback is probably also basically the same: <ul> <li>It may make usage riskier (the program may crash or behave more unpredictably).
</ul> Mon, 31 May 2021 08:26:03 +0000 Quotes of the week https://lwn.net/Articles/857586/ https://lwn.net/Articles/857586/ khim <font class="QuotedText">&gt; It would be way more helpful if they just introduced (for example) a statement "these two pointers are never the same, it's undefined if they somehow are" or "this pointer does not point to this other variable".</font> <p>It's doable; it was actually done, and it works very, very well. The problem? The resulting language is called Rust and it's not C-compatible in any way, shape, or form.</p> <p>And that is fundamental, because once you start going down that path you quickly discover that you need to somehow tell the compiler about the <b>lifetimes</b> of different variables — and, like <code>const</code>, these annotations become viral: for them to help, they need to be pervasive and used throughout the whole program.</p> <p>But while that helps, it's a completely tangential issue. The core issue with “undefined behavior” is a fundamental misunderstanding. Look again at the blog post where it explains how the “shift example” is “constructively interpreted” with three possible “interpretations”, and try to explain how that “constructive interpretation” of “undefined behavior” would differ from <a href="https://en.wikipedia.org/wiki/Unspecified_behavior">unspecified behavior</a>. I dare you.</p> <p>The short answer: it wouldn't differ at all — yet all these articles which cry about bad-bad-bad compilers ignore that fact completely.</p> <p>Undefined behavior exists for a reason and, as you have correctly noticed, it was supposed to cover cases which no sane person would ever try to use in their program — but some people (usually those who know “too much” about the underlying hardware) like to pretend that they know the full range of possible outcomes (like with unspecified behavior) — and become extremely angry when the compiler does something else.</p> <p>The proper solution is reclassification.
Certain behaviors should be moved from the “undefined behavior” bucket into the “unspecified behavior” or “implementation-defined behavior” bucket, or maybe just defined outright.</p> <p>But this is <b>hard</b>. Complaining about the lack of common sense in a compiler is easier. Well… newsflash: compilers with common sense are not coming, and if you have no formal definition of how your program should be interpreted, then the chances of it being interpreted correctly by a compiler which does literally <b>hundreds</b> of passes are close to zero.</p> Fri, 28 May 2021 15:34:36 +0000 Quotes of the week https://lwn.net/Articles/857505/ https://lwn.net/Articles/857505/ iabervon <div class="FormattedComment"> I think it comes down to not being willing to introduce a way to explicitly tell the compiler facts that would help it optimize, and instead finding ways to notice that code wouldn&#x27;t necessarily be portable if those facts weren&#x27;t true. Early on, that let compilers get a lot better at compiling existing code, which didn&#x27;t have any annotations and in which the cases where an optimization was not valid had already caused problems and been fixed. But they later got to cases where violating the assumption didn&#x27;t actually cause problems, so existing code has to be changed to still compile correctly.
And they miss optimizing cases where the programmer knows something is true but hasn&#x27;t written the program in such a way that its being false would cause undefined behavior.<br> <p> It would be way more helpful if they just introduced (for example) a statement &quot;these two pointers are never the same, it&#x27;s undefined if they somehow are&quot; or &quot;this pointer does not point to this other variable&quot;.<br> </div> Thu, 27 May 2021 21:29:39 +0000 Quotes of the week https://lwn.net/Articles/857503/ https://lwn.net/Articles/857503/ khim <p>The people who fight against “evil compiler writers” miss the point because they just don't think about how modern compilers work (actually not-so-modern ones: it was already a problem a quarter-century ago).</p> <p>The change in C99 wasn't arbitrary and, in fact, it was unavoidable. The reason is simple: the phrase about “permissible undefined behavior” <b>only</b> makes sense for a very simple, basically trivial compiler, one which does no optimizations <b>at all</b>. Once you start doing optimizations, you quickly find out that before you can argue whether certain optimizations are valid or not, and when, you need to know what the “expected behavior” <b>is</b>. Basically all optimizations depend on the <a href="https://en.wikipedia.org/wiki/As-if_rule">as-if rule</a>, but how can you apply it if you have no idea what happens in the code at all? For many types of undefined behavior that's definitely not trivial… think <i>an invalid array reference, null pointer reference.
or reference to an object declared with automatic storage duration in a terminated block occurs</i>: what kind of optimizations can you offer which would keep observable behavior the same for such a program?</p> <p>Consider the following program: <pre> #include &lt;stdio.h&gt; int foo() { int i; i = 42; } int bar() { int i; return i; } int main() { foo(); printf("%d\n", bar()); } </pre> <p>It <a href="https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(fontScale:14,fontUsePx:'0',j:1,lang:___c,selection:(endColumn:2,endLineNumber:16,positionColumn:1,positionLineNumber:1,selectionStartColumn:2,selectionStartLineNumber:16,startColumn:1,startLineNumber:1),source:'%23include+%3Cstdio.h%3E%0A%0Aint+foo()+%7B%0A++int+i%3B%0A++i+%3D+42%3B%0A%7D%0A%0Aint+bar()+%7B%0A++int+i%3B%0A++return+i%3B%0A%7D%0A%0Aint+main()+%7B%0A++foo()%3B%0A++printf(%22%25d%5Cn%22,+bar())%3B%0A%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:33.333333333333336,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:cg111,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'0',trim:'1'),fontScale:14,fontUsePx:'0',j:1,lang:___c,libs:!(),options:'-O0+-Wno-return-type',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'x86-64+gcc+11.1+(Editor+%231,+Compiler+%231)+C',t:'0')),header:(),k:33.333333333333336,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compiler:1,editor:1,fontScale:14,fontUsePx:'0',wrap:'1'),l:'5',n:'0',o:'%231+with+x86-64+gcc+11.1',t:'0')),k:33.33333333333333,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',m:100,n:'0',o:'',t:'0')),version:4">works in GCC -O0</a>. 
<a href="https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(fontScale:14,fontUsePx:'0',j:1,lang:___c,selection:(endColumn:2,endLineNumber:16,positionColumn:1,positionLineNumber:1,selectionStartColumn:2,selectionStartLineNumber:16,startColumn:1,startLineNumber:1),source:'%23include+%3Cstdio.h%3E%0A%0Aint+foo()+%7B%0A++int+i%3B%0A++i+%3D+42%3B%0A%7D%0A%0Aint+bar()+%7B%0A++int+i%3B%0A++return+i%3B%0A%7D%0A%0Aint+main()+%7B%0A++foo()%3B%0A++printf(%22%25d%5Cn%22,+bar())%3B%0A%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:33.333333333333336,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:cg111,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'0',trim:'1'),fontScale:14,fontUsePx:'0',j:1,lang:___c,libs:!(),options:'-O+-Wno-return-type',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'x86-64+gcc+11.1+(Editor+%231,+Compiler+%231)+C',t:'0')),header:(),k:33.333333333333336,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compiler:1,editor:1,fontScale:14,fontUsePx:'0',wrap:'1'),l:'5',n:'0',o:'%231+with+x86-64+gcc+11.1',t:'0')),k:33.33333333333333,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',m:100,n:'0',o:'',t:'0')),version:4">GCC -O2</a> breaks it <b>but not in a way permitted by vyodaken's reading of C89</b>. 
And Clang <a href="https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(fontScale:14,fontUsePx:'0',j:1,lang:___c,selection:(endColumn:2,endLineNumber:16,positionColumn:1,positionLineNumber:1,selectionStartColumn:2,selectionStartLineNumber:16,startColumn:1,startLineNumber:1),source:'%23include+%3Cstdio.h%3E%0A%0Aint+foo()+%7B%0A++int+i%3B%0A++i+%3D+42%3B%0A%7D%0A%0Aint+bar()+%7B%0A++int+i%3B%0A++return+i%3B%0A%7D%0A%0Aint+main()+%7B%0A++foo()%3B%0A++printf(%22%25d%5Cn%22,+bar())%3B%0A%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:33.333333333333336,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:cclang1200,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'0',trim:'1'),fontScale:14,fontUsePx:'0',j:1,lang:___c,libs:!(),options:'-O0+-Wno-return-type',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'x86-64+clang+12.0.0+(Editor+%231,+Compiler+%231)+C',t:'0')),header:(),k:33.333333333333336,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compiler:1,editor:1,fontScale:14,fontUsePx:'0',wrap:'1'),l:'5',n:'0',o:'%231+with+x86-64+clang+12.0.0',t:'0')),k:33.33333333333333,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',m:100,n:'0',o:'',t:'0')),version:4">breaks it completely - with -O0, too</a> yet <a 
href="https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(fontScale:14,fontUsePx:'0',j:1,lang:___c,selection:(endColumn:2,endLineNumber:16,positionColumn:1,positionLineNumber:1,selectionStartColumn:2,selectionStartLineNumber:16,startColumn:1,startLineNumber:1),source:'%23include+%3Cstdio.h%3E%0A%0Aint+foo()+%7B%0A++int+i%3B%0A++i+%3D+42%3B%0A%7D%0A%0Aint+bar()+%7B%0A++int+i%3B%0A++return+i%3B%0A%7D%0A%0Aint+main()+%7B%0A++foo()%3B%0A++printf(%22%25d%5Cn%22,+bar())%3B%0A%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:33.333333333333336,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:cclang1200,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'0',trim:'1'),fontScale:14,fontUsePx:'0',j:1,lang:___c,libs:!(),options:'-O2+-Wno-return-type',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'x86-64+clang+12.0.0+(Editor+%231,+Compiler+%231)+C',t:'0')),header:(),k:33.333333333333336,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compiler:1,editor:1,fontScale:14,fontUsePx:'0',wrap:'1'),l:'5',n:'0',o:'%231+with+x86-64+clang+12.0.0',t:'0')),k:33.33333333333333,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',m:100,n:'0',o:'',t:'0')),version:4">differently with -O2</a>. <p>Worse: when people start talking about examples of undefined behavior which happen in normal programs, they often miss the point entirely.</p> <p><font class="QuotedText">&gt; Returning a pointer to indeterminate value data, surely a “use”, is not undefined behavior because the standard mandates that malloc will do that. Use of “asm” does not cause undefined behavior although it is nonportable because the standard (glancingly) mentions using asm.</font></p> <p>See? Standard writers are idiots since no one can write a reasonable program without triggering undefined behavior!
Except… <i>returning a pointer to indeterminate value data</i> is <b>not</b> <i>use</i>, and it <b>is</b> allowed. It's <b>not</b> undefined behavior. And <b>asm</b> is very explicitly permitted; it's not “undefined behavior” either. And I have yet to see anyone complain that compilers can do crazy things if something like this is violated: <i>the format for the fprintf or fscanf function does not match the argument list</i>.</p> <p>The real problem is that when C89 was made, lots of things were put in the “undefined behavior” bucket when they should have gone into the “implementation-defined behavior” bucket. When C99 was made, it became obvious that the wording for “undefined behavior” just doesn't work, yet lots of stuff which <b>should have been</b> moved to “implementation-defined behavior” at that point was kept as “undefined behavior”.</p> <p>But instead of trying to fix that very real problem (and yes, when properly discussed, such conversion does happen: read <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0145r3.pdf">P0145</a> or <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html">P0593</a>, for example), people attack compiler writers again and again and again.</p> <p>As if that could ever change anything.</p> <p>P.S. The mere fact that the Linux kernel or libsodium even <b>have</b> these switches available shows that compiler writers <b>do</b> understand that the lists of “undefined behavior”s in the standard are not perfect.
But demanding that compilers stop treating “undefined behavior”s as, well… <b>undefined</b>, and start treating <b>all</b> of them as “implementation-defined” ones simply because someone doesn't know the standard well enough… it just wouldn't fly, sorry.</p> Thu, 27 May 2021 20:21:56 +0000 Quotes of the week https://lwn.net/Articles/857435/ https://lwn.net/Articles/857435/ smoogen <div class="FormattedComment"> When I worked at an HPC center, I found that many of the people who spent time on these sorts of committees were fighting this &#x27;war&#x27;. There was always some sort of &#x27;why do we even have people who program in X, it can&#x27;t do Y&#x27;, because funding was always tight and, well, it is easier to say the low-level department should be axed since all our code is high-level. <br> <p> I don&#x27;t remember which of my &#x27;elders&#x27; said it, but it has stuck with me since: &quot;It&#x27;s like watching a playground fight in a toolbox where the screwdrivers say that no one needs hammers or wiresnips because you do that with a screwdriver.&quot;<br> <p> </div> Thu, 27 May 2021 12:15:02 +0000 Quotes of the week https://lwn.net/Articles/857432/ https://lwn.net/Articles/857432/ ballombe <div class="FormattedComment"> <font class="QuotedText">&gt; It matters because over time the Standard and the common compilers have made C an unsuitable language for developing a range of applications, from memory allocators, to cryptography applications, to threading libraries and, especially operating systems.</font><br> <p> Fully agreed.<br> <p> The C committee seems stuck optimizing 1970s-era HPC benchmarks to get C to compete with FORTRAN (and failing) instead of recognizing what the real uses of C are today. Why is a program written in C rather than in a memory-managed language? Usually because it is (in part) a memory manager, but the standard is still written in the spirit that all memory is always allocated with malloc.
Apparently they have not heard of mmap. They even decided that the only way to copy data in a non-aliasing way is byte-by-byte.<br> <p> Instead of adding explicit keywords to specify a type&#x27;s behavior (wrapping/non-wrapping, aliasing/non-aliasing, etc.), they force arbitrary behaviours that are impractical in most situations, break backward compatibility, and pretend that the previous behaviour was undefined (which means it is possible to write a compliant compiler that breaks this, even if no such compiler has ever been published).<br> <p> And they do not address real issues that need updating, like the interface with the linker.<br> </div> Thu, 27 May 2021 11:51:02 +0000