Null pointers and error paths
Posted Jun 6, 2024 12:29 UTC (Thu) by pm215 (subscriber, #98099) In reply to: Null pointers and error paths by Wol
Parent article: Removing GFP_NOFS
Posted Jun 6, 2024 13:37 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (18 responses)
I'm sure I remember something about being able to do that, whether resizing it to zero actually does a free instead, or something. It was 20 years ago, but I'm sure I remember thinking back in the day that you could achieve a free and assign null to the pointer in one operation. And I'm sure it was actually documented as working, even if nobody ever did it :-)
Cheers,
Posted Jun 6, 2024 15:36 UTC (Thu)
by paulj (subscriber, #341)
[Link] (17 responses)
Posted Jun 6, 2024 16:15 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (16 responses)
Just looked at the man page - I think it's a safe bet the version I used explicitly said it "frees and returns null", but the reality is it's implementation-dependent :-( Shame, would have been a nice fix for preventing double-frees if you could rely on it.
Cheers,
Posted Jun 6, 2024 17:22 UTC (Thu)
by intelfx (subscriber, #130118)
[Link] (12 responses)
Posted Jun 6, 2024 22:04 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (11 responses)
And they wonder why programmers are switching to other languages like Rust ...
Cheers,
Posted Jun 7, 2024 18:01 UTC (Fri)
by riking (guest, #95706)
[Link] (10 responses)
Posted Jun 7, 2024 18:13 UTC (Fri)
by Wol (subscriber, #4433)
[Link]
Not "your previous implementation-defined code can now be optimised away as undefined".
Cheers,
Posted Jun 8, 2024 8:47 UTC (Sat)
by farnz (subscriber, #17727)
[Link] (8 responses)
That sounds like a perfect case for implementation defined behaviour, without a set of constrained behaviours; the implementation has to document how it chooses to behave to be compliant, but has to stick to whatever behaviour it documents. I can see why you wouldn't want it to be unspecified behaviour, since (while you can constrain that to a set of allowed options), you generally want unspecified behaviour to be cases where there's a consistent compatible use, and room to do better if the implementation chooses a specific option.
Is there any discussion you can point me to that explains why it isn't implementation defined here?
Posted Jun 8, 2024 10:25 UTC (Sat)
by excors (subscriber, #95769)
[Link] (7 responses)
The key part is:
> Classifying a call to realloc with a size of 0 as undefined behavior would allow POSIX to define the otherwise undefined behavior however they please.
It won't be undefined behaviour to call realloc(ptr, 0) from C23 on a POSIX system, because POSIX already defines it. ("Undefined behaviour" is a recessive trait - if another document wants to define it, then that definition takes priority). Platform-independent code can't rely on the POSIX definition (since it may run on a non-POSIX platform), but it can't rely on any implementation-defined behaviour either, so that's no worse than how it's been since at least C99.
POSIX says realloc(ptr, 0) can either return NULL, or return a pointer to some allocated space of unknown size, with some extra rules about errno (which were never in the C standard). So you can't use it as an alternative to free() anyway - it may perform a new allocation, which you will leak.
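To make that concrete, here is a minimal sketch of getting "free and null the pointer in one step" without relying on whatever realloc(ptr, 0) does on a given platform. The struct and the names buf_init and buf_release are made up for illustration; they are not from ISO C, POSIX, or any library.

```c
#include <stdlib.h>

/* Hypothetical owner of a heap buffer, for illustration only. */
struct buf {
    char  *data;
    size_t len;
};

/* Allocate len bytes; returns 0 on success, -1 on allocation failure. */
static int buf_init(struct buf *b, size_t len)
{
    b->data = malloc(len);
    if (b->data == NULL) {
        b->len = 0;
        return -1;
    }
    b->len = len;
    return 0;
}

/* Free the storage and null the pointer in one place.
 * free(NULL) is a no-op, so calling this twice is harmless. */
static void buf_release(struct buf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
}
```

Unlike "ptr = realloc(ptr, 0)", this never leaves a fresh zero-sized allocation behind, because free() is the only call involved. It still does nothing for other copies of the pointer.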
Posted Jun 8, 2024 13:38 UTC (Sat)
by farnz (subscriber, #17727)
[Link] (6 responses)
Ah, so the rationale is that they want POSIX and other downstream standards to add definitions for more things that are UB in Standard C; effectively reducing Standard C to "the things that are portable across all implementations of C", and expecting POSIX C to be an extension of Standard C to "the things that are portable across all reasonable UNIX-like implementations of C".
Posted Jun 9, 2024 9:03 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (5 responses)
If the C standard says "the compiler can assume undefined behaviour cannot happen", then if it's defined as undefined surely the compiler can just delete the code as "can't happen"? And isn't that exactly the behaviour we've been moaning about for ages?
In which case any alternative definition never gets considered?
Cheers,
Posted Jun 9, 2024 10:07 UTC (Sun)
by excors (subscriber, #95769)
[Link] (4 responses)
It doesn't say that. It says:
> undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements
A compiler that assumes realloc(ptr, 0) cannot happen will still be a conforming implementation of the C standard, but it won't be a conforming implementation of POSIX. A compiler that implements the POSIX behaviour will be a conforming implementation of both. Any C compiler that targets POSIX will aim to conform with both standards, so it doesn't really matter which one defines the behaviour.
Incidentally, on non-POSIX systems, Microsoft says "realloc hasn't been updated to implement C17 behavior because the new behavior isn't compatible with the Windows operating system" (https://learn.microsoft.com/en-us/cpp/c-runtime-library/r...). N2464 says C17 changed the definition of realloc "to allow for the existing range of implementations", but evidently they failed since Microsoft believes it doesn't allow their behaviour.
I'm guessing C23 made it undefined because they couldn't work out a useful, unambiguous definition that would allow both the POSIX and Windows behaviours, and neither of those are going to change, so it was a waste of time - better to simply defer to the platform documentation.
Posted Jun 9, 2024 16:19 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (1 responses)
But that then leaves us wondering what on earth any cross-platform compiler such as gcc does, seeing as it can produce both POSIX and Windows binaries ... and given that Linux makes no claim whatsoever to support POSIX, that's giving the gcc guys carte blanche to just eliminate a realloc(ptr,0) as "does nothing". BAAADDDD ...
Cheers,
Posted Jun 10, 2024 8:18 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
A compiler like GCC can know what platform it's compiling code for (it has to know which CPU, and which ABI, after all, so extending that to which platform standard applies and allowing an override is a non-issue). It can thus support POSIX C when compiling for POSIX platforms (like glibc-based Linux), Windows C when compiling for Windows, and merely ISO C when compiling a freestanding binary that doesn't depend on a platform.
Note that GCC already has the mechanism you'd need for this in place as part of its support for multiple C dialects - it could have a POSIX/Windows/neither switch in there, too.
Posted Jun 9, 2024 16:25 UTC (Sun)
by Wol (subscriber, #4433)
[Link]
So doing a "ptr = realloc(ptr, 0)" instead of a free would prevent double frees or access-after-free, so long as (a) you're using a Windows-compliant compiler, and (b) you don't make copies of ptr. Surely that would make masses of sense as a simple way of defensive coding!
Cheers,
Posted Jun 10, 2024 10:40 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
So, reading the discussions a bit more, I get the sense that the ISO committee's goal is to move the meanings of undefined behaviour, unspecified behaviour and implementation-defined behaviour around a bit (in a way that's always been valid, but has been under-utilized by downstream standards like POSIX).
The ISO standard sets a common definition of C that all implementations of C must agree on, but allows a lot of latitude for downstream standards to tighten up the ISO definitions; for example, ISO says that "double" is a floating point number, so a downstream standard cannot repurpose "double" for integers represented using a pair of registers, but while ISO C says that the size of "char" in bits is implementation-defined and at least 8 bits, POSIX says that the size of "char" is always exactly 8 bits.
This is then an attempt to push downstream standards to tighten up ISO definitions when it comes to behaviours; it's already obvious to downstream standards that where something is implementation defined, a downstream standard can say "the implementation must define it this way", but it's not so obvious that where something is unspecified behaviour (one of a set of choices, no need to be consistent or to document which one as long as you choose from the set every time you encounter this construct) or undefined behaviour (the program can take any meaning if this construct is encountered) in ISO C, a downstream standard can make that defined, implementation-defined (choose and document a behaviour), or unspecified if it so desires, without conflicting with ISO C.
Taking an example that upsets a lot of people, ISO says that signed integer overflow is undefined behaviour; but they'd be very happy for POSIX to say that signed integer overflow is unspecified behaviour, and must be either saturating, two's complement wrap-around, wrap-around to zero (skipping the negative numbers completely), or delivery of a SIGFPE signal to the program. It'd then be on implementations to choose the behaviour that results in the most optimal program from those choices, assuming they claimed POSIX support.
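As an illustration of one of those choices, saturation can already be written in portable ISO C by checking before adding, so the overflowing addition is never evaluated. This is only a sketch; sat_add is a made-up name, not something POSIX or ISO C provides.

```c
#include <limits.h>

/* Hypothetical helper: add two ints, clamping to INT_MAX / INT_MIN
 * instead of overflowing. The range checks use only subtractions
 * that cannot themselves overflow. */
static int sat_add(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;   /* would overflow upwards */
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;   /* would overflow downwards */
    return a + b;         /* guaranteed to be in range here */
}
```

The cost is two extra comparisons per addition, which is the kind of overhead an implementation could avoid if a downstream standard let it pick the cheapest permitted behaviour for the target.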
Posted Jun 10, 2024 9:51 UTC (Mon)
by paulj (subscriber, #341)
[Link] (2 responses)
What I do is:
1. At the lower levels that need to deal directly with *alloc and free(), I have a wrapper around free() (possibly a macro) which takes a double-pointer to the caller's pointer. It can then null out the caller's pointer directly.
foo_t *foo_free(foo_t **foo) {
2. At a higher level, you need to encapsulate the low-level memory allocations into some coherent strategy to manage the lifetime of objects. Often some combination of:
Probably forgetting some strategies.
This is one of the most important and hardest parts to get right. Some languages have features to make it harder, even impossible, to free used objects. But that leads to many programmers in such languages not understanding the importance of lifetime management - which remains important even in such languages for performance.
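Going back to point 1, a self-contained sketch of the double-pointer free wrapper combined with a scope-exit cleanup might look like the following. Assumptions: the cleanup attribute is a GCC/Clang extension rather than ISO C, foo_t here is a placeholder struct, and foo_new takes an int purely for the sake of the example.

```c
#include <stdlib.h>

/* Placeholder object type, for illustration only. */
typedef struct foo {
    int value;
} foo_t;

static foo_t *foo_new(int value)
{
    foo_t *f = malloc(sizeof *f);
    if (f != NULL)
        f->value = value;
    return f;
}

/* Takes a pointer to the caller's pointer so it can null it out.
 * Safe to call repeatedly: free(NULL) is a no-op. */
static void foo_free(foo_t **foo)
{
    if (foo == NULL)
        return;
    free(*foo);
    *foo = NULL;
}

/* GCC/Clang extension: run X(&variable) when the variable leaves scope. */
#define attr_cleanup(X) __attribute__((__cleanup__(X)))

static int use_foo(int value)
{
    foo_t *foo attr_cleanup(foo_free) = foo_new(value);

    if (foo == NULL)
        return -1;          /* foo_free(&foo) still runs, harmlessly */
    return foo->value * 2;  /* result is computed before the cleanup runs */
}
```

The cleanup function's return value is ignored by the compiler, so this version returns void; every exit path from use_foo releases the object without an explicit call.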
Posted Jun 10, 2024 16:00 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (1 responses)
How come? It sounds like it typically doesn't work that way on Unix, but if MS define "realloc( ptr, 0)" to free the memory and return null, then "ptr = realloc( ptr, 0)" achieves exactly what I would like by that definition - it destroys the pointer at the same time as destroying the memory.
If you have something that then traps de-referencing a null pointer (or simply your code checks for a null pointer), then the likelihood of double-free, use-after-free, etc goes down.
Cheers,
Posted Jun 17, 2024 17:24 UTC (Mon)
by mrugiero (guest, #153040)
[Link]
Undefined, implementation defined, and unspecified behaviour
Pushing for downstream standards to define more
>
> Note 1 to entry: Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
Compiler choices when supporting two conflicting C standards
Downstream standards should define more behaviours (changing the split between US, ID, and UB)
__whatever_house_and_book_keeping ();
free (*foo);
*foo = NULL;
return NULL;
}
#define attr_cleanup(X) __attribute__ ((__cleanup__(X)))
....
{
foo_t *foo attr_cleanup (foo_free) = foo_new(ctxt);
...
}
a) Allocating a pre-determined number of objects, suitable for the problem being tackled. (Good for deterministic behaviour).
b) Careful alignment of entity lifetime with the structure of the algorithm being run
c) Reference counting
d) Hierarchical allocation management, in combination with one of the previous
e) Liveness checking [either at a low level by scanning for pointers (v rare in C/C++), or some more abstract, problem-domain + type specific check] and GC
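As one example from that list, strategy (c), reference counting, can be sketched in a few lines of C. All names here (obj_t, obj_new, obj_ref, obj_unref) are hypothetical, and the counter is not atomic, so this sketch assumes single-threaded use.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object, for illustration only. */
typedef struct obj {
    unsigned refs;
    int      payload;
} obj_t;

/* Create an object holding one reference, owned by the caller. */
static obj_t *obj_new(int payload)
{
    obj_t *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refs = 1;
        o->payload = payload;
    }
    return o;
}

/* Take an additional reference and return the same object. */
static obj_t *obj_ref(obj_t *o)
{
    if (o != NULL)
        o->refs++;
    return o;
}

/* Drop the reference held in *op and null the caller's pointer.
 * The object is freed when the last reference goes away. */
static void obj_unref(obj_t **op)
{
    obj_t *o;

    if (op == NULL || *op == NULL)
        return;
    o = *op;
    *op = NULL;
    if (--o->refs == 0)
        free(o);
}
```

Each holder nulls only its own pointer, so a second obj_unref through the same variable is a no-op, while other holders keep a valid object until they drop their references too.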
#define xfree(ptr) do { \
    free(ptr); \
    (ptr) = NULL; \
} while (0)