Scope-based resource management for the kernel [LWN.net]

Scope-based resource management for the kernel

Posted Jun 16, 2023 8:03 UTC (Fri) by mbunkus (subscriber, #87248) [Link] (10 responses)

I'm curious. In all my 30 years or so of writing C++ code I've never felt the need or even had the idea to have destructors with parameters (or more generally, to have cleanup code that requires parameters to work correctly). Can you give me some examples when that might be necessary? Thanks.

Scope-based resource management for the kernel

Posted Jun 16, 2023 9:53 UTC (Fri) by jengelh (guest, #33263) [Link] (7 responses)

Any extra arguments you would want to pass to a destructor function (i.e. besides the pointer to the object being destroyed) can be transformed to/from a model where those arguments are kept around as members of the object. Both approaches make an appearance in the POSIX environment:

1. void *p = mmap(..., z, ...); /* workworkwork */; munmap(p, z);
2. void *p = malloc(z); /* workworkwork */; free(p);

It's a tradeoff. #2 needs memory in each object to store the 'z' used during construction, while #1 needs CPU instructions to load 'z' from some memory location into the function call.

Scope-based resource management for the kernel

Posted Jun 16, 2023 10:12 UTC (Fri) by mbunkus (subscriber, #87248) [Link] (1 responses)

Yeah, I know, but I was explicitly asking about C++, not C, as ibukanov's statement was:

> The latter is even the problem in C++ where a destructor cannot be called with extra arguments.

In C++ both of your examples would not be a problem as you'd encapsulate the knowledge in types, e.g. using std::shared_ptr/std::unique_ptr for ownership management, a custom allocator for malloc/free and a custom thin wrapper class around mmap/munmap that stores z as a member.

That's why I asked & why I'm still curious of situations where "embed required arguments in thin wrapper class" isn't working.

Scope-based resource management for the kernel

Posted Jun 17, 2023 9:28 UTC (Sat) by jengelh (guest, #33263) [Link]

Fewer things are implicit in C. A wrapper, as thin as it might be, may need more lines of source code than invoking a (C) destructor function with multiple arguments a few times.

Scope-based resource management for the kernel

Posted Jun 16, 2023 12:23 UTC (Fri) by gioele (subscriber, #61675) [Link]

> 1. void *p = mmap(..., z, ...); /* workworkwork */; munmap(p, z);
> 2. void *p = malloc(z); /* workworkwork */; free(p);
>
> It's a tradeoff. #2 needs memory in each object to store the 'z' used during construction, while #1 needs CPU instructions to load 'z' from some memory location into the function call.

For #1 there is another disadvantage: you also need to pass `z` to all functions that may destroy (or may call functions that destroy) `p`.

Scope-based resource management for the kernel

Posted Jun 17, 2023 3:49 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

In modern C++ you can just use a scope guard with a small lambda function to capture the needed variables.

Scope-based resource management for the kernel

Posted Jun 22, 2023 19:47 UTC (Thu) by kreijack (guest, #43513) [Link] (2 responses)

> Any extra arguments you would want to pass to a destructor function (i.e. besides the pointer to the object being destroyed) can be transformed to/from a model where those arguments are kept around as members of the object. Both approaches make an appearance in the POSIX environment:

> 1. void *p = mmap(..., z, ...); /* workworkwork */; munmap(p, z);
> 2. void *p = malloc(z); /* workworkwork */; free(p);

> It's a tradeoff. #2 needs memory in each object to store the 'z' used during construction, while #1 needs CPU instructions to load 'z' from some memory location into the function call.

I am not sure that these can be called 'destructor'; a destructor is a way to release automatically the resource. If you have to call explicetely it is not a destructor but is a 'classic' resource releasing.

If the releasing is automatic, you cannot forgot to do that.

Instead if you have to pass a parameter to a destructor you are introducing another potential mistake: what if in your example you call munmap(p, y) (note y as last argument) ?

Using a destructor, or __attribute__((__cleanup__)) are not magic bullet that solve all the issues. A more interesting example is what if two abject are linked and have to be released together (i.e. explicitly). A destructor cannot solve this kind of issue, because it assume that the objects are independent each others, and the destructorS can operate independently.

Scope-based resource management for the kernel

Posted Jun 23, 2023 4:55 UTC (Fri) by ibukanov (subscriber, #3942) [Link] (1 responses)

It is not classic resource releasing. The compiler verifies that a destructor with extra arguments is called exactly at places in the code where an automatic destructor is called.

If there are several things that need to be destructed, then the compiler still inserts automatically calls to parameter-less destructors before or after the explicit call in the reverse order of construction.

Scope-based resource management for the kernel

Posted Jun 24, 2023 8:31 UTC (Sat) by kreijack (guest, #43513) [Link]

> It is not classic resource releasing. The compiler verifies that a destructor with extra arguments is called exactly at places in the code where an automatic destructor is called.

> If there are several things that need to be destructed, then the compiler still inserts automatically calls to parameter-less destructors before or after the explicit call in the reverse order of construction.

This seems more error prone than any other solution. A mix of actions of the developers and the compiler. Still doesn't solve the issue of calling a destructor with a wrong parameter.

Scope-based resource management for the kernel

Posted Jun 16, 2023 10:34 UTC (Fri) by vegard (subscriber, #52330) [Link]

I tried converting some kernel driver C code to C++ once and I don't remember all the details but it was something like a network driver that had two DMA ring buffer objects. Destroying those buffer objects required the buffer size to be passed in (presumably to do some unmapping operation). In normal object-oriented code, those buffer objects would know how to destroy themselves, and the device object would just tell them to do that. However, because they were both the same size, that size was stored in the device struct rather than in the buffer struct (presumably as an optimization -- why store the size in 2 or 3 places when one is sufficient?). It meant that the buffers didn't know how to destroy themselves, as that required information from the device struct. If we'd had destructors with parameters then the device destructor could have passed the size in and called the buffer destructors with the size parameter.

There is probably a more elegant example, but the general idea is "information required to destroy an object comes from outside the object itself".

Scope-based resource management for the kernel

Posted Jun 16, 2023 12:01 UTC (Fri) by ibukanov (subscriber, #3942) [Link]

It is about performance and ownership.

Storing extra fields in the object just for accessing in the destructor will lead to extra memory load that the compiler does not typically optimize away.

Then if one has a unique ptr to a resource manager, one does not want to store that in the object. A workaround is shared_ptr or raw pointers, but they do not reflect the ownership model.

Scope-based resource management for the kernel

Posted Jun 16, 2023 9:28 UTC (Fri) by jengelh (guest, #33263) [Link] (1 responses)

>One bans any kind of an early return

That is the "structured programming" paradigm, and the downside is that - unless you split code off to separate functions - the nesting level can go deep quickly (i.e. right side edge of the editor), violating other coding style paradigms. The lack of an early exit also means you have to mentally keep track of all the branches and conditions to reach a particular line - quickly exceeding the limit of ~7 objects that the human short-term memory can hold. That's all why GNU coding style is unwieldly. [Cf. e.g. coreutils:src/ls.c:do_statx.]

Scope-based resource management for the kernel

Posted Jun 16, 2023 12:05 UTC (Fri) by ibukanov (subscriber, #3942) [Link]

This is not a structured programming. It is about replacing any early return with goto to the cleanup block shared by all code paths.

Scope-based resource management for the kernel

Posted Jun 16, 2023 18:09 UTC (Fri) by error27 (subscriber, #8346) [Link] (4 responses)

Since 2010, I have fixed or reported more kernel error handling bugs than anyone else. It's my thing.

This kind of One Exit Style is doesn't work. It doesn't prevent any bugs. People who are going to forget to unlock are going to forget regardless of what style you use.

There is no upside and there are some downsides to the One Exit Style. It hurts readability, because now you have to scroll to the bottom of the function to see what "goto out;" does where a "return -EINVAL;" is obvious. The second downside is that it introduces forgot to set the error code bugs. People think error codes are a minor thing but returning success on an error path is going to lead to a crash of some sort.

Scope-based resource management for the kernel

Posted Jun 17, 2023 3:24 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

> what "goto out;" does where a "return -EINVAL;" is obvious

res = -EINVAL; goto out.

Scope-based resource management for the kernel

Posted Jun 17, 2023 6:49 UTC (Sat) by ibukanov (subscriber, #3942) [Link] (2 responses)

Also, if res is not initialised by the time the code reaches the out: label, then modern compilers will warn about it.

Scope-based resource management for the kernel

Posted Jun 17, 2023 11:34 UTC (Sat) by error27 (subscriber, #8346) [Link]

The kernel disables GCC's uninitialized variable warning because it had too many false positives. We use Clang and Smatch (my static checker) to find uninitialized variables but that doesn't really help with missing error code bugs. Normally the buggy code looks like:

ret = frob();
if (ret)
goto free_thing;

if (val > limit)
goto free_thing;

In Smatch, I consider that ret is set intentionally to zero if the "ret = " assignment is within 5 lines of the goto. But this is kind of a new rule and some people think you should be able to tell it's intentional from the context.

int ret = 0;

/* twenty lines of code */

if (on == ON)
goto done;

Scope-based resource management for the kernel

Posted Jun 19, 2023 9:15 UTC (Mon) by geert (subscriber, #98403) [Link]

modern = gcc-4.1? ;-)

That's the reason I kept on compiling kernels with gcc-4.1 for a while, even after the bar was raised to gcc-4.6...

Scope-based resource management for the kernel

Posted Jul 13, 2023 21:45 UTC (Thu) by ksandstr (guest, #60862) [Link]

Alternatively one can place resource acquisition and release in a function, and the operation on those resources in a second function, which is then at liberty to early-exit as much as it jolly pleases. A single-caller static function will be inlined by the compiler, yielding much the same result as a block of "Enomem: ret = -ENOMEM; goto cleanup;" lines.

More related to the automatic cleanup as discussed in the article, I have to wonder whether the utility of autocleanup is commensurate with the number of possible errors in usage that these structures harbour, and the length of time it'll take before kernel review is up to the task of catching them as well as mistakes in non-compiler cleanup. Forgetting the __highly__((__underscored__)) mess of attributes, which must go after each variable declaration[0] where autocleanup is desired, is just the tip of the iceberg. Could there please be some kind of a linter program to highlight suspicious cases such as "return ptr;" when "return_ptr(ptr);" was intended, so that this glass tower doesn't become a source of the next dozen years' worth of exploitable use-after-frees? A cleanup sanitizer pass in GCC perhaps?

To contrast, in Ada[1] the equivalent of an autocleanup property is associated with a type by specifying it as an extension of a "controlled" base type, and is thereafter transparent[2] to the programmer whether such controlled objects are passed around by copy, out-parameter, array slice, or whatever. This entirely eliminates usage errors in cases where the object is treated as though it contained no pointers, if its implementations of the "dispose" and "post-copy rejigger" methods are correct. Seeing that the amount of diddle involved in approximating something similar in C is about the same, only to give up some of C's flexibility and transparency, I have to wonder if this metaprogramming exercise is worth using "in anger".

[0] which appears to restrict variable declarations to one per line, not unlike the coward's maladaptation to C declaration-follows-use pointer syntax.
[1] please note that this is not in support of its use in the kernel; far from it.
[2] in the sense of "invisible but tangible".