Documenting counted-by relationships in kernel data structures

By Jonathan Corbet
July 3, 2023

The C language is expressive in many ways, but it still does not have ways to express many of the relationships between fields in a data structure. That gap can be at least partially filled, though, if one is willing to create and use non-standard extensions. Support for one such extension, in the form of the __counted_by() macro, has been merged for the 6.5 kernel release, even though the compiler feature it depends on has not yet been finalized.

Flexible arrays are arrays, declared at the end of a structure, whose length is only known at run time:

    struct flex {
        int count;
        struct some_item items[];
    };

When a structure of this type is allocated at run time, the number of items to be stored within it will be known; enough memory will be allocated to hold an items array of the expected size. Normally such structures will include a field saying how long the array is; the count field in the above example could be used this way. But there is no way for the compiler (or any other tool) to know about the association between count and the length of items.
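As a minimal user-space sketch of that allocation pattern: the helper name flex_alloc() and the contents of struct some_item are assumptions made for illustration and do not come from the kernel. Kernel code would normally use an allocator like kzalloc() together with the struct_size() helper rather than open-coding the arithmetic.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type; the article does not define struct some_item. */
struct some_item {
    int value;
};

struct flex {
    int count;
    struct some_item items[];
};

/*
 * Hypothetical helper: allocate a struct flex with room for n items.
 * Returns NULL on a negative count, size overflow, or allocation failure.
 */
struct flex *flex_alloc(int n)
{
    struct flex *f;

    if (n < 0)
        return NULL;

    /* Guard against overflow in the size computation below. */
    if ((size_t)n > (SIZE_MAX - sizeof(*f)) / sizeof(f->items[0]))
        return NULL;

    /* Structure header plus n trailing elements, zero-initialized. */
    f = calloc(1, sizeof(*f) + (size_t)n * sizeof(f->items[0]));
    if (!f)
        return NULL;

    /* Nothing ties count to items; the programmer keeps them in sync by hand. */
    f->count = n;
    return f;
}
```

Note that the only thing linking count to the real length of items here is the assignment at the end of the function; nothing in the type itself records that link.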

Flexible arrays, by their nature, are particularly prone to a number of memory-safety bugs. It is, thus, not surprising that work has been ongoing for some time in the kernel-hardening community to clean up and regularize the code that deals with these arrays in the kernel. As of the 6.5 release, warnings will be generated for code that uses anything other than the standard notation to declare a flexible array (array[] rather than the once-common array[0] or even array[1]). But flexible arrays are still opaque to code that wants to check whether a given reference falls within or outside of the allocated memory, for the simple reason that the actual size of the array is determined at run time and is not known to the compiler or other tools.
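One reason the older notations cause trouble can be seen in the size arithmetic. The following sketch compares the one-element idiom with the standard form; the structure and function names are invented for illustration, and the helpers omit overflow checks for brevity.

```c
#include <stddef.h>

/* Old idiom: a one-element array standing in for a flexible array. */
struct old_style {
    int count;
    int items[1];
};

/* Standard C99 flexible array member. */
struct new_style {
    int count;
    int items[];
};

/*
 * Bytes needed for n elements (n >= 1) with the old idiom. The
 * structure already contains one element, so the caller must
 * remember to subtract it.
 */
size_t old_style_size(size_t n)
{
    return sizeof(struct old_style) + (n - 1) * sizeof(int);
}

/* Bytes needed for n elements with a true flexible array member. */
size_t new_style_size(size_t n)
{
    return sizeof(struct new_style) + n * sizeof(int);
}
```

With the old idiom, sizeof() includes a phantom element, so every size calculation needs an easily forgotten adjustment, and the compiler sees a declared bound of one that does not match how the array is actually used.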

That information usually is available, though; it's just that the compiler does not know where to find it. In an attempt to fill in that information, requests were filed with both the GCC and LLVM communities to support a new variable attribute to indicate which structure field contains the length of a flexible array. Using this attribute, the above structure could be declared as:

    struct flex {
        int count;
        struct some_item items[] __attribute__((element_count("count")));
    };

Here, the new element_count attribute says that the length of items (in elements, not bytes) is stored in the field count in the same structure. The compiler can use that information to calculate the size of the array; that, in turn, can be used to provide run-time bounds checking for accesses to the array. The result should be a kernel that is a little harder to exploit and better documentation of how the structure's fields relate to each other.
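To make the idea concrete, here is a hand-written approximation of the kind of check that the annotation would let a compiler insert automatically. The helper flex_item() and the layout of struct some_item are hypothetical; this is not the code that GCC or Clang actually generates, just a sketch of the same logic.

```c
#include <stddef.h>

/* Hypothetical element type; the article does not define struct some_item. */
struct some_item {
    int value;
};

struct flex {
    int count;
    struct some_item items[];
};

/*
 * Hypothetical helper: return a pointer to items[idx], or NULL if idx
 * falls outside the range described by the count field.
 */
struct some_item *flex_item(struct flex *f, int idx)
{
    if (!f || idx < 0 || idx >= f->count)
        return NULL;
    return &f->items[idx];
}
```

The difference with the attribute is that the association between count and items is stated once, in the declaration, rather than being re-implemented (or forgotten) at each access.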

In the kernel, this new attribute is hidden behind a macro:

    # define __counted_by(member) __attribute__((__element_count__(#member)))

This macro makes the code more concise, which is nice, but it is needed for another reason as well. The actual naming of the element_count attribute is not yet set in stone, and might well change (probably to counted_by) before compilers with support for it are released. Once the name settles down, the macro can be changed to match.
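The indirection through a macro can be sketched in user space as follows. This example assumes that the attribute ends up being named counted_by and taking a bare member name, which the discussion above suggests but does not guarantee; the macro name FLEX_COUNTED_BY and the function flex_sum() are invented here to avoid clashing with reserved identifiers. On compilers that lack the attribute, the macro expands to nothing and the annotation serves as documentation only.

```c
#include <stddef.h>

/* __has_attribute is provided by GCC and Clang; treat it as absent elsewhere. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/*
 * Assumption: the final attribute is spelled counted_by(member).
 * If the name changes, only this definition needs to be updated.
 */
#if __has_attribute(counted_by)
# define FLEX_COUNTED_BY(member) __attribute__((counted_by(member)))
#else
# define FLEX_COUNTED_BY(member)
#endif

/* Hypothetical element type; the article does not define struct some_item. */
struct some_item {
    int value;
};

struct flex {
    int count;
    struct some_item items[] FLEX_COUNTED_BY(count);
};

/* Hypothetical example user: add up all values, using count as the bound. */
int flex_sum(const struct flex *f)
{
    int total = 0;

    for (int i = 0; i < f->count; i++)
        total += f->items[i].value;
    return total;
}
```

Code using the structure does not change at all; the annotation lives entirely in the declaration, which is what makes it practical to add to a large code base ahead of compiler support.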

Kees Cook, who has done the work of supporting this attribute in the kernel, is ready to go with the next step: annotating over 150 files with the new attribute. Those are the relatively easy cases, found with the Coccinelle tool; others are sure to follow.

Christoph Hellwig, while welcoming the feature in general, worried that it was being introduced too soon:

But this feels a bit premature to me, not only due to the ongoing discussions on the syntax, but more importantly because I fear it will be completely misused before we have a compiler actually supporting available widely enough that we have it in the usual test bots.

Cook answered that he has test systems running with the compiler patches and should be able to catch any incorrect annotations that show up in the near future. Meanwhile, though, he wants to get started marking up the code:

This has been a pain point for years as we've been doing flexible array conversions, since when doing the work it usually becomes clear which struct member is tracking the element count, but that information couldn't be reliably recorded anywhere. Now we can include the annotation (which is the really important part). [...]
But I really want to start capturing the associations _now_, and get us all into the habit of doing it, and I want it to be through some kind of regular syntax (now that there are patches to both GCC and Clang that can validate the results), not just comments.

That reasoning was clearly enough for Linus Torvalds, who pulled this change into the mainline during the 6.5 merge window. This new macro is another example of the kernel community extending the version of the C language it uses in an attempt to address some of C's legendary safety issues. We should all gain a slightly more secure and better documented kernel as a result.

Posted Jul 3, 2023 14:24 UTC (Mon) by emersion (subscriber, #125762)

I wonder if this could be used for "regular" arrays as well. e.g.

    struct s {
        int count;
        struct some_item *items __attribute__((element_count("count")));
    };

Posted Jul 3, 2023 17:31 UTC (Mon) by atnot (subscriber, #124910)

There is such a project being discussed by llvm: https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-...

Posted Jul 3, 2023 16:40 UTC (Mon) by josh (subscriber, #17465)

This attribute would also be useful for bindings to C from other languages.

Posted Jul 3, 2023 18:21 UTC (Mon) by Paf (subscriber, #91811)

This isn’t immediately obvious to me, could you say more about it?

Posted Jul 3, 2023 19:44 UTC (Mon) by iabervon (subscriber, #722)

These other-language bindings are generally done by programs that use a C parser to analyze a C library's API and produce C code that would call it correctly. When there's a list of records, this currently requires recognizing a naming convention or something in order to interface between a language with built-in data structures with embedded lengths and C. This attribute would tell you the correct answer directly.

Posted Jul 3, 2023 20:34 UTC (Mon) by josh (subscriber, #17465)

Binding generators for other languages read C header files, but the C headers (without this attribute) don't say how big a flexible array member is, making the bindings necessarily unsafe. With this attribute, binding generators could generate safer bindings, because they know how long the array is.

Posted Jul 4, 2023 5:00 UTC (Tue) by Paf (subscriber, #91811)

Hmm, OK, thanks!