
Silva: How to use the new counted_by attribute in C (and Linux)

Gustavo A. R. Silva describes the path to safer flexible arrays in the kernel, thanks to the counted_by attribute supported by Clang 18 and GCC 15.

There are a number of requirements to properly use the counted_by attribute. One crucial requirement is that the counter must be initialized before the first reference to the flexible-array member. Another requirement is that the array must always contain at least as many elements as indicated by the counter.
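The two requirements quoted above can be sketched in a small user-space example. This is only an illustration under stated assumptions, not code from Silva's post: the `COUNTED_BY` macro, `struct sample_buf`, and the three helper functions are hypothetical names, and the macro expands to nothing on compilers that lack the attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Older compilers may lack __has_attribute entirely. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Hypothetical wrapper: expands to nothing if counted_by is unsupported. */
#if __has_attribute(__counted_by__)
# define COUNTED_BY(member) __attribute__((__counted_by__(member)))
#else
# define COUNTED_BY(member)
#endif

/* Hypothetical structure: count describes how many items[] are valid. */
struct sample_buf {
    size_t count;
    int items[] COUNTED_BY(count);
};

struct sample_buf *sample_buf_alloc(size_t n)
{
    struct sample_buf *buf;

    /* Reject sizes that would overflow the allocation arithmetic. */
    if (n > (SIZE_MAX - sizeof(*buf)) / sizeof(buf->items[0]))
        return NULL;

    buf = malloc(sizeof(*buf) + n * sizeof(buf->items[0]));
    if (!buf)
        return NULL;

    /* Requirement 1: set the counter before the first reference to items[]. */
    buf->count = n;
    for (size_t i = 0; i < n; i++)
        buf->items[i] = (int)i;

    return buf;
}

/*
 * Requirement 2: the array must hold at least count elements. Lowering
 * the counter keeps that true; raising it would need a larger allocation.
 */
void sample_buf_truncate(struct sample_buf *buf, size_t n)
{
    if (n < buf->count)
        buf->count = n;
}

int sample_buf_get(const struct sample_buf *buf, size_t i)
{
    assert(i < buf->count); /* explicit check, independent of compiler support */
    return buf->items[i];
}
```

The ordering inside `sample_buf_alloc()` is the point of the sketch: the counter is assigned first, and only then is the flexible array touched.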

See also: this article from 2023.



Culture

Posted Jul 17, 2024 14:54 UTC (Wed) by python (guest, #171317) [Link] (12 responses)

It seems that Rust's culture surrounding compile-time sanity checking is rubbing off (in a good way) onto the kernel folks. I would not be surprised if the backend work for this sort of thing was implemented in LLVM long ago explicitly to support Rust's compile-time checking features (but I am just speculating).

Culture

Posted Jul 18, 2024 0:02 UTC (Thu) by wahern (subscriber, #37304) [Link]

AFAIU, Rust's array bounds checks are implemented primarily in the Rust array implementation (often literally as regular unsafe {} Rust code), and to LLVM look much the same as manually written checks in C or C++. IOW, Rust's bounds checking is less about typing and more about providing bounds-checked array implementations, similar to C++. counted_by, by contrast, is a kind of dependent typing and necessarily implemented lower in the compiler stack, albeit still in the language front-ends--e.g. clang, not LLVM proper.

What influenced this addition?

Posted Jul 18, 2024 2:48 UTC (Thu) by milesrout (subscriber, #126894) [Link] (9 responses)

I didn't think Rust supported flexible array members. Does it?

This addition looks to me more like a natural extension of the work done around _FORTIFY_SOURCE etc. I believe these significantly predate Rust.

What influenced this addition?

Posted Jul 18, 2024 6:39 UTC (Thu) by pbonzini (subscriber, #60935) [Link] (8 responses)

Sort of: you can use a zero-sized array and access it with runtime bounds checks:
use std::ops::Index;
use std::slice;

struct S {
    n: usize,
    array: [u32; 0]
}

impl Index<usize> for S {
    type Output = u32;
    fn index(&self, index: usize) -> &u32 {
        unsafe {
            &slice::from_raw_parts(self.array.as_ptr(), self.n)[index]
        }
    }
}
but there's no equivalent of counted_by. I agree that it's more similar to _FORTIFY_SOURCE.

What influenced this addition?

Posted Jul 19, 2024 6:00 UTC (Fri) by bluss (guest, #47454) [Link] (7 responses)

Some more care is needed to do it in Rust in a way that the language agrees with w.r.t. soundness rules; unfortunately, it's not this easy.

The `unsafe` keyword is used to say "I've ensured myself that Rust's expectations and rules are followed"; unfortunately, it's not used for "anything goes".

What influenced this addition?

Posted Jul 19, 2024 6:06 UTC (Fri) by mb (subscriber, #50428) [Link] (6 responses)

What else is needed (apart from sound allocation of the whole thing of course)?
I think the example will panic on an out of bounds access.

But I think flexible array members are not really needed in Rust. Generics should cover many use cases.

What influenced this addition?

Posted Jul 19, 2024 17:56 UTC (Fri) by bluss (guest, #47454) [Link] (5 responses)

What I see are these:

1. Rust struct field order is unspecified by default (the compiler may reorder fields). Add #[repr(C)] to ensure that array really is the last field, which is necessary for this to work.
2. Pointer provenance rules/Stacked Borrows rules: any pointer or reference derived from S.array is valid for exactly 0 elements, so we can't actually write code that indexes (computes a pointer) based on this field, not even using raw pointers. Miri will point out this error and say it's Undefined Behaviour. (That covers creating the pointer, because of the inbounds rule of LLVM's getelementptr, as well as reading from the pointer and writing to it.)

What influenced this addition?

Posted Jul 19, 2024 19:47 UTC (Fri) by mb (subscriber, #50428) [Link] (4 responses)

Thanks for explaining. I learnt something today.

What influenced this addition?

Posted Jul 20, 2024 12:13 UTC (Sat) by bluss (guest, #47454) [Link] (3 responses)

It doesn't really make me happy to see the complexity of a "language with UB" reproduced in Rust, but it's sort of like that, for the `unsafe` block enclosed part of the language.

Plain flexible-length allocations can use `Box<[T]>`. For the true flexible array member use case, some non-native library solution is needed, and there seem to be crates available that implement this. I think I would look at https://docs.rs/thin-dst/ for this, but I don't know it well.

What influenced this addition?

Posted Jul 22, 2024 17:54 UTC (Mon) by ms-tg (subscriber, #89231) [Link] (2 responses)

Q: Is there a semantic difference (a difference in _intended_ meaning and use) between the C flexible array pattern here and simply having a `Box<[T]>` boxed slice in a Rust struct?

If so, is there a way to return a `Box<[T]>` in a general way to use in Rust code that is handed a C flexible array, that is close to zero cost?

(these are genuine questions, I don't know the answer)

What influenced this addition?

Posted Jul 22, 2024 21:29 UTC (Mon) by atnot (guest, #124910) [Link] (1 responses)

No, but also yes.

I'll start with the no: Box is just a heap allocation, and [T] is just a contiguous set of elements. So a Box<[T]> compiles down to a pointer to a variable-sized heap allocation, which is different.

The reason it is done this way is that [T] is one of the few types in Rust that are "Unsized", that is, do not have a size known at compile time. The things you can do with unsized types in safe Rust are extremely limited. They can only exist behind a pointer; you can't create them, put them on the stack, or do any of the other things you can do in C. Really the only thing you can do is put them in a Box to get a nice, constant-size type again, hence Box<[T]>. You will never see a raw [T] when writing normal Rust because of how useless it is.

So no, it's not a direct replacement, unless your flexible array is just {size, buffer[]}. Anything else doesn't really translate to safe Rust. However, this does represent the vast majority of cases where flexible array members are used in C in practice. There are a few exceptions, but it's generally not a clear performance win and they often use funky bit-stashing tricks that counted_by wouldn't understand anyway. So it's close enough in practice.

However, while Box<[T]> might only be equivalent in some cases, [T] does exist and can actually be the last member of a struct to create a dynamically sized struct, exactly like in C. In fact, to my knowledge it was added for C FFI. It won't be pretty and it will be highly unsafe. But you can totally do it, if you wish to.

Why not in safe Rust? Probably could be done, I just don't think anyone cares enough.

What influenced this addition?

Posted Jul 24, 2024 22:03 UTC (Wed) by rodrigorc (guest, #89475) [Link]

There are a few new features that will hopefully be stabilized soon and that will make the "unsized struct with unsized last field" trick quite workable.

In particular the `ptr_metadata` feature is quite handy. This sample code runs in nightly Miri without warnings:

#![feature(ptr_metadata)]
use std::ops::Index;
use std::slice;

#[repr(C)]
struct S {
    n: usize,
    array: [u32]
}

impl Index<usize> for S {
    type Output = u32;
    fn index(&self, index: usize) -> &u32 {
        unsafe {
            &slice::from_raw_parts(self.array.as_ptr(), self.n)[index]
        }
    }
}

fn main() {
    unsafe {
        // Fake C code:
        // Align for the usize header field, not just for the u32 elements
        let layout = std::alloc::Layout::from_size_align(100, std::mem::align_of::<usize>()).unwrap();
        let ptr = std::alloc::alloc_zeroed(layout);
        ptr.cast::<usize>().write(10);
        
        // Use the flex array:
        // First with size=0, a distinct Sized type for the header might look nicer
        let n = (*std::ptr::from_raw_parts::<S>(ptr, 0)).n;
        let s = &*std::ptr::from_raw_parts::<S>(ptr, n);
        for i in 0 .. s.n {
            println!("{} {}", i, s[i]);
        }
        
        // More fake C code
        std::alloc::dealloc(ptr, layout);
    }
}

Playground link.

Culture

Posted Jul 25, 2024 2:31 UTC (Thu) by mrugiero (guest, #153040) [Link]

I'd say it's the other way around. The kernel has a long tradition of using and developing tooling to help them overcome language gotchas. Way before Rust was a thing. Rust is a natural evolution of that mindset: rather than being _extra_ tooling, the default tools should enforce correctness as far as possible/practical.

Should “count” be quoted?

Posted Jul 18, 2024 3:08 UTC (Thu) by songmaster (subscriber, #1748) [Link] (1 responses)

In Jon’s 2023 article he showed a different attribute name and the count member name in quotes inside the attribute:

__attribute__((element_count("count")))

The online GCC and Clang documentation mentions only counted_by, with no sign of the quotes, so did the compilers agree on a better name while implementing this?

Should “count” be quoted?

Posted Jul 18, 2024 4:54 UTC (Thu) by Tarnyko (guest, #90061) [Link]

This extract of the article will likely answer your question:

#if __has_attribute(__counted_by__)
# define __counted_by(member) __attribute__((__counted_by__(member)))
#else
# define __counted_by(member)
#endif
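To illustrate the point, here is a hedged sketch of how such a macro might be applied in user space. `struct packet`, `packet_new()`, and `packet_sum()` are hypothetical names, not taken from the article, and the `#ifndef` guards are an addition so that the sketch does not clash with a definition supplied elsewhere. Note that the member name is passed as a bare identifier, not as a string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Older compilers may lack __has_attribute entirely. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Same shape as the extract above, guarded against an existing definition. */
#ifndef __counted_by
# if __has_attribute(__counted_by__)
#  define __counted_by(member) __attribute__((__counted_by__(member)))
# else
#  define __counted_by(member)
# endif
#endif

/* Hypothetical structure: len counts the bytes stored in payload[]. */
struct packet {
    unsigned int len;
    unsigned char payload[] __counted_by(len); /* identifier, no quotes */
};

struct packet *packet_new(const unsigned char *data, unsigned int len)
{
    struct packet *p = malloc(sizeof(*p) + len);

    if (!p)
        return NULL;

    p->len = len;                       /* counter first ... */
    if (len)
        memcpy(p->payload, data, len);  /* ... then the array */
    return p;
}

unsigned int packet_sum(const struct packet *p)
{
    unsigned int sum = 0;

    assert(p != NULL);
    for (unsigned int i = 0; i < p->len; i++)
        sum += p->payload[i];
    return sum;
}
```

On a compiler without the attribute the macro expands to nothing, so the same source still builds, just without the extra bounds information.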

Strings

Posted Jul 18, 2024 15:05 UTC (Thu) by mirabilos (subscriber, #84359) [Link]

Is there an equivalent attribute for when the flexible array member is supposed to be a NUL-terminated string?


Copyright © 2024, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds