LWN: Comments on "Tracking trust with Rust in the kernel"

Different kinds of validation?

farnz — Thu, 25 Sep 2025 09:16:14 +0000

Untrusted deals with the problem of "this is a valid HTML string that preserves soundness-relevant invariants and has the correct structure for HTML, but the contents of this string may not meet other invariants", like "does not contain the English word 'Administrator'".

So Microsoft® notice:: please update your credentials in <a href="https://phishing-site.example.com">our new identity system</a> is a valid HTML string, preserving all the invariants you need preserved (balanced tags, valid tags, entity escaping etc). However, if it came from a template file, it's untrusted (in the sense of Untrusted<&HtmlStr>), and needs validation before you treat it in the same way as HTML generated by your application.

Contrast impl TryFrom<&str> for HtmlStr<'_>, where you're saying that the input is a string, and it might, or might not, be valid HTML, but you need to do work to confirm that it is valid. Different problem - "this might not be valid HTML", as opposed to "this is definitely valid HTML, but the source might not be trustworthy".
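
The two checks can be put side by side in a minimal sketch. Everything here is an assumption made for illustration: HtmlStr, NotHtml, Untrusted, validate_with and no_links are hypothetical names rather than the patch set's API, the sketch wraps an owned HtmlStr instead of a reference, and the structural check is a toy that only counts angle brackets.

```rust
use std::convert::TryFrom;

/// Hypothetical newtype: a string already checked to be structurally valid HTML.
#[derive(Debug)]
pub struct HtmlStr<'a>(&'a str);

impl<'a> HtmlStr<'a> {
    pub fn as_str(&self) -> &'a str { self.0 }
}

#[derive(Debug, PartialEq)]
pub struct NotHtml;

impl<'a> TryFrom<&'a str> for HtmlStr<'a> {
    type Error = NotHtml;

    // Structural question: "might this not be HTML at all?"
    // Toy placeholder for a real parser: only compares '<' and '>' counts.
    fn try_from(s: &'a str) -> Result<Self, NotHtml> {
        if s.matches('<').count() == s.matches('>').count() {
            Ok(HtmlStr(s))
        } else {
            Err(NotHtml)
        }
    }
}

/// Hypothetical wrapper: the value is well-formed, but its source is not trusted.
pub struct Untrusted<T>(T);

impl<'a> Untrusted<HtmlStr<'a>> {
    pub fn new(v: HtmlStr<'a>) -> Self { Untrusted(v) }

    /// Trust question: hand out the inner value only if `policy` accepts it.
    pub fn validate_with(self, policy: impl Fn(&str) -> bool) -> Option<HtmlStr<'a>> {
        if policy(self.0.as_str()) { Some(self.0) } else { None }
    }
}

/// Example policy (an assumption): untrusted templates may not contain links.
pub fn no_links(s: &str) -> bool { !s.contains("<a ") }
```

With these definitions a string such as <a href="x">y</a> passes try_from but is rejected by validate_with(no_links), which is the second kind of failure described above.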

Different kinds of validation?

taladar — Wed, 24 Sep 2025 07:46:30 +0000

So you are talking about e.g. a &str that you think is HTML but you have to parse it to validate that it has the correct structure and preserves invariants?

Different kinds of validation?

farnz — Tue, 23 Sep 2025 09:20:31 +0000

That's a related problem, solved already (as you've shown) by the newtype pattern.

Untrusted<T> is closer to the way &str relates to &[u8]; a string slice is a byte slice with the additional promise on top that it's valid UTF-8. In this analogy, Untrusted<T> is to T as &[u8] is to &str; you may "know" that this "should" be a safe string, but you have to actually validate it to use it (like you would with str::from_utf8 if you were dealing with byte slices).
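
The byte-slice half of the analogy can be shown with the standard library alone. This is only a sketch of the &[u8] to &str step; the function name describe is made up for illustration, and the role played here by str::from_utf8 is the one a validation function would play for Untrusted<T>.

```rust
use std::str;

/// Reports whether `bytes` can be upgraded from `&[u8]` to `&str`.
/// `describe` is a hypothetical helper, not part of any library.
pub fn describe(bytes: &[u8]) -> String {
    // `str::from_utf8` is the validation step: it either returns a `&str`
    // borrowing the same bytes, or an error saying where validity ended.
    match str::from_utf8(bytes) {
        Ok(s) => format!("valid: {}", s),
        Err(e) => format!("invalid after {} bytes", e.valid_up_to()),
    }
}
```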

Different kinds of validation?

taladar — Tue, 23 Sep 2025 08:03:55 +0000

What I would really want is more along the lines of converting any &str to SafeHTMLString, SafeSQLString,... and then have the templating syntax only allow the latter types. I also don't see why that conversion couldn't be done with TryFrom. No real need for generics here either since the various types of safe string have nothing to do with each other and neither does the question if they exist for any given input type.

Different kinds of validation?

farnz — Mon, 22 Sep 2025 09:34:45 +0000

The challenge is that you want to model "convert Untrusted<&str> to Untrusted<HTML> or Untrusted<SQL>", so that what you're passing round is always an Untrusted<T>, the conversions are zero-cost, the validations are cheap, and you can use a newtype like DisplayName that requires that something validates as SQL, HTML, shell, and whatever other places you want to use it in.

Solutions to this tend to either feel un-Rusty (since you have your own special UntrustedFrom, instead of using From like everyone else), or fall foul of the orphan rule, or make it hard to verify that Untrusted<DisplayName>::validate really does validate against all of the required validation types cheaply.

Different kinds of validation?

taladar — Mon, 22 Sep 2025 08:52:12 +0000

I don't think Untrusted<HTMLText> and Untrusted<SQLText> are quite the right way to model this. Usually it is more that we have a non-specific untrusted value that we want to use in one or more contexts that require specific validation (e.g. the user's display name might need a different validation to be displayed in HTML than to be used in SQL, and a different one again to be used as a shell parameter).
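
A rough sketch of that shape, with one context-neutral untrusted value and a separate conversion per destination. All names (Untrusted, HtmlSafe, ShellSafe, for_html, for_shell) are hypothetical, and the escaping rules are simplified assumptions: the HTML side covers only four characters, and the shell side uses POSIX single-quote quoting without handling NUL bytes.

```rust
/// Hypothetical wrapper around a value whose destination is not yet known.
pub struct Untrusted<T>(T);

/// Hypothetical output types, one per context.
pub struct HtmlSafe(String);
pub struct ShellSafe(String);

impl HtmlSafe { pub fn as_str(&self) -> &str { &self.0 } }
impl ShellSafe { pub fn as_str(&self) -> &str { &self.0 } }

impl Untrusted<String> {
    pub fn new(s: &str) -> Self { Untrusted(s.to_string()) }

    /// HTML context: escape the characters that could open markup.
    pub fn for_html(&self) -> HtmlSafe {
        let mut out = String::new();
        for c in self.0.chars() {
            match c {
                '&' => out.push_str("&amp;"),
                '<' => out.push_str("&lt;"),
                '>' => out.push_str("&gt;"),
                '"' => out.push_str("&quot;"),
                _ => out.push(c),
            }
        }
        HtmlSafe(out)
    }

    /// Shell context: wrap in single quotes, rewriting each ' as '\''.
    pub fn for_shell(&self) -> ShellSafe {
        ShellSafe(format!("'{}'", self.0.replace('\'', "'\\''")))
    }
}
```

The same input yields different safe forms, and a function that accepts only HtmlSafe cannot be handed a ShellSafe by mistake.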

Different kinds of validation?

mrugiero — Sat, 20 Sep 2025 19:24:39 +0000

The easy solution to that (both in Rust and many other strongly typed languages) is the newtype pattern. Rather than using `Untrusted<String>` for both your HTML and SQL sanitization, you would do something like:


struct SQLText(pub String);
struct HTMLText(pub String);
impl Validate for Untrusted<HTMLText> {
    fn validate(self) -> Result<HTMLText> {
        // HTML validation logic here
    }
}
impl Validate for Untrusted<SQLText> {
    fn validate(self) -> Result<SQLText> {
        // SQL validation logic here
    }
}

And the functions receiving each trusted type would look like:

fn operate_on_html(HTMLText(text): HTMLText) -> Something {
    // Stuff
}
fn operate_on_sql(SQLText(text): SQLText) -> SomethingElse {
    // Other stuff
}

I wonder if there's a way to ban implementing `Validate` for types that are considered too generic to force one into the newtype pattern when that should be a reasonable requirement (such as strings that may be literally anything).
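
For anyone who wants to compile the newtype sketch above, here is one self-contained variant. The definitions of Untrusted, Validate and Invalid are assumptions filled in so that it builds (they are not the patch set's definitions), and both validation rules are toy placeholders.

```rust
/// Hypothetical wrapper; the real `Untrusted` in the patch set differs.
pub struct Untrusted<T>(T);

impl<T> Untrusted<T> {
    pub fn new(v: T) -> Self { Untrusted(v) }
}

#[derive(Debug, PartialEq)]
pub struct Invalid;

/// Hypothetical trait: consumes the untrusted value, yields the trusted one.
pub trait Validate {
    type Output;
    fn validate(self) -> Result<Self::Output, Invalid>;
}

#[derive(Debug, PartialEq)]
pub struct SQLText(pub String);
#[derive(Debug, PartialEq)]
pub struct HTMLText(pub String);

impl Validate for Untrusted<HTMLText> {
    type Output = HTMLText;
    fn validate(self) -> Result<HTMLText, Invalid> {
        // Toy HTML rule: reject anything containing a script tag.
        if (self.0).0.contains("<script") { Err(Invalid) } else { Ok(self.0) }
    }
}

impl Validate for Untrusted<SQLText> {
    type Output = SQLText;
    fn validate(self) -> Result<SQLText, Invalid> {
        // Toy SQL rule: reject quotes and statement separators.
        if (self.0).0.contains(|c: char| c == '\'' || c == ';') {
            Err(Invalid)
        } else {
            Ok(self.0)
        }
    }
}

/// Only accepts the trusted HTML type; the pattern unwraps the newtype.
pub fn operate_on_html(HTMLText(text): HTMLText) -> usize {
    text.len()
}
```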

Parse not validate!

adobriyan — Sun, 07 Sep 2025 17:13:00 +0000

> + pub fn validate_ref<'a, V: Validate<&'a Self>>(&'a self) -> Result<V, V::Err> {
> + V::validate(&self.0)
> + }

I suspect this will be as useful as Ada's integers for months from 1 to 12 mentioned above,
covering only the simplest examples (which are the least interesting).

Here is the simplest example where "validation" happens in wider context (of a block device).
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...

Trusted slice lengths

tialaramex — Sat, 06 Sep 2025 11:22:38 +0000

> that I'm sure serve some useful purpose

Two things are going on here, for anybody fascinated but not wanting to go read the Rust internals:

1. Runtime performance optimizations. Vec<T> is Rust's growable array type so it must have excellent performance. This is less crucial in Linux or some embedded code but in most normal userspace code you're using growable arrays a LOT and so the performance of this type is crucial to overall Rust performance.

An example of this would be do_reserve_and_handle, which is an inner cold function: the compiler emits all the code that actually does a bunch of work in a separate function, whereas the code around it is warm and will be inlined aggressively. That warm code usually does something very cheap, such as checking n <= capacity, and only calls do_reserve_and_handle when that's not true.

2. Separation of concerns. RawVecInner literally doesn't know about T. It just makes growable arrays of bytes. The people working on RawVecInner do need to fret about Allocators (you might not use the default one) and efficiency but not about the type T. On the other hand RawVec<T> can pass the allocator problem to RawVecInner and cope only with the difference between bytes and T if T isn't a byte. Cap is a type dedicated solely to the problem of efficiently improving upon the obvious but not quite adequate "Capacity is just an integer" idea. People working on Cap don't need to think about any other part of the Vec<T> problem.
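
The hot/cold split from point 1 can be sketched in a few lines. This shows the general shape only and is not the standard library's code: Capacity, reserve and grow_slow are hypothetical names, and rounding up to a power of two is an arbitrary growth policy chosen for the example.

```rust
/// Hypothetical capacity tracker; not the real `RawVecInner`.
pub struct Capacity {
    cap: usize,
    grow_calls: usize,
}

impl Capacity {
    pub fn new() -> Self { Capacity { cap: 0, grow_calls: 0 } }
    pub fn capacity(&self) -> usize { self.cap }
    pub fn grow_calls(&self) -> usize { self.grow_calls }

    /// Warm path: one comparison, small enough to inline at every call site.
    #[inline]
    pub fn reserve(&mut self, n: usize) {
        if n > self.cap {
            self.grow_slow(n);
        }
    }

    /// Cold path: the expensive work lives in its own out-of-line function,
    /// so callers of `reserve` do not carry this code around.
    #[cold]
    #[inline(never)]
    fn grow_slow(&mut self, n: usize) {
        self.cap = n.next_power_of_two();
        self.grow_calls += 1;
    }
}
```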

Bounded integers

tialaramex — Sat, 06 Sep 2025 10:40:50 +0000

I disagree that bounded integers aren't useful, but then I would since I'm the person who really wants BalancedI8 and similar types (in that case, the signed 8-bit integer minus its most negative value, thus conveniently balanced, given a niche so that Option<BalancedI8> is also a single byte, and yet in practical terms just as useful as the existing i8 for almost any purpose).

Firstly, bounded integers give us a niche and Rust knows how to use the niche, with built-in types such as Option<T> as well as any user types being allowed to consume a niche - so now our data structures are smaller, yet our software is more correct, that's a win-win deal.
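
The niche effect is observable today with the standard library's non-zero integers. BalancedI8 does not exist in std, so this sketch uses NonZeroU8 as a stand-in with the same property (one forbidden bit pattern); the function name option_sizes is made up for the example.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

/// Returns (size of Option<u8>, size of Option<NonZeroU8>) in bytes.
pub fn option_sizes() -> (usize, usize) {
    // `u8` uses all 256 bit patterns, so `Option<u8>` needs extra room for
    // `None`. `NonZeroU8` never holds 0, so `None` can reuse that pattern.
    (size_of::<Option<u8>>(), size_of::<Option<NonZeroU8>>())
}
```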

But also, and this isn't a thing Rust is expected to do in the foreseeable future, though it's certainly reasonable for Linux to be thinking about it in this context: bounded integers mean you can use mechanical proofs to ensure you can't write certain crucial types of bug. This is why WUFFS gets to have no bound misses despite not explicitly emitting bounds checks. It has verified that your code doesn't use any values which would cause a miss. Ensuring that you meet these mathematical criteria is your problem, and so you might need to write bounds checks, but often your algorithm can prove it doesn't miss anyway and WUFFS checks the proof.

Validate & Copy?

lossin — Sat, 06 Sep 2025 08:04:19 +0000

I think we definitely can add functions to UserSliceReader that copy and validate in the same step. But we'll have to do that for every single way you can read untrusted data. Having a generic API that you can just plug untrusted values into should still exist for APIs that don't provide such a function themselves. There are also other use-cases for untrusted data, for example with UserSliceReader you might also just want to copy some bytes from one place in userspace to another and you wouldn't want to validate anything in between. So if we only had the copy & validate function, you would have to write an empty validation function for [u8].

Different kinds of validation?

daroc — Fri, 05 Sep 2025 14:07:24 +0000

This kind of mechanism definitely still needs care. I think no language can mechanically ensure all validation is correct, for exactly the reason you point out. Lossin's goal with this patch set, as described in the cover letter at least, is to make mistakes harder and the good way to do it easier. So unlike some other Rust mechanisms, this isn't a bullet-proof guarantee, just a helpful API.

Trusted slice lengths

NYKevin — Thu, 04 Sep 2025 22:56:58 +0000

I'm doubtful that Untrusted<Vec<T>> can plausibly mean anything other than Vec<Untrusted<T>>. Vec<T> is a struct consisting of the following fields (behind several layers of internal indirection, that I'm sure serve some useful purpose, but I can't be bothered to understand them now):

* A pointer to some heap allocation, which can be dynamically resized.
* A size
* A capacity

If the pointer points to userspace, then it is very questionable to call this thing a Vec, because you can't dynamically reallocate userspace memory - it's better to characterize that as a slice, or something[1] resembling a slice. But if the pointer points to kernelspace, then that means you called kmalloc or one of its equivalents, and you had darned well better know that it's still a valid allocation, as well as how big that allocation is. That implies the capacity is trusted completely, and the pointer is trusted to point to a live allocation of the correct size and alignment. Which just leaves size. Size tells us how many elements (T objects) in the allocation are initialized.

Technically, uninitialized values are instant UB unless protected by MaybeUninit<T> (or behind a raw pointer). If size is wrong, Vec will produce uninitialized values when we index it, and that's UB. So we do need to validate size at some point in this whole procedure, or else we lose soundness. But by assumption, we are in kernelspace, so "initialized" just means that we called something resembling copy_from_user(), and it wrote some sort of data into that array slot. When we did that, we should have set size to the appropriate value for whatever copying we did (because we know what arguments we passed to copy_from_user()).

In short, if you really have a Vec, and not an elaborately disguised UserSlice or the like, then you should already trust enough of it to make this transformation valid.

[1]: https://rust.docs.kernel.org/next/kernel/uaccess/struct.U...

Trusted slice lengths

NYKevin — Thu, 04 Sep 2025 20:14:09 +0000

> If you want to imply that the length is not trusted, this would be sth. like Untrusted<&mut [u8]>. This would be indeed very strange as not only the length would now be untrusted but also the pointer and the associated lifetime.

The lifetime parameter is indeed trusted, because the borrow checker "lifts" it out of the type parameter and treats it roughly the same as a top-level lifetime parameter (i.e. if you have Untrusted<&'a T>, then Rust automatically infers that the Untrusted object must not outlive 'a, because &'a T must not outlive 'a).

Technically, it's more complicated than that, and the precise behavior depends on the fields of Untrusted, see [1] for details. But in short, Rust will automatically infer relationships between Untrusted<&'a T> and &'a T as appropriate to prevent you from using a reference whose lifetime has ended, and at no point does it consider the possibility that &'a T points to a T object that fails to live for at least 'a. Even if &'a T only appears in PhantomData, a relationship is still inferred (Rust interprets this as "I logically own some &'a T even though you can't see it listed in my fields").
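
A small sketch of the PhantomData case, under the assumption of a made-up Handle type that stores only an address. It is not how Untrusted is defined; it only illustrates that the marker alone is enough for the borrow checker to tie the value to 'a.

```rust
use std::marker::PhantomData;

/// Hypothetical handle: holds an address, yet is tied to lifetime 'a.
pub struct Handle<'a, T: 'a> {
    addr: usize,
    // No reference is stored. The marker makes the compiler treat `Handle`
    // as if it logically owned a `&'a T`.
    _borrow: PhantomData<&'a T>,
}

impl<'a, T: 'a> Handle<'a, T> {
    pub fn new(r: &'a T) -> Self {
        Handle { addr: r as *const T as usize, _borrow: PhantomData }
    }

    pub fn addr(&self) -> usize { self.addr }
}

// The following is rejected with "`x` does not live long enough",
// even though `Handle` contains no reference field:
//
//     let h;
//     { let x = 5u32; h = Handle::new(&x); }
//     h.addr();
```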

But it gets even worse. If you ever expose a real &'a T to safe code (i.e. code that does not live inside the implementation of Untrusted or its owning crate), and it points into userspace-controlled memory, then you have probably violated soundness in multiple different ways:

* If userspace modifies the T while the reference exists (e.g. from another thread), and T is not UnsafeCell, then the behavior is undefined. If T is UnsafeCell, then you have to turn it into a raw pointer (*mut U) via UnsafeCell::get() to do anything anyway. We can't have e.g. T = Mutex, because then the whole Mutex object lives in userspace, and userspace is not beholden to Rust's safety rules (so it could modify the inner value without taking the lock, or even stomp on the futex and break everything).
* Safe Rust assumes that you can dereference &'a T at any time during 'a, and that it will always succeed and produce a T instance. In particular, the dereference operation is not allowed to fail with EFAULT or the like, so that would have to panic (obviously an unacceptable response to "userspace gave me a bad pointer").
* The T instance must be initialized unless T is MaybeUninit (or some equivalent, since unlike UnsafeCell, MaybeUninit is not magic and can be implemented by hand). I have no idea how that rule even applies when you're pointing into shared memory, but I think it doesn't have a good answer because...
* Safe references are never volatile, so it's pretty much always incorrect to produce or dereference a safe reference pointing into shared memory. This is the case even if you fully serialize all reads and writes with some external locking, because LLVM (or GCC etc.) is permitted to optimize under the assumption that nobody else is reaching into your memory and modifying it behind your back (if the pointer escapes, then the compiler probably has to pessimistically assume that foreign code might fiddle with it, but that does not mean it is valid to do this in general). The "correct" way to do it is with functions like read_volatile[2], but those all take raw pointers, not safe references.
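
The last point can be illustrated in ordinary userspace Rust. This sketch assumes a made-up SharedWord type and says nothing about real userspace pointers in the kernel; it only shows that the volatile accessors work on raw pointers (obtained here from UnsafeCell::get) rather than on & or &mut references to the data.

```rust
use std::cell::UnsafeCell;
use std::ptr;

/// Hypothetical stand-in for a word of memory that something else may change.
pub struct SharedWord(UnsafeCell<u32>);

impl SharedWord {
    pub fn new(v: u32) -> Self { SharedWord(UnsafeCell::new(v)) }

    pub fn load(&self) -> u32 {
        // SAFETY: `UnsafeCell::get` returns a valid, aligned pointer to the
        // inner u32, and no reference to the inner value is ever created.
        unsafe { ptr::read_volatile(self.0.get()) }
    }

    pub fn store(&self, v: u32) {
        // SAFETY: same pointer validity argument as in `load`.
        unsafe { ptr::write_volatile(self.0.get(), v) }
    }
}
```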

On the other side of things, once you have all of the necessary guardrails around a userspace pointer, there is not much point in giving it a lifetime parameter. You're already (by assumption) calling fallible functions like copy_from_user() or get_user() (or their Rust equivalents) instead of directly dereferencing an address. You already have runtime error checking, and cannot violate soundness by holding onto the pointer for too long. Static borrow checking would provide no benefit, and might as well be disabled for ergonomic reasons.

This is one of (probably) several reasons why UserPtr<T> has to exist and can't be spelled as Untrusted<&'a T> or any similar signature (note the lack of a lifetime).

[1]: https://doc.rust-lang.org/reference/subtyping.html#variance
[2]: https://doc.rust-lang.org/std/ptr/fn.read_volatile.html

Different kinds of validation?

Cyberax — Thu, 04 Sep 2025 19:48:21 +0000

Bounded numeric types are not really that useful. They were often historically used in place of algebraic types, like the names of months or weekdays.

If you want to start using numeric types for things like packet length, you quickly get bogged down with casts when arithmetic results in possible out-of-bounds. In the end, it's just easier to use safe arrays with safe indexing.

Different kinds of validation?

epa — Thu, 04 Sep 2025 17:51:21 +0000

My experience with defending against injection is that usually you can take an approximate, but safe approach. If some field can genuinely be arbitrary text then you must take care to escape it appropriately for each use, but if not, you can define a safe character set like [A-Za-z0-9_] and restrict it to that. Perhaps the $ character does not cause any real injection problems when generating SQL, or HTML for that matter -- but so what, forbid it anyway. A username or the name of a report or a date and time will not contain that character.
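
The conservative allowlist amounts to very little code. A minimal sketch, where the function name is_conservative_name is hypothetical and rejecting the empty string is an extra assumption not stated above:

```rust
/// Accepts only non-empty strings made of [A-Za-z0-9_].
pub fn is_conservative_name(s: &str) -> bool {
    // Checking bytes is enough: any non-ASCII character contains a byte
    // outside the allowed set, so it is rejected as well.
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'_')
}
```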

Getting back to the kernel and Rust, I would love it if more languages adopted bounded integers as provided by Ada. You can declare your month number variable as range 1..12. I'm not an Ada programmer and I am sure those more experienced with it will point out practical shortcomings in the way Ada does it. But I'm sure it could be nicer in Rust. (You can even do it in C++ templates.) There is a bounded_integer crate, I see.
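
A hand-rolled version of the month example in today's Rust, as a sketch. Month and OutOfRange are hypothetical types written for this illustration and are not taken from the bounded_integer crate. The next method shows the cost mentioned elsewhere in this thread: every arithmetic operation has to re-establish the bound itself.

```rust
use std::convert::TryFrom;

/// Hypothetical bounded integer: always holds a value in 1..=12.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Month(u8);

#[derive(Debug, PartialEq, Eq)]
pub struct OutOfRange(pub u8);

impl TryFrom<u8> for Month {
    type Error = OutOfRange;

    // The only way in: the range check happens once, at construction.
    fn try_from(n: u8) -> Result<Self, OutOfRange> {
        if (1..=12).contains(&n) { Ok(Month(n)) } else { Err(OutOfRange(n)) }
    }
}

impl Month {
    pub fn get(self) -> u8 { self.0 }

    /// Arithmetic must keep the invariant; here December wraps to January.
    pub fn next(self) -> Month { Month(self.0 % 12 + 1) }
}
```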

Trusted slice lengths

excors — Thu, 04 Sep 2025 17:46:42 +0000

Ah, thanks, that makes it a bit clearer. I think the specific detail I didn't realise is that when a struct's last field is a DST, that struct becomes a DST, and so a reference to the struct will contain the size of the struct's last field. Untrusted<[T]> is a struct with a single field of type [T], so that rule applies here. (I had been wrongly assuming the length had to be stored with the slice itself somehow, inside the struct, but the DST rules bring it outside.)
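
The effect of the DST rule shows up in the size of the reference. A sketch with a hypothetical single-field Untrusted wrapper (not the patch set's definition); the "twice as large" result reflects how current Rust lays out slice references.

```rust
use std::mem::size_of;

/// Hypothetical wrapper. `?Sized` lets the single field be a slice, which
/// makes the whole struct a dynamically sized type.
#[repr(transparent)]
pub struct Untrusted<T: ?Sized>(T);

/// Returns (size of a thin reference, size of a reference to the DST).
pub fn ref_sizes() -> (usize, usize) {
    (
        size_of::<&Untrusted<u8>>(),   // just an address
        size_of::<&Untrusted<[u8]>>(), // address plus length
    )
}
```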

On the other hand Untrusted<Vec<T>> derefs to Vec<Untrusted<T>>, and in that case I don't think there's any DST cleverness happening - it's simply moving the Vec's length field from inside to outside the Untrusted, which looks like a change in the trustedness of the length field, without any validation. I don't think this is a big problem, it just seems like an unfortunate hole in the trust boundary.

Rust could prohibit reuse of a value

iabervon — Thu, 04 Sep 2025 17:12:56 +0000

Oh, yeah, I guess it's not actually necessary to ensure that code doesn't validate a different copy if you ensure that it does validate the copy it uses. Reviewers would take a dim view towards doing inadequate validation of the real one based on having done something (on a different copy) elsewhere.

Rust could prohibit reuse of a value

farnz — Thu, 04 Sep 2025 15:57:43 +0000

It strikes me, reading UserSlice's documentation, that the fix isn't so much making UserPtr not implement Copy (since UserPtr is equivalent to a raw pointer), but making UserSlice's, UserSliceReader's and UserSliceWriter's methods always handle Untrusted<T>, and not T directly.

That would prevent the TOCTOU bug, since UserSliceReader::read_all would work in terms of Untrusted<KVec<u8>> instead of KVec<u8>, and thus to return KVec<u8> as per the method signature in the TOCTOU bug example, you'd have to call validate and get back a validated buffer.
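
As a sketch of that idea with a mock reader: FakeUserSlice, Untrusted, validate and length_prefixed are all hypothetical, Vec stands in for KVec, and no real userspace access happens. The point is only that the copy comes back wrapped, so the sole way to reach the bytes is to validate that same copy.

```rust
/// Hypothetical wrapper: the contents are reachable only through `validate`.
pub struct Untrusted<T>(T);

impl<T> Untrusted<T> {
    pub fn validate<E>(self, check: impl FnOnce(&T) -> Result<(), E>) -> Result<T, E> {
        check(&self.0)?;
        Ok(self.0)
    }
}

/// Mock of a userspace buffer reader (assumption: real readers can fail).
pub struct FakeUserSlice {
    data: Vec<u8>,
}

impl FakeUserSlice {
    pub fn new(data: Vec<u8>) -> Self { FakeUserSlice { data } }

    /// Consumes the reader, so the data is copied once, and returns it wrapped.
    pub fn read_all(self) -> Untrusted<Vec<u8>> { Untrusted(self.data) }
}

/// Example check: the first byte must equal the number of bytes that follow.
pub fn length_prefixed(buf: &Vec<u8>) -> Result<(), &'static str> {
    match buf.split_first() {
        Some((&n, rest)) if n as usize == rest.len() => Ok(()),
        _ => Err("length prefix does not match payload"),
    }
}
```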

Trusted slice lengths

matthias — Thu, 04 Sep 2025 15:09:02 +0000

> > converting an &mut Untrusted<[u8]> (a mutable reference to an untrusted slice of bytes) into an &mut [Untrusted<u8>] (a mutable reference to a slice of individually untrusted bytes) can be done automatically
> My initial thought was: doesn't the first type imply that the length of the slice is untrusted, while the second implies it is trusted? That sounds like a dangerous conversion to do automatically. But from reading the patch set, Untrusted<[T]> is documented as having a trusted length ("as it would otherwise violate normal Rust rules") and only the elements are untrusted, so the conversion is fine.

Actually, the length of a slice is stored in the reference and not stored as part of the slice. &mut Untrusted<[u8]> is a wide pointer that is a pair (ptr,len) that says at position ptr, there is data of type Untrusted<[u8]> that has size len. The len is always part of the reference, not part of the type T inside Untrusted<T>.

This is how all dynamically sized types work in Rust. The length is always stored in the reference, which therefore needs twice as much space as ordinary references. The difference between &mut Untrusted<[u8]> and &mut [Untrusted<u8>] is really just the difference between a slice of size len that is untrusted and len many bytes that are individually untrusted.
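
A sketch of why the conversion needs no validation: the length travels with the reference on both sides. The Untrusted type, from_mut and as_elements below are hypothetical and written with std rather than kernel APIs; the unsafe blocks rely on #[repr(transparent)], which is an assumption of this sketch.

```rust
/// Hypothetical wrapper with the same layout as its single field.
#[repr(transparent)]
pub struct Untrusted<T: ?Sized>(T);

impl Untrusted<[u8]> {
    /// Reinterprets a byte slice as one untrusted slice.
    pub fn from_mut(bytes: &mut [u8]) -> &mut Untrusted<[u8]> {
        // SAFETY: `repr(transparent)` gives `Untrusted<[u8]>` the layout of
        // `[u8]`, and the cast keeps the pointer's length metadata unchanged.
        unsafe { &mut *(bytes as *mut [u8] as *mut Untrusted<[u8]>) }
    }

    /// Same memory, same length, now viewed as individually untrusted bytes.
    pub fn as_elements(&mut self) -> &mut [Untrusted<u8>] {
        let len = self.0.len(); // read from the wide reference, not from the data
        let ptr = self.0.as_mut_ptr() as *mut Untrusted<u8>;
        // SAFETY: `Untrusted<u8>` has the layout of `u8`; `ptr` and `len`
        // describe exactly the slice that `self` exclusively borrows.
        unsafe { std::slice::from_raw_parts_mut(ptr, len) }
    }
}
```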

If you want to imply that the length is not trusted, this would be sth. like Untrusted<&mut [u8]>. This would be indeed very strange as not only the length would now be untrusted but also the pointer and the associated lifetime.

Trusted slice lengths

excors — Thu, 04 Sep 2025 12:28:48 +0000

> converting an &mut Untrusted<[u8]> (a mutable reference to an untrusted slice of bytes) into an &mut [Untrusted<u8>] (a mutable reference to a slice of individually untrusted bytes) can be done automatically

My initial thought was: doesn't the first type imply that the length of the slice is untrusted, while the second implies it is trusted? That sounds like a dangerous conversion to do automatically. But from reading the patch set, Untrusted<[T]> is documented as having a trusted length ("as it would otherwise violate normal Rust rules") and only the elements are untrusted, so the conversion is fine.

I don't entirely see how that would work, though. I guess low-level code that's constructing an Untrusted<[T]> from e.g. core::slice::from_raw_parts(ptr, len) will be responsible for ensuring the length is safe (not extending outside the appropriate address space or backing buffer, etc). But 'safe' does not mean 'trusted' (that distinction is fundamental to this patch set), and the Untrusted<[T]> / [Untrusted<T>] will come into existence before we reach the higher-level code that knows how to validate the length properly, so I'm not sure how it can be claimed that the slice length is trusted.

With APIs like copy_from_user (in C) and iov::IovIterSource::copy_from_iter_raw (in Rust), the caller has to provide a (trusted) maximum length to copy - but the actual length copied might be smaller than that, since it'll truncate the copy if e.g. it hits an unmapped page (I think), meaning the length returned is a user-controlled value and I believe it should be considered untrusted. Otherwise there's a risk of attackers causing out-of-bounds panics, when a driver makes invalid assumptions about the slice length because it's documented as being trusted and they don't realise it's still attacker-controlled and needs validating.

Different kinds of validation?

taladar — Thu, 04 Sep 2025 09:53:26 +0000

My concern with this is that validation depends on the way you want to use the data, e.g. similar mechanisms in user space have often struggled with the different kinds of injection vulnerabilities where HTML injection is not the same as SQL injection is not the same as shell injection,... where validation needs to test for the absence of different kinds of problems in the input.

I would assume similar concerns could be present in the kernel where e.g. an integer needs to be validated differently if it is used as, say, a month number in a date, than if it was used as a plausible size for a TCP packet.

Validate & Copy?

Wol — Thu, 04 Sep 2025 06:57:38 +0000

My reaction on reading bits of this was: should you combine the validate and copy functions? So a user-space untrusted S could be copied into a validated kernel space T. This would then make clear that TOCTOU bugs are (absent other bugs) impossible, and also - by creating a slightly higher abstraction - make it easier for users of the API.

Cheers,
Wol

Rust could prohibit reuse of a value

iabervon — Wed, 03 Sep 2025 22:23:43 +0000

The UserPtr abstraction does prevent you from validating data you haven't copied (it's an integer rather than a pointer, so you can't dereference it), but it doesn't prevent you from effectively doing copy_from_user() twice. (The documentation of UserPtr even provides an example of how you could implement a TOCTOU bug, presumably as a cautionary tale.) It's like a more intense version of "__user", but addresses the same issue.

I think it might be wise for UserPtr to lack the Copy trait, however, which (I think) would mean that the example bug wouldn't compile without using a clone() method call that shows where the data you check diverges from the data you use, while not affecting any code that only sets up the user/kernel data transfer once.
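
The effect of dropping Copy can be sketched with a made-up handle type. UserHandle, read_once and double_fetch are hypothetical and do not reflect the real UserPtr API; read_once simply returns the stored address in place of an actual copy from userspace.

```rust
/// Hypothetical handle that is Clone but deliberately not Copy.
#[derive(Clone, Debug)]
pub struct UserHandle {
    addr: usize,
}

impl UserHandle {
    pub fn new(addr: usize) -> Self { UserHandle { addr } }
}

/// Stand-in for one user-to-kernel transfer; takes the handle by value.
pub fn read_once(h: UserHandle) -> usize {
    h.addr
}

/// A double fetch now has to spell out `.clone()`, which marks the spot
/// where the checked data and the used data can diverge.
pub fn double_fetch(h: UserHandle) -> (usize, usize) {
    let first = read_once(h.clone());
    // Without the clone above, this second call would not compile:
    // "use of moved value: `h`".
    let second = read_once(h);
    (first, second)
}
```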

Finally

Sesse — Wed, 03 Sep 2025 19:53:06 +0000

This is a Good Thing, and I am happy that Rust's type system is strong enough to implement it (or so it seems). I've been using this in Perl since forever, and while it's certainly not a panacea in all situations (e.g., does stuff from your database count as tainted or not?), it's a great way of avoiding XSS-by-just-forgetting-to-validate-stuff and similar.