LWN: Comments on "mseal() gets closer"

mseal() gets closer

kmeyer — Thu, 01 Feb 2024 01:58:59 +0000

Maybe https://patchwork.kernel.org/project/linux-mm/cover/20240... :

> The specific scenario currently in mind is
> glibc's use case of loading and sealing ELF executables. To this end,
> Stephen is working on a change to glibc to add sealing support to the
> dynamic linker, which will seal all non-writable segments at startup.

Ok, interesting.

mseal() gets closer

kmeyer — Thu, 01 Feb 2024 01:55:42 +0000

> Röttger is working on adding support to the GNU C Library so that most programs would be able to run with a fair amount of sealing automatically applied.

Did you link to that somewhere or have more details? I'm curious what glibc would seal by default.

mseal() gets closer

amarao — Wed, 24 Jan 2024 05:43:10 +0000

But what it has to do with Microsoft eals?

If you want to misread, you will.

mseal_all()

itsmycpu — Tue, 23 Jan 2024 02:00:46 +0000

> While I agree that in principle a mechanism like this might prove useful, it is far more complex than the mechanism which is currently proposed, and I'm not sure it would make sense to tie it to this particular API (especially seeing as they just got finished *removing* the concept of different "types" of sealing).

Yes, of course this would be a separate step. And thanks, I guess.

> One thing I do feel obligated to point out, as said by the immortal James Mickens[1]: "Gadgets are eternal. There will always be gadgets, there were gadgets before we got here, there'll be gadgets after we're dead." In other words, you really can't say "this was executed by read-only code, therefore it must be non-malicious," because of ROP-style attacks. No matter how many control flow invariants you try to enforce, sooner or later somebody is going to invent another way of fiddling the instruction pointer into a clever position and exploiting code that already exists.

Sure, a high bar which probably can't be reached by any single measure. Any sealing, automatic or explicit, be it from libc, the kernel, the loader, or otherwise, that doesn't require (potentially simple) apps to figure out address ranges, would be another separate step in this sense.

mseal_all()

NYKevin — Tue, 23 Jan 2024 01:24:55 +0000

> This would mean the mappings internally receive a timestamp, and any attempt to change a restricted mapping involves comparing the code's seal-timestamp to the mappings seal-timestamp.

While I agree that in principle a mechanism like this might prove useful, it is far more complex than the mechanism which is currently proposed, and I'm not sure it would make sense to tie it to this particular API (especially seeing as they just got finished *removing* the concept of different "types" of sealing).

One thing I do feel obligated to point out, as said by the immortal James Mickens[1]: "Gadgets are eternal. There will always be gadgets, there were gadgets before we got here, there'll be gadgets after we're dead." In other words, you really can't say "this was executed by read-only code, therefore it must be non-malicious," because of ROP-style attacks. No matter how many control flow invariants you try to enforce, sooner or later somebody is going to invent another way of fiddling the instruction pointer into a clever position and exploiting code that already exists.

[1]: https://youtu.be/ajGX7odA87k?si=y0eIv5UAtcv28-Zd&t=1874

mseal_all()

itsmycpu — Mon, 22 Jan 2024 23:05:08 +0000

Or something like this: mseal_all() would mean that all existing mappings perhaps have certain unconditional restrictions, yet the additional restriction that other operations on them can only be performed by code that is now sealed and in a read-only memory area. (This would mean the mappings internally receive a timestamp, and any attempt to change a restricted mapping involves comparing the code's seal-timestamp to the mappings seal-timestamp.)

mseal_all()

itsmycpu — Mon, 22 Jan 2024 22:39:46 +0000

> This is precisely my point. The only piece of code that knows whether a given mapping is safe to seal is
> the piece of code that actually created that mapping. Neither the kernel, nor the application,
> nor libc can safely seal a mapping that it does not have direct knowledge of.

You are surely right in many ways, however I'd like to question this for a simple application that does fancy things only during intialization if at all.
Perhaps, after setting everything up, a simple app can say: From this point on, only simple things should happen:

For example, no existing mappings that are writable should become executable anymore, and no existing mappings that are executable should become writable anymore. Maybe this requires additional features in mseal() or elsewhere, also glibc should be able to say: this new mapping should not be changeable to 'executable', but it should remain possible to free it.

In any case, the text I quoted implies that the kernel and the "shared library linker" can automatically seal many mappings, and that would be partial success.

mseal_all()

NYKevin — Mon, 22 Jan 2024 20:52:46 +0000

> Well at startup the app might first want to create a few mappings (directly or indirectly) that are unknown to the userspace lib, and maybe not directly known to the app either.
>
> I don't know if you can reliably assume that a userspace lib isn't modified or replaced, in part or whole. [...]

This is precisely my point. The only piece of code that knows whether a given mapping is safe to seal is the piece of code that actually created that mapping. Neither the kernel, nor the application, nor libc can safely seal a mapping that it does not have direct knowledge of.

1. The loader can probably(!) seal things like the .text segment and other very basic "this memory is always statically allocated" segments. If the loader is not prepared to do that, then I suppose libc could probably figure it out.
2. libc can seal mappings that it creates during startup or for other internal purposes, but probably not any mappings involved in malloc (because they might need to be unmapped later when freed). Similarly, it can probably seal mappings created with functions like dlopen(3), but then you can't unmap them when you dlclose(3) them, so maybe that's a bad idea?
3. The application can seal mappings that it creates manually, if desired.
4. libfoo can seal mappings that it creates manually, if it somehow(?) knows that those mappings will never be unmapped or remapped. For most libraries, that seems a bit presumptuous, but I suppose some libraries might explicitly say in their API "this function creates a permanent allocation that cannot be freed, so don't call it in a loop or something, because you'll leak memory."

To the best of my understanding, the kernel is not in a position to distinguish any of these items from each other - it just sees them all as "mappings." So a kernel-side mseal_all() would be very much an all-or-nothing operation, and since that's obviously unworkable (you can't seal random mappings out from under random bits of code without warning them!), it would have to be a userspace function that knows the difference between these mappings and can selectively seal just the mappings that are safe to seal.

(1) and (2) can be done automatically at startup (or when the mapping is created), so mseal_all() doesn't need to touch them (it would be redundant). It might be nice for mseal_all() to do (3), to save the application writer the trouble of calling mseal() repeatedly, but the problem is that you can't reasonably distinguish (3) from (4) at runtime, and it is certainly not safe for an application to seal (4) behind libfoo's back, because...

> Maybe the userspace lib would have options like "Use mseal() to prevent new heap and/or stack mappings from being made executable", just for example, if that can't be enforced from the kernel side.

...the lib would have to refrain from unmapping or re-mapping anything that has been sealed behind its back. Which pretty much means the lib has to be using its own internal malloc-like function (and not libc malloc), and that function has to be designed to never discard or resize a mapping (unlike libc malloc in practice). Of course, you could also have a lib that just uses libc malloc, and never directly creates a mapping itself, and that would be fine if your mseal_all() avoids touching libc-owned mappings. The problem is, what if your lib is itself a custom allocator, but not one that is aware of these funky "don't remap anything" rules? Then you basically can't use mseal_all() when that lib is loaded, or else you will break it. At that point, it's probably cleaner to just tell application writers to manually call mseal() on the specific mappings that are safe to seal.

mseal_all()

itsmycpu — Mon, 22 Jan 2024 09:19:45 +0000

Maybe of interest in this context, this text regarding OpenBSD (linked in a previous article) talks about various ways in which both the kernel and the "shared library linker" could automatically apply seals :

https://lwn.net/Articles/915662/

mseal_all()

itsmycpu — Mon, 22 Jan 2024 06:13:59 +0000

> But I think that starts to become redundant to "just automatically mseal everything that libc knows it can safely mseal,"
> and you don't need a function for that, it can just automatically happen at startup.

Well at startup the app might first want to create a few mappings (directly or indirectly) that are unknown to the userspace lib, and maybe not directly known to the app either.

I don't know if you can reliably assume that a userspace lib isn't modified or replaced, in part or whole. I'd probably rather have a kernel function at least for "most", that maybe doesn't prevent additional new mappings, if that wouldn't work. Or perhaps limits future mappings to some degree.

Maybe the userspace lib would have options like "Use mseal() to prevent new heap and/or stack mappings from being made executable", just for example, if that can't be enforced from the kernel side.

mseal_all()

NYKevin — Mon, 22 Jan 2024 04:21:49 +0000

> This assumes that there is way to do this without preventing the mere allocation of more memory. If that currently isn't possible, maybe it can be made possible.

There is no general way to do that. To explain why, I'm going to introduce two made-up terms:

* Some portion of memory is kernel-allocated if it belongs to a valid mapping of some kind. In other words, memory is kernel-allocated if it is possible to dereference a pointer to that memory without segfaulting.
* Some portion of memory is userspace-allocated if it is valid stack memory (as defined by the architecture etc.), if it is statically allocated, or if has been returned by malloc or some malloc-like function and not subsequently freed. In other words, memory is userspace-allocated if it would be "valid" for the program to actually use the memory for some purpose, without needing to do any malloc-like bookkeeping. The term "valid" is intentionally undefined, because the semantics of malloc and malloc-like functions will depend on the implementation and API, but in general, this is roughly synonymous with C's notion of pointer validity (i.e. you are not generally allowed to just make up your own pointers into the heap and do whatever you like with them).

The basic problem here is that the total amount of kernel-allocated memory is finite. Once the address space is sealed, you cannot add any more mappings, so you cannot kernel-allocate any more memory. Therefore, the only way to create more userspace-allocated memory is to use the kernel-allocated memory you already have, and you will eventually run out.

The other problem is that, in practice, glibc malloc tacitly assumes it can just mmap whatever it wants, whenever it wants. If you malloc a large amount of memory in one call, it will not fiddle around with the existing heap memory, it will just pass the arguments through to mmap, and give you a whole new mapping. This mapping may later be unmapped if you call free (or maybe it isn't, I haven't actually read the source code). It also calls mmap if it detects thread contention, which is probably difficult to predict in real applications. These are theoretically changeable behaviors, but I doubt the glibc people would be happy with the resulting performance regressions.

For these reasons, I think it would have to be a libc service, because only libc has the necessary userspace knowledge to figure out which mappings are safe to seal, which ones might need to be created in the future, and which ones might need to be unmapped in the future. But I think that starts to become redundant to "just automatically mseal everything that libc knows it can safely mseal," and you don't need a function for that, it can just automatically happen at startup.

mseal_all()

itsmycpu — Mon, 22 Jan 2024 00:44:32 +0000

Yes, my understanding is that mseal() fixes the permissions (like read/write/execute) only.

So the idea is that an app first creates any memory definitions it needs, and then, assuming it arrives at a point where it doesn't want to change or add anymore, at that point it calls mseal_all() to prevent any further unwanted or accidental modifications.

This assumes that there is way to do this without preventing the mere allocation of more memory. If that currently isn't possible, maybe it can be made possible.

I'm not sure if a C lib is in the best position to do this, the kernel might have a better overview of the process's resources, and the kernel might be in a better position to do this securely.

mseal_all()

NYKevin — Sun, 21 Jan 2024 23:10:39 +0000

Correction: mseal does not prevent you from writing to the memory, so actually this is less of a problem than I thought. Still not sure it's workable, however, because nearly any nontrivial malloc implementation will eventually want to call sbrk or mmap, and that would be prevented by sealing the whole address space. With cooperation from libc, some kind of more restricted sealing might be possible, however.

mseal_all()

NYKevin — Sun, 21 Jan 2024 22:58:40 +0000

I'm not certain what you propose for mseal_all to do, but there are not a whole lot of options that make sense to me:

1. mseal the process's whole address space. But then you can't create any new memory mappings, because they would have to fall somewhere within the process's address space, and the whole address space is sealed. You also can't modify the data in any existing memory mapping, so your process is pretty much not allowed to touch anything that is not a register. I guess you can move the instruction pointer around (with nops or branch instructions?), but not much else.
2. mseal all parts of the address space that have mappings. But then you probably break half of libc, because libc has significant amounts of live data it expects to be able to update. Unfortunately, that includes libc malloc, which has free lists, block metadata, etc. that it has to update when you malloc. It also includes a significant amount of unallocated heap memory that malloc currently believes it has the right to hand out (without having to call sbrk or mmap), so even if malloc could somehow return a block of memory, that block might already be sealed. The only way around this is to mmap your own private memory arena and then use a third-party malloc that can be confined to that arena. But even then, you've still broken things like fread/fwrite (which have userspace buffers), and probably lots of other stuff.
3. A library function in libc, that seals all parts of the address space that have mappings, except for anything that would cause libc to break. We can subdivide this into two kinds of memory: Memory that is owned by libc (and that libc is OK with sealing), and memory that is not owned by libc. The former can be sealed by libc automatically for all processes during startup, and so we don't need a function to do it. The latter is potentially dangerous, because if you don't own a given piece of memory, you have no way of knowing whether it is safe to seal (so instead of breaking libc, you'd just end up breaking some other library instead).

mseal_all()

itsmycpu — Sun, 21 Jan 2024 11:02:51 +0000

Does a simple app, which plans to do not much else than to allocate more memory via malloc-like functions, have something worth sealing?

I'd guess a C lib can't just seal everything by default, so I wonder if a "mseal_all()" function would make sense that has no parameters at all.
It could be used by those who wouldn't much about how to use mseal() and its parameters.

(Of course in addition to mseal(). )

mseal() gets closer

Cyberax — Sat, 20 Jan 2024 20:52:55 +0000

About naming, I actually started looking up WTF is "mimmu table" when I read the article about mseal/mimmutable several months ago. Can we settle on mseal(), please?

Naming things - mimmutable vs mseal

dezgeg — Sat, 20 Jan 2024 20:03:31 +0000

setuid() is one good example where the Linux syscall has different semantics than the libc setuid(). The syscall version only applies to the current thread which is not POSIX compatible. It's then libc which does setuid() for each thread of the process to match POSIX.

Naming things - mimmutable vs mseal

Karellen — Sat, 20 Jan 2024 15:58:45 +0000

Is it worth considering that whatever libc API is used to wrap the syscall, does not necessarily need to use the same name as the syscall? Or, libc could present multiple wrappers for the same syscall - including adding a BSD-like mimmutable() that calls mseal() under the hood? (e.g. AIUI glibc open() actually calls openat(2) on Linux these days, ignoring Linux's native open(2) syscall altogether.)

Of course, glibc devs may be opposed to this for any number of perfectly valid reasons, but that doesn't mean the option isn't there. Or, if not glibc, then another libc might want to.

Naming things - mimmutable vs mseal

fredrik — Sat, 20 Jan 2024 08:12:26 +0000

Naming things is hard, but I'm curious how it works in this case, and wonder if someone can shed some light on it?

AFAIU the mseal API has now evolved to become semantically very similar to mimutable in OpenBSD, right? Especially if OpenBSD were to remove the ability to reduce the permissions on a sealed region, and we ignore the currently unused flags argument for mseal.

So, assuming they are semantically the same at this point, is there any advantage to change the name of mseal to mimmutable now?

AFAIU, once a new user space API is committed to mainline and released, the name and semantics are more or less set in stone due to Linus' pledge to kernel API stability, right? I.e renaming it is only possible before it goes in.

Which leaves my question, does it make sense to do so from some point of view, say technically or to indicate to users that they do the same thing?

I can imagine that one argument against adopting the OpenBSD name would be that if the semantics of the API in Linux were to diverge from OpenBSD at some point in the future, retaining the original name from OpenBSD would muddle the water at that point. OTOH, userspace API:s are expected to semantically stable too, so perhaps that isn't possible anyway.

What other pros and cons are there to having the same or different names, assuming the API is semantically equal?

Now, it may very well be that the authors of mseal think that their name simply is a better choice to convey its purpose. And I assume that no libc standards committee was involved in picking the name mimmutable for OpenBSD. Basically the only reason mimmutable would have precedence over mseal is that OpenBSD picked that name before any implementation appeared in the Linux kernel.

PS. I really do not intend to start a heated bikeshedding debate with these questions. So if anyone else feel an irresistible urge to do so, please raise another thread for that. Thanks! And do consider that the editors of LWN probably would be very happy if you abstained entirely. :)

mseal() gets closer

acarno — Fri, 19 Jan 2024 20:02:30 +0000

What a pun-derful ending to a very informative article =)