
kcmp() breaks loose

Posted Feb 11, 2021 16:59 UTC (Thu) by rvolgers (guest, #63218)
Parent article: kcmp() breaks loose

I was wondering why people sounded so scared of this syscall, so I looked it up:

https://man7.org/linux/man-pages/man2/kcmp.2.html

> The return value of a successful call to kcmp() is simply the
> result of arithmetic comparison of kernel pointers (when the
> kernel compares resources, it uses their memory addresses).
>
> The easiest way to explain is to consider an example. Suppose
> that v1 and v2 are the addresses of appropriate resources, then
> the return value is one of the following:
>
> 0 v1 is equal to v2; in other words, the two processes
> share the resource.
>
> 1 v1 is less than v2.
>
> 2 v1 is greater than v2.
>
> 3 v1 is not equal to v2, but ordering information is
> unavailable.
>
> On error, -1 is returned, and errno is set appropriately.
>
> kcmp() was designed to return values suitable for sorting. This
> is particularly handy if one needs to compare a large number of
> file descriptors.

In other words, it does not just test for equality, it establishes an ordering (to help reduce the number of kcmp calls). It's easy to see how gaining information about the layout of kernel objects is useful to attackers.

"Pointer obfuscation" is mentioned so I assume the values which are compared are no longer actual pointer values. Does anyone have more information on this?



kcmp() breaks loose

Posted Feb 11, 2021 17:07 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (1 responses)

I suppose the kernel could have a random mask (generated at boot) that it XORs with each pointer before comparison. That should preserve *an* order, but not the actual in-memory layout order, no?

kcmp() breaks loose

Posted Feb 11, 2021 17:17 UTC (Thu) by abatters (✭ supporter ✭, #6932) [Link]

kernel/kcmp.c says how it is done:

The obfuscation is done in two steps. First we xor the kernel pointer with a random value, which puts pointer into a new position in a reordered space. Secondly we multiply the xor production with a large odd random number to permute its bits even more (the odd multiplier guarantees that the product is unique ever after the high bits are truncated, since any odd number is relative prime to 2^n).
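The two steps described in that comment can be sketched in a few lines of C. This is only an illustration, not the kernel's code: the names `obfuscate` and `cmp_obfuscated` and both constants are made up here (the comment says the kernel uses random values), and the arithmetic is done on unsigned 64-bit integers so that the product simply wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants only; per the quoted comment the real ones are random. */
static const uint64_t XOR_COOKIE = 0x9e3779b97f4a7c15ULL;
static const uint64_t MUL_COOKIE = 0xd6e8feb86659fd93ULL; /* must be odd */

/* Step 1: xor with a random value.
 * Step 2: multiply by an odd number, letting the product wrap modulo 2^64.
 * Both steps are one-to-one, so distinct inputs give distinct outputs. */
static uint64_t obfuscate(uint64_t ptr)
{
    assert(MUL_COOKIE & 1); /* an even multiplier could map two inputs to one output */
    return (ptr ^ XOR_COOKIE) * MUL_COOKIE;
}

/* Compare two "pointers" by their obfuscated values, using the
 * 0 / 1 / 2 return convention quoted from the man page. */
static int cmp_obfuscated(uint64_t v1, uint64_t v2)
{
    uint64_t t1 = obfuscate(v1), t2 = obfuscate(v2);
    return (t1 < t2) | ((t1 > t2) << 1);
}
```

Equal inputs always compare as 0 and distinct inputs never do, so the result is still a consistent total order usable for sorting. Whether a given pair yields 1 or 2 depends on the cookies as well as on the addresses, which is the point of the exercise.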

kcmp() breaks loose

Posted Feb 12, 2021 13:56 UTC (Fri) by daenzer (subscriber, #7050) [Link] (8 responses)

FWIW, Mesa only needs KCMP_FILE, and doesn't care about the difference between positive return values. All it needs to know is whether or not two file descriptors reference the same struct file in the kernel. That's what "this functionality" refers to in my post.

In a follow-up, I suggested another possible solution: make KCMP_FILE available regardless of CONFIG_CHECKPOINT_RESTORE, but keep the rest of kcmp() dependent on that option. But nobody seems to have picked up on it.

P.S. Finally made it into a full-blown LWN article, guess I can retire or at least switch careers now. :)

kcmp() breaks loose

Posted Feb 12, 2021 16:28 UTC (Fri) by kleptog (subscriber, #1183) [Link] (7 responses)

The extra return values are needed for scale if you have lots of FDs. With the extra return values the algorithm for comparing everything to everything else goes from O(n^2) to O(n log n). Given the obfuscation described above I don't see a problem returning the extra info. I can see value in tools like lsof being able to tell if files are the same, and they need to work with lots of FDs.

kcmp() breaks loose

Posted Feb 12, 2021 17:43 UTC (Fri) by nickodell (subscriber, #125165) [Link] (6 responses)

But you only need to perform a kcmp check if the two file descriptors refer to the same file. If they refer to different files, they cannot possibly be the same FD. Is there a situation where you have lots of FDs pointing to the *same* file?

kcmp() breaks loose

Posted Feb 12, 2021 18:35 UTC (Fri) by matthias (subscriber, #94967) [Link]

Checking which FDs belong to the same file boils down to sorting them (according to some criterion) and then comparing them. Without some kind of sorting you will not end up with O(n*log(n)) but with pairwise testing, i.e. O(n^2). If kcmp() can provide an order (almost) for free, this is probably much cheaper than first constructing such an order in userspace and then using kcmp() only on those files that are the same.
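The sort-then-compare idea can be sketched as follows. To keep the example runnable anywhere, `fake_kcmp` and the `fake_resource` table are hypothetical stand-ins for the real system call and for the hidden kernel objects; only the 0/1/2 return convention is taken from the man page. The value 3 ("ordering information is unavailable") is not handled here, and a real tool would need a fallback for it.

```c
#include <stdlib.h>

/* Stand-in for the kernel: fake_resource[fd] plays the role of the hidden
 * kernel object behind each descriptor. Purely hypothetical test data. */
static const int fake_resource[] = { 7, 3, 7, 9, 3, 7, 1, 9 };
static unsigned long compare_calls; /* how many "kcmp" calls were made */

/* Hypothetical replacement for kcmp(): 0 = same, 1 = less, 2 = greater. */
static int fake_kcmp(int fd_a, int fd_b)
{
    compare_calls++;
    int a = fake_resource[fd_a], b = fake_resource[fd_b];
    return (a < b) | ((a > b) << 1);
}

/* Adapt the 0/1/2 convention to what qsort() expects. */
static int by_resource(const void *pa, const void *pb)
{
    int r = fake_kcmp(*(const int *)pa, *(const int *)pb);
    return r == 0 ? 0 : (r == 1 ? -1 : 1);
}

/* Sort the fds so that equal ones become neighbours, then count the
 * groups with one extra comparison per adjacent pair. */
static size_t count_distinct(int *fds, size_t n)
{
    if (n == 0)
        return 0;
    qsort(fds, n, sizeof fds[0], by_resource);
    size_t groups = 1;
    for (size_t i = 1; i < n; i++)
        if (fake_kcmp(fds[i - 1], fds[i]) != 0)
            groups++;
    return groups;
}
```

With only an equal/not-equal answer, the comparator could not be written at all, and every descriptor would have to be tested against a representative of every group found so far. The `compare_calls` counter is there so the two approaches can be measured against each other on larger inputs.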

kcmp() breaks loose

Posted Feb 12, 2021 18:40 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (3 responses)

Isn't it trivial to open a file multiple times?

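#include <fcntl.h>   /* open(), O_RDONLY */
#include <unistd.h>  /* dup() */

int fd1 = open("/dev/null", O_RDONLY);  /* new open file description */
int fd2 = open("/dev/null", O_RDONLY);  /* same file, but a second description */
int fd3 = dup(fd1);                     /* shares fd1's description */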
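For the three descriptors above, a KCMP_FILE check could look roughly like this. It is a minimal, Linux-only sketch: `same_open_file` is a name invented for this example, and the call goes through syscall(2) on the assumption that the C library provides no kcmp() wrapper, as the man page notes.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/kcmp.h>   /* KCMP_FILE */
#include <sys/syscall.h>  /* SYS_kcmp */
#include <sys/types.h>    /* pid_t */
#include <unistd.h>       /* syscall(), getpid() */

/* Compare two descriptors of the calling process.
 * Returns 0 if both refer to the same open file description,
 * 1, 2 or 3 if they do not, and -1 on error (for example when the
 * running kernel does not provide kcmp or a sandbox blocks it). */
static int same_open_file(int fd_a, int fd_b)
{
    pid_t self = getpid();
    return (int)syscall(SYS_kcmp, self, self, KCMP_FILE,
                        (unsigned long)fd_a, (unsigned long)fd_b);
}
```

Where the call is available, `same_open_file(fd1, fd3)` should return 0, because dup() shares the open file description, while `same_open_file(fd1, fd2)` should return a nonzero value, because each open() creates a separate description even though the path is the same.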

kcmp() breaks loose

Posted Feb 12, 2021 23:35 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

More prosaically, in the case where you need to hibernate an entire container, lots of processes are likely to have certain files open at any given time:

* Whatever systemd/sysvinit/put-your-favorite-alternative-here has attached to the average daemon's stdin/stdout/stderr inside the container.
* /dev/null, as you say.
* /dev/urandom
* Certain files in /etc
* For forking servers, some kinds of sockets and/or pipes, including named fifos and Unix domain sockets.
* Log files and other /var crap.
* Probably half a dozen other things.

kcmp() breaks loose

Posted Feb 14, 2021 4:43 UTC (Sun) by dullfire (guest, #111432) [Link] (1 responses)

Incidentally: "/dev/null" is probably one of the very few cases where userspace won't actually care if it gets restored as the same file entry as another process or not.

kcmp() breaks loose

Posted Feb 14, 2021 13:11 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

While true, one still needs to know that `/dev/null` is the backing file of a given fd to know whether to "ignore" it or not when restoring.

kcmp() breaks loose

Posted Feb 12, 2021 18:57 UTC (Fri) by cjwatson (subscriber, #7322) [Link]

I'd have thought that it would be quite common for many processes to share FDs due to fork().

