2.6.33 merge window part 1

Posted Dec 10, 2009 9:37 UTC (Thu) by wahern (subscriber, #37304)
Parent article: 2.6.33 merge window part 1

Ugh. Now those who use chroot will have even more headaches to deal with.

For example, my portable arc4random--which uses sysctl(CTL_KERN, KERN_RANDOM, RANDOM_UUID)--will break. Requiring people to seed before the chroot happens, or to create device files in the chroot tree, doesn't help; those things aren't required on other platforms.

One plus is that there'd be less kernel exposure in a chroot without either /proc or sysctl. And certainly in general removing code is good, though /proc has historically been riddled with kernel exploits, far more than sysctl ever produced. Indeed, the mere existence of /proc outside the chroot has its own problems, like exposing file descriptors--pipes, socketpairs--that would otherwise be unaddressable by other processes. Thus one of the strongest security characteristics--using descriptors as ad hoc "capability" tokens--is totally broken. File permissions aren't nearly as strong a security mechanism as the inability to reference the object.

Posted Dec 10, 2009 9:55 UTC (Thu) by johill (subscriber, #25196)

"He then adds back a new wrapper which emulates the sysctl() ABI by way of /proc/sys. So any applications using sysctl() should continue to work, but the code dedicated to making it work is much reduced from what was there before."

I don't think that wrapper actually requires it to be mounted.

Posted Dec 10, 2009 16:20 UTC (Thu) by ebiederm (subscriber, #35028)

Correct. /proc does not need to be mounted, simply compiled in, for the sysctl(2) support to work.

Posted Dec 10, 2009 16:37 UTC (Thu) by ebiederm (subscriber, #35028)

I still believe turning off sysctl(2) as scheduled in Documentation/feature-removal-schedule.txt at the end of next year is a good idea. Code that no-one cares about bit-rots in horrible ways.

A few comments.

arc4random prefers to use /dev/urandom and tries that first, so even inside a nicely set-up chroot it will work.

sysctl was absolutely riddled with exploitable code when I started working on it, and a hole was closed just a few weeks ago. It just happens that no one, not even those who exploit kernel issues for the fame, looked at the implementation details of sysctl.

I will agree that the sysctl format of only exporting simple integer and string values is much harder to exploit, and as such is a good idea.

As for the file descriptors, they are not exposed to other users. The permissions on /proc/<pid>/fd/ are limited. Except for one esoteric corner case, you can't do anything more with the file descriptors in proc than you could by attaching a debugger. Using file descriptors as ad hoc "capability" tokens is not broken in any way that I am aware of.

Posted Dec 10, 2009 19:45 UTC (Thu) by spender (guest, #23067)

Rather than get into a discussion about private research vs public research, exploit writing vs vulnerability finding, etc, I'll just ask:
What's the CVE for the vulnerability that was fixed?

-Brad

Posted Dec 10, 2009 20:29 UTC (Thu) by wahern (subscriber, #37304)

Neither mine nor the original in OpenBSD does this. FreeBSD's seems to do this, but that's still a fault all the same. It's very common to use /var/empty as the root directory.

Though, I'll admit then that Linux wouldn't be the first to break this behavior (if indeed it did, which it hasn't yet). I'll have to fix my apps to stir before any chroot.
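To make "stir before any chroot" concrete, here is a minimal sketch, not the code from my library: it opens /dev/urandom while the device node is still reachable and keeps the descriptor for later reads. The names entropy_open() and entropy_read() are made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: open the entropy source while /dev is still
 * visible. Call this BEFORE chroot(); the descriptor stays valid after. */
int entropy_open(void)
{
    return open("/dev/urandom", O_RDONLY);
}

/* Hypothetical helper: fill buf with len bytes from fd.
 * Returns 0 on success, -1 on error or unexpected end of file. */
int entropy_read(int fd, unsigned char *buf, size_t len)
{
    size_t off = 0;

    while (off < len) {
        ssize_t n = read(fd, buf + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return -1;          /* should not happen for /dev/urandom */
        off += (size_t)n;
    }
    return 0;
}
```

The intended order is entropy_open(), then chroot("/var/empty"), then entropy_read() whenever the generator needs stirring. The cost is one descriptor held open for the life of the process, which is exactly the kind of setup other platforms don't demand.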

As for /proc/$$/fd: take Apache as an example. Site A can access descriptors--specifically anonymous pipes--of site B. That the process for site A could theoretically attach itself to site B is beside the point. Typically both processes are running virtual machines and/or interpreters where debugging interfaces aren't available. Regular file routines, however, are usually available. Breaking out of a VM is significantly more difficult than coaxing a script to eval code.
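A rough sketch of the mechanism, assuming Linux with /proc mounted: an anonymous pipe, which has no name anywhere else, can be reopened through its /proc/self/fd entry with a plain open(). The function name is invented for the example, and it only demonstrates the same-process case; another process would use /proc/<pid>/fd/<n> and is subject to the permission checks mentioned above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the pipe was reopened by path and data
 * round-tripped, 0 if the /proc entry could not be opened (for example
 * /proc is not mounted), and -1 on any other error. */
int reopen_pipe_via_proc(void)
{
    int p[2];
    char path[64];
    const char msg[] = "token";
    char buf[sizeof msg] = { 0 };
    int alias, result = -1;

    if (pipe(p) != 0)
        return -1;

    /* The read end of the anonymous pipe now has a path. */
    snprintf(path, sizeof path, "/proc/self/fd/%d", p[0]);
    alias = open(path, O_RDONLY);   /* ordinary file routine, no ptrace */
    if (alias < 0) {
        close(p[0]);
        close(p[1]);
        return 0;
    }

    /* Data written to the pipe is readable through the reopened alias. */
    if (write(p[1], msg, sizeof msg) == (ssize_t)sizeof msg &&
        read(alias, buf, sizeof buf) == (ssize_t)sizeof msg &&
        memcmp(buf, msg, sizeof msg) == 0)
        result = 1;

    close(alias);
    close(p[0]);
    close(p[1]);
    return result;
}
```

Nothing here needs a debugging interface: snprintf(), open() and read() are the sort of regular file routines a script interpreter typically exposes.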

Requiring a different process user for every site is impracticable, unless perhaps the kernel could provide ephemeral UIDs. Anyhow, you can drop ptrace capabilities, yet because of the growing necessity of /proc it's increasingly just as impracticable to not have /proc mounted.

With the rise of "cloud computing" (née SaaS, née time-sharing systems), the notion that privileges are necessarily tied to persistent objects or system-wide credentials is short-sighted. The operating system should provide certain primitives and behaviors that allow applications to create ad hoc privilege systems enforceable by the hardware, e.g. the MMU. Solutions like SELinux, or any other system-wide _explicit_ access control, miss the mark entirely in almost every way imaginable.

Posted Dec 19, 2009 10:48 UTC (Sat) by jengelh (guest, #33263)

>I still believe turning off sysctl(2) as scheduled in Documentation/feature-removal-schedule.txt at the end of next year is a good idea. Code that no-one cares about bit-rots in horrible ways.

And for where it matters, glibc could emulate sysctl() by going to /proc/sys instead.
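Something along these lines, as a sketch only and not actual glibc code: a lookup table translates the integer name vector into a /proc/sys path, and the value is read from that file. The X_* constants are local stand-ins that I believe mirror the historical <linux/sysctl.h> values; treat them, the table, and both function names as assumptions.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Local stand-ins for the historical sysctl name constants
 * (assumed values, defined here so the sketch is self-contained). */
enum { X_CTL_KERN = 1, X_KERN_RANDOM = 40, X_RANDOM_UUID = 6 };

struct sysctl_map {
    int name[3];        /* integer name vector as passed to sysctl() */
    size_t len;         /* number of valid entries in name[] */
    const char *path;   /* equivalent file under /proc/sys */
};

/* Illustrative table with a single entry; a real one would be much larger. */
static const struct sysctl_map sysctl_table[] = {
    { { X_CTL_KERN, X_KERN_RANDOM, X_RANDOM_UUID }, 3,
      "/proc/sys/kernel/random/uuid" },
};

/* Map an integer name vector to its /proc/sys path, or NULL if unknown. */
const char *sysctl_name_to_path(const int *name, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof sysctl_table / sizeof sysctl_table[0]; i++) {
        const struct sysctl_map *m = &sysctl_table[i];
        if (m->len == len &&
            memcmp(m->name, name, len * sizeof *name) == 0)
            return m->path;
    }
    return NULL;
}

/* Read the value as text into buf (always NUL-terminated on success).
 * Returns the number of bytes read, or -1 on failure. */
long sysctl_emulated_read(const int *name, size_t len,
                          char *buf, size_t buflen)
{
    const char *path = sysctl_name_to_path(name, len);
    FILE *fp;
    size_t n;

    if (path == NULL || buflen == 0)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;      /* e.g. /proc is not mounted in this chroot */
    n = fread(buf, 1, buflen - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    return (long)n;
}
```

Note the catch: unlike the in-kernel wrapper, a userspace emulation like this does need /proc mounted wherever the process runs, so it would not help inside an empty chroot.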