Who audits the audit code?
The system call auditing mechanism creates audit log entries in response to system calls; the system administrator can load rules specifying which system calls are to be logged. These rules can include various tests on system call parameters, but there is also a simple bitmask, indexed by system call number, specifying which calls might be of interest. One of the first things done by the audit code is to check the appropriate bit for the current system call to see if it is set; if it is not, there is no auditing work to be done.
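The fast-path test described above can be pictured with a small user-space sketch. This is not the kernel's code: the names `SKETCH_NR_SYSCALLS`, `syscall_mask`, `mask_set`, and `syscall_is_interesting` are invented for illustration, and the table size is an arbitrary assumption.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table size, chosen only for this sketch. */
#define SKETCH_NR_SYSCALLS 512
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS ((SKETCH_NR_SYSCALLS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per system call number; a set bit means "some rule may care". */
static unsigned long syscall_mask[MASK_WORDS];

/* Mark a system call as potentially interesting, as loading a rule might. */
void mask_set(unsigned int nr)
{
	if (nr < SKETCH_NR_SYSCALLS)
		syscall_mask[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* Fast-path test: false means there is no auditing work to do for this call. */
bool syscall_is_interesting(unsigned int nr)
{
	if (nr >= SKETCH_NR_SYSCALLS)
		return false;
	return (syscall_mask[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}
```

Calling `mask_set(39)` and then `syscall_is_interesting(39)` yields true, while any number whose bit was never set yields false and the remaining rule tests can be skipped.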
Philipp Kern recently noticed a little problem with how that code works with the x32 ABI. When code running under that ABI invokes a system call, it does not use the normal system call numbers defined by the x86 architecture; instead, x32 system calls (which require compatibility handling for some parameters) are marked by setting an additional bit (0x40000000) in that number. The audit code fails to remove that bit before checking the system call number in its bitmask; as one might imagine, the results are not as one might wish. Philipp included a patch to strip out the x32 bit, but it turns out that the problem is a bit bigger than that.
Andy Lutomirski, in looking at Philipp's patch, realized that the code wasn't just failing to strip out one bit; there are, in fact, no bounds checks on the system call number at all. User space can pass in any system call number it wants, and the kernel will use that number to index into its bitmask array; the result for a sufficiently large system call number is a predictable kernel oops. Andy also suggested that this failure could be used to determine the value of specific bits in kernel space, leading to an information-disclosure vulnerability.
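The two problems described above (the leftover x32 bit and the missing range check) can be illustrated with a short sketch of how a raw system call number might be turned into a safe table index. This is an assumption-laden illustration, not the patch that was posted: `X32_SYSCALL_BIT`, `SKETCH_TABLE_SIZE`, and `syscall_index` are hypothetical names, the table size is arbitrary, and real x32 numbering has details that are not modeled here.

```c
#include <stdbool.h>

/* The x32 marker bit named in the article (macro name is this sketch's own). */
#define X32_SYSCALL_BIT 0x40000000u
/* Hypothetical table size, chosen only for this sketch. */
#define SKETCH_TABLE_SIZE 512u

/*
 * Turn a raw, user-supplied system call number into a safe table index.
 * Returns false when the number cannot refer to any table entry.
 */
bool syscall_index(unsigned int raw_nr, unsigned int *index)
{
	unsigned int nr = raw_nr & ~X32_SYSCALL_BIT;	/* drop the x32 marker */

	if (nr >= SKETCH_TABLE_SIZE)			/* reject out-of-range values */
		return false;
	*index = nr;
	return true;
}
```

With a check of this kind, a value such as `0xffffffff` is rejected instead of being used to read memory far beyond the end of the bitmask.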
Andy submitted a patch to fix this
particular problem, but he didn't stop there. He has come to the
conclusion that the audit subsystem is beyond repair, so his patch marks
the whole thing as being broken, making it generally inaccessible. He
cited a number of problems beyond this security issue: it hurts performance
even when it is not being used, it is not (in his mind) reliable, it has
problems with various architectures, and "its approach to freeing
memory is terrifying". All told, Andy said, we're better off without it.
It is unsurprising that Eric Paris, who maintains the audit code, disagrees with this assessment. His point of view is that this is just another bug in need of fixing; it does not indicate any systemic problem with the audit code.
It is telling, though, that this particular vulnerability has existed in the audit subsystem almost since its inception. The audit code receives little in the way of review; most kernel developers simply turn it off for their own kernels and look the other way. But this subsystem is just the sort of thing that distributors are almost required to enable in their kernels; some users will want it, so they have to turn it on for everybody. As a result, almost all systems out there have audit enabled (look for a running kauditd thread), even though few of them are using it. These systems take a performance penalty just for having audit enabled, and they are vulnerable to any issues that may be found in the audit code.
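One way to look for a running kauditd thread is to scan the task names under /proc. The helper below is a minimal sketch under the assumption that /proc is mounted and that each task exposes its name in `/proc/<pid>/comm`; the function name `task_running` is made up for this example.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if some task's comm matches name, 0 if none does,
 * and -1 if /proc cannot be read.
 */
int task_running(const char *name)
{
	DIR *proc = opendir("/proc");
	struct dirent *ent;
	int found = 0;

	if (!proc)
		return -1;

	while (!found && (ent = readdir(proc)) != NULL) {
		char path[300], comm[64];
		FILE *f;

		if (!isdigit((unsigned char)ent->d_name[0]))
			continue;	/* only numeric entries are processes */
		snprintf(path, sizeof(path), "/proc/%s/comm", ent->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;	/* the task may have exited already */
		if (fgets(comm, sizeof(comm), f)) {
			comm[strcspn(comm, "\n")] = '\0';
			found = strcmp(comm, name) == 0;
		}
		fclose(f);
	}
	closedir(proc);
	return found;
}
```

Calling `task_running("kauditd")` reports whether such a thread currently exists; it says nothing about whether the audit code was built into the kernel.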
If audit were to be implemented today, the developer involved would have to give some serious thought, at least, to using the tracing mechanism. It already has hooks applied in all of the right places, but those hooks have (almost) zero overhead when they are not enabled. Tracing has its own filtering mechanism built in; the addition of BPF-based filters will make that feature more capable and faster as well. In a sense, the audit subsystem contains yet another kernel-based virtual machine that makes decisions about which events to log; using the tracing infrastructure would allow the removal of that code and a consolidation to a single virtual machine that is more widely maintained and reviewed.
The audit system we have, though, predates the tracing subsystem, so it
could not have been based on tracing. Replacing it without breaking users
would not be a trivial job, even in the absence of snags that have been
glossed over in the above paragraph (and such snags certainly exist). So
we are likely stuck with the current audit subsystem (which will certainly
not be marked "broken" in the mainline kernel) for the foreseeable future.
Hopefully it will receive some auditing of its own just in case there are
more old surprises lurking therein.
| Index entries for this article | |
|---|---|
| Kernel | Auditing |
| Security | Linux kernel |
Posted May 30, 2014 6:50 UTC (Fri)
by bnorris (subscriber, #92090)
$ grep CONFIG_AUDIT /boot/config-`uname -r`
> (look for a running kauditd thread)
None here.
> even though few of them are using it. These systems take a performance penalty just for having audit enabled, and they are vulnerable to any issues that may be found in the audit code.
I'm not an expert on the kaudit subsystem (in fact, I just learned of it), but it looks like kauditd is only spawned in response to a user-space request for it (e.g. from SELinux auditd). See kernel/audit.c:
static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
So it looks like my Ubuntu system doesn't have any overhead from kauditd; just the overhead of listening (likely low?).
Now, I don't know what other kinds of overhead CONFIG_AUDIT* might have besides kauditd, but I am at least doubtful of these claims now.
Posted May 30, 2014 13:03 UTC (Fri)
by alonz (subscriber, #815)
Posted May 30, 2014 18:07 UTC (Fri)
by bnorris (subscriber, #92090)
Posted May 30, 2014 17:16 UTC (Fri)
by luto (guest, #39314)
To be clear: I don't really object to CONFIG_AUDIT -- it's just CONFIG_AUDITSYSCALL. Once audit has been enabled, you're stuck with syscall auditing overhead until the next reboot. There's a workaround:
# auditctl -a task,never
I'm currently lobbying for Fedora to turn off syscall auditing in their default configuration:
Posted Jun 1, 2014 5:13 UTC (Sun)
by dirtyepic (guest, #30178)
Posted Jun 1, 2014 5:21 UTC (Sun)
by luto (guest, #39314)
I have no clue what loginuid and sessionid (which appears to be completely unrelated to the POSIX session id) have to do with syscall auditing. It would be easy to split that out from CONFIG_AUDITSYSCALL, since it seems to be almost completely unrelated to syscall auditing, other than the fact that syscall auditing logs the loginuid and sessionid.
Posted Jun 3, 2014 19:48 UTC (Tue)
by zdzichu (guest, #17118)
Posted Jun 4, 2014 4:24 UTC (Wed)
by dirtyepic (guest, #30178)
Posted Jun 14, 2014 9:19 UTC (Sat)
by Duncan (guest, #6647)
But no sign of kauditd. I might just try disabling CONFIG_AUDIT or at least CONFIG_AUDITSYSCALL next time I do a kernel build and see if systemd can run properly with it disabled. I've wondered why I actually needed it ever since I first enabled it. At least that way if I have to actually reenable it, I'll know exactly what I was unbreaking by doing so.
Posted Jun 14, 2014 13:56 UTC (Sat)
by Duncan (guest, #6647)
Still don't know what systemd "requires" it for, but it's off now and doesn't seem to hurt me, so... I'm leaving it off.
Posted Jun 16, 2014 11:13 UTC (Mon)
by cortana (subscriber, #24596)
Posted May 30, 2014 15:12 UTC (Fri)
by and (guest, #2883)
Wouldn't it be "kind of straightforward" to rip out the current audit subsystem and replace it with a compatibility layer that translates everything it exposes to user space into the tracing subsystem?
Fact checking
CONFIG_AUDIT_ARCH=y
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y
{
	[...]
	/* As soon as there's any sign of userspace auditd,
	 * start kauditd to talk to it */
	if (!kauditd_task) {
		kauditd_task = kthread_run(kauditd_thread, NULL, "kauditd");
		[...]
}
According to the posting by Andy, the effect of CONFIG_AUDITSYSCALL is:
It forces all syscalls into the slow path and it can do crazy things
like building audit contexts just in case actual handling of the
syscall triggers an audit condition so that the exit path can log the
syscall. That's way worse than a single branch.
Try it: benchmark getpid on Fedora and then repeat the experiment with
syscall auditing fully disabled. The difference is striking.
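The experiment suggested above could be approximated with a small timing helper like the following. It is only a sketch: the function name `getpid_ns_per_call` is invented here, the raw `syscall(SYS_getpid)` form is an assumption made so that the kernel is entered on every iteration, and the numbers it produces will vary with hardware and kernel configuration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Average cost, in nanoseconds, of one raw getpid system call.
 * syscall(SYS_getpid) is used so that every iteration really
 * enters and leaves the kernel.
 */
double getpid_ns_per_call(long iterations)
{
	struct timespec start, end;
	long i;

	assert(iterations > 0);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++)
		syscall(SYS_getpid);
	clock_gettime(CLOCK_MONOTONIC, &end);

	return ((end.tv_sec - start.tv_sec) * 1e9 +
		(end.tv_nsec - start.tv_nsec)) / (double)iterations;
}
```

Calling it with a large iteration count on the same machine, once with syscall auditing active and once with it fully disabled, gives the two figures to compare.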
