
Updates in container isolation

Updates in container isolation

Posted May 17, 2018 1:22 UTC (Thu) by roc (subscriber, #30627)
Parent article: Updates in container isolation

> Kata Containers relies on a kernel running inside the container, which actually expands the attack surface instead of reducing it.

This doesn't sound right. The kernel running inside the container is inside the sandbox (the virtual machine interface implemented by the hypervisor), therefore it cannot add to the attack surface.



Updates in container isolation

Posted May 17, 2018 1:36 UTC (Thu) by anarcat (subscriber, #66354) [Link] (2 responses)

The point is that instead of having just the kernel to worry about (which is still running inside the container), you now also have the hypervisor (in this case Xen) to worry about as well.

Updates in container isolation

Posted May 17, 2018 2:23 UTC (Thu) by thinxer (guest, #121772) [Link] (1 responses)

You don't actually worry about the kernel running inside the sandbox. You worry about the sandbox only, which usually has simpler interfaces than the kernel and thus a reduced attack surface.

Updates in container isolation

Posted May 17, 2018 14:00 UTC (Thu) by anarcat (subscriber, #66354) [Link]

Right. I probably got this one backwards, apologies.

Still, the way Xen is designed feels a little backwards to me, as the first layer is not actually the hypervisor itself but a (compatible) kernel that talks to the hypervisor. And yes, that *does* provide an *extra* layer of security at the cost of performance. But Xen's design also means you need a privileged supervisor domain (dom0 in the case of Xen), which is now part of the attack surface as well, and I seem to recall that being used as an attack vector in the past, though I could be mistaken there. I think this is where my analogy came from, but I must admit I cannot substantiate it any further, and I am forced to recognize that the attack surfaces are comparable with those of other approaches like gVisor, at least conceptually.

Updates in container isolation

Posted May 17, 2018 23:34 UTC (Thu) by roc (subscriber, #30627) [Link] (3 responses)

Because of this, I think the article obscures the security advantages of gVisor compared to something like Kata Containers. I don't understand the comparison between the number of lines of code in the Linux kernel and gVisor. Are they talking about the guest kernel size? But the guest kernel size in a VM container isn't an issue for security because it's inside the sandbox. Are they talking about the host kernel size making it hard to audit the security of KVM itself? But that would also affect gVisor-KVM, which is going to be the preferred deployment configuration ... assuming your host is running on bare metal or nested virtualization is available.

I think the security argument for gVisor-KVM is that if a KVM escape lands in the hypervisor's user-level component rather than in the host kernel itself, you're still in a very restrictive sandbox around the gVisor kernel. With Kata, by contrast, you'd be in QEMU, which probably needs a much less restrictive sandbox.

Although one interesting question is, do gVisor-KVM guest processes run at ring 0 or ring 3? If it's ring 3 somehow, then that would be an additional security layer for gVisor, but worse for performance.

I can see advantages for gVisor in terms of memory and storage usage, because the guest can share the host file system rather than mounting its own on a virtual block device.

Updates in container isolation

Posted May 20, 2018 6:31 UTC (Sun) by prattmic (subscriber, #101817) [Link]

The sandboxed application processes run in guest ring 3. The gVisor kernel runs in guest ring 0 (and host ring 3).

Updates in container isolation

Posted May 21, 2018 1:39 UTC (Mon) by bergwolf (guest, #55931) [Link] (1 responses)

> If it's ring 3 somehow, then that would be an additional security layer for gVisor, but worse for performance.

It's ring 3, and each syscall has to vmexit. Bad news for syscall-intensive applications.

Updates in container isolation

Posted Jun 1, 2018 8:47 UTC (Fri) by ZhuYanhai (guest, #44977) [Link]

The Sentry runs in guest ring 0 on the KVM platform, so guest code doesn't trigger a vmexit for its syscalls unless the Sentry itself needs to make some additional syscalls.

On the ptrace platform the Sentry runs in ring 3; that platform is designed for development and debugging purposes only.
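
To make the ring layout and the vmexit point above concrete, here is a rough toy model in Python. It is only a sketch of the claims made in this thread, not gVisor code: the names `Platform`, `count_vmexits` and `NEEDS_HOST` are hypothetical, and the set of syscalls that force the Sentry to go to the host is an assumption picked purely for illustration.

```python
from dataclasses import dataclass

# Toy model of the claims in this thread; this is NOT gVisor code.
# Platform, count_vmexits and NEEDS_HOST are hypothetical names.


@dataclass(frozen=True)
class Platform:
    name: str
    sentry_ring: int  # ring the Sentry runs in, as stated in the thread
    app_ring: int     # ring the sandboxed application runs in


KVM = Platform("kvm", sentry_ring=0, app_ring=3)        # guest rings
PTRACE = Platform("ptrace", sentry_ring=3, app_ring=3)  # no guest mode

# ASSUMPTION for illustration only: guest syscalls that make the Sentry
# issue a syscall of its own. The real set depends on gVisor internals.
NEEDS_HOST = {"read", "write"}


def count_vmexits(platform, syscalls):
    """Count vmexits caused by application syscalls in the KVM model."""
    if platform.name != "kvm":
        raise ValueError("vmexits only exist on the KVM platform in this model")
    # The Sentry handles a syscall in guest ring 0 without leaving the
    # guest, unless it needs to make a syscall itself.
    return sum(1 for s in syscalls if s in NEEDS_HOST)


trace = ["getpid", "read", "getpid", "write", "getpid"]
print(count_vmexits(KVM, trace))
```

In this model only two of the five calls in `trace` leave the guest; under the "every syscall has to vmexit" reading it would be all five.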


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds