Virtualization: now what?
Andrew Morton responded with a note praising the way the work has been done, but asking a fundamental question:
All of which begs the question "now what?".
The worry is that the kernel developers could merge a large amount of non-trivial code, make a number of internal kernel interfaces more complicated, and still not have an end result that is useful to the containers community. The fact that the developers working in this area were able to agree on a patch for UTS namespaces is encouraging, but it is not a guarantee that consensus will be reached on the more complicated changes. The possibility of an intractable disagreement derailing the whole process partway through is a real one.
On the other hand, keeping all of the container code out of the kernel until it is reasonably complete has its own costs. Some of the container changes look to be relatively large and intrusive. Maintaining them all out of the tree would not be a great deal of fun. Neither would merging the whole mess at some future point when enough developers can agree that they are "done."
There are a number of features needed by the projects concerned with virtualization and containers. They include:
- The UTS namespace patch mentioned above.
- PID virtualization, isolating each group of processes on the system from the others and allowing process IDs to be reused between containers.
- Namespaces for SYSV interprocess communication primitives (semaphores, shared memory, and message queues).
- Time virtualization, so that each container can have its own idea of what time it is.
- Virtualization of user and group ID values.
- Network namespaces, intended to give each container a specific set of network interfaces to which it has access. When used in conjunction with IP aliases, this feature can set up a separate IP address for each container and keep containers from accessing each other's traffic.
The ability to virtualize the view of the filesystem through namespaces is also required, but Linux has had that capability for some years now. Some of the more advanced container capabilities - live checkpointing and process migration, for example - will require yet another set of deep kernel hooks.
Most container implementations need most of the items from the list above to be able to provide useful isolation. So, somehow, a path must be found to get those features into the kernel without running into a blocking disagreement partway through - assuming that container support is considered desirable in general, of course.
Andrey Savochkin came up with a proposal which could be a good step forward: implement the network namespaces feature first. It is one of the most complex features, and it must be implemented in a way which doesn't upset the highly refined sensibilities of the networking subsystem developers. Some fairly tricky side problems - such as virtualizing access to /proc and sysfs - will have to be solved in the process. All told, it may be the hardest part of the problem, and it may be the place where an extended disagreement is most likely to show up.
Often, developers like to take on the easier parts of a problem first, then apply any lessons learned to the harder parts. In this case, however, starting with the hardest part may make some sense. If no universally acceptable solution can be found, the idea of generalized container support in the kernel can be dropped before too much other code has been merged. If, instead, the developers involved are able to implement something which pleases (or, at least, does not mortally offend) everybody, they should be able to get over any other roadblocks which may show up later on. In that case, the various pieces of the puzzle could be merged with confidence as they become ready.
Index entries for this article
Kernel: Virtualization/Containers
Posted May 27, 2006 10:43 UTC (Sat)
by gadeiros (guest, #3929)
Wouldn't these changes be big enough to start Linux 3.0 development (maybe calling it 2.9 or 2.99 until it's stable enough for 3.0), while keeping the 2.6 branch for driver additions, some other "smaller" changes, and bug/security fixes, which are relatively easy to forward-port later?
Posted Jun 1, 2006 14:14 UTC (Thu)
by rvfh (guest, #31018)
Why branch when you can do both in the same tree? Disable the macro/configuration option and you have 2.6; enable it and you have '3.0'. Until '3.0' becomes '2.6'. That's how the preemptible kernel stuff works, at least.
Posted Jun 3, 2006 13:57 UTC (Sat)
by mtrob (guest, #1404)
Dumb question from somebody not involved in kernel development... I thought the OpenVZ submission already supported a lot of these features?