Merging the kvm tool

By Jonathan Corbet
August 23, 2011

The "native Linux KVM tool" (which we'll call "NLKT") is a hardware emulation system designed to support virtualized guests running under the KVM hypervisor. It offers a number of nice features, but an attempt to get this code merged into the 3.1 kernel was deferred by Linus, who did not want to deal with another controversial development at that time. This tool's developers have let it be known that it will be back for the 3.2 merge window; controversy is sure to follow. The core question raised by this project is: what code is appropriate for the kernel tree, and which projects should live in their own repositories elsewhere?

NLKT was started in response to unhappiness about QEMU, the state of its code, and the pace of its development. It was designed with simplicity in mind; NLKT is meant to be able to boot a basic Linux kernel without the need for a BIOS image or much in the way of legacy hardware emulation. Despite its simplicity, NLKT offers "just works" networking, SMP support, basic graphics support, copy-on-write block device access, host filesystem access with 9P or overlayfs, and more. It has developed quickly and is, arguably, the easiest way to get a Linux kernel running on a virtualized system.

Everybody seems to think that NLKT is a useful tool; nobody objects to its existence. The controversy comes for other reasons, one of which is the name: the tool simply calls itself "kvm." The potential for confusion with the kernel's KVM subsystem is clear - that is why this article made up a different acronym to refer to the tool. "KVM" is already seen as an unfortunate name - searches for the term bring in a lot of information about keyboard-video-mouse switches - so adding more ambiguity seems like a bad move. It is also seemingly viewed by some as a move to be the "official" hardware emulator for KVM. The NLKT developers have, thus far, resisted a name change, though.

The bigger fight is over whether NLKT belongs in the kernel at all. It is not kernel code; it is a program that runs in user space. The question of whether such code should be in the kernel's repository is certainly the one that will decide whether it is merged for 3.2 or not.

NLKT would not be the first user-space tool to go into the mainline kernel; several others can be found in the tools/ directory. Many of them are testing tools used by kernel developers, but not all. The "cpupower" tool was merged for 3.1; it allows an administrator to tweak various CPU power management features. The most actively developed tool in that directory, though, is perf, which has grown by leaps and bounds since being merged into the mainline. The developers working on perf have been very outspoken in their belief that putting the tool into the mainline kernel repository has helped it to advance quickly.

Proponents say that, like perf, NLKT is closely tied to the kernel and heavily used by kernel developers; like perf, it would benefit from being put into the same code repository. KVM, they say, is also under heavy development; having NLKT and KVM in the same tree would help both to improve more quickly. It would bring more review of any future KVM ABI changes, since a user of that ABI would be merged into the kernel as well. Keeping the hardware emulation code near the drivers that code has to work with is said to be beneficial to both sides. All told, they say, perf would not have been nearly as successful outside of the mainline tree as it has been internally; merging NLKT can be expected to encourage the same sort of success.

That success seems to be one of the things that opponents are worried about; some have worried that the main purpose is to increase the project's visibility so that it succeeds at the expense of competing projects. The ABI development benefits are challenged; any changes would clearly still have to work with tools like QEMU regardless of whether NLKT is in the kernel, so QEMU developers would have to remain in the loop. It is even better, some say, to separate the implementation of an ABI from its users; that forces the implementers to put more effort into documenting how the ABI should be used.
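For readers who have not looked at it, the ABI being argued over is a set of ioctl() calls on /dev/kvm. The following is a minimal sketch, not code from NLKT or QEMU, of how any external user might probe that interface before relying on it. The function name probe_kvm() is made up for illustration. The ioctl numbers are hard-coded from <linux/kvm.h> and assume the _IO() encoding used on x86 and most other architectures; a real tool would take them from the header.

```python
import fcntl
import os

# Values from <linux/kvm.h>: _IO(KVMIO, nr) with KVMIO = 0xAE.
# Assumption: the _IO() encoding of x86 and most other architectures.
KVM_GET_API_VERSION = 0xAE00   # _IO(0xAE, 0x00)
KVM_CHECK_EXTENSION = 0xAE03   # _IO(0xAE, 0x03)
KVM_CAP_USER_MEMORY = 3        # one capability number, as an example


def probe_kvm(path="/dev/kvm"):
    """Return (api_version, has_user_memory), or None if KVM is unusable.

    Hypothetical helper: it only shows the "ask first, then use" pattern
    that a tool living outside the kernel tree has to follow.
    """
    try:
        fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        # No device node, no permission, or KVM not loaded.
        return None
    try:
        version = fcntl.ioctl(fd, KVM_GET_API_VERSION, 0)
        # KVM_CHECK_EXTENSION returns 0 when the capability is absent.
        has_cap = fcntl.ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY) > 0
        return version, has_cap
    except OSError:
        # The file exists but does not answer KVM ioctls.
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(probe_kvm())
```

An out-of-tree tool has to make checks like these on every kernel it might meet, while a tool shipped in the kernel tree can, in principle, assume the kernel it was built alongside.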

There is also concern that, once we start seeing more user-space tools going into the kernel tree, there will be an unstoppable flood of them. Where does it stop, they ask - should we pull in the C library, the GNU tools, or, maybe, LibreOffice? Linux is not BSD, they say; trying to put everything into a single repository is not the right direction to take. The answer to that complaint is that there is no interest in merging arbitrary tools; only those that are truly tied to the kernel would qualify. By this reasoning, NLKT is an easy decision. A C library is something that could be considered; perhaps even graphics if the relevant developers wanted to do that. But office suites are not really eligible; there are limits to what should go into the mainline.

That was where the discussion stood at the beginning of the 3.1 merge window; Linus decided not to pull NLKT at that time. Instead, he clearly wanted the discussion to continue; he told the NLKT developers that they would have to convince him in the 3.2 merge window instead. It looks like that process is about to begin; the NLKT repository is about to be added to linux-next in anticipation of a pull request once the merge window opens. This time, with luck, we'll have a resolution of the issue that gives some guidance for those who would merge other user-space tools in the future.

Merging the kvm tool

Posted Aug 25, 2011 2:27 UTC (Thu) by dberkholz (guest, #23346) [Link] (4 responses)

Has anyone considered using the kernel tree like an incubator? Let things live there while the API is "growing up" and then move them out into independent repos once they've matured.

Merging the kvm tool

Posted Aug 25, 2011 11:09 UTC (Thu) by mingo (guest, #31122) [Link] (3 responses)

That's exactly what we've done with tools/perf/ and it has worked very well for us.

Except for the detail that we've indefinitely postponed the 'moving out into a separate project' step:

Dependable release schedule: the kernel gives us and our users a perfectly timed, externally enforced stable release heartbeat every 3 months.

No monopoly on maintenance: we cannot abandon the repository and kill/sabotage development that way. If we make a mess of tools/perf/ within the kernel repo others will pick it up and Linus will start merging them.

Dependable development model: well-understood and constantly evolving coding style and review/discussion methodology

Easy access to early adopters: Kernel testers and kernel developers are pretty curious by nature, and doing 'cd tools/perf; make -j;' in the kernel repository they already have anyway is not something they are scared of trying or too lazy to perform.

Widespread distribution: the Linux kernel is widely distributed and packaged, so it's easy for anyone to get started hacking on tools/perf/.

These are IMO some of the most important attributes of a good user-space project and by living in the kernel repository we get these benefits without having to fight much to enforce them. (Where the 'fight' would often be against ourselves.)

So I am not surprised at all that the tools/kvm/ developers (disclaimer, I write the occasional patch for tools/kvm/ myself and review their patches) are feeling about it in a similar way.

Merging the kvm tool

Posted Aug 25, 2011 13:10 UTC (Thu) by dberkholz (guest, #23346) [Link]

I agree, tools where devs are the primary/only audience might be more fitting for bundling with the kernel for longer periods of time. I don't really see KVM that way, though.

The problem with not leaving is obvious; there's a never-ending growth of semi-random packages distributed with the kernel, and pretty soon they'll call it LinuxBSD because it includes the whole OS.

Making it explicit that the kernel tree is an incubator rather than a permanent home means that it's not unbounded growth but instead more of an adolescence for new userspace code, which will eventually move out and go to college instead of sucking your resources forever.

Merging the kvm tool

Posted Aug 25, 2011 14:37 UTC (Thu) by deater (subscriber, #11746) [Link] (1 responses)

One thing not addressed is the stable-ABI issue.

External tools trying to use the perf_event ABI tend to struggle, as the perf_event developers are primarily interested in perf working. Breaks in the ABI (and there have been many, though most are minor) are not deemed important if you don't catch them fast enough. And since the perf_event developers never feel the need to use the ABI from outside the kernel tree, it ends up meaning that userspace developers need to be running bleeding-edge kernels at all times or risk their tools being broken.

Kernel and userspace development are very different processes, and if your code isn't in the kernel tree you face an extreme disadvantage.

Merging the kvm tool

Posted Aug 25, 2011 14:45 UTC (Thu) by deater (subscriber, #11746) [Link]

To add an example: the whole OFFCORE_EVENTS issue on Nehalem processors with perf.

Currently it is impossible for external tools to detect if this feature is enabled or not. If you try to use it, sometimes it will return 0, sometimes not (depending on which generalized event you've used, due to a buggy leak of MSR state). This bug has been there since 2.6.39. What should happen is that an error code is returned.

If RAW OFFCORE_EVENT support is ever added, then there is no possible way for an external tool to detect this properly, as correct behavior is indistinguishable from the current buggy disabling.

Does the perf tool care? No. Why? Because when it gets support it will happen at the same time that the kernel is updated, so as long as kernel/perf are updated at the same time you'll never notice. perf never has to worry about backward compatibility, which is a bit of an unfair advantage.

Zawinski's Law

Posted Aug 25, 2011 2:31 UTC (Thu) by cesarb (subscriber, #6266) [Link] (4 responses)

> Where does it stop, they ask - should we pull in the C library, the GNU tools, or, maybe, LibreOffice?

"Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can."

Zawinski's Law

Posted Aug 25, 2011 4:00 UTC (Thu) by jzbiciak (guest, #5246) [Link] (3 responses)

I suspect if this law were stated anew today, you would have "browse the web" in place of "read mail."

Zawinski's Law

Posted Aug 25, 2011 11:41 UTC (Thu) by cesarb (subscriber, #6266) [Link] (1 responses)

> I suspect if this law were stated anew today, you would have "browse the web" in place of "read mail."

That is only because now we have webmail.

Zawinski's Law

Posted Aug 25, 2011 16:25 UTC (Thu) by jzbiciak (guest, #5246) [Link]

And LWN. ;-)

Zawinski's Law

Posted Aug 26, 2011 17:17 UTC (Fri) by sorpigal (guest, #36106) [Link]

In that case, since I don't want Linux to be replaced any time soon, I hereby request that Firefox be moved into the kernel git repo.

Merging the kvm tool

Posted Aug 25, 2011 9:29 UTC (Thu) by Lennie (subscriber, #49641) [Link] (2 responses)

"..the easiest way to get a Linux kernel running on a virtualized system."

Yes, a kernel maybe. But if you don't need a kernel, which most people don't, a container solution (LXC, Jails, OpenVZ/Linux-VServer, Zones) is much better.

I'm glad Linus thinks the same thing; at least then there is the will by Linus to accept great checkpoint/restart and live migration?

http://www.networkworld.com/community/node/77850

Merging the kvm tool

Posted Aug 25, 2011 13:29 UTC (Thu) by Klavs (guest, #10563) [Link] (1 responses)

Agreed - LXC is def. a good fit for tools/lxc - if tools/kvm is supposed to be there.

I def. depend on LXC - as I have on vserver in the past.

Merging the kvm tool

Posted Aug 29, 2011 13:28 UTC (Mon) by pbonzini (subscriber, #60935) [Link]

LXC is the "Linux" container technology. It's very tied to the Linux kernel that hosts the containers. Instead, the KVM userspace part actually uses KVM just as an accelerator: it has been ported to Solaris with very few changes to QEMU, and reuses almost all of the device models that QEMU uses in its dynamic-recompilation mode.

Merging the kvm tool

Posted Aug 25, 2011 16:40 UTC (Thu) by iabervon (subscriber, #722) [Link]

I don't think an actual full C library would make sense to have in the kernel source; there's a ton of stuff that the C library provides that's only tangentially related to the host kernel. On the other hand, shipping the syscall stubs and such would probably save a whole lot of hassle.

Merging the kvm tool

Posted Aug 26, 2011 7:29 UTC (Fri) by marcH (subscriber, #57642) [Link]

> There is also concern that, once we start seeing more user-space tools going into the kernel tree, there will be an unstoppable flood of them. Where does it stop, they ask.

Where to draw this line is indeed a difficult question. Re-using the existing line between kernel and user-space is a lazy answer.

Merging the kvm tool

Posted Aug 26, 2011 20:10 UTC (Fri) by meyert (subscriber, #32097) [Link] (2 responses)

So what is the advantage of this tool compared to User Mode Linux (UML)?

Merging the kvm tool

Posted Aug 28, 2011 20:10 UTC (Sun) by robbe (guest, #16131) [Link] (1 responses)

kvm (native or otherwise) runs a plain, vanilla Linux as its guest OS. UML on the other hand is quite heavily modified. It may also lag the current kernel.

For testing your kernel-patch-du-jour, kvm is most certainly the better choice.

Merging the kvm tool

Posted Aug 31, 2011 15:16 UTC (Wed) by nix (subscriber, #2304) [Link]

What? UML is just another architecture, that is all. It's upstream: it is not meaningful to say that it is 'heavily modified' or that it 'may lag', any more than any other userspace program 'may lag'. If you want a matching version, compile with ARCH=um and you have it.

Merging the kvm tool

Posted Aug 28, 2011 3:20 UTC (Sun) by giraffedata (guest, #1954) [Link] (2 responses)

> A C library is something that could be considered; perhaps even graphics if the relevant developers wanted to do that. But office suites are not really eligible; there are limits to what should go into the mainline.

Another important category is programs whose function is to control or analyze a Linux kernel. E.g. procps, schedutils, mount.

Merging the kvm tool

Posted Aug 28, 2011 15:42 UTC (Sun) by Julie (guest, #66693) [Link] (1 responses)

> procps, schedutils, mount

Wouldn't the use of these imply that they count as being 'tied to the kernel' as suggested earlier on in the paragraph?

Merging the kvm tool

Posted Aug 28, 2011 19:12 UTC (Sun) by giraffedata (guest, #1954) [Link]

> > procps, schedutils, mount
> Wouldn't the use of these imply that they count as being 'tied to the kernel' as suggested earlier on in the paragraph?

Right, that was my point. The article suggests things tied to the kernel could be distributed with it and gives examples of things that may or may not be tied enough to the kernel to warrant inclusion. But it omitted one important category of things tied to the kernel.

Merging the kvm tool

Posted Aug 29, 2011 13:23 UTC (Mon) by pbonzini (subscriber, #60935) [Link] (1 responses)

I must say I have never read a worse article on LWN. Take it as a compliment, as I have learnt so much from LWN that I just cannot enumerate everything. Also, Jon is in all likelihood not the only one that has misunderstood the matter, so it probably helps other people if some of the problematic points in the article are corrected.

Here is a short list of observations:

"NLKT was started in response to unhappiness about QEMU, the state of its code, and the pace of its development": NLKT was started in response to Ingo complaining about QEMU and because it's undoubtedly fun to hack on, but no one ever constructed any proof of the problems with QEMU. The command-line problem is obviously solvable with a simple wrapper script.
"NLKT is meant to be able to boot a basic Linux kernel without the need for a BIOS image or much in the way of legacy hardware emulation": It's worse than that. NLKT doesn't emulate anything close to a modern PC. For example, it does not support ACPI. This means that even if it is mostly touted as a developer tool, developers of some subsystems would still not be able to use it (unlike QEMU). NLKT is also developing hardware models against Linux kernel source code rather than against the spec; this is very wrong because it is vulnerable to improvements to the drivers. In fact, when QEMU did it this way, newer versions of Linux and/or Windows took their revenge later (one example is in the ATAPI implementation).
"Everybody seems to think that NLKT is a useful tool; nobody objects to its existence": As you correctly point out later in the article, for KVM developers the NLKT is useful because it's an independent user of the same ABI. But for the general public (developers and users) it's about as useful as lguest; it would be useful as a proof of concept tool, but the NLKT developers are expanding it way beyond that. At the same time, they are leaving out features that would obviously be very useful for Linux kernel developers. For example, NLKT does not support a gdbstub for totally transparent debugging of guest kernels.
"It is also seemingly viewed by some as a move to be the 'official' hardware emulator for KVM": It's open source and we all know that there is nothing "official" but what people use. NLKT so far has 10% of the features that are needed to test real-world guests (all of them, not just Linux), which in turn is what is needed to test KVM kernel changes.
"like perf, NLKT is closely tied to the kernel and heavily used by kernel developers": But perf is used a lot by non-kernel developers as a profiling tool. NLKT simply won't work for most Linux enthusiasts who want to check out FreeBSD every once in a while, or have some hardware with Windows-only drivers.
Another problem is that while NLKT is indeed getting some things right, it is making too many of the mistakes that QEMU already made, and its roadmap includes too many features that contradict its original goal. From the recent talk at KVM Forum, Grub support stands out: a boot loader is all about legacy device support.
"perf would not have been nearly as successful outside of the mainline tree as it has been internally": There is actually no proof for this. Unlike oprofile, perf was developed by several core kernel maintainers. It seems to me like a much stronger factor impacting the success of perf vs. oprofile.
You even say "That success seems to be one of the things that opponents are worried about", but that's absolutely not a factor. Everybody is amazed at how far NLKT has come, but still NLKT is a solution in search of a problem (except if your problem is to find something fun to hack on, of course! :). The kernel component of KVM is very mature, and so is the ABI. Unlike perf, KVM would not benefit from forcing lockstep upgrades of the kernel and userspace components. The part of KVM that is more in flux is the host-guest ABI, which you must preserve anyway since you cannot break compatibility with existing guests.
NLKT is also repeating the same mistakes as early QEMU, and a few others: for example, unlike KVM itself, NLKT is strictly limited to x86. While undoubtedly a burden for QEMU developers, non-x86 support helps keep you honest about the specs you are trying to implement, and it can also be extremely helpful for kernel developers. (A few years ago Ingo was suggesting that Linux be compiled with its own compiler, but now he objects to QEMU because it includes a just-in-time compiler ;)). See also the point above about developing hardware models against Linux kernel source code.

I hope this long comment helps in understanding the situation better. I look forward to the replies!

Merging the kvm tool

Posted Aug 31, 2011 1:35 UTC (Wed) by nevets (subscriber, #11875) [Link]

> "perf would not have been nearly as successful outside of the mainline tree as it has been internally": There is actually no proof for this. Unlike oprofile, perf was developed by several core kernel maintainers. It seems to me like a much stronger factor impacting the success of perf vs. oprofile.

I actually believe there's proof that perf would have done just as well, if not better, outside the tools directory. Sure, it has lots of developers, but most of them are kernel developers and very few are userspace developers. The proof that perf would have done well outside of the kernel is based on the fact that the tool that influenced perf's development model was developed outside the kernel. That tool is called git.

git did not need to be in the tools directory to become popular. I'm sure perf would have followed git's success if it was also outside the kernel tree. In fact, I believe it would attract more userspace developers, and it would have probably grown a graphical user interface as well if it was outside the kernel.

If perf and NLKT want to be mostly developed by kernel developers, then perhaps they should stay in the kernel. I wonder how many more user space developers perf would have if it was outside the kernel tree.

Building the kernel bundled user-space en-masse

Posted Sep 7, 2011 6:56 UTC (Wed) by alex (subscriber, #1355) [Link]

With this growth in userspace code in the kernel git tree, is there an easy way to build them all? Given that they are potentially linked to the kernel ABI, should I be thinking of installing them in /lib/modules/`uname -r`/bin?