By Jake Edge
May 6, 2009
The drive for faster boot times has led to a number of changes in the
kernel. Some, like the parallelization of USB
initialization we looked at last week, have caused disruptions for some
users. But others, like the recently proposed devtmpfs, face a different set of challenges.
While it may provide a good way to reduce boot times,
devtmpfs faces some
fairly stiff resistance, at least partially because it reminds some folks
of a feature previously excised from the kernel, namely devfs.
The basic idea is to create a tmpfs early in kernel
initialization, before the driver core has initialized. Then, as each
device registers with the driver core, its major and minor numbers and
device name can be used to create an entry in that filesystem. Eventually,
the root filesystem will be mounted and the populated tmpfs can be
mounted at /dev.
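The registration flow described above can be modeled in a few lines. What follows is only an in-memory sketch in Python, not the kernel implementation: the names DeviceNode, EarlyDevTree, register() and unregister() are hypothetical, and the root ownership and 0600 mode are taken from the description of the proposal later in this article.

```python
import os
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceNode:
    """One entry in the early /dev tree (hypothetical model, not kernel code)."""
    name: str            # name relative to /dev, as supplied by the driver
    is_block: bool       # block device if True, character device otherwise
    major: int
    minor: int
    mode: int = 0o600    # fixed permissions, per the proposal
    uid: int = 0         # root-owned
    gid: int = 0

    @property
    def st_mode(self) -> int:
        # Combine the file type bits with the permission bits.
        kind = stat.S_IFBLK if self.is_block else stat.S_IFCHR
        return kind | self.mode

    @property
    def rdev(self) -> int:
        # Pack major and minor into a single device number.
        return os.makedev(self.major, self.minor)


class EarlyDevTree:
    """Stand-in for the tmpfs that is filled in before user space exists."""

    def __init__(self):
        self.nodes = {}

    def register(self, name, is_block, major, minor):
        # Called as each device registers with the (modeled) driver core.
        node = DeviceNode(name, is_block, major, minor)
        self.nodes[name] = node
        return node

    def unregister(self, name):
        # Called when a device goes away; a missing name is ignored.
        self.nodes.pop(name, None)


if __name__ == "__main__":
    tree = EarlyDevTree()
    tree.register("null", False, 1, 3)   # /dev/null is char device 1:3
    tree.register("sda", True, 8, 0)     # first SCSI disk is block device 8:0
    for node in tree.nodes.values():
        print(node.name, oct(node.st_mode), node.major, node.minor)
```

The point of the model is that everything needed to create a node (name, type, major, minor) is already in hand at registration time, so no user-space helper has to rediscover it.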
This has a number of benefits, all of which derive from the fact that no
user-space support is required to have a working /dev directory.
With the current udev-based approach, udev needs a
reasonably functional user-space environment to operate
in. Simplified booting scenarios, like rescue tools or using the
init=/bin/sh kernel boot parameter, still need a functional
/dev directory, in particular because device numbers are
assigned dynamically. devtmpfs would also be useful for embedded devices that
do not need or want a full-featured user space.
Andrew Morton's immediate reaction was amusement: "Lol, devfs." Greg
Kroah-Hartman, who authored the patch along with Kay Sievers and Jan
Blunck, admitted that it was a kind of
devfs: "Well, devfs 'done right' with hopefully none of the
vfs problems the
last devfs had. :)" But Morton is somewhat concerned that "devfs2", as he calls
it, is just going over old ground:
I think Adam Richter's devfs rewrite (which, iirc, was tmpfs-based)
would have fixed up these things. But it was never quite completed and
came when minds were already made up.
I don't understand why we need devfs2, really. What problems are
people having with [the] existing design?
Though the other advantages are important, Kroah-Hartman replied with the crux of the argument for
devtmpfs:
Boot speed, boot speed, boot speed.
Oh, and reduction in complexity in init scripts, and saving embedded
systems a lot of effort to implement a dynamic /dev properly (have you
_seen_ what Android does to keep from having to ship udev? It's
horrible...)
But Alan Cox is not so sure. His argument
is that moving this
functionality (back) into
the kernel just papers over a user-space problem, while increasing the
kernel's memory usage, which is not pageable. Others think that the kernel should just
buffer uevents (the messages the kernel generates to notify udev
of device state
changes) until udevd is started. But that doesn't solve the
synchronization problem: user space must still wait for a populated
/dev hierarchy.
A problem with the current scheme is that it
essentially does the device enumeration twice—once in the kernel as
devices are registered and once in user space by udevd, when it gets
started. The device information that was gathered by the kernel is lost. When
udevd initializes, it walks the /sys directory to find
devices, then creates device nodes for them. That can take 1-2 seconds on
a complex system—on the order of twice the kernel boot time—but
worse still, no other user-space processes can start until this "coldplug"
pass has completed. Using devtmpfs, there will be a working
/dev that other user-space code can use, so that the udev
coldplug pass can be done in parallel.
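To make the duplicated enumeration concrete, here is a rough Python sketch of the kind of sysfs walk a coldplug pass implies. It is not udev's code. It assumes only that device directories under /sys/devices carry a "dev" attribute holding "major:minor"; the function name scan_sysfs() is hypothetical, and using the directory name as the device name is a simplification, since the real node name can differ.

```python
import os


def scan_sysfs(sys_root="/sys"):
    """Return (kernel_name, major, minor) for each device with a 'dev' file.

    Illustrative only: walks <sys_root>/devices and reads every 'dev'
    attribute, which holds the device number as 'major:minor'.
    """
    found = []
    # os.walk() does not follow symlinks by default, which matters in
    # sysfs because the tree is full of links that point back into itself.
    for dirpath, _dirnames, filenames in os.walk(os.path.join(sys_root, "devices")):
        if "dev" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "dev")) as attr:
                major, minor = attr.read().strip().split(":")
            found.append((os.path.basename(dirpath), int(major), int(minor)))
        except (OSError, ValueError):
            # Unreadable or malformed attribute: skip this directory.
            continue
    return sorted(found)


if __name__ == "__main__":
    for name, major, minor in scan_sysfs():
        print(f"{name}: {major}:{minor}")
```

Every tuple this walk recovers was already known to the kernel when the device registered. With devtmpfs the nodes exist before the walk begins, so a pass like this no longer has to finish before other user-space processes can start.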
Several alternate methods of solving the problem were proposed in the
thread, but, by and large, Sievers was able to show why they didn't
actually solve
the problem. In some cases, the behavior of devfs is being
incorrectly attributed to devtmpfs, but the two are quite different.
The new scheme would create root-owned device nodes, with fixed 0600
permissions, for each device. It would avoid much of the complexity of
devfs. As Sievers puts
it:
We are not implementing anything crazy here like devfs did, including
the later versions - there is no modprobe behind your back, no lookup
hooks, no stupid new naming scheme, no new filesystem type to
register.
Christoph Hellwig objected to the proposal
as well. Part of his complaint is how quickly devtmpfs was added
to the linux-next tree, but he also sees it as adding devfs back
into the kernel:
It basically does re-introduce devfs under a different name, and from
looking at the implementation it might not be quite as bad a Gooch's
original, but it's certainly worse than Adam Richters rewrite the we
never ended up merging.
Now we might want to revisit the decision to leave all the device name
handling to a userspace daemon, because it [proved] to be quite fragile
under certain circumstances, and you apparently see performance issues.
Sievers outlines the differences between
devtmpfs and Adam Richter's proposal
from 2003. It mostly boils down to complexity; devtmpfs is a much
simpler scheme, which really adds very little to the kernel. The
implementation is around 300 lines of code, in comparison to roughly 3600
for devfs and 600 for an early version of Richter's mini-devfs.
Anticipating the next complaint, Sievers also points out that the device
naming policy is already in the kernel, but that udev can override
the kernel-supplied values if need be. From his perspective, the policy
already lives in the kernel, making that an invalid argument against devtmpfs:
The kernel carries the policy today for 98% of the devices,
if you change any driver given name, it will no longer show up in /dev
with the current name. That's the reality since years, and will not be
different anytime soon, there is no real naming policy besides the
current kernel supplied names.
It is clear that the devtmpfs developers have put a fair amount of
thought into just what was needed, and how it could work with existing
code—both inside and outside the kernel. It is also clear that there
is some resistance to returning to anything even remotely reminiscent of
devfs. Because devtmpfs is really quite different, and
has a nice effect on boot speed, one would think that it is likely to find
its way into the mainline sooner or later. If no further objections are
raised, and the
linux-next trials go well, 2.6.31 may very well be the release that sees
the inclusion of
devtmpfs.