
The future of device numbers

Greg Kroah-Hartman has, it seems, received a fair amount of email from devfs users, many of whom are not pleased with the fact that devfs has been marked "deprecated" in 2.6. Never mind that Greg didn't do that... But Greg is the primary author of udev, which is intended to replace devfs in the future. With the intent of cutting down on hate mail, Greg has posted a lengthy diatribe on why, he thinks, the udev approach is better. It's not at all clear that his posting will have succeeded in that goal, but it does make the current thinking (accepted by most kernel developers, it seems) clearer.

The posting also inspired a lengthy thread on the meaning of Linux device numbers and how they will be handled in the future. For starters, we now have Linus's explanation of why he chose to expand the device number type to 32 bits, rather than the expected 64:

Note that one reason I didn't much like the 64-bit versions is that not only are they bigger, they also encourage insanity. Ie you'd find SCSI people who want to try to encode device/controller/bus/target/lun info into the device number.

We should resist any effort that makes the numbers "mean" something. They are random cookies. Not "unique identifiers", and not "addresses".
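To make the size of a 32-bit device number concrete, here is a minimal Python sketch of how such a number can be split into a major and a minor part. The 12-bit major / 20-bit minor split is the layout the 2.6 kernel uses internally; it is not stated in the posting quoted above, and the helper names (mkdev, major, minor) are chosen for this illustration only.

```python
# Sketch only: a 32-bit device number as a 12-bit major and a 20-bit minor.
# The bit split mirrors the 2.6 in-kernel layout; the function names are
# hypothetical helpers for this example.
MAJOR_BITS = 12
MINOR_BITS = 20
MINOR_MASK = (1 << MINOR_BITS) - 1


def mkdev(major_number, minor_number):
    """Pack a major and a minor number into a single 32-bit value."""
    if not 0 <= major_number < (1 << MAJOR_BITS):
        raise ValueError("major number out of range")
    if not 0 <= minor_number <= MINOR_MASK:
        raise ValueError("minor number out of range")
    return (major_number << MINOR_BITS) | minor_number


def major(dev):
    """Extract the major number from a packed device number."""
    return dev >> MINOR_BITS


def minor(dev):
    """Extract the minor number from a packed device number."""
    return dev & MINOR_MASK
```

Nothing in this packing says anything about buses, controllers, or targets; the result is just a pair of small integers, which is the "cookie" view Linus argues for.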

Linus's talk of "random cookies" set off some alarms from developers who foresee a world where devices could have different numbers every time the system boots. Linus's response was unrepentant; he claims that (1) that world already exists, and (2) attempts to create relatively stable device numbers just encourage applications to depend on those numbers not changing, and thus create bugs.

Anybody who has plugged two similar USB devices into the same system has already experienced one kind of device number instability. The kernel will assign numbers based on the order in which it discovers the devices; that order depends on a number of things, including, simply, which device was plugged in first. There is no way in the general case to provide stable numbers for this sort of hot-pluggable device. Other devices, such as iSCSI disks, are even worse. Discovering all of the available devices can be a challenge by itself; there is no way that this discovery will happen in a predictable order.
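The effect of discovery order can be shown with a toy model. This is only a sketch, assuming the simplest possible policy (hand out the next free minor number to each device as it is found); the function and device names are invented for the example and do not correspond to any kernel interface.

```python
def assign_minors(discovery_order, first_minor=0):
    """Give each device the next free minor number, in discovery order."""
    return {
        device: first_minor + index
        for index, device in enumerate(discovery_order)
    }


# Same two devices, plugged in (or discovered) in a different order.
boot_a = assign_minors(["camera", "usb_disk"])
boot_b = assign_minors(["usb_disk", "camera"])

# The camera ends up with a different number on each boot.
camera_moved = boot_a["camera"] != boot_b["camera"]
```

Under this model, any program that remembered the camera's number from the first boot would be talking to the disk on the second.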

So, for many kinds of devices, variable device numbers are simply a fact of life. Given that, says Linus, it is better not to even try to keep numbers stable.

Basically, if you cannot 100% guarantee reproducibility (and nobody can, not your hashes, not anything else), then the _appearance_ of reproducibility is literally a mistake. Because it ends up being a bug waiting to happen - and one that is very very hard to reproduce on a developer machine.

To bring that point home, Linus has raised an idea that Greg has presented a few times in the past: making all device numbers random. This change would quickly flush out any code which made assumptions about device numbers, whether it be in the kernel or in user space. Of course, random device number assignment is a feature for a development kernel; Linus acknowledges that, "for simple politeness reasons," device numbers should be kept as stable as possible in stable kernel releases.

In any case, the point of all this is not to confuse users about the organization of their system. But, in a world where device numbers can offer no real clues about the hardware on a computer, something else needs to create stable names by which devices can be identified. That, of course, is the purpose of tools like udev. As a way of showing how flexible udev can be, Greg posted a brief script which makes CD drives available by the name of the disk (as obtained from CDDB) currently inside. This scheme is unlikely to become part of any major distribution in the near future, but it does show how elaborate device naming can be. For some sorts of devices, a conversation with a remote server may well be part of the naming process. As naming gets more complex, it becomes increasingly clear that it simply cannot be done in the kernel.
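The idea of a user-space naming policy can be sketched in a few lines of Python. This is not udev's rule syntax or implementation; it is a hypothetical illustration in which a name is chosen from stable device attributes (a serial number, a volume label) and the device number plays no part in the decision. All names, attributes, and rule entries here are assumptions made up for the example.

```python
def name_device(attributes, rules, fallback):
    """Return the name from the first rule whose attributes all match.

    'rules' is a list of (match_dict, name) pairs; 'fallback' is used
    when no rule applies (for example, a kernel-supplied default name).
    """
    for match, name in rules:
        if all(attributes.get(key) == value for key, value in match.items()):
            return name
    return fallback


# Hypothetical policy: name devices by serial number or by volume label.
RULES = [
    ({"serial": "CAM123"}, "camera"),
    ({"label": "cool_games"}, "cool_games"),
]

# The same camera seen with two different device numbers gets one name.
first_boot = name_device({"serial": "CAM123", "minor": 0}, RULES, "sda")
second_boot = name_device({"serial": "CAM123", "minor": 16}, RULES, "sdb")
```

Because the policy is ordinary user-space code, extending it (say, to consult a remote database before picking a name) does not require touching the kernel.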

That, of course, is one of the main objections to devfs - the naming policy is implemented entirely in kernel space. The udev approach moves that policy back out to user space, where it can be easily changed and extended. The remaining devfs users will want to look at switching over, but there is no particular hurry; Andrew Morton has made it clear that devfs will continue to be supported through the lifetime of 2.6 and, possibly, beyond.



"Long live the Amiga." Or something.

Posted Jan 8, 2004 15:28 UTC (Thu) by kena (subscriber, #2735)

Back in the day of AmigaDOS, device naming was done in two different ways: you could access the primary floppy device (df0:), or -- and the OS was just as happy -- you could access the _label_ of the floppy, e.g. "cool_games:". That way, you could just plug your floppy into any given drive, and access it, without needing to know the name of the device. While this would clearly require some serious rethinking in Linux-land, it might be something to keep in mind as, at the least, a way others have dealt with similar situations in the past.

"Long live the Amiga." Or something.

Posted Jan 12, 2004 20:47 UTC (Mon) by elanthis (guest, #6227)

You will absolutely love udev then. That's exactly what it lets you do.

The future of device numbers

Posted Jan 8, 2004 17:36 UTC (Thu) by holstein (subscriber, #6122)

Sorry to show my complete ignorance of the matter, but who exactly will remain, in a 2.6 world, as the devfs users, if udev is the default solution used by the kernel?

Does that mean that there will still be distributions using it? Or perhaps embedded systems users? Is it because there are not enough tools built around udev?

Thanks for any light on this!

The future of device numbers

Posted Jan 8, 2004 18:37 UTC (Thu) by set (guest, #4788)

The kernel really doesn't care what you use; you can have a static
/dev directory filled with ordinary device files, or if your kernel
has devfs support configured in, you can optionally mount devfs over
/dev, and use that. Additionally, if your kernel has, IIRC, hotplug
and sysfs configured in, you could optionally use udev.

In other words, what you use is a matter of choice, or more likely
the choice of your distribution. (e.g. Gentoo likes to use devfs,
but you don't _have_ to.) However, when they say that devfs is
deprecated, it means someday they will rip out that code, and then
choosing devfs will be harder ;)

(as an example, currently, I run Gentoo, with a 2.6 kernel, and use
devfs. a friend of mine does the same, except no devfs, just a
static /dev directory.)

devfs, or the lack thereof, in 2.6

Posted Jan 8, 2004 22:23 UTC (Thu) by Duncan (guest, #6647)

As a Mandrake user, with Mandrake defaulting to devfs, tho a static /dev was
possible and used in "failsafe" mode by default, the stuff I'd read about devfs
being deprecated in 2.6 was one of the reasons I've hesitated to upgrade to it.
I always knew it was just a matter of slogging thru a bunch of
documentation in order to understand the changes and be comfortable with
them, but I'd somehow never gotten around to it, yet. This article, with the
backgrounder of the long udev posting it was about, and the backgrounders
that message in turn pointed to, have gone a long way toward correcting my
understanding of the situation in 2.6, and I now feel much more comfortable
with the idea of upgrading.

It was also gratifying to see an LWN article listed as a reference in the
presentation given back in July at OLS. The comparison of the Linux /dev
tree to "the web of a spider on drugs" seems apt indeed, if you consider each
symlink a strand between the two points it links. It's nice to see the greater
Linux community quoting LWN! That's exactly the sort of thing I was
referring to in my comment on the LWN Update article on the front page of
this week's weekly edition -- that even when I DON'T get a chance to follow
LWN myself as I'd like, it continues to provide an important resource for the
Linux community, and as such continues to be good value for my subscription
dollars. Again, thanks, LWN!

Duncan

devfs, or the lack thereof, in 2.6

Posted Feb 26, 2004 12:18 UTC (Thu) by Duncan (guest, #6647)

> I now feel much more comfortable with the idea of upgrading.

Whoever may be reading this far back in the archives I don't know, but
someone might come across this article while doing a search on
2.6/udev/devfs, and someone else might do what I just did and take a look
around after following an LWN back-reference to an earlier article, so..

.. In particular for anyone coming by looking to upgrade, and particularly if
that upgrade is on Mandrake..

It's now late Feb. and I've been running the 2.6 kernel for several weeks. The
switch was fairly easy and painless, even on Mandrake, with their normal
devfs dependence, and even tho they haven't posted a packaged kernel, even
to cooker, for my arch (amd64/x86_64/ia86e), because their supermount
patches apparently won't compile on the platform with 2.6. I'd never liked
supermount anyway, it was too potentially problematic, and once I got used to
mounting removable drives manually (since supermount had been
temporarily removed due to issues in 8.1, my "jump from MSWormOS"
version), I actually PREFERRED the control of doing it that way and
KNOWING the status of my various mounts, so that loss wasn't missed.

I decided to take the opportunity to learn a bit more about the kernel, this
time, since the last time I'd really examined things and when I learned how to
compile my own kernels was a couple years ago, back less than three months
into the switch from MSWormOS thing.. while I was still booting back to it to
run OE for mail and news! Also, since I hadn't tried compiling a kernel AT
ALL on my new architecture (dual AMD Opteron, thus AMD64), I wasn't
familiar with 2.4 either, yet. Therefore, I started there, procuring Mandrake's
latest 2.4 kernel source package, installing it, and then starting with make
menuconfig to get an idea of what had changed since I last looked at the
kernel and how it might be different on the new arch.

After eventually getting a workable 2.4 kernel up and running with my chosen
options, I d/led the two available Mdk kernel 2.6 srpms, the mainline one, and
the tmb or "hackkernel". Note again that Mdk hadn't provided binary 2.6
packages for AMD64, as their supermount wouldn't (and still won't, AFAIK)
compile in 2.6, on AMD64, apparently due to 64-bit unclean code. Thus, I
had to extract the tarballs from the srpms and copy the source over manually,
but that sort of srpm hacking has become somewhat the norm on Mdk 9.2 RC
for AMD64, as I've had to fiddle with them to procure binaries for stuff not yet
ported in a number of instances. However, i586 folks wouldn't have had that
issue to deal with.

Anyway, having extracted the two Mdk 2.6 kernels, then 2.6.2-rc1, I believe, I
manually configured each one separately including all options from scratch
(using make menuconfig), compiled, attempted to run, tweaked and
recompiled again, until one ran, manually went thru the menuconfig on the
other again, then ran diff on the opposite config files to figure out how they
differed, and what I wanted to do about it where I'd chosen different options
on one vs the other. Again, note that I had one running by this point, so I
was just holding true to my goal of going a bit more in depth learning about
the kernel, and trying to get the best compilation for my system. After
deciding how I really wanted the options, from the diffs, and figuring out
which items appeared only in one kernel, I compiled the other kernel (and
modules), installed, and fired it up. After further tweaking now that I had 2.6
running to find exactly what I needed and what I could leave uncompiled (or
as modules, but without an initrd), I had both of those kernels configured
roughly the same and both operational.

Then I did the same thing over again only with the vanilla 2.6 kernel.org
kernel, by now a couple RCs later, this time importing the .config and doing a
make oldconfig on it first, then verifying with make config before compiling.

As I had two years ago with 2.4 on my old Athlon system, I eventually
decided it wasn't worth the hassle doing Mdk kernels, and now run the 2.6
kernel.org kernels exclusively.

An additional note on devfs, traditional /dev dirs, and /udev. I can't vouch for
the authenticity of the claim, but I read somewhere that 2.4 is now considered
deprecated for AMD64, due to issues with the arch mostly stemming from
devfs, which apparently isn't even an option any more for new kernel.org 2.4
kernels, for the AMD64 architecture, with fixing the problems judged not
worth the trouble on the arch, for a new arch and a deprecated devfs anyway.
That may or may not have been part of the cause of my stability issues on the
platform (again, dual AMD64 Opteron, Mandrake 9.2 for AMD64 RC), but
I'm now running entirely without devfs, not compiled in, not activated,
period.

As for udev, which mounts by default on /udev, lacking any documentation
on the subject that I could find, I decided to experimentally find out if, like
devfs, it could mount directly over /dev. At this point, THE ANSWER IS
NO!! DO NOT TRY TO MOUNT UDEV ON /DEV!! At least here, not
only did it fail to boot, but it also screwed up the existing static /dev dir, so I
couldn't boot my OLD kernel either! Luckily, I could boot off my other
drive, which I keep loaded with a working system for just such issues, and I
was able to copy its unaffected static /dev over the normal one that udev
hosed.

As best I can figure from the docs I've seen (and from the Mdk /udev init
script), mounting udev on /dev should eventually be possible, as being the
replacement for devfs would indicate it should, but all the parts aren't there
yet, and at least as on Mandrake, udev can't bootstrap itself yet, but requires
being initialized from an existing /sys on a running system. It simply won't
mount directly over /dev, then, again, on Mandrake (with a static /dev, as
mounting it over devfs would be expected to be rather problematic <g>)
Cooker for AMD64 as of this date.

OTOH.. something I did NOT try that MIGHT work would be remounting it
once fully up and running over /dev. That MIGHT work.. tho safely
unmounting for shutdown would potentially create another race condition..
unless one reversed the process and remounted it back to /udev first,
unmounting it on /dev, exposing the static /dev for proper shutdown of udev
on the way to system shutdown. Still.. I think I'll wait another update of
/udev and inittools first, unless I get bored someday and want some
excitement like a not normally bootable system! <g>

Duncan

Linus smoking crack

Posted Jan 9, 2004 0:21 UTC (Fri) by Ross (subscriber, #4065)

Randomized numbers?! Anything that depends on them broken?! Linus likes to shock people too much.

But taking him literally, just about everything is broken. My /dev directory isn't going to magically update itself by somehow guessing the proper major and minor numbers for each device. And NFS exports of devices will do "interesting" things... randomly different on each client.

I'd have to boot my system hundreds of thousands of times before getting lucky enough to get the correct mapping.

So I don't see this happening until there is something like udev and a static mapping layer for things like NFS. And since udev is on hold and devfs is a piece of junk... I'll continue to use my static mapping and it better keep working or I'm just not going to use 2.6.

Linus smoking crack

Posted Jan 9, 2004 6:53 UTC (Fri) by AdHoc (subscriber, #1115)

I think Linus means to put the randomization into the 2.7 (unstable dev) tree when it is released, then remove it when the 2.8 (stable) tree is released.

Linus smoking crack

Posted Jan 9, 2004 8:45 UTC (Fri) by iabervon (subscriber, #722)

Linus is actually talking about randomization of the devices that the
kernel can't necessarily keep stable. So it's only really hotpluggable
devices, disks that can show up in surprising places, and that sort of
thing. I doubt /dev/zero will change around between boots (which would
potentially cause problems with memory allocation in udev necessary to
create a device node with the right number...), and IDE disks and
floppies will probably stay the same. TTYs, PTYs, and so forth might as
well stay the same, since they're mostly kernel constructs anyway.
What'll be different is things like SCSI disks (which includes USB
storage), which will probably get random numbers instead of getting
numbered sequentially by when they're detected; if you're depending on
those being static, you could be in for an unpleasant surprise if you
boot a SCSI machine with a USB camera plugged in (even today, maybe).

The point is that the kernel can't necessarily identify the same device
if you unplug it and replug it, or if you reboot with different hardware
attached than you did last time. There are some cases where it is
reliable, and some cases where hardware doesn't matter, but there are
other cases where it can't do it reliably; in this case, it's far better
to fail the first time than wait to fail until it really matters.

No he really means ALL device numbers

Posted Jan 10, 2004 4:58 UTC (Sat) by giraffedata (subscriber, #1954)

> Linus is actually talking about randomization of the devices that the kernel can't necessarily keep stable.

That's exactly what he's not talking about. Linus says that the fact that the kernel can't provide stable device numbers for everything means that anyone who expects the kernel to provide static device numbers for anything is fooling himself. Therefore, he suggests making device numbers random even when they don't have to be so someone can't possibly think that device numbers are stable.

He backpedals a little and says that might be a little too hostile and out of practicality, some device numbers should be kept unrandom. But what he really believes is that all the device numbers should be random.

This whole thing assumes udev, of course. If you have static device special files in /dev that you created with mknod, as we have for 30 years, you obviously can't make all the device numbers random.
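To illustrate the static case: a device special file created with mknod simply records a major/minor pair, fixed at the time the node was made, which is why the numbers behind it cannot be changed freely. The following is a minimal Python sketch, assuming a Unix-like system; the helper name device_numbers is made up for this example.

```python
import os
import stat


def device_numbers(path):
    """Return the (major, minor) pair stored in a device node.

    Returns None if 'path' is not a character or block device.
    """
    info = os.stat(path)
    if stat.S_ISCHR(info.st_mode) or stat.S_ISBLK(info.st_mode):
        # st_rdev holds the device number recorded in the node itself.
        return os.major(info.st_rdev), os.minor(info.st_rdev)
    return None
```

Calling device_numbers("/dev/null") on a typical Linux system should report the pair that was recorded when the node was created; if the kernel then assigned that driver a different number, the node would point at the wrong device, or at nothing.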

I wish we'd get away from device numbers altogether. Naming things with integers is really archaic. In the modern world, we either name them with long text strings or with temporary handles that have reference counts and are in reality memory addresses.

Linus smoking crack

Posted Jan 15, 2004 18:10 UTC (Thu) by zdzichu (subscriber, #17118)

Well, it seems that 2.7 will be almost unusable without udev or devfs.

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds