Block device registration and 32-bit dev_t
[Posted March 11, 2003 by corbet]
Long-suffering block driver maintainers will have to cope with a new change
in 2.5.65:
this patch from Andries Brouwer
changes the prototype of
register_blkdev(), which is used by block
drivers to tell the kernel of their existence. The previous version of
this function took a
struct block_device_operations pointer,
which contains some of the operations provided by the driver. That
parameter has not been used for some time (block operations are now
directly associated with disks, and are kept in the generic disk
structure), so Andries removed it.
Not everybody agreed with this change. With all of the work that has been
done in the block layer, register_blkdev() does not actually do
very much anymore. Its main remaining purpose is to associate a driver
name with a major number, so that it shows up in /proc/devices. A
block driver can now function nicely without calling
register_blkdev() at all. The long-term plan is to remove
register_blkdev() altogether. In the meantime, it was asked, why
bother changing the prototype of a doomed function? Even so, the change
was merged into 2.5.65.
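To make the prototype change and the remaining name-to-major bookkeeping concrete, here is a minimal user-space sketch. It is not kernel code: the table, its sizes, and the helper blkdev_name() are assumptions made for illustration, and the sketch rejects major 0 rather than modeling whatever the kernel does with it.

```c
#include <stddef.h>
#include <string.h>

#define MAX_MAJORS 256   /* assumption: 8-bit major numbers */
#define NAME_LEN   16    /* assumption: arbitrary limit for this sketch */

/* Hypothetical stand-in for the kernel's bookkeeping; not kernel code. */
static char major_names[MAX_MAJORS][NAME_LEN];

/*
 * Sketch of the two-argument form. The earlier form took a third
 * parameter, a struct block_device_operations pointer, which had
 * gone unused. Returns 0 on success, -1 on a bad or busy major.
 */
int register_blkdev(unsigned int major, const char *name)
{
    if (major == 0 || major >= MAX_MAJORS)
        return -1;
    if (name == NULL || name[0] == '\0')
        return -1;
    if (major_names[major][0] != '\0')
        return -1;                       /* major already claimed */
    strncpy(major_names[major], name, NAME_LEN - 1);
    major_names[major][NAME_LEN - 1] = '\0';
    return 0;
}

/* Hypothetical lookup: roughly what a /proc/devices listing needs. */
const char *blkdev_name(unsigned int major)
{
    if (major >= MAX_MAJORS || major_names[major][0] == '\0')
        return NULL;
    return major_names[major];
}
```

In this sketch a driver would call register_blkdev(8, "sd") once at load time; nothing else about the driver's operation depends on the call, which mirrors the point made above about how little the function still does.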
The real purpose of Andries's patch, however, was to get rid of the static
blkdevs array used to keep track of block devices in the kernel.
blkdevs is about the only static array left in the block
subsystem, and thus is one of the remaining impediments to Andries's real
goal: the long-awaited expansion of dev_t to 32 bits.
The 32-bit dev_t is one of the final items on the 2.5
"todo" list. It is still considered important by many users: an Oracle
engineer mentions 4000-disk systems that
"want to go to Linux" but can't, and from IBM we hear about a 5000-drive system with waiting
customers. There appears to be little opposition to the adoption of a
larger dev_t, even at this late stage. But everybody agrees that
it would be best to get this change done sooner rather than later.
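The arithmetic behind those disk counts can be sketched as follows. The code assumes the traditional 16-bit device number split into an 8-bit major and an 8-bit minor, and it takes the number of minors reserved per disk as a parameter; the names old_dev_t, old_mkdev() and majors_needed() are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Assumption: a 16-bit device number, 8 bits major and 8 bits minor. */
#define OLD_MINOR_BITS 8u

typedef uint16_t old_dev_t;   /* hypothetical name for this sketch */

/* Pack major and minor; anything above eight bits is silently dropped. */
static old_dev_t old_mkdev(unsigned int major, unsigned int minor)
{
    return (old_dev_t)(((major & 0xffu) << OLD_MINOR_BITS) | (minor & 0xffu));
}

static unsigned int old_major(old_dev_t dev) { return dev >> OLD_MINOR_BITS; }
static unsigned int old_minor(old_dev_t dev) { return dev & 0xffu; }

/*
 * How many major numbers are needed for `disks` disks when each disk
 * reserves `minors_per_disk` minors (whole disk plus partitions).
 * Returns 0 if minors_per_disk does not fit in one major.
 */
static unsigned int majors_needed(unsigned int disks, unsigned int minors_per_disk)
{
    unsigned int minors_per_major = 1u << OLD_MINOR_BITS;   /* 256 */
    unsigned int disks_per_major;

    if (minors_per_disk == 0 || minors_per_disk > minors_per_major)
        return 0;
    disks_per_major = minors_per_major / minors_per_disk;
    return (disks + disks_per_major - 1) / disks_per_major; /* round up */
}
```

Assuming 16 minors per disk, majors_needed(4000, 16) comes to 250 and majors_needed(5000, 16) to 313, while an 8-bit major field offers only 256 values for every driver in the system combined. The packing helper also shows the aliasing problem: old_mkdev(8, 256) yields the same value as old_mkdev(8, 0).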
The amount of work remaining is said to be relatively small. The block
layer, for example, is almost ready for a larger dev_t now. The char device
subsystem could take more work: many drivers "know" that device numbers
(especially minor numbers) are only eight bits. So a detailed audit of
many drivers could be required. This suggestion
from Alan Cox could make life a little easier, though. The idea would
be to replace the venerable register_chrdev() function with a new
register_chr_device() which takes a parameter indicating the
largest minor number that the driver can deal with. A change to
all char drivers would still be required, but, by defaulting the maximum
minor number to 255, these drivers could be made safe without the need for
a larger "audit and fix" operation. The few drivers that actually need
more minor numbers could be fixed individually.
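One way the suggestion might look is sketched below in user-space C. Only the name register_chr_device() and the default of 255 come from the discussion above; the signature, the table, the wrapper register_chrdev_compat() and the check chr_minor_allowed() are all assumptions, and the file operations argument that a real registration function takes is left out for brevity.

```c
#include <stddef.h>

#define MAX_MAJORS        256     /* assumption: size chosen for the sketch */
#define DEFAULT_MAX_MINOR 255u    /* default mentioned in the proposal */

/* Hypothetical per-major record; name == NULL means unregistered. */
struct chr_entry {
    const char  *name;
    unsigned int max_minor;       /* largest minor the driver accepts */
};

static struct chr_entry chr_table[MAX_MAJORS];

/* Assumed shape of the new call: the driver states its own minor limit. */
int register_chr_device(unsigned int major, const char *name,
                        unsigned int max_minor)
{
    if (major >= MAX_MAJORS || name == NULL)
        return -1;
    if (chr_table[major].name != NULL)
        return -1;                /* major already claimed */
    chr_table[major].name = name;
    chr_table[major].max_minor = max_minor;
    return 0;
}

/* Hypothetical wrapper for unaudited drivers: keeps the 8-bit limit. */
int register_chrdev_compat(unsigned int major, const char *name)
{
    return register_chr_device(major, name, DEFAULT_MAX_MINOR);
}

/* Check made before handing a minor to the driver; 1 if allowed. */
int chr_minor_allowed(unsigned int major, unsigned int minor)
{
    if (major >= MAX_MAJORS || chr_table[major].name == NULL)
        return 0;
    return minor <= chr_table[major].max_minor;
}
```

The design point is that the limit is enforced in one central place: a driver registered through the compatibility path never sees a minor above 255, so its eight-bit assumptions stay valid, while a driver that has been audited can pass a larger limit.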
There are, of course, other issues to deal with before a larger
dev_t will be truly stable. Some protocols (e.g. NFSv2) aren't
prepared for large device numbers. The interface to user space may well
hold a surprise or two. And so on. These are all problems that can be
solved, but the process will take time.
(As an aside, Alexander Viro, who has been an active participant in the
block layer and dev_t work, has been absent from kernel
development for a few months. In a recent
message, however, he proclaimed "I'm finally back - hopefully for
good." Welcome back, Al.)