
Block device registration and 32-bit dev_t

Long-suffering block driver maintainers will have to cope with a new change in 2.5.65: this patch from Andries Brouwer changes the prototype of register_blkdev(), which is used by block drivers to tell the kernel of their existence. The previous version of this function took a struct block_device_operations pointer, which contains some of the operations provided by the driver. That parameter has not been used for some time (block operations are now directly associated with disks, and are kept in the generic disk structure), so Andries removed it.

Not everybody agreed with this change. With all of the work that has been done in the block layer, register_blkdev() does not actually do very much anymore. Its main remaining purpose is to associate a driver name with a major number, so that it shows up in /proc/devices. A block driver can now function nicely without calling register_blkdev() at all. The long-term plan is to remove register_blkdev() altogether. In the meantime, it was asked, why bother changing the prototype of a doomed function? Even so, the change was merged into 2.5.65.

The real purpose of Andries's patch, however, was to get rid of the static blkdevs array used to keep track of block devices in the kernel. blkdevs is about the only static array left in the block subsystem, and thus is one of the remaining impediments to Andries's real goal: the long-awaited expansion of dev_t to 32 bits.

The 32-bit dev_t is one of the final items on the 2.5 "todo" list. It is still considered important by many users: an Oracle engineer mentions 4000-disk systems that "want to go to Linux" but can't, and from IBM we hear about a 5000-drive system with waiting customers. There appears to be little opposition to the adoption of a larger dev_t, even at this late stage. But everybody agrees that it would be best to get this change done sooner rather than later.

The amount of work remaining is said to be relatively small. The block layer, for example, is almost ready for a larger dev_t now. The char device subsystem could take more work: many drivers "know" that device numbers (especially minor numbers) are only eight bits, so a detailed audit of many drivers could be required. This suggestion from Alan Cox could make life a little easier, though. The idea would be to replace the venerable register_chrdev() function with a new register_chr_device() that takes a parameter indicating the largest minor number the driver can deal with. A change to all char drivers would still be required, but, by defaulting the maximum minor number to 255, these drivers could be made safe without the need for a larger "audit and fix" operation. The few drivers that actually need more minor numbers could be fixed individually.

There are, of course, other issues to deal with before a larger dev_t will be truly stable. Some protocols (e.g. NFSv2) aren't prepared for large device numbers. The interface to user space may well hold a surprise or two. And so on. These are all problems that can be solved, but the process will take time.

(As an aside, Alexander Viro, who has been an active participant in the block layer and dev_t work, has been absent from kernel development for a few months. In a recent message, however, he proclaimed "I'm finally back - hopefully for good." Welcome back, Al).



Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds