
The unfinished SCSI job

The repository for SCSI patches has just been forked into two separate trees. One of them is a bugfix-only repository, with its contents meant to get past Linus's "stability fixes only" filter and into the 2.6.0-test kernel. The other is for everything else, which will be held for 2.7, or, at least, a post-2.6.0 release.

This change brought out the question: what about expanding the number of SCSI disks (and partitions) that can be supported by the kernel? That was, after all, one of the reasons for expanding the dev_t type in the first place. The larger device numbers are now in place, but there are no patches in the mainline to make more SCSI disks available.

There are, as it turns out, a few remaining issues that must be addressed before the SCSI expansion can be completed. One of those is naming. Currently, the first 26 SCSI drives are called sda through sdz. Then a second letter is added, making sdaa through sdzz available. The default plan seems to be to go to sdaaa thereafter, and sdaaaa if need be.
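
The naming progression described above amounts to counting in letters with no "zero" digit. A minimal sketch follows; the function name sd_name and the zero-based drive index are assumptions made for illustration and are not taken from the sd driver.

```python
import string


def sd_name(index: int) -> str:
    """Return the name for a zero-based drive index (0 -> 'sda').

    Hypothetical helper: it only illustrates the sda..sdz, sdaa..sdzz,
    sdaaa.. progression described in the article.
    """
    if index < 0:
        raise ValueError("drive index must be non-negative")
    letters = []
    n = index
    while True:
        n, rem = divmod(n, 26)
        letters.append(string.ascii_lowercase[rem])
        if n == 0:
            break
        n -= 1  # there is no "zero" letter, so step down before the next digit
    return "sd" + "".join(reversed(letters))


if __name__ == "__main__":
    for i in (0, 25, 26, 701, 702):
        print(i, sd_name(i))
```

Two letters cover 26 + 26*26 = 702 drives, so the first three-letter name falls on the 703rd drive.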

Is the number of partitions per drive to be expanded? The current limit of fifteen is apparently constraining to some. As a result, there has been persistent talk of raising the limit to 63. That change, however, would create interesting numbering challenges. The current numbering scheme divides the (eight-bit) minor number in half; the upper nibble is the drive number, and the lower nibble is the partition number. To support more partitions, the portion of the (now 20-bit) minor number dedicated to the partition number would have to be expanded. A naive implementation would simply remap the minor number so that bits 0..5 describe the partition, and bits 6..19 the drive number.
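
To make the arithmetic concrete, here is a small sketch of the two layouts just described: the current four-bit partition field and the naive six-bit one. The helper names are invented for this example, and only the minor number is modeled; major numbers are ignored.

```python
CURRENT_PARTITION_BITS = 4  # current scheme: low nibble is the partition
NAIVE_PARTITION_BITS = 6    # naive remap: bits 0..5 are the partition


def make_minor(drive: int, partition: int, partition_bits: int) -> int:
    """Pack a drive and partition number into a minor number."""
    if not 0 <= partition < (1 << partition_bits):
        raise ValueError("partition does not fit in the partition field")
    if drive < 0:
        raise ValueError("drive must be non-negative")
    return (drive << partition_bits) | partition


def split_minor(minor: int, partition_bits: int) -> tuple:
    """Unpack a minor number into (drive, partition)."""
    mask = (1 << partition_bits) - 1
    return minor >> partition_bits, minor & mask


if __name__ == "__main__":
    # Second drive, third partition, under the current layout.
    old = make_minor(1, 3, CURRENT_PARTITION_BITS)
    print(old, split_minor(old, CURRENT_PARTITION_BITS))
    # The same number read back under the naive layout names another device.
    print(split_minor(old, NAIVE_PARTITION_BITS))
```

Under the current layout the second drive's third partition has minor number 19. Read with the naive six-bit layout, 19 decodes as drive 0, partition 19, so an existing device node would silently point somewhere else.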

The only problem with that approach is that it would break all existing SCSI device nodes. The kernel hackers have a sense that they might get a complaint or two if they did that, so they are fairly strongly committed to ensuring that old device numbers continue to work. As a result, there have been proposals for more complicated schemes, with the two new partition bits being placed, for example, up at the high end of the minor number. This approach would put an end to the manual creation of device nodes for large SCSI devices - who wants to figure out what number to give to mknod? - but there is not likely to be much of that going on anyway.
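
One way such a compatible layout could look is sketched below. The bit positions are an assumption chosen for illustration, not a description of any posted patch: the low eight bits keep today's meaning, bits 8..17 carry the additional drive bits, and bits 18..19 carry the two new partition bits.

```python
def encode_minor(drive: int, partition: int) -> int:
    """Pack (drive, partition) into a 20-bit minor number.

    Hypothetical layout:
      bits 0..3    low four bits of the partition (as today)
      bits 4..7    low four bits of the drive     (as today)
      bits 8..17   upper ten bits of the drive
      bits 18..19  upper two bits of the partition
    """
    if not 0 <= drive < (1 << 14):
        raise ValueError("drive must fit in 14 bits")
    if not 0 <= partition < (1 << 6):
        raise ValueError("partition must fit in 6 bits")
    return ((partition & 0xF)
            | ((drive & 0xF) << 4)
            | ((drive >> 4) << 8)
            | ((partition >> 4) << 18))


def decode_minor(minor: int) -> tuple:
    """Inverse of encode_minor: return (drive, partition)."""
    partition = (minor & 0xF) | (((minor >> 18) & 0x3) << 4)
    drive = ((minor >> 4) & 0xF) | (((minor >> 8) & 0x3FF) << 4)
    return drive, partition


if __name__ == "__main__":
    print(encode_minor(1, 3))    # an old-style number, unchanged
    print(encode_minor(1, 20))   # a partition that needs the new high bits
```

Any drive below 16 with a partition below 16 keeps exactly the number it has today, which is the property the kernel developers want to preserve. The price is the scattered bit layout that makes working out a number by hand for mknod impractical.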

A better long-term approach might be to go to one or more completely new major numbers for SCSI drives. The block layer could then assign numbers dynamically as the drives are discovered, with a tool like udev creating device nodes on demand. For sites that need old numbers to work, a small compatibility module could map between the old and new numbers at device open time. That is all certainly 2.7 material, however. For 2.6.0, the most likely scenario might be the merging of a simple patch (like Badari Pulavarty's patch found in the -mm tree) which expands the number of disks supported in a relatively unintrusive way. The complete solution can come later.
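
As a rough illustration of the dynamic-numbering idea, the sketch below hands out drive slots in discovery order and keeps a table for translating old-style minor numbers. Everything in it is hypothetical, including the class name, the six-bit partition field and the shape of the translation table; no such kernel interface is described in the discussion.

```python
class DynamicDiskNumbers:
    """Toy model of dynamic drive numbering with a legacy lookup table."""

    def __init__(self, partition_bits: int = 6):
        self.partition_bits = partition_bits  # assumed width of the new field
        self._next_drive = 0
        self._legacy_to_drive = {}            # old drive number -> new slot

    def register(self, legacy_drive=None) -> int:
        """Assign the next free slot to a newly discovered drive."""
        drive = self._next_drive
        self._next_drive += 1
        if legacy_drive is not None:
            self._legacy_to_drive[legacy_drive] = drive
        return drive

    def minor(self, drive: int, partition: int) -> int:
        """New-style minor number for a drive slot and partition."""
        return (drive << self.partition_bits) | partition

    def translate_legacy(self, legacy_minor: int) -> int:
        """Map an old minor (drive in the upper nibble) to the new number."""
        legacy_drive, partition = legacy_minor >> 4, legacy_minor & 0xF
        drive = self._legacy_to_drive[legacy_drive]  # KeyError if unknown
        return self.minor(drive, partition)


if __name__ == "__main__":
    numbers = DynamicDiskNumbers()
    numbers.register()                # some other drive is discovered first
    numbers.register(legacy_drive=0)  # the drive old nodes know as drive 0
    print(numbers.translate_legacy(0x03))
```

The point of the table is that the translation happens once, at open time, so the rest of the system only ever sees the new numbers.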



The unfinished SCSI job

Posted Oct 23, 2003 3:28 UTC (Thu) by dougg (subscriber, #1894)

We could also have a kernel boot time (or module load time) parameter to the sd driver telling it what major/minor number scheme to assume. The default could be the current scheme.

With a corresponding sysfs parameter, more intelligent code (than the current) that looks for the root partition could tell sd to fall back to the default scheme if some other scheme is chosen and the root partition is not found.


Doug Gilbert

The unfinished SCSI job

Posted Oct 24, 2003 20:48 UTC (Fri) by garloff (subscriber, #319)

It should be noted that there are already implementations for allowing
a large number (>2000) of SCSI disks as patches for the 2.4 kernel.
http://www.suse.de/~garloff/linux/scsi-many/
This code is part of the UnitedLinux kernels and used successfully by a
number of customers on various platforms.

There, free block majors are used dynamically beyond the first 16 ones
(which are officially assigned to SCSI). The numbers can be found by
userspace by looking at /proc/devices or the extended version of
/proc/scsi/scsi (despised by Linus). After sdaz, the names sdaaa ... are
used. The limit of 16 partitions per disk has not been changed. Users with
that many disks will have a large storage device which they can split into
virtual disks according to their needs. Home users with just a few disks
on the other hand should use LVM (or LVM2 on DM in 2.6).
scsidev is recommended for managing the naming and providing persistent
names.
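
For readers wondering how userspace might find those majors, here is a small sketch that picks the block majors registered under a given name out of the text of /proc/devices. The function name and the sample text are made up for illustration; the extended /proc/scsi/scsi output mentioned above is not covered.

```python
def block_majors(proc_devices_text: str, driver: str = "sd") -> list:
    """Return the block major numbers listed for `driver`."""
    majors = []
    in_block_section = False
    for line in proc_devices_text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            # Section headers: "Character devices:" and "Block devices:".
            in_block_section = stripped == "Block devices:"
            continue
        if not in_block_section or not stripped:
            continue
        number, _, name = stripped.partition(" ")
        if number.isdigit() and name.strip() == driver:
            majors.append(int(number))
    return majors


# Illustrative input only; real contents vary from system to system.
SAMPLE = """Character devices:
  1 mem
  4 tty

Block devices:
  8 sd
 65 sd
 66 sd
254 device-mapper
"""

if __name__ == "__main__":
    print(block_majors(SAMPLE))
```

On a live system the text would come from reading /proc/devices itself, for example with open("/proc/devices").read().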

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds