
Lots of SCSI disks

One of the motivations for increasing the size of the dev_t device number type in 2.6 was to allow the use of huge numbers of SCSI disks. In the 2.6.4 kernel, however, that promise remains unfulfilled; the SCSI subsystem makes no use of the expanded device number range. That will change in 2.6.5: a patch has been merged which allows the enumeration of up to one million SCSI disks.

The authors of this patch had an interesting problem to solve: they wanted to be able to enumerate all of those disks without breaking existing systems. In other words, all of the existing SCSI device numbers have to work as they do in 2.4 and prior kernels. The solution is expressed in the following function, which turns a device index (the "nth disk") and a partition number into the associated device number:

static unsigned int make_sd_dev(unsigned int sd_nr, unsigned int part)
{
	return  (part & 0xf) | ((sd_nr & 0xf) << 4) |
		(sd_major((sd_nr & 0xf0) >> 4) << 20) | (sd_nr & 0xfff00);
}

LWN readers will, no doubt, immediately understand what is going on here. Your editor, however, had to stare at it for a little while. Then, as a way of avoiding doing real work, he made the following diagram to show how a device index and partition number are transmogrified into a device number.

[SCSI numbering diagram]

The "remap" operation takes four bits from the device index and uses them to index into an array of the 16 major numbers which have been assigned for some time to SCSI disks: 8, 65-71, and 128-135. The lowest four bits of the device index move directly down into the minor number. The result is that the first 256 SCSI disks will get exactly the same major and minor numbers that they have in 2.4 kernels.

Once that space has been exhausted, however, the four red bits in the diagram will return to zero, the major number will go back to 8, the highest-order bits in the device index are routed back into the minor number, and, as a result, the 257th disk will be given device number 8:256. The 273rd disk will advance again to the next major number; it will be given number 65:256. Additional disks will be distributed across the available major numbers indefinitely until their combined power load flips a breaker somewhere.

The result is a scheme which may be a little hard for humans to follow, but, when you are dealing with thousands of disks, that will be the case anyway. Meanwhile, the main design goals - supporting lots of disks without breaking existing systems - have been met. There is one remaining issue, however: some SCSI users have been asking for the ability to have more than 15 partitions on one drive. Supporting a larger partition space while preserving compatibility is not currently possible because the block layer expects partitions to be assigned contiguous minor numbers. Fixing that will require tweaks to the gendisk code.



Lots of SCSI disks/Partitions

Posted Mar 18, 2004 6:46 UTC (Thu) by wolfrider (guest, #3105) [Link]

--That is a Truly Neat Hack. I don't understand it, but the explanation given in the article by the Editor is sufficient for me.

--AFA having more than 15 partitions on one disk - WHY?? It's generally better to just add disks; that way you're usually looking at less impact if one particular device fails.

--Can anybody come up with a good real-world reason to have >15 partitions on one device? I mean, my setup looks like this:

hda: (80GB)
1 - 1600MB Win98 C:
2 - 16MB /boot
3 - 8MB Knoppix/Syslinux bootfloppy image
4 - Extended --=> rest of drive
5 - 256MB Swap
6 - 9GB D:
7 - 4480MB E: DVD/CD temp partition
8 - reserved (leftover from a resize)
9 - 4800MB (?Mepis install?)
10 - 4GB LVM
11 - ~53GB Backups (Reiserfs)

hdb: (80GB)
1 - 5GB Knoppix hdinstall
2 - 3GB LVM
3 - 8MB alternate /boot
4 - Extended
5 - 500MB Swap
6 - 5GB DVD/CD temp space #2 (ext2)
7 - 10GB F: -- ISO storage, games, etc
8 - 4GB ??? (Man I wish I wasn't writing this from Windoze)
9 - 5GB ??? (Space for testing another Linux install, prolly)
10 - 4GB (Reserved for future expansion)
11 - ~38GB Backups #2 (Reiserfs)

--I think the largest number of partitions I've ever had on 1 device went up to 13, but that was before restructuring. If anyone can come up with a valid and -necessary- scheme for having 16 or more partitions, I'd really like to see it.

--OTOH, *BSD has a partition scheme that uses "slices" - which are basically sub-partitions. I dunno exactly how well the Linux kernel currently supports UFS(? might be FFS) filesystem writing or formatting, but might be something to consider.

Lots of SCSI disks/Partitions

Posted Mar 18, 2004 8:15 UTC (Thu) by proski (subscriber, #104) [Link]

Can anybody come up with a good real-world reason to have >15 partitions on one device?

It's needed if you want many OSes on the same machine for testing and cannot stand more noise from additional drives. Don't forget that BSD subpartitions (disklabel) also count in Linux as partitions. Imagine installing Linux, FreeBSD, NetBSD and OpenBSD on the same machine in the default configuration, i.e. with separate swap and /usr partitions. That will put you very close to the limit.

18 partitions here (on IDE, which allows 64)

Posted Mar 18, 2004 22:25 UTC (Thu) by Duncan (guest, #6647) [Link]

> Can anybody come up with a good real-world
> reason to have >15 partitions on one device?

I recently ran into that question here, on my (luckily IDE, as IDE allows
64 partitions) 250G drive, as I considered installing Gentoo dual-boot on
my current Mandrake system. I do have a second, older, disk, but it
contains a backup installation useful for when my main drive won't boot
because I hosed something up, and for critical personal data backup. I
don't want to change that, at least until/unless Gentoo becomes my primary
distrib so I don't have the possibility of having the regular Mandrake
system hosed by updates, while Gentoo may also not yet be operational or
is itself hosed.

Now, this wouldn't apply to corporate installations, and SCSI tends to be
used more in that environment than in the home, since IDE is cheaper if
more limited and slower, but yes, there ARE reasons to have more than 16
partitions, for some of us.

FWIW, here's my layout and why I needed more than 16 partitions (hmm,
looks better tabulated, but don't want to bother with the html).

hda## mntpt comment
01 /boot
02 /
03 /mnt/rtm rootmirror, always keep a / backup
04 -- extended partition mapping
05 swap
06 /usr
07 /var
08 /tmp
09 /opt
10 /usr/local
11 /home
12 /mnt/news dedicated usenet cache
13 /mnt/mail dedicated mail partition
14 /mnt/mm dedicated multimedia

Those are my Mandrake partitions. As I got set to install Gentoo, I
remembered reading about a 16 partition limit, and had to go look it up
and find (to my great relief) that IDE had a larger 64 partition limit.
Here's how I set that up (as listed in my mandrake fstab: boot, tmp, mail,
news, opt, swap, mm, and home, are to be shared)

15 /mnt/g g=gentoo, thus, the gentoo root
16 /mnt/g/mnt/rtm root-mirror for gentoo
17 /mnt/g/usr gentoo's /usr
18 /mnt/g/var
19 /mnt/g/usr/local

That still doesn't include dedicated /var/log partitions, one for each
distrib, likely shared public www, p2p, and ftp partitions, if I were so
inclined, or any MSWormOS partitions, since I am MS free (the only
proprietary-ware I believe I still have is my original Master of Orion
game, which I continue to play on occasion using a DOSBox VM). In
addition, further distribs or other OSs, the BSDs, for instance, if I were
to install any of them, would take up further multiple partitions.

BTW, on that 250G drive, with the above 19 partitions (well, 18, since
hda4 is virtual), I still have over 100G of unpartitioned free space to
eventually expand into, so I'm definitely glad, with that sort of space
around, that IDE does more than 16 partitions. <g>

Duncan

Re: 18 partitions here (on IDE, which allows 64)

Posted Mar 23, 2004 2:38 UTC (Tue) by roelofs (guest, #2599) [Link]

Can anybody come up with a good real-world reason to have >15 partitions on one device?

FWIW, here's my layout and why I needed more than 16 partitions (hmm, looks better tabulated, but don't want to bother with the html).

I've also had 20 or more on a single system; some of the reasons:

  • I like to have system dirs mounted read-only where possible; it's hard to mount / as "read-only except for var and etc and tmp".
  • I like to download directly into a CD-sized partition for ~ quick burning of archives without moving a lot of files around (and I like to have at least two or three such partitions since I don't always get the sorting/burning done right away, nor the cleaning-out part after the burn).
  • In the old days, Linux didn't handle swap partitions bigger than 128 MB. I usually had three or four.
  • OS/2 partitions (including boot mangler), DOS partitions, etc...

These days I still do the read-only and CD-R things, but I'm trying to trim things where possible. My backup disks are single-partition monsters (read-only, of course, and spun down 99% of the time).

Greg

18 partitions here (on IDE, which allows 64)

Posted Mar 23, 2004 6:26 UTC (Tue) by wolfrider (guest, #3105) [Link]

--I see your point... However, I suppose my "need" for more partitions has been largely bypassed due to using Vmware Workstation. (Note - I have no affiliation, blahblahblah; just a satisfied customer.)

--For testing purposes, or even virtual servers, Vmware is really well done - separate "disks" are merely files on an existing partition, which makes backups really easy. (Although you can in fact give the virtual machine access to the real disk hardware; I was using this method to access my Win98 files in-situ from a VM for a while.)

--By using Vmware, I was able (for instance) to beta-test the Knoppix DVD without allocating any additional disk space (even though I didn't have a DVD burner at the time) simply by booting the VM directly from the ISO file; vmware can treat an ISO as a CDROM drive.

--If you have a need/want for constantly testing new distros (Linux, *BSD, etc - I've become something of a live-cd addict) without the repartitioning involved, I would seriously consider trying Vmware. It's ~$300 for the 1st purchase, but only $99 for upgrades -- and they have a 30-day free trial IIRC. It's come in quite handy over the years - I'm still on version 3.x, and I believe the current rev is 4.x now. (I might have to upgrade eventually though; my 3.x is having trouble with Linux kernel 2.6.x.)

(Side note - I've run VM's with only 128MB of RAM installed, on a Pentium 233; but it runs better with higher specs. On my AMD Duron 900 with 512MB, you can hardly tell the difference between a VM and the native OS. YMMV. About the only thing vmware is NOT good at, are 3D-intense apps - such as FPS games - and 3.x has a few problems with sound. I think they fixed the sound issue in 4.x tho.)

( http://www.vmware.com ) ( http://www.vmware.com/products/desktop/ws_features.html )

Partition problem

Posted Mar 18, 2004 10:16 UTC (Thu) by stuart (subscriber, #623) [Link]

Surely running lvm (or device mapper -- dm) over the top of the disks makes sense in this environment? One can then create as many partitions as one wants using the LVM equivalents and hence the problem goes away.

Stu.

Partition problem

Posted Mar 19, 2004 2:00 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

Frankly, I've never understood why partitions are implemented down in the block device layer at all. If it were up to me, I'd use an LVM-style setup for ALL partitions -- i.e. a partition device driver stacked on top of a physical device driver, with the latter being blissfully ignorant of partitions.

Then there's no need for partition bits in a SCSI disk minor number. And people with thousands of disks (which of course are not partitioned) wouldn't have to worry about partitions at all.

Partition problem

Posted Mar 19, 2004 14:41 UTC (Fri) by corbet (editor, #1) [Link]

Block drivers are blissfully ignorant of partitions - in 2.6. Occasionally somebody brings up the idea of moving partition handling out entirely, all the way to user space. That probably will not happen, though; things like booting from an arbitrary partition get increasingly hard when you do that.

Partition problem

Posted Mar 19, 2004 17:01 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

They're not totally ignorant, or it wouldn't have come up here. The SCSI disk driver (sd) has to be aware enough of the existence of partitions to allocate exactly 16 minor numbers for each physical device. If partitions were instead handled by a separate device driver with a separate major number, LVM style, we wouldn't be worrying about how many bits to reserve for partitions.

I can't imagine how partitions could be moved all the way out to user space and maintain any significant part of their value.

Lots of SCSI disks

Posted Mar 18, 2004 17:00 UTC (Thu) by dlapine (subscriber, #7358) [Link]

Millions of scsi disks? Stop the insanity!! :)

Have the kernel developers ever watched a system boot with just 100 LUNs? It's not pretty. Hopefully, any work to increase the number of available LUNs will also abstract the boot sequence listing, i.e.
1 of 230 LUNs found
2 of 230 LUNs found

etc.

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds