
Partitioned loopback devices

The expanded device number type in the 2.6 kernel makes it possible, at the lowest level, to support vast numbers of partitions on every block device in the system. Unfortunately, the Linux block drivers have not caught up with this change. SCSI, in particular, is still limited to 15 partitions per device. There are a few reasons for this lag, but the largest is simple compatibility: there is no easy way to incorporate support for more partitions without breaking the existing device numbering scheme. The block layer assumes that partitions have consecutive minor numbers, so supporting more partitions means increasing the portion of the minor number which is dedicated to the partition number. But changing the interpretation of minor numbers in this way would break existing systems, and that is something the kernel developers are reluctant to do.
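To make the compatibility problem concrete, here is a simplified model (not the sd driver's actual code) of a driver that gives each disk a block of consecutive minor numbers, with the low bits selecting the partition. SCSI disks get 16 minors each: the whole disk plus 15 partitions, which is 4 partition bits. The struct and helper names are hypothetical.

```c
#include <assert.h>

/* Simplified model: a disk's minors are consecutive, and the low
 * `part_bits` bits of a minor select the partition (0 = whole disk). */
struct disk_part {
    unsigned int disk; /* index of the disk within this major */
    unsigned int part; /* 0 for the whole disk, 1..N for partitions */
};

/* Build a minor number from a disk index and a partition number. */
static unsigned int make_minor(unsigned int disk, unsigned int part,
                               unsigned int part_bits)
{
    assert(part < (1u << part_bits)); /* partition must fit in its field */
    return (disk << part_bits) | part;
}

/* Split a minor number back into disk index and partition number. */
static struct disk_part split_minor(unsigned int minor, unsigned int part_bits)
{
    struct disk_part dp;
    dp.disk = minor >> part_bits;
    dp.part = minor & ((1u << part_bits) - 1);
    return dp;
}
```

With 4 partition bits, make_minor(1, 0, 4) is 16: the whole of the second disk. If the partition field were widened to 6 bits to allow 63 partitions, split_minor(16, 6) would decode that same minor as partition 16 of the first disk, so every existing device node for the second and later disks would suddenly point somewhere else. That is the breakage the kernel developers want to avoid.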

Carl-Daniel Hailfinger has recently posted an interesting solution to the partition limit: partitioned loopback devices. A loopback device is a kernel-implemented virtual block device which is backed by something real - usually a disk partition or a file on a disk somewhere. Common uses for loopback devices include mounting regular files as filesystems or the creation of encrypted filesystems (though the device mapper is the preferred means for the latter application in 2.6). Loopback devices do not support partitions in their own right; they simply provide block-level access to the backing store as a single partition.

Carl-Daniel noticed, however, that adding partition support to loopback devices would be a relatively straightforward thing to do. In 2.6, partition handling is (finally) part of the block layer; all that is really required to support partitions in the loopback driver is to tell the block layer that those partitions exist. So, with a small patch, each loopback device can have up to 127 partitions. The bulk of the patch, in fact, is there to ensure continued compatibility for users of non-partitioned loopback devices.

This capability is interesting because it is a simple matter of one losetup command to create a loopback interface to a real disk drive. Thus, by using loopback devices in this mode, system administrators can get around the partition limits enforced by the real hardware drivers and divide their disks into lots of tiny little pieces. There is some small overhead associated with using the loopback device, but, for users in need of more partitions, it may well be a price worth paying.
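Under the hood, losetup has classically attached a backing file or device through the loop driver's LOOP_SET_FD ioctl. The following is a minimal sketch in C, assuming Linux with the <linux/loop.h> header available and root privileges to actually attach anything; attach_loop is a hypothetical helper name, not part of any library. Passing a whole disk (say /dev/hdb) as the backing store rather than a file is the trick described above; with the partitioned-loop patch applied, the block layer would presumably then pick up the disk's partition table through the loop device.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/loop.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Bind `backing` (a file or a whole disk such as /dev/hdb) to the loop
 * device `loopdev`, roughly what `losetup loopdev backing` does.
 * Returns 0 on success, -1 with errno set on failure. */
static int attach_loop(const char *loopdev, const char *backing)
{
    assert(loopdev != NULL && backing != NULL);

    int backing_fd = open(backing, O_RDWR);
    if (backing_fd < 0)
        return -1;

    int loop_fd = open(loopdev, O_RDWR);
    if (loop_fd < 0) {
        int saved = errno;
        close(backing_fd);
        errno = saved;
        return -1;
    }

    /* The kernel takes its own reference to the backing file, so both
     * descriptors can be closed once the ioctl has been issued. */
    int rc = ioctl(loop_fd, LOOP_SET_FD, backing_fd);
    int saved = errno;
    close(loop_fd);
    close(backing_fd);
    errno = saved;
    return rc < 0 ? -1 : 0;
}
```

A real tool would also set a name, offset, or flags with LOOP_SET_STATUS64 and pick a free loop device itself; the sketch leaves those out to show only the attach step.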



Partitioned loopback devices

Posted Nov 11, 2004 7:16 UTC (Thu) by Duncan (guest, #6647)

This is an interesting solution indeed.

A couple months ago, my attention was drawn abruptly to this partition
issue. As luck would have it, I had just decided to forgo SATA for
another upgrade round and stick with PATA for one more generation, so it
wasn't me. However, someone else on the Gentoo AMD64 list ran into the
entirely predictable problem, attempting to upgrade his SATA disks from
the old IDE side SATA drivers to the newer SCSI side SATA drivers. A good
portion of his partitions were suddenly unreachable!!!

Unfortunately, there wasn't much to tell him except to go back to the old
kernel and drivers at least long enough to grab the data from the extra
partitions, store it elsewhere, and repartition into fewer partitions. I
DID thank him, however, for pointing out the problem to this guy who had
decided to wait another upgrade cycle for SATA, due to a general feeling
that I was already pushing the envelope enough with newer AMD64 gear, and
running ~amd64 (Gentoo uses ~ to denote unstable/beta, altho it's supposed
to have been tested past alpha at least), and that I didn't want to gamble
any further with as yet unstable driver implementations for SATA on TOP of
the other leading/bleeding edge stuff I was running.

Anyway, it would have been very useful to have this solution available in
the kernel at that time, such that with a couple additional configuration
tweaks, he'd have been on his way. Barring some sort of magic in which SCSI,
or at least the SATA-SCSI subset, develops support for more than 15
partitions by the time I DO switch, I hope this solution WILL be in the
mainline kernel by then and decently widely deployed and documented. As it
happens, I've
20 partitions now, on my 250G PATA, and that's with ~100G still
unpartitioned. It's possible I'll have mid-20s partitions by upgrade
time, and be ready for even MORE, on what I expect by that time will be my
new half terabyte or larger drive. (Or drives, if I go RAID by then, as I
might.)

Maybe this'll serve as a bit of a heads-up to some others thinking about
upgrading to SATA, as well. It could certainly add a bit of unexpected
complexity to your upgrade, if you aren't ready for it and have the 20-ish
partitions I do.

Duncan

Partitioned loopback devices

Posted Nov 11, 2004 11:59 UTC (Thu) by ekj (guest, #1524)

Just out of curiosity: what exactly are you doing that makes it sensible to create 20 partitions on a single hard disk, totaling 150GB?

Partitioned loopback devices

Posted Nov 11, 2004 12:13 UTC (Thu) by Liefting (subscriber, #8466)

And, more importantly, why are they not under LVM?

Partitioned loopback devices

Posted Nov 13, 2004 13:18 UTC (Sat) by Duncan (guest, #6647)

Well, you asked...

hda1 boot, 2,3 root and root-mirror (root copied to root-mirror
periodically, when I know stuff is working, so I can just switch roots at
the boot prompt if an update screwed things up and I can't boot my working
root), 4 is of course the extended partition, mapping the additional
logical partitions. That takes care of the four primary partitions.

5 and 19 are /usr and usr-mirror, giving me a backup /usr in the event an
update screws my working copy up.

6-8 are my Gentoo portage partitions (which would normally be
under /usr/portage, thus their location after /usr), 6 being the
equivalent of /usr/portage, getting it off of /usr as it's rsynced as part
of my daily update, 7 being the package sources (as opposed to the Gentoo
portage ebuild install scripts on 6), and 8 being binary packages created
at source merge time, so I don't have to go recompiling if I have to
backup a version or two. The partitions serve to size discipline each of
these, of course.

9 and 10 are /usr/src and /usr/local, thus getting them AND
the /usr/portage dirs off of the /usr partition making mirroring it much
simpler. src doesn't need mirrored as the stuff there is easily replaced
from the net, and local is mirrored to another disk.

11-13 are /var, a separate /var/log for size control reasons, and a
separate ccache partition (which by default would be a subdir of /var).

14 is an empty /opt partition. 15 is a 10 gig /home (again, the backup is
on another disk). 16-18 are my dedicated mail, news, and media
partitions, also relatively large (20 gig mail archive, 8 gig news cache
only, 40+ gig media archive, respectively). Thus, the 10 gig home is
PLENTY big, even for duplicated backup user dirs.

After 18, my media partition, is the 100 gig of blank space, allowing for
expansion of the media partition or other flexibility as desired. 19 as I
mentioned is the usr-mirror. 20 is a quite large 15 gig /tmp. I could
easily do with just a gig, but I have the room, and I decided to
appropriate enough space for it so I could stick a couple DVD images there
if necessary, when I was partitioning. Also, emerge can take up to 5 gigs
or so of tmpspace for packages such as OOo, according to reports, and
while that's normally in /var/tmp for security in multi-user situations,
that's not an issue here, so I have portage's tmpspace mapped to /tmp,
allowing me to avoid yet ANOTHER partition for /var/tmp.

Note that I don't mention swap partitions. I have a gig of memory, and
decided to disable swap in my kernel config, as I didn't need it and it
only added needless complication and code complexity to the kernel. (On
AMD64's flat memory architecture, the memory zone issues that cause
problems with swap disabled on ia32 don't apply, and the first one that
might hits at 4G, so with only a gig, I'm safe with it too.) I had done
that while running Mandrake, so eliminated the swap partitions when I
wiped Mandrake and reorganized Gentoo on the remaining space.

I mentioned a second disk. It's far smaller, only 36G, but I still keep
two additional copies (backup-working and backup-backup) of / and /usr on
it, meaning I have four copies of those critical partitions, a working and
a backup copy on each of a working and backup disk. It has additional
(single) partitions for /var, /usr/local, and /tmp, and a copy of the
critical personal data from /home as well.

With all that, I keep two copies of both disks' partition tables in /root,
root's home, on the / partition, meaning a total of EIGHT copies of the
partition tables, two each in four different /root homedirs. Likewise
with fstab in /etc, eight copies of that as well (plus automated edit
backups in fstab~).

I could have accomplished the same goal using mount --bind and fewer
partitions, putting all the /usr subdir partitions on one partition in
different subdirs mount-bound as appropriate, for example. That would
have kept me under the 16-partition barrier, and is actually what I may
end up doing when I upgrade to SATA. However, the 20-partition thing has
worked out quite well on PATA. I actually had a few more partitions (24,
I think) when I was dual booting Mandrake and Gentoo, as I learned about
Gentoo and made the switch. However, I reorganized things when I killed
my Mandrake install, just as I had for it when I killed my MSWormOS
install.

As for LVM, I've not learned it yet, and besides, it'd only be something
else that could go wrong. I do fine without it, tho I'll probably take
the trouble to learn it at some point.

Duncan

Partitioned loopback devices

Posted Nov 18, 2004 12:04 UTC (Thu) by job (guest, #670)

Learn LVM! It's madness not to. All you need to learn are a few more words and two or three simple command line utilities. It's a half hour really well spent. It works just like partitions, but you can resize them at will, and refer to them by name instead of number (which gets really handy when these partitions, called volumes, span multiple disks).

Partitioned loopback devices

Posted Nov 18, 2004 17:41 UTC (Thu) by wolfrider (guest, #3105)

--Webmin is your friend for LVM... Best interface I've seen since YaST.

$ apt-cache search webmin | grep lvm
webmin-lvm - lvm control module for webmin

Partitioned loopback devices

Posted Nov 11, 2004 16:40 UTC (Thu) by vmole (guest, #111)

> Anyway, it would have been very useful to have this solution available in the kernel at that time, such that with a couple additional configuration tweaks, he'd have been on his way.

I don't think that's actually the case (although I haven't looked at the actual patch, so correct me if I'm wrong). The implication of this article was that you could create a loopback device whose backing store was a single SCSI (SATA) partition, and then partition the loopback device. Accessing existing partitions isn't the same thing.

Partitioned loopback devices

Posted Nov 11, 2004 18:04 UTC (Thu) by pflugstad (subscriber, #224)

No, I think this patch lets you map a loopback device to an entire block device; see the example, where he uses /dev/hdb with 60+ partitions.

libata limits

Posted Nov 11, 2004 18:06 UTC (Thu) by pflugstad (subscriber, #224)

So libata is limited to 15 partitions as well? Is that related to the SCSI limitation somehow?

libata limits

Posted Nov 13, 2004 13:22 UTC (Sat) by Duncan (guest, #6647)

There are two SATA implementations in the kernel. The older one is under
IDE, and has the IDE limit of 64 minors per disk (63 partitions). The newer
one (that uses libata) is part of the SCSI subsystem, yes, so it is limited
to SCSI's 16 minors per disk (15 partitions).

Duncan

Partitioned loopback devices

Posted Nov 12, 2004 21:30 UTC (Fri) by giraffedata (subscriber, #1954)

I've always thought that partitioning should be done only by something like the loopback device, which means the logic could go in the loopback device driver instead of the block layer. This means the loopback device driver is an LVM, by the way.

Is there some reason I've missed that partition awareness by the block layer is a good thing?
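Wherever the partition logic lives, its core is small: a request addressed in partition-relative sectors is shifted by the partition's start sector and rejected if it runs past the partition's end. The following is a minimal sketch of that mapping in C, not the kernel's actual code; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One partition, described in 512-byte sectors relative to the whole disk. */
struct partition_extent {
    uint64_t start_sect; /* first sector of the partition on the disk */
    uint64_t nr_sects;   /* length of the partition in sectors */
};

/* Translate a partition-relative request into whole-disk sectors.
 * Returns 0 and stores the absolute sector on success, or -1 if the
 * request (sector .. sector+count-1) runs past the end of the partition. */
static int remap_to_disk(const struct partition_extent *p,
                         uint64_t sector, uint64_t count,
                         uint64_t *disk_sector)
{
    assert(p != NULL && disk_sector != NULL);
    /* Written this way round so the bounds check cannot overflow. */
    if (count > p->nr_sects || sector > p->nr_sects - count)
        return -1;
    *disk_sector = p->start_sect + sector;
    return 0;
}
```

Whether this arithmetic sits in the block layer or in a stacking driver such as loop is a question of where the partition table gets parsed and who owns the resulting device numbers, not of how hard the remapping itself is.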

Partitioned loopback devices

Posted Nov 15, 2004 12:27 UTC (Mon) by garloff (subscriber, #319)

But unfortunately we have only 255 loopback devices, don't we?
So either we use 32k SCSI disks with 16 partitions each, or 256
SCSI disks in loopback mode with 127 partitions each. But not both.
Therefore this does not offer a good generic solution :-(

Partitioned loopback devices

Posted Nov 19, 2004 11:40 UTC (Fri) by Blaisorblade (guest, #25465)

> But unfortunately we have only 255 loopback devices, don't we?

We had those. But with 32 bit majors/minors, we can build far more (2^20 minors are available). And from reading the patch, it seems it can already take advantage of that (it uses MINOR_BITS to calculate the maximum minor number, and I assume MINOR_BITS is set to 20, i.e. the correct value).
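The arithmetic can be checked with a small sketch of the 2.6 kernel's internal device-number layout: a 12-bit major and a 20-bit minor, mirroring MINORBITS and the MKDEV/MAJOR/MINOR macros in include/linux/kdev_t.h. Assuming each partitioned loop device takes 128 consecutive minors (the whole device plus 127 partitions; this is inferred from the figure in the article, not checked against the patch), one major has room for 8192 of them, far beyond the old ceiling of 256 minors. The lowercase helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Kernel-internal 2.6 device numbers: 12-bit major, 20-bit minor. */
#define MINORBITS 20u
#define MINORMASK ((1u << MINORBITS) - 1)

/* Pack a major/minor pair, like the kernel's MKDEV(). */
static uint32_t mkdev(uint32_t major, uint32_t minor)
{
    assert(major < (1u << (32 - MINORBITS)) && minor <= MINORMASK);
    return (major << MINORBITS) | minor;
}

/* Unpack, like the kernel's MAJOR() and MINOR(). */
static uint32_t dev_major(uint32_t dev) { return dev >> MINORBITS; }
static uint32_t dev_minor(uint32_t dev) { return dev & MINORMASK; }

/* How many devices fit under one major if each needs `minors_per_dev`
 * consecutive minors (e.g. 128 = whole device + 127 partitions). */
static uint32_t devices_per_major(uint32_t minors_per_dev)
{
    assert(minors_per_dev > 0);
    return (MINORMASK + 1) / minors_per_dev;
}
```

For example, the loop driver's block major is 7, so mkdev(7, 130) would, under the 128-minor layout assumed here, name partition 2 of the second loop device (130 = 1 * 128 + 2), and devices_per_major(128) gives 8192.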

Partitioned loopback devices

Posted Nov 20, 2004 18:33 UTC (Sat) by theraphim (subscriber, #25955)

Loop device partitioning (and the 64-bit losetup offset) is handy when doing forensic analysis of entire hard disk images.
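For that forensic case, the partitions inside a raw image can be located without any kernel support by reading the classic MBR partition table: four 16-byte entries starting at byte 446 of the first sector, with the 0x55 0xAA signature at bytes 510 and 511. Below is a minimal sketch in C that handles primary partitions only (extended/logical partitions and other table formats are out of scope); parse_mbr and the other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* One primary entry from a classic MBR partition table. */
struct mbr_entry {
    uint8_t  type;      /* partition type byte; 0 means unused */
    uint32_t lba_start; /* first sector, counted from the start of the disk */
    uint32_t nr_sects;  /* length in sectors */
};

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the four primary entries of the MBR in `sector` (the first 512
 * bytes of a disk image). Returns the number of used entries copied to
 * `out`, or -1 if the 0x55AA boot signature is missing. */
static int parse_mbr(const uint8_t sector[SECTOR_SIZE], struct mbr_entry out[4])
{
    assert(sector != NULL && out != NULL);
    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return -1;

    int n = 0;
    for (int i = 0; i < 4; i++) {
        const uint8_t *e = sector + 446 + 16 * i; /* table starts at byte 446 */
        if (e[4] == 0)
            continue; /* empty slot */
        out[n].type = e[4];
        out[n].lba_start = le32(e + 8);
        out[n].nr_sects = le32(e + 12);
        n++;
    }
    return n;
}

/* Byte offset of a partition inside the image, as passed to `losetup -o`.
 * Needs 64 bits: any start sector above 8388608 is past the 4 GiB mark. */
static uint64_t byte_offset(const struct mbr_entry *e)
{
    return (uint64_t)e->lba_start * SECTOR_SIZE;
}
```

Each entry's byte_offset() can be handed to losetup -o to map a single partition of the image when partitioned loop devices are not available; a partitioned loop device simply does the same lookup for every partition at once.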

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds