Fedora and LVM [LWN.net]

Fedora and LVM

Posted Oct 31, 2012 18:31 UTC (Wed) by dwmw2 (subscriber, #2063) [Link]

It's more than that. For most simple desktop/laptop installs, using LVM is a flagrant and gratuitous violation of the KISS principle. With the expected results.

Fedora and LVM

Posted Oct 31, 2012 20:55 UTC (Wed) by rleigh (guest, #14622) [Link] (10 responses)

There are some serious quality issues with LVM. I can (and have) locked up the kernel hard many times (no panic, just completely dead) using nothing more than "lvcreate -s" and "lvremove".

The schroot tool uses LVM snapshots to create and destroy transient scratch build environments. We use them on the Debian autobuilders. If you kick off 24 parallel builds on a 24 core system, the system will be dead as a doornail in a few tens of seconds. Entirely due to lvcreate/lvremove triggering what looks like some kernel locking bug. Even on systems only running a single build, we still regularly have lvcreate and lvremove failures. There are some fundamental bugs in LVM which really need fixing, and which I'm surprised haven't been addressed given that they are easily reproducible. schroot is admittedly a special case--most people don't churn through as many LVs as we do--rebuilding the whole archive is ~14 hours with 24 parallel builds, and ~18000 transient LVs (though it always died in under 5 mins, less than 100 LVs in). But it should certainly work without killing your system.

We now also support btrfs snapshots, and while the filesystem itself is still not perfect, I've not yet run into a single issue doing heavy parallelised snapshotting.

Fedora and LVM

Posted Oct 31, 2012 21:58 UTC (Wed) by BlueLightning (subscriber, #38978) [Link] (5 responses)

Sounds like a serious bug. Did you report it?

Fedora and LVM

Posted Oct 31, 2012 22:18 UTC (Wed) by rleigh (guest, #14622) [Link] (4 responses)

IIRC it was sent to the LVM and/or kernel lists.

Fedora and LVM

Posted Nov 1, 2012 21:39 UTC (Thu) by agk (guest, #23332) [Link] (3 responses)

> IIRC it was sent to the LVM and/or kernel lists.

Please would you dig up and post a link to this here?

Fedora and LVM

Posted Nov 1, 2012 23:23 UTC (Thu) by rleigh (guest, #14622) [Link] (2 responses)

I've had a good search, but I'm afraid I can't find it on either list, sorry. I may be misremembering, or just looking in the wrong place.

Fedora and LVM

Posted Nov 2, 2012 0:11 UTC (Fri) by agk (guest, #23332) [Link] (1 responses)

Well if you're still having problems, please let us know the details and we'll see what we can suggest. If it was 24 writeable snapshots of the same device, then it might be worth trying the new thin snapshots. (Set up the origin as a thin device, then drop the size parameter from CHROOT_LVM_SNAPSHOT_OPTIONS.)
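As a rough sketch of that suggestion, the lvcreate steps might look like the following. The volume group `vg0`, the pool `pool0`, the LV names and the sizes are made-up placeholders, and the script only prints the commands (a dry run) rather than executing them; it assumes an lvm2 build with thin provisioning support.

```python
import shlex


def thin_snapshot_commands(vg, pool, origin, snapshot, pool_size, origin_size):
    """Return the lvcreate invocations for a thin origin plus one thin snapshot."""
    return [
        # Create the thin pool that backs the origin and all of its snapshots.
        ["lvcreate", "-L", pool_size, "-T", f"{vg}/{pool}"],
        # Create the origin as a thin volume inside that pool.
        ["lvcreate", "-V", origin_size, "-T", f"{vg}/{pool}", "-n", origin],
        # Thin snapshot of the origin: no -L/--size argument is given.
        ["lvcreate", "-s", "-n", snapshot, f"{vg}/{origin}"],
    ]


if __name__ == "__main__":
    # All names and sizes below are hypothetical.
    for argv in thin_snapshot_commands(
        "vg0", "pool0", "sid-origin", "sid-build1", "50G", "10G"
    ):
        print(shlex.join(argv))
```

The last command is the part that matters for schroot: a thin snapshot takes its space from the pool, which is why the size parameter can be dropped from CHROOT_LVM_SNAPSHOT_OPTIONS.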

Fedora and LVM

Posted Nov 2, 2012 0:28 UTC (Fri) by rleigh (guest, #14622) [Link]

Thanks Alasdair, it was indeed all writable snapshots of the same device in this instance. I'll give thin snapshots a go with a current kernel. I no longer have access to the 24 core system (which was also remote, making debugging lockups difficult), but I'll see what I can do on a local quad core system.

Fedora and LVM

Posted Nov 1, 2012 12:06 UTC (Thu) by Cato (guest, #7643) [Link] (3 responses)

I think most LVM users don't do much with snapshots, if they even use them, and the snapshot code is much more buggy as a result.

The non-snapshot LVM/DM code is not too bad, though there are some tools issues - e.g. pvmove can corrupt data if it runs out of memory - see http://serverfault.com/a/339899/79266 and http://deranfangvomende.wordpress.com/2009/12/28/a-primer...

Fedora and LVM

Posted Nov 1, 2012 17:44 UTC (Thu) by nix (subscriber, #2304) [Link] (2 responses)

pvmove uses snapshots internally, so this is just another indication of the general principle that LVM snapshots are dangerous.

Fedora and LVM

Posted Nov 1, 2012 21:32 UTC (Thu) by agk (guest, #23332) [Link] (1 responses)

> pvmove uses snapshots internally

pvmove uses mirrors, not snapshots.

It works by setting up temporary mirrors for the kernel to sync the data between the old and new locations.

Fedora and LVM

Posted Nov 1, 2012 22:29 UTC (Thu) by nix (subscriber, #2304) [Link]

Oh. You're right, of course. (I've only ever encountered one pvmove failure, anyway -- a deadlock -- and it restarted fine on reboot.)

Fedora and LVM

Posted Oct 31, 2012 23:32 UTC (Wed) by mezcalero (subscriber, #45103) [Link] (16 responses)

Actually I am concerned about the implementation. The fact that the LVM/dm userspace bits heavily rely on stuff such as sysv semaphores and things is just awful. If code uses SysV semaphores then this usually is a pretty strong sign that something is not right about the code, i.e. either that it hasn't been touched in decades, or that it simply is questionable code.

But my main beef with LVM is, and has been for years, that its assembly is just broken and wrong. They assume that during boot there is a time when all devices have shown up, at which point they can invoke vgchange -ay, and that's the only time when they need to enumerate devices. But that's really not how things work these days, and it hasn't been for the last decade or so. Hardware devices show up all the time, and we do not know when all connected devices have been detected, so in the boot process we don't actually know when we could invoke "vgchange -ay". Also, hard disks in times of USB and iSCSI show up at any time, depending on network status and their own initialization time, and there are no general rules about when initialization has to be complete. The way distributions hack around the fact that they don't know when to invoke vgchange -ay is by pulling in udev-settle and the atrocity that is scsi-wait-scan. These hacks make things work for the "majority" of runs, and as long as you only have SATA disks and other pretty standard stuff. But the question is whether the "majority" is good enough where reliability is required, and whether limiting stuff to SATA and friends is such a good choice. But on top of that, it's just slow, since it basically is little more than just delaying the boot arbitrarily in the hope that all possible devices might have shown up when some time passes. Of course, if you are unlucky the delay is not sufficient, but still everybody has to endure the delay.

This issue has been known by the LVM folks since many years, and pointed out to them again and again. But nothing ever happened. Now, they say they'll soon have the "option" to make LVM work like any other hw daemon and actually wait on its own for precisely the devices it needs and not longer, thus not delaying the boot any bit longer than necessary. And they'll investigate if they can make that "option" default one day... At that speed I am sure that LVM will get discovery/assembly right only after the time it already has been replaced by btrfs... (btrfs in turn does get all this right: assembly is based on device plug events and a btrfs raid will delay the boot only exactly until the point where all devices it actually needs have shown up. btrfs raid is hence hotplug-safe, snappy at boot and absolutely reliable).

Anyway, the take-away is: LVM is the one major slowness in Fedora's boot. It also adds fragility where it shouldn't. And this hasn't changed in years. LVM is written the way hardware worked in the 80's and 90's, but that is not how hardware has been working in the past 15 years or so.
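To make the contrast concrete, here is a minimal sketch of event-driven assembly: a volume group is activated the moment the last device it needs shows up, instead of at one fixed point during boot. This is a toy model, not LVM, udev or btrfs code; the class name, the device identifiers and the event list are all invented for illustration.

```python
class VolumeGroupAssembler:
    """Toy model: activate a VG as soon as all of its PVs have appeared."""

    def __init__(self, required_pvs):
        self.required = set(required_pvs)
        self.seen = set()
        self.active = False

    def device_added(self, pv_id):
        """Handle one hotplug event; return True if it completed the VG."""
        if pv_id not in self.required:
            return False  # unrelated device, ignore it
        self.seen.add(pv_id)
        if not self.active and self.seen == self.required:
            self.active = True  # this is where activation would happen
            return True
        return False


if __name__ == "__main__":
    assembler = VolumeGroupAssembler({"pv-a", "pv-b"})
    # Hypothetical arrival order; "iscsi-late" is not part of this VG.
    for event in ["usb-stick", "pv-a", "pv-b", "iscsi-late"]:
        if assembler.device_added(event):
            print(f"activated after {event}")
```

The point of the design is that nothing waits for "all devices": the unrelated late arrival never delays activation, and a required device that is slow to appear delays only the volume group that actually needs it.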

Fedora and LVM

Posted Nov 1, 2012 0:31 UTC (Thu) by marcH (subscriber, #57642) [Link] (4 responses)

While I use and like LVM on "normal" disks, I really never expected anyone would try to use it on removables or on top of iSCSI... What are the use cases here?

Fedora and LVM

Posted Nov 1, 2012 1:25 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

Using an external USB drive to temporarily expand a small internal SSD?

Fedora and LVM

Posted Nov 1, 2012 14:02 UTC (Thu) by TRS-80 (guest, #1804) [Link]

Proxmox VE uses it to store virtual machine images, which it can then LVM snapshot for crash-consistent backups while the VM is running. Being a VM hypervisor, iSCSI is a natural choice to allow shared storage and migration between hosts. I have it running at work and it's pretty awesome.

Fedora and LVM

Posted Nov 2, 2012 19:06 UTC (Fri) by phiggins (subscriber, #5605) [Link] (1 responses)

I used to run Fedora as my primary OS on my work laptop. I resized the Windows partition down but didn't remove it, and that left me with little space to actually work in. I put /home on a USB drive and used LVM. It was a huge mistake for exactly the reasons mentioned. It rarely was mounted automatically and caused big problems (I don't remember exactly what they were now) whenever the USB cable accidentally became disconnected.

I honestly can't remember why I used LVM in the first place, but Fedora using it by default probably influenced that decision.

I used to be in the "LVM is worthless confusing overhead for a desktop and even small servers" camp. Now that I have more experience with it after using it for years *because* Fedora made it the default, I have found its features useful a few times per year. However, I am still nervous that one day I will be unable to recover my data because LVM is too complicated and I won't know how to restore a non-booting system.

Fedora and LVM

Posted Nov 2, 2012 20:39 UTC (Fri) by nix (subscriber, #2304) [Link]

One saving grace of LVM is that the metadata backups are not just kept in /etc/lvm. They are *also* kept in a defined place on every PV (IIRC at the start), in ASCII. So if you are so badly shagged that you've lost / and vgscan can't reconstruct your VG, you can get started by sucking the metadata off the PV with dd. (But normally as long as the PVs are there the VG can be reconstructed. A VG doesn't have any existence beyond its PVs, after all: there is no 'VG area' that can get damaged to destroy your VG.)
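A minimal sketch of that last-resort recovery step, assuming (as above) that the metadata sits near the start of the PV as plain ASCII: read the first chunk of the device read-only and pull out the long printable runs. The 1 MiB read size, the minimum run length and the sample blob are arbitrary choices for illustration, not values taken from the LVM on-disk format.

```python
import re


def read_head(path, size=1024 * 1024):
    """Read the first `size` bytes of a device or image file (read-only)."""
    with open(path, "rb") as f:
        return f.read(size)


def ascii_runs(blob, min_len=32):
    """Return printable-ASCII runs of at least `min_len` bytes found in blob."""
    pattern = re.compile(rb"[\t\n\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]


if __name__ == "__main__":
    # Fake data standing in for the start of a PV: binary padding around text.
    fake = b"\x00" * 512 + b"vg0 {\n\tseqno = 3\n}\n" + b"\xff" * 64
    for run in ascii_runs(fake, min_len=8):
        print(run)
```

On a real system one would call something like `ascii_runs(read_head("/dev/sdX"))` as root, where `/dev/sdX` is a placeholder for the PV, and then look through the output for the most recent copy of the VG description.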

Fedora and LVM

Posted Nov 1, 2012 21:06 UTC (Thu) by agk (guest, #23332) [Link] (2 responses)

> The fact that the LVM/dm userspace bits heavily rely on stuff such as sysv semaphores

Semaphores are used only by the LVM/dm code for synchronisation with the udev rules that have to handle asynchronous uevents.

On a system using udev, after LVM/dm creates a device (or group of related devices) it needs to wait until udev has finished setting up those particular devices before it continues. This is done by having the last udev rule decrement a (per-transaction) semaphore to zero, which the blocked LVM/dm code waits for.
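The pattern can be sketched in a few lines. This is an in-process model using threads and a condition variable, not SysV IPC and not the actual LVM/dm or udev code; the class and function names are invented for illustration.

```python
import threading


class TransactionSemaphore:
    """Counter that waiters block on until it reaches zero."""

    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def decrement(self):
        with self._cond:
            self._count -= 1
            if self._count <= 0:
                self._cond.notify_all()

    def wait_for_zero(self, timeout=None):
        with self._cond:
            return self._cond.wait_for(lambda: self._count <= 0, timeout)


def create_device(log):
    """Model one transaction: create a device, then wait for udev to finish."""
    sem = TransactionSemaphore(1)  # one outstanding udev job for this device

    def udev_worker():
        log.append("udev: rules finished")
        sem.decrement()  # stands in for the last udev rule

    worker = threading.Thread(target=udev_worker)
    worker.start()
    ok = sem.wait_for_zero(timeout=5)  # the creating side blocks here
    log.append("lvm: continuing")
    worker.join()
    return ok


if __name__ == "__main__":
    events = []
    create_device(events)
    print(events)
```

The creating side never proceeds before the rule processing has signalled completion, which is the ordering guarantee described above.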

Fedora and LVM

Posted Nov 2, 2012 2:17 UTC (Fri) by mezcalero (subscriber, #45103) [Link] (1 responses)

There is no excuse for ever using SysV semaphores, unless you are living in the 80s. There are a multitude of alternatives around. But SysV semaphores, na, thank you.

Also, LVM could just subscribe to udev devices coming and going with libudev like everybody else, and things would be good...

Fedora and LVM

Posted Nov 2, 2012 3:58 UTC (Fri) by agk (guest, #23332) [Link]

> LVM could just subscribe to udev devices coming and going with libudev

If only! We had lengthy discussions with the udev developers which led to the existing synchronisation mechanism.

Fedora and LVM

Posted Nov 1, 2012 22:36 UTC (Thu) by agk (guest, #23332) [Link]

> This issue has been known by the LVM folks since many years, and pointed out to them again and again. But nothing ever happened.

If it had been easy, people would have submitted patches long ago:)

But the final piece of the assembly jigsaw is now passing our tests and should hit rawhide builds this week.

Fedora and LVM

Posted Nov 3, 2012 17:54 UTC (Sat) by Cato (guest, #7643) [Link]

I've just realised that I seem to have had this problem on a server using LVM a while back - I didn't have time to investigate so I just put a "vgchange -ay" into the /etc/init.d/checkfs script. This is on Ubuntu 8.04 LTS server, so it's a cross-distro issue, and the hard disks are SATA.

Presumably an earlier invocation had failed, with the result that the fsck step failed, causing boot to fail. A shame that LVM, which generally helps uptime in many cases, caused downtime here - presumably btrfs wouldn't have this issue as it doesn't require this extra 'vgchange' type step.

I really look forward to btrfs (or perhaps ZFS in-kernel) being mature enough to use, largely for the parent-block checksumming to catch various errors.

This is on a home server in an inconvenient location, where there is only voltage regulation not UPS, and the utility power is flaky - so reliable unattended reboots are really helpful.

Fedora and LVM

Posted Nov 5, 2012 10:51 UTC (Mon) by nmav (guest, #34036) [Link] (5 responses)

> Actually I am concerned about the implementation. The fact that the LVM/dm
> userspace bits heavily rely on stuff such as sysv semaphores and things is
> just awful. If code uses SysV semaphores then this usually is a pretty
> strong sign that something is not right about the code, i.e. either that
> it hasn't been touched in decades, or that it simply is questionable code.

I like those generalizations. If a thing does A it is crap. While semaphores may be used in bad code, the use of semaphores by itself does not indicate bad code. Semaphores are a tool (arguably an old one), but like every tool they can be used in a bad way or _not_.

Fedora and LVM

Posted Nov 5, 2012 11:04 UTC (Mon) by dwmw2 (subscriber, #2063) [Link] (4 responses)

It's a generalisation. Of course it isn't 100% accurate but it's a good predictor. It's much the same as "code stored in BZR or SVN probably isn't worth the pain of trying to get it out of its antiquated version control system to look at it" — there are exceptions, but they are few.

Fedora and LVM

Posted Nov 6, 2012 13:33 UTC (Tue) by nix (subscriber, #2304) [Link] (3 responses)

The thing is, that's pretty much not true. Sure, dealing with such version control systems is painful, but if I followed those rules, I'd lose the source trees for Emacs, KDE (before the recent move to git), Enlightenment, Calibre... there's no real correlation between quality and choice of VCS, though if someone is still using CVS it is probably a sign that the project is not very actively maintained. Equally, there's no correlation between 'project is forced to use SysVIPC to interoperate with other software' and general quality -- only if the project used SysVIPC when it had another choice can such a conclusion be drawn.

Fedora and LVM

Posted Nov 6, 2012 16:19 UTC (Tue) by admax88 (guest, #75035) [Link] (2 responses)

OpenBSD uses CVS, and it's pretty actively maintained and very high quality.

Fedora and LVM

Posted Nov 6, 2012 16:45 UTC (Tue) by nix (subscriber, #2304) [Link] (1 responses)

True. There are rare exceptions even here. (I can't think of many, though.)

Fedora and LVM

Posted Nov 6, 2012 16:52 UTC (Tue) by admax88 (guest, #75035) [Link]

Most of the original GNU software is still in CVS.

Just because a project doesn't adapt to the new hotness in version control doesn't say anything about the quality of the code.

The choice of VCS is more likely an indicator of the age of the project and/or the age of the developers.