Fedora and LVM
Posted Oct 31, 2012 18:31 UTC (Wed) by dwmw2 (subscriber, #2063)
Posted Oct 31, 2012 20:55 UTC (Wed) by rleigh (subscriber, #14622)
The schroot tool uses LVM snapshots to create and destroy transient scratch build environments. We use them on the Debian autobuilders. If you kick off 24 parallel builds on a 24 core system, the system will be dead as a doornail in a few tens of seconds, entirely due to lvcreate/lvremove triggering what looks like some kernel locking bug. Even on systems only running a single build, we still regularly have lvcreate and lvremove failures. There are some fundamental bugs in LVM which really need fixing, and which I'm surprised haven't been addressed given that they are easily reproducible. schroot is admittedly a special case--most people don't churn through as many LVs as we do--rebuilding the whole archive is ~14 hours with 24 parallel builds, and ~18000 transient LVs (though it always died in under 5 mins, less than 100 LVs in). But it should certainly work without killing your system.
We now also support btrfs snapshots, and while the filesystem itself is still not perfect, I've not yet run into a single issue doing heavy parallelised snapshotting.
Posted Oct 31, 2012 21:58 UTC (Wed) by BlueLightning (subscriber, #38978)
Posted Oct 31, 2012 22:18 UTC (Wed) by rleigh (subscriber, #14622)
Posted Nov 1, 2012 21:39 UTC (Thu) by agk (subscriber, #23332)
Please would you dig up and post a link to this here?
Posted Nov 1, 2012 23:23 UTC (Thu) by rleigh (subscriber, #14622)
Posted Nov 2, 2012 0:11 UTC (Fri) by agk (subscriber, #23332)
Posted Nov 2, 2012 0:28 UTC (Fri) by rleigh (subscriber, #14622)
Posted Nov 1, 2012 12:06 UTC (Thu) by Cato (subscriber, #7643)
The non-snapshot LVM/DM code is not too bad, though there are some tools issues - e.g. pvmove can corrupt data if it runs out of memory - see http://serverfault.com/a/339899/79266 and http://deranfangvomende.wordpress.com/2009/12/28/a-primer...
Posted Nov 1, 2012 17:44 UTC (Thu) by nix (subscriber, #2304)
Posted Nov 1, 2012 21:32 UTC (Thu) by agk (subscriber, #23332)
pvmove uses mirrors, not snapshots.
It works by setting up temporary mirrors for the kernel to sync the data between the old and new locations.
Posted Nov 1, 2012 22:29 UTC (Thu) by nix (subscriber, #2304)
Posted Oct 31, 2012 23:32 UTC (Wed) by mezcalero (subscriber, #45103)
But my main beef with LVM is, and has been for years, that its assembly is just broken and wrong. They assume that during boot there is a point in time where all devices have shown up, at which point they can invoke vgchange -ay, and that this is the only time they need to enumerate devices. But that's really not how things work these days, and it hasn't been for the last decade or so. Hardware devices show up all the time, and we do not know when all connected devices have been detected, so in the boot process we don't actually know when we could invoke "vgchange -ay". Also, hard disks in times of USB and iSCSI show up at any time, depending on network status and their own initialization time, and there are no general rules about when initialization has to be complete. The way distributions hack around the fact that they don't know when to invoke vgchange -ay is by pulling in udev-settle and the atrocity that is scsi-wait-scan. These hacks make things work for the "majority" of runs, and as long as you only have SATA disks and other pretty standard stuff. But the question is whether the "majority" is good enough where reliability is required, and whether limiting stuff to SATA and friends is such a good choice. On top of that, it's just slow, since it is little more than delaying the boot arbitrarily in the hope that all possible devices will have shown up once some time has passed. Of course, if you are unlucky the delay is not sufficient, but still everybody has to endure the delay.
This issue has been known to the LVM folks for many years, and has been pointed out to them again and again. But nothing ever happened. Now, they say they'll soon have the "option" to make LVM work like any other hw daemon and actually wait on its own for precisely the devices it needs and not longer, thus not delaying the boot any longer than necessary. And they'll investigate if they can make that "option" the default one day... At that speed I am sure that LVM will get discovery/assembly right only after it has already been replaced by btrfs... (btrfs in turn does get all this right: assembly is based on device plug events, and a btrfs raid will delay the boot only exactly until the point where all devices it actually needs have shown up. btrfs raid is hence hotplug-safe, snappy at boot and absolutely reliable).
Anyway, the take-away is: LVM is the one major slowness in Fedora's boot. It also adds fragility where it shouldn't. And this hasn't changed in years. LVM is written the way hardware worked in the 80's and 90's, but that is not how hardware has been working in the past 15 years or so.
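To make the contrast above concrete, here is a minimal sketch of event-driven assembly: activate a volume group at the moment the last device it needs appears, instead of waiting for a fixed settle delay. It is purely illustrative and is not LVM or btrfs code. The class `VolumeGroupAssembler`, the method `on_device_event`, and the device identifiers are all hypothetical, and the sketch assumes the set of required devices is already known (in practice it would have to be learned from on-disk metadata as devices appear).

```python
class VolumeGroupAssembler:
    """Hypothetical tracker: activates once every required device is present."""

    def __init__(self, name, required_pvs):
        self.name = name
        self.required = frozenset(required_pvs)  # assumed known up front
        self.present = set()
        self.active = False

    def on_device_event(self, action, pv_id):
        """Handle one hotplug event; return True if it triggered activation."""
        if pv_id not in self.required:
            return False  # unrelated device, ignore it
        if action == "add":
            self.present.add(pv_id)
        elif action == "remove":
            # Simplification: losing any member marks the group inactive.
            self.present.discard(pv_id)
            self.active = False
        if not self.active and self.present == self.required:
            self.active = True  # a real tool would activate the group here
            return True
        return False


if __name__ == "__main__":
    vg = VolumeGroupAssembler("vg0", ["pv-a", "pv-b"])
    for event in [("add", "pv-a"), ("add", "usb-stick"), ("add", "pv-b")]:
        if vg.on_device_event(*event):
            print(f"{vg.name}: all devices present, activating")
```

The point of the design is that boot waits exactly as long as the slowest required device and no longer, and a device that arrives late is handled the same way as one that arrives early.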
Posted Nov 1, 2012 0:31 UTC (Thu) by marcH (subscriber, #57642)
Posted Nov 1, 2012 1:25 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)
Posted Nov 1, 2012 14:02 UTC (Thu) by TRS-80 (subscriber, #1804)
Posted Nov 2, 2012 19:06 UTC (Fri) by phiggins (subscriber, #5605)
I honestly can't remember why I used LVM in the first place, but Fedora using it by default probably influenced that decision.
I used to be in the "LVM is worthless confusing overhead for a desktop and even small servers" camp. Now that I have more experience with it after using it for years *because* Fedora made it the default, I have found its features useful a few times per year. However, I am still nervous that one day I will be unable to recover my data because LVM is too complicated and I won't know how to restore a non-booting system.
Posted Nov 2, 2012 20:39 UTC (Fri) by nix (subscriber, #2304)
Posted Nov 1, 2012 21:06 UTC (Thu) by agk (subscriber, #23332)
Semaphores are used only by the LVM/dm code for synchronisation with the udev rules that have to handle asynchronous uevents.
On a system using udev, after LVM/dm creates a device (or group of related devices) it needs to wait until udev has finished setting up those particular devices before it continues. This is done by having the last udev rule decrement a (per-transaction) semaphore to zero, which the blocked LVM/dm code waits for.
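The wait-until-zero pattern described above can be sketched in a few lines. This is an in-process stand-in using Python threads, not the actual LVM/dm or udev mechanism: `TransactionSemaphore`, `create_devices`, and `fake_udev_rules` are hypothetical names, and the worker threads merely play the role of udev finishing its rules for each device.

```python
import threading


class TransactionSemaphore:
    """Counter that a waiter can block on until it reaches zero."""

    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def decrement(self):
        # Called once per device, standing in for the last udev rule.
        with self._cond:
            self._count -= 1
            if self._count <= 0:
                self._cond.notify_all()

    def wait_for_zero(self, timeout=None):
        # Returns True if zero was reached, False if the timeout expired.
        with self._cond:
            return self._cond.wait_for(lambda: self._count <= 0, timeout=timeout)


def create_devices(names, timeout=5.0):
    """Simulate one transaction: one semaphore covering a group of devices."""
    sem = TransactionSemaphore(len(names))
    processed = []
    lock = threading.Lock()

    def fake_udev_rules(name):
        with lock:
            processed.append(name)  # pretend the device nodes were set up
        sem.decrement()

    workers = [threading.Thread(target=fake_udev_rules, args=(n,)) for n in names]
    for w in workers:
        w.start()
    done = sem.wait_for_zero(timeout)  # the creating code blocks here
    for w in workers:
        w.join()
    return done, sorted(processed)


if __name__ == "__main__":
    print(create_devices(["vg0-root", "vg0-swap"]))
```

The timeout is an addition of this sketch; it shows one way a caller could avoid blocking forever if a decrement never arrives.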
Posted Nov 2, 2012 2:17 UTC (Fri) by mezcalero (subscriber, #45103)
Also, LVM could just subscribe to udev devices coming and going with libudev like everybody else, and things would be good...
Posted Nov 2, 2012 3:58 UTC (Fri) by agk (subscriber, #23332)
If only! We had lengthy discussions with the udev developers which led to the existing synchronisation mechanism.
Posted Nov 1, 2012 22:36 UTC (Thu) by agk (subscriber, #23332)
If it had been easy, people would have submitted patches long ago:)
But the final piece of the assembly jigsaw is now passing our tests and should hit rawhide builds this week.
Posted Nov 3, 2012 17:54 UTC (Sat) by Cato (subscriber, #7643)
Presumably an earlier invocation had failed, with the result that the fsck step failed, causing boot to fail. A shame that LVM, which generally helps uptime in many cases, caused downtime here - presumably btrfs wouldn't have this issue as it doesn't require this extra 'vgchange' type step.
I really look forward to btrfs (or perhaps ZFS in-kernel) being mature enough to use, largely for the parent-block checksumming to catch various errors.
This is on a home server in an inconvenient location, where there is only voltage regulation not UPS, and the utility power is flaky - so reliable unattended reboots are really helpful.
Posted Nov 5, 2012 10:51 UTC (Mon) by nmav (subscriber, #34036)
I like those generalizations. If a thing does A it is crap. While semaphores may be used in bad code, using semaphores does not in itself indicate bad code. Semaphores are a tool (arguably an old one), but like every tool they can be used in a bad way or _not_.
Posted Nov 5, 2012 11:04 UTC (Mon) by dwmw2 (subscriber, #2063)
Posted Nov 6, 2012 13:33 UTC (Tue) by nix (subscriber, #2304)
Posted Nov 6, 2012 16:19 UTC (Tue) by admax88 (subscriber, #75035)
Posted Nov 6, 2012 16:45 UTC (Tue) by nix (subscriber, #2304)
Posted Nov 6, 2012 16:52 UTC (Tue) by admax88 (subscriber, #75035)
The fact that a project hasn't adopted the new hotness in version control doesn't say anything about the quality of its code.
The choice of VCS is more likely an indicator of the age of the project and/or the age of the developers.
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds