Btrfs for Rawhide users?

By Jonathan Corbet
November 18, 2009
Your editor stopped using Rawhide (the Fedora development distribution) after things melted down spectacularly back in July. Since then, problems have been scarce but all that stability on the desktop has proved to be seriously boring. Additionally, running a stable distribution can make it harder to test leading-edge project releases. So your editor has been looking to return to a development distribution on the desktop as soon as time allows and things look safe enough. Rawhide's worst problems are far behind it for now; it might just be safe to go back into the water, though the beginning of the Fedora 13 development cycle could add some excitement. As an added incentive, the Fedora developers now are considering mixing in Btrfs snapshots as an optional feature; use of an experimental filesystem might not seem like the way to improve stability, but Btrfs could, in fact, make life easier for Rawhide testers.

It is worth noting at the outset that Fedora is not, yet, considering using Btrfs in Rawhide by default. What has been proposed, instead, is the implementation of a "system rollback" feature for Rawhide users who are crazy enough to install on Btrfs despite its young and immature state. If this feature works out, it could remove much of the risk of tracking Rawhide and begin the exploration of a new capability which could prove highly useful for Linux users in general in the future.

One of the many features provided by Btrfs is copy-on-write snapshots. At any time, it is possible to freeze an image of the state of the filesystem. Snapshots are cheap - at creation time, their cost is almost zero. As changes are made to the filesystem, copies will be made of modified blocks while the snapshot remains unchanged. One can certainly fill a filesystem through use of the snapshot facility - and filling Btrfs filesystems remains a bit of a hazardous thing to do - but Btrfs will share data between snapshots for as long as possible.
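To make the block-sharing idea concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not Btrfs's design: the class name `CowVolume` and its methods are hypothetical, the volume is a flat map from logical block numbers to block IDs, and taking a snapshot copies that map. Btrfs instead shares the root of its B-tree and copies tree nodes lazily, which is why a real snapshot costs almost nothing to create.

```python
from __future__ import annotations


class CowVolume:
    """Toy copy-on-write volume (illustrative only, not Btrfs's on-disk format)."""

    def __init__(self) -> None:
        self._blocks: dict[int, bytes] = {}  # block ID -> immutable contents
        self._next_id = 0
        self.live: dict[int, int] = {}  # logical block number -> block ID
        self.snapshots: dict[str, dict[int, int]] = {}

    def _alloc(self, data: bytes) -> int:
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = data
        return block_id

    def write(self, lbn: int, data: bytes) -> None:
        # Never overwrite in place: point the live map at a fresh block, so any
        # snapshot still referencing the old block sees no change.
        self.live[lbn] = self._alloc(data)

    def snapshot(self, name: str) -> None:
        # Copy only the references; no block contents are duplicated.
        self.snapshots[name] = dict(self.live)

    def read(self, lbn: int, snapshot: str | None = None) -> bytes:
        table = self.live if snapshot is None else self.snapshots[snapshot]
        return self._blocks[table[lbn]]

    def blocks_in_use(self) -> int:
        # A block is in use if the live view or any snapshot references it.
        in_use = set(self.live.values())
        for table in self.snapshots.values():
            in_use.update(table.values())
        return len(in_use)


vol = CowVolume()
for n in range(4):
    vol.write(n, b"v1-%d" % n)
vol.snapshot("pre-upgrade")
print(vol.blocks_in_use())  # 4: the snapshot shares every block
vol.write(2, b"v2-2")
print(vol.blocks_in_use())  # 5: only the modified block needed a copy
print(vol.read(2), vol.read(2, "pre-upgrade"))  # b'v2-2' b'v1-2'
```

In this model, filling the volume corresponds to the live view and the snapshots diverging until few blocks are still shared.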

The value of snapshots to system administrators is fairly obvious: a snapshot can be taken immediately prior to an operating system upgrade. Should that upgrade turn out to be less of a step forward than had been hoped, the filesystem can simply be reverted to its pre-upgrade state. The days of digging around for older versions of broken packages - perhaps with the assistance of a rescue disk - should be long gone.

That said, there are a number of details which need to be worked out before snapshots can be made ready even for Rawhide users, much less the wider user community. Perhaps the biggest problem is that Btrfs snapshots cover the entire filesystem, so reverting to an older state will lose all changes made to the filesystem in the meantime. If a system update fails to boot, dumping the update seems like a straightforward choice - there will be no other changes to lose. But going back to a snapshot after the system has been running for a while could lose a fair amount of work, log data, etc. along with the unwelcome changes. One can always cherry-pick changed files after reverting to the snapshot, but that would be a tedious and error-prone process.
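The cost of a whole-filesystem revert shows up even in a small Python model where the filesystem is just a dict from path to contents. The function names and paths are hypothetical; `cherry_pick` stands for the manual step of carrying wanted files forward after reverting.

```python
from __future__ import annotations


def take_snapshot(fs: dict[str, str]) -> dict[str, str]:
    # A whole-filesystem snapshot captures every path, wanted or not.
    return dict(fs)


def revert(snap: dict[str, str]) -> dict[str, str]:
    # Reverting discards everything written since the snapshot.
    return dict(snap)


def cherry_pick(reverted: dict[str, str], before_revert: dict[str, str],
                paths: list[str]) -> dict[str, str]:
    # Carry selected paths forward from the pre-revert state by hand.
    result = dict(reverted)
    for p in paths:
        if p in before_revert:
            result[p] = before_revert[p]
        else:
            result.pop(p, None)
    return result


fs = {
    "/usr/lib/libfoo.so": "1.0",
    "/var/log/messages": "boot\n",
    "/home/me/notes.txt": "draft",
}
snap = take_snapshot(fs)

# The update goes in, and the system keeps running for a while.
fs["/usr/lib/libfoo.so"] = "2.0 (broken)"
fs["/var/log/messages"] += "update ran\n"
fs["/home/me/notes.txt"] = "draft, revised"

reverted = revert(snap)  # the broken library is gone, but so is the new work
fixed = cherry_pick(reverted, fs, ["/home/me/notes.txt", "/var/log/messages"])
print(reverted["/home/me/notes.txt"])  # draft
print(fixed["/home/me/notes.txt"], fixed["/usr/lib/libfoo.so"])  # draft, revised 1.0
```

Every path that is left out of the `cherry_pick` list is silently lost, which is what makes the process error-prone.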

There are a lot of user interface details to take care of as well. Tools need to be created to allow administrators to look at existing snapshots, mount them for examination, clean them up, and so on. Btrfs will probably have to be extended with a concept of a user-selectable "default" snapshot for each filesystem. Grub needs some work for boot-time snapshot selection. There is also talk of eventually adding snapshot-browsing support to Nautilus.

Snapshots will clearly be a useful feature for Linux in the future. Back in your editor's system administration days, backup tapes were occasionally used to recover from disk disasters, but much more frequently used to help users recover from "fat-finger" incidents. Snapshots are not true backups, but they should certainly be useful as a quick error-recovery mechanism. Your editor is looking forward to the day when his system always supports a series of snapshots allowing the recent state of the filesystem to be recovered.

A snapshot is a heavyweight tool for dealing with system upgrade problems, though. In the longer term, it would make sense to have better rollback support built into the package management system itself. Interestingly, Yum and RPM have had some rollback support in the past, but that feature does not seem to be well supported now. Providing rollback support at this level is a hard problem, to say the least, but solving that problem would put a powerful tool into the hands of Linux system administrators.
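Package-level rollback would work at a finer grain than a snapshot: the package manager records what each transaction changed and reverses just that transaction. The sketch below is hypothetical (it is not how Yum or RPM implemented rollback) and assumes the old package versions are still available to reinstall; it also ignores the side effects of package scripts, which is a large part of why the problem is hard.

```python
from __future__ import annotations


class PackageDB:
    """Toy package database with an undoable transaction history."""

    def __init__(self) -> None:
        self.installed: dict[str, str] = {}  # name -> version
        # Each transaction is a list of (name, old_version, new_version).
        self.history: list[list[tuple[str, str | None, str | None]]] = []

    def transaction(self, changes: dict[str, str | None]) -> None:
        # changes maps a package name to its new version, or None to remove it.
        record = []
        for name, new in changes.items():
            record.append((name, self.installed.get(name), new))
            self._set(name, new)
        self.history.append(record)

    def undo_last(self) -> None:
        # Reapply the recorded old versions, newest change first.
        for name, old, _new in reversed(self.history.pop()):
            self._set(name, old)

    def _set(self, name: str, version: str | None) -> None:
        if version is None:
            self.installed.pop(name, None)
        else:
            self.installed[name] = version


db = PackageDB()
db.transaction({"glibc": "2.10", "bash": "4.0"})
db.transaction({"glibc": "2.11", "newtool": "0.1"})  # the update that went wrong
db.undo_last()
print(db.installed)  # {'glibc': '2.10', 'bash': '4.0'}
```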

In the absence of this feature, filesystem-level snapshots will have to do; certainly they are a major improvement over what we have now. In the short term, potential users should remain aware that Btrfs is a very young filesystem, and that snapshots may not be a viable recovery mechanism if the filesystem itself gets corrupted. In the longer term, though, there will be a day when we will wonder how we ever used our systems without this feature. The work being done by the Fedora developers is an important step in that direction.


Btrfs for Rawhide users?

Posted Nov 19, 2009 2:05 UTC (Thu) by JoeBuck (guest, #2330)

The NAC fileserver I use at work makes snapshot data visible under a magic hidden subdirectory called .snapshot, which can be used to find the previous state of files under a given directory. This takes care of most fat-finger problems. Does btrfs provide something similar?

Btrfs for Rawhide users?

Posted Nov 19, 2009 2:18 UTC (Thu) by rahulsundaram (subscriber, #21946)

Very recently, yes.

Btrfs for Rawhide users?

Posted Nov 19, 2009 6:18 UTC (Thu) by jimparis (subscriber, #38647)

> Perhaps the biggest problem is that Btrfs snapshots cover the entire filesystem

Another problem is that Btrfs snapshots cover only the filesystem, not the running applications that use it. Unless you're in the habit of dropping to single-user mode to do your upgrades and subsequent recoveries, applications will be left in a highly inconsistent state when their disk storage suddenly changes out from under them.

Makes me wonder if integrating something like checkpoint/restart would be required to make system-wide snapshotting complete. Or "virsh save" + btrfs snapshots, etc.

Btrfs for Rawhide users?

Posted Nov 19, 2009 11:08 UTC (Thu) by farnz (subscriber, #17727)

I may be misunderstanding the btrfs filesystem format, but it looks to me like a future development of btrfs would allow you to keep snapshots of subtrees, and to assemble views of the filesystem from a mixture of snapshots. I certainly can't see how the B-trees prohibit this.

Of course, there's admin complexity galore here - but I can envisage something that goes "snapshot the filesystem. Replace all of /var except /var/rpm with the live state, leaving /var/rpm as part of this snapshot, replace /home with the live state" as part of a future btrfs version. Then, once the upgrade of the snapshot is complete, you can do "swap snapshot with live state", and get the atomic update you wanted.
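The view farnz describes can be expressed as a set of path-prefix rules in which the most specific prefix decides whether a path comes from the snapshot or from the live filesystem. This is a hypothetical sketch of that selection logic only (the names `source_for` and `assemble` are invented here); it says nothing about how Btrfs would actually store such a view. The `/var/rpm` path is taken from the comment.

```python
from __future__ import annotations


def source_for(path: str, rules: dict[str, str]) -> str:
    """Return "live" or "snapshot" for `path`; the longest matching prefix wins.

    `rules` must contain "/" so that every absolute path has a match.
    """
    best = ""
    for prefix in rules:
        base = prefix.rstrip("/")
        # Match whole path components only, so /var does not match /variant.
        if path == prefix or path.startswith(base + "/"):
            if len(prefix) > len(best):
                best = prefix
    return rules[best]


def assemble(live: dict[str, str], snapshot: dict[str, str],
             rules: dict[str, str]) -> dict[str, str]:
    # Build the combined view path by path.
    view = {}
    for path in sorted(set(live) | set(snapshot)):
        tree = live if source_for(path, rules) == "live" else snapshot
        if path in tree:
            view[path] = tree[path]
    return view


# "Snapshot everything, but take /var (except /var/rpm) and /home from live."
rules = {"/": "snapshot", "/var": "live", "/var/rpm": "snapshot", "/home": "live"}
paths = ["/usr/bin/app", "/var/log/messages", "/var/rpm/Packages", "/home/me/todo"]
snapshot = {p: "from snapshot" for p in paths}
live = {p: "from live" for p in paths}

view = assemble(live, snapshot, rules)
for path, origin in view.items():
    print(path, "->", origin)
```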

Btrfs for Rawhide users?

Posted Nov 19, 2009 11:31 UTC (Thu) by NRArnot (subscriber, #3033)

Isn't the best answer (until BTRFS is stable) to partition your disk and use an experimental btrfs partition for storing backups? Snapshot, then use rsync to back up only the changed data to the btrfs filesystem.

Even better using two disks, of course. Click-click scritch ...

Btrfs for Rawhide users?

Posted Nov 19, 2009 13:07 UTC (Thu) by epa (subscriber, #39769)

Fedora does drop to single-user mode to perform a distribution upgrade. Individual package updates during the life of one release happen asynchronously, but I would hope those can be rolled back using the normal rpm and yum mechanisms (although admittedly there isn't much good user interface for that, as there is for upgrades).

Btrfs for Rawhide users?

Posted Nov 19, 2009 13:55 UTC (Thu) by masoncl (subscriber, #47138)

Btrfs supports snapshotting on a per-subvolume basis, and subvolumes are a lot like a directory (except you can snapshot them).

So it isn't entirely true that you have to roll back the entire filesystem, but you would have to roll back the entire thing that you've snapshotted.

Exactly what to snapshot is a similar problem to breaking up the FS tree into partitions. Do we keep / and /home separate? If the upgrade fails do we want to roll back just one of the two?
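The granularity question can be illustrated with a toy model in which each subvolume is snapshotted and rolled back on its own, so a failed upgrade of the root subvolume can be undone without touching home directories. The class and method names are hypothetical; this is not the btrfs command-line interface.

```python
from __future__ import annotations


class SubvolumeFS:
    """Toy filesystem whose subvolumes are snapshotted independently."""

    def __init__(self, subvolumes: dict[str, dict[str, str]]) -> None:
        self.subvolumes = {name: dict(tree) for name, tree in subvolumes.items()}
        self.snapshots: dict[tuple[str, str], dict[str, str]] = {}

    def snapshot(self, subvol: str, label: str) -> None:
        self.snapshots[(subvol, label)] = dict(self.subvolumes[subvol])

    def rollback(self, subvol: str, label: str) -> None:
        # Only the named subvolume is rewound; the others keep their changes.
        self.subvolumes[subvol] = dict(self.snapshots[(subvol, label)])


fs = SubvolumeFS({"root": {"/usr/bin/app": "1.0"},
                  "home": {"/home/me/todo": "a"}})
fs.snapshot("root", "pre-upgrade")
fs.subvolumes["root"]["/usr/bin/app"] = "2.0"    # the upgrade
fs.subvolumes["home"]["/home/me/todo"] = "a, b"  # user work in the meantime
fs.rollback("root", "pre-upgrade")
print(fs.subvolumes)
# {'root': {'/usr/bin/app': '1.0'}, 'home': {'/home/me/todo': 'a, b'}}
```

Anything placed inside the `root` subvolume is rolled back with it, which is why deciding the subvolume layout resembles deciding a partition layout.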

It's a fun problem... To answer another question I see farther down: can we easily recover the space from deleted old snapshots? Yes.

Btrfs for Rawhide users?

Posted Nov 23, 2009 22:08 UTC (Mon) by oak (guest, #2786)

> It's a fun problem...to answer another question I see farther down, can
> we easily recover the space from old snapshots deleted, yes.

And if running processes are still keeping files open from the old state
of the file system (like the previous version of the C library, which every
process keeps mmapped), what do you do?

Kill all those processes when deleting the old snapshot? Report that the
snapshot is busy and cannot be deleted, and just list the processes or
recommend a reboot to get rid of them?

a file which a process holds open is still an inode in the mounted fs, even if no filename points to it.

Posted Nov 27, 2009 3:29 UTC (Fri) by xoddam (subscriber, #2322)

If running processes have files open which have been removed from their original *directories* and replaced by new files at the same pathnames by package updates, the old *inodes* will still be in the *current* filesystem until such time as the running processes close the files.

This 'just works' even in the absence of snapshots.

With snapshots, it means the disk space in question won't be reclaimable when old snapshots are removed, but it won't stop the snapshots themselves from being removed. The space will become reclaimable only when the data is no longer in use -- i.e. when the inode is no longer open in the live, mounted filesystem.
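The behaviour xoddam describes can be observed on any POSIX system without snapshots at all. This Python sketch mimics a package update by writing a new file and renaming it over the old path while a handle to the old file is still open; the file name is made up, and installing the update by rename is an assumption for the example. On Windows, replacing a file that is open typically fails, so this is POSIX-only.

```python
from __future__ import annotations

import os
import tempfile


def read_after_replace() -> tuple[str, str]:
    """Return (data read via the old open handle, data now at the path)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "libexample.so")  # hypothetical file name
        with open(path, "w") as f:
            f.write("old version")

        held = open(path)  # a long-running process keeps the old file open
        try:
            # The "update": write a new file and rename it over the old name.
            # This removes the old directory entry, not the old inode.
            tmp = path + ".new"
            with open(tmp, "w") as f:
                f.write("new version")
            os.replace(tmp, path)

            old_data = held.read()  # still served from the old inode
        finally:
            held.close()  # only now can the old inode's blocks be freed

        with open(path) as f:
            new_data = f.read()
    return old_data, new_data


print(read_after_replace())  # ('old version', 'new version')
```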

Btrfs for Rawhide users?

Posted Nov 22, 2009 16:47 UTC (Sun) by anton (subscriber, #25547)

Yes, for most applications you will probably have to shut them down before switching a file system from one snapshot to another one. You usually don't have to do that when you do a snapshot, because (if Btrfs gets it right) the snapshot will represent some state that logically existed during the execution of the application. When you recover from that snapshot, it's as if the application was killed or crashed by itself at that point, and the application should be able to recover from that. For upgrades the package management system and the scripts for the application package tend to perform whatever shutdowns and restarts are necessary for the particular package (at least that's my experience in Debian, don't know about Fedora).

Of course, only covering one file system can pose a consistency problem if the application distributes its data across several file systems. It would be cool if one could snapshot all file systems at the same (logical) time.
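One way to get the "same logical time" anton asks for is to make every writer and the snapshot operation take the same lock, so a snapshot of several filesystems can never fall between the two halves of one application update. The sketch below models this with in-memory stores and a single Python lock; `Store`, `Host` and the two-store setup are hypothetical, and a real system would have to quiesce writes at the filesystem level instead.

```python
from __future__ import annotations

import threading


class Store:
    """Stand-in for one mounted filesystem (in-memory toy)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, int] = {}


class Host:
    """Hypothetical coordinator: writes and snapshots share one lock."""

    def __init__(self, *stores: Store) -> None:
        self.stores = {s.name: s for s in stores}
        self._lock = threading.Lock()

    def write(self, updates: dict[str, dict[str, int]]) -> None:
        # Apply one application-level update across several stores atomically.
        with self._lock:
            for name, kv in updates.items():
                self.stores[name].data.update(kv)

    def snapshot_all(self) -> dict[str, dict[str, int]]:
        # Block writers briefly, copy every store, then let writes resume.
        with self._lock:
            return {name: dict(s.data) for name, s in self.stores.items()}


host = Host(Store("db"), Store("log"))
host.write({"db": {"n": 0}, "log": {"n": 0}})


def writer() -> None:
    # The application keeps the counter in both stores in step.
    for i in range(1, 1001):
        host.write({"db": {"n": i}, "log": {"n": i}})


t = threading.Thread(target=writer)
t.start()
snaps = [host.snapshot_all() for _ in range(200)]
t.join()
print(all(s["db"]["n"] == s["log"]["n"] for s in snaps))  # True
```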

Butterfish

Posted Nov 19, 2009 8:00 UTC (Thu) by ncm (subscriber, #165)

I gather that Chris pronounces it something like "better-eff-ess", but I can't see the name as anything other than "butterfish". In Hawaii this would be pronounced "bahttah-feesh", always preceded by "mmmm!". Remember, you can tune a file system, but you can't tuna fish.

Butterfish

Posted Nov 19, 2009 12:14 UTC (Thu) by jzbiciak (subscriber, #5246)

LWN moderators: Take this (post) out, and a UNIX daemon will dog your steps from now until the time_t's wrap around.

;-)

Btrfs for Rawhide users?

Posted Nov 19, 2009 9:21 UTC (Thu) by michaeljt (subscriber, #39183)

> A snapshot is a heavyweight tool for dealing with system upgrade problems, though. In the longer term, it would make sense to have better rollback support built into the package management system itself.

As long as the package manager and its database aren't too badly affected by the problems you want to roll back from. At least on first thought, using snapshots here seems to me to provide a certain healthy redundancy.

Btrfs for Rawhide users?

Posted Nov 19, 2009 9:26 UTC (Thu) by RobWilco (guest, #40828)

With LVM snapshots, you get no way to suppress or merge old snapshots once
they are not needed anymore. The snapshots are extra partitions which
contain the new files and modifications. So it is easy to rollback in case
of problem, but impossible to merge without service interruption when the
upgrade went well.

With btrfs, is there a way to suppress old snapshots without service
interruption, while keeping the new files and modifications?

Concerning package management, I think rPath Linux, and specifically its
Conary package manager, handles package rollback by design.

Btrfs for Rawhide users?

Posted Nov 20, 2009 2:49 UTC (Fri) by msnitzer (subscriber, #57232)

LVM snapshots will soon have the ability to merge a snapshot into its
origin, see:
https://www.redhat.com/archives/dm-devel/2009-November/ms...
http://people.redhat.com/msnitzer/patches/snapshot-merge/

v4 of the DM snapshot-merge patchset will be posted to the dm-devel mailing
list tomorrow-ish.

The finishing touches (to both DM and lvm2) are being actively worked now.
LVM2 snapshot merge will work with any filesystem that is layered on top of
the "origin" LVM logical volume (LV).

When you create an LVM snapshot of the origin it preserves a copy of the
logical volume at a particular point in time. In general, the origin
volume continues to be used (changes and new files stored in origin) and
the snapshot is a backup. In the context of system upgrades
this would almost certainly be the case: the root filesystem is layered on top
of an LV (the "origin"); a snapshot is taken of this LV; this origin LV's
filesystem is changed (via a system upgrade). The upgrade doesn't go
well... you want to rollback. With recent LVM (and DM) snapshot
advances you can merge the snapshot back into the origin. In the process
you rollback the origin LV to the state it was in before the system
upgrade.
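The copy-on-write direction described here, where the origin receives the new data and the snapshot keeps the old blocks, can be modelled in a few lines. This is a simplified Python sketch, not the device-mapper implementation: the `Origin` class is hypothetical, and it ignores chunk sizes, writable snapshots, and the snapshot running out of space. It also shows why merging the snapshot back rolls the origin to its pre-upgrade state.

```python
from __future__ import annotations


class Origin:
    """Toy origin volume with at most one classic snapshot attached."""

    def __init__(self, blocks: list[str]) -> None:
        self.blocks = list(blocks)
        self.cow: dict[int, str] | None = None  # block no. -> preserved data

    def create_snapshot(self) -> None:
        # Cheap to create: the exception store starts out empty.
        self.cow = {}

    def write(self, n: int, data: str) -> None:
        # On the first write to a block after the snapshot, save its old
        # contents into the snapshot's store; the origin is then updated.
        if self.cow is not None and n not in self.cow:
            self.cow[n] = self.blocks[n]
        self.blocks[n] = data

    def read_snapshot(self, n: int) -> str:
        # Snapshot view: the preserved block if any, else the unchanged origin.
        if self.cow is None:
            raise RuntimeError("no snapshot")
        return self.cow.get(n, self.blocks[n])

    def merge_snapshot(self) -> None:
        # Copy every preserved block back into the origin, then drop the snapshot.
        if self.cow is None:
            raise RuntimeError("no snapshot")
        for n, data in self.cow.items():
            self.blocks[n] = data
        self.cow = None


lv = Origin(["boot", "libc-old", "config"])
lv.create_snapshot()
lv.write(1, "libc-new")        # the upgrade modifies the origin
lv.write(1, "libc-new-fixed")  # a second write to the same block saves nothing more
print(lv.cow)                  # {1: 'libc-old'}: only old data lives in the snapshot
lv.merge_snapshot()            # the upgrade went badly: roll back
print(lv.blocks)               # ['boot', 'libc-old', 'config']
```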

There is still the work of integrating and productizing LVM snapshot-merge
for use as a system rollback mechanism. An intelligent init needs to
be developed to allow the user to say "I want to rollback to snapshot X". In
practice, arming the system to perform a snapshot merge requires a small
LVM metadata change. If the system is still running/bootable it is very
easy to instruct the system to rollback to a snapshot. But if the system
isn't bootable how does one make this happen (need to be able to run lvm2's
lvconvert somehow)?

But taking a step back, what do you mean by "suppress" an old snapshot? In
terms of LVM if you no longer have a use for a snapshot LV you just delete
it. And as I said the snapshot generally doesn't contain the new files and
modifications.

Btrfs for Rawhide users?

Posted Nov 24, 2009 11:01 UTC (Tue) by RobWilco (guest, #40828)

Thanks for your reply.

When asked, I'll blame everything on the documentation!

I went back to RTFM and did some tests, and I realized LVM snapshots suit my
needs just fine. I had the wrong impression that, under the hood, the "real"
partition was not touched anymore and all the updates would take place in the
snapshot. I was confused by the "difference" being written in the
snapshot.

Now I do not understand the problem that "snapshot merge" resolves...

Bye,

Btrfs for Rawhide users?

Posted Nov 19, 2009 9:56 UTC (Thu) by TRS-80 (subscriber, #1804)

OpenSolaris has supported this for a while, thanks to pkg image-update, so there's already code for GRUB to do something similar with ZFS. ZFS has an advantage in that you can create subfilesystems for /home, /opt etc. in the same partition that aren't captured by the image and so are unaffected if you rollback.

The OpenSolaris implementation is slightly different though - rather than snapshotting and then upgrading, it clones and then upgrades the clone. This is also how the older Solaris Live Upgrade feature worked, so in this case changes you make after the upgrade but before rebooting aren't in the upgraded system.
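The difference between the two approaches is easiest to see in a toy model of clone-then-upgrade: the running system is left alone, a clone is upgraded, and the boot loader is pointed at the clone. All names here are hypothetical and this is not the OpenSolaris tooling; it only shows why edits made to the running system after cloning do not appear in the upgraded environment, and why the untouched original remains available for rollback.

```python
from __future__ import annotations


class BootEnvironments:
    """Toy model of clone-then-upgrade boot environments."""

    def __init__(self, root: dict[str, str]) -> None:
        self.envs = {"current": dict(root)}
        self.active = "current"     # environment that is running now
        self.next_boot = "current"  # environment the boot loader will pick

    def clone(self, src: str, dst: str) -> None:
        self.envs[dst] = dict(self.envs[src])

    def upgrade(self, env: str, changes: dict[str, str]) -> None:
        # The upgrade touches only the named (inactive) environment.
        self.envs[env].update(changes)

    def activate(self, env: str) -> None:
        self.next_boot = env

    def reboot(self) -> dict[str, str]:
        self.active = self.next_boot
        return self.envs[self.active]


be = BootEnvironments({"/usr/bin/app": "1.0", "/etc/motd": "hello"})
be.clone("current", "upgraded")
be.upgrade("upgraded", {"/usr/bin/app": "2.0"})
be.envs["current"]["/etc/motd"] = "edited before reboot"  # change to the running system
be.activate("upgraded")
root = be.reboot()
print(root)  # {'/usr/bin/app': '2.0', '/etc/motd': 'hello'}
be.activate("current")  # rolling back is just booting the old environment
print(be.reboot()["/usr/bin/app"])  # 1.0
```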

Btrfs for Rawhide users?

Posted Nov 19, 2009 15:10 UTC (Thu) by zooko (guest, #2589)

Maybe Fedora can borrow some code or ideas from apt-clone:

http://www.nexenta.org/os/AptCloneMan
http://www.nexenta.org/os/TransactionalZFSUpgrades

Btrfs for Rawhide users?

Posted Nov 26, 2009 14:07 UTC (Thu) by yhdezalvarez (guest, #29255)

I think more than that is needed. There is an emerging theme here:

* Software Configuration Management and Version Control Systems

Can you imagine a hybrid of a package manager and (for example) git, with all its tools to manage versions? I certainly can. Combined with snapshots? Different processes using different snapshots? A “testing snapshot”? Wow, even better.

But no, we keep insisting on beating around the bush instead of tackling the main problem.

Btrfs for Rawhide users?

Posted Nov 19, 2009 15:34 UTC (Thu) by pjones (subscriber, #31722)

> It is worth noting at the outset that Fedora is not, yet, considering using Btrfs in Rawhide by default.

This isn't really the case - we discussed this at FUDCon in Boston last January, and it's certainly been something people are working towards. I imagine it will also be discussed at length in Toronto in two weeks.

Btrfs for Rawhide users?

Posted Nov 23, 2009 22:11 UTC (Mon) by oak (guest, #2786)

> Interestingly, Yum and RPM have had some rollback support in the past,
> but that feature does not seem to be well supported now.

The package manager will then need to track everything a package install does.
So either packages cannot have their own "postrm" / "postinst" scripts, or the
package manager needs file-system-level tracking of the changes (file system
state snapshots...).

postinst and LD_PRELOAD

Posted Dec 7, 2009 4:17 UTC (Mon) by gmatht (subscriber, #58961)

The postinst scripts could be run with fopen overridden using LD_PRELOAD
so as to make a backup of each file fopened by the script before resuming
the script. (This trick is used by plash, plasticfs etc.)
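Here is a Python analogue of that trick, assuming the "script" runs inside the same Python process: while the context manager is active, any call to the built-in `open()` that could modify an existing file first copies that file to a backup directory. `backup_on_write` and `restore` are hypothetical helpers; this is not plash or plasticfs, and unlike LD_PRELOAD it does not reach child processes, `os.open()`, or C code. Files the script creates or deletes are not tracked either, which hints at how much a complete version would have to cover.

```python
from __future__ import annotations

import builtins
import contextlib
import os
import shutil
import tempfile
from typing import Iterator


@contextlib.contextmanager
def backup_on_write(backup_dir: str) -> Iterator[dict[str, str]]:
    """While active, copy any existing file into backup_dir before open()
    may modify it. Yields a map of original path -> backup path."""
    real_open = builtins.open
    saved: dict[str, str] = {}

    def guarded_open(file, mode="r", *args, **kwargs):
        if isinstance(file, (str, os.PathLike)) and any(c in mode for c in "wax+"):
            path = os.path.abspath(file)
            if os.path.isfile(path) and path not in saved:
                dest = os.path.join(backup_dir, str(len(saved)))
                # Use the real open() so the copy itself is not intercepted.
                with real_open(path, "rb") as src, real_open(dest, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                saved[path] = dest
        return real_open(file, mode, *args, **kwargs)

    builtins.open = guarded_open
    try:
        yield saved
    finally:
        builtins.open = real_open


def restore(saved: dict[str, str]) -> None:
    # Put every backed-up file back in place.
    for original, backup in saved.items():
        shutil.copyfile(backup, original)


with tempfile.TemporaryDirectory() as d:
    conf = os.path.join(d, "app.conf")  # hypothetical config file
    with open(conf, "w") as f:
        f.write("setting=1\n")
    backups = os.path.join(d, "backups")
    os.mkdir(backups)

    with backup_on_write(backups) as saved:
        # Stands in for a postinst script rewriting the config file.
        with open(conf, "w") as f:
            f.write("setting=2\n")

    restore(saved)
    with open(conf) as f:
        result = f.read()
print(result == "setting=1\n")  # True
```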


Copyright © 2009, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds