A btrfs update at LinuxCon Europe
Chris Mason started by talking about btrfs and its goals in general; those have been well covered here and need not be repeated now. He reiterated Oracle's plan to use btrfs as the core filesystem for its RHEL-derivative Linux distribution; needless to say, supporting that role requires a rock-solid implementation. So a lot of work has been going into extensive testing of the filesystem and fixing bugs.
The 3.2 kernel release will see the results of that work; it will contain
lots of fixes. There will also be significant improvements to the logging
code. It turns out that a lot of data was being logged more than once,
greatly increasing the amount of I/O required; that has now been fixed.
I/O traffic for the log, it seems, has been cut to about 25% of its
previous level.
For 3.3, the main improvement seems to be the use of larger blocks for nodes in the filesystem B-tree. Larger blocks can hold more data, of course, and, in particular, more metadata. That means that metadata that was previously scattered in the filesystem can be kept together with the relevant inode. That, in turn, leads to significant performance improvements for many filesystem operations.
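The effect of node size on tree shape can be sketched with some rough arithmetic. The following is only an illustration: the header, item, and pointer sizes are assumptions approximating the btrfs on-disk format, the 160-byte payload is an assumed inode-sized item, and the function names are hypothetical.

```python
import math

# Assumed sizes in bytes, approximating the btrfs on-disk format;
# treat them as illustrative rather than authoritative.
HEADER_SIZE = 101       # per-node header
ITEM_HEADER_SIZE = 25   # per-item header stored in a leaf
KEY_PTR_SIZE = 33       # per-child pointer stored in an interior node


def items_per_leaf(node_size, payload_size):
    """How many items of a given payload size fit in one leaf."""
    return (node_size - HEADER_SIZE) // (ITEM_HEADER_SIZE + payload_size)


def pointers_per_node(node_size):
    """How many child pointers fit in one interior node."""
    return (node_size - HEADER_SIZE) // KEY_PTR_SIZE


def tree_levels(total_items, node_size, payload_size):
    """Levels needed to index total_items, counting the leaf level."""
    nodes = math.ceil(total_items / items_per_leaf(node_size, payload_size))
    fanout = pointers_per_node(node_size)
    levels = 1
    while nodes > 1:
        nodes = math.ceil(nodes / fanout)
        levels += 1
    return levels


for size in (4096, 16384):
    print(size, items_per_leaf(size, 160), tree_levels(1_000_000, size, 160))
```

Under these assumptions, a 16KB leaf holds roughly four times as many items as a 4KB leaf, and a million-item tree loses a level, so related metadata is more likely to sit in the same block and lookups touch fewer blocks.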
Another near-term feature, due to arrive "right after fsck", is the merging
of Dave Woodhouse's RAID5 and RAID6 implementations. That work was
initially posted in 2009; Chris apologized for taking so long to get it
merged. How this feature will actually be used still needs some thought;
RAID5 or 6 is quite good for data, but it can be problematic for metadata,
which tends to not fill anything close to a full RAID stripe and, thus, can
lead to low I/O performance. Happily, btrfs has been designed from the
beginning to keep data and metadata separate; that means that things can be
set up where data is protected with full RAID while metadata is managed
using simple mirroring.
Talk of protecting metadata leads naturally to the problem of recovering a
filesystem when its metadata has been corrupted. That is what a filesystem
checker program is for; btrfs, thus far, has been increasingly famous for
its lack of a proper checker (and, more importantly, a proper filesystem
repair tool). As of the LinuxCon talk, btrfs still does not have a real
repair tool, but some progress has been made in that direction and a couple
of other mechanisms have been provided.
The copy-on-write nature of btrfs implies that there will be numerous old
copies of the filesystem metadata on the storage device at any given time.
Any change, after all, will create a new copy, leaving the previous version
in place until the block is reused.
Chris observed that filesystem corruptions rarely affect that older
metadata, so it makes sense to use it as a primary resource in the recovery
of a corrupted disk. But, first, one needs to be able to find that
older metadata.
To that end, btrfs maintains an array containing the block locations of
many older versions of the filesystem root. The root block, he said, is
more important than the superblock when it comes to recovering data. The
root is replaced often as metadata changes percolate up to the top of the
directory hierarchy, so the "old root blocks" array contains pointers to
what is, in effect, a set of snapshots of the very recent state of the
filesystem. Clearly, this will be a valuable resource should something go
badly wrong.
One way of using that array is simply to mount the filesystem using an
older version of the root. Chris demonstrated this feature by poking holes
in a test filesystem, then mounting an older root to get back to where
things had been before. For simple, quickly-detected problems, older root
blocks should be a path toward a quick solution.
It is not too hard to imagine situations where this approach will not work,
though. If a metadata block in a rarely-changed subtree is, say, zeroed by
a hardware malfunction, it could go undetected for some time. By the time
the user realizes that something is wrong, there may be no older hierarchy
containing the information needed to put things back together. So other
solutions will be necessary.
Obviously, one of those solutions will be the full filesystem checker and
repair tool. That tool is still not ready, though. Getting a repair tool
right is a hard problem; without a lot of care, a well-intentioned attempt
to repair a filesystem can easily make it worse. Data that may have been
recoverable before the repair attempt may no longer be so afterward. Even
if a proper btrfsck were available today, it would probably be some years
before it reflected enough experience to inspire confidence in users who
are concerned about their data.
So it seems that something else is required. That "something else" turns
out to be a data recovery tool written by Josef Bacik. This tool has a
simple (to explain) job: dig through a corrupted filesystem in read-only
mode and extract as much of the data as possible. Since it makes no
changes, it cannot make things worse; it seems like a worthwhile tool to
have around even if a full repair tool existed.
That tool, along with all the requisite filesystem support, is expected to
be available in the 3.2 kernel time frame. Meanwhile, there is a new
btrfs-progs repository that will include the recovery tool in the near
future. All told, it may not be quite the btrfsck that some users were
hoping for, but it should be enough to make those users feel a bit more
confident about entrusting their data to a new filesystem. Judging from
the size of the crowd at Chris's talk, there are a lot of people interested
in doing exactly that.
[Your editor would like to thank the Linux Foundation for funding his travel to LinuxCon Europe.]
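The concern about RAID5 and small metadata writes, mentioned in the discussion of the RAID5/6 merge, can be illustrated with a toy parity calculation. This is a minimal sketch of generic XOR parity, not the btrfs implementation; the function names and the read/write counters are assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_stripe_write(data_blocks):
    """Parity is computed from the new data alone; nothing is read back."""
    parity = xor_blocks(data_blocks)
    return parity, {"reads": 0, "writes": len(data_blocks) + 1}


def partial_stripe_write(old_block, old_parity, new_block):
    """Changing one block requires reading the old block and old parity."""
    new_parity = xor_blocks([old_parity, old_block, new_block])
    return new_parity, {"reads": 2, "writes": 2}


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity, full_cost = full_stripe_write(stripe)

# Overwrite only the middle block, as a small metadata update might.
replacement = b"\xaa\xbb"
new_parity, partial_cost = partial_stripe_write(stripe[1], parity, replacement)
print(full_cost, partial_cost)
```

A full-stripe write needs no reads at all, while updating a single block costs two reads and two writes. Metadata updates are mostly of the second kind, which is why simple mirroring may suit metadata better.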
Oracle
Posted Nov 3, 2011 1:03 UTC (Thu) by kragilkragil2 (guest, #76172)
Not releasing the code for checker a long time ago was a mistake and waiting only makes it worse. So why isn't there code? Sure people will frag their FS, so what. Tell people it eats babies in flashing red letters for a minute before they use it. BtrFS is not production ready. If some of those users provides a good bug report we will get a working BtrFS a lot sooner.

Oracle
Posted Nov 3, 2011 4:51 UTC (Thu) by drag (guest, #31333)
That is good and bad.
Good for us because now we will get to see what happens when people start to use it in large scale production environments.
Bad for Oracle customers, because they will be the ones beta testing it.

Oracle
Posted Nov 10, 2011 2:21 UTC (Thu) by clump (subscriber, #27801)

A btrfs update at LinuxCon Europe
Posted Nov 3, 2011 4:17 UTC (Thu) by ncm (guest, #165)

A btrfs update at LinuxCon Europe
Posted Nov 3, 2011 6:35 UTC (Thu) by njs (subscriber, #40338)

A btrfs update at LinuxCon Europe
Posted Nov 3, 2011 16:20 UTC (Thu) by iabervon (subscriber, #722)

A btrfs update at LinuxCon Europe
Posted Nov 10, 2011 12:59 UTC (Thu) by callegar (guest, #16148)
Btrfs is not the only filesystem without a checker, unfortunately.
UDF is in the same condition. Which is equally bad since it leaves linux without an unencumbered, vendor neutral, cross platform, filesystem (and most likely this is the reason why every linux user still sticks with FAT). And which is also sort of funny, since many people do backups on that. I wonder if this btrfs case may result in more attention from distributions at the need to invest in tools so that /all/ filesystems that are supported with R/W can be checked and in case something goes wrong some data recovery can be practiced.

UDF
Posted Nov 11, 2011 11:25 UTC (Fri) by eru (subscriber, #2753)
Can UDF really be used as a normal R/W FS on a) Linux, b) Windows?
I have only ever seen in on DVD:s, and I suspect OS'es might cheat and not implement UDF features not needed for that task.

UDF
Posted Nov 11, 2011 23:19 UTC (Fri) by cladisch (✭ supporter ✭, #50193)
Yes; it's essentially a 'normal' file system like, e.g., ext2.
> I have only ever seen in on DVD:s, and I suspect OS'es might cheat and not implement UDF features not needed for that task.
The Linux UDF driver defaulted to a 2048 byte sector size which would be wrong for other disk types; this was fixed two years ago. The userspace tool (mkudffs) still has the same bug; you need to remember to specify the sector size explicitly when formatting a HD or a USB stick.
Windows doesn't have this problem.
At that time, there were problems with interchanging data between OSes (IIRC new files created in Linux didn't always show up in Windows); I don't know if this is still the case.

JFFS2 also has no fsck
Posted Nov 13, 2011 22:35 UTC (Sun) by skierpage (guest, #70911)
It worked fine on my One Laptop Per Child laptop for years until it didn't, and there's no utility to repair it; neither Wikipedia nor its FAQ mention this absence. Fortunately (?) userspace has no idea of the carnage going on below it, so I could tar off my files despite all the "jffs2_get_inode_nodes: Eep. No valid nodes for ino #340448" syslog messages.
