
Whither btrfsck?

By Jonathan Corbet
October 11, 2011
The btrfs filesystem was merged into the mainline in January 2009 for the 2.6.29 kernel release. Since then, development on the filesystem has accelerated to the point that many consider it ready for production use and some distributions are considering using it by default. The filesystem itself is nearly functionally complete and increasingly stable, but there is still one big hole: there is no working filesystem checker for Btrfs. As user frustration over the lack of this essential utility grows, an interesting question arises: is some software too dangerous to be released early?

This tool (called "btrfsck") has been under development for some time, but, despite occasional hints to the contrary, it has never escaped from Chris Mason's laptop into the wild. This delay has had repercussions elsewhere; Fedora's plan to move to btrfs by default, for example, cannot go forward without a working filesystem checker. Most recently, Chris said that he hoped to be able to demonstrate the program at the upcoming LinuxCon Europe event. That, however, was not enough for some vocal users who have started to let it be known that their patience has run out. Thus we've seen accusations that Oracle really intends to keep btrfs as a private, proprietary tool and statements that "It's really time for Chris Mason to stop disgracing the open source community and tarnishing Oracle's name." Those are strong words directed at somebody who has done a lot to create a next-generation filesystem for Linux.

Your editor would like to be the first to say that both the open source community and Oracle benefit greatly from Chris's presence. The cynical might add that Oracle has delegated the task of "tarnishing its name" to employees who are more skilled in that area. That said, it is worth examining why btrfsck remains under wraps; had the tool been put out in the open - the way the filesystem itself was - chances are good that others would have helped with its development. One could argue that the failure to release btrfsck in any form has almost certainly retarded its development and, thus, the adoption of btrfs as a whole.

According to Chris, the early merging of btrfs was important for the creation of the filesystem's development community:

Keep in mind that btrfs was released and ran for a long time while intentionally crashing when we ran out of space. This was a really important part of our development because we attracted a huge number of contributors, and some very brave users.

But, he says, the filesystem checker ("fsck") is a bit different, and is not ready yet even for the braver users:

For fsck, even the stuff I have here does have a way to go before it is at the level of an e2fsck or xfs_repair. But I do want to make sure that I'm surprised by any bugs before I send it out, and that's just not the case today. The release has been delayed because I've alternated between a few different ways of repairing, and because I got distracted by some important features in the kernel.

Josef Bacik expressed the fears that keep btrfsck out of the community more clearly:

Fsck has the potential to make any users problems worse, and given the increasing number of people putting production systems on btrfs with no backups the idea of releasing a unpolished and not fully tested fsck into the world is terrifying, and would likely cause long term "I heard that file system's fsck tool eats babies" sort of reputation.

He went on to say "Release early and release often is nice for web browsers and desktop environments, it's not so nice with things that could result in data loss." This is a claim that raises some interesting questions, to say the least.

One could start by questioning the wisdom of running a new filesystem like btrfs in production with no backups and no working filesystem repair tool. How is it that releasing the filesystem itself is OK, but releasing the repair tool presents too much of a risk for users? How does that tool really differ from a web browser, especially given that the browser is exposed to all the net can throw at it and bugs can easily lead to exposure of users' credentials or the compromise of their systems? There is no shortage of software out there that can badly bite its users when things go wrong.

That said, there are some unique aspects to the development of filesystem repair tools. They are invoked when things have already gone wrong, so the usual rules of how the filesystem should be structured are out the window. They must perform deep surgery on the filesystem structure to recover from corruptions that may be hard to anticipate and correct; one could paraphrase Tolstoy and say that happy filesystems are all alike, but every corrupted filesystem is unhappy in its own way. As the checker tries to cope with a messed-up filesystem, it works in an environment where any change it makes could turn a broken-but-recoverable filesystem into one that is a total loss. In summary, btrfsck will not be an easy tool to write; it is a job that is almost certainly best left to developers with a lot of filesystem experience and who understand btrfs to its core. That narrows the development pool to a rather small and select group.

And, in the end, no responsible developer wants to release a tool which, in his or her opinion, could create misery for its users. Those users will run btrfsck on their filesystems regardless of any blood-curdling warnings that it may put up first; if it proceeds to destroy their data, they will not blame themselves for their loss. If Chris does not yet believe that he can responsibly release btrfsck for wider use, it is not really our place to second-guess his reasoning or to tell him that he should release it anyway. Anybody who feels they cannot trust him to make that decision probably should not be running the filesystem he designed to begin with.

Releasing software early and often is, in general, good practice for free software development; keeping code out of the public eye often does not benefit it in the long run. Perhaps btrfsck has been withheld for too long, but that is not our call to make. The need for the tool is clear - if nothing else, Oracle has decided to go with btrfs by default in the near future. There can be no doubt that this need is creating a fair amount of pressure. The LinuxCon demonstration may or may not happen, but btrfsck seems likely to make its much-delayed debut before too much longer.



Whither btrfsck?

Posted Oct 11, 2011 18:40 UTC (Tue) by maniax (subscriber, #4509) [Link]

I don't see the difference between a filesystem that has no repair tools and one with even bad ones. If your filesystem gets broken, short of the tool there's nothing else you can do anyway (and you can back up whatever is possible/readable). The problem is that there's this filesystem in the kernel which people are using, and it is not to be relied on.

And well, for example the reiserfsck for reiser3 was just this kind of tool - created more problems, had weird issues, even crashed sometimes, and we lived through it and it got properly fixed. So I don't buy this argument at all.

Whither btrfsck?

Posted Oct 11, 2011 19:06 UTC (Tue) by raven667 (subscriber, #5198) [Link]

I have to agree with you on this one and disagree with our intrepid editor. The best time to release the fsck tool would have been a couple of years ago, when the filesystem itself was in its eating-babies stage and both could have matured together. As it is, maybe it would be better to reduce the scope of the tool: instead of trying to comprehensively detect and fix any possible type of corruption, only fix what can be done robustly. You could add in new checks and fixes bit by bit depending on what kinds of corruption are run into in production use.

Hmm, brainstorming for a minute, how about taking another tack with the fsck tool entirely: have it leave the corrupt filesystem unmodified and write entirely new metadata to an external block device that can be used to recover data in extreme situations. Better that than nothing.

Whither btrfsck?

Posted Oct 11, 2011 19:56 UTC (Tue) by DiegoCG (subscriber, #9198) [Link]

I'm more worried about the lack of updates to the rest of the btrfs tools. Scrubbing, per file/directory COW and compression and read-only snapshots are available in the mainline fs code, but the corresponding changes to the btrfs tool have not been merged...

Whither btrfsck?

Posted Oct 11, 2011 19:57 UTC (Tue) by DiegoCG (subscriber, #9198) [Link]

(oops, I replied to the wrong message)

Whither btrfsck?

Posted Oct 11, 2011 22:37 UTC (Tue) by terryburton (subscriber, #26261) [Link]

On more than one occasion I have performed recoveries on corrupt filesystems of multi-user *nix boxes. (Think of Computer Science students keen to explore the latest kernel exploits against a teaching/research box...)

In severe circumstances the safest approach is usually to image the damaged filesystem to a different host, restore the live system from a trusted backup, let the users back in, then perform the actual data recovery offline (fsck + debugfs) - restoring only the recovered content that users actually request or that is fully trusted (e.g. file checksums in btrfs). This balances the need for access to fresh data against the risk of rogue data appearing in other users' files, or latent corruption remaining in the recovered filesystem that may result in future data loss or crashes.

For the more routine, periodic checks that should be performed online (check interval expired, mount count exceeded, etc.) you had better trust your tools.

Whither btrfsck?

Posted Oct 11, 2011 19:00 UTC (Tue) by Lovechild (subscriber, #3592) [Link]

Perhaps a nice compromise would be to make the current source available in a git tree without making a release, and to make it clear to distributions that it is not in a releasable state. That would hopefully keep it out of the hands of users while making the source available to the wider btrfs development community for review and early testing.

Whither btrfsck?

Posted Oct 11, 2011 20:46 UTC (Tue) by drag (subscriber, #31333) [Link]

Maybe Chris can send out invites to people he knows that can help him, and then if they accept his terms for keeping everything relatively out of the hands of end users then they can join him. Then those people can request the same for their groups of testers.

Even if 'pre-prod' code does slip out into the wild it's not going to be damaging as long as the distributions don't pick up on it.

Whither btrfsck?

Posted Oct 12, 2011 20:22 UTC (Wed) by rahvin (subscriber, #16953) [Link]

Terms? On GPL software?

I don't agree with any of you. I know people will complain regardless of how scary the warning is, but you simply tell them to go pound sand and that they were sufficiently warned before they ran the tool of the danger. That's how I've always viewed it and when my FS was fried and I got that scary warning I knew I was on my own. After all, that's what real offline backups are for.

I don't understand how fsck can keep up with the filesystem when one is integrated into the kernel and advancing independently of the other. It just seems like no matter what you do, if you don't involve the wider development community you will never catch up.

I'm not a dev so my opinion doesn't matter, but I think this is one of those things that should be out there for everyone to comment on and help with. That doesn't mean he has to take all the suggestions, as Chris could act just like Linus and lay down ground rules and play the benevolent dictator. I just don't see how he can essentially work alone (or even in a small team) on a tool that services a FS that's in wide use and development.

Let the distributions decide when it's ready to be compiled and included. Anything else seems to be favoring a single company. What's happening now would be like Linus declaring that no one can use Kernel 3.X until he decides they can.

"Keeping up" with btrfs.

Posted Oct 13, 2011 14:17 UTC (Thu) by kena (guest, #2735) [Link]

btrfsck doesn't -- unless I miss my mark -- need to "keep up." The on-disk format has been finalized for some time, and *that* is what btrfsck needs to be aiming at. The heuristics and algorithms of *how* btrfs, say, optimizes writes really don't much matter.

Whither btrfsck?

Posted Oct 12, 2011 14:08 UTC (Wed) by kragilkragil2 (guest, #76172) [Link]

Couldn't he just release it under some source-only license or something that prohibits distribution in binary or whatever? (doesn't have to be OSI)

Sorry, but I really don't get why this thing is not used/tested by more people. Smells horribly like some new Oracle evil to me.

Whither btrfsck?

Posted Oct 11, 2011 19:11 UTC (Tue) by fuhchee (subscriber, #40059) [Link]

"Oracle has decided to go with btrfs by default in the near future"

Jon, can you give a link to more information about this?

Oracle going to btrfs

Posted Oct 11, 2011 19:14 UTC (Tue) by corbet (editor, #1) [Link]

All I know I learned from this message from Chris.

Whither btrfsck?

Posted Oct 11, 2011 19:15 UTC (Tue) by Lovechild (subscriber, #3592) [Link]

Chris Mason stated so in the latest btrfs.fsck update thread on the btrfs mailing list.

Whither btrfsck?

Posted Oct 11, 2011 19:29 UTC (Tue) by masoncl (subscriber, #47138) [Link]

This is true, Oracle is greatly expanding its use of btrfs in QA and will soon add it as a fully supported option. We will definitely have a working btrfsck before we recommend btrfs use in production.

Btrfs still won't be a good choice for database use, but thankfully the database has a few other ways to get rows and columns down to the storage.

It's worth pointing out that I have a number of slides featuring btrfsck for my LinuxCon Europe presentation and that Josef Bacik moved forward with a recovery tool that can copy data out of a damaged filesystem.

We're making progress, I'm just slower than I wanted to be.

Whither btrfsck?

Posted Oct 12, 2011 19:44 UTC (Wed) by Baylink (subscriber, #755) [Link]

We know, and you're not billing us for the time, and that's fine.

The question, Chris, isn't "should btrfs be in production use by now", or even "should btrfs be in production use while it doesn't have a working fsck in the wild".

It's "should people too stupid to breathe be protected from themselves by delaying the release of a tool that might make a bad situation worse"?

While I don't think Libertarianism is a workable political theory on the national scale, growing up on Heinlein has made me lean towards it as a personal philosophy, and I'm inclined towards the arguments made above: in short, that with enough eyeballs, all bugs are shallow.

And that anyone dumb enough to trust a development-release filesystem to unbacked up real data of any sort deserves exactly what they get.

The real question is: what percentage of such users will *really* blame the developer -- especially if they had to, for example, go out and get the FS code, instead of just choosing it off an installer screen?

I suspect there are people who think it's higher than I do, but I don't have data.

In any event, we look forward to your patch. :-)

Whither btrfsck?

Posted Oct 12, 2011 23:37 UTC (Wed) by dlang (✭ supporter ✭, #313) [Link]

where things get ugly is when distros jump in and make development/experimental releases/features standard.

just look at the horrible reputation that KDE4 got from distros including it too early as an example.

Whither btrfsck?

Posted Oct 12, 2011 23:39 UTC (Wed) by Baylink (subscriber, #755) [Link]

An excellent example, because SUSE 11 did that to me, and I won't touch KDE4 now with a 39-1/2 foot pole.

Perhaps it's actually usable now, but it certainly wasn't, then...

Whither btrfsck?

Posted Oct 13, 2011 6:48 UTC (Thu) by niner (subscriber, #26151) [Link]

It is. The desktop and most applications are in my experience more stable and powerful than 3.5.x was. Only kontact is still having problems (running kmail 2 with akonadi), but they always had. It's just different problems now and hopefully in the new codebase they are actually gonna be fixed.

Whither btrfsck?

Posted Oct 13, 2011 9:09 UTC (Thu) by malor (subscriber, #2973) [Link]

Well, but to be fair, they released it as a .0 version, which has a very specific meaning in the computer world (this is ready for you to use now!), and they did it explicitly to fool people into testing it... there are posts to that effect, that they knew putting a .0 on it would get more testers involved. Then they put not ready for production use down in the mass of announcement text. When people complained that the code was pretty crap, they shrugged and pointed at their escape clause.

They broke the implied promise of end-user readiness, so I'd say the KDE team fully earned their loss of reputation. All they had to do was release it as 4.0 Alpha, and they'd have been mostly okay. Everyone knows code takes time to stabilize.

Likewise, I think if btrfsck is released as 0.1 alpha, and it warns you that it sucks when you run it, then even if it does eat some babies, it won't be any big deal. And it MIGHT develop faster, although as corbet says, there may not yet be enough expertise in btrfs for anyone but Chris to be very useful working on it.

Whither btrfsck?

Posted Oct 13, 2011 10:14 UTC (Thu) by sorpigal (subscriber, #36106) [Link]

The primary benefit at this point of releasing the code seems to be "building trust" and not "speeding development." Maybe it gets debugged and improved faster, maybe not, but at least accusations of chicanery can be eliminated.

Whither btrfsck?

Posted Oct 11, 2011 20:02 UTC (Tue) by drdabbles (subscriber, #48755) [Link]

While BTRFS is in the kernel, it isn't considered "stable" by any sane system admin. team. The only people putting it on their systems are individual users with individual machines that could potentially lose all data without harm. Anybody else should expect nothing less than having a baby eaten.

Having said that, I've run BTRFS in several different scenarios with varying levels of success. In the early days, it was easy to run a balance that would never end. Those situations would lead to data being lost, and that was to be expected. Most recently, I run it on my personal laptop with a slow HDD in it - I find the compression option to be a welcome speedup for the sub-par performing media.

So, who's at fault for this outcry for a fsck tool? Did the distros include BTRFS too soon? Did the BTRFS guys market their filesystem too well? Or are we all just a group of tinkerers who would have found BTRFS anyway and run it on everything we could... and damn the consequences? Personally, I think it's mostly the latter. This exact pattern has happened several times before with non-ext* filesystems. As mentioned already, reiser was the most obvious example.

Whither btrfsck?

Posted Oct 11, 2011 20:59 UTC (Tue) by jcm (subscriber, #18262) [Link]

I might be old fashioned... :)

I learned a long time ago that you want your data on tried and tested technology and filesystems. Years ago when I was first getting into doing sysadmin stuff part time in college I thought it would be fun to run upstream kernels on a production system built on reiserfs. Yea, those were the days. Then I grew up. Now, no matter how hard you convinced me, I wouldn't choose to use btrfs for data I cared about (and I wouldn't run random kernels either). It's nothing to do with the developers (who I'm sure are doing great work) and everything to do with the same reason I haven't had Lasik yet. I'm going to let many other people beta test and learn their own lessons from the experience, so I don't have to do that all over again.

But also, nobody will listen to things like the above. So yes, some stuff is too dangerous to be released. Someone will install and use it no matter how many warnings you put on it. And when they do, they'll forever hate you when it behaves exactly as you said it would. Like how I would never use reiserfs again no matter how good it claims to be. It ate my data once, so it will never be getting a second chance.

Jon.

Whither btrfsck?

Posted Oct 11, 2011 21:23 UTC (Tue) by fuhchee (subscriber, #40059) [Link]

"It ate my data once, so it will never be getting a second chance."

Do you think it is practical to have that strict-sounding an attitude with more mainstream parts of linux? Ever had a data-corruptor kernel bug? Or a crash with a corrupted filesystem? Or a hard drive model fail?

Whither btrfsck?

Posted Oct 12, 2011 0:13 UTC (Wed) by PaulWay (✭ supporter ✭, #45600) [Link]

The problem is that people do have this attitude. I work with developers who don't like Red Hat because of what they did in 1999. I work with people who prefer Debian because of "RPM Hell". I've worked with people who hate Perl because of the quality of code written in Perl 3. I've worked with people who have sworn never to use a Seagate disk because of that great problem they had back in 2003 or whatever.

To repurpose http://xkcd.com/242/, maybe scientists can afford the time to see if the same problem happens again later. But sysadmins and programmers tend to be under pressure to produce results, and lots of them can't afford the time to retest their assumptions later. The better ones I've seen test new things out or test their assumptions out at home or on test systems in spare time. The cowboys test them on live systems in production and then wonder why everyone gets upset.

I agree we should all be able to test out btrfsck and have the code in the open so that people can contribute ideas about those corner cases and weird fuzzy problems that Chris might not have thought about. But if there are already people complaining that btrfs is eating their data, then releasing btrfsck early and buggy is only going to hurt its reputation. That's why I support Chris's decision to hold onto it for now. I'm sure he's trying to do the right thing but it's his decision to make.

Have fun,

Paul

Whither btrfsck?

Posted Oct 12, 2011 19:46 UTC (Wed) by Baylink (subscriber, #755) [Link]

Paul: that argument seems like it would support Chris *not releasing the FS code*. But as the article makes clear, that wasn't the case. It's *only* the fsck that's not in the wild.

Whither btrfsck?

Posted Oct 12, 2011 0:52 UTC (Wed) by jcm (subscriber, #18262) [Link]

As Paul points out, people have this attitude (and I'm one of them sometimes). I'm a scientist and I'm all for zapping myself with magic lightning machines (see the xkcd reference cited above) on my own dime and at home, but if I were a sysadmin again I wouldn't be doing that with other people's valuable data. I'd be saying "gee, reiserfs bit me once so now it has to go out of its way to prove that I should use it over something I know is just going to work". This is also why in the real world people are still running OS releases from several years ago, and why they want them supported for the next million years. Because it works and that's all they want.

Whither btrfsck?

Posted Oct 12, 2011 8:23 UTC (Wed) by arnd (subscriber, #8866) [Link]

I think that's all fine, as long as we have enough people zapping themselves with btrfs in their homes[1]. A problem that I frequently heard cited with mainframe OSs is that every customer buys the latest release the moment it comes out and then waits for six to twelve months before installing it since they know it is less stable than what they want.
This makes absolute sense from an individual customer's point of view, but is extremely counterproductive on a global scale since many bugs that matter in real life only get found in real-life conditions.
At this point I would like to thank everybody who is running the latest kernel or distro snapshots and writes bug reports about them.

[1] Me included, will gladly donate broken root fs image to btrfsck testing now that I copied all the readable data to a new partition.

Whither btrfsck?

Posted Oct 12, 2011 19:47 UTC (Wed) by Baylink (subscriber, #755) [Link]

I Am Not A VM Guy... but what I know of them suggests that

> A problem that I frequently heard cited with mainframe OSs is that every customer buys the latest release the moment it comes out and then waits for six to twelve months before installing it since they know it is less stable than what they want.

isn't actually true: I would assume they load the new OS in a VM or LPAR, and dump test loads on it to see if it works, before promoting it to production.

Hardware-level virtualization makes that stuff easy...

Whither btrfsck?

Posted Oct 12, 2011 20:19 UTC (Wed) by arnd (subscriber, #8866) [Link]

You're right, I oversimplified. What tends to happen is that people wait half a year after the release, then install the new OS into a test partition and test for another six months before putting the system into production.
The times are obviously different depending on how paranoid the admin is and on the type of workload.

However, there are still good reasons for waiting: The bugs you encounter while running the new software on your test partition are often different from the bugs that other people encounter while running the same software in production. If you care a lot about stability, you probably want to have both kinds of problems solved before you get to the point of no return.

Whither btrfsck?

Posted Oct 13, 2011 13:35 UTC (Thu) by pspinler (subscriber, #2922) [Link]

Speaking as a VM guy, you're mostly right, except that we do our new releases in a 2nd level virtual.

VM is pretty cool, and it makes a neat platform to run linux on. :-)

-- Pat

Whither btrfsck?

Posted Oct 13, 2011 15:06 UTC (Thu) by Baylink (subscriber, #755) [Link]

And that's a conversation I'd like to have with Someone Who Knows, off line, in much greater detail, Pat... :-)

Whither btrfsck?

Posted Oct 11, 2011 22:21 UTC (Tue) by SEJeff (subscriber, #51588) [Link]

<sarcasm>
LASIK was approved by the FDA in 1995. Do you run RHEL 2.1 (released Oct 1995) in production because you're letting other people beta test this new stuff like RHEL4?
</sarcasm>

Even though many people see Oracle's Unbreakable Linux as CentOS with a nice kernel, that doesn't mean Oracle isn't doing some absolutely bleeding edge Linux work. For them to commit to officially supporting btrfs in their OEL builds means that they'll commit to supporting it in real production environments. That does say an awful lot about it coming from the company who employs the lead developer of said filesystem.

Give it a year or two of solid beating and it will be good to go. My personal take is that if ext4 gets stable snapshotting support, it will be good enough for most people.

Whither btrfsck?

Posted Oct 11, 2011 22:33 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

and how exactly did the FDA test what happens to people's eyes 40 years after the LASIK surgery? how do you really know that you aren't trading the inconvenience of wearing glasses for some years of blindness later?

being wary of LASIK for something irreplaceable like your eyes is not completely unreasonable.

Whither btrfsck?

Posted Oct 11, 2011 22:38 UTC (Tue) by corbet (editor, #1) [Link]

What?? You mean you didn't back up your eyes before letting the Lasik people hack on them?!? I bet you didn't wait for the repair tool to be ready either.

Actually, for some of us, being able to restore our eyes to a 40-year-old backup starts to sound pretty nice. That said, I'm not sure how apt the comparison really is...

Whither btrfsck?

Posted Oct 11, 2011 22:49 UTC (Tue) by SEJeff (subscriber, #51588) [Link]

Not being an ophthalmologist, I am the wrong person to ask. However, since Dr Caster (the guy who wrote *the* book[1] on LASIK and performed my procedure) had the surgery himself, chances are the effects are understood well enough. LASIK is a physical procedure, not something like a medicine which may cause kidney damage 40 years down the road. It either works, or it doesn't.

However, this is way off topic and I was only using that reference in jest. The parent commenter has good reasons for his opinion, as some people are more risk averse than others. This is LWN, not /.; let's talk more about the tech.

[1] http://www.amazon.com/Lasik-Miracle-Complete-Better-Visio...

Whither btrfsck?

Posted Oct 12, 2011 2:01 UTC (Wed) by jg (subscriber, #17537) [Link]

I sat next to an ophthalmologist who specialized in LASIK one trip; it turns out the basic operation of using a microkeratome goes back a very long way, so they had very high confidence about the very long term consequences of opening up the eye, since it had been done for many decades for other reasons. The only twist was adding the laser to precisely control the material removed.

Whither btrfsck?

Posted Oct 12, 2011 22:18 UTC (Wed) by rahvin (subscriber, #16953) [Link]

PRK and its predecessors were in wide use in the early 80's, and it was developed far earlier. What they did and what Lasik does are only different in the tool used to make the cuts. The stuff in the early 80's was done by hand with scalpels in a hospital; today it's done in an outpatient surgery through the lens with computer-controlled lasers, using layouts developed mostly by hand in the 80's. The most recent versions of Lasik can now map and create custom modifications of the standard cut patterns.

But basically the procedure's been in use for more than 30 years. Considering the procedure isn't even recommended until your 20s and myopia sets in at 40, the effects of Lasik on aging eyes are well understood at this point. If you are avoiding Lasik because you think its long-term effects aren't known, you really need to look into it more and understand the history.

My wife went from about 20/400 with an astigmatism (before surgery it got worse every year) to 20/15 and hasn't had a single problem (other than the night halos, which are guaranteed). In her opinion it was the best 3 grand we ever spent. She can even see better than me, and her vision is fixed by the scars; other than age-related myopia her vision will never change.

OT: myopia vs. presbyopia

Posted Dec 6, 2011 4:54 UTC (Tue) by Duncan (guest, #6647) [Link]

Myopia (near-sightedness) setting in in one's forties? Umm... Not commonly.

More like presbyopia, age related loss of focus/accommodation ability for near work, reading and the like, typically noticed first in one's 40s reading small print in dim light, and generally thought to be related to loss of crystalline lens elasticity.

There's also hyperopia aka farsightedness. Presbyopia is often called farsightedness as well, tho presbyopia is a special term for the age related loss of lens flexibility and thus near focus.

I'm highly myopic (-11-ish diopters, too much so for good lasik results) with astigmatism (irregular or toric curvature of the cornea or lens; think American football shaped). Hard (gas-permeable) contacts correct more accurately for the astigmatism and give me a much wider field of view without the peripheral distortion of glasses at the required corrective values. I'm also in my mid-40s and need reading glasses to reduce the contact correction (calibrated for distance vision) by a couple diopters (so to -9-ish, still highly myopic), thus have personal experience with astigmatic myopia from early childhood, and presbyopia for a half decade or so. Thus my personal knowledge of the subject.

Wikipedia and google of course have more.

Duncan

Whither btrfsck?

Posted Oct 12, 2011 23:05 UTC (Wed) by gdiffey (subscriber, #65017) [Link]

and you can now do it yourself at home!

http://www.lasikathome.com/

Whither btrfsck?

Posted Oct 13, 2011 8:02 UTC (Thu) by hickinbottoms (subscriber, #14798) [Link]

Off topic, but I particularly enjoyed how the guy holding the device on that website is still wearing glasses. Nice touch.

Stuart

Whither btrfsck?

Posted Oct 11, 2011 21:03 UTC (Tue) by josh (subscriber, #17465) [Link]

I really don't understand why people consider fsck a part of any production environment. I'd expect a filesystem to detect corruption, avoid crashing, and attempt to continue to allow access to uncorrupted data. But for the corrupted data itself, I don't see anything wrong with saying "restore from backup if that happens", rather than "hope that fsck fixes it".

Whither btrfsck?

Posted Oct 11, 2011 21:23 UTC (Tue) by drdabbles (subscriber, #48755) [Link]

I think there are varying levels of corruption and varying levels of acceptability of recovering corrupted data. For instance, recovering partial documents is better than recovering nothing. But, recovering data that isn't obviously corrupted to the human eye (say, scientific data at the LHC) could be disastrous.

But to your point, I agree completely. Having backups and recovering from them is almost always best.

Whither btrfsck?

Posted Oct 11, 2011 22:08 UTC (Tue) by terryburton (subscriber, #26261) [Link]

Restoring from a backup may save you from the dole queue. Recovering the data to a point immediately prior to the failure... priceless.

Whither btrfsck?

Posted Oct 12, 2011 6:27 UTC (Wed) by njs (guest, #40338) [Link]

In most situations I find I actually prefer "here's an accurate snapshot, it's 12 hours old" to "here's an up-to-the-minute copy of the data recovered as best we could, maybe your files are all correct and maybe some are corrupted, who knows?".

Which isn't to say that fsck doesn't have its uses -- most people don't have proper backups in the first place, and after things go pear-shaped, fsck is your last hope. And it can be faster than restoring from backup. And perhaps btrfsck will be better than other fsck's, in that it could potentially use btrfs's hashes to give you a list of which files might be corrupted, so you can check them (or recover just them from a backup).

I'm just saying, fsck is nice to have, but it's not *that* critical.

Whither btrfsck?

Posted Oct 12, 2011 8:52 UTC (Wed) by rvfh (subscriber, #31018) [Link]

> In most situations I find I actually prefer "here's an accurate snapshot, it's 12 hours old" to "here's an up-to-the-minute copy of the data recovered as best we could, maybe your files are all correct and maybe some are corrupted, who knows?".

That's the whole point, isn't it? If the fsck is not able to restore the data and gives you crap instead, then it's useless, and that's why Chris does not want it in the wild!

We need fsck, and we need it to work correctly.

Whither btrfsck?

Posted Oct 12, 2011 18:43 UTC (Wed) by njs (guest, #40338) [Link]

> If the fsck is not able to restore the data and gives you crap instead, then it's useless

But, IIUC, this is exactly what existing fsck's do. If they can recover data, great, but their main priority is to make the filesystem data structures internally consistent again. If that means randomly making up data that's missing, throwing away some files whose metadata got confused, etc., then oh well, too bad.

Hopefully btrfsck will have a mode where it uses the checksumming information to give you a guaranteed-accurate list of which files were left in an inconsistent state with potentially screwed up contents, but if so then that will be a fancy unique feature that has never been seen in a mainstream Linux fsck before. (And does btrfs even keep data checksums by default?)

I can understand why Chris doesn't want to release a known-buggy fsck, but let's be realistic about what a bug-free fsck actually does...

Whither btrfsck?

Posted Oct 12, 2011 15:20 UTC (Wed) by edt (subscriber, #842) [Link]

One point. With btrfs setup to use a mirror it should be possible to tell exactly which files, if any, are corrupt post fsck. All those checksums can be really handy...

Whither btrfsck?

Posted Oct 14, 2011 3:01 UTC (Fri) by zlynx (subscriber, #2285) [Link]

That would fix a problem caused by a hardware error, but it wouldn't fix a disk structure problem caused by a logic bug in the btrfs driver itself.

Whither btrfsck?

Posted Oct 11, 2011 21:35 UTC (Tue) by jimparis (subscriber, #38647) [Link]

For me, it's more the attitude than anything else. The strong focus on a good fsck for ext2/3/4 tells me that the developers are highly concerned with my data. They've spent a lot of time and effort writing a high-quality tool to try to recover from code mistakes or hardware errors, and it's been a part of the design from the start, not an afterthought. I suspect that the experience has also taught them the types of corruption that hard drives and filesystems are likely to experience, and made the fs code more robust as a result.

Whither btrfsck?

Posted Oct 18, 2011 21:13 UTC (Tue) by ableal (guest, #57174) [Link]

Well, just today I had a disk with a couple of bad sectors, according to the SMART info, which would give me read errors but which I could not tickle with writes to force remapping. Not with the GUI 'disk utility', anyway.

So, I got the data off, reformatted, and let loose a CLI 'fsck -c -c' on its ext2/3/4 ass. This, obscurely enough, invokes 'badblocks' with read-write test (you have to read carefully the e2fsck man page, not just 'man fsck').

After that, the SMART data claims the disk is now fine, with no pending remapping of bad sectors. But, somehow, I get the feeling there's something uncanny about this rigmarole, not totally unlike burning the feathers of a black chicken at midnight of new moon ...
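For the curious, the double "-c" behaviour described above is easy to try safely on a throwaway image file rather than a real disk. This is just a sketch, assuming e2fsprogs is installed; the /tmp/scratch.img path is purely illustrative:

```shell
# Build a small ext2 image as a disposable stand-in for a real disk.
dd if=/dev/zero of=/tmp/scratch.img bs=1M count=8 2>/dev/null
mke2fs -F -q /tmp/scratch.img

# One -c: e2fsck invokes badblocks(8) in read-only scan mode.
e2fsck -f -c /tmp/scratch.img

# Two -c: e2fsck invokes "badblocks -n", a non-destructive
# read-write test that writes a pattern to each block and then
# restores the original contents.
e2fsck -f -c -c /tmp/scratch.img
```

Any blocks badblocks finds are recorded in the filesystem's bad block inode, which is why the man page warning lives under e2fsck rather than fsck.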

Whither btrfsck?

Posted Oct 20, 2011 6:03 UTC (Thu) by eduperez (guest, #11232) [Link]

Next time, try to zero-write those bad blocks with "hdparm"; it worked for me.

Whither btrfsck?

Posted Oct 11, 2011 21:48 UTC (Tue) by jmorris42 (subscriber, #2203) [Link]

Sorry, I'm not understanding this argument. The code should be out there. If it isn't believed to be safe, the distributions shouldn't package it, but people who want to try to understand it should be able to contribute. It couldn't hurt and might help. Not seeing the downside.

And perhaps, had the fsck tool been developed in the open alongside the filesystem itself, people developing the filesystem might have been motivated to add to the checking tool when adding a filesystem feature. And if they couldn't figure out how to fsck that feature, they might have rethought its implementation to include sufficient information to allow recovery in as many situations as they could plan for. Filesystem checking isn't something to worry about after the filesystem's on-disk format is set in stone.

If someone is desperate enough to go pull a git repo, build a tool with UNSTABLE written all over it and run it, that they get to keep the pieces is something they probably understand. It would be a desperation move anyway, things would already be terribly wrong so why not let them give it a go? At least you might get a bug report out of it. What is the answer right now? Too bad, so sad, time to reformat? Even a buggy fsck tool beats that answer.

Whither btrfsck?

Posted Oct 11, 2011 22:04 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

I also don't buy this argument.

how is it better to leave people with a corrupted filesystem and no tool that can recover it than with a corrupted filesystem and a tool that may fix the problem or may corrupt it further?

in the first case they will always have a corrupted filesystem, in the second case they may have a corrupted filesystem, or they may be able to recover it.

It's not as if people who find their system is corrupted are going to keep the corrupted image around for years until a fsck program is available (and for that matter, without something to check the filesystem, how will they even know that it's corrupted?)

Whither btrfsck?

Posted Oct 12, 2011 0:32 UTC (Wed) by lordsutch (guest, #53) [Link]

"for that matter, without something to check the filesystem, how will they even know that it's corrupted?"

More than likely the same way as with any other filesystem: kernel messages when file accesses are attempted. I don't know the default settings in mke2fs these days, but even if they're relatively conservative (on the order of every 10-20 mounts/30 days) a full filesystem check is a rare event on most Linux boxes.

Not to say that btrfsck isn't needed, but most of the time you're relying on what the kernel filesystem code is doing to maintain integrity no matter whether fsck is available or not.

Whither btrfsck?

Posted Oct 12, 2011 4:04 UTC (Wed) by dlang (✭ supporter ✭, #313) [Link]

some forms of corruption will cause kernel messages to pop up, other forms of corruption will be silent and can gradually destroy other files the longer you use the filesystem.

What does the ck mean?

Posted Oct 12, 2011 2:42 UTC (Wed) by ncm (subscriber, #165) [Link]

This last line seems to me the key to the whole problem.

You can't (further) corrupt a file system if you don't write to it. There's a great deal of checking that you can only afford to do when some program isn't waiting for read() to come back. On btrfs, if it's designed right, you should be able to run a consistency checker in the background on live file systems.

It's no fun to be informed that your file system is corrupt and, further, that it can't be fixed, but that's much better than *not* being informed that your file system is corrupt, when it is. The sooner you find out, the fewer backups will also be corrupt. A tool that can only constrain the locus of the corruption would still be helpful; only the faulty part needs to be reloaded from backups.

A widely used checker would result in better bug reports for the file system proper, as corruption is found early. How many bugs are still waiting to be found just because nothing is looking?

The way forward, then, is to release a pure checker first, and then begin to release repair capabilities one at a time as they become ready. If the repair tool generated a journal of changes without writing them to the file system proper, then you could run a full check on the sum of fs+journal, and only commit the changes if the result is clearly better than before. Ideally the repair machinery would actually be the same well-tested code that, in production, integrates more usual changes into the file system.

What does the ck mean?

Posted Oct 13, 2011 23:42 UTC (Thu) by NRArnot (subscriber, #3033) [Link]

Exactly what I was thinking. A slightly corrupt filesystem is likely to become a seriously corrupt filesystem and then a heap of completely useless bytes, if no-one gets to know about the corruption while it is too minor to be screamingly obvious.

Especially with a new filesystem like btrfs it would make sense to combine backup with fsck: do a backup, run btrfsck, and if the filesystem structure checks out OK, allow it to continue being used and recycle the oldest of your backups.

Whither btrfsck?

Posted Oct 12, 2011 1:45 UTC (Wed) by cmccabe (guest, #60281) [Link]

In a lot of ways, filesystem development turns the normal rules of software development on their head.

Normally, you want to push something out the door as early as possible so you can gain more users, more developers, and more features than the competition. This is usually phrased as "release early, release often." Releasing often is also generally thought to make you more responsive to what users want ("agile").

When you're developing a filesystem, you want the wider community to deploy your changes infrequently and only after a lot of testing. You only need to lose someone's data once to lose that user forever-- and probably a lot of his friends. It's much, much more important to have something rock solid than it is to have a lot of features. Filesystems have a fairly well-defined role to play, and user feedback is usually limited to letting you know when you've screwed up, rather than suggesting new features.

I hope btrfsck gets out the door soon! I'd like to see how distributions make use of the new btrfs subvolumes and other features.

Whither btrfsck?

Posted Oct 12, 2011 12:32 UTC (Wed) by iq-0 (subscriber, #36655) [Link]

No, it doesn't. But apparently "putting the code out there" seems to be equivalent to "release a program" in this discussion.

Personally I think the same thing holds for filesystems as for any other major piece of software: release early and release often. And don't simply say it's experimental; tell people in no uncertain terms that it will eat their data and that they should run it on a copy of the actual filesystem they want to restore. You also want to explicitly flag the release as alpha/experimental/eat-my-data and communicate that, so distributions won't pick it up under the assumption that something is better than nothing.

But irrespective of that: putting the code out there doesn't make it a release!

Sure, people will see it as one when you encourage them to try it (which would constitute an implicit release). But when you explicitly post stuff (whether as a set of patches or a reference to some repository), add the eat-my-data warnings, and explicitly state your intent in posting it, then shame on anyone who gets burned by it.

And if your intent is to protect people from themselves, then I'd say that purging your beta-quality filesystem with a pre-alpha-quality auto-repair tool would be the place to start ;-)

Whither btrfsck?

Posted Oct 12, 2011 21:02 UTC (Wed) by cmccabe (guest, #60281) [Link]

I think ideally btrfsck would have been developed alongside the filesystem and at the same level of stability, so that this dilemma wouldn't have come up. We don't live in an ideal world, though, so I understand where Chris is coming from. I guess releasing a read-only fsck might potentially be a good intermediate step.

Checking only

Posted Oct 12, 2011 12:20 UTC (Wed) by epa (subscriber, #39769) [Link]

The primary purpose of fsck ('filesystem check') was to check the filesystem for consistency. Repairing it is an additional feature, inherently dangerous but sometimes a lifesaver when you don't have recent backups.

Why not release an fsck tool that checks only, without the repair functionality?

Coming up with dodgy heuristics to guess what a broken filesystem should look like is the sort of task ideally suited to community or bazaar development. It is then the user's choice to only use repair rules officially blessed by the btrfs maintainer (conservatively limiting the possible errors that can be repaired) or to throw caution to the winds and try anything.

Checking only

Posted Oct 12, 2011 21:01 UTC (Wed) by drag (subscriber, #31333) [Link]

I think that btrfs already has a fsck to check for consistency.

Also judging from the developer's comments it's not unknown bugs that bother him, but that there are data destroying bugs he knows about but hasn't fixed yet.

Although I agree that he should just get it out there for other interested parties to help him out.

Checking only

Posted Oct 21, 2011 3:56 UTC (Fri) by jd (guest, #26381) [Link]

Then clone him a few hundred times. What's the problem?

Checking only

Posted Oct 13, 2011 4:04 UTC (Thu) by Kissaki (subscriber, #61848) [Link]

I agree wholeheartedly. I would even prefer it if all fsck tools defaulted to read-only (or "check" only) with no command-line options, and required an explicit command-line option before making changes to the data on disk.

The argument against releasing the fsck tool seems to be that the tool they have may take a filesystem that has detectable errors and add more detectable errors. I thought that went without saying. To go further, I don't see how you can produce a fsck tool that writes to disk and can promise otherwise.

fsck is for regular use, after N many mounts, after M many days, just to make sure that most of the pieces are there and they have generally the right color. Repairing filesystems is its hobby, not its job.
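The check-only default asked for above can at least be approximated today with the ext tools' explicit flags. A minimal sketch on a throwaway image file rather than a real device (the /tmp/demo.img path is illustrative; assumes e2fsprogs):

```shell
IMG=/tmp/demo.img

# Make a small ext2 filesystem image to poke at.
dd if=/dev/zero of="$IMG" bs=1M count=8 2>/dev/null
mke2fs -F -q "$IMG"

# -n opens the filesystem read-only and answers "no" to every
# repair prompt, so nothing on "disk" can change; -f forces a
# full check even if the superblock is marked clean.
e2fsck -f -n "$IMG"

# Writes have to be requested explicitly:
#   e2fsck -p "$IMG"   # "preen": fix only problems considered safe
#   e2fsck -y "$IMG"   # answer yes to everything -- the risky mode
```

The catch is that none of these flags change the default; a bare "e2fsck /dev/..." will still offer to write.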

What kind of fsck?

Posted Oct 13, 2011 7:17 UTC (Thu) by cwillu (subscriber, #67268) [Link]

Part of the problem is that there's a whole variety of tools commonly referred to as "fsck". btrfs does have several of them publicly available, and has for some time:

* A readonly btrfsck which serves as a diagnostic tool, and has been distributed along with mkfs.btrfs and company for at least 2 years now.
* Online scrub in the kernel module to check all data and metadata against the checksums, which is also able to correct the bad data from redundant copies if available, and attempt to do the same from repeated reads if not. This has been around for about a year (9 months?).
* An extraction tool to recover files from an unmountable or otherwise damaged filesystem to other media, although it has a somewhat limited scope in what it can recover. This has been available in various forms for 6 months or so.

What we're waiting for is the tool that dives into the guts of a broken filesystem to make it mountable, without requiring independent storage or deep understanding from its user. This is the tool that can easily destroy what almost certainly would have been recoverable data (via the offline extraction tool, for instance), and yet it is also the tool that distros stick in the boot process on failure, circumspectly warning of the dangers and requiring the user to take manual action; at which point the distros will helpfully dump the user at a shell prompt with a convenient "To proceed, run: fsck.btrfs -A --NO_DONT_THIS_WILL_EAT_YOUR_DATA" banner.

Incidentally, this confusion is why zfs gets away with the "no fsck required!" marketing: they have the full set of tools, they just don't call any of them "zfsck".

What kind of fsck?

Posted Oct 27, 2011 19:15 UTC (Thu) by oak (guest, #2786) [Link]

To proceed, type:
"Yes, I have taken a backup of this btrfs file system"
>
To proceed, type:
"Yes, I have verified that the backup data was saved correctly"
>
To proceed, type:
"Yes, I want to destroy my data so that this partition can be mounted"
>

Timbre of discussion.

Posted Oct 13, 2011 16:36 UTC (Thu) by kena (guest, #2735) [Link]

I can understand Chris's reservations about releasing the code. And, as an end user, I can also understand the frustration in not having -- and continuing to not have -- a working fsck. But I have to say that some pot shots were taken at Chris that I felt were entirely inappropriate. So, as a user who, himself, is frustrated at not having a full-fledged fsck (indeed, that's the only thing keeping me from recommending btrfs as the FS for a new in-house product), I have to say that ascribing ulterior motives to Chris for its delay is unfair, uncool, and unprofessional.

So, while I'm glad to see this topic highlighted by Jon, I do hope that the original accuser would try to set aside his frustration, and, instead, continue to work with the community in testing of code, generating bug reports where appropriate, and other things that we, as users, can contribute.

$.02

Whither btrfsck?

Posted Oct 20, 2011 11:42 UTC (Thu) by callegar (guest, #16148) [Link]

UDF also has no fsck available on Linux.

Which I believe is the reason why everybody sticks with vfat even in those cases where UDF would be an applicable, unencumbered alternative.

Whither btrfsck?

Posted Oct 24, 2011 11:11 UTC (Mon) by nye (guest, #51576) [Link]

This article badly mischaracterizes the nature of the discussion.

The complaints about btrfsck have focused far less on its absence than on the continued assurances, repeated for nearly a year, that it would be available imminently.

When somebody keeps claiming that something is so close to ready that it will be available in a couple of weeks, or a "couple of days (hopefully)", or even "I still might post it tonight", and they keep talking like that for 11 months with no external indicator of progress, it's not unreasonable to start wondering if that person is perhaps not entirely on the level.

Even if they have the best of intentions (as is surely the case), it becomes very hard to trust future similar claims from that person.

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds