
That massive filesystem thread

Posted Apr 1, 2009 10:24 UTC (Wed) by bojan (subscriber, #14302)
In reply to: That massive filesystem thread by nix
Parent article: That massive filesystem thread

> They're not really intended for use by everyman, anyway.

You are kidding, right? dup3() is not for general use?

> That's a big difference.

Look, I'm not really set on any particular mechanism for making sure that programmers have a reliable interface for doing this. Using fsync() before close() is the only portable solution now, but it is far from optimal; I think there is very little doubt about that. And we all know it sucks to high heaven on ext3 in ordered mode.
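A minimal sketch of that portable pattern in C, assuming a POSIX system. `replace_file()` and `write_all()` are illustrative names for this example, not an existing library API:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Replace the contents of `path` with `data`: write a temporary file,
 * fsync() it, close() it, then rename() it over the original.
 * `tmp_path` must live on the same filesystem as `path`, or rename() fails.
 * Returns 0 on success, -1 on failure (the original file is left untouched).
 */
int replace_file(const char *path, const char *tmp_path,
                 const char *data, size_t len)
{
    assert(path != NULL && tmp_path != NULL && data != NULL);
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* Force the new data to disk before the name is switched over. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }
    /*
     * rename() atomically replaces the name: other processes see either the
     * old or the new file, never a half-written one. fsync()ing the parent
     * directory as well would also make the rename itself durable; that step
     * is omitted here.
     */
    if (rename(tmp_path, path) < 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

A call such as `replace_file("settings.conf", "settings.conf.tmp", buf, len)` (hypothetical file names) either leaves the old file in place or switches to the complete new one. The fsync() on the temporary file is the step that makes this robust everywhere, and also the step that is so costly on ext3 in ordered mode.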

I don't know what the best way is: a new call, some kind of flag to open() that says O_ALWAYSDATABEFOREMETADATA, rename2(), close_with_magic() or whatever. But saying that application programmers cannot grok this kind of stuff is just not true. They can and they will, as long as they are given the tools, just as they did with dup3() and friends (and as you point out, there is little danger of misuse - these are new calls).

As I've said many times before, overloading the current pattern with non-portable behaviour is dangerous, because it provides a false sense of robustness and ties one to a particular FS and kernel. If we can get POSIX updated so that rename() actually means "always data before metadata, but don't put on disk now", then it may even fly. But I don't know how that's going to provide guarantees retroactively, when even Linux features file systems that don't do that (e.g. ext3 in writeback mode).

Also, having things like delayed allocation, where metadata can legitimately be committed before data, is really useful. Most short-lived temporary files never reach the disk platters at all, which makes things faster and makes disks last longer. So keeping the old cruft around ain't that bad.

As for utility programs that are called from scripts, you can use dd with conv=fsync or conv=fdatasync in your pipeline to commit files to disk today. FreeBSD already has a standalone fsync program for that. Yeah, I know. It sucks. But your usual tools don't have to make any decisions about fsync()-ing - you can.
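On the C side, the core of such a standalone fsync-style tool is only a few lines. This is a sketch assuming a POSIX system; `sync_path()` is a hypothetical helper name, and its `data_only` flag simply switches to fdatasync(), much as dd's conv=fdatasync differs from conv=fsync:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Flush one already-written file to stable storage, roughly what
 * `dd conv=fsync` does for its output file. If data_only is nonzero,
 * fdatasync() is used instead (like conv=fdatasync), which may skip
 * metadata such as timestamps that is not needed to read the data back.
 * Returns 0 on success, -1 with errno set on failure.
 */
int sync_path(const char *path, int data_only)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = data_only ? fdatasync(fd) : fsync(fd);
    int saved = errno;   /* close() below may clobber errno */
    close(fd);
    if (rc < 0)
        errno = saved;
    return rc;
}
```

Wrapping it in a main() that calls sync_path(argv[i], 0) for each argument would give a rough equivalent of FreeBSD's fsync program; in a shell pipeline, ending with `dd of=out.txt conv=fsync` does the same with stock tools.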

That massive filesystem thread

Posted Apr 1, 2009 18:09 UTC (Wed) by quotemstr (subscriber, #45331)

By your logic, we should never fix bugs. Remember the 25-year-old readdir bug? Don't you agree it was good to fix that? What if a program, somewhere, depended on that behavior? In reality, programs use rename() for atomic replacement. POSIX doesn't say anything about guarantees after a hard system crash, and it's just disingenuous to think that by punishing application authors with as little robustness as possible, you're doing them some kind of portability favor.

That massive filesystem thread

Posted Apr 1, 2009 20:55 UTC (Wed) by bojan (subscriber, #14302)

I will just answer this one comment, so that nobody gets "offended".

Quite the opposite. I'm all for fixing bugs and giving application programmers the _right_ tools for the job. If some Linux developers took a second to lift their noses out of the specifics of Linux and actually looked around, this could be fixed for _everyone_, not just for some Linux specific file systems. That is my point, in case you didn't get it by now.

That massive filesystem thread

Posted Apr 1, 2009 21:37 UTC (Wed) by man_ls (guest, #15091)

It is a worthless effort. Each filesystem must keep its house clean. Why invent a new system call which cannot (by necessity) be honored by ext2, or ext4 without a journal? Everything works fine in ext3 now, and if it doesn't work right in ext4, people will just look for a different filesystem.

Reading that Linus is not pulling from Mr Ts'o's trees made me suspicious. Well, now that Ts'o's commit rights have been officially revoked, I think the whole discussion is moot. I wonder if the next ext4 head maintainer will learn from this painful experience and just do the right thing.

ext4 trees

Posted Apr 1, 2009 21:46 UTC (Wed) by corbet (editor, #1)

I'm confused. The article said that Ted's trees had not been pulled yet. In fact, that happened today; a bunch of ext4 work went into the mainline, including a number of patches which increase robustness for applications which don't use fsync(). I dunno what you were trying to link to, but it didn't work. I've not seen anything about revocation of commit rights. (It's hard to "revoke commit rights" in a distributed system in any case; at worst you can refuse to pull from somebody else's repository.)

Maybe it's an April 1 post that went over my head?

Recursive linking

Posted Apr 2, 2009 6:21 UTC (Thu) by man_ls (guest, #15091)

Sorry, it was a stupid attempt from a foreigner at an April Fools' prank :D I was hoping that the recursive link would give it away, but maybe it was too plausible altogether.

Will try to do better next time :D

That massive filesystem thread

Posted Apr 1, 2009 22:38 UTC (Wed) by bojan (subscriber, #14302)

Just a few points, so please don't get offended. I apologise in advance to all sensitive LWN readers for any injury caused by this post.

> Why invent a new system call which cannot (by necessity) be honored by ext2, or ext4 without a journal?

Even if there were some kind of magical law that said you could not order commits this way on a non-journaled file system, it could always be trivially implemented through - wait for it - fsync(), which has acceptable performance characteristics on such file systems.
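To make that fallback concrete, here is a sketch of what such a call could reduce to on a filesystem that cannot order commits by itself. `rename_data_first()` is a hypothetical name, not an existing or proposed system call; it just forces the data out with fsync() before doing an ordinary rename():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * HYPOTHETICAL: rename_data_first() is not a real system call. It sketches
 * the fsync() fallback: "data before the rename" is honored by flushing the
 * file's data first and only then renaming it.
 * Returns 0 on success, -1 with errno set on failure.
 */
int rename_data_first(const char *oldpath, const char *newpath)
{
    assert(oldpath != NULL && newpath != NULL);
    int fd = open(oldpath, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fsync(fd) < 0) {
        int saved = errno;   /* preserve fsync()'s error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);
    /* If this rename survives a crash, the new name points at complete data. */
    return rename(oldpath, newpath);
}
```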

> Everything is working now fine in ext3

Sure. Except fsync(), which locks the whole system for a few seconds. Hopefully, this will get fixed (or at least its effect reduced) as a result of the hoopla.

> Well, now that Ts'o's commit rights have been officially revoked I think that the whole discussion is moot.

Now you are really making a fool of yourself.

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds