This thing alone has actually made me decide I wouldn't be moving to ext4 any time soon.
This is a regression
Posted Mar 14, 2009 1:53 UTC (Sat) by tialaramex (subscriber, #21167)
Zero-length files were probably not possible (or at least so rare that you'd never see one) in ext3 for the rename case if you have data=ordered. The patch makes them similarly rare in ext4.
Neither happens if you run normally, or even if you soft hang, losing interactivity but allowing the kernel to flush to disk. Neither happens if your laptop doesn't wake up from sleep, so long as the sleep code properly calls sync(). Neither happens if your changes were at least 5 seconds old (ext3 data=ordered) or 60 seconds old (other cases). The people getting bitten either lost power suddenly while working, or hit the reset button.
I agree that zero length files are undesirable, and shouldn't be common even if you pull the plug. Evidently Ted does too, since the patches are enabled by default. Still, it remains the case that applications which must have data integrity need to be more careful than this, because otherwise things can (even in ext3 with data=ordered) go badly wrong for you.
I believe that nodelalloc is just as much overkill as fully preserving atime is. Sure, in theory it might be slightly safer to disable the delayed allocator, but in practice it doesn't make enough difference to worry about, and the performance gain is very attractive. Sooner or later, if you use computers, you will lose some data; that's why we have backups.
Posted Mar 14, 2009 2:26 UTC (Sat) by bojan (subscriber, #14302)
Thanks for pointing this out. Essentially, relying on this behaviour was an accident, waiting to bite. Unfortunately, due to broken semantics of fsync on ext3, having a correct application would break the performance of the system. Looks to me that ext3 is far more broken than ext4 (which doesn't seem broken at all to me).
Posted Mar 14, 2009 12:30 UTC (Sat) by ikm (subscriber, #493)
Posted Mar 14, 2009 21:37 UTC (Sat) by bojan (subscriber, #14302)
Which just proves that most users are irrational, because they don't know any better. So, people that _know_ what really is the problem should listen to people that don't in order to fix it?
Posted Mar 14, 2009 22:50 UTC (Sat) by ikm (subscriber, #493)
Users don't care which solution is the right one as long as it *works*. And the solution went into 2.6.30 indeed. Distributors would hopefully backport. Problem solved. Hooray. But all the blabbering about how POSIX allows this and stuff is unhelpful to end-user, if surely interesting and inspiring to developers.
Posted Mar 15, 2009 2:02 UTC (Sun) by bojan (subscriber, #14302)
And that is exactly why Ted, being a practical person, reverted to the old behaviour in some situations. Doesn't mean application writers should continue using incorrect idioms.
> It's totally unrealistic and not doable in any short- or even mid-term. Why suggest this then? And who is irrational after all?
Sorry, fixing bugs is irrational?
> But all the blabbering about how POSIX allows this and stuff is unhelpful to end-user, if surely interesting and inspiring to developers.
POSIX isn't blabbering (see http://www.kernel.org/):
> Linux is a clone of the operating system Unix, written from scratch by Linus Torvalds with assistance from a loosely-knit team of hackers across the Net. It aims towards POSIX and Single UNIX Specification compliance.
Posted Mar 15, 2009 3:12 UTC (Sun) by bojan (subscriber, #14302)
1. By default make ext3 ordered mode have fsync as a no-op. People who want the current broken behaviour could specify a mount option to get it.
2. Tell folks that they _must_ use fsync in order to commit their data.
3. Once a critical mass of applications has achieved the above, remove all hacks from ext4, XFS etc.
4. Retire ext3.
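As a rough illustration of point 2 above (not something posted in the thread), committing data with fsync might look like the following minimal Python sketch. The helper name `append_and_commit` is hypothetical; `os.fsync` is the standard-library wrapper around fsync(2). How long the call blocks depends on the filesystem and mount options, which is exactly what the list above is arguing about.

```python
import os


def append_and_commit(path: str, record: bytes) -> None:
    """Append one record and return only after fsync() has completed.

    Hypothetical helper for illustration. Without the fsync() call the data
    may sit in the page cache for some time and can be lost on power failure.
    """
    with open(path, "ab") as f:
        f.write(record)
        # Push Python's userspace buffer down to the kernel...
        f.flush()
        # ...then ask the kernel to push the file's data to stable storage.
        os.fsync(f.fileno())
```

The `flush()` before `fsync()` matters: `fsync` only covers data the kernel already has, not bytes still held in the process's own buffer.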
Posted Mar 15, 2009 4:53 UTC (Sun) by foom (subscriber, #14868)
Hopefully it can be the default fs for Ubuntu Jaded Jackal. If anyone complains, I'm sure "But POSIX says it's okay to do that, the apps are broken for not obsessively calling sync after every write!" will satisfy everyone. :)
Posted Mar 15, 2009 5:26 UTC (Sun) by bojan (subscriber, #14302)
All the people here suggesting that the well-established standards Linux _aims_ to implement should be ignored should remember the screaming Microsoft had to face from the FOSS community when they started twisting various standards to their own ends.
Posted Mar 15, 2009 12:34 UTC (Sun) by nix (subscriber, #2304)
Posted Mar 15, 2009 21:09 UTC (Sun) by bojan (subscriber, #14302)
Of course, Ted put hacks into ext4 because application writers missed the above and it will take time to fix it. That's called a workaround.
Posted Mar 15, 2009 23:50 UTC (Sun) by nix (subscriber, #2304)
Agreed the apps are buggy, but I think this is a deficiency in POSIX, rather than anything else.
Posted Mar 16, 2009 0:17 UTC (Mon) by bojan (subscriber, #14302)
And that's going to help the broken application running on another filesystem exactly how? The problem with hypocrisy here is not related to ext4 - it is related to application code.
BTW, it is obvious that Ted already decided to make sure ext4 does that. The man is not stupid - he doesn't want the file system rejected over this - no matter how wrong the people blaming ext4 for this are.
> Agreed the apps are buggy, but I think this is a deficiency in POSIX, rather than anything else.
Well, yeah - the spec is, shall we say - demanding. But, it is what it is. We tell Microsoft not to ignore the specs. What makes us so special that we can? I would suggest nothing. If we take the right to demand that from Microsoft, we should make sure we do it ourselves.
Posted Mar 16, 2009 1:07 UTC (Mon) by nix (subscriber, #2304)
There's no point talking to you at all, IMNSHO.
Posted Mar 16, 2009 2:19 UTC (Mon) by bojan (subscriber, #14302)
If you don't want to talk to me, then don't. That's OK.
Posted Mar 16, 2009 13:45 UTC (Mon) by ikm (subscriber, #493)
> And that's going to help the broken application running on another filesystem exactly how?
It's not. We are talking about fixing problems users start to experience when they switch from ext3 to ext4. None of the other goals, such as fixing all the apps, making all filesystems happy, feeding the hungry and making the world a better place, are being pursued here. The 2.6.30 fixes do what they are supposed to do, without breaking anything else. So it is a good thing, and I don't understand why you seem to be against it.
Sure, there's lots of stuff which ain't working right, but it's not the subject here. World's not perfect, and it's not going to be any time soon.
Posted Mar 15, 2009 12:57 UTC (Sun) by ikm (subscriber, #493)
Gosh. What people suggest here is that standards should not be used as an excuse for unwanted filesystem behavior.
Posted Mar 16, 2009 0:21 UTC (Mon) by bojan (subscriber, #14302)
(I wonder if we can allow it to write dirty data to disk when under memory pressure, as well? ;) )
Posted Mar 15, 2009 15:18 UTC (Sun) by dcoutts (guest, #5387)
Posted Mar 15, 2009 12:20 UTC (Sun) by alexl (subscriber, #19068)
Are you crazy? That would break ACID guarantees for all databases, etc.
fsync() is about much more than data-before-metadata.
Posted Mar 15, 2009 21:28 UTC (Sun) by bojan (subscriber, #14302)
Close to it ;-)
I admit, that was a bit tongue-in-cheek, to point out that current ext3 "lock up on fsync" behaviour is total nonsense.
Posted Mar 16, 2009 14:09 UTC (Mon) by ikm (subscriber, #493)
Once I had MySQL running on an XFS filesystem, and the system hung for some reason. The database got broken so horribly I had to restore it from backups. I wouldn't really count on any 'ACID guarantees' here :) A UPS and a ventilated dust-free environment are our only ACID guarantee :)
Posted Mar 17, 2009 5:41 UTC (Tue) by efexis (guest, #26355)
Posted Mar 17, 2009 11:59 UTC (Tue) by ikm (subscriber, #493)
Posted Mar 14, 2009 13:09 UTC (Sat) by nix (subscriber, #2304)
In any case anyone writing for Unix/Linux should know about and use at least the rename trick for replacing small files. Not doing so causes much worse problems than this one.
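For readers who have not met the rename trick, here is a minimal Python sketch of it, written for illustration rather than taken from the thread. The function name `replace_file_atomically` and the `.tmp-` prefix are hypothetical. The sketch assumes a POSIX system, where renaming over an existing file is atomic and a directory can be opened and fsync'ed; it also includes the fsync-before-rename step that the stricter participants in this discussion argue for.

```python
import os
import tempfile


def replace_file_atomically(path: str, data: bytes) -> None:
    """Replace the contents of `path` so readers see the old or new file, never a partial one.

    Hypothetical helper for illustration; assumes a POSIX filesystem.
    Note: the temporary file gets mkstemp's default 0600 permissions, so the
    original file's mode and ownership are not preserved by this sketch.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Make sure the new data is on disk before the name points at it.
            os.fsync(tmp.fileno())
        # Atomically swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Flush the directory entry as well, so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Dropping the `os.fsync(tmp.fileno())` line gives the shorter idiom that many applications actually use, which is the case where the zero-length files discussed here can appear after a crash.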
One person knew this trick (the only person there other than me who reads standards documents for fun), and not even he had spotted the old oops-better-retry-my-writes-on-EINTR trap. Most people assumed that 'oh, the O_TRUNC and the writes will all get put off until I close the file, won't they?' and hadn't even thought it through that much until I pressed them on it.
J. Random Programmer is much, much less competent than you seem to think.
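The two traps mentioned above can be sketched in a few lines of Python; this is an illustration under stated assumptions, not code from the thread. The names `write_all` and `size_after_truncating_open` are hypothetical. Since Python 3.5 (PEP 475), `os.write` already retries when a signal interrupts it and the handler does not raise, so the `InterruptedError` branch mainly documents what C code has to do by hand for EINTR; the short-write loop is still needed in either language.

```python
import os


def write_all(fd: int, data: bytes) -> None:
    """Write every byte of `data` to `fd`, coping with short writes and EINTR."""
    view = memoryview(data)
    while len(view) > 0:
        try:
            # os.write may accept fewer bytes than requested.
            written = os.write(fd, view)
        except InterruptedError:
            # EINTR: nothing was written, so simply try again.
            continue
        view = view[written:]


def size_after_truncating_open(path: str) -> int:
    """Open an existing file with O_TRUNC and report its size before writing anything.

    The truncation happens at open time, not at close time, so the old
    contents are already gone while the new ones have not been written yet.
    """
    fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
    try:
        return os.stat(path).st_size
    finally:
        os.close(fd)
```

For any existing file, `size_after_truncating_open` returns 0, which is why a crash between the open and the writes leaves an empty file behind.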
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds