Atomicity vs durability

Posted Mar 15, 2009 13:13 UTC (Sun) by man_ls (subscriber, #15091)
In reply to: Atomicity vs durability by bojan
Parent article: Ts'o: Delayed allocation and the zero-length file problem

That is right, that is why we are not using ext2 (which is POSIX-compliant), FreeBSD (which is POSIX-compliant) or even Windows Vista (which can be made to be POSIX-compliant). We are running a journalling file system in the (apparently silly) hope that the system will hold our data and then give it back.


Atomicity vs durability

Posted Mar 15, 2009 13:19 UTC (Sun) by bojan (subscriber, #14302)

Look, I'm all for reliability. But, if the manual says: "fsync if you want your data on disk" and we don't fsync, then it is us that are creating the problem.

I think we should come up with a new API that guarantees what people really want. Making the existing API do that on a particular FS is just going to make applications non-portable to any FS that implements the existing POSIX API differently. We've seen this with XFS. Who knows what's lurking out there. Better to do the proper thing: fsync and be done with it. Then we can invent the new, better, smarter API.
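
For readers who want to see what "fsync and be done with it" looks like in practice, here is a minimal sketch of the write-to-temporary, fsync, rename sequence. The function name `replace_file_durably` and the `.tmp-` prefix are made up for illustration, and the final directory fsync is an extra step assumed to be supported by the platform (it is on Linux).

```python
import os
import tempfile


def replace_file_durably(path, data):
    """Replace `path` with `data`; a reader sees either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temporary file in the same directory so the rename never
    # crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's userspace buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to commit the data to storage
        os.replace(tmp_path, path)  # rename(2) on POSIX: switch the name atomically
    except BaseException:
        # On failure, drop the temporary file and leave `path` untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Assumption: syncing the directory also commits the rename itself.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The fsync before the rename is the step the applications under discussion leave out; everything else is the familiar rename idiom.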

Atomicity vs durability vs reliability

Posted Mar 15, 2009 13:35 UTC (Sun) by man_ls (subscriber, #15091)

No, you are not all for reliability if you cannot see beyond your little POSIX manual. Or if you don't care about system crashes because the manual is silent about this particular point. Sorry to break it to you: reliability lies in little details such as having a predictable response to a crash, or surviving the crash while retaining all the nice properties.
> I think we should come up with a new API that guarantees what people really want.
APIs are good enough as they are -- we don't need a special "reliability API" so we can build a special "reliability manual" for guys who just follow the book.
> We've seen this with XFS.
Nope. What we have seen with XFS is how some anal-retentive developers lost most of their user base while trying to argue such points as "POSIX-compliance", and then finally gave in. With ext4 we are hoping to get to the point where the devs give in before they lose most of their user base. Just because ext4 is important for Linux and for our world domination agenda. Meanwhile you can keep waving the POSIX standard in our face. The POSIX standard seems to be about compatibility, not about reliability, and it should keep playing that role. Reliability is left as an exercise for the attentive reader. Let us hope that Mr Ts'o is attentive and can tell atomicity, reliability and durability apart.

Actually it's a done deal...

Posted Mar 15, 2009 17:34 UTC (Sun) by khim (subscriber, #9252)

If you read the comments on tytso's blog you'll see that the current position is: "POSIX is right while applications are broken, yet we'll save them anyway". Even if the "proper way" is to fix thousands of applications, it's just not realistic - so ext4 (starting from 2.6.30) will try to save these broken applications by default. And if you want performance - there is a switch. Good enough for me. Can we close the discussion?

Actually it's a done deal...

Posted Mar 15, 2009 21:10 UTC (Sun) by bojan (subscriber, #14302)

Exactly. Ted is a practical man, so he already put a workaround in place, until applications are fixed.

Sorry

Posted Mar 15, 2009 21:20 UTC (Sun) by man_ls (subscriber, #15091)

Sure, I have polluted the interwebs enough with my ignorance, and there is little chance to learn anything else.

Atomicity vs durability vs reliability

Posted Mar 15, 2009 21:06 UTC (Sun) by bojan (subscriber, #14302)

> No, you are not all for reliability if you cannot see beyond your little POSIX manual.

The POSIX manual is not little ;-)

Seriously, we tell Microsoft that going out of spec is bad, bad, bad. But, we can go out of spec no problem. There is a word for that:

http://en.wikipedia.org/wiki/Hypocrisy

> What we have seen with XFS is how some anal-retentive developers lost most of their user base while trying to argue such points as "POSIX-compliance", and then they finally give in.

Yep, blame the people that _didn't_ cause the problem. We've seen that before.

Sorry, but I don't see it this way...

Posted Mar 15, 2009 22:08 UTC (Sun) by khim (subscriber, #9252)

I have yet to see anyone who asks Microsoft to never go beyond the spec. That would be just insane: if you can never add anything beyond what the spec says, how can any progress occur?

When Microsoft is blamed it's because Microsoft
1. Does not implement spec correctly, or
2. Does not say which parts are spec requirements and which are extensions.

When Microsoft says "JNI is not sexy so we'll provide RNI instead" the ire is NOT about problems with RNI. Lack of JNI is to blame.

I don't see anything of the sort here: POSIX does not require open/write/close/rename to be atomic, but it certainly does not forbid it. And it's a useful thing to have, so why not? It would be best to actually document this behaviour, of course - after that, applications can safely rely on it and other systems can implement it as well if they wish. We even have a nice flag to disable this extension if someone wants to :-)

Sorry, but I don't see it this way...

Posted Mar 15, 2009 22:24 UTC (Sun) by bojan (subscriber, #14302)

> 1. Does not implement spec correctly

Which is exactly what our applications are doing. POSIX says, commit. We don't and then we blame others for it.

This is the same thing HTML5 is doing

Posted Mar 15, 2009 22:33 UTC (Sun) by khim (subscriber, #9252)

Sorry, but it's not a problem with POSIX or the FS - it's a problem with the number of applications. Once a lot of applications start to depend on some weird feature (content sniffing in the case of HTML, atomicity of open/write/close/rename in the case of filesystems) it makes no sense to try to fix them all. Much better to document it and make it official. This is what Microsoft did with a lot of "internal" functions in MS-DOS 5 (and it was praised for it, not ostracized), this is what HTML is doing in HTML5 and this is what Linux filesystems should do.

Was it a good idea to depend on said atomicity? Maybe, maybe not. But the time to fix these problems has come and gone - today it's much better to extend the spec.

This is the same thing HTML5 is doing

Posted Mar 15, 2009 23:37 UTC (Sun) by bojan (subscriber, #14302)

> But the time to fix these problems has come and gone - today it's much better to extend the spec.

Time to fix these problems using the existing API is now, because right now we have the attention of everyone on how to use the API properly. To the credit of some in this discussion, bugs are already being fixed in Gnome (as I already mentioned in another comment). I also have bugs to fix in my own code - there is no denying that :-(

In general, I agree with you on extending the spec. But, before the spec gets extended officially, we need to make sure that _every_ POSIX compliant file system implements it that way. Otherwise, apps depending on this new spec will not be reliable until that's the case. So, can we actually make sure that's the case? I very much doubt it. There are a lot of different systems out there that implement POSIX, some of them very old. Auditing all of them and then fixing them may be harder than fixing the applications.

Why do we need such blessing?

Posted Mar 16, 2009 0:05 UTC (Mon) by khim (subscriber, #9252)

Linux extends POSIX all the time. New syscalls, new features (things like "According to the standard specification (e.g., POSIX.1-2001), sync() schedules the writes, but may return before the actual writing is done. However, since version 1.3.20 Linux does actually wait."), etc. If an application wants to use such an "extended feature", it can; if not, it can stick to POSIX-approved features only.
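
As a small illustration of relying on such an extension, the sketch below writes a file without any per-file fsync and then calls `os.sync()`, Python's wrapper around sync(2). The helper name `write_then_sync` is hypothetical. Whether the call returns before or after the data is actually written is exactly the POSIX-versus-Linux difference quoted above, so portable code should not assume the Linux behaviour.

```python
import os
import tempfile


def write_then_sync(path, data):
    """Write `data` with no per-file fsync, then flush all dirty buffers."""
    with open(path, "wb") as f:
        f.write(data)
    # POSIX only promises that the writes are scheduled; per the man page
    # text quoted above, Linux waits until they are done.
    os.sync()
    return os.path.getsize(path)


def demo():
    directory = tempfile.mkdtemp()
    return write_then_sync(os.path.join(directory, "notes.txt"), b"hello")
```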

As for old POSIX systems... it's up to application writers again. And you can be pretty sure A LOT OF them don't give a damn about POSIX compliance. They are starting to consider Linux as a third platform for their products (the first two are obviously Windows and MacOS, in that order), but if you try to talk to them about POSIX it'll just lead to the removal of Linux from the list of supported platforms. Support of many distributions is already hard enough, support of some exotic filesystems is "we'll think about it but don't hold your breath...", support for old exotic POSIX systems... fuggetaboudit!

Now - the interesting question is: do we welcome such selfish developers or not? This is a hard question because the answer "no, they should play by our rules" will just lead to an exodus of users - because they need these applications and WINE is not a good long-term solution...

Atomicity vs durability

Posted Mar 15, 2009 22:05 UTC (Sun) by dcoutts (guest, #5387)

Remember, we do not care if the data is on disk or not, just that if it does make it to disk, it preserves the atomic property we were after. All that needs to happen is for the rename not to be reordered in front of the write. That hardly restricts performance.

As for a new API, yes, that'd be great. There are doubtless other situations where it would be useful to be able to constrain write re-ordering. One example is writes within a single file when we're implementing a persistent tree structure, where the ordering is important to provide atomicity in the face of system failure.
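
Without such an ordering API, the only portable tool today is fsync used as a heavyweight barrier. The sketch below shows the idea for a single file: append the new record, fsync, and only then update a small header that points at it. The file layout, the names `create_store`, `commit_record` and `read_committed`, and the assumption that an 8-byte header write reaches the disk as one unit are all illustrative, not taken from any real implementation.

```python
import os
import struct
import tempfile

HEADER = struct.Struct("<Q")  # offset of the last committed record (made-up layout)


def create_store(path):
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    os.pwrite(fd, HEADER.pack(0), 0)  # 0 means "nothing committed yet"
    os.fsync(fd)
    return fd


def commit_record(fd, payload):
    """Append `payload`, then publish it by pointing the header at it."""
    offset = os.lseek(fd, 0, os.SEEK_END)
    os.pwrite(fd, struct.pack("<I", len(payload)) + payload, offset)
    os.fsync(fd)  # barrier: the record must be stable before the header moves
    os.pwrite(fd, HEADER.pack(offset), 0)
    os.fsync(fd)  # make the new header stable as well
    return offset


def read_committed(fd):
    (offset,) = HEADER.unpack(os.pread(fd, HEADER.size, 0))
    if offset == 0:
        return None
    (length,) = struct.unpack("<I", os.pread(fd, 4, offset))
    return os.pread(fd, length, offset + 4)


def demo():
    fd = create_store(os.path.join(tempfile.mkdtemp(), "tree.db"))
    commit_record(fd, b"root-v1")
    commit_record(fd, b"root-v2")
    latest = read_committed(fd)
    os.close(fd)
    return latest
```

A real ordering primitive would let the first fsync be replaced by something much cheaper, since all that is needed is "record before header", not "record on disk right now".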

Having a nice new API does not mean that the obvious cases that app writers have been using for ages are wrong. We should just insert the obvious write barriers in those cases.

Atomicity vs durability

Posted Mar 16, 2009 4:52 UTC (Mon) by dlang (✭ supporter ✭, #313)

remember that the drive has its own buffer (that usually isn't battery backed), and it will tell the OS that the data is written when it's in the buffer, not when it is on the disk. it then can re-order the writes to the disk.

so everything that you are screaming that the OS should guarantee can be broken by the hardware after the OS has done its best.

you can buy/configure your hardware to not behave this way, but it costs a bunch (either in money or in performance). similarly you can configure your filesystem to give you added protection, at a significant added cost in performance.

Atomicity vs durability

Posted Mar 16, 2009 11:00 UTC (Mon) by forthy (guest, #1525)

Any reasonable hard disk (SATA, SCSI) has write barriers which allow file system implementers to actually implement atomicity.

Atomicity vs durability

Posted Mar 15, 2009 23:51 UTC (Sun) by vonbrand (subscriber, #4458)

I just don't understand all this "extN isn't crash-proof" whining... Yes, Linux systems do crash on occasion. It is thankfully very rare. Yes, hardware does fail. Even disks do fail. Yes, if you are unlucky you will lose data. Yes, the system could fail horribly and scribble all over the disk. Yes, the operating system could mess up its internal (and external) data structures.

It is just completely impossible for the operating system to "do the right thing with respect to whatever data the user values more", even more so in the face of random failures. Want performance? Then you have to do tricks caching/buffering data, disks are horribly _s_l_o_w_ when compared to your processor or memory.

Asking Linux developers to create some Linux-only beast of a filesystem in order to make application developers' lives easier doesn't cut it: there are other operating systems (and Linux systems with other filesystems) around, and always will be. Plus, asking for a filesystem that is impossible in principle won't get you too far either.

Atomicity vs durability

Posted Mar 16, 2009 0:08 UTC (Mon) by man_ls (subscriber, #15091)

Yes, isn't it silly to ask for the moon like this? Apart from the fact that ext3 does exactly what we are asking for; and XFS since 2007; and now ext4 with the new patches. Oh wait... maybe you really didn't understand what we were asking for.

Listen, the sky might fall on our heads tomorrow and eventually we are all to die, we understand that. But until then we really want our filesystems to do atomic renames in the face of a crash (i.e. what the rest of the world [except POSIX] understands as "atomic"). Not durable, not crash-proof, not magically indestructible -- just all-or-nothing. Atomic.

YMMV

Posted Mar 16, 2009 0:26 UTC (Mon) by khim (subscriber, #9252)

> Yes, Linux systems do crash on occasion. It is thankfully very rare.

Depends on what hardware and what kind of drivers you have.

> Want performance? Then you have to do tricks caching/buffering data, disks are horribly _s_l_o_w_ when compared to your processor or memory.

The problem is: a fast filesystem is useless if it can't keep my data safe. Microsoft knows this - that's why you don't need to explicitly unmount a flash drive there. Yes, the cost is huge: it means flash wears down faster and speed is horrible - but anything else is unacceptable. Oh, and I know A LOT OF users who just turn off the computer at the end of the day. This problem is somewhat mitigated by the design of current systems (the "power off" button is actually a "shutdown" button), but people are finding ways to cope: they just switch off power to the desk.

The same thing applies to developers. They are lazy. Most application writers do not use fsync and do not check the error code from close. Yet if data is lost - OS will be blamed. Is it fair to OS and FS developers? Not at all! Can it be changed? Nope. Life is unfair - deal with it.
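
For the record, the non-lazy version is not long. Here is a minimal sketch (the name `save_checked` is made up) that writes a new file and reports failures from write, fsync and close instead of ignoring them. It uses O_EXCL so it never clobbers an existing file.

```python
import os
import tempfile


def save_checked(path, data):
    """Write `data` to a new file; return None on success or an error string."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except OSError as exc:
        return f"open failed: {exc.strerror}"
    error = None
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # may write fewer bytes than requested
            view = view[written:]
        os.fsync(fd)                      # raises OSError if the commit fails
    except OSError as exc:
        error = f"write or fsync failed: {exc.strerror}"
    try:
        os.close(fd)                      # close can report a deferred write error
    except OSError as exc:
        error = error or f"close failed: {exc.strerror}"
    return error


def demo():
    path = os.path.join(tempfile.mkdtemp(), "report.txt")
    first = save_checked(path, b"payload")
    second = save_checked(path, b"payload")  # refused: the file already exists
    return first, second
```

In C the same checks are the return values of write(), fsync() and close(); Python surfaces them as OSError.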

The whining started when it was found out that the new filesystem can lose valuable data - where ext3 never does so in this fashion (it can do this with O_TRUNC, but not with rename). This is a pretty serious regression to most people. The approach "let's fix thousands upon thousands of applications" (including proprietary ones) was thankfully rejected. This is a good sign: it means Linux is almost ready to be usable by normal people. The last time such a problem happened (the OSS->ALSA switch), the offered solution was beyond the pale.
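
To make the O_TRUNC versus rename distinction concrete, here is a small sketch of the two update styles. The function names and the `.new` suffix are made up, and the `observe` callback only shows what another process could see at that moment. What actually survives a crash depends on the filesystem, which is the whole argument in this thread.

```python
import os
import tempfile


def overwrite_in_place(path, data, observe):
    fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
    try:
        observe(path)          # the old content is already gone at this point
        os.write(fd, data)
    finally:
        os.close(fd)


def replace_via_rename(path, data, observe):
    tmp_path = path + ".new"   # hypothetical temporary name
    with open(tmp_path, "wb") as tmp:
        tmp.write(data)
    observe(path)              # `path` still names the complete old file
    os.replace(tmp_path, path)


def demo():
    seen = []

    def observe(p):
        with open(p, "rb") as f:
            seen.append(f.read())

    path = os.path.join(tempfile.mkdtemp(), "settings")
    with open(path, "wb") as f:
        f.write(b"old")
    overwrite_in_place(path, b"new", observe)
    replace_via_rename(path, b"newer", observe)
    return seen
```

With O_TRUNC there is always a window in which the file is empty, so losing data there is no surprise. With rename the name never points at an empty file unless the filesystem commits the rename ahead of the data.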

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds