the only time they could see a blank or corrupted file is if the system crashes.
so the atomicity is there now
it's the durability that isn't there unless you do f(data)sync on the file and directory (and have hardware that doesn't do unsafe caching, which most drives do by default)
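A minimal sketch of the "fsync the file and the directory" sequence mentioned above, written in Python for brevity (the calls map directly onto the POSIX open/write/fsync/rename functions). The helper name durable_replace and the ".tmp" suffix are assumptions made for illustration, and none of this helps if the drive itself caches writes unsafely.

```python
import os

def durable_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data`, syncing both the file and its directory.

    Hypothetical helper: the name and the ".tmp" suffix are illustrative.
    """
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"

    # 1. Write the new content to a temporary file in the same directory.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # write() may be short; loop until done
            view = view[written:]
        # 2. Push the file data to stable storage before the rename.
        os.fsync(fd)
    finally:
        os.close(fd)

    # 3. Switch the name over to the new content in one step.
    os.rename(tmp_path, path)

    # 4. Sync the directory so the rename itself is on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Steps 1 to 3 are what gives the all-or-nothing behaviour; steps 2 and 4 are the part that buys durability.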
Atomicity vs durability
Posted Mar 15, 2009 12:48 UTC (Sun) by man_ls (subscriber, #15091)
"Atomic without a crash" is not good enough; "atomic" means that a transaction is committed or not, no matter at what point we are -- even after a crash.
Even if the POSIX standard does not speak about system crashes it is good engineering to take them into account IMHO.
Posted Mar 15, 2009 13:06 UTC (Sun) by bojan (subscriber, #14302)
Which is not what POSIX requires.
Posted Mar 15, 2009 13:13 UTC (Sun) by man_ls (subscriber, #15091)
Posted Mar 15, 2009 13:19 UTC (Sun) by bojan (subscriber, #14302)
I think we should come up with a new API that guarantees what people really want. Making the existing API do that on a particular FS is just going to make applications non-portable to any FS that doesn't work that way using existing POSIX API. We've seen this with XFS. Who knows what's lurking out there. Better do the proper thing, fsync and be done with it. Then we can invent the new, better, smarter API.
Atomicity vs durability vs reliability
Posted Mar 15, 2009 13:35 UTC (Sun) by man_ls (subscriber, #15091)
I think we should come up with a new API that guarantees what people really want.
We've seen this with XFS.
Actually it's a done deal...
Posted Mar 15, 2009 17:34 UTC (Sun) by khim (subscriber, #9252)
If you read the comments on tytso's blog you'll see that the current
position is: "POSIX is right while applications are broken, yet we'll save
them anyway". Even if the "proper way" is to fix thousands of applications, it's
just not realistic - so ext4 (starting from 2.6.30) will try to save these
broken applications by default. And if you want performance - there is a
switch. Good enough for me. Can we close the discussion?
Posted Mar 15, 2009 21:10 UTC (Sun) by bojan (subscriber, #14302)
Posted Mar 15, 2009 21:20 UTC (Sun) by man_ls (subscriber, #15091)
Posted Mar 15, 2009 21:06 UTC (Sun) by bojan (subscriber, #14302)
POSIX manual is not little ;-)
Seriously, we tell Microsoft that going out of spec is bad, bad, bad. But, we can go out of spec no problem. There is a word for that:
> What we have seen with XFS is how some anal-retentive developers lost most of their user base while trying to argue such points as "POSIX-compliance", and then they finally give in.
Yep, blame the people that _didn't_ cause the problem. We've seen that before.
Sorry, but I don't see it this way...
Posted Mar 15, 2009 22:08 UTC (Sun) by khim (subscriber, #9252)
I have yet to see anyone ask Microsoft never to go beyond the spec.
That would be just insane: if you can never add anything beyond what
the spec says, how can any progress occur?
When Microsoft is blamed it's because Microsoft
1. Does not implement the spec correctly, or
2. Does not say which parts are spec requirements and which are extensions.
When Microsoft says "JNI is not sexy so we'll provide RNI instead" the
ire is NOT about problems with RNI. Lack of JNI is to blame.
I don't see anything of the sort here: POSIX does not require
open/write/close/rename to be atomic, but it certainly does not forbid it. And
it's a useful thing to have, so why not? It would be best to actually document
this behaviour, of course - after that applications can safely rely on it
and other systems can implement it as well if they wish. We even have a nice
flag to disable this extension if someone wants that :-)
Posted Mar 15, 2009 22:24 UTC (Sun) by bojan (subscriber, #14302)
Which is exactly what our applications are doing. POSIX says, commit. We don't and then we blame others for it.
This is the same thing HTML5 is doing
Posted Mar 15, 2009 22:33 UTC (Sun) by khim (subscriber, #9252)
Sorry, but it's not a problem with POSIX or the FS - it's a problem with the
number of applications. Once a lot of applications start to depend
on some weird feature (content sniffing in the case of HTML, atomicity of
open/write/close/rename in the case of filesystems) it makes no sense to try to
fix them all. Much better to document it and make it official. This is what
Microsoft did with a lot of "internal" functions in MS-DOS 5 (and it was
praised for it, not ostracized), this is what HTML is doing in HTML5 and
this is what Linux filesystems should do.
Was it a good idea to depend on said atomicity? Maybe, maybe not. But
the time to fix these problems has come and gone - today it's much better to
extend the spec.
Posted Mar 15, 2009 23:37 UTC (Sun) by bojan (subscriber, #14302)
Time to fix these problems using the existing API is now, because right now we have the attention of everyone on how to use the API properly. To the credit of some in this discussion, bugs are already being fixed in Gnome (as I already mentioned in another comment). I also have bugs to fix in my own code - there is no denying that :-(
In general, I agree with you on extending the spec. But, before the spec gets extended officially, we need to make sure that _every_ POSIX compliant file system implements it that way. Otherwise, apps depending on this new spec will not be reliable until that's the case. So, can we actually make sure that's the case? I very much doubt it. There are a lot of different systems out there that implement POSIX, some of them very old. Auditing all of them and then fixing them may be harder than fixing the applications.
Why do we need such blessing?
Posted Mar 16, 2009 0:05 UTC (Mon) by khim (subscriber, #9252)
Linux extends POSIX all the time. New syscalls, new features (things
like "According to the standard specification (e.g., POSIX.1-2001),
sync() schedules the writes, but may return before the actual writing is
done. However, since version 1.3.20 Linux does actually wait."), etc.
If an application wants to use such an "extended feature" - it can do so; if
not - it can use POSIX-approved features only.
As for old POSIX systems... it's up to application writers again. And
you can be pretty sure A LOT OF them don't give a damn about POSIX
compliance. They are starting to consider Linux as a third platform for their
products (the first two are obviously Windows and MacOS in that order), but if
you try to talk to them about POSIX it'll just lead to the removal of
Linux from the list of supported platforms. Support for many distributions is
already hard enough, support for some exotic filesystems is "we'll think about
it but don't hold your breath...", support for old exotic POSIX systems...
Now - the interesting question is: do we welcome such selfish developers
or not? This is a hard question because the answer "no, they should play by
our rules" will just lead to an exodus of users - because they need these
applications and WINE is not a good long-term solution...
Posted Mar 15, 2009 22:05 UTC (Sun) by dcoutts (guest, #5387)
As for a new API, yes, that'd be great. There are doubtless other situations where it would be useful to be able to constrain write re-ordering. For example for writes within a single file if we're implementing a persistent tree structure where the ordering is important to provide atomicity in the face of system failure.
Having a nice new API does not mean that the obvious cases that app writers have been using for ages are wrong. We should just insert the obvious write barriers in those cases.
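The single-file ordering case can be approximated with today's calls by using fsync as a crude write barrier. The sketch below is only an illustration under stated assumptions: the file layout (an 8-byte root pointer followed by length-prefixed records) and the function names are invented here, and it assumes the small header overwrite reaches the disk as one unit.

```python
import os
import struct

HEADER = struct.Struct("<Q")  # hypothetical layout: offset of the current root

def init_store(path: str) -> int:
    """Create an empty store whose header says "no root yet" (offset 0)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    os.pwrite(fd, HEADER.pack(0), 0)
    os.fsync(fd)
    return fd

def commit_root(fd: int, payload: bytes) -> int:
    """Append a new root record, then publish it through the header."""
    offset = os.lseek(fd, 0, os.SEEK_END)
    record = struct.pack("<I", len(payload)) + payload
    os.pwrite(fd, record, offset)  # small write; short writes not handled here
    # Barrier 1: the record must be on disk before anything points at it.
    os.fsync(fd)
    os.pwrite(fd, HEADER.pack(offset), 0)
    # Barrier 2: make the new root pointer itself durable.
    os.fsync(fd)
    return offset

def read_root(fd: int):
    """Return the payload of the current root, or None if there is none."""
    (offset,) = HEADER.unpack(os.pread(fd, HEADER.size, 0))
    if offset == 0:
        return None
    (length,) = struct.unpack("<I", os.pread(fd, 4, offset))
    return os.pread(fd, length, offset + 4)
```

A crash between the two barriers leaves the old root pointer in place, so a reader sees either the old tree or the new one. A real ordering API could express the same constraint without paying for two full flushes.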
Posted Mar 16, 2009 4:52 UTC (Mon) by dlang (✭ supporter ✭, #313)
so everything that you are screaming that the OS should guarantee can be broken by the hardware after the OS has done its best.
you can buy/configure your hardware to not behave this way, but it costs a bunch (either in money or in performance). similarly you can configure your filesystem to give you added protection, at a significant added cost in performance.
Posted Mar 16, 2009 11:00 UTC (Mon) by forthy (guest, #1525)
Any reasonable hard disk (SATA, SCSI) has write barriers which allow
file system implementers to actually implement atomicity.
Posted Mar 15, 2009 23:51 UTC (Sun) by vonbrand (subscriber, #4458)
I just don't understand all this "extN isn't crash-proof" whining...
Yes, Linux systems do crash on occasion. It is thankfully very rare.
Yes, hardware does fail. Even disks do fail. Yes, if you are unlucky you will lose data. Yes, the system could fail horribly and scribble all over the disk. Yes, the operating system could mess up its internal (and external) data structures.
It is just completely impossible for the operating system to "do the right thing with respect to whatever data the user values more", even more so in the face of random failures. Want performance? Then you have to do tricks caching/buffering data, disks are horribly _s_l_o_w_ when compared to your processor or memory.
Asking Linux developers to create some Linux-only beast of a filesystem in order to make application developers' lives easier doesn't cut it: there are other operating systems (and Linux systems with other filesystems) around, and always will be. Plus asking for a filesystem that is impossible in principle won't get you too far either.
Posted Mar 16, 2009 0:08 UTC (Mon) by man_ls (subscriber, #15091)
Listen, the sky might fall on our heads tomorrow and eventually we are all to die, we understand that. But until then we really want our filesystems to do atomic renames in the face of a crash (i.e. what the rest of the world [except POSIX] understands as "atomic"). Not durable, not crash-proof, not magically indestructible -- just all-or-nothing. Atomic.
Posted Mar 16, 2009 0:26 UTC (Mon) by khim (subscriber, #9252)
Yes, Linux systems do crash on occasion. It is thankfully very rare.
Depends on what hardware and what kind of drivers you have.
Want performance? Then you have to do tricks caching/buffering
data, disks are horribly _s_l_o_w_ when compared to your processor or memory.
The problem is: a fast filesystem is useless if it can't keep my data
safe. Microsoft knows this - that's why you don't need to explicitly
unmount a flash drive there. Yes, the cost is huge, it means flash wears down
faster and speed is horrible - but anything else is unacceptable. Oh, and I
know A LOT OF users who just turn off the computer at the end of the day. This
problem is somewhat mitigated by the design of current systems (the "power off"
button is actually a "shutdown" button), but people are finding ways to cope:
they just switch off the power to the desk.
The same thing applies to developers. They are lazy. Most application
writers do not use fsync and do not check the error code from
close. Yet if data is lost - the OS will be blamed. Is it fair to OS and FS
developers? Not at all! Can it be changed? Nope. Life is unfair - deal with it.
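For reference, a small sketch of what checking those results looks like. In Python the failures from write, fsync and close surface as OSError rather than return codes; the helper name append_checked is an assumption made for this example.

```python
import os

def append_checked(path: str, data: bytes) -> bool:
    """Append `data` to `path`; return True only if every step succeeded."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    except OSError:
        return False

    ok = True
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # errors such as EIO or ENOSPC can show up here
    except OSError:
        ok = False

    try:
        os.close(fd)  # close can fail too; its result is checked, not ignored
    except OSError:
        ok = False
    return ok
```

The caller then has to do something with a False result, which is exactly the step most applications skip.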
The whining started when it was found out that the new filesystem can lose
valuable data - where ext3 never does it in this fashion (it can do
this with O_TRUNC, but not with rename). This is a pretty serious regression
to most people. The approach "let's fix thousands upon thousands of
applications" (including proprietary ones) was thankfully rejected. This is
a good sign: this means Linux is almost ready to be usable by normal people.
Last time such a problem happened (the OSS->ALSA switch) the offered solution was
beyond the pale.
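The difference between the two update styles can be seen even without a crash. This sketch (function names are invented for illustration) reports what size the target name has in the middle of each kind of update:

```python
import os

def size_during_truncate_update(path: str, data: bytes) -> int:
    """Overwrite in place with O_TRUNC; report the size seen mid-update."""
    fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
    # The old content is already gone and the new content is not written yet.
    size_in_window = os.stat(path).st_size
    os.write(fd, data)
    os.close(fd)
    return size_in_window

def size_during_rename_update(path: str, data: bytes) -> int:
    """Write a new file and rename it over the old one; report mid-update size."""
    tmp_path = path + ".new"  # hypothetical temporary name
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.write(fd, data)
    os.fsync(fd)
    os.close(fd)
    # The old file is still complete under its name until the rename.
    size_in_window = os.stat(path).st_size
    os.rename(tmp_path, path)
    return size_in_window
```

With a 4-byte file as the starting point, the first function reports 0 and the second reports 4: the O_TRUNC style always has a window in which the name refers to an empty file, while the rename style never exposes one to other processes.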
Posted Apr 8, 2009 15:30 UTC (Wed) by pgoetz (subscriber, #4931)
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds