
Ted speaks again

Posted Mar 16, 2009 12:29 UTC (Mon) by forthy (guest, #1525)
In reply to: Ted speaks again by regala
Parent article: Garrett: ext4, application expectations and power management

Maybe some people didn't read the POSIX standard. But this doesn't actually matter, because Ted Ts'o didn't read it either. He's just ducking behind it, because POSIX makes no promise in case of a system crash. That's anal-retentive, because POSIX makes promises about ordering (ordering is strong), and a supposed-to-be-reliable file system should keep that order even after a crash. If ext4 doesn't (ext3 in data=ordered mode does, btrfs should do according to the FAQ, and will actually do - bugs happen - in 2.6.30, too, etc.), then the users will just not use ext4.

In the long thread of the previous discussion about this topic, the semantics behind the different operations were clearly described. POSIX promises atomicity of operations like rename(). Should this atomicity be preserved in case of a crash? Sane file system design says: yes. People who use create-write-close-rename want atomicity; if they also want durability (i.e. to know that the new file is actually committed), they need fsync, too. To be precise: fsync on the file and on the directory. Atomic operations are part of the POSIX file system semantics; durability is part of fsync's semantics.
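For concreteness, here is a minimal Python sketch of the create-write-close-rename pattern, with the two fsync calls (file and directory) as an opt-in. The function name `atomic_replace` and the `durable` flag are made up for illustration. It assumes a POSIX system, where `os.rename` maps to rename() and replaces an existing target. Whether the `durable=False` variant leaves either the old or the new content after a crash depends on the file system, which is exactly the point under dispute in this thread.

```python
import os
import tempfile


def atomic_replace(path, data, durable=False):
    """Replace `path` with `data` (bytes) via create-write-close-rename.

    Hypothetical helper for illustration. With durable=True the file and
    its directory are fsync'ed as well.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the rename
    # never crosses a file system boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()  # push Python's buffer down to the kernel
            if durable:
                os.fsync(f.fileno())  # commit the file content
        # The directory entry switches from old to new in one step.
        os.rename(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    if durable:
        # fsync the directory so the rename itself is committed.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A configuration update would call `atomic_replace(path, data)`, while the order-booking case would pass `durable=True`.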

When do you need durability? E.g. in a networked ordering system: if you receive an order over the network, you update your books, make the first half of the booking durable, confirm the order, and when you know that the confirmation is out, you finalize your booking and make that durable as well (double handshake). You don't need durability if you just update your configuration settings, but you do need atomicity to avoid losing all of them.

Note that POSIX still does not guarantee anything in case of a crash - complete loss of data and metadata is "allowed". Whether I or anyone else actually wants to use such a file system is a completely different question.



Ted speaks again

Posted Mar 16, 2009 13:17 UTC (Mon) by kleptog (subscriber, #1183) [Link] (2 responses)

As far as I can see all this is just changing expectations. Just a few years ago we were *happy* that our filesystems were readable after a crash (after running fsck). Then we progressed to being happy that after a crash we could use the filesystem without waiting hours for the fsck.

Now we're at the stage of worrying about exactly what the files should look like after a crash. Give it a few years and I'm sure we'll find something else to worry about. Also, POSIX was written a long time ago and deliberately vague on some points because they wanted to support many existing systems which all worked slightly differently.

NB: ISTM the solution to the 'lots of little files on ext3' problem is obvious. Create all the new files, then fsync them (fsync on ext3 may be slow, but it wouldn't be as much of a problem this way because all the data would be written out for all the files in one go). Finally rename them all.
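A rough Python sketch of that three-phase idea, assuming a POSIX system. The name `replace_many` and the `.tmp` naming scheme are invented for illustration. Error handling is left out, and keeping every file open until the fsync phase may run into the file descriptor limit for very large batches.

```python
import os


def replace_many(directory, files):
    """Write several files, fsync them all, then rename them all.

    `files` maps a final file name to its new content (bytes).
    Hypothetical helper for illustration only.
    """
    pending = []

    # Phase 1: create and write every new file under a temporary name.
    for name, data in files.items():
        tmp_path = os.path.join(directory, "." + name + ".tmp")
        f = open(tmp_path, "wb")
        f.write(data)
        f.flush()  # hand the data to the kernel, no fsync yet
        pending.append((f, tmp_path, os.path.join(directory, name)))

    # Phase 2: fsync everything back to back, so the data for all files
    # can be written out together instead of once per file.
    for f, _, _ in pending:
        os.fsync(f.fileno())
        f.close()

    # Phase 3: only now move the new files into place.
    for _, tmp_path, final_path in pending:
        os.rename(tmp_path, final_path)
```

The first fsync may still be slow on ext3, but the later ones should find most of the data already written.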

Ted speaks again

Posted Mar 16, 2009 15:25 UTC (Mon) by drag (guest, #31333) [Link] (1 responses)

> Now we're at the stage of worrying about exactly what the files should look like after a crash. Give it a few years and I'm sure we'll find something else to worry about. Also, POSIX was written a long time ago and deliberately vague on some points because they wanted to support many existing systems which all worked slightly differently.

Well ya. That's progress I guess. People always want better, demand better.

In the case of Linux you're traditionally dealing with halfway-decent hardware running on a UPS and run by professionals. That is, you're designing the OS to perform well and reliably when managed by a person who knows, understands, and cares quite a bit about the hardware they are using.

Now with consumer-oriented Linux devices you're dealing with people constantly putting excessive demands and loads on the system (especially graphics, which has been a weak point in stability for all systems including Linux), with devices that are cheap and mass-produced, are run by people who don't even understand what an OS is, and have to operate with as little power as possible, and with users who have very low tolerance for anything really technical.

In this specific case you have Ubuntu users running unstable graphics drivers on developer versions of the operating system. They were crashing their systems frequently, sometimes several times a day. They are doing weird things like overclocking RAM and all that crap.

They were finding that Ext4 was eating a significant portion of their file system, whereas Ext3 didn't.

But that is just the tip of the iceberg. You're going to deal with mobile phones with batteries that just 'crap out'. You're going to deal with mobile internet devices that get used in abusive environments. You're going to deal with handheld devices that suspend to RAM a dozen times a minute.

Try explaining to your grandma or to the guy down the street running a Moblin netbook that their system is not bootable anymore, or they can't use most of their applications, because POSIX doesn't give a shit that users get half their file system blown away when they shut their devices down incorrectly.

I don't know the best way to fix it, whether it's best to:
* Get the Kernel developers to care about maintaining a consistent file system image on the disk at all times
or
* Get the biggest clue stick in the world and collectively drive the "fsync is your friend" point home to all potential Linux developers.
or
* third option

I don't know.

But certainly demands and expectations change. Just like everything else in the computing landscape changes.

Ted speaks again

Posted Mar 16, 2009 16:54 UTC (Mon) by kleptog (subscriber, #1183) [Link]

> Try explaining to your grandma or to the guy down the street running a Moblin netbook that their system is not bootable anymore, or they can't use most of their applications, because POSIX doesn't give a shit that users get half their file system blown away when they shut their devices down incorrectly.

Honestly, I don't see why POSIX should care. It's a standard that describes an API that can be used by programs that wish to be portable. In principle it could be implemented on anything from the smallest handheld to the largest mainframe. Reliability after a crash is outside the purview of POSIX since the requirements are vastly different in different situations. People writing software for embedded devices don't rely on POSIX to give them crash safety, they read the manuals for the device to see what the manufacturers say they should do.

POSIX compliance is a property of the OS-userspace boundary, crash-safety is a property of an entire system. They're largely orthogonal.

In my opinion it's wrong for people to say that either behaviour is mandated by POSIX. IMHO it's neither mandated nor forbidden. Crash reliability is a contract between you and the OS+hardware+kernel. A ramdisk can be POSIX compliant yet is clearly not crash safe. Leave POSIX out of it, decide what Linux wants to guarantee. POSIX provides a way of guaranteeing a certain reliability but Linux is free to provide additional guarantees if it sees fit.

Maybe something for LSB? I'd like to see the language lawyers work out a way of defining "crash-safety" in a way that doesn't exclude things like ramdisks and several existing filesystems.

Ted speaks again

Posted Mar 16, 2009 14:54 UTC (Mon) by k8to (guest, #15413) [Link]

The idea that tytso hasn't read the posix standard. hah! A good one sir!

Ted speaks again

Posted Mar 17, 2009 0:08 UTC (Tue) by jlokier (guest, #52227) [Link]

> POSIX promises atomicity of operations like rename()

It promises atomicity of the directory modification done by rename, and every version of ext4 provides that. Renaming is equivalent to an atomic sequence of unlink() and link() calls.

You're confusing atomicity of the directory modification with serialising against the file content modification. POSIX doesn't promise anything about that in the absence of fsync() or fdatasync() used as a barrier between them. [I can't tell from the standard if fdatasync() is sufficient.]

