
Temporary files: RAM or disk?

Posted Jun 4, 2012 7:46 UTC (Mon) by dvdeug (subscriber, #10998)
In reply to: Temporary files: RAM or disk? by giraffedata
Parent article: Temporary files: RAM or disk?

The default policy should be that when I save a file, it's saved. If this idea that only fsync puts the file on the disk had been established, say, forty years ago, code would be littered with fsyncs (and no doubt filesystem writers would be cheating on that invariant and complaining that people overused fsync).

Right now, after I've spent 15 minutes working on something and saving my work along the way, if I lose my data because something didn't run fsync in that 15 minutes, I'm going to be royally pissed. It takes a lot of speed increase on a benchmark to make up for 15 minutes of lost work. The time that users lose when stuff goes wrong doesn't show up on benchmarks, though.



Temporary files: RAM or disk?

Posted Jun 4, 2012 7:57 UTC (Mon) by dlang (subscriber, #313) [Link]

The idea that your data isn't safe if the system crashes and you haven't done an fsync on that file (not just on any other file in the system) HAS been around for 40 years.

Current filesystems attempt to schedule data to be written to disk within about 5 seconds or so in most cases (I remember that at one point reiserfs allowed 30 seconds, and so was posting _amazing_ benchmark numbers for benchmarks that took <30 seconds to run), but it's possible for it to take longer, for the data to get to disk in the wrong order, or for it to only partially get to disk (again in some random order).

Because of this, applications that really care about their data in crash scenarios (databases, mail servers, log servers, etc.) do have fsync calls "littered" through their code. It's only recent "desktop" software that is missing this, in part because ext3 has such pathological behaviour on fsync.
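
For concreteness, here is a minimal sketch of that pattern, assuming POSIX; the file name and record are made up, not taken from any real mail server or database:

    /* A server appends a record and forces it to stable storage before
     * it would acknowledge anything to a client. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int append_record_durably(int fd, const char *rec, size_t len)
    {
        ssize_t written = write(fd, rec, len);
        if (written < 0 || (size_t)written != len)
            return -1;
        return fsync(fd);   /* the "littered" fsync: only after this
                               returns is it safe to acknowledge */
    }

    int main(void)
    {
        int fd = open("mail.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        const char *rec = "message 42 accepted\n";
        if (append_record_durably(fd, rec, strlen(rec)) != 0)
            perror("append_record_durably");
        close(fd);
        return 0;
    }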

Temporary files: RAM or disk?

Posted Jun 4, 2012 21:25 UTC (Mon) by giraffedata (subscriber, #1954) [Link]

> Current filesystems attempt to schedule data to be written to disk within about 5 seconds or so in most cases

Are you sure? The last time I looked at this was ten years ago, but at that time there were two main periods: every 5 seconds kswapd checked for dirty pages old enough to be worth writing out and "old enough" was typically 30 seconds. That was easy to confirm on a personal computer, because 30 seconds after you stopped working, you'd see the disk light flash.

But I know economies change, so I could believe dirty pages don't last more than 5 seconds in modern Linux and frequently updated files just generate 6 times as much I/O.

Temporary files: RAM or disk?

Posted Jun 4, 2012 23:11 UTC (Mon) by dlang (subscriber, #313) [Link]

This is a filesystem-specific time setting for the filesystem journal. I know it's ~5 seconds on ext3; it could be different on other filesystems.

Also, this is for getting the journal data to disk; if the journal carries just metadata, it may not push the file contents to disk (although it may, to prevent the file from containing blocks that haven't been written to yet and so contain random, old data).

Temporary files: RAM or disk?

Posted Jun 4, 2012 8:00 UTC (Mon) by neilbrown (subscriber, #359) [Link]

> The default policy should be that when I save a file it's saved.

You are, of course, correct.
However, this is a policy that is encoded in your editor, not in the filesystem. And I suspect most editors do exactly that, i.e. they call 'fsync' before 'close'.

But not every "open, write, close" sequence is an instance of "save a file". It may well be "create a temporary file which is completely uninteresting if I get interrupted". In that case an fsync would be pointless and costly. So the filesystem doesn't force an fsync on every close as the filesystem doesn't know what the 'close' means.

Any application that is handling costly-to-replace data should use fsync. An app that is handling cheap data should not. It is really that simple.
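
For illustration, a sketch of what "fsync before close" can look like in an editor's save path, assuming POSIX; writing to a temporary file and renaming it into place is one common way to do it, and all names here are illustrative:

    /* Write to a temporary file, fsync it, then rename over the
     * original, so a crash leaves either the old file or the new one,
     * never a torn mixture. */
    #include <fcntl.h>
    #include <stdio.h>    /* rename() */
    #include <unistd.h>

    int save_file(const char *path, const char *tmp_path,
                  const char *data, size_t len)
    {
        int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        if (close(fd) != 0)
            return -1;
        /* Atomic replace; a fully paranoid version would also fsync
         * the containing directory so the rename survives a crash. */
        return rename(tmp_path, path);
    }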

Temporary files: RAM or disk?

Posted Jun 4, 2012 9:11 UTC (Mon) by dvdeug (subscriber, #10998) [Link]

Another choice for a set of semantics would be to make programs that don't want to use a filesystem as a permanent storage area for files specify that. That is, fail safe, not fail destructive. As it is, no C program can portably save a file; fsync is not part of the C89/C99/C11 standards. Many other languages cannot save a file at all without using an interface to C.

I've never seen this in textbooks, and surely it should be front and center in any discussion of file I/O that if you're actually saving user data, you need to use fsync. It's not something you'll see very often in actual code. But should you actually be in a situation where this blows up in your face, it will be all your fault.
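
To make the portability point concrete, here is a sketch of the usual workaround on a POSIX system: everything below except fsync() and fileno() is standard C, and standard C alone stops at fflush(), which only hands the data to the kernel:

    #include <stdio.h>
    #include <unistd.h>   /* fsync() -- POSIX, absent from C89/C99/C11 */

    int save_durably(const char *path, const char *text)
    {
        FILE *fp = fopen(path, "w");
        if (fp == NULL)
            return -1;
        if (fputs(text, fp) == EOF || fflush(fp) != 0) {
            fclose(fp);
            return -1;
        }
        /* fflush() is as far as standard C can go: the data is now in
         * the kernel's page cache, not necessarily on disk. */
        if (fsync(fileno(fp)) != 0) {   /* the non-portable part */
            fclose(fp);
            return -1;
        }
        return fclose(fp);
    }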

Temporary files: RAM or disk?

Posted Jun 4, 2012 9:51 UTC (Mon) by dgm (subscriber, #49227) [Link]

It's not in the C standard because it has nothing to do with C itself, but with the underlying OS. You will find fsync() in POSIX, and it's portable as long as the target OS supports POSIX semantics (even Windows used to).

Temporary files: RAM or disk?

Posted Jun 4, 2012 10:24 UTC (Mon) by dvdeug (subscriber, #10998) [Link]

What do you mean, nothing to do with C itself? Linux is interpreting C semantics to mean that a standard C program cannot reliably produce permanent files. That's certainly legal, but it means that most people who learn to write C will learn to write code that doesn't reliably produce permanent files. Linux could interpret the C commands as asking for the creation of permanent files and force people who want temporary files to use special non-portable commands.

Temporary files: RAM or disk?

Posted Jun 4, 2012 10:33 UTC (Mon) by andresfreund (subscriber, #69562) [Link]

Mount your filesystems with O_SYNC and see how long you can endure that. Making everything synchronous by default is a completely useless behaviour. *NO* general-purpose OS in recent years does that.
Normally you need only a few points where you fsync (or equivalent) and quite a few more places where you write data...

Temporary files: RAM or disk?

Posted Jun 4, 2012 11:20 UTC (Mon) by neilbrown (subscriber, #359) [Link]

To be fair, O_SYNC is much stronger than what some people might reasonably want to expect.

O_SYNC means every write request is safe before the write system call returns.

An alternate semantic is that a file is safe once the last "close" on it returns. I believe this has been implemented for VFAT filesystems, which people sometimes like to pull out of their computers without due care.
It is quite an acceptable trade-off in that context.

This is nearly equivalent to always calling fsync() just before close().

Adding a generic mount option to impose this semantic on any fs might be acceptable. It might at least silence some complaints.
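
For reference, a sketch of the two durability points being contrasted, assuming POSIX; the file name is illustrative:

    #include <fcntl.h>
    #include <unistd.h>

    /* O_SYNC: every write() blocks until the data is stable. */
    int write_o_sync(const char *buf, size_t len)
    {
        int fd = open("journal.dat", O_WRONLY | O_CREAT | O_SYNC, 0644);
        if (fd < 0)
            return -1;
        ssize_t n = write(fd, buf, len);   /* durable when this returns */
        close(fd);
        return n == (ssize_t)len ? 0 : -1;
    }

    /* The "safe once closed" semantic: writes may sit in the page
     * cache, and one fsync just before close makes them durable in a
     * single batch. */
    int write_fsync_on_close(const char *buf, size_t len)
    {
        int fd = open("journal.dat", O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
            return -1;
        ssize_t n = write(fd, buf, len);   /* cached, not yet durable */
        int ok = (n == (ssize_t)len) && fsync(fd) == 0;
        close(fd);
        return ok ? 0 : -1;
    }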

Temporary files: RAM or disk?

Posted Jun 4, 2012 12:19 UTC (Mon) by andresfreund (subscriber, #69562) [Link]

> To be fair, O_SYNC is much stronger than what some people might reasonably want to expect.
> O_SYNC means every write request is safe before the write system call returns.
Hm. Not sure if that really is what people expect. But I can certainly see why it would be useful for some applications. It should probably be an fd option or some such, though? I would be really unhappy if an rm -rf or cp -r behaved that way.

Sometimes I wish userspace-controllable metadata transactions were possible with a sensible effort/interface...

Temporary files: RAM or disk?

Posted Jun 4, 2012 16:44 UTC (Mon) by dgm (subscriber, #49227) [Link]

Linux does not interpret C semantics. Linux implements POSIX semantics, and C programs use POSIX calls to access those semantics. So this has nothing to do with C, but with POSIX.

POSIX offers a tool to make sure your data is safely stored: the fsync() call. POSIX and the standard C library are careful not to make any promises regarding the reliability of writes, because this would mean a burden for all systems implementing those semantics, some of which do not even have a concept of fail-proof disk writes.

Now Linux could choose to deviate from the standard, but that would be exactly the reverse of portability, wouldn't it?

Temporary files: RAM or disk?

Posted Jun 4, 2012 15:37 UTC (Mon) by giraffedata (subscriber, #1954) [Link]

> Any application that is handling costly-to-replace data should use fsync. An app that is handling cheap data should not. It is really that simple.

Well, it's a little more complex because applications are more complex than just C programs. Sometimes the application is a person sitting at a workstation typing shell commands. The cost of replacing the data is proportional to the amount of data lost. For that application, the rule isn't that the application must use fsync, but that it must use a sync shell command when the cost of replacement has exceeded some threshold. But even that is oversimplified, because it makes sense for the system to do a system-wide sync automatically every 30 seconds or so to save the user that trouble.

On the other hand, we were talking before about temporary files on servers, some of which do adhere to the fsync dogma such that an automatic system-wide sync may be exactly the wrong thing to do.
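
For reference, the sync shell command mentioned above is essentially a wrapper around the POSIX sync() call; a minimal C equivalent, with the caveat that POSIX only requires sync() to *schedule* the writes (Linux in fact waits for them to complete):

    #include <unistd.h>

    int main(void)
    {
        sync();   /* flush every dirty buffer in the system, not one file */
        return 0;
    }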

Temporary files: RAM or disk?

Posted Jun 4, 2012 23:06 UTC (Mon) by dlang (subscriber, #313) [Link]

A system-wide sync can take quite a bit of time, and during that time it may block a lot of other activity (or make it so expensive that the system may as well be blocked).

