Temporary files: RAM or disk?

Posted Jun 1, 2012 3:36 UTC (Fri) by neilbrown (subscriber, #359)
In reply to: Temporary files: RAM or disk? by Cyberax
Parent article: Temporary files: RAM or disk?

> "Why not leave /tmp as it is?"

Isn't that answered in the article?

Because "as it is", /tmp imposes unnecessary disk IO which can be noticed when creating lots of small short-lived files. Let's see if we can make it faster, without making it any smaller.



Temporary files: RAM or disk?

Posted Jun 1, 2012 5:17 UTC (Fri) by wahern (subscriber, #37304) [Link]

I don't understand why this would be so. Small, short-lived files should only ever exist in the buffer cache. This smells like a filesystem problem, not a backing store problem.

Temporary files: RAM or disk?

Posted Jun 1, 2012 5:31 UTC (Fri) by neilbrown (subscriber, #359) [Link]

I believe that with ext2, small short-lived files only ever do exist in memory, just as you suggest. Few people store '/' on ext2 these days.

With journalling things become a bit more complex. You need to ensure that the various metadata are journalled in the right order, and by far the easiest way to do that is to place every updated block in the "next" transaction. So with ext3 journalling (if I understand it correctly), every metadata block that gets changed will be written to the journal on the next journal commit, and then to the filesystem.

A filesystem which does delayed allocation would be better placed to optimise out short-lived files completely, and maybe ext4/xfs/btrfs do better at this. However I suspect it is far from trivial to optimise out *all* storage updates for short-lived files, and I doubt it is something that fs developers optimise for.

So I think that you probably could see it as a filesystem problem, but I'm not sure that seeing it that way would lead to the best solution (but if some fs developers see this as a challenge and prove me wrong, I won't complain).

Temporary files: RAM or disk?

Posted Jun 1, 2012 7:03 UTC (Fri) by wookey (subscriber, #5501) [Link]

Before rehashing the _whole_ discussion here, I suggest people go read the thread, which is fairly thorough:
It starts here https://lwn.net/Articles/499534/

One thing Serge keeps coming back to is 'Please show us real-world improvements from /tmp-in-tmpfs, significant enough to make it a better _default_, given the well-documented problems'. This seems to be key, and I leave it to posters to make up their own minds about that. I certainly learned a lot from the thread. And there is clearly a longer-term issue to fix this properly.

Temporary files: RAM or disk?

Posted Jun 1, 2012 7:35 UTC (Fri) by wujj123456 (subscriber, #84680) [Link]

Application is indeed the key.

I always mount /tmp as tmpfs, but I have large RAM and know exactly what I am doing. I used to analyze ~10G of data, and reading from RAM was at least 300% faster, even including the heavy data processing. I also rendered movies using tmpfs when the size fit, and again observed a dramatic difference.

The problem is: if a user cares about that performance difference, he probably knows how to use tmpfs himself. Setting /tmp to tmpfs will confuse normal users when an application fails. Given the popularity of those big distros, it might not be a good move. Even Firefox doesn't store temp files in /tmp unless you override it in about:config. It might be worthwhile to check how existing applications are using tmpfs (/dev/shm). I have a feeling that most applications don't care at all.
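
For reference, a minimal sketch of what "mount /tmp as tmpfs" looks like; the size and options here are only examples, not recommendations:
# /etc/fstab entry
tmpfs   /tmp   tmpfs   size=2G,mode=1777,nosuid,nodev   0 0
# or as a one-off, without editing fstab
mount -t tmpfs -o size=2G,mode=1777 tmpfs /tmp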

Temporary files: RAM or disk?

Posted Jun 1, 2012 7:43 UTC (Fri) by neilbrown (subscriber, #359) [Link]

> The problem is: if a user cares about that performance difference, he probably knows how to use tmpfs himself.

Are you serious? The only people who care about performance are people who dig into the arcane configuration details of OSes? I don't think so.

Wasn't there a recent quote of the week along the lines of "We should make things simple and safe so that people don't *need* to carefully form good habits"? I think that applies here too, only so that people don't *need* to dig into arcane details.

I agree that we shouldn't make /tmp == tmpfs the default while it causes problems. But I do think that we should work to fix the problems so that we can do it safely.

Temporary files: RAM or disk?

Posted Jun 2, 2012 7:01 UTC (Sat) by Los__D (guest, #15263) [Link]

"if a user cares about that performance difference, he probably knows how to use tmpfs himself."
Errrr... Yeah, right.

Temporary files: RAM or disk?

Posted Jun 2, 2012 23:41 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

I used to analyze ~10G of data, and reading from RAM was at least 300% faster, ...

You imply that with /tmp in a disk-based filesystem, you didn't read from RAM. Why would that be? Why weren't your files in cache?

Temporary files: RAM or disk?

Posted Jun 3, 2012 15:09 UTC (Sun) by bronson (subscriber, #4806) [Link]

I bet the files were cached and reads took the same amount of time. The slowdown would be due to writes. tmpfs is allowed to lose the disk contents on reboot, filesystems aren't.

I can write 1G of data to tmpfs, read it back, and delete it (a typical scientific profile), without ever expecting it to hit rust. I'd be very VERY disappointed in any filesystem that allowed its write buffers to get that far behind.

Temporary files: RAM or disk?

Posted Jun 3, 2012 17:44 UTC (Sun) by giraffedata (subscriber, #1954) [Link]

I'd be very VERY disappointed in any filesystem that allowed its write buffers to get that far behind.

Getting this far behind is a valuable feature, and any filesystem that doesn't let you do it is lacking. Someone pointed out earlier that the more modern ext3 is incapable of getting that far behind, whereas the less modern ext2 can. That's a regression (but it effectively explains why a tmpfs /tmp could be faster than an ext3 one).

I've seen filesystems that have mount options and file attributes that specifically indicate that files are temporary -- likely to be overwritten or deleted soon -- so that the page replacement algorithm doesn't waste valuable I/O time cleaning the file's pages.

Furthermore, many people believe that whenever you want data to be hardened to disk, you should fsync. Given that philosophy, the default kernel policy should be not to write the data to disk until you need the memory (with some allowance for forecasting future need for memory).

Temporary files: RAM or disk?

Posted Jun 4, 2012 7:46 UTC (Mon) by dvdeug (subscriber, #10998) [Link]

The default policy should be that when I save a file it's saved. If they had created this idea that only fsync puts the file on the disk, say, forty years ago, code would be littered with fsyncs (and no doubt filesystem writers would be cheating on that invariant and complaining that people overused fsync.)

Right now, after I've spent 15 minutes working on something and saving my work along the way, if I lose my data because something didn't run fsync in that 15 minutes, I'm going to be royally pissed. It takes a lot of speed increase on a benchmark to make up for 15 minutes of lost work. The time that users lose when stuff goes wrong doesn't show up on benchmarks, though.

Temporary files: RAM or disk?

Posted Jun 4, 2012 7:57 UTC (Mon) by dlang (subscriber, #313) [Link]

the idea that your data isn't safe if the system crashes and you haven't done an fsync on that file (not just any other file in the system) HAS been around for 40 years.

current filesystems attempt to schedule data to be written to disk within about 5 seconds or so in most cases (I remember that at one point reiserfs allowed for 30 seconds, and so was posting _amazing_ benchmark numbers, for benchmarks that took <30 seconds to run), but it's possible for it to take longer, or for the data to get to disk in the wrong order, or partially get to disk (again in some random order).

because of this, applications that really care about their data in crash scenarios (databases, mail servers, log servers, etc.) do have fsync calls "littered" through their code. It's only recent "desktop" software that is missing this, in part because ext3 has such pathological behaviour on fsync.

Temporary files: RAM or disk?

Posted Jun 4, 2012 21:25 UTC (Mon) by giraffedata (subscriber, #1954) [Link]

current filesystems attempt to schedule data to be written to disk within about 5 seconds or so in most cases

Are you sure? The last time I looked at this was ten years ago, but at that time there were two main periods: every 5 seconds kswapd checked for dirty pages old enough to be worth writing out and "old enough" was typically 30 seconds. That was easy to confirm on a personal computer, because 30 seconds after you stopped working, you'd see the disk light flash.

But I know economies change, so I could believe dirty pages don't last more than 5 seconds in modern Linux and frequently updated files just generate 6 times as much I/O.
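
For what it's worth, both intervals are still visible as tunables on current kernels; a quick sketch (the values in the comments are the usual defaults, but distributions may change them):
# how often the writeback threads wake up, in centiseconds (500 = 5 seconds)
cat /proc/sys/vm/dirty_writeback_centisecs
# how old dirty data must be before it is written out (3000 = 30 seconds)
cat /proc/sys/vm/dirty_expire_centisecs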

Temporary files: RAM or disk?

Posted Jun 4, 2012 23:11 UTC (Mon) by dlang (subscriber, #313) [Link]

this is a filesystem-specific time setting for the filesystem journal. I know it's ~5 seconds on ext3; it could be different on other filesystems.

also, this is for getting the journal data to disk; if the journal is just metadata it may not push the file contents to disk (although it may, to prevent the file from containing blocks that haven't been written yet and so contain random, old data).

Temporary files: RAM or disk?

Posted Jun 4, 2012 8:00 UTC (Mon) by neilbrown (subscriber, #359) [Link]

> The default policy should be that when I save a file it's saved.

You are, of course, correct.
However this is a policy that is encoded in your editor, not in the filesystem. And I suspect most editors do exactly that. i.e. they call 'fsync' before 'close'.

But not every "open, write, close" sequence is an instance of "save a file". It may well be "create a temporary file which is completely uninteresting if I get interrupted". In that case an fsync would be pointless and costly. So the filesystem doesn't force an fsync on every close as the filesystem doesn't know what the 'close' means.

Any application that is handling costly-to-replace data should use fsync. An app that is handling cheap data should not. It is really that simple.

Temporary files: RAM or disk?

Posted Jun 4, 2012 9:11 UTC (Mon) by dvdeug (subscriber, #10998) [Link]

Another choice for a set of semantics would be to make programs that don't want to use a filesystem as a permanent storage area for files specify that. That is, fail safe, not fail destructive. As it is, no C program can portably save a file; fsync is not part of the C89/C99/C11 standards. Many other languages can not save a file at all without using an interface to C.

I've never seen this in textbooks, and surely it should be front and center in any discussion of file I/O that if you're actually saving user data, you need to use fsync. It's not something you'll see very often in actual code. But should you actually be in a situation where this blows up in your face, it will be all your fault.

Temporary files: RAM or disk?

Posted Jun 4, 2012 9:51 UTC (Mon) by dgm (subscriber, #49227) [Link]

It's not in the C standard because it has nothing to do with C itself, but with the underlying OS. You will find fsync() in POSIX, and it's portable as long as the target OS supports POSIX semantics (even Windows used to).

Temporary files: RAM or disk?

Posted Jun 4, 2012 10:24 UTC (Mon) by dvdeug (subscriber, #10998) [Link]

What do you mean, nothing to do with C itself? Linux is interpreting C semantics to mean that a standard C program cannot reliably produce permanent files. That's certainly legal, but it means that most people who learn to write C will learn to write code that doesn't reliably produce permanent files. Linux could interpret the C commands as asking for the creation of permanent files and force people who want temporary files to use special non-portable commands.

Temporary files: RAM or disk?

Posted Jun 4, 2012 10:33 UTC (Mon) by andresfreund (subscriber, #69562) [Link]

Mount your filesystems with O_SYNC and see how long you can endure that. Making everything synchronous by default is a completely useless behaviour. *NO* general-purpose OS in recent years does that.
Normally you need only very few points where you fsync (or equivalent) and quite a few more places where you write data...

Temporary files: RAM or disk?

Posted Jun 4, 2012 11:20 UTC (Mon) by neilbrown (subscriber, #359) [Link]

To be fair, O_SYNC is much stronger than what some people might reasonably want to expect.

O_SYNC means every write request is safe before the write system call returns.

An alternate semantic is that a file is safe once the last "close" on it returns. I believe this has been implemented for VFAT filesystems which people sometimes like to pull out of their computers without due care.
It is quite an acceptable trade-off in that context.

This is nearly equivalent to always calling fsync() just before close().

Adding a generic mount option to impose this semantic on any fs might be acceptable. It might at least silence some complaints.
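
For the VFAT case that is the "flush" mount option; a sketch, assuming the stick shows up as /dev/sdb1:
# push dirty data out soon after each close, so the stick can be pulled sooner
mount -o flush /dev/sdb1 /mnt/usb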

Temporary files: RAM or disk?

Posted Jun 4, 2012 12:19 UTC (Mon) by andresfreund (subscriber, #69562) [Link]

> To be fair, O_SYNC is much stronger than what some people might reasonably want to expect.
> O_SYNC means every write request is safe before the write system call returns.
Hm. Not sure if that really is what people expect. But I can certainly see why it would be useful for some applications. It should probably be an fd option or some such, though? I would be really unhappy if a rm -rf or cp -r behaved that way.

Sometimes I wish userspace-controllable metadata transactions were possible with a sensible effort/interface...

Temporary files: RAM or disk?

Posted Jun 4, 2012 16:44 UTC (Mon) by dgm (subscriber, #49227) [Link]

Linux does not interpret C semantics. Linux implements POSIX semantics, and C programs use POSIX calls to access those semantics. So this has nothing to do with C, but POSIX.

POSIX offers a tool to make sure your data is safely stored: the fsync() call. POSIX and the standard C library are careful not to make any promises regarding the reliability of writes, because this would mean a burden for all systems implementing those semantics, some of which do not even have a concept of fail-proof disk writes.

Now Linux could choose to deviate from the standard, but that would be exactly the reverse of portability, wouldn't it?

Temporary files: RAM or disk?

Posted Jun 4, 2012 15:37 UTC (Mon) by giraffedata (subscriber, #1954) [Link]

Any application that is handling costly-to-replace data should use fsync. An app that is handling cheap data should not. It is really that simple.

Well, it's a little more complex because applications are more complex than just C programs. Sometimes the application is a person sitting at a workstation typing shell commands. The cost of replacing the data is proportional to the amount of data lost. For that application, the rule isn't that the application must use fsync, but that it must use a sync shell command when the cost of replacement has exceeded some threshold. But even that is oversimplified, because it makes sense for the system to do a system-wide sync automatically every 30 seconds or so to save the user that trouble.

On the other hand, we were talking before about temporary files on servers, some of which do adhere to the fsync dogma such that an automatic system-wide sync may be exactly the wrong thing to do.

Temporary files: RAM or disk?

Posted Jun 4, 2012 23:06 UTC (Mon) by dlang (subscriber, #313) [Link]

a system-wide sync can take quite a bit of time, and during that time it may block a lot of other activity (or make it so expensive that the system may as well be blocked)

Temporary files: RAM or disk?

Posted Jun 4, 2012 9:39 UTC (Mon) by dgm (subscriber, #49227) [Link]

Ext3 does worse than ext2 because it tries to keep metadata consistency, but that is useless for a tmp filesystem, where all files are going to be wiped out on reboot or crash.

It's not a regression, but a conscientious design decision, and that use case is outside of what Ext3 is good for.

ext3 regression: unnecessarily syncs temporary files

Posted Jun 4, 2012 15:43 UTC (Mon) by giraffedata (subscriber, #1954) [Link]

It's not a regression, but a conscientious design decision

It's a regression due to a conscious design decision. Regression doesn't mean mistake, it means the current thing does something worse than its predecessor. Software developers have a bias against regressions, but they do them deliberately, and for the greater good, all the time.

ext3 regression: unnecessarily syncs temporary files

Posted Jun 4, 2012 21:24 UTC (Mon) by dgm (subscriber, #49227) [Link]

Regression does mean mistake, and this is clearly not the case.

A more enlightening example: the latest version of the kernel requires more memory than 0.99 but nobody could possibly claim this is a regression. If anything, it's a trade-off.

ext3 regression: unnecessarily syncs temporary files

Posted Jun 5, 2012 1:42 UTC (Tue) by giraffedata (subscriber, #1954) [Link]

the latest version of the kernel requires more memory than 0.99 but nobody could possibly claim this is a regression

I claim that's a regression. Another area where kernel releases have steadily regressed: they run more slowly. And there are machines current kernels won't run on at all that previous ones could. Another regression.

I'm just going by the plain meaning of the word (informed somewhat by its etymology, the Latin for "step backward"), and the fact that it's really useful to be able to talk about the steps backward without regard to whether they're worth it.

Everyone recognizes that sometimes you have to regress in some areas in order to progress in others. And sometimes it's a matter of opinion whether the tradeoff is right. For example, regression testing often uncovers the fact that the new release runs so much slower than the previous one that some people consider it a mistake and it gets "fixed."

I like to use Opera, but almost every upgrade I've ever done has contained functional regressions, usually intentional. As they are often regressions that matter to me, I tend not to upgrade Opera (and it makes no difference to me whether it's a bug or not).

ext3 regression: unnecessarily syncs temporary files

Posted Jun 5, 2012 8:35 UTC (Tue) by dgm (subscriber, #49227) [Link]

Whatever, keep using 0.99 then, or better go back to first version that just printed AAAABBBB on the screen. Everything from there is a regression.

ext3 regression: unnecessarily syncs temporary files

Posted Jun 5, 2012 14:25 UTC (Tue) by giraffedata (subscriber, #1954) [Link]

Whatever, keep using 0.99 then, or better go back to first version that just printed AAAABBBB on the screen. Everything from there is a regression.

Everything since then is a regression in certain areas, but you seem to be missing the essential point that I stated several ways: these regressions come along with progressions. The value of the progressions outweighs the cost of the regressions. I hate in some way every "upgrade" I make, but I make them anyway.

Everyone has to balance the regressions and the progressions in deciding whether to upgrade, and distributors tend to make sure the balance is almost always in favor of the progressions. We can speak of a "net regression," which most people would not find current Linux to be with respect to 0.99.

Temporary files: RAM or disk?

Posted Jun 4, 2012 15:51 UTC (Mon) by bronson (subscriber, #4806) [Link]

No. There are so many buggy, non-fsyncing programs out there that, if a filesystem has 1G of writes outstanding, it's almost certainly going to lose many hours of work. (Unless it's manually flushing every 20 seconds or so, in which case that's fine but also slower than tmpfs).

In an ideal world, you're exactly right. In today's world, that would be fairly dangerous.

> I've seen filesystems that have mount options and file attributes that specifically indicate that files are temporary

Agreed, but if you're remounting part of your hierarchy with crazy mount options, why not just use tmpfs?

Temporary files: RAM or disk?

Posted Jun 4, 2012 23:08 UTC (Mon) by dlang (subscriber, #313) [Link]

because tmpfs just uses RAM? And while you can add swap to give you more space, the use of the swap will not be targeted. This means that you may end up with things swapped out that you really would rather have remained active, even if the result was that it took a little more time to retrieve a temporary file.

Temporary files: RAM or disk?

Posted Jun 5, 2012 7:05 UTC (Tue) by bronson (subscriber, #4806) [Link]

That's true, that's an important difference. But you could have a similar situation with the filesystem-with-options, right? If the filesystem uses a lot of memory, the important things could get swapped out as well.

Temporary files: RAM or disk?

Posted Jun 5, 2012 7:19 UTC (Tue) by dlang (subscriber, #313) [Link]

True, but the difference is that it would need to be a very poorly written filesystem to eat up more memory than the contents that it's holding. And it's much easier to tell where the memory is being used, and therefore make an intelligent decision about what to write to disk (and what to throw away), than when it all has to be stored in memory and the only disk backing you have is swap.

Also, reading and writing swap tends to be rather inefficient compared to normal I/O (data ends up very fragmented on disk, bearing no resemblance to any organization that it had in RAM, let alone the files being stored in tmpfs).

Temporary files: RAM or disk?

Posted Jun 5, 2012 15:33 UTC (Tue) by giraffedata (subscriber, #1954) [Link]

reading and writing swap tends to be rather inefficient compared to normal I/O (data ends up very fragmented on disk, bearing no resemblance to any organization that it had in RAM, let alone the files being stored in tmpfs).

I believe the tendency is the other way around. One of the selling points for tmpfs for me is that reading and writing swap is more efficient than reading and writing a general purpose filesystem. First, there aren't inodes and directories to pull the head around. Second, writes stream out sequentially on disk, eliminating more seeking.

Finally, I believe it's usually the case that, for large chunks of data, the data is referenced in the same groups in which it becomes least recently used. A process loses its timeslice and its entire working set ages out at about the same time and ends up in the same place on disk. When it gets the CPU again, it faults in its entire working set at once. For a large temporary file, I believe it is even more pronounced - unlike many files, a temporary file is likely to be accessed in passes from beginning to end. I believe general purpose filesystems are only now gaining the ability to do the same placement as swapping in this case; to the extent that they succeed, though, they can at best reach parity.

In short, reading and writing swap has been (unintentionally) optimized for the access patterns of temporary files, where general purpose filesystems are not.

Temporary files: RAM or disk?

Posted Jun 6, 2012 6:53 UTC (Wed) by Serge (guest, #84957) [Link]

> I believe the tendency is the other way around. One of the selling points for tmpfs for me is that reading and writing swap is more efficient than reading and writing a general purpose filesystem. First, there aren't inodes and directories to pull the head around.

It's not that simple. Tmpfs is not a "plain data" filesystem; you can create directories there, so it has to store all the metadata as well. It also has inodes internally.

> Second, writes stream out sequentially on disk, eliminating more seeking.

This could be true if swap was empty. Same when you write to the empty filesystem. But what if it was not empty? You get the same swap fragmentation and seeking as you would get in any regular filesystem.

> In short, reading and writing swap has been (unintentionally) optimized for the access patterns of temporary files, where general purpose filesystems are not.

And a filesystem is intentionally optimized for storing files. Swap is not plain data storage, otherwise "suspend to disk" could not work. Swap has its own internal format; there are even different versions of it (`man mkswap` reveals v0 and v1). I.e. instead of writing through one filesystem level (ext3) you write through two levels (tmpfs+swap).

Things get worse when you start reading. When you read something from ext3, the oldest part of the file cache is dropped and the data is placed in RAM. But reading from swap means that your RAM is full, and in order to read a page from swap you must first write another page there. I.e. sequential read from ext3 turns into random write+read from swap.

Temporary files: RAM or disk?

Posted Jun 6, 2012 15:24 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

> But reading from swap means that your RAM is full, and in order to read a page from swap you must first write another page there. I.e. sequential read from ext3 turns into random write+read from swap.

_Writing_ to swap means that your RAM is full (possibly including things like clean cache which are currently higher priority, but could be dropped at need). _Reading_ from swap implies only that something previously written to swap is needed in RAM again. There could be any amount of free space at that point. Even if RAM does happen to be full, the kernel can still drop clean data from the cache to make room, just as with reading from ext3.

Temporary files: RAM or disk?

Posted Jun 6, 2012 17:43 UTC (Wed) by dgm (subscriber, #49227) [Link]

Yes, merely reading from swap doesn't imply that your RAM is full. What is true is that _when_ your RAM is full (notice that I don't say "if") it _may_ imply a write to swap, depending on how dirty the page cache is. The problem is, tmpfs is a factor that contributes a lot to polluting the page cache. Temporary files are created to be written and then re-read shortly afterwards, so all pages used by tmpfs are expected to be dirty.

All of this is of no consequence on system startup, when the page cache is mostly clean. Once the system has been up for a while, though... I think a few tests have to be done.

Temporary files: RAM or disk?

Posted Jun 7, 2012 2:28 UTC (Thu) by giraffedata (subscriber, #1954) [Link]

... First, there aren't inodes and directories to pull the head around.
It's not that simple. Tmpfs is not a "plain data" filesystem; you can create directories there, so it has to store all the metadata as well. It also has inodes internally.

I was talking about disk structures. Inodes and directory information don't go into the swap space, so they don't pull the head around.

(But there's an argument in favor of regular filesystem /tmp: if you have lots of infrequently accessed small files, tmpfs will waste memory).

Second, writes stream out sequentially on disk, eliminating more seeking.
This could be true if swap was empty. Same when you write to the empty filesystem. But what if it was not empty? You get the same swap fragmentation and seeking as you would get in any regular filesystem.

It's the temporary nature of the data being swapped (and the strategies the kernel implements based on that expectation) that makes the data you want at any particular time less scattered in swap space than in a typical filesystem that has to keep copious eternally growing files forever. I don't know exactly what policies the swapper follows (though I have a pretty good idea), but if it were no better at storing anonymous process data than ext3 is at storing file data, we would really have to wonder at the competence of the people who designed it. And my claim is that since it's so good with process anonymous data, it should also be good with temporary files, since they're used almost the same way.

in order to read a page from swap you must first write another page there.

Actually, the system does the same thing for anonymous pages as it does for file cache pages: it tries to clean the pages before they're needed so that when a process needs to steal a page frame it usually doesn't have to wait for a page write. Also like file cache, when the system swaps a page in, it tends to leave the copy on disk too, so if it doesn't get dirty again, you can steal its page frame without having to do a page out.

Temporary files: RAM or disk?

Posted Jun 7, 2012 13:15 UTC (Thu) by njs (guest, #40338) [Link]

I don't know about tmpfs, but my experience is: if I have a process with a large (multi-gigabyte) working set, and it goes to sleep and gets swapped out, then there's no point in waking it back up again; I might as well kill it and start over. At least on our compute servers (running some vaguely recent Ubuntu, IIRC), swap-in is definitely not doing successful readahead. I've often wished for some hack that would just do a sequential read through the swap file to load one process back into memory; it would be hundreds of times faster.

Temporary files: RAM or disk?

Posted Jun 7, 2012 13:28 UTC (Thu) by Jonno (subscriber, #49613) [Link]

If you have enough free memory at the time you want to swap in that process, try running "sudo swapoff -a ; sudo swapon -a"; it will sequentially read all of swap into memory, with no random access.

I find that if I have two processes with large working sets causing swapping, and kill one of them, doing a swapoff will get the other one performant again much faster than letting it swap in only the stuff it needs as it needs it.

Temporary files: RAM or disk?

Posted Jun 7, 2012 15:44 UTC (Thu) by giraffedata (subscriber, #1954) [Link]

At least on our compute servers (running some vaguely recent Ubuntu, IIRC), swap-in is definitely not doing successful readahead

Good information.

That's probably a good reason to use a regular filesystem instead of tmpfs for large temporary files.

I just checked, and the only readahead tmpfs does is the normal swap readahead, which consists of reading an entire cluster of pages when one of the pages is demanded. A cluster is a group of pages that were swapped out at the same time, so they are likely to be re-referenced at the same time and are written at the same spot on the disk. But this strategy won't produce streaming reads the way typical filesystem readahead does.

And the kernel default size of the cluster is 8 pages. You can control it with /proc/sys/vm/page-cluster, though. I would think on a system with multi-gigabyte processes, a much larger value would be optimal.
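
A sketch of inspecting and raising it; note that the sysctl value is the log2 of the cluster size, so the default of 3 corresponds to the 8 pages mentioned above:
# current swap readahead: 2^3 = 8 pages
cat /proc/sys/vm/page-cluster
# read 2^6 = 64 pages per fault instead (the value is only an example)
echo 6 > /proc/sys/vm/page-cluster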

Temporary files: RAM or disk?

Posted Jun 11, 2012 14:51 UTC (Mon) by kleptog (subscriber, #1183) [Link]

This is actually related to another problem I ran into recently: is there some way to see what is actually in swap? I know /proc/<pid>/smaps gives you information about which blocks are in swap. But I can't see a way to get information about the order. That is, is my swap fragmented?

Temporary files: RAM or disk?

Posted Jun 7, 2012 21:36 UTC (Thu) by quotemstr (subscriber, #45331) [Link]

> I've often wished for some hack that would just do a sequential read through the swap file to load one process back into memory

Windows 8 will do that for modern applications. http://blogs.msdn.com/b/b8/archive/2012/04/17/reclaiming-...

Temporary files: RAM or disk?

Posted Jun 8, 2012 0:15 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

I've often wished for some hack that would just do a sequential read through the swap file to load one process back into memory
Windows 8 will do that for modern applications.

When njs says "hack" I think he means something an intelligent user can invoke to override the normal system paging strategy because he knows a process is going to be faulting back much of its memory anyway.

The Windows 8 thing is automatic, based on an apparently pre-existing long-term scheduling facility. Some applications get long-term scheduled out, aka "put in the background," aka "suspended," mainly so devices they are using can be powered down and save battery energy. But there is a new feature that also swaps all the process' memory out when it gets put in the background, and the OS takes care to put all the pages in one place. Then, when the process gets brought back to the foreground, the OS brings all those pages back at once, so the process is quickly running again.

This of course requires applications that explicitly go to sleep, as opposed to just quietly not touching most of their memory for a while, and then suddenly touching it all again.

Temporary files: RAM or disk?

Posted Jun 8, 2012 0:59 UTC (Fri) by CycoJ (guest, #70454) [Link]

I encourage anyone who wants to see the benefit of having a tmpfs in RAM to try relocating the firefox profile to a tmpfs (see https://wiki.archlinux.org/index.php/Firefox_Ramdisk). I've recently done this on my new system, which normally has plenty of RAM to spare. The difference is quite impressive, even though I have a latest generation SSD. Mind you, I've been bitten by this once. I kept too many tabs open while doing some simulation work on the side. When I tried to open one more tab, the whole system went into a complete freeze because it ran out of RAM (and I don't have a swap partition). Obviously this happened when I was just booking a flight online, with only one last ticket available at this price.

Temporary files: RAM or disk?

Posted Jun 8, 2012 17:14 UTC (Fri) by apoelstra (subscriber, #75205) [Link]

For about a year now, I have had my ~/.mozilla mounted as a tmpfs. I don't have an SSD, but I have 2Gb of RAM, and Firefox has never run out of memory for me.

It's screaming fast. I originally started doing this when I had my $HOME mounted over SSHFS, and Firefox would single-handedly saturate my pipe and take forever to do anything. Its disk IO is (was) obscene.

This also has the benefit (if you want to see it that way) that my history does not get so filled with garbage, since every reboot the profile is reset. I have a line in my .Xclients which copies a template .mozilla into place, so that I start off with Noscript, Adblock, Tor, etc, all enabled, and my history is seeded with LWN and other sites I frequent.
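
A minimal sketch of that arrangement; the paths, size and uid/gid are placeholders for whatever your system uses:
# /etc/fstab: keep the whole profile in RAM
tmpfs   /home/me/.mozilla   tmpfs   size=512M,uid=1000,gid=1000,mode=0700   0 0
# ~/.Xclients: reseed the fresh tmpfs from a template kept on disk
cp -a /home/me/.mozilla-template/. /home/me/.mozilla/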

Temporary files: RAM or disk?

Posted Jun 9, 2012 15:51 UTC (Sat) by Serge (guest, #84957) [Link]

> I encourage anyone who wants to see the benefit of having a tmpfs in RAM to try relocating the firefox profile to a tmpfs (see https://wiki.archlinux.org/index.php/Firefox_Ramdisk). The difference is quite impressive, even though I have a latest generation SSD.

It might be a good idea to save some SSD writes, but does it really increase performance? My ~/.mozilla profile is about 2GB, so it was not a good idea to put it in RAM, but I tried that with a new empty profile and noticed no difference. What should I look at?

PS: it's not related to the /tmp dir, I assume, but it's still interesting to see some tmpfs benefits for a popular application.

Temporary files: RAM or disk?

Posted Jun 2, 2012 23:05 UTC (Sat) by mirabilos (subscriber, #84359) [Link]

This is probably the best-written argument, one that can even defeat this "Serge" person's queries for real-world examples (since of course those would be highly subjective…)

Temporary files: RAM or disk?

Posted Jun 4, 2012 7:10 UTC (Mon) by Serge (guest, #84957) [Link]

> So with ext3 journalling [...] changed will be written to the journal on the next journal commit

Probably. But it won't trigger disk access. You can check that:
for i in `seq 5`; do echo 123 > f; rm -f f; grep sda1 /proc/diskstats; done
(replace "sda1" with the disk you write to)

If file creation/deletion (metadata change) triggers disk access you'll see all the lines different. But if the lines are the same, then there was no disk access.

Cache still works for journaled filesystems. Linux kernel is written by smart people, yeah.

PS: I've seen reiserfs trigger a "read" in such a test. You can see a description of the diskstats numbers in:
http://www.kernel.org/doc/Documentation/iostats.txt

Temporary files: RAM or disk?

Posted Jun 4, 2012 7:27 UTC (Mon) by neilbrown (subscriber, #359) [Link]

> If file creation/deletion (metadata change) triggers disk access you'll see all the lines different.

This doesn't agree with my understanding of ext3 journalling, so maybe I expressed it poorly.

If you put a 5 second sleep in that loop, I expect you would see changes. I do - once I found a suitably quiet ext3 filesystem to test on.

The metadata blocks do go into the next transaction, but transactions can live in memory for up to 5 seconds before they are flushed.

Temporary files: RAM or disk?

Posted Jun 4, 2012 10:17 UTC (Mon) by Serge (guest, #84957) [Link]

> If you put a 5 second sleep in that loop, I expect you would see changes.

The exact number of seconds depends on /proc/sys/vm/dirty_*_centisecs value and /proc/sys/vm/laptop_mode...

Anyway, are you talking about file content or file name being written to disk in 5 seconds? Or both?

We can check whether the content of a deleted file is written to disk; run:
for i in `seq 100`; do dd if=/dev/zero of=f bs=1M count=10; rm -f f; done
then check /proc/diskstats or `iostat -k`. If you see writes increase by about 1GB, your filesystem writes data even for deleted files. My ext3 does not.

> I do - once I found a suitably quiet ext3 filesystem to test on.

Try /boot. :) Or just insert some USB flash stick and create ext3 there.

Temporary files: RAM or disk?

Posted Jun 4, 2012 11:28 UTC (Mon) by neilbrown (subscriber, #359) [Link]

No. The "5 seconds" that I was taking about is not a /proc/sys/vm/dirty* number. It is ext3 (and presumably ext4) specific.
It defaults to 5 seconds (JBD_DEFAULT_MAX_COMMIT_AGE) and can be changed by the "commit=nn" mount option.

That many seconds after a journal transaction has been opened, it is closed and flushed - if it hadn't been closed already.

It is the metadata that is written to the journal - inodes, free-block bitmaps, directory names etc.
The file contents are handled differently for different settings of "data=".
ordered: data that relates to the metadata is flushed before the metadata is written to the journal
writeback: data is written according to /proc/sys/vm/dirty* rules
journal: data is written to the journal with the metadata.

I'm not sure what the default is today. If you create then delete a file, the data will not go to disk, except possibly for "data=journal". But the metadata will.
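
Both knobs are ordinary mount options; a sketch of an fstab line with them spelled out (device and values purely illustrative):
# 5-second journal commit interval, ordered data mode
/dev/sda1   /   ext3   defaults,commit=5,data=ordered   0 1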

Temporary files: RAM or disk?

Posted Jun 4, 2012 15:17 UTC (Mon) by Serge (guest, #84957) [Link]

> If you create then delete a file, the data will not go to disk, except possibly for "data=journal". But the metadata will.

That's harder to test. Maybe compare amount of writes generated by something like:
for i in `seq 10`; do touch $i; rm -f $i; done
with amount of writes generated by:
for i in `seq 1000`; do touch $i; rm -f $i; done
Every creation/deletion is written to disk if the latter line generates about 100 times more writes. On my ext3 I see roughly the same number of writes...

But, anyway, looks like it's not a problem for /tmp then, meaning that ext2 would not be (noticeably) better than ext3 in /tmp use cases.

Temporary files: RAM or disk?

Posted Jun 4, 2012 14:13 UTC (Mon) by hummassa (subscriber, #307) [Link]

On my work machine, on ext4, all lines are different.

Temporary files: RAM or disk?

Posted Jun 1, 2012 13:21 UTC (Fri) by Richard_J_Neill (subscriber, #23093) [Link]

It seems to me that the solution would be an extra flag for a mountpoint that says "files put in this directory should only be flushed to disk with low priority", i.e. have /tmp really exist on disk, but ionice the process that writes the in-memory pages to disk.

BTW, Mandriva/Mageia has done /tmp on tmpfs for ages (I think ~ 5 years), and it does work fine.

Temporary files: RAM or disk?

Posted Jun 5, 2012 12:23 UTC (Tue) by roblucid (subscriber, #48964) [Link]

Except application writers can open temporary files read/write and unlink the newly created file so only the file descriptor provides access to it.

That prevents files getting left around, so rather than a new flag, filesystems could stop syncing the disk copy in this situation, reasoning that the file is ephemeral.
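
The pattern being described, sketched in shell (the same trick works in any language that exposes file descriptors):
exec 3<> /tmp/scratch.$$   # create and open a scratch file on fd 3
rm /tmp/scratch.$$         # unlink the name; only fd 3 can reach the data now
echo "ephemeral data" >&3  # the descriptor is still perfectly usable
exec 3>&-                  # closing the fd frees the space automatically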

On tmpfs-based /tmp systems like Solaris (I used it with SunOS 4), humongous temporary files would need special arrangements and placement; disks just tended not to have much free space. Disks were not even 1GB, and putting temp files in memory + swap space tended to be more reliable in practice, because processes could still make progress even when some luser had filled the disk.

Temporary files: RAM or disk?

Posted Jun 8, 2012 11:46 UTC (Fri) by Wol (guest, #4433) [Link]

Coming at it from a gentoo / SuSE user's viewpoint ...

Gentoo shoves all its compiles into /tmp. And when compiling LO, you need a lot of temp space. So rather than having space dedicated to tmp for compiling, I have something like 10 or 20Gb of swap (plus 8Gb RAM), and simply have a huge tmpfs /tmp.

SuSE on the other hand ... Why oh WHY can't they give you sane defaults! Swap space defaults to twice ram (good) but without doing a "wipe and redo manually", you can't *increase* swap space! I always set swap space to at least twice the mobo's max ram.

The other thing I didn't realise is that tmpfs defaults to half of available RAM. So with 8Gb, the first few times I tried to compile OOo, I couldn't work out why it kept crashing !-)
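
The default can be overridden, and even changed on a mounted tmpfs; a sketch (the size is just an example for an 8Gb-RAM-plus-swap setup like this one):
# grow /tmp beyond the half-of-RAM default; tmpfs accepts resizing via remount
mount -o remount,size=12G /tmp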

So yeah, I'm all in favour of /tmp in tmpfs. But make sure you have *sane* defaults, and those defaults are *easy* to over-ride. SuSE, I'm glaring at you !!!

Cheers,
Wol

Temporary files: RAM or disk?

Posted Jun 8, 2012 15:20 UTC (Fri) by anselm (subscriber, #2796) [Link]

Swap space defaults to twice ram (good) but without doing a "wipe and redo manually", you can't *increase* swap space!

You can always increase swap space after the fact by means of swap files (rather than swap partitions).
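
A minimal sketch of adding a swap file after the fact (the size is only an example):
dd if=/dev/zero of=/swapfile bs=1M count=4096   # 4GB swap file
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# to make it permanent, add a line like this to /etc/fstab:
# /swapfile   none   swap   defaults   0 0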

Temporary files: RAM or disk?

Posted Jun 8, 2012 19:35 UTC (Fri) by dlang (subscriber, #313) [Link]

or by creating additional swap partitions and adding them

Temporary files: RAM or disk?

Posted Jun 8, 2012 20:23 UTC (Fri) by jackb (guest, #41909) [Link]

Gentoo shoves all its compiles into /tmp.
As long as I've been using it, compiling has always been done in /var/tmp, not /tmp.

Mounting /var/tmp/portage on tmpfs is not the default behavior but has become extremely common. For large packages like Chromium or LibreOffice there are ways to override the default PORTAGE_TMPDIR to point to a non-tmpfs directory.
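
A sketch of the common setup; the size, file names and package atoms are only examples, and the per-package override uses Portage's package.env mechanism:
# /etc/fstab: build in RAM by default
tmpfs   /var/tmp/portage   tmpfs   size=8G,mode=0775   0 0
# /etc/portage/env/notmpfs.conf: point huge builds back at a directory on disk
PORTAGE_TMPDIR="/var/tmp/portage-disk"
# /etc/portage/package.env
www-client/chromium      notmpfs.conf
app-office/libreoffice   notmpfs.conf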

Temporary files: RAM or disk?

Posted Jun 9, 2012 18:06 UTC (Sat) by Serge (guest, #84957) [Link]

> And when compiling LO, you need a lot of temp space. So rather than having space dedicated to tmp for compiling, I have something like 10 or 20Gb of swap (plus 8Gb RAM), and simply have a huge tmpfs /tmp.

Why? Does it make things faster for you? It would be interesting to see some benchmarks. I've seen tests showing there's no difference, and seen one with extfs being faster than tmpfs+swap for compiling.

> and simply have a huge tmpfs /tmp.

Imho, it's much simpler to have it on disk. :)

> So yeah, I'm all in favour of /tmp in tmpfs.

/tmp is not the only place where you can mount tmpfs. If you want your /var/tmp/portage in tmpfs, you don't have to break other apps and put /tmp there.

Temporary files: RAM or disk?

Posted Jun 12, 2012 14:03 UTC (Tue) by TRauMa (guest, #16483) [Link]

Compiles on tmpfs are faster, factor is 1.8 to 2 in my tests, provided the working set nearly fits into RAM. With lots of swapping going on, you may end up taking longer to compile. Contrary to what is stated above, tmpfs is not smart about swapping: the data in swap is accessed very randomly, and I'd be very surprised if inode data didn't also end up in swap under high memory pressure. I found all of this out a long time ago on Gentoo trying to compile OpenOffice with 1G of RAM and a dynamic swapfile manager. Now, with 16G, it is actually feasible.

Another thing: I thought the plan was to migrate to a per-user /tmp anyway, somewhere in $HOME; for apps that use a lot of tmp, like DVD rippers, this would be a good idea anyway.

Temporary files: RAM or disk?

Posted Jun 16, 2012 4:30 UTC (Sat) by Serge (guest, #84957) [Link]

> I thought the plan was to migrate to a per-user /tmp anyway, somewhere in $HOME; for apps that use a lot of tmp, like DVD rippers, this would be a good idea anyway.

A per-user directory would not get cleaned on reboot. Using a per-user temporary directory may be a bad thing for users with an NFS /home; they would prefer a local tmp instead. Also, a common /tmp for all users is still needed for file exchange on multiuser servers. And finally, why would DVD software use something in $HOME when it can use /tmp, which is there exactly for those things. ;)

Why put /tmp on tmpfs? Having /var/tmp/portage on tmpfs does not force you to put /tmp there. And it's really hard to find an application that becomes faster just because of /tmp on tmpfs. Even for portage it's not that obvious.

> Compiles on tmpfs are faster, factor is 1.8 to 2 in my tests

Hm... My simple test shows that tmpfs is just about 1-2% faster.
Here's a script that resembles a basic package build:
mount tmpfs or ext3 to /mnt/test, then
$ cd /mnt/test
$ wget http://curl.haxx.se/download/curl-7.26.0.tar.bz2
$ export CFLAGS='-O2 -g -pipe' CXXFLAGS='-O2 -g -pipe'
$ time sh -c 'tar xf curl-7.26.0.tar.bz2 && cd curl-7.26.0 && ./configure && make install DESTDIR=/mnt/test/root && cd ../root && tar czf ../curl-package.tar.gz * && cd .. && rm -rf curl-7.26.0 root'

tmpfs results:
real 70.983s user 48.685s sys 26.527s
real 70.635s user 48.390s sys 26.694s
real 70.701s user 48.203s sys 26.929s
real 70.867s user 48.636s sys 27.090s
real 70.744s user 48.297s sys 27.082s

ext3 results:
real 71.690s user 48.401s sys 27.498s
real 71.614s user 48.340s sys 27.869s
real 71.531s user 48.836s sys 27.520s
real 71.479s user 48.306s sys 27.469s
real 71.635s user 48.540s sys 27.496s

What have I missed?

Temporary files: RAM or disk?

Posted Jun 16, 2012 13:44 UTC (Sat) by nix (subscriber, #2304) [Link]

I thought the idea of per-user /tmp was that every user got his own /tmp, sure, but this was implemented via subdirectories of the *real*, tmpfs, cleared-on-boot /tmp, e.g. /tmp/user-$name/... This can all be done fairly easily with pam_namespace: there's even an example in the default /etc/security/namespace.conf.
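
A sketch of how that looks, modelled on the example shipped in namespace.conf (the method and the exempt users may differ on your system):
# /etc/security/namespace.conf: each user gets a private instance of /tmp
/tmp      /tmp-inst/      user      root,adm
# and enable it for login sessions, e.g. in /etc/pam.d/login:
session   required   pam_namespace.so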

(One application that becomes a lot faster with /tmp on tmpfs is GCC without -pipe, or, even with -pipe, at the LTO link step. It writes really quite a lot of large extremely temporary intermediate output to files in /tmp in each stage of the processing pipeline, then reads it back again in the next stage.)

Temporary files: RAM or disk?

Posted Jun 25, 2012 9:40 UTC (Mon) by Serge (guest, #84957) [Link]

> I thought the idea of per-user /tmp was that every user got his own /tmp, sure, but this was implemented via subdirectories of the *real*, tmpfs, cleared-on-boot /tmp.

You don't need tmpfs then. This will work with /tmp anywhere (disk, RAM, separate partition, NFS, etc.). I mean this is neither a reason to use tmpfs nor a reason to avoid it.

> One application that becomes a lot faster with /tmp on tmpfs is GCC without -pipe, or, even with -pipe, at the LTO link step.

Faster linking? Let's check that with something having a lot of binaries:
mount tmpfs or ext3 to /mnt/test, then
$ cd /mnt/test
$ wget http://ftp.gnu.org/gnu/coreutils/coreutils-8.17.tar.xz
$ export CFLAGS='-O2 -g -flto' TMPDIR=/mnt/test
$ time sh -c "tar xf coreutils-8.17.tar.xz; cd coreutils-8.17; ./configure; make install DESTDIR=/mnt/test/root; cd ../root; tar czf ../coreutils-package.tar.gz *; cd ..; rm -rf coreutils-8.17 root"

tmpfs results:
real 882.876s user 760.111s sys 110.353s
real 884.456s user 761.408s sys 110.603s
real 885.245s user 762.770s sys 110.525s
real 884.914s user 762.417s sys 110.395s
real 885.352s user 762.865s sys 110.360s

ext3 results:
real 895.244s user 762.620s sys 115.027s
real 893.134s user 762.447s sys 114.841s
real 898.353s user 763.645s sys 116.369s
real 898.010s user 763.472s sys 116.074s
real 897.525s user 763.671s sys 116.219s

If my test is correct, it's still the same 1-2%. It is faster, but not a lot.

Temporary files: RAM or disk?

Posted Jun 26, 2012 15:49 UTC (Tue) by nix (subscriber, #2304) [Link]

[lots of crude benchmarking ahead.]

It's not just linking that a tmpfs /tmp speeds up a bit, in theory: it's compilation, because without -pipe GCC writes its intermediate .S file to TMPDIR (and -pipe is not the default: obviously it speeds up compilation by allowing extra parallelism as well as reducing potential disk I/O, so I don't quite understand *why* it's still not the default, but there you are.)

btw, coreutils is by nobody's standards 'something having a lot of binaries'. It has relatively few very small binaries, few object files, and an enormous configure script that takes about 95% of the configure/make time (some of which, it is true, runs the compiler and writes to TMPDIR, but most of which is more shell-dependent than anything). LTO time will also have minimal impact in this build.

But, you're right, I'm pontificating in the absence of data -- or data less than eight years old, anyway, as the last time I measured this was in 2004. That's so out of date as to be useless. Time to measure again. But let's use some more hefty test cases than coreutils, less dominated by weird marginal workloads like configure runs.

Let's try a full build of something with more object files, and investigate elapsed time, cpu+sys time, and (for non-tmpfs) disk I/O time as measured from /proc/diskstats (thus, possibly thrown off by cross-fs merging: this is unavoidable, alas). A famous old test, the kernel (hacked to not use -pipe, with hot cache), shows minimal speedup, since the kernel does a multipass link process and writes the intermediates to non-$TMPDIR anyway:

tmpfs TMPDIR, with -pipe (baseline): 813.75user 51.28system 2:13.32elapsed
tmpfs TMPDIR: 812.23user 50.62system 2:12.96elapsed
ext4 TMPDIR: 809.74user 51.90system 2:29.15elapsed 577%CPU; TMPDIR reads: 11, 88 sectors; writes: 6394, 1616928 sectors; 19840ms doing TMPDIR I/O.

So, a definite effect, but not a huge one. I note that the effect of -pipe is near-nil these days, likely because the extra parallelism you get from combining the compiler and assembler is just supplanting the extra parallelism you would otherwise get by running multiple copies of the compiler in parallel via make -j. (On a memory-constrained or disk-constrained system, where the useless /tmp writes may contend with useful disk reads, and where reads may be required as well, we would probably see a larger effect, but this system has 24Gb RAM and a caching RAID controller atop disks capable of 250Mb/s in streaming write, so it is effectively unconstrained, being quite capable of holding the whole source tree and all build products in RAM simultaneously. So this is intentionally a worst case for my thesis. Smaller systems will see a larger effect. Most systems these days are not I/O- or RAM-constrained when building a kernel, anyway.)

How about a real 900kg monster of a test, GCC? This one has everything, massive binaries, massive numbers of object files, big configure scripts writing to TMPDIR run in parallel with ongoing builds, immense link steps, you name it: if there is an effect this will show it. (4.6.x since that's what I have here right now: full x86_64/x86 multilibbed biarch nonprofiled -flto=jobserver -j 9 bootstrap including non-multilib libjava, minus testsuite run: hot cache forced by cp -a'ing the source tree before building; LTO is done in stage3 but in no prior stages so as to make the comparison with the next test a tiny bit more meaningful: stage2/3 comparison is suppressed for the same reason):

tmpfs TMPDIR: 13443.91user 455.17system 36:02.86elapsed 642%CPU
ext4 TMPDIR: 13322.24user 514.38system 36:01.62elapsed 640%CPU; TMPDIR reads: 59, 472 sectors; writes: 98661, 20058344 sectors; 83690ms doing TMPDIR I/O

So, no significant effect elapsed-time-wise, well into the random noise: though the system time is noticeably higher for the non-tmpfs case, it is hugely dominated by the actual compilation. However, if you were doing anything else with the system you would have noticed: paging was intense, as you'd expect with around 10Gb of useless writes being flushed to disk. Any single physical disk would have been saturated, and a machine with much less memory would have been waiting on it.

That's probably the most meaningful pair of results here, a practical worst case for the CPU overhead of non-tmpfs use. Note that the LTO link stage alone writes around six gigabytes to TMPDIR, with peak usage at any one time around 4Gb, and most of this cannot be -pipe'd (thus this is actually an example of something that on many machines cannot be tmpfsed effectively).

