LWN.net

Temporary files: RAM or disk?

By Jake Edge
May 31, 2012

Temporary files on Linux have traditionally been written to /tmp, at least those that don't need to persist across boots. Several Linux distributions are now planning to mount /tmp as a RAM-based tmpfs by default, which should generally be an improvement in a wide variety of scenarios—but not all. Debian has been planning to make that switch for the upcoming "wheezy" (7.0) release, but as a discussion on debian-devel shows, not all are happy with that decision.

Mounting /tmp on tmpfs puts all of the temporary files in RAM. That will reduce the amount of disk I/O that needs to be done, as the filesystem never actually touches the disk unless there is memory pressure. In that case, the tmpfs memory could get swapped out like other pages in the system, but in many cases a temporary file will be created without needing any disk I/O. The installer (or system administrator) can specify a maximum size for the filesystem as part of the mount options, but the memory is only actually used if files are written to it.
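You can check whether /tmp is really on tmpfs on a given machine, and how large its size= limit is, from /proc/mounts and statvfs(). The following is a minimal Python sketch that assumes a Linux system with /proc mounted. The helper names are hypothetical, and escaped characters in mount points (such as \040 for a space) are not decoded.

```python
import os


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            # fields: device, mount point, fs type, options, dump, pass
            entries.append((fields[1], fields[2], fields[3]))
    return entries


def fs_type_for(path, entries):
    """Return (fs_type, options) for the longest mount point that contains path."""
    best = None
    for mount_point, fs_type, options in entries:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point win (the visible one)
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type, options)
    return (best[1], best[2]) if best else (None, None)


def describe_tmp(path="/tmp"):
    """Return (fs_type, options, size_bytes, free_bytes) for the fs holding path."""
    with open("/proc/mounts") as f:
        entries = parse_mounts(f.read())
    fs_type, options = fs_type_for(os.path.realpath(path), entries)
    st = os.statvfs(path)
    size = st.f_blocks * st.f_frsize  # capacity; for tmpfs, the size= limit
    free = st.f_bavail * st.f_frsize  # space still available to ordinary users
    return fs_type, options, size, free
```

On a tmpfs-backed /tmp, the size reported by describe_tmp() is the mount's size= limit. Free space shrinks only as files are actually written, which matches the behavior the article describes.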

The latest incarnation of the dispute over putting /tmp into RAM (vs. having it as a disk directory in / or on its own partition) was started by a posting from "Serge" that claimed the change was "useless". His complaint stemmed from various applications that put large files into /tmp (long videos, ISO images, unpacked archives, sort temporary files, etc.), which can cause problems for systems with little memory or with too small a tmpfs. He argued that putting /tmp on tmpfs by default was the wrong choice for new users, and that savvy users could make the switch (or know enough to choose it at install time).

It is clear that there is a difference of opinion among participants in the thread about how /tmp should be used. Several people thought that /var/tmp should be used for large temporary files, with /tmp reserved for small files. Others pointed out that the file hierarchy standard (FHS) just says that /tmp is for files that do not need to persist across reboots, while /var/tmp is for those that do. But using /var/tmp for non-persistent large files has a drawback: those files do not get cleaned up on boot, unlike /tmp files—at least on Debian.

Many thread posters have been running their systems with a RAM-based /tmp for a long time with few or no issues, while others are clearly running into problems in that configuration. Large videos downloaded via the Flash plugin seem to be a particular problem, but there are others. It really comes down to a question of what /tmp is for.

Many in the long thread—something of a Debian tradition—believe that writing large files to /tmp is the wrong thing for an application to do. But no real alternative location that preserves the "wipe on reboot" semantics for /tmp has been offered. Workarounds like the TMPDIR environment variable can be used, but whatever directory that points to may just fill up with garbage over time.
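A periodic age-based cleaner is one way to keep a TMPDIR directory from filling up with garbage. This is similar in spirit to tools such as Debian's tmpreaper. The sketch below illustrates the idea rather than recommending a policy. The function name and the seven-day cutoff in the usage line are arbitrary assumptions, and deleting by modification time can remove files that a long-running program still needs.

```python
import os
import time


def clean_stale(directory, max_age_days, now=None):
    """Delete files (and symlinks) under directory not modified for max_age_days.

    Returns the list of removed paths. Directories themselves are left in place,
    and symlinked directories are not followed (os.walk's default).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.lstat(path)  # lstat: judge a symlink by itself, not its target
            except FileNotFoundError:
                continue  # vanished while we were walking
            if st.st_mtime < cutoff:
                os.unlink(path)
                removed.append(path)
    return removed
```

For example, a nightly job might call clean_stale(os.environ.get("TMPDIR", "/var/tmp"), max_age_days=7). This recovers some of the "wipe on reboot" behavior without needing a reboot.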

Running out of /tmp space is a problem regardless of what kind of filesystem lies under it, but in the default Debian installation a disk-based /tmp will essentially have the whole disk available, as it only creates a single / partition. On the other hand, a RAM-based /tmp will almost certainly be much smaller, which can lead to applications filling up the filesystem much more easily. Even if those applications are "wrong" to do so, they evidently exist, so forcing users to confront that problem at times could be sub-optimal. More advanced users can make their own choice.
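An application that knows it is about to write a large temporary file could check for room first rather than failing partway through. This minimal sketch assumes the caller knows the size in advance, and the 5% reserve is an arbitrary choice. tempfile.gettempdir() honors TMPDIR, so the same check also covers a user who has pointed temporary files elsewhere.

```python
import shutil
import tempfile


def has_room(nbytes, directory=None, reserve=0.05):
    """Return True if directory can take nbytes and still keep `reserve` of it free.

    directory defaults to tempfile.gettempdir(), which honors TMPDIR.
    """
    directory = directory or tempfile.gettempdir()
    usage = shutil.disk_usage(directory)  # free = space available to this user
    return usage.free - nbytes >= usage.total * reserve
```

On a small tmpfs /tmp, such a check fails early and cleanly, whereas on a single large / partition it almost never fails. That is exactly the asymmetry described above.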

There were numerous invocations of Solaris in the thread as well, because it has used a RAM-based temporary filesystem for many years, seemingly without many problems. Russ Allbery said that he started in the "Solaris has been doing this for years, what's the problem?" camp, but after reading some of the objections in the thread had basically changed his mind. It comes down to a question of functionality vs. speed. For a default setting, working should be preferred over speed optimization.

As Joey Hess pointed out, until there are clear numbers about the performance improvement that comes from a RAM-based /tmp, making it the default is a premature optimization. There were examples offered where tmpfs made an enormous performance difference, but the question is not whether the feature is useful; it is, instead, whether it should be the default.

This is not the first time the issue has come up. Roger Leigh posted pointers to other threads and bug reports (some of which have their own long threads, of course). He is the developer that added the tmpfs-based /tmp for wheezy, but he mostly stayed clear of the discussion this time. It does not look like he is inclined to remove the default, though, so there have been suggestions that the issue be referred to the technical committee.

It is interesting to note that Fedora plans to move to a RAM-based /tmp for Fedora 18, and has already enabled it for Rawhide. The Fedora feature page notes that Solaris has been doing so since 1994 and that Arch currently defaults to a tmpfs-based /tmp. It also mentions that Debian and Ubuntu both plan to move in that direction.

At least for the near future—and probably beyond—RAM sizes will generally be far less than those of disks, particularly for machines targeted at non-technical users. While those users might benefit from the performance improvements that come from keeping /tmp in RAM, there is a risk that their applications will chew up their RAM with huge temporary files, leading to swap storms or broken applications.

The Solaris example may not be all that compelling as it is not really a consumer-oriented desktop system. The other Linux distributions may offer a better test case and Debian may well get the benefit of seeing what happens with them before wheezy is completely frozen. On the other hand, though, it is not really clear that Debian targets (or attracts) many novice users, so why make the vast majority of Debian users suffer the penalty of disk-based /tmp when they don't really need to? They can certainly switch to that, but the default might serve the majority quite well. At this point, one suspects that RAM-based /tmp will soon be the norm and that applications will get "fixed", but only time will tell.



Temporary files: RAM or disk?

Posted Jun 1, 2012 2:41 UTC (Fri) by neilbrown (subscriber, #359)

It seems to me that there is an artificial distinction between 'fast' and 'big'. This comes from the fact that 'fast' must be backed by swap space, and swap space must be pre-allocated and so cannot be too 'big'.

Many years ago I worked with Apollo workstations running "Domain/OS" - which was Unix-like. They didn't have a swap partition, or a swap file. They just used spare space in the filesystem for swap.

Could that work for Linux? You could probably create a user-space solution that monitored swap usage and created new swap files on demand. But I suspect it wouldn't work very well.
Or you could teach Linux filesystems to support swap files that grow on demand - or instantiate space on demand.
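At a minimum, the user-space monitor described here would need a way to notice that swap is running low. The sketch below covers only that decision, which it makes by reading /proc/meminfo. The function names and the 80% threshold are hypothetical. Actually creating and enabling a swap file (mkswap, swapon) requires root and is left out. Also note that Linux's swapon refuses swap files that have holes, so the sparse-file idea really does need filesystem support.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info


def should_add_swap(info, high_water=0.8):
    """Hypothetical policy: grow swap once more than high_water of it is in use."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return False  # no swap configured: nothing to grow, respect that choice
    used = total - info.get("SwapFree", 0)
    return used / total > high_water
```

A monitor would poll should_add_swap(parse_meminfo(open("/proc/meminfo").read())) and add another swap file when it returns True.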

Once the swap-over-NFS patches get merged this should be quite possible. The filesystem is told that a given file is being used for swap, then it can preload enough data so that it can allocate space immediately without needing any further memory allocation. You could then create a 100G sparse file and add that as a swap destination and it would "just work". Writing to a tmpfs filesystem would be fast for small files, but big files would spill out into the same space as is used by the filesystem.

(Yes, I realise this is a long-term solution while what is needed is a short-term solution.)

Temporary files: RAM or disk?

Posted Jun 1, 2012 3:18 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

Ok. The only question then would be: "Why not leave /tmp as it is?"

Right now if I need to create a big file (and I need to quite often) there is no alternative to /tmp.

Temporary files: RAM or disk?

Posted Jun 1, 2012 3:36 UTC (Fri) by neilbrown (subscriber, #359)

> "Why not leave /tmp as it is?"

Isn't that answered in the article?

Because "as it is", /tmp imposes unnecessary disk IO which can be noticed when creating lots of small short-lived files. Let's see if we can make it faster, without making it any smaller.

Temporary files: RAM or disk?

Posted Jun 1, 2012 5:17 UTC (Fri) by wahern (subscriber, #37304)

I don't understand why this would be so. Small, short-lived files should only ever exist in the buffer cache. This smells like a filesystem problem, not a backing store problem.

Temporary files: RAM or disk?

Posted Jun 1, 2012 5:31 UTC (Fri) by neilbrown (subscriber, #359)

I believe that with ext2, small short-lived files only ever do exist in memory, just as you suggest. Few people store '/' on ext2 these days.

With journalling things become a bit more complex. You need to ensure that the various metadata are journalled in the right order and by far the easiest way to do that is to place every updated block in the "next" transaction. So with ext3 journalling (if I understand it correctly), every metadata block that gets changed will be written to the journal on the next journal commit, and then to the filesystem.

A filesystem which does delayed allocation would be better placed to optimise out short lived files completely and maybe ext4/xfs/btrfs do better at this. However I suspect it is far from trivial to optimise out *all* storage updates for short-lived files and I doubt it is something that fs developers optimise for.

So I think that you probably could see it as a filesystem problem, but I'm not sure that seeing it that way would lead to the best solution (but if some fs developers see this as a challenge and prove me wrong, I won't complain).

Temporary files: RAM or disk?

Posted Jun 1, 2012 7:03 UTC (Fri) by wookey (subscriber, #5501)

Before rehashing the _whole_ discussion here, I suggest people go read the thread, which is fairly thorough:
It starts here https://lwn.net/Articles/499534/

One thing Serge keeps coming back to is 'Please show us real-world improvements from /tmp-in-tmpfs, significant enough to make it a better _default_, given the well-documented problems'. This seems to be key, and I leave it to posters to make up their own minds about that. I certainly learned a lot from the thread. And there is clearly a longer-term issue to fix this properly.

Temporary files: RAM or disk?

Posted Jun 1, 2012 7:35 UTC (Fri) by wujj123456 (subscriber, #84680)

Application is indeed the key.

I always mount /tmp as tmpfs, but I have a lot of RAM and know exactly what I am doing. I used to analyze ~10G of data, and reading from RAM was at least 300% faster, even including the heavy data processing. I also rendered movies using tmpfs when the size fit, and again observed a dramatic difference.

The problem is: if a user cares about that performance difference, he probably knows how to use tmpfs himself. Setting /tmp to tmpfs will confuse normal users when an application fails. Given the popularity of those big distros, it might not be a good move. Even Firefox doesn't store temporary files in /tmp unless you override it in about:config. It might be worthwhile to check how existing applications are using tmpfs (/dev/shm). I have a feeling that most applications don't care at all.

Temporary files: RAM or disk?

Posted Jun 1, 2012 7:43 UTC (Fri) by neilbrown (subscriber, #359)

> The problem is: if a user cares about that performance difference, he probably knows how to use tmpfs himself.

Are you serious? The only people who care about performance are people who dig into the arcane configuration details of OSes ?? I don't think so.

Wasn't there a recent quote of the week along the lines of "We should make things simple and safe so that people don't *need* to carefully form good habits."?? I think that applies here too, only so that people don't *need* to dig into arcane details.

I agree that we shouldn't make /tmp == tmpfs the default while it causes problems. But I do think that we should work to fix the problems so that we can do it safely.

Temporary files: RAM or disk?

Posted Jun 2, 2012 7:01 UTC (Sat) by Los__D (guest, #15263)

"if a user cares about that performance difference, he probably knows how to use tmpfs himself."
Errrr... Yeah, right.

Temporary files: RAM or disk?

Posted Jun 2, 2012 23:41 UTC (Sat) by giraffedata (subscriber, #1954)

I used to analyze ~10G of data, and reading from RAM was at least 300% faster, ...

You imply that with /tmp in a disk-based filesystem, you didn't read from RAM. Why would that be? Why weren't your files in cache?

Temporary files: RAM or disk?

Posted Jun 3, 2012 15:09 UTC (Sun) by bronson (subscriber, #4806)

I bet the files were cached and reads took the same amount of time. The slowdown would be due to writes. tmpfs is allowed to lose the disk contents on reboot, filesystems aren't.

I can write 1G of data to tmpfs, read it back, and delete it (a typical scientific profile), without ever expecting it to hit rust. I'd be very VERY disappointed in any filesystem that allowed its write buffers to get that far behind.
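To see how much difference this write, read back, and delete pattern makes on a particular machine, you could time it once in a tmpfs directory and once in a disk-backed one. This is a rough sketch, not a rigorous benchmark. The function name, chunk size, and directories are assumptions. It does no fsync(), so a disk filesystem is free to delay writeback exactly as described above.

```python
import os
import tempfile
import time


def write_read_delete(nbytes, directory=None, chunk=1 << 20):
    """Write nbytes to a new file, read it back, delete it; return (ok, seconds).

    No fsync() is done, so the data may never reach the disk before the
    file is deleted (always the case on tmpfs without memory pressure).
    """
    block = b"\0" * chunk
    start = time.perf_counter()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            remaining = nbytes
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(block[:n])
                remaining -= n
        total = 0
        with open(path, "rb") as f:
            while data := f.read(chunk):
                total += len(data)
    finally:
        os.unlink(path)
    return total == nbytes, time.perf_counter() - start
```

For example, you could compare write_read_delete(1 << 30, "/tmp") with write_read_delete(1 << 30, "/var/tmp"). The results will depend heavily on RAM size, filesystem type, and writeback settings.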

Temporary files: RAM or disk?

Posted Jun 3, 2012 17:44 UTC (Sun) by giraffedata (subscriber, #1954)

I'd be very VERY disappointed in any filesystem that allowed its write buffers to get that far behind.

Getting this far behind is a valuable feature and any filesystem that doesn't let you do it is lacking. Someone pointed out earlier that the more modern ext3 is incapable of getting that far behind, whereas the less modern ext2 is not. That's a regression (but effectively explains why a tmpfs /tmp could be faster than an ext3 one).

I've seen filesystems that have mount options and file attributes that specifically indicate that files are temporary -- likely to be overwritten or deleted soon -- so that the page replacement algorithm doesn't waste valuable I/O time cleaning the file's pages.

Furthermore, many people believe that whenever you want data to be hardened to disk, you should fsync. Given that philosophy, the default kernel policy should be not to write the data to disk until you need the memory (with some allowance for forecasting future need for memory).

Temporary files: RAM or disk?

Posted Jun 4, 2012 7:46 UTC (Mon) by dvdeug (subscriber, #10998)

The default policy should be that when I save a file it's saved. If they had created this idea that only fsync puts the file on the disk, say, forty years ago, code would be littered with fsyncs (and no doubt filesystem writers would be cheating on that invariant and complaining that people overused fsync.)

Right now, after I've spent 15 minutes working on something and saving my work along the way, if I lose my data because something didn't run fsync in that 15 minutes, I'm going to be royally pissed. It takes a lot of speed increase on a benchmark to make up for 15 minutes of lost work. The time that users lose when stuff goes wrong doesn't show up on benchmarks, though.

Temporary files: RAM or disk?

Posted Jun 4, 2012 7:57 UTC (Mon) by dlang (✭ supporter ✭, #313)

the idea that your data isn't safe if the system crashes and you haven't done an fsync on that file (not just any other file in the system) HAS been around for 40 years.

current filesystems attempt to schedule data to be written to disk within about 5 seconds or so in most cases (I remember that at one point reiserfs allowed for 30 seconds, and so was posting _amazing_ benchmark numbers, for benchmarks that took <30 seconds to run), but it's possible for it to take longer, or for the data to get to disk in the wrong order, or partially get to disk (again in some random order).

because of this, applications that really care about their data in crash scenarios (databases, mail servers, log servers, etc.) do have fsync calls "littered" through their code. It's only recent "desktop" software that is missing this, in part because ext3 does have such pathological behaviour on fsync.

Temporary files: RAM or disk?

Posted Jun 4, 2012 21:25 UTC (Mon) by giraffedata (subscriber, #1954)

current filesystems attempt to schedule data to be written to disk within about 5 seconds or so in most cases

Are you sure? The last time I looked at this was ten years ago, but at that time there were two main periods: every 5 seconds kswapd checked for dirty pages old enough to be worth writing out and "old enough" was typically 30 seconds. That was easy to confirm on a personal computer, because 30 seconds after you stopped working, you'd see the disk light flash.

But I know economies change, so I could believe dirty pages don't last more than 5 seconds in modern Linux and frequently updated files just generate 6 times as much I/O.
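On Linux, the two periods described here are exposed as sysctls. vm.dirty_writeback_centisecs controls how often the flusher threads wake up, and vm.dirty_expire_centisecs controls how old dirty data must be before it is written back. Their defaults are 500 and 3000, that is, 5 and 30 seconds. Below is a small sketch for reading them. The helper names are hypothetical. A filesystem's own journal commit interval is a separate setting.

```python
def centisecs_to_seconds(text):
    """Convert a /proc/sys/vm value in centiseconds (e.g. "3000\n") to seconds."""
    return int(text.strip()) / 100


def writeback_timing():
    """Return the dirty-page writeback periods, in seconds, from /proc/sys/vm."""
    timing = {}
    for key, name in (("wakeup_interval_s", "dirty_writeback_centisecs"),
                      ("expire_age_s", "dirty_expire_centisecs")):
        with open("/proc/sys/vm/" + name) as f:
            timing[key] = centisecs_to_seconds(f.read())
    return timing
```

On a machine with default settings, writeback_timing() should report a 5-second wakeup interval and a 30-second expiry age. That matches the "disk light 30 seconds after you stop working" behavior described above.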

Temporary files: RAM or disk?

Posted Jun 4, 2012 23:11 UTC (Mon) by dlang (✭ supporter ✭, #313)

this is a filesystem specific time setting for the filesystem journal. I know it's ~5 seconds on ext3. it could be different on other filesystems.

also, this is for getting the journal data to disk, if the journal is just metadata it may not push the file contents to disk (although it may, to prevent the file from containing blocks that haven't been written to yet and so contain random, old data)

Temporary files: RAM or disk?

Posted Jun 4, 2012 8:00 UTC (Mon) by neilbrown (subscriber, #359)

> The default policy should be that when I save a file it's saved.

You are, of course, correct.
However this is a policy that is encoded in your editor, not in the filesystem. And I suspect most editors do exactly that. i.e. they call 'fsync' before 'close'.

But not every "open, write, close" sequence is an instance of "save a file". It may well be "create a temporary file which is completely uninteresting if I get interrupted". In that case an fsync would be pointless and costly. So the filesystem doesn't force an fsync on every close as the filesystem doesn't know what the 'close' means.

Any application that is handling costly-to-replace data should use fsync. An app that is handling cheap data should not. It is really that simple.
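The "save a file" case can be sketched as follows. Besides calling fsync() before close(), the sketch writes to a temporary name and renames it over the original. This is a common pattern, though not one the comment mentions, and it ensures a crash leaves either the old version or the new one. The directory fsync() makes the rename itself durable. The function name and the .tmp suffix are assumptions. A throwaway temporary file would simply skip the fsync() calls.

```python
import os


def save_file(path, data):
    """Durably replace path with data: write a temp file, fsync, rename, fsync dir."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp = path + ".tmp"  # hypothetical naming; must be on the same filesystem
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write less than asked
            n = os.write(fd, view)
            view = view[n:]
        os.fsync(fd)  # contents reach stable storage before the rename
    finally:
        os.close(fd)
    os.replace(tmp, path)  # atomic rename over the old version
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # make the new directory entry durable too
    finally:
        os.close(dir_fd)
```

The cost is two fsync() calls per save. This is acceptable for a user's document but pointless for a scratch file that nobody will miss after a crash.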

Temporary files: RAM or disk?

Posted Jun 4, 2012 9:11 UTC (Mon) by dvdeug (subscriber, #10998)

Another choice for a set of semantics would be to make programs that don't want to use a filesystem as a permanent storage area for files specify that. That is, fail safe, not fail destructive. As it is, no C program can portably save a file; fsync is not part of the C89/C99/C11 standards. Many other languages can not save a file at all without using an interface to C.

I've never seen this in textbooks and surely that should be front and center with the discussion of file I/O, that if you're actually saving user data, that you need to use fsync. It's not something you'll see very often in actual code. But should you actually be in a situation where this blows up in your face, it will be all your fault.

Temporary files: RAM or disk?

Posted Jun 4, 2012 9:51 UTC (Mon) by dgm (subscriber, #49227)

It's not in the C standard because it has nothing to do with C itself, but with the underlying OS. You will find fsync() in POSIX, and it's portable as long as the target OS supports POSIX semantics (even Windows used to).

Temporary files: RAM or disk?

Posted Jun 4, 2012 10:24 UTC (Mon) by dvdeug (subscriber, #10998)

What do you mean nothing to do with C itself? Linux is interpreting C semantics to mean that a standard C program cannot reliably produce permanent files. That's certainly legal, but it means that most people who learn to write C will learn to write code that doesn't reliably produce permanent files. Linux could interpret the C commands as asking for the creation of permanent files and force people who want temporary file to use special non-portable commands.

Temporary files: RAM or disk?

Posted Jun 4, 2012 10:33 UTC (Mon) by andresfreund (subscriber, #69562)

Mount your filesystems with the sync option (effectively O_SYNC for every file) and see how long you can endure that. Making everything synchronous by default is a completely useless behaviour. *NO* general purpose OS in recent years does that.
Normally you need only very few points where you fsync (or equivalent) and quite some more places where you write data...

Temporary files: RAM or disk?

Posted Jun 4, 2012 11:20 UTC (Mon) by neilbrown (subscriber, #359)

To be fair, O_SYNC is much stronger than what some people might reasonably want to expect.

O_SYNC means every write request is safe before the write system call returns.

An alternate semantic is that a file is safe once the last "close" on it returns. I believe this has been implemented for VFAT filesystems which people sometimes like to pull out of their computers without due care.
It is quite an acceptable trade-off in that context.

This is nearly equivalent to always calling fsync() just before close().

Adding a generic mount option to impose this semantic on any fs might be acceptable. It might at least silence some complaints.
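The "safe once the last close returns" semantic can be approximated in user space, one file at a time. This is a minimal sketch with a hypothetical class name. It only covers closes made through the wrapper, unlike the filesystem-wide mount option proposed here.

```python
import os


class SyncOnClose:
    """File wrapper giving 'safe once close() returns' semantics.

    Writes are buffered as usual; only close() waits for the disk, which is
    much weaker (and cheaper) than O_SYNC, where every write() waits.
    """

    def __init__(self, path, mode="wb"):
        self._f = open(path, mode)

    def write(self, data):
        return self._f.write(data)

    def close(self):
        if not self._f.closed:
            self._f.flush()             # push Python's userspace buffer to the kernel
            os.fsync(self._f.fileno())  # then ask the kernel to push it to the disk
            self._f.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
        return False
```

For comparison, the stronger O_SYNC semantic corresponds to opening with os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC). In that mode every os.write() returns only after the data is on stable storage.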

Temporary files: RAM or disk?

Posted Jun 4, 2012 12:19 UTC (Mon) by andresfreund (subscriber, #69562)

> To be fair, O_SYNC is much stronger than what some people might reasonably want to expect.
> O_SYNC means every write request is safe before the write system call returns.
Hm. Not sure if that really is what people expect. But I can certainly see why it would be useful for some applications. Should probably be an fd option or such though? I would be really unhappy if a rm -rf or cp -r behaved that way.

Sometimes I wish userspace-controllable metadata transactions were possible with a sensible effort/interface...

Temporary files: RAM or disk?

Posted Jun 4, 2012 16:44 UTC (Mon) by dgm (subscriber, #49227)

Linux does not interpret C semantics. Linux implements POSIX semantics, and C programs use POSIX calls to access those semantics. So this has nothing to do with C, but POSIX.

POSIX offers a tool to make sure your data is safely stored: the fsync() call. POSIX and the standard C library are careful not to make any promises regarding the reliability of writes, because this would mean a burden for all systems implementing those semantics, some of which do not even have a concept of fail-proof disk writes.

Now Linux could choose to deviate from the standard, but that would be exactly the reverse of portability, wouldn't it?

Temporary files: RAM or disk?

Posted Jun 4, 2012 15:37 UTC (Mon) by giraffedata (subscriber, #1954)

Any application that is handling costly-to-replace data should use fsync. An app that is handling cheap data should not. It is really that simple.

Well, it's a little more complex because applications are more complex than just C programs. Sometimes the application is a person sitting at a workstation typing shell commands. The cost of replacing the data is proportional to the amount of data lost. For that application, the rule isn't that the application must use fsync, but that it must use a sync shell command when the cost of replacement has exceeded some threshold. But even that is oversimplified, because it makes sense for the system to do a system-wide sync automatically every 30 seconds or so to save the user that trouble.

On the other hand, we were talking before about temporary files on servers, some of which do adhere to the fsync dogma such that an automatic system-wide sync may be exactly the wrong thing to do.
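The threshold rule, which says to sync once the cost of losing unsynced work gets too high, could be sketched as a byte budget. This is purely illustrative. The class name and budget are hypothetical, and bytes written stand in for "cost of replacement". Note that os.sync() flushes every filesystem, which is exactly the expense raised in the next comment.

```python
import os


class SyncBudget:
    """Hypothetical helper: call sync once unsynced writes exceed a byte budget."""

    def __init__(self, budget_bytes, sync=os.sync):
        self.budget = budget_bytes
        self.pending = 0   # bytes written since the last sync
        self.syncs = 0
        self._sync = sync  # injectable so the policy can be tested without syncing

    def note_write(self, nbytes):
        self.pending += nbytes
        if self.pending >= self.budget:
            self._sync()  # system-wide: flushes every filesystem, not just ours
            self.pending = 0
            self.syncs += 1
```

A per-file os.fsync() would be the narrower choice when the application knows which files matter.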

Temporary files: RAM or disk?

Posted Jun 4, 2012 23:06 UTC (Mon) by dlang (✭ supporter ✭, #313)

a system-wide sync can take quite a bit of time, and during that time it may block a lot of other activity (or make it so expensive that the system may as well be blocked)

Temporary files: RAM or disk?

Posted Jun 4, 2012 9:39 UTC (Mon) by dgm (subscriber, #49227)

Ext3 does worse than ext2 because it tries to keep metadata consistency, but that is useless for a tmp filesystem, where all files are going to be wiped out on reboot or crash.

It's not a regression, but a conscientious design decision, and that use case is outside of what Ext3 is good for.

ext3 regression: unnecessarily syncs temporary files

Posted Jun 4, 2012 15:43 UTC (Mon) by giraffedata (subscriber, #1954)

It's not a regression, but a conscientious design decision

It's a regression due to a conscious design decision. Regression doesn't mean mistake, it means the current thing does something worse than its predecessor. Software developers have a bias against regressions, but they do them deliberately, and for the greater good, all the time.

ext3 regression: unnecessarily syncs temporary files

Posted Jun 4, 2012 21:24 UTC (Mon) by dgm (subscriber, #49227)

Regression does mean mistake, and this is clearly not the case.

A more enlightening example: the latest version of the kernel requires more memory than 0.99 but nobody could possibly claim this is a regression. If anything, it's a trade-off.

ext3 regression: unnecessarily syncs temporary files

Posted Jun 5, 2012 1:42 UTC (Tue) by giraffedata (subscriber, #1954)

the latest version of the kernel requires more memory than 0.99 but nobody could possibly claim this is a regression

I claim that's a regression. Another area where kernel releases have steadily regressed: they run more slowly. And there are machines current kernels won't run on at all that previous ones could. Another regression.

I'm just going by the plain meaning of the word (informed somewhat by its etymology, the Latin for "step backward"). And the fact that it's really useful to be able to talk about the steps backward without regard to whether they're worth it.

Everyone recognizes that sometimes you have to regress in some areas in order to progress in others. And sometimes it's a matter of opinion whether the tradeoff is right. For example, regression testing often uncovers the fact that the new release runs so much slower than the previous one that some people consider it a mistake and it gets "fixed."

I like to use Opera, but almost every upgrade I've ever done has contained functional regressions, usually intentional. As they are often regressions that matter to me, I tend not to upgrade Opera (and it makes no difference to me whether it's a bug or not).

ext3 regression: unnecessarily syncs temporary files

Posted Jun 5, 2012 8:35 UTC (Tue) by dgm (subscriber, #49227)

Whatever, keep using 0.99 then, or better go back to first version that just printed AAAABBBB on the screen. Everything from there is a regression.

ext3 regression: unnecessarily syncs temporary files

Posted Jun 5, 2012 14:25 UTC (Tue) by giraffedata (subscriber, #1954)

Whatever, keep using 0.99 then, or better go back to first version that just printed AAAABBBB on the screen. Everything from there is a regression.

Everything since this is a regression in certain areas, but you seem to be missing the essential point that I stated several ways: These regressions come along with progressions. The value of the progressions outweighs the cost of the regressions. I hate in some way every "upgrade" I make, but I make them anyway.

Everyone has to balance the regressions and the progressions in deciding whether to upgrade, and distributors tend to make sure the balance is almost always in favor of the progressions. We can speak of a "net regression," which most people would not find current Linux to be with respect to 0.99.

Temporary files: RAM or disk?

Posted Jun 4, 2012 15:51 UTC (Mon) by bronson (subscriber, #4806)

No. There are so many buggy, non-fsyncing programs out there that, if a filesystem has 1G of writes outstanding, it's almost certainly going to lose many hours of work. (Unless it's manually flushing every 20 seconds or so, in which case that's fine but also slower than tmpfs).

In an ideal world, you're exactly right. In today's world, that would be fairly dangerous.

> I've seen filesystems that have mount options and file attributes that specifically indicate that files are temporary

Agreed, but if you're remounting part of your hierarchy with crazy mount options, why not just use tmpfs?

Temporary files: RAM or disk?

Posted Jun 4, 2012 23:08 UTC (Mon) by dlang (✭ supporter ✭, #313)

because tmpfs just uses ram? and while you can add swap to give you more space, the use of the swap will not be targeted. This means that you may end up with things swapped out that you really would rather have remained active, even if the result was that it took a little more time to retrieve a temporary file.

Temporary files: RAM or disk?

Posted Jun 5, 2012 7:05 UTC (Tue) by bronson (subscriber, #4806)

That's true, that's an important difference. But you could have a similar situation with the filesystem-with-options, right? If the filesystem uses a lot of memory, the important things could get swapped out as well.

Temporary files: RAM or disk?

Posted Jun 5, 2012 7:19 UTC (Tue) by dlang (✭ supporter ✭, #313)

True, but the difference is that it would need to be a very poorly written filesystem to eat up more memory than the contents that it's holding. And it's much easier to tell where the memory is being used, and therefore make an intelligent decision about what to write to disk (and what to throw away), than when it all has to be stored in memory and your only disk backing you have is swap.

Also, reading and writing swap tends to be rather inefficient compared to normal I/O (data ends up very fragmented on disk, bearing no resemblance to any organization that it had in ram, let alone the files being stored in tmpfs).

Temporary files: RAM or disk?

Posted Jun 5, 2012 15:33 UTC (Tue) by giraffedata (subscriber, #1954)

reading and writing swap tends to be rather inefficient compared to normal I/O (data ends up very fragmented on disk, bearing no resemblance to any organization that it had in ram, let alone the files being stored in tmpfs).

I believe the tendency is the other way around. One of the selling points for tmpfs for me is that reading and writing swap is more efficient than reading and writing a general purpose filesystem. First, there aren't inodes and directories to pull the head around. Second, writes stream out sequentially on disk, eliminating more seeking.

Finally, I believe it's usually the case that, for large chunks of data, the data is referenced in the same groups in which it becomes least recently used. A process loses its timeslice and its entire working set ages out at about the same time and ends up in the same place on disk. When it gets the CPU again, it faults in its entire working set at once. For a large temporary file, I believe it is even more pronounced - unlike many files, a temporary file is likely to be accessed in passes from beginning to end. I believe general purpose filesystems are only now gaining the ability to do the same placement as swapping in this case; to the extent that they succeed, though, they can at best reach parity.

In short, reading and writing swap has been (unintentionally) optimized for the access patterns of temporary files, where general purpose filesystems are not.

Temporary files: RAM or disk?

Posted Jun 6, 2012 6:53 UTC (Wed) by Serge (guest, #84957)

> I believe the tendency is the other way around. One of the selling points for tmpfs for me is that reading and writing swap is more efficient than reading and writing a general purpose filesystem. First, there aren't inodes and directories to pull the head around.

It's not that simple. Tmpfs is not "plain data" filesystem, you can create directories there, so it has to store all the metadata as well. It also has inodes internally.

> Second, writes stream out sequentially on disk, eliminating more seeking.

This could be true if swap was empty. Same when you write to the empty filesystem. But what if it was not empty? You get the same swap fragmentation and seeking as you would get in any regular filesystem.

> In short, reading and writing swap has been (unintentionally) optimized for the access patterns of temporary files, where general purpose filesystems are not.

And a filesystem is intentionally optimized for storing files. Swap is not plain data storage, otherwise "suspend to disk" could not work. Swap has its own internal format; there are even different versions of it (`man mkswap` reveals v0 and v1). I.e., instead of writing through one ext3 layer, you write through two layers, tmpfs+swap.

Things get worse when you start reading. When you read something from ext3, the oldest part of the file cache is dropped and the data is placed in RAM. But reading from swap means that your RAM is full, and in order to read a page from swap you must first write another page there. I.e., a sequential read from ext3 turns into a random write+read from swap.

Temporary files: RAM or disk?

Posted Jun 6, 2012 15:24 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

> But reading from swap means that your RAM is full, and in order to read a page from swap you must first write another page there. I.e. sequential read from ext3 turns into random write+read from swap.

_Writing_ to swap means that your RAM is full (possibly including things like clean cache which are currently higher priority, but could be dropped at need). _Reading_ from swap implies only that something previously written to swap is needed in RAM again. There could be any amount of free space at that point. Even if RAM does happen to be full, the kernel can still drop clean data from the cache to make room, just as with reading from ext3.

Temporary files: RAM or disk?

Posted Jun 6, 2012 17:43 UTC (Wed) by dgm (subscriber, #49227) [Link]

Yes, merely reading from swap doesn't imply that your RAM is full. What is true is that _when_ your RAM is full (notice that I don't say "if") it _may_ imply a write to swap, depending on how dirty the page cache is. The problem is that tmpfs contributes a lot to polluting the page cache. Temporary files are created to be written and then re-read shortly afterwards, so all pages used by tmpfs are expected to be dirty.

All of this is of no consequence on system startup, when the page cache is mostly clean. Once the system has been up for a while, though... I think a few tests have to be done.

Temporary files: RAM or disk?

Posted Jun 7, 2012 2:28 UTC (Thu) by giraffedata (subscriber, #1954) [Link]

>> ... First, there aren't inodes and directories to pull the head around.
> It's not that simple. Tmpfs is not "plain data" filesystem, you can create directories there, so it has to store all the metadata as well. It also has inodes internally.

I was talking about disk structures. Inodes and directory information don't go into the swap space, so they don't pull the head around.

(But there's an argument in favor of regular filesystem /tmp: if you have lots of infrequently accessed small files, tmpfs will waste memory).

>> Second, writes stream out sequentially on disk, eliminating more seeking.
> This could be true if swap was empty. Same when you write to the empty filesystem. But what if it was not empty? You get the same swap fragmentation and seeking as you would get in any regular filesystem.

It's the temporary nature of the data being swapped (and the strategies the kernel implements based on that expectation) that makes the data you want at any particular time less scattered in swap space than in a typical filesystem that has to keep copious eternally growing files forever. I don't know exactly what policies the swapper follows (though I have a pretty good idea), but if it were no better at storing anonymous process data than ext3 is at storing file data, we would really have to wonder at the competence of the people who designed it. And my claim is that since it's so good with process anonymous data, it should also be good with temporary files, since they're used almost the same way.

> in order to read a page from swap you must first write another page there.

Actually, the system does the same thing for anonymous pages as it does for file cache pages: it tries to clean the pages before they're needed so that when a process needs to steal a page frame it usually doesn't have to wait for a page write. Also like file cache, when the system swaps a page in, it tends to leave the copy on disk too, so if it doesn't get dirty again, you can steal its page frame without having to do a page out.

Temporary files: RAM or disk?

Posted Jun 7, 2012 13:15 UTC (Thu) by njs (guest, #40338) [Link]

I don't know about tmpfs, but my experience is: if I have a process with a large (multi-gigabyte) working set, and it goes to sleep and gets swapped out, then there's no point in waking it back up again; I might as well kill it and start over. At least on our compute servers (running some vaguely recent Ubuntu, IIRC), swap-in is definitely not doing successful readahead. I've often wished for some hack that would just do a sequential read through the swap file to load one process back into memory; it would be hundreds of times faster.

Temporary files: RAM or disk?

Posted Jun 7, 2012 13:28 UTC (Thu) by Jonno (subscriber, #49613) [Link]

If you have enough free memory at the time you want to swap in that process, try to run "sudo swapoff -a ; sudo swapon -a", it will sequentially read in all swap to memory, no random access.

I find that if I have two processes with large working sets causing swapping, and kill one of them, doing a swapoff will get the other one performant again much faster than letting it swap in only the stuff it needs as it needs it.
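Before trying that, it may be worth checking that everything currently in swap would actually fit in memory. A minimal sketch, assuming a kernel new enough to report MemAvailable in /proc/meminfo (3.14 or later); the function name `swapoff_fits` and the 10% margin are arbitrary choices of this example:

```shell
# Sketch: would everything now in swap fit into available RAM?
# Reads a meminfo-style file (default /proc/meminfo); values are in kB.
# Prints "yes" or "no" and sets the exit status to match.
swapoff_fits() {
    awk '
        $1 == "SwapTotal:"    { total = $2 }
        $1 == "SwapFree:"     { free  = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            used = total - free
            # Keep a 10% margin of the available memory for everything else.
            if (used <= avail * 0.9) { print "yes"; exit 0 }
            print "no"; exit 1
        }' "${1:-/proc/meminfo}"
}
```

Usage: `swapoff_fits && sudo swapoff -a && sudo swapon -a`. If it prints "no", there is probably not enough free memory to absorb the whole swap area at once.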

Temporary files: RAM or disk?

Posted Jun 7, 2012 15:44 UTC (Thu) by giraffedata (subscriber, #1954) [Link]

> At least on our compute servers (running some vaguely recent Ubuntu, IIRC), swap-in is definitely not doing successful readahead

Good information.

That's probably a good reason to use a regular filesystem instead of tmpfs for large temporary files.

I just checked, and the only readahead tmpfs does is the normal swap readahead, which consists of reading an entire cluster of pages when one of the pages is demanded. A cluster of pages is pages that were swapped out at the same time, so they are likely to be re-referenced at the same time and are written to the same spot on the disk. But this strategy won't produce streaming reads the way typical filesystem readahead does.

And the kernel default size of the cluster is 8 pages. You can control it with /proc/sys/vm/page-cluster, though. I would think on a system with multi-gigabyte processes, a much larger value would be optimal.
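A small sketch for checking the current setting, assuming the usual procfs layout; `swap_readahead_pages` is a made-up helper name. The value in the file is the base-2 logarithm of the cluster size (the default 3 means 2^3 = 8 pages), so it is converted here:

```shell
# Print how many pages one swap-in fault may read ahead.
# /proc/sys/vm/page-cluster stores log2 of that number: 0 -> 1 page, 3 -> 8 pages.
# The optional argument lets you point it at another file.
swap_readahead_pages() {
    pc=$(cat "${1:-/proc/sys/vm/page-cluster}") || return 1
    echo $((1 << pc))
}
```

For example, `echo 6 | sudo tee /proc/sys/vm/page-cluster` would allow up to 64 pages per swap-in; whether that helps depends on how well the swap layout matches the access pattern.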

Temporary files: RAM or disk?

Posted Jun 11, 2012 14:51 UTC (Mon) by kleptog (subscriber, #1183) [Link]

This is actually related to another problem I ran into recently: is there some way to see what is actually in swap? I know /proc/<pid>/smaps gives you information about which blocks are in swap. But I can't see a way to get information about the order. That is, is my swap fragmented?
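Per-process totals, at least, are easy to get. A rough sketch (the helper name `swap_kb` is hypothetical) that adds up the per-mapping `Swap:` lines of an smaps file; it reports how much is swapped out, not where on the device it lives, so it does not answer the fragmentation question:

```shell
# Sum the "Swap:" lines (kB) of an smaps file such as /proc/1234/smaps.
# The exact match on "Swap:" skips the separate "SwapPss:" lines.
swap_kb() {
    awk '$1 == "Swap:" { sum += $2 } END { print sum + 0 }' "$1"
}
```

Usage: `swap_kb /proc/1234/smaps` prints a figure in kB; reading another user's smaps needs root.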

Temporary files: RAM or disk?

Posted Jun 7, 2012 21:36 UTC (Thu) by quotemstr (subscriber, #45331) [Link]

> I've often wished for some hack that would just do a sequential read through the swap file to load one process back into memory

Windows 8 will do that for modern applications. http://blogs.msdn.com/b/b8/archive/2012/04/17/reclaiming-...

Temporary files: RAM or disk?

Posted Jun 8, 2012 0:15 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

>> I've often wished for some hack that would just do a sequential read through the swap file to load one process back into memory
> Windows 8 will do that for modern applications.

When njs says "hack", I think it means something an intelligent user can invoke to override the normal system paging strategy because he knows a process is going to be faulting back much of its memory anyway.

The Windows 8 thing is automatic, based on an apparently pre-existing long-term scheduling facility. Some applications get long-term scheduled out, aka "put in the background," aka "suspended," mainly so devices they are using can be powered down and save battery energy. But there is a new feature that also swaps all the process' memory out when it gets put in the background, and the OS takes care to put all the pages in one place. Then, when the process gets brought back to the foreground, the OS brings all those pages back at once, so the process is quickly running again.

This of course requires applications that explicitly go to sleep, as opposed to just quietly not touching most of their memory for a while, and then suddenly touching it all again.

Temporary files: RAM or disk?

Posted Jun 8, 2012 0:59 UTC (Fri) by CycoJ (guest, #70454) [Link]

I encourage anyone who wants to see the benefit of having a tmpfs in RAM to try relocating the Firefox profile to a tmpfs (see https://wiki.archlinux.org/index.php/Firefox_Ramdisk). I've recently done this on my new system, which normally has plenty of RAM to spare. The difference is quite impressive, even though I have a latest-generation SSD. Mind you, I've been bitten by this once. I kept too many tabs open while doing some simulation work on the side. When I tried to open one more tab, the whole system went into a complete freeze because it ran out of RAM (and I don't have a swap partition). Of course, this happened just when I was booking a flight online, with only one last ticket available at that price.

Temporary files: RAM or disk?

Posted Jun 8, 2012 17:14 UTC (Fri) by apoelstra (subscriber, #75205) [Link]

For about a year now, I have had my ~/.mozilla mounted as a tmpfs. I don't have an SSD, but I have 2Gb of RAM, and Firefox has never run out of memory for me.

It's screaming fast. I originally started doing this when I had my $HOME mounted over SSHFS, and Firefox would single-handedly saturate my pipe, and took forever to do anything. Its disk IO is (was) obscene.

This also has the benefit (if you want to see it that way) that my history does not get so filled with garbage, since every reboot the profile is reset. I have a line in my .Xclients which copies a template .mozilla into place, so that I start off with Noscript, Adblock, Tor, etc, all enabled, and my history is seeded with LWN and other sites I frequent.
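The template trick can be sketched as a small shell function called from .Xclients or a similar login script. This is a hypothetical version, assuming the template lives in ~/.mozilla-template and ~/.mozilla is already a tmpfs mount point; both paths are placeholders:

```shell
# Seed a tmpfs-backed profile directory from an on-disk template.
# $1: template directory, $2: target directory (defaults are placeholders).
seed_profile() {
    template=${1:-"$HOME/.mozilla-template"}
    target=${2:-"$HOME/.mozilla"}
    [ -d "$template" ] || { echo "no template at $template" >&2; return 1; }
    mkdir -p "$target"
    # Only copy into an empty target, so a live profile is never overwritten.
    if [ -z "$(ls -A "$target")" ]; then
        cp -a "$template/." "$target/"
    fi
}
```

Because it only copies into an empty target, calling it twice in one session leaves the running profile alone; `cp -a` keeps the template's permissions and timestamps.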

Temporary files: RAM or disk?

Posted Jun 9, 2012 15:51 UTC (Sat) by Serge (guest, #84957) [Link]

> I encourage anyone who wants to see the benefit of having a tmpfs in RAM to try relocating the firefox profile to a tmpfs (see https://wiki.archlinux.org/index.php/Firefox_Ramdisk). The difference is quite impressive, even though I have a latest generation SSD.

It might be a good idea to save some SSD writes, but does it really increase performance? My ~/.mozilla profile is about 2GB, so it was not a good idea to put it in RAM, but I tried that with a new empty profile and noticed no difference. What should I look at?

PS: it's not related to the /tmp dir, I assume, but it's still interesting to see some tmpfs benefits for a popular application.

Temporary files: RAM or disk?

Posted Jun 2, 2012 23:05 UTC (Sat) by mirabilos (subscriber, #84359) [Link]

This is probably the best-written argumentation that can even defeat this "Serge" person’s queries for real-world examples (since of course those would be highly subjective…)

Temporary files: RAM or disk?

Posted Jun 4, 2012 7:10 UTC (Mon) by Serge (guest, #84957) [Link]

> So with ext3 journalling [...] changed will be written to the journal on the next journal commit

Probably. But it won't trigger disk access. You can check that:
for i in `seq 5`; do echo 123 > f; rm -f f; grep sda1 /proc/diskstats; done
(replace "sda1" with the disk you write to)

If file creation/deletion (metadata change) triggers disk access, all the lines will differ. But if the lines are the same, then there was no disk access.

Cache still works for journaled filesystems. Linux kernel is written by smart people, yeah.

PS: I've seen reiserfs trigger a "read" in such a test. You can find a description of the diskstats numbers in:
http://www.kernel.org/doc/Documentation/iostats.txt

Temporary files: RAM or disk?

Posted Jun 4, 2012 7:27 UTC (Mon) by neilbrown (subscriber, #359) [Link]

> If file creation/deletion (metadata change) triggers disk access you'll see all the lines different.

This doesn't agree with my understanding of ext3 journalling, so maybe I expressed it poorly.

If you put a 5 second sleep in that loop, I expect you would see changes. I do - once I found a suitably quiet ext3 filesystem to test on.

The metadata blocks do go into the next transaction, but transactions can live in memory for up to 5 seconds before they are flushed.
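A sketch of the same test with a pause longer than the commit interval between samples, assuming the device name is passed in (e.g. sda1) and using field 10 of /proc/diskstats, which is "sectors written" per iostats.txt. The function names are made up for this example:

```shell
# Print the "sectors written" counter (field 10) for one device, e.g. sda1.
# The second argument lets you point it at a saved copy of /proc/diskstats.
sectors_written() {
    awk -v dev="$1" '$3 == dev { print $10 }' "${2:-/proc/diskstats}"
}

# Create and delete a small file five times in the current directory,
# sleeping between passes (default 6 s, just over the 5 s ext3 commit
# interval) and printing the device's counter after each pass.
journal_test() {
    for pass in 1 2 3 4 5; do
        echo 123 > f
        rm -f f
        sleep "${2:-6}"
        echo "pass $pass: $(sectors_written "$1") sectors written"
    done
}
```

With the sleep, `journal_test sda1` should show the counter move on each pass if the metadata is being committed, whereas the loop without a sleep may finish inside a single open transaction.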

Temporary files: RAM or disk?

Posted Jun 4, 2012 10:17 UTC (Mon) by Serge (guest, #84957) [Link]

> If you put a 5 second sleep in that loop, I expect you would see changes.

The exact number of seconds depends on the /proc/sys/vm/dirty_*_centisecs values and /proc/sys/vm/laptop_mode...

Anyway, are you talking about file content or file name being written to disk in 5 seconds? Or both?

We can check whether the content of a deleted file is written to disk by running:
for i in `seq 100`; do dd if=/dev/zero of=f bs=1M count=10; rm -f f; done
then check /proc/diskstats or `iostat -k`. If writes increase by about 1GB (100 x 10MB), your filesystem writes data even for deleted files. My ext3 does not.

> I do - once I found a suitably quiet ext3 filesystem to test on.

Try /boot. :) Or just insert some USB flash stick and create ext3 there.

Temporary files: RAM or disk?

Posted Jun 4, 2012 11:28 UTC (Mon) by neilbrown (subscriber, #359) [Link]

No. The "5 seconds" that I was talking about is not a /proc/sys/vm/dirty* number. It is ext3 (and presumably ext4) specific.
It defaults to 5 seconds (JBD_DEFAULT_MAX_COMMIT_AGE) and can be changed by the "commit=nn" mount option.

That many seconds after a journal transaction has been opened, it is closed and flushed - if it hadn't been closed already.

It is the metadata that is written to the journal - inodes, free-block bitmaps, directory names etc.
The file contents are handled differently for different settings of "data=".
ordered: data that relates to the metadata is flushed before the metadata is written to the journal
writeback: data is written according to /proc/sys/vm/dirty* rules
journal: data is written to the journal with the metadata.

I'm not sure what the default is today. If you create then delete a file, the data will not go to disk, except possibly for "data=journal". But the metadata will.

Temporary files: RAM or disk?

Posted Jun 4, 2012 15:17 UTC (Mon) by Serge (guest, #84957) [Link]

> If you create then delete a file, the data will not go to disk, except possibly for "data=journal". But the metadata will.

That's harder to test. Maybe compare the amount of writes generated by something like:
for i in `seq 10`; do touch $i; rm -f $i; done
with amount of writes generated by:
for i in `seq 1000`; do touch $i; rm -f $i; done
If the latter line generates about 100 times more writes, every creation/deletion is being written to disk. On my ext3 I see roughly equal numbers of writes...

But, anyway, looks like it's not a problem for /tmp then, meaning that ext2 would not be (noticeably) better than ext3 in /tmp use cases.

Temporary files: RAM or disk?

Posted Jun 4, 2012 14:13 UTC (Mon) by hummassa (subscriber, #307) [Link]

In my work machine, on ext4, all lines are different.

Temporary files: RAM or disk?

Posted Jun 1, 2012 13:21 UTC (Fri) by Richard_J_Neill (subscriber, #23093) [Link]

It seems to me that the solution would be an extra flag for a mountpoint that says "files put in this directory should only be flushed to disk with low priority". i.e. have /tmp really existing on disk, but ionice the process for writing the in-memory pages to disk.

BTW, Mandriva/Mageia has done /tmp on tmpfs for ages (I think ~ 5 years), and it does work fine.

Temporary files: RAM or disk?

Posted Jun 5, 2012 12:23 UTC (Tue) by roblucid (subscriber, #48964) [Link]

Except application writers can open temporary files read/write and unlink the newly created file so only the file descriptor provides access to it.

That prevents files getting left around, so rather than a new flag, filesystems could stop sync-ing the disk copy, in this situation, reasoning the file is ephemeral.
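The idiom can be sketched even from the shell; C programs usually get the same effect from tmpfile(3), or mkstemp(3) followed by unlink(2). The function name is hypothetical, and re-reading through /dev/fd/3 relies on Linux semantics, where opening that path reopens the still-open file from offset 0 even after its name is gone:

```shell
# Create a temp file, remove its name at once, and keep using it via fd 3.
with_anonymous_tmpfile() {
    tmp=$(mktemp) || return 1
    exec 3<>"$tmp"           # open read/write on fd 3
    rm -f "$tmp"             # drop the name; only fd 3 refers to the data now
    printf 'scratch data\n' >&3
    cat /dev/fd/3            # Linux: reopens the unlinked file from the start
    exec 3>&-                # closing the last descriptor frees the blocks
}
```

After the `rm`, nothing else can find the file or leave it behind, and its blocks are released as soon as the descriptor is closed, including when the process is killed.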

On tmpfs-based /tmp systems like Solaris (I used it with SunOS 4), humongous temporary files needed special arrangements and placement; disks just tended not to have much free space. Disks were not even 1GB, and overloading memory + swap space with temp files tended to be more reliable in practice, because processes could keep running even when some luser had filled the disk.

Temporary files: RAM or disk?

Posted Jun 8, 2012 11:46 UTC (Fri) by Wol (guest, #4433) [Link]

Coming at it from a gentoo / SuSE user's viewpoint ...

Gentoo shoves all its compiles into /tmp. And when compiling LO, you need a lot of temp space. So rather than having space dedicated to tmp for compiling, I have something like 10 or 20Gb of swap (plus 8Gb RAM), and simply have a huge tmpfs /tmp.

SuSE on the other hand ... Why oh WHY can't they give you sane defaults! Swap space defaults to twice ram (good) but without doing a "wipe and redo manually", you can't *increase* swap space! I always set swap space to at least twice the mobo's max ram.

The other thing I didn't realise, is that tmpfs defaults to half available ram. So with 8Gb, the first few times I tried to compile OOo, I couldn't work out why it kept crashing !-)
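A sketch for seeing what that default works out to on a given machine, assuming /proc/meminfo is available (`tmpfs_default_kb` is a made-up name). The limit can then be raised with the tmpfs `size=` mount option, which also accepts a percentage of RAM:

```shell
# Print half of MemTotal in kB: the size tmpfs uses when no size= is given.
tmpfs_default_kb() {
    awk '$1 == "MemTotal:" { print int($2 / 2) }' "${1:-/proc/meminfo}"
}
```

For instance, `sudo mount -o remount,size=20G /tmp` raises the limit on a running system, and the same `size=` option can go in the /tmp line of /etc/fstab. A limit larger than RAM only works out if there is enough swap behind it.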

So yeah, I'm all in favour of /tmp in tmpfs. But make sure you have *sane* defaults, and those defaults are *easy* to over-ride. SuSE, I'm glaring at you !!!

Cheers,
Wol

Temporary files: RAM or disk?

Posted Jun 8, 2012 15:20 UTC (Fri) by anselm (subscriber, #2796) [Link]

> Swap space defaults to twice ram (good) but without doing a "wipe and redo manually", you can't *increase* swap space!

You can always increase swap space after the fact by means of swap files (rather than swap partitions).
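A sketch of that route, with a dry-run switch so the commands can be inspected before running them as root. The function names, the DRY_RUN variable, and any path you pass in are assumptions of this example; `dd` is used rather than a sparse file because swap files must not contain holes:

```shell
# Print the command instead of running it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# $1: path of the new swap file, $2: size in MiB. Needs root unless DRY_RUN is set.
add_swapfile() {
    run dd if=/dev/zero of="$1" bs=1M count="$2" || return 1  # real blocks, no holes
    run chmod 600 "$1" || return 1   # keep other users from reading swapped-out memory
    run mkswap "$1" || return 1      # write the swap signature
    run swapon "$1"                  # start using it immediately
}
```

To keep it across reboots, a line such as `/swapfile none swap sw 0 0` in /etc/fstab is the usual approach, with /swapfile replaced by the real path.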

Temporary files: RAM or disk?

Posted Jun 8, 2012 19:35 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

or by creating additional swap partitions and adding them

Temporary files: RAM or disk?

Posted Jun 8, 2012 20:23 UTC (Fri) by jackb (subscriber, #41909) [Link]

> Gentoo shoves all its compiles into /tmp.

As long as I've been using it, compiling has always been done in /var/tmp, not /tmp.

Mounting /var/tmp/portage on tmpfs is not the default behavior but has become extremely common. For large packages like Chromium or LibreOffice there are ways to override the default PORTAGE_TMPDIR to point to a non-tmpfs directory.
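One common way to do that override is per-package environment files. A sketch, where the name notmpfs.conf, the directory /var/tmp/notmpfs, and the package list are illustrative choices; it assumes /var/tmp itself is on disk and that package.env is a plain file rather than a directory. The function writes into a root directory given as its argument, so it can be tried against a scratch directory first:

```shell
# Write a per-package PORTAGE_TMPDIR override under the root directory $1
# (use / on a real system, a scratch directory to try it out).
write_notmpfs_env() {
    root=${1:?usage: write_notmpfs_env ROOT}
    mkdir -p "$root/etc/portage/env" "$root/var/tmp/notmpfs"
    # Packages mapped to this file build in an on-disk directory instead.
    printf 'PORTAGE_TMPDIR="/var/tmp/notmpfs"\n' > "$root/etc/portage/env/notmpfs.conf"
    # One line per large package; printf repeats the format for each argument.
    # Re-running appends duplicate lines.
    printf '%s notmpfs.conf\n' app-office/libreoffice www-client/chromium \
        >> "$root/etc/portage/package.env"
}
```

On a real system it would be called as `write_notmpfs_env /` (as root); everything not listed keeps building in the tmpfs-backed default.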

Temporary files: RAM or disk?

Posted Jun 9, 2012 18:06 UTC (Sat) by Serge (guest, #84957) [Link]

> And when compiling LO, you need a lot of temp space. So rather than having space dedicated to tmp for compiling, I have something like 10 or 20Gb of swap (plus 8Gb RAM), and simply have a huge tmpfs /tmp.

Why? Does it make things faster for you? It would be interesting to see some benchmarks. I've seen tests showing there's no difference, and seen one with extfs being faster than tmpfs+swap for compiling.

> and simply have a huge tmpfs /tmp.

Imho, it's much simpler to have it on disk. :)

> So yeah, I'm all in favour of /tmp in tmpfs.

/tmp is not the only place where you can mount tmpfs. If you want your /var/tmp/portage in tmpfs, you don't have to break other apps and put /tmp there.

Temporary files: RAM or disk?

Posted Jun 12, 2012 14:03 UTC (Tue) by TRauMa (guest, #16483) [Link]

Compiles on tmpfs are faster, factor is 1.8 to 2 in my tests, provided the working set nearly fits into RAM. With lots of swapping going on, you may end up taking longer to compile. Contrary to what is stated above, tmpfs is not smart about swapping, the data in swap is accessed very randomly and I'd be very surprised if inode data wouldn't also end up in swap on high memory pressure. I found all of this out a long time ago on gentoo trying to compile openoffice with 1G of RAM and a dynamic swapfile manager. Now, with 16G, it is actually feasible.

Another thing: I thought the plan was to migrate to per-user-tmp anyway, somewhere in $HOME, for apps that use a lot of tmp like DVD rippers this would be a good idea anyway.

Temporary files: RAM or disk?

Posted Jun 16, 2012 4:30 UTC (Sat) by Serge (guest, #84957) [Link]

> I thought the plan was to migrate to per-user-tmp anyway, somewhere in $HOME, for apps that use a lot of tmp like DVD rippers this would be a good idea anyway.

A per-user directory would not get cleaned on reboot. Using a per-user temporary directory may be a bad thing for users with NFS /home; they would prefer a local tmp if there is one. Also, a common /tmp for all users is still needed for file exchange on multiuser servers. And finally, why would DVD software use something in $HOME, if it can use /tmp, which is there exactly for those things? ;)

Why put /tmp on tmpfs? Having /var/tmp/portage on tmpfs does not force you to put /tmp there. And it's really hard to find an application that becomes faster just because of /tmp on tmpfs. Even for portage it's not that obvious.

> Compiles on tmpfs are faster, factor is 1.8 to 2 in my tests

Hm... My simple test shows that tmpfs is just about 1-2% faster.
Here's a script that resembles a basic package build:
mount tmpfs or ext3 to /mnt/test, then
$ cd /mnt/test
$ wget http://curl.haxx.se/download/curl-7.26.0.tar.bz2
$ export CFLAGS='-O2 -g -pipe' CXXFLAGS='-O2 -g -pipe'
$ time sh -c 'tar xf curl-7.26.0.tar.bz2 && cd curl-7.26.0 && ./configure && make install DESTDIR=/mnt/test/root && cd ../root && tar czf ../curl-package.tar.gz * && cd .. && rm -rf curl-7.26.0 root'

tmpfs results:
real 70.983s user 48.685s sys 26.527s
real 70.635s user 48.390s sys 26.694s
real 70.701s user 48.203s sys 26.929s
real 70.867s user 48.636s sys 27.090s
real 70.744s user 48.297s sys 27.082s

ext3 results:
real 71.690s user 48.401s sys 27.498s
real 71.614s user 48.340s sys 27.869s
real 71.531s user 48.836s sys 27.520s
real 71.479s user 48.306s sys 27.469s
real 71.635s user 48.540s sys 27.496s

What have I missed?
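To make the comparison explicit, a small sketch (hypothetical helper names) that averages the "real" column of result lines in the format above and expresses the gap as a percentage of the slower mean:

```shell
# Mean of the "real" times from lines like "real 70.983s user ... sys ...".
mean_real() {
    awk '$1 == "real" { sub(/s$/, "", $2); sum += $2; n++ }
         END { printf "%.3f\n", sum / n }'
}

# Percentage by which mean $1 is lower than mean $2.
pct_faster() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}
```

Fed the five tmpfs lines and the five ext3 lines above, `mean_real` gives 70.786 and 71.590 seconds, and `pct_faster 70.786 71.590` prints 1.1, in line with the 1-2% figure.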

Temporary files: RAM or disk?

Posted Jun 16, 2012 13:44 UTC (Sat) by nix (subscriber, #2304) [Link]

I thought the idea of per-user /tmp was that every user got his own /tmp, sure, but this was implemented via subdirectories of the *real*, tmpfs, cleared-on-boot /tmp, e.g. /tmp/user-$name/... This can all be done fairly easily with pam_namespace: there's even an example in the default /etc/security/namespace.conf.

(One application that becomes a lot faster with /tmp on tmpfs is GCC without -pipe, or, even with -pipe, at the LTO link step. It writes really quite a lot of large extremely temporary intermediate output to files in /tmp in each stage of the processing pipeline, then reads it back again in the next stage.)

Temporary files: RAM or disk?

Posted Jun 25, 2012 9:40 UTC (Mon) by Serge (guest, #84957) [Link]

> I thought the idea of per-user /tmp was that every user got his own /tmp, sure, but this was implemented via subdirectories of the *real*, tmpfs, cleared-on-boot /tmp.

You don't need tmpfs then. This will work with /tmp anywhere (disk, RAM, separate partition, NFS, etc). I mean this is neither a reason to use tmpfs nor a reason to avoid it.

> One application that becomes a lot faster with /tmp on tmpfs is GCC without -pipe, or, even with -pipe, at the LTO link step.

Faster linking? Let's check that with something having a lot of binaries:
mount tmpfs or ext3 to /mnt/test, then
$ cd /mnt/test
$ wget http://ftp.gnu.org/gnu/coreutils/coreutils-8.17.tar.xz
$ export CFLAGS='-O2 -g -flto' TMPDIR=/mnt/test
$ time sh -c "tar xf coreutils-8.17.tar.xz; cd coreutils-8.17; ./configure; make install DESTDIR=/mnt/test/root; cd ../root; tar czf ../coreutils-package.tar.gz *; cd ..; rm -rf coreutils-8.17 root"

tmpfs results:
real 882.876s user 760.111s sys 110.353s
real 884.456s user 761.408s sys 110.603s
real 885.245s user 762.770s sys 110.525s
real 884.914s user 762.417s sys 110.395s
real 885.352s user 762.865s sys 110.360s

ext3 results:
real 895.244s user 762.620s sys 115.027s
real 893.134s user 762.447s sys 114.841s
real 898.353s user 763.645s sys 116.369s
real 898.010s user 763.472s sys 116.074s
real 897.525s user 763.671s sys 116.219s

If my test is correct, it's still same 1-2%. It is faster, but not a lot.
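Averaging the five runs of each set makes the "same 1-2%" concrete:

```latex
\begin{aligned}
\bar t_{\mathrm{tmpfs}} &= \frac{882.876 + 884.456 + 885.245 + 884.914 + 885.352}{5} = \frac{4422.843}{5} \approx 884.57\ \mathrm{s} \\
\bar t_{\mathrm{ext3}} &= \frac{895.244 + 893.134 + 898.353 + 898.010 + 897.525}{5} = \frac{4482.266}{5} \approx 896.45\ \mathrm{s} \\
\frac{\bar t_{\mathrm{ext3}} - \bar t_{\mathrm{tmpfs}}}{\bar t_{\mathrm{ext3}}} &\approx \frac{11.88}{896.45} \approx 1.3\%
\end{aligned}
```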

Temporary files: RAM or disk?

Posted Jun 26, 2012 15:49 UTC (Tue) by nix (subscriber, #2304) [Link]

[lots of crude benchmarking ahead.]

It's not just linking that a tmpfs /tmp speeds up a bit, in theory: it's compilation, because without -pipe GCC writes its intermediate .S file to TMPDIR (and -pipe is not the default: obviously it speeds up compilation by allowing extra parallelism as well as reducing potential disk I/O, so I don't quite understand *why* it's still not the default, but there you are.)

btw, coreutils is by nobody's standards 'something having a lot of binaries'. It has relatively few very small binaries, few object files, and an enormous configure script that takes about 95% of the configure/make time (some of which, it is true, runs the compiler and writes to TMPDIR, but most of which is more shell-dependent than anything). LTO time will also have minimal impact in this build.

But, you're right, I'm pontificating in the absence of data -- or data less than eight years old, anyway, as the last time I measured this was in 2004. That's so out of date as to be useless. Time to measure again. But let's use some more hefty test cases than coreutils, less dominated by weird marginal workloads like configure runs.

Let's try a full build of something with more object files, and investigate elapsed time, cpu+sys time, and (for non-tmpfs) disk I/O time as measured from /proc/diskstats (thus, possibly thrown off by cross-fs merging: this is unavoidable, alas). A famous old test, the kernel (hacked to not use -pipe, with hot cache), shows minimal speedup, since the kernel does a multipass link process and writes the intermediates to non-$TMPDIR anyway:

tmpfs TMPDIR, with -pipe (baseline): 813.75user 51.28system 2:13.32elapsed
tmpfs TMPDIR: 812.23user 50.62system 2:12.96elapsed
ext4 TMPDIR: 809.74user 51.90system 2:29.15elapsed 577%CPU; TMPDIR reads: 11, 88 sectors; writes: 6394, 1616928 sectors; 19840ms doing TMPDIR I/O.

So, a definite effect, but not a huge one. I note that the effect of -pipe is near-nil these days, likely because the extra parallelism you get from combining the compiler and assembler is just supplanting the extra parallelism you would otherwise get by running multiple copies of the compiler in parallel via make -j. (On a memory-constrained or disk-constrained system, where the useless /tmp writes may contend with useful disk reads, and where reads may be required as well, we would probably see a larger effect, but this system has 24Gb RAM and a caching RAID controller atop disks capable of 250Mb/s in streaming write, so it is effectively unconstrained, being quite capable of holding the whole source tree and all build products in RAM simultaneously. So this is intentionally a worst case for my thesis. Smaller systems will see a larger effect. Most systems these days are not I/O- or RAM-constrained when building a kernel, anyway.)

How about a real 900kg monster of a test, GCC? This one has everything, massive binaries, massive numbers of object files, big configure scripts writing to TMPDIR run in parallel with ongoing builds, immense link steps, you name it: if there is an effect this will show it. (4.6.x since that's what I have here right now: full x86_64/x86 multilibbed biarch nonprofiled -flto=jobserver -j 9 bootstrap including non-multilib libjava, minus testsuite run: hot cache forced by cp -a'ing the source tree before building; LTO is done in stage3 but in no prior stages so as to make the comparison with the next test a tiny bit more meaningful: stage2/3 comparison is suppressed for the same reason):

tmpfs TMPDIR: 13443.91user 455.17system 36:02.86elapsed 642%CPU
ext4 TMPDIR: 13322.24user 514.38system 36:01.62elapsed 640%CPU; TMPDIR reads: 59, 472 sectors; writes: 98661, 20058344 sectors; 83690ms doing TMPDIR I/O

So, no significant effect elapsed-time-wise, well into the random noise: though the system time is noticeably higher for the non-tmpfs case, it is hugely dominated by the actual compilation. However, if you were doing anything else with the system you would have noticed: paging was intense, as you'd expect with around 10Gb of useless writes being flushed to disk. Any single physical disk would have been saturated, and a machine with much less memory would have been waiting on it.

That's probably the most meaningful pair of results here, a practical worst case for the CPU overhead of non-tmpfs use. Note that the LTO link stage alone writes around six gigabytes to TMPDIR, with peak usage at any one time around 4Gb, and most of this cannot be -pipe'd (thus this is actually an example of something that on many machines cannot be tmpfsed effectively).
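The sector counts quoted above are easier to compare in bytes. /proc/diskstats counts in 512-byte units regardless of the device's real sector size, so a sketch of the conversion (function name made up) is just:

```shell
# Convert a /proc/diskstats sector count to (decimal) megabytes.
# Shell arithmetic is 64-bit in bash and dash, so large counts do not overflow.
sectors_to_mb() {
    echo $(( $1 * 512 / 1000000 ))
}
```

`sectors_to_mb 1616928` gives 827 MB of TMPDIR writes for the kernel build, and `sectors_to_mb 20058344` gives 10269 MB for the GCC bootstrap, which matches the "around 10Gb of useless writes" estimate.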

Temporary files: RAM or disk?

Posted Jun 1, 2012 4:42 UTC (Fri) by thedevil (subscriber, #32913) [Link]

"You could probably create a user-space solution that monitored swap usage and created new swap files on demand."

It exists, or existed; search for "swapd". I remember it because it was in that context I submitted my one and only kernel patch (which was rightfully ignored).

swapd was essentially useless, because there was just no way for userspace to notice an out-of-swap condition soon enough by polling. Maybe with a bit of kernel help, like a netlink socket, it would have been possible.
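For reference, the polling approach can be sketched as below; the helper names and the 10% default threshold are hypothetical. It also shows the weakness described above: anything that fills swap faster than the polling interval goes unnoticed until the next sample.

```shell
# Percentage of swap still free, or "none" if no swap is configured.
swap_free_pct() {
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free  = $2 }
        END { if (total > 0) print int(free * 100 / total); else print "none" }
    ' "${1:-/proc/meminfo}"
}

# Poll every $1 seconds (default 1); warn when free swap drops below $2 percent.
watch_swap() {
    while sleep "${1:-1}"; do
        pct=$(swap_free_pct)
        [ "$pct" != none ] && [ "$pct" -lt "${2:-10}" ] && echo "swap low: ${pct}% free"
    done
}
```

Because `swap_free_pct` prints `none` when no swap is configured, the watcher stays quiet on swapless systems.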

Temporary files: RAM or disk?

Posted Jun 2, 2012 20:37 UTC (Sat) by branden (subscriber, #7029) [Link]

Obviously, your one and only kernel patch was ignored not because of its technical merits (or lack thereof), but for the much simpler reason that it came from thedevil.

Temporary files: RAM or disk?

Posted Jun 14, 2012 10:11 UTC (Thu) by daenzer (✭ supporter ✭, #7050) [Link]

Temporary files: RAM or disk?

Posted Jun 1, 2012 3:58 UTC (Fri) by hpa (subscriber, #48575) [Link]

Calling tmpfs RAM is very misleading. tmpfs is swappable, and in conjunction with a larger swap partition can handle large files just fine. It still performs quite a bit better for many workloads, simply because it has no integrity constraints: if the system crashes, it doesn't have to be able to present a consistent state at all.

On some of the (pre-disaster) kernel.org servers we switched to /tmp on tmpfs when we found that it made some of the data generation scripts that were run on a regular basis run as much as 20 times faster.

Temporary files: RAM or disk?

Posted Jun 1, 2012 5:21 UTC (Fri) by tomba (subscriber, #51091) [Link]

I wonder if a tmpfs-like filesystem which stores small files into ram, like it does now, and large files to a normal on-disk fs, would work. In a sense tmpfs does that already by using swap, but I guess having a normal on-disk fs would be faster. The on-disk fs could be a simple one, as it doesn't need to be retained over boot.

Temporary files: RAM or disk?

Posted Jun 1, 2012 12:13 UTC (Fri) by sorpigal (subscriber, #36106) [Link]

As I read the article I began thinking the same thing: a new tmpfs-like FS that stores all files in RAM at first, then moves them to disk if they grow large or remain for a long time. The big question is how you define the on-disk location used to back this new tmp; swap seems like a crude solution, as (1) I don't use it now and (2) I would prefer to be able to have /tmp grow as big as needed without causing, e.g., Firefox to get OOM-killed.

Temporary files: RAM or disk?

Posted Jun 3, 2012 6:36 UTC (Sun) by quotemstr (subscriber, #45331) [Link]

> A new tmpfs-like FS that stores all files in RAM at first, stores them on disk if they grow large and also on disk if they remain for a long time.

You are aware that Linux, like all modern operating system kernels, has a unified caching subsystem, right? The VM subsystem _already does_ exactly what you want: it will page out tmpfs pages to backing storage (in this case, swap) just as it would page out file-backed pages to their backing stores; the exact identity of the backing store doesn't make a difference. The distinction between bytes in memory and bytes on disk isn't nearly as clear-cut as you think.

Temporary files: RAM or disk?

Posted Jun 4, 2012 5:08 UTC (Mon) by raven667 (subscriber, #5198) [Link]

One of the few reasonable statements in this discussion... Using the robust features of the kernel to implement the most sensible policy, who would have thought.

Temporary files: RAM or disk?

Posted Jun 1, 2012 5:34 UTC (Fri) by wahern (subscriber, #37304) [Link]

Some of us don't have swap. I've never needed it on any of my servers. If your resident set is too large for RAM, you'll have problems at the most inopportune time--under heavy load. If your virtual set is significantly larger than your resident set, then you have broken programs (probably descended from the mythical daemon which preallocates gigabytes of memory from malloc like candy, and necessitating the OOM killer). If a program wants to allocate a bunch of address space for object caching, or wants to slurp a large data set into memory, there's mmap. Typical servers don't need swap; it's mostly brain-dead desktop applications, and batch processing analytics software which copies huge datasets into malloc'd memory, and neither are heavy users of /tmp.

And I fail to see why tmpfs should necessarily be any better than a vanilla /tmp directory. Both primarily operate in RAM (tmpfs explicitly, /tmp through the buffer cache). Both spill to disk under memory pressure; probably the same disk, in fact. Instead of tmpfs, why not tweak the buffer cache? Are there any numbers comparing tmpfs with a /tmp on its own partition?

Temporary files: RAM or disk?

Posted Jun 1, 2012 23:46 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

it depends on whether you allow overcommit.

If you disable both overcommit and swap, you run into the fun situation where a large program that wants to execute a small one temporarily takes up twice its large footprint as it forks to execute the small program.
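The doubling follows from how strict accounting works. With vm.overcommit_memory=2 the kernel refuses an allocation (or a fork()) once total committed memory would exceed CommitLimit, which is roughly swap plus RAM * vm.overcommit_ratio / 100 (the default ratio is 50). A minimal Python sketch of that arithmetic, assuming the forking parent's private writable memory has to be committed a second time; the function names and machine sizes are hypothetical.

```python
# Sketch of strict commit accounting (vm.overcommit_memory=2); all sizes in KiB.
def commit_limit_kib(ram_kib: int, swap_kib: int, overcommit_ratio: int = 50) -> int:
    """Approximate CommitLimit: swap + RAM * overcommit_ratio / 100 (huge pages ignored)."""
    return swap_kib + ram_kib * overcommit_ratio // 100


def fork_would_fit(committed_kib: int, parent_private_kib: int,
                   ram_kib: int, swap_kib: int, overcommit_ratio: int = 50) -> bool:
    """Assume fork() must commit a second copy of the parent's private writable memory."""
    limit = commit_limit_kib(ram_kib, swap_kib, overcommit_ratio)
    return committed_kib + parent_private_kib <= limit


GIB = 1024 * 1024  # KiB per GiB

# Hypothetical 8 GiB machine where a 3 GiB process is the only big committer.
print(fork_would_fit(3 * GIB, 3 * GIB, ram_kib=8 * GIB, swap_kib=0))        # False
print(fork_would_fit(3 * GIB, 3 * GIB, ram_kib=8 * GIB, swap_kib=4 * GIB))  # True
```

Adding swap raises CommitLimit even if none of it is ever used. vfork()-style spawning (which recent glibc versions use inside posix_spawn()) sidesteps the duplicate commitment, since no copy of the address space is made.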

Temporary files: RAM or disk?

Posted Jun 3, 2012 0:19 UTC (Sun) by giraffedata (subscriber, #1954) [Link]

The fact that some people don't have swap isn't a reason. They don't have swap because they don't need it. If tmpfs were otherwise the right choice, they could easily have swap space.

I would add swap space to a system even if it couldn't benefit from it for swapping, just to back tmpfs. It's a more efficient way of storing temporary, expendable files than any disk-based filesystem I know.

Temporary files: RAM or disk?

Posted Jun 3, 2012 0:29 UTC (Sun) by giraffedata (subscriber, #1954) [Link]

I'm struggling to understand your comments that typical servers do not need swap space and (I think) that systems which swap are broken.

The fundamental point of swap space, going back to its invention, is temporal locality of reference - the idea that in a given interval of time, certain memory is accessed far more frequently than other memory. Are you saying that isn't the case in typical servers? Or that it shouldn't be the case for typical servers? That typical servers should reference all memory uniformly over time and explicitly keep the less-accessed data in filesystems?

(The latter, BTW, was the technology that swapping replaced 40 years ago).

Temporary files: RAM or disk?

Posted Jun 3, 2012 1:55 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

The problem is, swap allows you to allocate more RAM than is present, using disk as backing storage. Usually it works just fine because you don't need to touch all of your RAM at the same time.

However, there are several pathological cases that can arise. One depressingly common case: a fairly inactive application (say, a Java webapp) with a large working set is slowly pushed into swap by other apps. Since the application is inactive, it lives just fine until something triggers a garbage collection. Then the JVM has to walk through all the pages, tracing object references, and that causes the whole working set to be brought back into RAM.

And while the app is swapping in, the system might appear to be locked - I have no idea why; in theory other apps should remain responsive.

As a bonus, in this scenario swapping in the Java application might cause another application, which might be active at that time, to be swapped out, causing problems with response time.

And as an additional bonus, there's an even simpler scenario: an application which constantly allocates RAM (perhaps in an infinite malloc loop). It reliably kills my machine for several minutes if I use swap.

Temporary files: RAM or disk?

Posted Jun 4, 2012 13:09 UTC (Mon) by nix (subscriber, #2304) [Link]

And while the app is swapping in, the system might appear to be locked - I have no idea why; in theory other apps should remain responsive.
Possibly part of the X server has been pushed out to swap, leaving it unable to dispatch events without swapping itself back in off the already-highly-contended disk.

Temporary files: RAM or disk?

Posted Jun 5, 2012 14:29 UTC (Tue) by pboddie (subscriber, #50784) [Link]

And let us not forget that valuable property of the OOM killer who usually makes an entrance at this point: to kill all the desired applications, leaving stupidapp to finally emerge triumphant, only to exclaim, "Where did everybody go?! System is going down for reboot? What does that mean?!"

Temporary files: RAM or disk?

Posted Jun 3, 2012 14:53 UTC (Sun) by bronson (subscriber, #4806) [Link]

Temporal locality of reference is the case for basically all programs (really oddball scientific ones being the only exception I can think of).

With most servers, though, it doesn't matter. The important bits (the server software) trivially fit into RAM, and the unimportant bits (the served content) come off disk anyway. In this type of workload, swap is more of a liability than a benefit.

Another way of looking at it: best case, it will swap a few unused nginx pages out to disk and bring a 0.01% speed improvement; worst case, it can fight the buffer cache or even bring your server to its knees and light your pager up at 3am.

Temporary files: RAM or disk?

Posted Jun 3, 2012 17:28 UTC (Sun) by giraffedata (subscriber, #1954) [Link]

With most servers, though, it doesn't matter. The important bits (the server software) trivially fit into RAM, and the unimportant bits (the served content) come off disk anyway.

That's way too general a statement to make. There is a great diversity of kinds of servers - different ages, scales, applications, etc., and few of us have a broad enough view of them to say anything is true about the majority of them.

Whether the important bits fit into RAM is the independent variable, not the dependent. The system designer chooses whether the important bits fit into RAM. So the only way the above makes sense is if you accept the oft-stated axiom that RAM is essentially free in 2012. I know there are servers where that is true, but there are plenty of servers where it's not. In one of those servers, if all the important bits fit into RAM, even though they're rarely accessed, that means someone screwed up and bought too much RAM.

The fact that you say everything but the server software is "served content," already tells me you're limiting your view to web servers and things like them. Other servers have very important data that is neither the server software nor originally from disk. If you don't buy either RAM or swap space for it, you don't serve.

Temporary files: RAM or disk?

Posted Jun 4, 2012 16:03 UTC (Mon) by bronson (subscriber, #4806) [Link]

Web servers, mail servers, file servers, directory servers, dns servers, Jabber servers, bittorrent servers, memcached, database servers, reverse proxy servers, load balancers, etc. etc. Almost all share a work profile that doesn't really benefit from swap.

You'll note that I did say *most*. Of course there exist servers that fall outside this but in my experience they're fairly rare.

So, I'm very curious, what is this somewhat common, swap-friendly type of server that I'm missing?

Temporary files: RAM or disk?

Posted Jun 4, 2012 21:17 UTC (Mon) by giraffedata (subscriber, #1954) [Link]

Web servers, mail servers, file servers, directory servers, dns servers, Jabber servers, bittorrent servers, memcached, database servers, reverse proxy servers, load balancers, etc. etc. Almost all share a work profile that doesn't really benefit from swap.

...

So, I'm very curious, what is this somewhat common, swap-friendly type of server that I'm missing?

You don't seem to be following the conversation. I said many servers have important data that is neither server software nor backed by a filesystem. I know I don't have to give you examples of those; many of your examples above use plenty of malloc memory. That was to cast doubt on the claim, which I said is way too general to make, that for most servers the important data is server software and other files.

It still might be true, but I'm a long way from being convinced any of us has a wide enough purview of the computer industry to know that the servers with working data are in the minority.

Temporary files: RAM or disk?

Posted Jun 5, 2012 5:59 UTC (Tue) by bronson (subscriber, #4806) [Link]

Obviously what I said was an oversimplification -- it's only 2 sentences. It still provides a decent mental model to answer the OP's question. If you'd like to improve on it, please do. There's plenty of room.

And, I have a reasonable view of the data centers that I've worked in... Most servers I've seen have avoided swapping. Some merely ignore it because it's redundant (Apache, nginx), and some go to unbelievable lengths to avoid it (Oracle). Very few actually embrace it (Varnish). That's just my experience. Again, if you've seen otherwise, please do share.

Temporary files: RAM or disk?

Posted Jun 5, 2012 1:31 UTC (Tue) by vonbrand (subscriber, #4458) [Link]

I believe you mean they require swap? At least a DNS resolver rapidly accumulates a huge database to keep in RAM, that is relatively rarely used.

Temporary files: RAM or disk?

Posted Jun 5, 2012 6:25 UTC (Tue) by bronson (subscriber, #4806) [Link]

Very true, there have been caching DNS servers that malloc everything and let the VM handle the disk. BIND gained a reputation for absolutely shredding swap space, especially if you were running more than one instance on a box. Now that BIND has its sharable database plus hooks to use mysql/postgres/ldap/etc, I don't think it works like that anymore...? (I haven't used BIND in quite a while, hallelujah).

Lots of other DNS servers use their own databases and handle caching themselves (tinydns, powerdns, djb). maradns is the one exception I know of, but I don't think it has seen much adoption...?

You make an excellent point; this is a great example of swap usage on the server. Nevertheless, I'm still under the impression that my "most servers don't want swap" statement holds.

Temporary files: RAM or disk?

Posted Jun 1, 2012 8:42 UTC (Fri) by iq-0 (subscriber, #36655) [Link]

The only thing missing is an option to designate a swap space for tmpfs spillover only.

Most of our servers have small amounts of swap and aggressive overcommit_memory settings. The small amount of swap is enough for the incidental extra memory needs and can be helpful on loaded servers for accumulating effectively wasted memory.
But we do often generate lots of temporary files (for sorting or graphing), and some might actually be rather large. But if it means that I have to increase the global swap space, then that's really not an option.
I know adding swap space and tweaking overcommit_ratio would work, but I don't want my shared memory in /dev/shm to be any more swappable than I want the normal anonymous pages to be (only the /tmp tmpfs should be allowed to use the additional swap space).

Tmpfs really is the most sane option for /tmp. Why bother doing cleanups at boot? Why bother guaranteeing on-disk consistency or doing disk flushes on sync/fsync/fdatasync? The only sane alternative would be an ext2 filesystem with all data guarantees turned off, recreated every time you boot.

Temporary files: RAM or disk?

Posted Jun 9, 2012 16:36 UTC (Sat) by Serge (guest, #84957) [Link]

> The only thing missing is an option to designate a swap space for tmpfs spillover only.

You have this option — use a separate ext3 partition. :)

> we do often generate lots of temporary files (for sorting or graphing) and some might actually be rather large.

Why do you use tmpfs then? You can just use regular disk. Have you noticed some speedup because of using tmpfs?

> Tmpfs really is the most sane option for /tmp. Why bother doing cleanups at boot?

Why bother about tmpfs size? Why bother about adding more swap? Why bother about the system slowing down because of heavy swap usage? On-disk /tmp doesn't have these problems. And it's cleaned on boot automatically anyway, no need to bother.

> Why bother guaranteeing on-disk consistency or doing disk flushes on sync/fsync/fdatasync?

Nobody does fsync in /tmp, so nobody bothers. :)

> The only sane alternative would be an ext2 filesystem with all data guarantees turned off, recreated every time you boot.

That's why ext3 is better. Replaying ext3 journal is faster than creating a new filesystem.

Temporary files: RAM or disk?

Posted Jun 14, 2012 9:05 UTC (Thu) by iq-0 (subscriber, #36655) [Link]

> You have this option — use a separate ext3 partition. :)

The problem with disk-based /tmp is not that tmpfs or ext3 is faster per se. It's about when they want to do I/O:
- Ext3 sooner or later wants to write all data to disk; this might be because of dirty memory limits or because some application does a sync-like call or a sync on rename. But it *wants* to be on a disk.
- Tmpfs has no desire to be on disk. Sure, if the system is under memory pressure it is a candidate for being written to disk, but that's about the only case in which it ever happens. And any data that is written to disk doesn't need on-disk indexing to retrieve it, just the in-memory indexing, which is volatile, just as you expect from a temporary filesystem.

And you can tune a lot about ext3, but at its core it wants to make sure an always-valid structure exists on disk. You can't make it ignore sync requests, and you can't tell it not to bother updating the free-space information (you want it to be empty each time it's mounted, so why bother keeping track of which blocks are free?).

> Why do you use tmpfs then? You can just use regular disk. Have you noticed some speedup because of using tmpfs?

I don't know about you, but disk I/O is one of the biggest bottlenecks on our systems. And trying to prevent any form of I/O unless really necessary really helps overall system performance. So yes, we do.

> Nobody does fsync in /tmp, so nobody bothers. :)

Oh? Most software is oblivious to where it writes or how its writes might affect performance. And software is often rightfully written to be "correct" (power-fail safe) but is in some cases used in a different capacity than was originally imagined. And some tools call 'sync' (like "dpkg" and probably "rpm" too), which syncs *all* mounted filesystems.

> That's why ext3 is better. Replaying ext3 journal is faster than creating a new filesystem.

You know what is faster than replaying a journal and then performing (effectively) 'rm -rf' on it? Never creating a filesystem in the first place. Swap space never has to be recreated; it's assumed to be unused on a fresh boot.

Temporary files: RAM or disk?

Posted Jun 1, 2012 11:35 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

I don't have swap. I don't WANT swap because its implementation is a pile of excrement.

In case of problems it reliably slows my computer to a crawl so it's often easier just to reboot the computer than to wait several minutes to type htop and kill the offending process. I very much prefer OOM killer to come out quickly and kill something - that usually leaves me with a working system, at least.

Again, using swap to extend the size of a _filesystem_ seems a bit convoluted.

Temporary files: RAM or disk?

Posted Jun 1, 2012 17:35 UTC (Fri) by gmaxwell (subscriber, #30048) [Link]

I haven't shared your negative experiences with swap, at least not at any time during the past 5 years or so. I run all my systems (servers, desktops, and laptops alike) with /tmp on tmpfs and with large swap. I frequently have tens of gigs of data in tmpfs on workhorse machines.

On my laptop, at least before I used a SSD, tmpfs made a visible increase in battery life because the drive was no longer being woken up by tmp activity.

Temporary files: RAM or disk?

Posted Jun 1, 2012 18:12 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

I've used laptop-mode on my laptop for ages. So I wasn't bothered by disk IO from /tmp either.

tmpfs on /tmp default in Debian

Posted Jun 1, 2012 12:11 UTC (Fri) by rleigh (subscriber, #14622) [Link]

Hi,

Just a few comments regarding the current Debian defaults, since I'm mostly responsible for the current use of tmpfs on /tmp.

As mentioned in
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=674517
it was originally intended that this feature only be enabled for new installs, not on upgrade. As mentioned in this bug, some of that has been rectified already in git. We will also correct the configuration for users who got tmpfs enabled during upgrade, but this needs careful testing. The main issue for it being enabled on upgrades is that insufficient swap may be available to have a reasonable amount of space. This is also an issue for new installs--we do need some support in the installer for this as well.

I'm certainly not averse to switching the default back, if this is the best solution at the present time for the majority of our users. As was seen in both this and earlier discussions, there is not a clear-cut consensus here--there are from what I can tell approximately equal numbers in the "for" and "against" camps. It's clear we can't satisfy everyone no matter which is picked as the default.

As mentioned, the use of tmpfs really boils down to people's expectations of what /tmp is /for/. The size of what may be stored isn't specified in any standard, so it's not fair to say that it's only usable for "small" files. But using a tmpfs does place a restriction on the size of the files which may be used. That said, allowing certain applications to store multi-GiB files on /tmp causes its own indirect problems in addition to the immediate: these programs are often broken on smaller systems due to not being able to cope with running out of space, and impose a requirement for vast amounts of temporary space. These programs are broken on systems with a smaller rootfs, irrespective of the tmpfs issue.
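Coping with running out of space mostly means treating ENOSPC as an expected error rather than a fatal one. A minimal Python sketch of that idea: try each candidate directory in turn, delete the partial file when a filesystem fills up, and move on to the next. The function name and the directory list are hypothetical.

```python
import errno
import os
import tempfile
from typing import Iterable, Sequence


def write_temp_with_fallback(chunks: Sequence[bytes], dirs: Iterable[str]) -> str:
    """Write `chunks` to a new temp file in the first directory that has room.

    `chunks` is a Sequence (not an iterator) so it can be replayed on retry.
    """
    last_error = None
    for directory in dirs:
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                for chunk in chunks:
                    f.write(chunk)
            return path
        except OSError as e:
            os.unlink(path)  # don't leave a partial file behind on a full filesystem
            if e.errno != errno.ENOSPC:
                raise
            last_error = e
    raise last_error or OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))


# Hypothetical order: fast (possibly tmpfs) /tmp first, disk-backed scratch second.
# path = write_temp_with_fallback(data_chunks, ["/tmp", "/srv/scratch"])
```

Trying the small, fast location first and falling back to bulk storage keeps the common case on tmpfs without making large inputs fail outright.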

Maybe we need to distinguish not on size, but on speed of access, and have /tmp for "fast" access and a separate location for slower disc-backed storage, which would be more suited to the storage of streaming media, ISO images etc. which are going to have a longer lifetime, and also tend to be larger. None of these uses benefit from being on a tmpfs in the general case. (I've used /scratch for this in the past, though currently it's a 1 TiB btrfs RAID1 under /srv/scratch.)

The important part of what we've achieved for wheezy is having tmpfs filesystems mounted on /run (and optionally /run/lock and /run/shm). The tmpfs on /tmp uses the same infrastructure in our init scripts, and we mount tmpfs on /tmp in two special cases: if the rootfs is read-only and no /tmp mount exists in fstab, and if /tmp contains less than a certain amount of free space. This is one part of making it possible to run with a read-only rootfs out of the box, and also to aid recovery if booting when the rootfs is full, respectively.
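These two special cases reduce to a small decision rule. A minimal Python sketch of that rule, not Debian's actual init-script code: the function names are hypothetical, and the 1024 KiB default threshold is an assumption standing in for the configurable limit (the TMP_OVERFLOW_LIMIT setting).

```python
import os


def should_mount_tmpfs_on_tmp(rootfs_readonly: bool, tmp_in_fstab: bool,
                              tmp_free_kib: int, overflow_limit_kib: int = 1024) -> bool:
    """The two fallback cases: read-only root with no /tmp in fstab, or /tmp nearly full."""
    if rootfs_readonly and not tmp_in_fstab:
        return True
    return tmp_free_kib < overflow_limit_kib


def free_kib(path: str) -> int:
    """Space available to unprivileged users at `path`, in KiB."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize // 1024


# Example for a writable root with no /tmp entry in fstab:
# should_mount_tmpfs_on_tmp(False, False, free_kib("/tmp"))
```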

The default of whether we mount tmpfs on /tmp by default or not is really only a minor part of the other improvements we have in wheezy--it really doesn't matter which is the default, so long as it works. While I'm in favour of this being tmpfs, if there are too many programs which break which can't be fixed, then we'll have to switch back to using a regular filesystem. Maybe we'll then be able to reconsider it for wheezy+1.

Regards,
Roger

tmpfs on /tmp default in Debian

Posted Jun 1, 2012 12:54 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Hey!

I have an idea! Why not use something like unionfs to automatically push large files to disk-based /tmp?

tmpfs on /tmp default in Debian

Posted Jun 1, 2012 15:11 UTC (Fri) by ballombe (subscriber, #9523) [Link]

The problem is that a lot of applications use /tmp to store temporary data they expect will not fit in memory, and in a lot of cases this is more efficient than swapping (e.g. because they do a big contiguous write, then maybe one hour later they do a big contiguous read).
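That access pattern is the classic external sort, which GNU sort itself uses for large inputs, spilling runs to $TMPDIR or to the directory given with -T. A minimal Python sketch of the same idea: each sorted run is one large sequential write to a temporary file, and the merge later reads every run back sequentially. The function name and default run size are illustrative; tempfile.TemporaryFile also honours TMPDIR, so pointing TMPDIR at disk-backed storage keeps the runs off tmpfs.

```python
import heapq
import tempfile
from typing import IO, Iterable, Iterator, List, Optional


def external_sort(lines: Iterable[str], run_size: int = 100_000,
                  tmp_dir: Optional[str] = None) -> Iterator[str]:
    """Sort newline-free strings with bounded memory, spilling sorted runs to temp files."""
    runs: List[IO[str]] = []
    buf: List[str] = []

    def spill() -> None:
        # One big sequential write per run; the file vanishes when closed.
        f = tempfile.TemporaryFile(mode="w+", dir=tmp_dir)
        f.writelines(s + "\n" for s in sorted(buf))
        f.seek(0)
        runs.append(f)
        buf.clear()

    try:
        for line in lines:
            buf.append(line)
            if len(buf) >= run_size:
                spill()
        if buf:
            spill()
        # Later: one sequential read per run, merged lazily in sorted order.
        streams = [(s.rstrip("\n") for s in f) for f in runs]
        yield from heapq.merge(*streams)
    finally:
        for f in runs:
            f.close()
```

For example, `list(external_sort(["b", "a", "c"], run_size=2))` produces two runs and merges them into `['a', 'b', 'c']`.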

tmpfs on /tmp default in Debian

Posted Jun 1, 2012 20:24 UTC (Fri) by wahern (subscriber, #37304) [Link]

Indeed. tmpfs provides a crutch for stupid applications, but makes it difficult (if not impossible, there being no standard alternative to /tmp semantics) for smart applications to manage data intelligently.

It's one thing to cater to stupid applications, it's another to erect barriers to good design.

Temporary files: RAM or disk?

Posted Jun 1, 2012 12:37 UTC (Fri) by drag (subscriber, #31333) [Link]

Functionally, I see little difference between using a separate ext4 partition for /tmp versus running tmpfs /tmp with large swap partition or file.

Temporary files: RAM or disk?

Posted Jun 1, 2012 12:43 UTC (Fri) by ottuzzi (subscriber, #74496) [Link]

I was working as a sysadmin when our Solaris cluster hosting 18 Oracle instances went crazy because someone filled /tmp, mounted as tmpfs (the Solaris standard), with stupid log files turned to DEBUG level.
Oracle instances simply stopped working (with messages like not enough memory to fork), and Sun Cluster did not understand the situation, so all resource groups went into STOP_FAILED (aka "I don't know how to handle... good luck to you!").
I really think that the default should be the safest choice available and, when needed or desirable, it is the sysadmin's responsibility to look for performance optimizations, as tmpfs clearly is.
Maybe a simple package to install? yum install tmp_on_tmpfs

Just my 2cents
By
Piero

Temporary files: RAM or disk?

Posted Jun 1, 2012 14:00 UTC (Fri) by Yorick (subscriber, #19241) [Link]

Since most other distributions will have changed to tmpfs for /tmp in a year or so, their maintainers and users will shoulder the cost and burden of adapting ill-behaved applications (presumably those that assume that /tmp is limitless), so the Debian people could just prudently stay behind and switch over after all the work has been done for them. Or, they could give them a hand.

I'm personally an eager user of tmpfs for many things - the performance gains even compared against ext2 can be considerable. If anything, it could be useful to have a dynamic swap extension/shrinking mechanism, so that this would not have to be done manually. (I would not be surprised to learn that such a system already exists and is in daily use by just about everyone else except myself.)

Temporary files: RAM or disk?

Posted Jun 2, 2012 11:40 UTC (Sat) by th0ma7 (guest, #24698) [Link]

A sort of "unionfs" over a regular /tmp (or /tmp over the / partition for that matter) in conjunction of a tmpfs with priority to use RAM space 1st would solve most of the problems... no?

Temporary files: RAM or disk?

Posted Jun 2, 2012 18:35 UTC (Sat) by ncm (subscriber, #165) [Link]

A common cautionary tale template has an engineer confronting a problem and choosing a shortcut: "... and then the engineer had two problems". Unionfs has fundamental shortcomings that make it a problem where its usage pattern is not severely constrained.

How would this work, exactly? Would a daemon move big files out of tmpfs /tmp to a shadowed on-disk /tmp? This might steer clear of unionfs's shortcomings.

Temporary files: RAM or disk?

Posted Jun 2, 2012 21:38 UTC (Sat) by dlang (✭ supporter ✭, #313) [Link]

I've used Solaris quite a bit and the default /tmp in ram behaviour ends up meaning that a lot of things that should be put in /tmp end up elsewhere because it's just too easy to run out of ram.

as others have noted, /tmp shouldn't create that much I/O as most of the files don't actually need to hit disk. However, since ext3 is so badly behaved in the face of fsyncs, this is not the case on ext3 based systems.

but instead of changing how the system works, they should switch to a different filesystem that doesn't have the bad behaviour of ext3. Use either ext4 or XFS

Temporary files: RAM or disk?

Posted Jun 4, 2012 7:19 UTC (Mon) by Serge (guest, #84957) [Link]

> However, since ext3 is so badly behaved in the face of fsyncs, this is not the case on ext3 based systems.

As long as no real-world applications fsync files in /tmp... who cares?

Temporary files: RAM or disk?

Posted Jun 4, 2012 7:22 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

given the way fsync is broken on ext3, that would only matter if /tmp was a separate filesystem: a fsync anywhere on the filesystem triggers this bug

Temporary files: RAM or disk?

Posted Jun 4, 2012 10:44 UTC (Mon) by Serge (guest, #84957) [Link]

Not sure how this can affect /tmp in real-world then. Could you write some detailed description for this use case so that it could be tested?

Temporary files: RAM or disk?

Posted Jun 4, 2012 23:05 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

create one program that does massive sequential writes to a file in one directory.

then do a fsync on some file in a different directory (with a small change to the file)

watch the fsync take a long time to complete on ext3, and almost no time on any other filesystem.

because of this behavior, users and sloppy programmers have been conditioned that fsync calls make their program pause unexpectedly for a potentially long time period (I think I've seen Ted Tso report that he's seen delays longer than 30 seconds)

If you go back to the blog messages about fsync and data reliability when people were claiming that ext4 was eating their KDE configuration data, you will see detailed discussions about this.
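The recipe above is easy to script. A rough Python sketch, not a calibrated benchmark: the function name, file names, and sizes are hypothetical. It dirties a large file without syncing it, then times fsync() on a tiny file in another directory. To see the effect described, both directories must be on the same ext3 filesystem (data=ordered, typically the default), and the big write should be large relative to the disk's throughput.

```python
import os
import time


def fsync_latency(big_dir: str, small_dir: str, big_mib: int = 512,
                  chunk: int = 1 << 20) -> float:
    """Dirty `big_mib` MiB in one file, then time fsync() of a tiny file elsewhere."""
    big_path = os.path.join(big_dir, "fsync-test-big")
    small_path = os.path.join(small_dir, "fsync-test-small")
    block = b"\0" * chunk
    try:
        # Massive sequential write, deliberately NOT fsync'd: it stays dirty in the page cache.
        with open(big_path, "wb") as big:
            for _ in range(big_mib * (1 << 20) // chunk):
                big.write(block)
        # Small change to a different file; time only its fsync().
        fd = os.open(small_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(fd, b"small change\n")
            start = time.perf_counter()
            os.fsync(fd)
            return time.perf_counter() - start
        finally:
            os.close(fd)
    finally:
        for p in (big_path, small_path):
            try:
                os.remove(p)
            except FileNotFoundError:
                pass
```

Comparing the returned seconds for the same directories on ext3 and on ext4 or XFS is the experiment being proposed; whether the gap still shows up on a given kernel and disk is exactly what the test would tell you.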

Temporary files: RAM or disk?

Posted Jun 5, 2012 5:24 UTC (Tue) by Serge (guest, #84957) [Link]

> watch the fsync take a long time to complete on ext3, and almost no time on any other filesystem.

Thanks for a detailed description.

I could not notice a major difference on my system, it could be my HDD is too fast for that (100MB/s, usual rotating hdd, no raids, no LVMs). I thought this "bug" was fixed a few years ago with Linus's blessing, btw.

But the question is: how is this related to /tmp? Nobody fsync()s files in /tmp. This won't matter for small short-lived files either, since there's no chance they're being created at the moment of the fsync (and even if they are, you won't notice the difference, because they're small). So there must be a program writing a large file in /tmp, large enough that an fsync on another partition is noticeably delayed. But a large file will hit the dirty*ratio limits and start being written to disk, so it won't delay fsync() much anyway.

BTW, even if fsync was delayed, what is the application where you could notice this delay? I'm trying to say that I can't think of any real-world use case that /tmp on ext3 is not good for.

> I think I've seen Ted Tso report that he's seen delays longer than 30 seconds

Technically it may be possible (if it still was not fixed 3 years ago). You need a machine with a lot of RAM, very slow HDD, increase dirty*ratio to 90%, write a few GBs and then call fsync(). But that would be useless, because it's not related to any real-world use cases.

> If you go back to the blog messages about fsync and data reliability when people were claiming that ext4 was eating their KDE configuration data, you will see detailed discussions about this.

Those were different ext4-specific problems of recently modified files lost on crash (usual thing, actually, official xfs "feature"), not related to fsync(), and definitely not related to /tmp, as far as I remember. And those were fixed a few years ago anyway.

Temporary files: RAM or disk?

Posted Jun 5, 2012 7:12 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

as I understand the problem, it's a design flaw in ext3, not something that can be fixed without a major re-write.

something along the lines that the ext3 journal doesn't know what blocks are related to the metadata, so to avoid revealing old data that may be on disk, the filesystem is required to flush all pending writes.

the XFS/ext4 'problem' you refer to is the way every filesystem other than ext3 works. If you don't do a fsync, the data isn't safe and a crash at the wrong time can give you grief.

the functionality that makes this less of a problem on ext3 is the same functionality that makes it behave so horribly when you do a fsync

this is good for crash-prone desktop systems running software that wasn't written to be crash safe (note that it doesn't make the systems safe, it just reduces the probability of data loss)

but if you are running any software that is written to be crash safe, ext3 is about the worst filesystem you could use (in some cases worse than ext2 or other non-journaled filesystem).
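For comparison, this is the pattern that crash-safe software typically uses to replace a file: write a temporary file in the same directory, fsync() it, rename it over the target, then fsync() the directory so the rename itself is durable. A minimal POSIX-only Python sketch (it relies on os.O_DIRECTORY); the function name is hypothetical. On ext3, each of those fsync() calls can be slowed by the behaviour discussed above.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so that after a crash it holds either the old or the new data."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # data blocks reach the disk before the rename
        os.replace(tmp_path, path)    # atomic rename over the old file
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)              # make the new directory entry durable too
    finally:
        os.close(dir_fd)
```

Note that mkstemp() creates the file with mode 0600, so a real implementation would copy the original file's permissions before the rename.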

Temporary files: RAM or disk?

Posted Jun 4, 2012 18:34 UTC (Mon) by raven667 (subscriber, #5198) [Link]

fsync on /tmp is uniquely pointless since the contents are not intended to survive a normal restart, let alone a system fault. Data loss in /tmp is OK and expected during a restart; for persistent data try /var/tmp

Temporary files: RAM or disk?

Posted Jun 5, 2012 12:34 UTC (Tue) by roblucid (subscriber, #48964) [Link]

Unfortunately, one of the practical issues with /tmp being tmpfs-based is the lack of consensus on the reboot semantics (UNIX & Linux). It was quite common in the past for /tmp files only to be cleaned when older than 24hrs.

There was even debate in openSUSE's FATE, when it was proposed to default to the FHS behaviour and delete /tmp files by default (due to misfeature of old SuSE Linux automatic deletion needed to be explicitly turned on, which caused maintenance issues as developers who "knew" /tmp is automatically cleaned out, wouldn't ensure cleanup on process crashes and so on).
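Age-based cleaning of that kind (what tools such as tmpwatch or tmpreaper do from cron) boils down to comparing timestamps against a cutoff. A minimal Python sketch that only lists candidates instead of deleting them; the function name is hypothetical, and it looks at mtime alone, whereas real cleaners usually key on atime and/or mtime, skip sockets and lock files, and prune empty directories.

```python
import os
import stat
import time
from typing import List, Optional


def stale_files(root: str, max_age_hours: float = 24.0,
                now: Optional[float] = None) -> List[str]:
    """List regular files under `root` whose mtime is older than `max_age_hours`."""
    cutoff = (time.time() if now is None else now) - max_age_hours * 3600
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow dir symlinks
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: never follow a symlink out of /tmp
            except FileNotFoundError:
                continue  # removed while we were walking
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```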

Temporary files: RAM or disk?

Posted Jun 5, 2012 15:40 UTC (Tue) by nybble41 (subscriber, #55106) [Link]

> It was quite common in the past for /tmp files only to be cleaned when older than 24hrs.

The distinction which I learned, which may very well be non-universal, was that /tmp is for temporary files which are tied to a particular process. These files can be removed as soon as the process exits, including on reboot. The /var/tmp directory is for temporary files which may need to outlast a given process, possibly across reboots, and should thus be cleared out much more sparingly based on the ages of the files.

Of course, not all processes follow this distinction.
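That distinction maps onto two lifetimes a program can choose explicitly. A minimal Python sketch using the standard tempfile module; the function names and the myapp- prefix are hypothetical. The scoped directory is removed when the with-block exits, even on an exception, but not if the process is killed outright, which is why clearing /tmp at boot remains useful as a backstop.

```python
import os
import tempfile


def process_scoped_scratch() -> str:
    """/tmp-style data: the directory disappears when the with-block exits."""
    with tempfile.TemporaryDirectory(prefix="myapp-") as scratch:  # under $TMPDIR, usually /tmp
        with open(os.path.join(scratch, "intermediate.dat"), "wb") as f:
            f.write(b"scratch data")
        # ... work with the files here ...
        return scratch  # returned only so callers can check that it is gone


def persistent_temp_file(base: str = "/var/tmp") -> str:
    """/var/tmp-style data: may need to outlive the process, so the caller must remove it."""
    fd, path = tempfile.mkstemp(prefix="myapp-", dir=base)
    os.close(fd)
    return path
```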

Temporary files: RAM or disk?

Posted Jun 9, 2012 21:49 UTC (Sat) by Serge (guest, #84957) [Link]

> It comes down to a question of functionality vs. speed.

The entire thread has arisen because /tmp on tmpfs brings no speedup. It brings problems, but no speedup for default real-world use cases. "What's so good about /tmp on tmpfs?" was the main question of the thread. Not just "why is tmpfs good", but "why is /tmp on tmpfs good".

You could put the question as "Why is /tmp on tmpfs better than /var/ram on tmpfs?"

The only point that nobody objected to is that it's better to put /tmp on tmpfs than on a very small or read-only root partition. But that's not a problem for Debian, where someone smart implemented a TMP_OVERFLOW_LIMIT feature that automatically mounts tmpfs over /tmp if there's not enough free space there or the root fs is read-only.
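To check what /tmp actually is on a given machine, including whether such a fallback tmpfs was mounted, the kernel's mount table is enough. A minimal, Linux-only Python sketch that parses text in /proc/mounts format; the function names are hypothetical, and it ignores the octal escapes (such as \040 for a space) that /proc/mounts uses in unusual mount-point names.

```python
from typing import Optional


def fs_type_of_mountpoint(mounts_text: str, mountpoint: str = "/tmp") -> Optional[str]:
    """Return the filesystem type mounted at `mountpoint`, or None if it isn't a mount point."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mountpoint:
            fstype = fields[2]  # keep the last match: later mounts stack on top
    return fstype


def tmp_fs_type() -> Optional[str]:
    with open("/proc/mounts") as f:
        return fs_type_of_mountpoint(f.read())
```

A result of None means /tmp is just a directory on the root filesystem; "tmpfs" means either the configured default or the overflow fallback is in effect.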

Temporary files: RAM or disk?

Posted Jun 9, 2012 23:00 UTC (Sat) by slashdot (guest, #22014) [Link]

If you put /tmp on tmpfs, then you need to have unlimited swap by swapping to a dynamically resized swapfile as well (which might need kernel changes, I think?).

Otherwise you can't create huge files in /tmp, which means that the system is simply broken.

Temporary files: RAM or disk?

Posted Jun 11, 2012 22:12 UTC (Mon) by nix (subscriber, #2304) [Link]

By extension, any filesystem whose size is not infinite is simply broken. Right.

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds