Temporary files: RAM or disk?
Posted Jun 5, 2012 7:05 UTC (Tue) by bronson (subscriber, #4806)
In reply to: Temporary files: RAM or disk? by dlang
Parent article: Temporary files: RAM or disk?
Posted Jun 5, 2012 7:19 UTC (Tue)
by dlang (guest, #313)
[Link] (11 responses)
Also, reading and writing swap tends to be rather inefficient compared to normal I/O (data ends up very fragmented on disk, bearing no resemblance to any organization that it had in RAM, let alone to the files being stored in tmpfs).
Posted Jun 5, 2012 15:33 UTC (Tue)
by giraffedata (guest, #1954)
[Link] (10 responses)
I believe the tendency is the other way around. One of the selling points for tmpfs for me is that reading and writing swap is more efficient than reading and writing a general purpose filesystem. First, there aren't inodes and directories to pull the head around. Second, writes stream out sequentially on disk, eliminating more seeking.
Finally, I believe it's usually the case that, for large chunks of data, the data is referenced in the same groups in which it becomes least recently used. A process loses its timeslice and its entire working set ages out at about the same time and ends up in the same place on disk. When it gets the CPU again, it faults in its entire working set at once. For a large temporary file, I believe it is even more pronounced - unlike many files, a temporary file is likely to be accessed in passes from beginning to end. I believe general purpose filesystems are only now gaining the ability to do the same placement as swapping in this case; to the extent that they succeed, though, they can at best reach parity.
In short, reading and writing swap has been (unintentionally) optimized for the access patterns of temporary files, where general purpose filesystems are not.
Posted Jun 6, 2012 6:53 UTC (Wed)
by Serge (guest, #84957)
[Link] (3 responses)
It's not that simple. Tmpfs is not a "plain data" filesystem: you can create directories there, so it has to store all the metadata as well. It also has inodes internally.
> Second, writes stream out sequentially on disk, eliminating more seeking.
This could be true if swap were empty, just as when you write to an empty filesystem. But what if it is not empty? You get the same fragmentation and seeking in swap as you would in any regular filesystem.
> In short, reading and writing swap has been (unintentionally) optimized for the access patterns of temporary files, where general purpose filesystems are not.
And a filesystem is intentionally optimized for storing files. Swap is not plain data storage, otherwise "suspend to disk" could not work. Swap has its own internal format; there are even different versions of that format (`man mkswap` reveals v0 and v1). I.e. instead of writing through one filesystem layer (ext3) you write through two: tmpfs plus swap.
Things get worse when you start reading. When you read something from ext3, the oldest part of the file cache is dropped and the data is placed in RAM. But reading from swap means that your RAM is full, and in order to read a page from swap you must first write another page there. I.e. a sequential read from ext3 turns into a random write+read on swap.
Posted Jun 6, 2012 15:24 UTC (Wed)
by nybble41 (subscriber, #55106)
[Link] (1 responses)
_Writing_ to swap means that your RAM is full (possibly including things like clean cache which are currently higher priority, but could be dropped at need). _Reading_ from swap implies only that something previously written to swap is needed in RAM again. There could be any amount of free space at that point. Even if RAM does happen to be full, the kernel can still drop clean data from the cache to make room, just as with reading from ext3.
Posted Jun 6, 2012 17:43 UTC (Wed)
by dgm (subscriber, #49227)
[Link]
All of this is of no consequence on system startup, when the page cache is mostly clean. Once the system has been up for a while, though... I think a few tests have to be done.
Posted Jun 7, 2012 2:28 UTC (Thu)
by giraffedata (guest, #1954)
[Link]
I was talking about disk structures. Inodes and directory information don't go into the swap space, so they don't pull the head around.
(But there's an argument in favor of a regular-filesystem /tmp: if you have lots of infrequently accessed small files, tmpfs will waste memory.)
It's the temporary nature of the data being swapped (and the strategies the kernel implements based on that expectation) that makes the data you want at any particular time less scattered in swap space than in a typical filesystem that has to keep copious eternally growing files forever. I don't know exactly what policies the swapper follows (though I have a pretty good idea), but if it were no better at storing anonymous process data than ext3 is at storing file data, we would really have to wonder at the competence of the people who designed it. And my claim is that since it's so good with anonymous process data, it should also be good with temporary files, since they're used almost the same way.
Actually, the system does the same thing for anonymous pages as it does for file cache pages: it tries to clean the pages before they're needed, so that when a process needs to steal a page frame it usually doesn't have to wait for a page write. Also like file cache, when the system swaps a page in, it tends to leave the copy on disk too, so if the page doesn't get dirty again, you can steal its page frame without having to do a page out.
Posted Jun 7, 2012 13:15 UTC (Thu)
by njs (subscriber, #40338)
[Link] (5 responses)
At least on our compute servers (running some vaguely recent Ubuntu, IIRC), swap-in is definitely not doing successful readahead. I've often wished for some hack that would just do a sequential read through the swap file to load one process back into memory.
Posted Jun 7, 2012 13:28 UTC (Thu)
by Jonno (subscriber, #49613)
[Link]
I find that if I have two processes with large working sets causing swapping, and I kill one of them, doing a swapoff will make the other one performant again much faster than letting it fault in only the stuff it needs as it needs it.
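The swapoff trick above can be scripted. A hedged sketch (Linux-specific; the swapoff/swapon cycle requires root, and `swapoff -a` will fail if the surviving working sets don't fit in RAM, so it only guards, it doesn't verify fit):

```shell
#!/bin/sh
# After killing a memory hog, force everything back out of swap by
# disabling and re-enabling all swap devices. swapoff reads every
# swapped-out page back into RAM.

swap_used_kib() {
    # Used swap in KiB, parsed from /proc/meminfo
    awk '/SwapTotal/ {t=$2} /SwapFree/ {f=$2} END {print t - f}' /proc/meminfo
}

echo "swap in use before: $(swap_used_kib) KiB"
if [ "$(id -u)" -eq 0 ]; then
    swapoff -a && swapon -a   # swap is empty again after this
else
    echo "(not root: skipping swapoff/swapon)"
fi
echo "swap in use after:  $(swap_used_kib) KiB"
```

The unprivileged parts (the /proc/meminfo parsing) are useful on their own for watching swap drain while the swapoff runs.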
Posted Jun 7, 2012 15:44 UTC (Thu)
by giraffedata (guest, #1954)
[Link] (1 responses)
> At least on our compute servers (running some vaguely recent Ubuntu, IIRC), swap-in is definitely not doing successful readahead
Good information.
That's probably a good reason to use a regular filesystem instead of tmpfs for large temporary files.
I just checked, and the only readahead tmpfs does is the normal swap readahead, which consists of reading an entire cluster of pages when one of the pages is demanded. A cluster is a group of pages that were swapped out at the same time, so they are likely to be re-referenced at the same time and are written to the same spot on the disk. But this strategy won't produce streaming reads the way typical filesystem readahead does.
And the kernel default size of the cluster is 8 pages. You can control it with /proc/sys/vm/page-cluster, though. I would think on a system with multi-gigabyte processes, a much larger value would be optimal.
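One detail worth spelling out: the tunable stores the base-2 logarithm of the cluster size, not a page count, so the default value 3 corresponds to the 8 pages mentioned above. A minimal, Linux-specific sketch (assuming 4 KiB pages; falls back to the default of 3 where the proc file isn't available):

```shell
#!/bin/sh
# vm.page-cluster is log2 of the swap readahead cluster: 3 -> 2^3 = 8 pages.
cur=$(cat /proc/sys/vm/page-cluster 2>/dev/null || echo 3)
pages=$((1 << cur))
echo "swap readahead cluster: $pages pages ($((pages * 4)) KiB with 4 KiB pages)"

# To enlarge it for machines that swap multi-gigabyte processes (root only):
#   echo 6 > /proc/sys/vm/page-cluster    # 2^6 = 64 pages per cluster
```

Because the value is an exponent, each increment doubles the readahead, so going from the default 3 to 6 is an 8x increase, not a doubling.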
Posted Jun 11, 2012 14:51 UTC (Mon)
by kleptog (subscriber, #1183)
[Link]
Posted Jun 7, 2012 21:36 UTC (Thu)
by quotemstr (subscriber, #45331)
[Link] (1 responses)
> I've often wished for some hack that would just do a sequential read through the swap file to load one process back into memory
Windows 8 will do that for modern applications. http://blogs.msdn.com/b/b8/archive/2012/04/17/reclaiming-...
Posted Jun 8, 2012 0:15 UTC (Fri)
by giraffedata (guest, #1954)
[Link]
When njs says "hack" I think he means something an intelligent user can invoke to override the normal system paging strategy because he knows a process is going to be faulting back much of its memory anyway.
The Windows 8 thing is automatic, based on an apparently pre-existing long-term scheduling facility. Some applications get long-term scheduled out, aka "put in the background," aka "suspended," mainly so devices they are using can be powered down and save battery energy. But there is a new feature that also swaps all the process' memory out when it gets put in the background, and the OS takes care to put all the pages in one place. Then, when the process gets brought back to the foreground, the OS brings all those pages back at once, so the process is quickly running again.
This of course requires applications that explicitly go to sleep, as opposed to just quietly not touching most of their memory for a while, and then suddenly touching it all again.