
A discussion between database and kernel developers

Posted Mar 11, 2014 3:01 UTC (Tue) by fest3er (guest, #60379)
Parent article: A discussion between database and kernel developers

"Temp files for purposes such as sorting should have writeback deferred as long as possible."

Would it work to write such temp files to a tmpfs that has no backing store? Or are such files really pseudo-temporary?

Can a balance be found between RAM size and DB size (and activity) that would minimize memory pressure while allowing Linux to manage dirty page handling efficiently in the background? Knowing 'normative' relationships between DB size and RAM size, and where/when performance begins to degrade, would help define the real problem and focus attention on the problem areas of the kernel and the RDBMS.

I'll part with a truism. You can only squeeze so much performance out of a Yugo; at some point, you have to upgrade to a Neon or, heaven forfend, a nitrous-boosted TDI.



A discussion between database and kernel developers

Posted Mar 11, 2014 9:02 UTC (Tue) by mel (guest, #5484) (10 responses)

> Would it work to write such temp files to a tmpfs that has no backing
> store? Or are such files really pseudo-temporary?

If the size of a temporary file is a large percentage of physical RAM, or exceeds it, then the data gets pushed to swap - so no, it doesn't really work.

> Can a balance be found between RAM size and DB size (and activity) that
> would minimize memory pressure while allowing Linux to manage dirty page
> handling efficiently in the background?

This balance is already maintained, but it's not the full story. If writes to disk have to be strictly ordered for data-consistency reasons, then it may still be necessary to complete a large amount of writeback before the system can make forward progress. The kernel currently tracks the amount of dirty data in the system, but not how long it will take to write it.
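
(For the curious, those counters are visible from userspace; the numbers below are just illustrative:

    $ grep -E '^(Dirty|Writeback):' /proc/meminfo
    Dirty:              1234 kB
    Writeback:             0 kB

Note that they say how much data is dirty, not how long writing it back will take.)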

Would it work to write such temp files to a tmpfs

Posted Mar 11, 2014 15:01 UTC (Tue) by Wol (subscriber, #4433) (9 responses)

But this is a fairly easy thing to solve today, if you can control where your temporary files go.

For example, I run Gentoo. With 16GB of RAM, I have (IIRC) two 32GB swap partitions (overkill, I know :-). I also have /var/tmp/portage mounted as a 20GB tmpfs. I doubt I ever spill into swap while "emerge"ing an update, but I can leave it all to the OS to handle without worrying about it.
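
For anyone wanting the same setup, the /etc/fstab entry looks roughly like this (size to taste):

    tmpfs   /var/tmp/portage   tmpfs   size=20G   0 0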

With PostgreSQL, all a user would need to do is add more RAM, and the same thing would apply: if it fits in RAM it stays in RAM; if it doesn't, swap gets involved - but swap would have to get involved anyway under any relevant scenario.

Cheers,
Wol

Would it work to write such temp files to a tmpfs

Posted Mar 11, 2014 15:11 UTC (Tue) by andresfreund (subscriber, #69562)

In some workloads it's not unrealistic to have several hundred gigabytes of temporary files.

Would it work to write such temp files to a tmpfs

Posted Mar 11, 2014 17:32 UTC (Tue) by dlang (guest, #313) (7 responses)

Don't forget that paging in and out of swap tends to be significantly slower than I/O to a simple disk file, because swap tends to be badly fragmented.

Would it work to write such temp files to a tmpfs

Posted Mar 11, 2014 17:46 UTC (Tue) by Wol (subscriber, #4433) (6 responses)

If you're going to use a lot of RAM like this, just have several (hopefully very fast) disks and spread swap across them at equal priority.
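
Something like this in /etc/fstab, with the device names obviously being placeholders for your own:

    /dev/sda2   none   swap   sw,pri=10   0 0
    /dev/sdb2   none   swap   sw,pri=10   0 0

With equal pri= values the kernel allocates swap across the devices round-robin, which is what spreads the load.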

Or if necessary (expensive but maybe worth it) just have swap on an ssd.

And tell the kernel developers you don't want the swap partition to fragment! :-)

Cheers,
Wol

Would it work to write such temp files to a tmpfs

Posted Mar 11, 2014 18:23 UTC (Tue) by dlang (guest, #313)

Avoiding swap-space fragmentation requires either that the kernel know ahead of time what is going to be swapped out in the future, or that swap space be larger than the virtual memory size, so that every page has a reserved spot and can be read back in efficiently.

Requiring a bunch of really fast disks for swap, when a medium-speed disk doing a sequential write/read of the data would be just as fast, is not a smart way to spend your money.

Would it work to write such temp files to a tmpfs

Posted Mar 12, 2014 16:00 UTC (Wed) by jeremiah (subscriber, #1221) (4 responses)

FWIW, I've found using an SSD for swap to be a great way to massively speed things up - until the wear leveling really has to kick in, at which point everything goes downhill fast. The drives seem to be okay under a little memory pressure, but in a prolonged swap storm they have a hard time dealing with the same data being overwritten repeatedly. This is all anecdotal, and may have been caused by a particular version of Crucial/Micron's firmware in their m4 series, but I had to ditch swap on those drives for fear of losing the whole drive.

Would it work to write such temp files to a tmpfs

Posted Mar 12, 2014 16:10 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) (1 responses)

Just do some TRIM-ming of empty space on these drives from time to time. Works wonders.
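
On a mounted filesystem that's just a periodic

    # fstrim -v /mountpoint

(or the matching cron job), where /mountpoint is whatever filesystem lives on the SSD.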

Would it work to write such temp files to a tmpfs

Posted Mar 12, 2014 20:44 UTC (Wed) by jeremiah (subscriber, #1221)

I'll give it a shot, thanks.

Would it work to write such temp files to a tmpfs

Posted Mar 12, 2014 19:05 UTC (Wed) by parcs (guest, #71985) (1 responses)

Have you tried enabling the "discard" mount option on the swap partition (or using swapon -d)?
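
That is, either

    # swapon -d /dev/sdb2

or the equivalent /etc/fstab entry (device name is a placeholder):

    /dev/sdb2   none   swap   sw,discard   0 0

Either one tells the kernel to discard freed swap pages, so the SSD's garbage collection has less stale data to shuffle around.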

Would it work to write such temp files to a tmpfs

Posted Mar 12, 2014 20:44 UTC (Wed) by jeremiah (subscriber, #1221)

I have not, thanks.

A discussion between database and kernel developers

Posted Mar 11, 2014 9:21 UTC (Tue) by iq-0 (subscriber, #36655)

I still think that files which are no longer linked into the filesystem, but which still have dirty pages, should be put on a "no writeback unless there is memory pressure" list. This should probably be the default behaviour.

This would make unlinked temporary files very efficient (no unnecessary writes for data we never want to make persistent, like tmpfs), explicitly backed by disk space (reserved from the filesystem, unlike tmpfs), and easily reclaimable under memory pressure (sort of like tmpfs with dedicated swap space).

Effectively, temporary files on any filesystem with optimized support would perform like tmpfs-backed files when there is no memory pressure: not a single byte would ever have to be written in that case. The inode could live entirely in memory, the space reservation would just be an in-memory reservation of blocks from the free-space pool, and on a crash nothing would need to be recovered, because on disk those blocks would still be on the free list.

Should the new "link file descriptor" syscall be used, that should probably trigger an fdatasync() on the file descriptor (similar to the sync-on-rename behaviour).
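
For concreteness, here's a rough sketch of that pattern as it can already be written today with O_TMPFILE and linkat() - the paths are made up and error handling is omitted:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char fdpath[64];

        /* Create an unlinked file on /tmp's filesystem; this is the
         * kind of file the proposed no-writeback-unless-pressured
         * treatment would apply to. */
        int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);

        write(fd, "scratch data\n", 13);

        /* Decide to keep it: flush the data, then give it a name.
         * The /proc/self/fd trick avoids the privilege check that
         * linkat(..., AT_EMPTY_PATH) would require. */
        fdatasync(fd);
        snprintf(fdpath, sizeof(fdpath), "/proc/self/fd/%d", fd);
        linkat(AT_FDCWD, fdpath, AT_FDCWD, "/tmp/kept", AT_SYMLINK_FOLLOW);

        close(fd);
        return 0;
    }

Under the proposal above, the kernel could skip writeback of such a file's dirty pages entirely until linkat() makes it persistent or memory pressure forces the issue.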

And there could be an option to open a normally linked file in this mode, with the file appearing truncated after a crash (because any blocks allocated would only be allocated in memory; on disk they would always remain on the free-space list).

