common loads; RAM cache and I/O `nice'; freeblock scheduling
Posted Jul 13, 2006 18:49 UTC (Thu) by ringerc
Parent article: The 2006 Linux Filesystems Workshop (Part III)
Thinking about the loads I see on the servers at work, I have several quite distinct patterns:
- Small random I/O done on small files, mildly biased toward reads. Continuous. (e.g. Cyrus mail spools - maildir-like one-file-per-message storage, plus indexing and header caches).
- Archival storage of large and medium files (images, user documents, etc). Strong read bias, low load. Data is _generally_ added and not further modified.
- Working storage for users - the generic "file server" case of small & medium files, some of which sit around forever while others are quite hot. This includes user home directories.
- System storage (libraries, executables, etc). Insanely strong bias toward reads, needs more priority for RAM cache than it gets. It's INCREDIBLY annoying having libraries and binaries shoved out of cache because `tar' is reading some file for a backup that'll never be looked at again.
Additionally, all this gets backed up.
One thing I'd love to see - I cannot possibly emphasise this enough - is a sort of `nice' facility for disk I/O. Backups are the most obvious use case - backing up a live server is a miserable bastard of a job, as the whole server slows to a crawl while the backup runs. Not all servers have load patterns with dead times where one can afford that, and with disk snapshots (like LVM provides) there shouldn't be any need to stop services to back up the system. The ability for tools like `tar' and `cp' to inform the OS and FS that:
- The data they're working with is no more likely to be read/written again in the near future than any other data on the disk, so it should not be cached in RAM at the expense of anything else; and
- Other requests should have priority over requests by this program
would be INSANELY valuable.
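For concreteness, here's roughly what I'd want a backup tool to do. As far as I can tell the pieces partly exist already: posix_fadvise() for the caching hint, and the ioprio_set() syscall behind `ionice' for the priority one (new in 2.6.13, and I believe only CFQ honours it). A minimal sketch, not real tar code:

    /*
     * Sketch of a backup-friendly reader.  Two existing knobs cover the
     * two requests above:
     *   - ioprio_set(): the syscall behind `ionice'; dropping to the idle
     *     class means other processes' I/O wins (2.6.13+, CFQ only AFAIK).
     *   - posix_fadvise(POSIX_FADV_DONTNEED): tell the kernel not to keep
     *     the pages we just read, so the scan doesn't evict hot data.
     */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    /* ioprio constants copied from the kernel; there is no glibc wrapper */
    #define IOPRIO_CLASS_SHIFT 13
    #define IOPRIO_CLASS_IDLE  3
    #define IOPRIO_WHO_PROCESS 1
    #define IOPRIO_PRIO_VALUE(cls, data) (((cls) << IOPRIO_CLASS_SHIFT) | (data))

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s file\n", argv[0]);
            return 1;
        }

        /* "other requests should have priority over requests by this program" */
        if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                    IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)) == -1)
            perror("ioprio_set");   /* non-fatal: scheduler may ignore it */

        int fd = open(argv[1], O_RDONLY);
        if (fd == -1) {
            perror("open");
            return 1;
        }

        char buf[1 << 16];
        while (read(fd, buf, sizeof buf) > 0)
            ;                       /* ...feed the data to the archive... */

        /* "should not be cached in RAM at the expense of anything else" */
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);
        return 0;
    }

Whether the idle class really stays out of the way depends entirely on the elevator in use, of course.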
[WARNING: The ideas of someone totally uninformed about a subject but too dumb to keep his mouth shut follow. I'm mentioning them just on the off chance one is useful _because_ I'm not used to the thought patterns of the field.]
The freeblock scheduling tricks mentioned sound like something that might benefit from extensions to the disk interface. Consider NCQ disks - the disk can be given a queue of requests to service in the optimal order. Wouldn't it be interesting if these could be prioritised (even just to the extent of 'normal' and 'only service if it won't impact normal reads'), so that the disk could opportunistically service the low-priority requests if it happened to be working in the right area anyway?
The downside is that I suspect the disk would need a very large queue for low-priority requests, given the chances of it actually passing over the right block in any sane period of time. I'm not sure, though ... track caching could be very handy for low-priority reads, for example. It'd be harder for writes, since the disk would need to actually have the data to write when it got the chance. You'd probably need better knowledge of the disk's layout - again, perhaps the disk protocol could be extended to give some more information about layout and about optimal read/write patterns.
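To pin down what I mean by the two-level queue, here's a toy host-side model. Everything in it - the names, the WINDOW threshold, the flat LBA-distance metric - is made up for illustration; the real thing would live in drive firmware and know about actual geometry and rotation:

    /*
     * Toy model of a two-priority request queue: normal requests are served
     * nearest-first; low-priority ("freeblock") requests are served only
     * when the head happens to land within WINDOW blocks of them anyway.
     */
    #include <stdio.h>

    #define WINDOW 64        /* "close enough to grab for free" threshold */

    struct req { unsigned lba; int done; };

    static unsigned dist(unsigned a, unsigned b) { return a > b ? a - b : b - a; }

    static int pick_nearest(struct req *q, int n, unsigned head)
    {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (!q[i].done &&
                (best < 0 || dist(q[i].lba, head) < dist(q[best].lba, head)))
                best = i;
        return best;
    }

    int main(void)
    {
        struct req normal[] = { {1000,0}, {52000,0}, {8000,0} };
        struct req low[]    = { {1010,0}, {7990,0}, {30000,0} }; /* e.g. backup reads */
        unsigned head = 0;

        int i;
        while ((i = pick_nearest(normal, 3, head)) >= 0) {
            head = normal[i].lba;
            normal[i].done = 1;
            printf("normal read at %u\n", head);

            /* opportunistic pass: service any low-priority request we are
             * already next to, without an extra seek of its own */
            for (int j = 0; j < 3; j++)
                if (!low[j].done && dist(low[j].lba, head) <= WINDOW) {
                    low[j].done = 1;
                    printf("  free read at %u (piggybacked)\n", low[j].lba);
                }
        }
        return 0;
    }

In this run the requests at 1010 and 7990 come along for free with the normal reads at 1000 and 8000, while the one at 30000 just waits - which is exactly the queue-depth worry above.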
Another thing that might be nice would be to be able to ask the disk what was in its cache, and do reads that succeed only if data can be read from cache. After all, bus bandwidth is cheap - essentially free compared to seeks, and still very cheap compared to actual disk reads.
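No such command exists in ATA or SCSI as far as I know, so purely as an interface sketch, here's what the 'succeed only from cache' semantics might look like against a made-up toy drive:

    /*
     * Toy model of "read only if it's in the drive's cache".  The drive,
     * its cache and read_if_cached() are all invented for illustration.
     */
    #include <stdio.h>

    #define CACHE_SLOTS 8

    struct toy_drive {
        unsigned cached_lba[CACHE_SLOTS];   /* LBAs currently in the track cache */
        int ncached;
    };

    /* Succeeds (returns 0 and fills *out) only when the block is already in
     * the cache; never seeks or touches the media, so it costs only bus
     * bandwidth.  On a miss the caller falls back to a normal read. */
    static int read_if_cached(struct toy_drive *d, unsigned lba, char *out)
    {
        for (int i = 0; i < d->ncached; i++)
            if (d->cached_lba[i] == lba) {
                snprintf(out, 16, "block %u", lba);   /* stand-in for real data */
                return 0;
            }
        return -1;
    }

    int main(void)
    {
        struct toy_drive d = { .cached_lba = { 100, 101, 102, 103 }, .ncached = 4 };
        char buf[16];

        for (unsigned lba = 100; lba <= 105; lba++)
            if (read_if_cached(&d, lba, buf) == 0)
                printf("LBA %u: hit  (%s)\n", lba, buf);
            else
                printf("LBA %u: miss (would need a real, seeking read)\n", lba);
        return 0;
    }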