
This would be INCREDIBLE

Posted Jul 29, 2004 11:14 UTC (Thu) by ringerc (subscriber, #3071)
Parent article: Kernel Summit: Class-based Kernel Resource Management

I'm the sysadmin of a small to medium business network. We run a dual Xeon
server to host file services (NetATalk, Samba, and NFS), intranet web
services, LTSP thin clients, and mail services. As you can imagine, this
does not always go smoothly ... but it works OK overall, and it could go a
lot more smoothly than it does now.

I'm going to present a wishlist that attempts to briefly explain what I'd
find useful and why I think it'd be good. I'm not claiming that any of it
is easy, or even that it's a good idea in the interests of the system as a
whole - I lack the knowledge to evaluate that. All I can say is "I think
this would be very useful..."

It would be very helpful to be able to control resource allocation to
processes in a more flexible and CONSISTENT way than provided by `nice`
and `ulimit`. In particular:

- Disk I/O QoS, so we could (say) configure "user" applications to
get priority for quick, brief disk accesses while capping the I/O ops/second
and throughput of file services just below the disk's capacity. Another
example might be limiting the disk throughput and IO/sec usage of a large,
low priority copy operation like archiving an LVM snapshot to removable
storage or cloning an experimental version of a database.
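Even without kernel support, the rate-limiting half of that last example can be crudely approximated in userspace by pacing the copy loop. A minimal sketch in Python (the function name, chunk size, and rate are my own illustrative choices, not an existing tool):

```python
import time


def rate_limited_copy(src, dst, max_bytes_per_sec=1_000_000, chunk=64 * 1024):
    """Copy src to dst, sleeping between chunks so the average
    throughput stays at or below max_bytes_per_sec."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        start = time.monotonic()
        copied = 0
        while True:
            data = fin.read(chunk)
            if not data:
                break
            fout.write(data)
            copied += len(data)
            # If we are ahead of the target rate, sleep until we are
            # back under it.
            expected = copied / max_bytes_per_sec
            elapsed = time.monotonic() - start
            if expected > elapsed:
                time.sleep(expected - elapsed)
```

A kernel-level QoS knob would still be far preferable: this only throttles one cooperative process, and it does nothing about seek priority or the page-cache effects described further down.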

- Per-process disk usage monitoring. "Dammit, why is the /home
array thrashing..."

- Memory limits that start paging processes out instead of killing
them if they exceed the limit, so it's possible to say (for example) that
the group 'users' may collectively consume no more than 50% of system
memory.

- Making CPU and memory limitation consistent. I'm not convinced
the CPU time ulimit makes any sense in a modern computing environment, and
while I find the 'kill when exceeded' memory ulimit great for limiting the
damage done by crashing processes it'd be nice to be able to have less
drastic control over the system resources they use as well. See above
point.
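For reference, the "fail hard when exceeded" behaviour being contrasted with paging can be demonstrated through the standard setrlimit() interface. The helper below is my own illustrative code (Python's resource module): it forks a child, applies a hard RLIMIT_AS address-space limit there, and shows that an allocation past the limit simply fails rather than being paged out:

```python
import os
import resource


def try_alloc_under_limit(limit_bytes, alloc_bytes):
    """Fork a child, apply a hard RLIMIT_AS (address-space) ulimit,
    and attempt one large allocation. Returns True if the allocation
    succeeded. Forking keeps the rlimit change out of the parent."""
    pid = os.fork()
    if pid == 0:
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
        try:
            _ = bytearray(alloc_bytes)  # allocation fails past the limit
            os._exit(0)                 # succeeded
        except MemoryError:
            os._exit(1)                 # hard failure, no paging fallback
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0
```

With a 256 MB limit, a 512 MB allocation fails outright; the kernel never offers the softer "page it out instead" behaviour wished for above.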

- Memory priorities for processes. "If you have to free up some
space, get rid of the database cache first, please DO NOT page out the
binaries of my interactive applications, my thin client users happen to be
using those..."

- The ability to do large, one-off copies without driving
everything remotely useful out of the in-memory disk cache. This is a
MAJOR problem in my experience. It is nigh impossible to do any seriously
large copies on an active Linux server (in my experience) if there are any
reasonably interactive tasks. Even if the interactive tasks don't normally
even touch the disk(s) you're using for the big copy, they'll quickly get
sluggish and start swapping or having to repeatedly load parts of
libraries and files from disk. This appears to be because all the
previously cached data - silly things like glibc and the program binaries
- are being pushed out of the disk cache in favour of data from the copy
operation that will never be re-used. This issue makes things like backups
of live servers a much higher-impact affair than they need to be, especially
combined with the apparent lack of any way of rate-limiting copy
operations.
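One mitigation that does exist for the cache-pollution problem is posix_fadvise() with POSIX_FADV_DONTNEED, which lets a copy tell the kernel that its pages will not be re-used. A sketch in Python (the function name and chunk size are illustrative; it assumes a platform that exposes os.posix_fadvise):

```python
import os


def copy_without_caching(src, dst, chunk=1024 * 1024):
    """Copy src to dst, advising the kernel after each chunk that the
    copied pages can be dropped from the page cache, so the copy does
    not evict hot data such as shared libraries."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        offset = 0
        while True:
            data = fin.read(chunk)
            if not data:
                break
            fout.write(data)
            fout.flush()
            os.fsync(fout.fileno())  # pages must be on disk before dropping
            os.posix_fadvise(fin.fileno(), offset, len(data),
                             os.POSIX_FADV_DONTNEED)
            os.posix_fadvise(fout.fileno(), offset, len(data),
                             os.POSIX_FADV_DONTNEED)
            offset += len(data)
```

This still requires every bulk-copying tool to opt in, which is exactly why a policy applied from outside the process, as wished for here, would be more useful.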

- The ability to configure and control disk, memory, CPU access,
and other forms of resource limitation and QoS from a single consistent
interface (say, sysfs). Ideally policies might be applied to a group of
related processes (as `ulimit` and friends do currently) or to all
processes owned by a particular user or group. Imagine "No single user may
use more than 80% of the network bandwidth" or "This group of virtual
machines is limited to 50% of physical system memory (and will begin
paging out instead of crashing if it exceeds it)."

Currently, it seems very hard to get different sorts of services to play
well together on the same server. For good utilisation and to limit the
number of servers that need to be managed, it would be nice to change
this. Many of the things needed to make (say) a thin client server and a
mail server live acceptably well on the same box will no doubt also
benefit virtualisation schemes like LVS and UML. After all, they, too,
want ways to prevent different VMs that might be doing different things
from treading on each other too badly.

In fact, I'd eventually love to be able to move my thin client services
into one virtual space (think BSD Jail, LVS, UML, etc), mail into another,
etc with minimal resource overheads. The management benefits would be
pleasant - upgrade your terminal server environment regularly to get shiny
new GUI improvements, while keeping your mail server environment unchanged
for as long as humanly possible.

I think Linux is already ahead of some contenders when it comes to many
things efficiently sharing one system, but in my opinion there's still a
lot of room for improvement.

Of course, I'm just a lowly sysadmin and probably don't understand the
complexity of what I'm talking about. What the heck - these are wants and
needs, and I'm interested in how they translate into anything that could
make its way into reality outside an IBM mainframe.

Comments appreciated.

--
Craig Ringer
craig <at> postnewspapers [dot] com >d0t< au

