User-space software suspend
Posted Sep 30, 2005 3:03 UTC (Fri) by zblaxell
In reply to: User-space software suspend
Parent article: User-space software suspend
I do like the fact that swsusp2 resumes with caches and buffers intact. If I wanted to wait while the system painfully restored this data one 4K page fault from swap at a time, I might as well reboot--it could actually be faster.
On the other hand, I generally like to run a small application before suspending, which allocates memory until a few hundred pages are swapped (it is a loop of malloc() and reading paging statistics out of /proc), then exits. This dumps out some of the more useless 400MB or so of caches on my system, and cuts resume time in half (it does add a second or two to suspend), without the extreme pain of having to swap _everything_ back in on resume.
I'm not sure what benefit there is in pushing too much of the suspend and resume functions into user space. After a while we start to need a whole lot of system calls to tell the kernel which of its "user space" processes are in fact absolutely critical to the continued functioning of the kernel, at which point IMHO it would be much simpler, safer, smaller, and swifter to just push the whole thing back into kernel-space. If you combine user-space suspend and resume with user-space block devices, user-space network devices, user-space encryption (on either), user-space device configuration, network storage devices, and device drivers that live partly or entirely in user-space, there's a whole lot of stuff that is just bouncing back and forth between user-space and kernel-space with no really sane reason to do so other than "we don't have to do all of it in the kernel."
In one special case of user space--monolithic user-space applications--there is a similar question of what to include in the main application's space and what to farm out to other processes. Sometimes the monolithic application is even called a "kernel." One solution in common between the Linux kernel and other large applications is to dynamically load code into the application's address space (.ko's or .so's). Another solution is to initiate another process with a separate address space, then communicate with the main application over some kind of IPC (netlink, /proc, /sys, dbus, hotplug, mmap...or sockets, pipes, shared memory, mmap).
There is a third option which is used by big applications but not the Linux kernel: embedded interpreted languages. Modern applications, once they cross a certain size threshold, tend to suddenly sprout a language interpreter to cope with their more advanced configuration options (where "configuration" sometimes amounts to "when I press this button, execute 1500 lines of custom workflow code"). Things like netfilter get close to this--iptables is almost Turing-complete, the chains are analogous to functions, some of the experimental netfilter modules implement dictionary lookups analogous to variables, and the non-experimental modules can do basic boolean logic on packets combining the results from multiple rules, as long as you don't need more than 8 levels of nesting or 32 bits of storage per packet. Netfilter in particular could benefit a lot from having a compiler in user-space generate an optimized (not every netfilter chain entry *needs* to look at the source and destination network/netmask, but they do nonetheless) bytecode (or even machine code) filter configuration, then pushing that code into a much simpler kernel-space implementation. I'm surprised the Linux kernel doesn't have at least one interpreted configuration language, not even as a module--other Unixish kernels and their bootloaders do.
Most of the time, the only advantage I ever see from having things like root filesystem configuration, device mapping, encryption, firmware loading, etc. configured from or provided by user-space is that it is then possible to do non-trivial configurations or experimental implementations. For example, the md-RAID setup allows a number of straightforward RAID configurations to be set up automatically by the kernel, while the device-mapper and other LVM flavors are configured from user-space and can (in theory) be a lot more flexible. Another example is encrypted filesystem setup, where you almost certainly want to have a custom user-space script to retrieve the decryption keys from whatever they're stored on, match them up with the right partitions, and of course collect the passphrase from the console. All this stuff can easily be handled by even a minimal scripting language with the right set of primitives--most of which would just be wrappers around existing kernel code, e.g. open() or sha1().
I currently do this kind of userspace configuration on an initrd with busybox (almost but not quite as painful as custom C code), custom binaries (which are comparatively hard to fix when they break, unless you have the presence of mind to keep a working development environment on a bootable CD with you at all time), or even shell scripts (which work, but take up megabytes of space for the 99% of the code you're not using). IMHO they all suck. The amount of stuff that I have to put into the initrd keeps getting bigger while the amount of stuff in the kernel keeps getting...well, bigger, and yet the amount of stuff that the kernel can do without help from user-space seems to be decreasing with each new major kernel subsystem. Also, I have to go through some weird flaky gymnastics to reconfigure user space (pivot_root and real-root-dev come to mind here) without leaving dangling references to multi-megabytes of initrd crap taking up RAM and swap. I'd rather just put 20K of some simple script language runtime into the kernel, have the kernel read and execute a 4K boot script, and be done with it. It can't take more than that much code to prompt for a password, run it through the appropriate salt and hash functions, set up two loop device AES keys, then exec "/sbin/init".
to post comments)