
posix_spawn is stupid as a system call


Posted Nov 6, 2009 9:07 UTC (Fri) by epa (subscriber, #39769)
In reply to: posix_spawn is stupid as a system call by helge.bahmann
Parent article: Toward a smarter OOM killer

You're right, others pointed out the same thing; no single system call can handle all the things you might want to set up in the child process before exec()ing.
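To make the limitation concrete, here is a minimal sketch of what posix_spawn() can express: a fixed menu of file actions and attributes applied before the exec. The helper name spawn_cat is hypothetical, and the sketch assumes /bin/cat exists. As far as I know there is no file action or attribute for switching to an arbitrary uid (POSIX_SPAWN_RESETIDS only resets the effective ids to the real ones), which is the kind of gap being discussed.

```c
#include <assert.h>
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run /bin/cat with stdin redirected from `infile`.
 * Returns the child's exit status, or -1 if spawning or waiting failed. */
int spawn_cat(const char *infile)
{
    assert(infile != NULL);

    posix_spawn_file_actions_t actions;
    if (posix_spawn_file_actions_init(&actions) != 0)
        return -1;

    /* Queue "open infile on descriptor 0" to be performed in the child. */
    int err = posix_spawn_file_actions_addopen(&actions, 0, infile,
                                               O_RDONLY, 0);

    pid_t child = -1;
    char *argv[] = { "cat", NULL };
    if (err == 0)
        err = posix_spawn(&child, "/bin/cat", &actions, NULL, argv, environ);

    posix_spawn_file_actions_destroy(&actions);
    if (err != 0)
        return -1;

    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything not on that menu still needs code running in the child between fork() and exec().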

But that said, why does the whole child process (including, potentially, a complete copy of its parent's core pages, all ready to be written to) need to be created just to set a few uids or open some files? Perhaps it would work better to first prepare a new process structure, then set uids and open files for it, and as the last stage breathe life into it by giving a file to exec(). For example

    pid_t child = new_waiting_process();
    // Now child is an entry in the process table, but it is not running.
    // Use the p_ variants of some system calls to set things up for
    // this child process.
    p_setuid(child, uid);
    p_close(child, 0);
    p_open(child, "infile", O_RDONLY);
    // Finished setup, start it running.
    p_exec_and_start(child, "/bin/cat");
    waitpid(child, NULL, 0);

This would give almost the same flexibility, but without the need to overcommit memory. The kernel would just need to create a new process in a not-runnable state, and the p_whatever system calls allow performing operations on another process rather than yourself. (Of course they would only allow manipulating your own not-yet-started child process, except perhaps for root.)

A process created with new_waiting_process() would inherit its parent's file descriptors, current directory, environment and so on as for fork(), but it would not inherit the parent's core.
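For comparison, the same setup written with today's fork() and exec looks roughly like the sketch below. The helper name fork_cat and the exit codes 126/127 are my own choices, not part of any proposal here; the point is only that the whole address space is duplicated (copy-on-write) for the sake of three small calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: fork, set the uid, redirect stdin from `infile`,
 * then exec /bin/cat. Returns the child's exit status, or -1 on error. */
int fork_cat(const char *infile, uid_t uid)
{
    assert(infile != NULL);

    pid_t child = fork();   /* child gets a COW copy of the parent's memory */
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Setup phase: runs in the child, before the new program starts. */
        if (setuid(uid) != 0)
            _exit(126);
        close(0);
        /* The lowest free descriptor is now 0, so open() must return 0. */
        if (open(infile, O_RDONLY) != 0)
            _exit(126);
        char *argv[] = { "cat", NULL };
        execve("/bin/cat", argv, environ);
        _exit(127);         /* only reached if execve() failed */
    }

    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing the caller's own uid, as in fork_cat("infile", getuid()), makes the setuid() step a no-op that works without privileges.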



posix_spawn is stupid as a system call

Posted Nov 6, 2009 10:07 UTC (Fri) by helge.bahmann (subscriber, #56804)

The idea in itself is workable, but the number of system calls you have to
duplicate is _huge_. It would perhaps be easier to create an "almost
empty" process image (with at least one stack and executable code page set
up) in suspended state, and then use ptrace or something similar to inject
system calls into the new process image -- this is tricky, but at least
the kernel is not burdened with an exploding number of system calls.

Alternatively, you could also provide a "fork" variant that explicitly
declares which pages of the address space are to be COWed into the new
process (if you are extra-smart, all you ever need to COW are the stack
pages, but calling library functions before execve is probably going to
spoil that -- but then, finding out which pages a library requires is by
no means easier, so you have to exercise a lot of discipline).
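
Linux already has a mechanism that works in the opposite direction: instead of listing the pages to COW, madvise() with MADV_DONTFORK marks ranges that a later fork() should not map into the child at all. The sketch below is Linux-specific; child_can_read is a hypothetical helper name. It shows that a child touching an excluded page faults, which is exactly the discipline problem described above.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map one page, optionally exclude it from fork(),
 * and report whether a forked child can still read it.
 * Returns 1 if readable, 0 if the child died with SIGSEGV, -1 on error. */
int child_can_read(int exclude_from_fork)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;
    region[0] = 42;

    if (exclude_from_fork && madvise(region, len, MADV_DONTFORK) != 0) {
        munmap(region, len);
        return -1;
    }

    pid_t child = fork();
    if (child < 0) {
        munmap(region, len);
        return -1;
    }
    if (child == 0) {
        /* Volatile read so the access really happens in the child. */
        _exit(*(volatile char *)region == 42 ? 0 : 1);
    }

    int status;
    pid_t waited = waitpid(child, &status, 0);
    munmap(region, len);
    if (waited < 0)
        return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
        return 0;
    return -1;
}
```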

Might be an interesting research project to attempt any of the above in
Linux :)

posix_spawn is stupid as a system call

Posted Nov 6, 2009 13:51 UTC (Fri) by nix (subscriber, #2304)

You could reduce the set of necessary syscalls to one:

int masquerade_as (pid_t pid)

which issues syscalls in 'pid' instead of the current process. ('pid' is a
process you'd be allowed to ptrace, so immediate children are permitted).
This is a per-thread attribute, and passing a pid of 0 flips back to the
parent again.

Then all you need is this (ignoring error checking, just as the OP did;
and what a horrible name new_waiting_process() is, vvfork() would
surely be better):

pid_t child = new_waiting_process();
masquerade_as (child);
setuid(uid);
close(0);
open("infile", O_RDONLY);
// Finished setup, start it running.
char *argv[] = { "/bin/cat", NULL };
execve ("/bin/cat", argv, environ);
masquerade_as (0);
waitpid(child, NULL, 0);

Note the subtleties here: execution always continues after execve()
because the execve() was done to another process image. Non-syscalls are
very dangerous to run because they might update userspace storage in the
wrong process: we'd really need support for this in libc for it to be
usable.

(In practice this latter constraint destroys the whole idea no matter how
good it might be: Ulrich would say no, as he does to every idea anyone
else originates. Personally I suspect this idea sucks in any case :) )

posix_spawn is stupid as a system call

Posted Nov 8, 2009 21:26 UTC (Sun) by epa (subscriber, #39769)

From a purist point of view, all these 'new' calls are generalizations of the existing ones taking an extra pid argument, so they can just replace them, with the old ones provided by the C library; of course in the real world there is such a thing as backward compatibility :-p.

posix_spawn is stupid as a system call

Posted Nov 8, 2009 23:34 UTC (Sun) by nix (subscriber, #2304)

Yeah, breaking the entire installed base of Linux apps would probably be a
*bad* move :) I think, if you wanted to do this, you'd have to introduce a
huge pile of new syscalls and reimplement the old ones as thin wrappers
(inside the kernel so as not to force everyone to upgrade glibc) calling
the new ones.

posix_spawn is stupid as a system call

Posted Nov 23, 2009 15:08 UTC (Mon) by jch (guest, #51929)

This is analogous to the *at system calls (openat, fstatat, ...) that have been introduced in Linux and included in the latest revision of POSIX.

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds