I have to disagree about O_CLOEXEC

Posted Nov 26, 2010 12:47 UTC (Fri) by Ross (guest, #4065)
Parent article: Ghosts of Unix past, part 2: Conflated designs

Well, maybe I don't have to... I just want to. :)

I don't think creating a file descriptor and allowing it to be duplicated on fork() or preserved through exec should be separate operations.

I think changing fork() to close file descriptors by default would abuse its currently clean design. It's so beautiful, like a cell dividing. The only special cases that have to exist are things like the PID and PPID. Of course, people have since added exceptions for locks and asynchronous I/O. I don't think complicating it further would be nice.

Similarly, exec() takes an existing process and environment and just changes the running program. I don't see file descriptors as part of which program is running, but as part of the environment it runs in. The simplest behavior is for the exec'd program to see exactly the same set of file descriptors that existed right before the system call.

So I'd argue that the defaults make sense as they are. File descriptors are real by default. I'd say the problem is O_CLOEXEC. It seems really useful, but it doesn't fit into this model.

As someone else pointed out, a system call to close all file descriptors (or maybe ones from a given range?) would be a more orthogonal way to handle it, and would probably be useful elsewhere. I've seen lots of code that loops from 0 to 255 closing everything just to be sure it wasn't leaking something.
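A minimal sketch of that "close everything" loop, written so it walks up to the process's actual descriptor limit (from `sysconf(_SC_OPEN_MAX)`) rather than a hard-coded 255. The function name `close_all_from` is hypothetical, and the fallback limit of 1024 is an assumption for systems where the limit is indeterminate. Systems that provide a dedicated call (for example `closefrom()` on the BSDs and Solaris, or `close_range(2)` on Linux 5.9 and later) turn this into a single system call.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: close every open descriptor numbered lowfd or higher.
 * Instead of looping over 0..255, walk up to the process's descriptor
 * limit. Returns how many descriptors were actually open and got closed. */
static int close_all_from(int lowfd)
{
    assert(lowfd >= 0);
    long maxfd = sysconf(_SC_OPEN_MAX);
    if (maxfd < 0)
        maxfd = 1024;              /* assumed fallback if the limit is indeterminate */

    int closed = 0;
    for (long fd = lowfd; fd < maxfd; fd++) {
        if (fcntl((int)fd, F_GETFD) == -1)
            continue;              /* not open (EBADF): nothing to close */
        close((int)fd);
        closed++;
    }
    return closed;
}
```

Checking with `fcntl(F_GETFD)` first only serves to count accurately; calling `close()` blindly and ignoring `EBADF` works just as well.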

Now if the automatic-trigger part of O_CLOEXEC is an important feature (maybe you don't trust your code to close things before calling exec), there are some solutions entirely in userspace. First, the C library could provide special versions of the exec functions that close everything first. You could presumably use grep, macros, or other tricks to make sure you only used those versions. Second, the C library could even make provisions to track file descriptors that should be closed on exec without any help from the kernel.
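A rough sketch of the second idea, assuming a libc that keeps its own list of "close before exec" descriptors. Every name here (`track_cloexec`, `untrack_cloexec`, `close_tracked`, `execv_closing_tracked`) is hypothetical, not a real libc API. It is deliberately simple: the table is fixed-size and not protected against concurrent threads, and the tracked descriptors are closed even when the exec then fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical userspace bookkeeping: remember which descriptors should
 * not survive exec, with no kernel flag involved. Not thread-safe. */
#define MAX_TRACKED 64

static int tracked[MAX_TRACKED];
static int ntracked;

static bool track_cloexec(int fd)
{
    assert(fd >= 0);
    if (ntracked == MAX_TRACKED)
        return false;
    tracked[ntracked++] = fd;
    return true;
}

/* Call when a tracked descriptor is closed normally. */
static void untrack_cloexec(int fd)
{
    for (int i = 0; i < ntracked; i++) {
        if (tracked[i] == fd) {
            tracked[i] = tracked[--ntracked];   /* swap-remove */
            return;
        }
    }
}

/* Close every tracked descriptor and forget them. */
static void close_tracked(void)
{
    for (int i = 0; i < ntracked; i++)
        close(tracked[i]);
    ntracked = 0;
}

/* "Special version" of execv(): close tracked descriptors first.
 * Like execv() itself, it only returns (with -1) on failure. */
static int execv_closing_tracked(const char *path, char *const argv[])
{
    close_tracked();
    return execv(path, argv);
}
```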

In summary, fork() and exec() are two well-designed parts of Unix. Making them uglier to get rid of this flag to open() would not have been an improvement.

I have to disagree about O_CLOEXEC

Posted Nov 26, 2010 22:51 UTC (Fri) by neilbrown (subscriber, #359)

I don't think anyone is suggesting changes to fork, though of course it has already been noted that fork shows signs of conflation which 'clone' and 'unshare' help to remove.

However 'exec' is very special. Unlike fork and everything else, the calling process has no control over what happens after the exec call succeeds, so it needs to do everything before.

It could close some file descriptors beforehand, without racing with other threads, by using 'unshare' to get a private file table and then closing whatever has been marked in libc as 'close on exec'.
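A Linux-only sketch of that approach, assuming a hypothetical libc-maintained list passed in as `marked`/`nmarked`; the function name `exec_with_private_fdtable` is also hypothetical. `unshare(CLONE_FILES)` gives the calling thread its own copy of the descriptor table, so the `close()` calls cannot pull descriptors out from under other threads. Note that some sandboxes (for example restrictive seccomp profiles) may refuse `unshare`, in which case this returns -1 before closing anything.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical: unshare the descriptor table, close the marked
 * descriptors in the private copy, then exec. Returns -1 on failure. */
static int exec_with_private_fdtable(const char *path, char *const argv[],
                                     const int *marked, int nmarked)
{
    /* From here on, this thread's descriptor table is no longer shared,
     * so the closes below are invisible to other threads. */
    if (unshare(CLONE_FILES) == -1)
        return -1;

    for (int i = 0; i < nmarked; i++)
        close(marked[i]);

    execv(path, argv);
    /* exec failed: the marked descriptors are already gone from this
     * thread's (now private) table and cannot be brought back. */
    return -1;
}
```

The comment after `execv()` is exactly the weakness raised in the next paragraph: once the descriptors are closed by hand, a failed exec cannot get them back.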

But there are (or at least 'could be') times when you want some file descriptor to still be open if 'exec' fails, but you don't want it to be open after the exec succeeds. For that you really need close-on-exec.
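A well-known case where you want exactly this behavior is reporting exec failure back to the parent. In the sketch below, the child keeps a pipe's write end open while trying to exec; if exec fails, it writes `errno` into the pipe, and if exec succeeds, close-on-exec shuts the write end so the parent's `read()` sees EOF. The name `spawn_checked` is hypothetical; `pipe2()` with `O_CLOEXEC` is a Linux/BSD extension that needs `_GNU_SOURCE` on glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork and exec `path`. Returns the child's pid if
 * exec succeeded, or -1 with errno set (to the child's exec error, or to
 * a pipe/fork error in the parent). */
static pid_t spawn_checked(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);
    int fds[2];
    if (pipe2(fds, O_CLOEXEC) == -1)    /* both ends close-on-exec, atomically */
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        int e = errno;
        close(fds[0]);
        close(fds[1]);
        errno = e;
        return -1;
    }

    if (pid == 0) {
        /* Child: the write end stays open while exec is attempted... */
        close(fds[0]);
        execv(path, argv);
        /* ...and is still open here only because exec failed. */
        int err = errno;
        ssize_t n = write(fds[1], &err, sizeof err);
        (void)n;
        _exit(127);
    }

    /* Parent: a successful exec closed the child's write end, so read()
     * returns 0 (EOF); a failed exec delivers the child's errno instead. */
    close(fds[1]);
    int child_err;
    ssize_t n;
    do {
        n = read(fds[0], &child_err, sizeof child_err);
    } while (n == -1 && errno == EINTR);
    close(fds[0]);

    if (n == (ssize_t)sizeof child_err) {
        waitpid(pid, NULL, 0);          /* reap the child that failed to exec */
        errno = child_err;
        return -1;
    }
    return pid;
}
```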

And if it is necessary to have close-on-exec, then it makes most sense for it to default to 'set', as that is what is commonly wanted, and that is easiest to manage in a race-free way.

The main point that I got from your comment is that while it might be clear that something isn't right with this whole design area, it is open for debate which bits are 'right' and which bits are 'wrong'. I would certainly agree with that.