Ghosts of Unix Past: a historic search for design patterns
Posted Oct 27, 2010 17:15 UTC (Wed) by nix (subscriber, #2304)
I don't know who designed sysvipc, but if I ever meet them I shall shake them warmly by the throat.
Posted Oct 27, 2010 18:51 UTC (Wed) by HelloWorld (guest, #56129)
What's the problem?
Posted Oct 27, 2010 20:57 UTC (Wed) by khim (subscriber, #9252)
Posted Oct 27, 2010 21:09 UTC (Wed) by michaeljt (subscriber, #39183)
I think I personally prefer shm_open to passing fds over sockets.
Posted Oct 27, 2010 22:23 UTC (Wed) by foom (subscriber, #14868)
Posted Oct 28, 2010 7:54 UTC (Thu) by michaeljt (subscriber, #39183)
I think I have problems with the concept of passing a file descriptor through a socket regardless of the API. It just doesn't seem to fit "into the metaphor".
Posted Oct 28, 2010 11:20 UTC (Thu) by neilbrown (subscriber, #359)
I imagine that if you already had a pipe between two processes (possibly using a named pipe in the filesystem) then one process could:
openat(pipefd, NULL, flags);
If you really wanted to pass a file descriptor, you then 'splice' the file descriptor that you want to pass onto the pipe. That gives the other end direct access to your file descriptor.
Posted Oct 28, 2010 11:49 UTC (Thu) by michaeljt (subscriber, #39183)
Pardon me if I am being dense here, but isn't that roughly what Unix domain sockets do?
> If you really wanted to pass a file descriptor, you then 'splice' the file descriptor that you to pass onto the pipe. That gives the other end direct access to your file descriptor.
If we are talking about something accessible through the filesystem then surely either a process is allowed to open it (in which case they can be given permissions to do so) or they are not (in which case, well, they shouldn't be). I know there are edge cases like processes which grab a resource and drop privileges, but in that case permission to access the resource is tied to the fact that only a given process binary will manipulate it, and I don't know if you really gain much through passing it through a pipe or a socket instead, as you would need to add lots of extra security checks anyway to be sure you were really talking to that binary (so to speak).
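For readers who have not seen it, the fd passing over Unix domain sockets discussed in this subthread can be sketched as follows. This is a minimal illustration, not anyone's production code: it assumes Python 3.9 or later (for socket.send_fds and socket.recv_fds, which wrap the SCM_RIGHTS ancillary message), keeps both ends in one process for brevity, and the function name pass_fd_demo is made up for the example.

```python
import os
import socket


def pass_fd_demo() -> bytes:
    """Send the read end of a pipe over a Unix socket pair, then read from
    the descriptor that arrives on the other side."""
    # A connected pair of AF_UNIX sockets. In real use the two ends would
    # normally belong to different processes.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        # At least one byte of ordinary data must accompany the descriptors.
        socket.send_fds(sender, [b"fd"], [read_end])
        # The kernel installs a *new* descriptor number on the receiving
        # side that refers to the same open file description.
        _msg, fds, _flags, _addr = socket.recv_fds(receiver, 16, 1)
        received_fd = fds[0]
        try:
            os.write(write_end, b"hello")
            return os.read(received_fd, 5)
        finally:
            os.close(received_fd)
    finally:
        os.close(read_end)
        os.close(write_end)
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(pass_fd_demo())
```

The receiver never needs filesystem permission to open anything: it simply ends up holding a descriptor that the sender chose to hand over, which is the property being debated above.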
Posted Oct 28, 2010 14:35 UTC (Thu) by nix (subscriber, #2304)
Posted Nov 7, 2010 0:17 UTC (Sun) by kevinm (guest, #69913)
The most valuable part of the file descriptor interface is the sane, well-defined object lifetimes.
Posted Nov 15, 2010 11:48 UTC (Mon) by rlhamil (guest, #6472)
On some other systems, a pipe is STREAMS based, and STREAMS has its own mechanism
for passing fds over STREAMS pipes. Moreover, an anonymous STREAMS pipe can be given
a name in the filesystem namespace (something distinct from a regular named pipe), and
can have the connld module pushed onto it by the "server" end, in which case each client
opening the named object gets a private pipe to the server, and the server is notified
that it can receive a file descriptor for that. In turn, client and server could then pass other
file descriptors over the resulting private pipe.
(On Solaris, pipe() is STREAMS based; but one can write an LD_PRELOADable object
that redefines pipe() in terms of socketpair(), and most programs that don't specifically
depend on STREAMS pipe semantics won't know the difference.)
Unfortunately, STREAMS is far from universal. As a networking API, it's less popular than
sockets, and as a method of implementing a protocol stack, unless there are shortcuts between
for example IP and TCP, it's not efficient enough for fast (say 1Gb and faster) connections.
But for local use, it's still pretty flexible where available.
For performance, some systems do not implement pipes as either socketpair() or STREAMS.
(I just looked at Darwin 10.6.4; the pipe() implementation was changed away from
socketpair() allegedly for performance, and may not even be bidirectional anymore,
although a minimal few ioctls are still supported, but not fd passing.)
As for other abstractions not often thought of with a file descriptor, let me recall
Apollo Domain OS. Its display manager "transcript pads" IIRC had a presence in the
filesystem namespace. And although on one level they were like a terminal, on another,
although they were append-only, one could for all practical purposes seek backward into
them, equivalent to scrolling back. Moreover, certain graphics modes were permitted
within such a pad, and would actually be replayed when scrolled back to! In addition to that,
files in Domain OS were "typed": they had type handlers that could impose particular record
semantics, or even encapsulate version history functions (their optional DSEE did that,
and was the direct ancestor of ClearCase). More conventional interpretations were possible;
they'd always had type "uasc" (unstructured byte stream), although it had a hidden header
which threw off some block counts; a later "unstruct" type gave more accurate semantics of
a regular Unix file. They could also do some neat namespace tricks: some objects that
weren't strictly directories could nevertheless choose to deal with whatever came after them
in a pathname. So if one opened /path/to/magic/thingie/of/mine, it's possible that
/path/to/magic was in some sense a typed file rather than a system-supported directory,
but could choose to allow as valid that a residual path was passed to it, in which case
it would be implicitly handed thingie/of/mine as something it could use to determine the
initial state it was to provide to whatever opened it. _Very_ flexible! Only some of the
abstractions that Plan 9 (or the seldom-used HURD) promise came close to what
Domain OS could do. If I felt like adding something to my collection, a
Posted Oct 30, 2010 1:03 UTC (Sat) by nevyn (guest, #33129)
Posted Nov 2, 2010 10:12 UTC (Tue) by michaeljt (subscriber, #39183)
That sounds to me like the method where you create a file, open it in all processes, unlink it then make it sparse of the size you need, and hope that the kernel heuristics do the right thing...
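As a rough sketch of that method: create a file, unlink it, extend it sparsely to the needed size, and map it shared. The assumptions here are that the second process is a fork() child that inherits the mapping (rather than opening the file by name before the unlink), and that the function name shared_mapping_demo and the 4096-byte size are arbitrary choices for the example.

```python
import mmap
import os
import tempfile


def shared_mapping_demo(size: int = 4096) -> bytes:
    """Share memory between parent and child via an unlinked, sparse file."""
    fd, path = tempfile.mkstemp()
    os.unlink(path)         # the name is gone; the file lives while it is open or mapped
    os.ftruncate(fd, size)  # extends the file without writing any data (sparse)
    shared = mmap.mmap(fd, size, mmap.MAP_SHARED)
    os.close(fd)            # the mapping keeps the file alive on its own

    pid = os.fork()
    if pid == 0:
        # Child: write into the shared mapping and exit immediately.
        shared[:5] = b"child"
        os._exit(0)

    os.waitpid(pid, 0)
    data = bytes(shared[:5])  # parent sees the child's write
    shared.close()
    return data


if __name__ == "__main__":
    print(shared_mapping_demo())
```

Whether the pages ever touch the disk is up to the kernel's writeback heuristics, which is exactly the "hope" part of the description above.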
Posted Oct 28, 2010 14:33 UTC (Thu) by nix (subscriber, #2304)
A more portable approach with essentially no downsides is to pass the fd of a pipe to your recipient process, and use its blocking behaviour when empty to implement your semaphore.
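A minimal sketch of the pipe-as-semaphore idea, assuming one byte in the pipe stands for one unit of the count. The class name PipeSemaphore is hypothetical, and the demo uses a thread instead of a second process to stay short; with processes you would pass the two descriptors to the peer (by inheritance or over a socket).

```python
import os
import threading


class PipeSemaphore:
    """Counting semaphore built on a pipe: each byte is one token."""

    def __init__(self, initial: int = 0):
        self._r, self._w = os.pipe()
        if initial:
            # Assumes `initial` is smaller than the pipe buffer size.
            os.write(self._w, b"x" * initial)

    def post(self) -> None:
        os.write(self._w, b"x")  # add one token

    def wait(self) -> None:
        os.read(self._r, 1)      # blocks while the pipe is empty

    def close(self) -> None:
        os.close(self._r)
        os.close(self._w)


def pipe_semaphore_demo() -> list:
    sem = PipeSemaphore()
    events = []

    def worker():
        sem.wait()               # sleeps until the main thread posts
        events.append("woke")

    t = threading.Thread(target=worker)
    t.start()
    sem.post()
    t.join(timeout=5)
    sem.close()
    return events


if __name__ == "__main__":
    print(pipe_semaphore_demo())
```

Because the semaphore is just a pair of descriptors, it goes away when the last holder closes it or exits, unlike a SysV semaphore that lingers until removed explicitly.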
Posted Oct 27, 2010 21:05 UTC (Wed) by jengelh (subscriber, #33263)
Posted Oct 27, 2010 21:22 UTC (Wed) by HelloWorld (guest, #56129)
Posted Oct 28, 2010 10:02 UTC (Thu) by Yorick (subscriber, #19241)
Again and again, the same design mistakes, probably with excellent excuses every time.
Posted Nov 1, 2010 8:21 UTC (Mon) by kleptog (subscriber, #1183)
This comes up every now and then when people want PostgreSQL to use POSIX shared memory or mmap(). Turns out that there is no portable replacement for all the features of SysV shared memory. Which means you could do it, but you lose a number of safety-interlocks you have now. And safety of data is critical to databases.
Posted Nov 7, 2010 0:25 UTC (Sun) by kevinm (guest, #69913)
(If you don't care about that, you can just walk /proc/*/fd/* to count the number of opens, with either POSIX shm or mmap).
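A rough sketch of such a walk, assuming a Linux-style /proc. The function name count_opens is made up for the example. It counts descriptors rather than open file descriptions, silently skips processes it is not allowed to inspect, and does not see mappings that have no descriptor behind them.

```python
import os


def count_opens(path: str) -> int:
    """Count descriptors across visible processes that refer to `path`."""
    target = os.stat(path)
    count = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        fd_dir = f"/proc/{pid}/fd"
        try:
            names = os.listdir(fd_dir)
        except OSError:
            # Process exited, or it belongs to another user.
            continue
        for name in names:
            try:
                # stat() follows the /proc symlink to the underlying file.
                st = os.stat(os.path.join(fd_dir, name))
            except OSError:
                continue
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                count += 1
    return count


if __name__ == "__main__":
    fd = os.open(os.devnull, os.O_RDONLY)
    print(count_opens(os.devnull))
    os.close(fd)
```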
Posted Nov 7, 2010 16:09 UTC (Sun) by kleptog (subscriber, #1183)
Given that in this situation attachments are created only by fork() (other than the initial one), if you have nattach == 1 you know there won't be another attachment other than by starting a completely new process. (The 1 is, of course, yourself.)
As for /proc/*/fd/*, that's hardly portable and, more importantly, you're not required to have a file descriptor for a shared memory segment, which means you need /proc/*/maps, which is even less portable. Besides, processes owned by other users are not examinable.
Posted Oct 27, 2010 17:28 UTC (Wed) by wahern (subscriber, #37304)
Plan9 solved this by allowing users to freely mount sub-hierarchies wherever they wished, so that even if you couldn't create a null device file (e.g. `foo'), you could at least mount a device tree (e.g. `foo/null'). In Unix allowing users to freely alter the hierarchy isn't possible because of other built-in assumptions in the system which, if broken, would have undesirable security implications. This is why chroot and mount require root permissions, whereas in Plan9 AFAIK you don't need permissions to change your file tree--even the root--but only permissions to get a reference to a particular sub-tree (i.e. permission to get a descriptor to the server providing the tree).
Hacks like FUSE, while cool, are severely limited by various constraints in Unix.
Posted Oct 27, 2010 19:10 UTC (Wed) by bfields (subscriber, #19510)
Posted Oct 28, 2010 18:03 UTC (Thu) by mszeredi (subscriber, #19041)