
Ghosts of Unix Past: a historic search for design patterns

Ghosts of Unix Past: a historic search for design patterns

Posted Oct 27, 2010 16:11 UTC (Wed) by fuhchee (guest, #40059)
Parent article: Ghosts of Unix Past: a historical search for design patterns

I'm surprised not to have seen Plan 9 mentioned.



Ghosts of Unix Past: a historic search for design patterns

Posted Oct 27, 2010 17:15 UTC (Wed) by nix (subscriber, #2304) [Link]

I'm quite glad that sysvipc is seen as so dead that it doesn't need to be mentioned anymore. That was a *really* extreme example of gratuitous separate namespaces with pointless additional restrictions as a direct result (what do you mean I can't wait on a pipe and a semaphore at the same time, because I can't select() on semaphores?)

I don't know who designed sysvipc, but if I ever meet them I shall shake them warmly by the throat.

Ghosts of Unix Past: a historic search for design patterns

Posted Oct 27, 2010 18:51 UTC (Wed) by HelloWorld (guest, #56129) [Link]

Huh? If sysvipc is dead, how do you use semaphores and/or shared memory on Unix? I agree that it's a problem that you can't wait on a semaphore and a pipe at the same time.

What's the problem?

Posted Oct 27, 2010 20:57 UTC (Wed) by khim (subscriber, #9252) [Link]

For shared memory there is mmap (don't forget that you can easily pass a file descriptor via a Unix socket), and futexes work great across shared memory, so there is really no need to ever use sysvipc. Except for legacy reasons, I suppose.
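As a rough illustration of the futex half of that suggestion, here is a minimal sketch of the two basic operations on a 32-bit word, which could live inside a MAP_SHARED mapping. It is Linux-only, the wrapper names futex_wait and futex_wake are hypothetical (glibc exports no futex() function, so the raw syscall is used), and a real lock would add an atomic compare-and-swap loop around these calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sleep while *addr == expected. Returns 0 when woken, or -1 with errno
 * set (EAGAIN if *addr already differed from expected on entry).
 * FUTEX_WAIT without FUTEX_PRIVATE_FLAG works across processes that
 * share the underlying memory. */
int futex_wait(uint32_t *addr, uint32_t expected)
{
    assert(addr != NULL);
    return (int)syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Wake at most n waiters sleeping on addr. Returns the number woken,
 * or -1 on error. */
int futex_wake(uint32_t *addr, int n)
{
    assert(addr != NULL);
    return (int)syscall(SYS_futex, addr, FUTEX_WAKE, n, NULL, NULL, 0);
}
```

A waiter would typically re-check the shared word in a loop around futex_wait(), since wakeups can be spurious and the value may change between the check and the sleep.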

What's the problem?

Posted Oct 27, 2010 21:09 UTC (Wed) by michaeljt (subscriber, #39183) [Link]

> For shared memory there is mmap (don't forget that you can easily pass a file descriptor via a Unix socket)

I think I personally prefer shm_open to passing fds over sockets.

What's the problem?

Posted Oct 27, 2010 22:23 UTC (Wed) by foom (subscriber, #14868) [Link]

Yeah, if you want to talk about crazy APIs, sendmsg/recvmsg are pretty far out there.

What's the problem?

Posted Oct 28, 2010 7:54 UTC (Thu) by michaeljt (subscriber, #39183) [Link]

> Yeah, if you want to talk about crazy APIs, sendmsg/recvmsg are pretty far out there.

I think I have problems with the concept of passing a file descriptor through a socket regardless of the API. It just doesn't seem to fit "into the metaphor".

What's the problem?

Posted Oct 28, 2010 11:20 UTC (Thu) by neilbrown (subscriber, #359) [Link]

I think passing file descriptors around is potentially very sensible. However I don't think much of the API that buries it deep inside sendmsg for Unix domain sockets.
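For readers who have not seen it, the existing mechanism looks roughly like the sketch below: the descriptor travels as SCM_RIGHTS ancillary data alongside an ordinary message. The helper names send_fd and recv_fd are hypothetical, error handling is minimal, and exactly one descriptor per message is assumed.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send one descriptor over a Unix-domain socket. Returns 0 or -1. */
int send_fd(int sock, int fd_to_send)
{
    assert(fd_to_send >= 0);

    char byte = 'F';   /* one byte of ordinary data carries the ancillary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };

    union {            /* the union keeps the buffer aligned for cmsghdr */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    memset(&ctl, 0, sizeof ctl);

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctl.buf, .msg_controllen = sizeof ctl.buf,
    };

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type  = SCM_RIGHTS;
    cm->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. Returns the new fd in this process, or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    memset(&ctl, 0, sizeof ctl);

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctl.buf, .msg_controllen = sizeof ctl.buf,
    };

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS || cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

The received descriptor is a new number in the receiving process that refers to the same open file, so it remains usable after the sender closes its copy or exits.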

I imagine that if you already had a pipe between two processes (possibly using a named pipe in the filesystem) then one process could:

openat(pipefd, NULL, flags);
and the other process could notice (via a poll() message) and
accept(pipefd, ....)
and they would each get an end of a (optionally bi-directional) pipe. This pipe would be private to the two, in contrast to the named pipe which is shared by every process that opens it.

If you really wanted to pass a file descriptor, you then 'splice' the file descriptor that you want to pass onto the pipe. That gives the other end direct access to your file descriptor.

What's the problem?

Posted Oct 28, 2010 11:49 UTC (Thu) by michaeljt (subscriber, #39183) [Link]

> I imagine that if you already had a pipe between two processes (possibly using a named pipe in the filesystem) then one process could:
>
> openat(pipefd, NULL, flags);
>
>and the other process could notice (via a poll() message) and
>
> accept(pipefd, ....)
>
>and they would each get an end of a (optionally bi-directional) pipe. This pipe would be private to the two, in contrast to the named pipe which is shared by every process that opens it.

Pardon me if I am being dense here, but isn't that roughly what Unix domain sockets do?

> If you really wanted to pass a file descriptor, you then 'splice' the file descriptor that you want to pass onto the pipe. That gives the other end direct access to your file descriptor.

If we are talking about something accessible through the filesystem then surely either a process is allowed to open it (in which case they can be given permissions to do so) or they are not (in which case, well, they shouldn't be). I know there are edge cases like processes which grab a resource and drop privileges, but in that case permission to access the resource is tied to the fact that only a given process binary will manipulate it, and I don't know if you really gain much through passing it through a pipe or a socket instead, as you would need to add lots of extra security checks anyway to be sure you were really talking to that binary (so to speak).

What's the problem?

Posted Oct 28, 2010 14:35 UTC (Thu) by nix (subscriber, #2304) [Link]

Yes, this is just like Unix-domain sockets. Neil's point is that the Unix-domain socket API is needlessly clumsy for this application: his proposed API would be much better (only nobody would use it for years if implemented, because it's Linux-specific).

What's the problem?

Posted Nov 7, 2010 0:17 UTC (Sun) by kevinm (guest, #69913) [Link]

Splicing the file descriptor doesn't have the same semantics. With a passed file descriptor, the original process can exit, and the recipient process will still have the file open.

The most valuable part of the file descriptor interface is the sane, well-defined object lifetimes.

What's the problem?

Posted Nov 15, 2010 11:48 UTC (Mon) by rlhamil (guest, #6472) [Link]

On some systems, pipe(fds) is equivalent to socketpair(AF_UNIX,SOCK_STREAM,0,fds),
in which case one could pass an fd (using the ugly and obscure semantics for doing so over
an AF_UNIX socket) over a pipe.

On some other systems, a pipe is STREAMS based, and STREAMS has its own mechanism
for passing fds over STREAMS pipes. Moreover, an anonymous STREAMS pipe can be given
a name in the filesystem namespace (something distinct from a regular named pipe), and
can have the connld module pushed onto it by the "server" end, in which case each client
opening the named object gets a private pipe to the server, and the server is notified
that it can receive a file descriptor for that. In turn, client and server could then pass other
file descriptors over the resulting private pipe.

(On Solaris, pipe() is STREAMS based; but one can write an LD_PRELOADable object
that redefines pipe() in terms of socketpair(), and most programs that don't specifically
depend on STREAMS pipe semantics won't know the difference.)

Unfortunately, STREAMS is far from universal. As a networking API, it's less popular than
sockets, and as a method of implementing a protocol stack, unless there are shortcuts between
for example IP and TCP, it's not efficient enough for fast (say 1Gb and faster) connections.
But for local use, it's still pretty flexible where available.

For performance, some systems do not implement pipes as either socketpair() or STREAMS.
(I just looked at Darwin 10.6.4; the pipe() implementation was changed away from
socketpair() allegedly for performance, and may not even be bidirectional anymore,
although a minimal few ioctls are still supported, but not fd passing.)

As for other abstractions not often thought of with a file descriptor, let me recall
Apollo Domain OS. Its display manager "transcript pads" IIRC had a presence in the
filesystem namespace. And although on one level they were like a terminal, on another,
although they were append-only, one could for all practical purposes seek backward into
them, equivalent to scrolling back. Moreover, certain graphics modes were permitted
within such a pad, and would actually be replayed when scrolled back to! In addition to that,
files in Domain OS were "typed": they had type handlers that could impose particular record
semantics, or even encapsulate version history functions (their optional DSEE did that,
and was the direct ancestor of ClearCase). More conventional interpretations were possible;
they'd always had type "uasc" (unstructured byte stream), although it had a hidden header
which threw off some block counts; a later "unstruct" type gave more accurate semantics of
a regular Unix file. They could also do some neat namespace tricks: some objects that
weren't strictly directories could nevertheless choose to deal with whatever came after them
in a pathname. So if one opened /path/to/magic/thingie/of/mine, it's possible that
/path/to/magic was in some sense a typed file rather than a system-supported directory,
but could choose to allow as valid that a residual path was passed to it, in which case
it would be implicitly handed thingie/of/mine as something it could use to determine the
initial state it was to provide to whatever opened it. _Very_ flexible! Only some of the
abstractions that Plan 9 (or the seldom-used HURD) promise came close to what
Domain OS could do. If I felt like adding something to my collection, a

What's the problem?

Posted Oct 30, 2010 1:03 UTC (Sat) by nevyn (guest, #33129) [Link]

You don't need to pass the fd, you can also use a real file and open() + mmap(MAP_SHARED) in each process.
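A minimal sketch of that approach, assuming every cooperating process knows the path and agrees on the length; the function name map_shared_file is hypothetical and the file is simply created if missing:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map `len` bytes of the file at `path` as shared memory.
 * Returns the mapping, or NULL on failure. */
void *map_shared_file(const char *path, size_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;

    /* Make sure the file is at least as large as the mapping. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);          /* the mapping stays valid after the fd is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

Each process calls this with the same path; stores made through one mapping become visible through the others because MAP_SHARED mappings of the same file share pages.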

What's the problem?

Posted Nov 2, 2010 10:12 UTC (Tue) by michaeljt (subscriber, #39183) [Link]

> You don't need to pass the fd, you can also use a real file and open() + mmap(MAP_SHARED) in each process.

That sounds to me like the method where you create a file, open it in all processes, unlink it then make it sparse of the size you need, and hope that the kernel heuristics do the right thing...

What's the problem?

Posted Oct 28, 2010 14:33 UTC (Thu) by nix (subscriber, #2304) [Link]

Yeah, you could use futexes, if you didn't mind being totally nonportable off Linux.

A more portable approach with essentially no downsides is to pass the fd of a pipe to your recipient process, and use its blocking behaviour when empty to implement your semaphore.
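One way to read that suggestion is the sketch below: each byte in the pipe stands for one unit of the semaphore, so posting is a one-byte write and waiting is a one-byte blocking read. The struct and function names are hypothetical, EINTR handling is omitted, and the count is bounded by the pipe's buffer capacity. Because the read end is an ordinary descriptor, it can be handed to poll() or select() alongside other descriptors, which is exactly what SysV semaphores do not allow.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

struct pipe_sem {
    int rfd;   /* read end: wait here (or poll on it) */
    int wfd;   /* write end: post here */
};

/* Increment: put one token into the pipe. Returns 0 or -1. */
int pipe_sem_post(struct pipe_sem *s)
{
    char token = 1;
    return write(s->wfd, &token, 1) == 1 ? 0 : -1;
}

/* Decrement: take one token, blocking while the pipe is empty. */
int pipe_sem_wait(struct pipe_sem *s)
{
    char token;
    return read(s->rfd, &token, 1) == 1 ? 0 : -1;
}

/* Returns 1 if a wait would succeed without blocking, 0 on timeout,
 * -1 on error. The same rfd could sit in a larger pollfd array. */
int pipe_sem_poll(struct pipe_sem *s, int timeout_ms)
{
    struct pollfd p = { .fd = s->rfd, .events = POLLIN };
    int n = poll(&p, 1, timeout_ms);
    if (n < 0)
        return -1;
    return n > 0 && (p.revents & POLLIN) ? 1 : 0;
}

/* Create the pipe and preload `initial` tokens. Returns 0 or -1. */
int pipe_sem_init(struct pipe_sem *s, unsigned initial)
{
    assert(s != NULL);
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    s->rfd = fds[0];
    s->wfd = fds[1];
    while (initial--)
        if (pipe_sem_post(s) < 0)
            return -1;
    return 0;
}
```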

Ghosts of Unix Past: a historic search for design patterns

Posted Oct 27, 2010 21:05 UTC (Wed) by jengelh (subscriber, #33263) [Link]

POSIX IPC. Such as shm_open. Gives you a file descriptor! *tada* Full exploitation.
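A small sketch of what that looks like in practice; the helper name shm_fd_open and the object name used by the caller are hypothetical, and older glibc versions need -lrt at link time:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open (creating if needed) a POSIX shared memory object of `len` bytes.
 * Returns an ordinary file descriptor, or -1 on failure. */
int shm_fd_open(const char *name, size_t len)
{
    assert(name != NULL && name[0] == '/');

    int fd = shm_open(name, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* A new object has size zero; size it before mapping. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The result behaves like any other descriptor: it can be passed to fstat() or mmap(), inherited across fork(), or sent to another process, and the name can be removed with shm_unlink() once everyone who needs it has opened it.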

Ghosts of Unix Past: a historic search for design patterns

Posted Oct 27, 2010 21:22 UTC (Wed) by HelloWorld (guest, #56129) [Link]

OK, I forgot about POSIX IPC. Thanks for the reminder.

Ghosts of Unix Past: a historic search for design patterns

Posted Oct 28, 2010 10:02 UTC (Thu) by Yorick (subscriber, #19241) [Link]

Unfortunately the different forms of POSIX IPC (shm, mq) still use their own special-purpose namespaces. And POSIX message queues don't even use file descriptors so they cannot be polled by ordinary means - they do in Linux, but that's not portable.

Again and again, the same design mistakes, probably with excellent excuses every time.

Ghosts of Unix Past: a historic search for design patterns

Posted Nov 1, 2010 8:21 UTC (Mon) by kleptog (subscriber, #1183) [Link]

Yeah, POSIX IPC is close, but not quite. While SysV shared memory is old, it still provides features that are not available anywhere else. The one I can think of off the top of my head is nattach, being able to examine a segment and see if it is still in use and by whom.

This comes up every now and then when people want PostgreSQL to use POSIX shared memory or mmap(). Turns out that there is no portable replacement for all the features of SysV shared memory. Which means you could do it, but you lose a number of safety-interlocks you have now. And safety of data is critical to databases.
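For reference, the attach count lives in the shm_nattch field of struct shmid_ds and is read with shmctl(IPC_STAT). The sketch below only shows how the number is obtained; the function name shm_attach_count is hypothetical, and this is not a description of how PostgreSQL actually performs its check.

```c
#define _XOPEN_SOURCE 700
#include <sys/ipc.h>
#include <sys/shm.h>

/* Return the number of current attachments to a SysV shared memory
 * segment, or -1 if the segment cannot be examined. */
long shm_attach_count(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) < 0)
        return -1;
    return (long)ds.shm_nattch;
}
```

The value is only a snapshot: it can change immediately after the call returns, so any decision based on it needs some other guarantee about who is able to attach.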

Ghosts of Unix Past: a historic search for design patterns

Posted Nov 7, 2010 0:25 UTC (Sun) by kevinm (guest, #69913) [Link]

Isn't the entire idea of nattach inherently racy?

(If you don't care about that, you can just walk /proc/*/fd/* to count the number of opens, with either POSIX shm or mmap).

Ghosts of Unix Past: a historic search for design patterns

Posted Nov 7, 2010 16:09 UTC (Sun) by kleptog (subscriber, #1183) [Link]

Not sure what you mean by racy. If you mean the value could change after you read it, sure, but that's not important. It could be that someone other than yourself tries to access the segment after you've checked you're the only one, but there are ways to ensure it's at least not another copy of yourself.

Given that in this situation attachments are created by fork() only (other than the initial one), if you have nattach == 1, you know there won't be another attachment other than by starting a completely new process. (The 1 is of course yourself.)

As for /proc/*/fd/*, that's hardly portable and, more importantly, you're not required to have a file descriptor for a shared memory segment, which means you need /proc/*/maps, which is even less portable. Besides, processes owned by other users are not examinable.

Ghosts of Unix Past: a historic search for design patterns

Posted Oct 27, 2010 17:28 UTC (Wed) by wahern (subscriber, #37304) [Link]

It definitely would have been useful to contrast with Plan9. For example, one of the problems of devfs, etc, is that the semantics of the file are defined by the external hierarchy. But with major-minor special files, the semantics are defined as attributes of the file's internal meta data. The point being, if something like devfs replaced device special files, you wouldn't be able to create a null device in a chroot jail or anywhere else (at least, not in an easy and portable manner, and not without special permissions, ignoring for the moment that chroot itself requires privilege).

Plan9 solved this by allowing users to freely mount sub-hierarchies wherever they wished, so that even if you couldn't create a null device file (e.g. `foo'), you could at least mount a device tree (e.g. `foo/null'). In Unix allowing users to freely alter the hierarchy isn't possible because of other built-in assumptions in the system which, if broken, would have undesirable security implications. This is why chroot and mount require root permissions, whereas in Plan9 AFAIK you don't need permissions to change your file tree--even the root--but only permissions to get a reference to a particular sub-tree (i.e. permission to get a descriptor to the server providing the tree).

Hacks like FUSE, while cool, are severely limited by various constraints in Unix.

Ghosts of Unix Past: a historic search for design patterns

Posted Oct 27, 2010 19:10 UTC (Wed) by bfields (subscriber, #19510) [Link]

What happened to the effort to allow ordinary users to do bind mounts?

Ghosts of Unix Past: a historic search for design patterns

Posted Oct 28, 2010 18:03 UTC (Thu) by mszeredi (subscriber, #19041) [Link]

There's at least one problem remaining: rename/unlink returns EBUSY for mountpoint regardless of which mnt_namespace the mount is in. It's solvable, e.g. by allowing unlink to detach these kinds of mounts, but I haven't gotten around to doing it yet.

