
Microsoft research: A fork() in the road

Microsoft research: A fork() in the road

Posted Apr 11, 2019 20:02 UTC (Thu) by ecree (guest, #95790)
In reply to: Microsoft research: A fork() in the road by simcop2387
Parent article: Microsoft Research: A fork() in the road

Arguably, performant fork() doesn't need overcommit either. If you have enough RAM, you can reserve pages at fork() and release them at exec(), without having to actually populate those pages except as-needed for COW. You could even stall fork() calls elsewhere in the system, rather than immediately returning -ENOMEM, if the system thinks its memory pressure is due only to such short-term reservations.
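To make the bookkeeping concrete, here is a toy model of the reserve-at-fork, release-at-exec idea. It is only a sketch of the accounting described above: the class and method names are invented for illustration, and it does not describe how Linux or any other kernel actually accounts for memory.

```python
class ReservationLedger:
    """Hypothetical page accounting for fork() without overcommit."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.committed = 0    # pages promised to processes so far
        self.reserved = {}    # child pid -> pages held between fork and exec

    def allocate(self, pages):
        """Commit pages for ordinary allocations."""
        if self.committed + pages > self.total_pages:
            raise MemoryError("allocation would exceed physical memory")
        self.committed += pages

    def fork(self, child_pid, parent_writable_pages):
        """Reserve the worst case: the child may touch every writable page."""
        if self.committed + parent_writable_pages > self.total_pages:
            return False      # caller could return ENOMEM, or stall and retry
        self.committed += parent_writable_pages
        self.reserved[child_pid] = parent_writable_pages
        return True

    def release_on_exec(self, child_pid):
        """exec() replaces the address space, so the reservation is dropped."""
        self.committed -= self.reserved.pop(child_pid, 0)


if __name__ == "__main__":
    ledger = ReservationLedger(total_pages=100)
    ledger.allocate(40)            # the parent's own writable pages
    print(ledger.fork(1, 40))      # fits: 80 of 100 pages committed
    print(ledger.fork(2, 40))      # does not fit until child 1 execs
    ledger.release_on_exec(1)
    print(ledger.fork(2, 40))      # fits again
```

The second fork() failing until the first child execs is the case where stalling, rather than failing immediately, would make sense.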

This only leads to problems in the case where you have a single-process behemoth with huge amounts of writable anonymous pages; also known as a badly-designed program. As long as userland developers are following proper Unix philosophy (in this case, multiprogramming), fork() can remain performant even without overcommitting memory. (And if you're _not_ doing multiprogramming, and are happy to have a single fat process, then you won't want to run subprocesses anyway, so you won't be calling fork(). It's only the ugly half-way compromises that have a problem.)



Microsoft research: A fork() in the road

Posted Apr 11, 2019 20:05 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (34 responses)

> This only leads to problems in the case where you have a single-process behemoth with huge amounts of writable anonymous pages; also known as a badly-designed program.
Like, say, an application server with a large amount of cached data?

Microsoft research: A fork() in the road

Posted Apr 11, 2019 20:39 UTC (Thu) by ecree (guest, #95790) [Link] (33 responses)

Separate your driver program (which handles forking of new processes) from your data-crunching (which has the large anonymous shared mappings), and all is well.

And do note that it's only the _anonymous shared_ mappings that are a problem; file-backed mappings don't require COW, and nor do private anonymous mappings. Your "large amount of cached data" could have been stored in memory allocated with mmap(MAP_PRIVATE | MAP_ANON), instead of regular malloc(), and then it wouldn't show up in the child after fork().

Microsoft research: A fork() in the road

Posted Apr 11, 2019 21:25 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (22 responses)

> Separate your driver program (which handles forking of new processes) from your data-crunching (which has the large anonymous shared mappings), and all is well.
Oh wow. So we need a persistent daemon that does RPC to simply launch processes efficiently?

And you're arguing that Unix is well designed?

Microsoft research: A fork() in the road

Posted Apr 11, 2019 21:40 UTC (Thu) by ecree (guest, #95790) [Link] (21 responses)

> Oh wow. So we need a persistent daemon that does RPC to simply launch processes efficiently?
There doesn't need to be any RPC involved, the data-cruncher can & should be a child process forked from the driver. The communication between the two might be sockets, or shm, but it could be as simple as the cruncher receiving jobs on stdin and shipping notifications on stdout.
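A minimal sketch of the stdin/stdout variant, assuming a made-up cruncher that squares integers (the job format and the names `CRUNCHER_SOURCE` and `run_jobs` are illustrative only):

```python
import subprocess
import sys

# Hypothetical cruncher: reads one integer per line, replies with its square.
CRUNCHER_SOURCE = """
import sys
for line in sys.stdin:
    print(int(line) ** 2, flush=True)
"""


def run_jobs(jobs):
    """Start the cruncher as a child, send it jobs, collect one reply per job."""
    child = subprocess.Popen(
        [sys.executable, "-c", CRUNCHER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    results = []
    for job in jobs:
        child.stdin.write(f"{job}\n")
        child.stdin.flush()                     # make sure the job reaches the child
        results.append(int(child.stdout.readline()))
    child.stdin.close()                         # EOF tells the cruncher to exit
    child.stdout.close()
    child.wait()
    return results


if __name__ == "__main__":
    print(run_jobs([2, 3, 4]))
```

The driver stays small, and all the heavy state lives in the child it started.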

Modularity is a virtue.

Besides, I'm not arguing that fork() has to be the _only_ way to launch processes; it's entirely OK to _also_ have a spawn()-like interface for the 'simple case' where you don't want to juggle fds, ulimits, creds, etc., as long as fork() is still supported for the hard cases. And there's always vfork()...
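As a rough illustration of the "hard case", the sketch below forks, adjusts the child with ordinary calls (an fd redirect, a working directory, a ulimit), and only then execs. The function name and parameters are made up for the example.

```python
import os
import resource
import sys


def run_in_tuned_child(argv, cwd, max_open_files):
    """fork(), set up the child's environment step by step, then exec argv."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: each piece of setup is just a normal call made before exec.
        try:
            os.close(read_end)
            os.dup2(write_end, 1)                       # stdout -> pipe
            os.close(write_end)
            os.chdir(cwd)
            _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
            resource.setrlimit(resource.RLIMIT_NOFILE, (max_open_files, hard))
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)                               # only reached on failure
    os.close(write_end)
    with os.fdopen(read_end) as pipe:
        output = pipe.read()
    _, status = os.waitpid(pid, 0)
    return output, os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    probe = ("import os, resource; print(os.getcwd()); "
             "print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])")
    print(run_in_tuned_child([sys.executable, "-c", probe], "/", 64))
```

No extra API surface is needed for any of this; the cost is that the child briefly exists as a copy of the parent.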

> And you're arguing that Unix is well designed?
Yes. It is.

Microsoft research: A fork() in the road

Posted Apr 11, 2019 21:47 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (20 responses)

> There doesn't need to be any RPC involved, the data-cruncher can & should be a child process forked from the driver.
I'm sorry. Have you ever worked with Java? Typically you have a server that runs some kind of service. It's a single process - it makes sharing data between requests very easy.

This single process can be very large, tens of gigabytes in size. Modern JVMs are quite efficient at managing large heaps, so this is desirable.

Now you need to launch a helper process. If you use fork()+exec then you're looking at duplicating the entire working set of the application server.

> Yes. It is.
Nope. We have examples of better-designed APIs now.

Microsoft research: A fork() in the road

Posted Apr 11, 2019 22:04 UTC (Thu) by ecree (guest, #95790) [Link] (8 responses)

> Have you ever worked with Java?

Not when there was any alternative.

> It's a single process - it makes sharing data between requests very easy.

Fun fact: you can share memory between distinct processes, by any of several means.

Also, I'm not suggesting spinning off a separate process to handle each request (the xinetd model); just splitting up the workload into separate processes doing different aspects of the job. Do one thing well.

> This single process can be very large, tens of gigabytes in size. Modern JVMs are quite efficient at managing large heaps, so this is desirable.

Your definition of "desirable" clearly differs from mine.

> Now you need to launch a helper process. If you use fork()+exec then you're looking at duplicating the entire working set of the application server.

I know that. Which is but one of the many reasons you shouldn't build a gigantic monolithic application server in the first place.

The Unix system philosophy is like the Westminster system of government. Take any one part of it in isolation, and it looks obviously silly; incautiously import ideas from another system and everything falls apart. But the whole thing, when put together and kept intact, thrums along beautifully and achieves world domination.

Microsoft research: A fork() in the road

Posted Apr 11, 2019 22:10 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

> Fun fact: you can share memory between distinct processes, by any of several means.
Fun fact: none of them allow you to transparently share complex data, with automatic garbage collection.

> Also, I'm not suggesting spinning off a separate process to handle each request (the xinetd model); just splitting up the workload into separate processes doing different aspects of the job. Do one thing well.
So how about being able to run processes without requiring ugly workarounds? Or is this a part that doesn't need to be done well?

> I know that. Which is but one of the many reasons you shouldn't build a gigantic monolithic application server in the first place.
So you're confirming the authors' statement - you have to build your whole system around deficiencies of fork().

> The Unix system philosophy is like the Westminster system of government. Take any one part of it in isolation, and it looks obviously silly; incautiously import ideas from another system and everything falls apart. But the whole thing, when put together and kept intact, thrums along beautifully and achieves world domination.
No. The Unix philosophy is to get something working ASAP and then just objectify it as the epitome of creation, whether it's bad or not.

Microsoft research: A fork() in the road

Posted Apr 11, 2019 22:42 UTC (Thu) by ecree (guest, #95790) [Link] (2 responses)

> So you're confirming the authors' statement - you have to build your whole system around deficiencies of fork().

No; you have to build your system in ways that are already the Right Thing _for other reasons_.

fork()'s "deficiencies" are only deficient for software that is _already badly designed_ before fork() enters the picture.

> The Unix philosophy is to get something working ASAP and then just objectify it as the epitome of creation

If that were true, Unix systems would still be written in B.

The developers of Research Unix at Bell Labs weren't averse to experimenting with changes to the system. They merely avoided changes which, while superficially attractive, did more harm than good. They had 'engineering taste' — which is really the ability to intuit the deeper consequences and ramifications of a design decision.

And the Unix design, as continued by Plan 9 and Linux, continues to evolve (/proc, /sys, entirely new kinds of fds), but always guided by the Unix philosophy.

Microsoft research: A fork() in the road

Posted Apr 11, 2019 22:47 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> Unless of course you'd rather have a spawn() function that takes as an argument a BPF program that sets up the child environment before the new process image is executed ;)
Why is a server that allows you to seamlessly share complex graphs of objects badly designed? Designing something as multiple processes is not at all better in itself.

> If that were true, Unix systems would still be written in B.
The first Unix versions were written in assembly. Unfortunately, PDPs became unavailable; otherwise Unix fans would still be extolling its virtues.

Microsoft research: A fork() in the road

Posted Apr 12, 2019 18:41 UTC (Fri) by rweikusat2 (subscriber, #117920) [Link]

> The first Unix versions were written in assembly. Unfortunately, PDPs became unavailable; otherwise Unix fans would still be extolling its virtues.
The original PDP-7 implementation was written in machine language for want of any other choice. Ditto for parts of the original PDP-11 implementation. Nevertheless,
> We all wanted to create interesting software more easily. Using assembler was dreary enough that B, despite its performance problems, had been supplemented by a small library of useful service routines and was being used for more and more new programs.
[D. Ritchie, The Development of the C Language]

and

> By early 1973, the essentials of modern C were complete. The language and compiler were strong enough to permit us to rewrite the Unix kernel for the PDP-11 in C during the summer of that year. (Thompson had made a brief attempt to produce a system coded in an early version of C--before structures--in 1972, but gave up the effort.)
[p. 16]

There was indeed an OS written in PDP-10 machine language whose fans keep extolling its virtues to this day: the MIT AI Lab Incompatible Timesharing System (with PCLSRing being 'the virtue'), but that's something different.

Microsoft research: A fork() in the road

Posted Apr 11, 2019 22:59 UTC (Thu) by mpr22 (subscriber, #60784) [Link] (3 responses)

*looks at British politics*

You know, your analogy says some pretty unflattering things about Unix.

Microsoft research: A fork() in the road

Posted Apr 12, 2019 10:27 UTC (Fri) by ecree (guest, #95790) [Link] (2 responses)

Note where I said "incautiously import ideas from another system and everything falls apart".

If I wanted to be maximally inflammatory, I would say that in the analogy, the EU represents systemd. But let's not go down that rabbithole.

Microsoft research: A fork() in the road

Posted Apr 12, 2019 11:24 UTC (Fri) by tao (subscriber, #17563) [Link] (1 responses)

Ah, you mean it works much better than the alternative, but there's a small, rabid group that seems convinced otherwise and screams very loudly, yet cannot agree on what the "better" alternative would be, except that everyone seems convinced that things were better in the mythical "before".

Yes, your simile is rather apt.

Microsoft research: A fork() in the road

Posted Apr 12, 2019 19:21 UTC (Fri) by MatejLach (guest, #84942) [Link]

You articulated my feelings about the systemd hate more accurately than I could. It seems that as time goes on, everything is remembered more fondly (not just sysvinit; it happens with movies, presidential approval ratings, etc.).

One thing that many people also miss, is that systemd's a 'service manager', therefore its work doesn't stop once your services are up and running. Now I know many would argue that's a downside, but the reality is, the alternative is to get the same set of functionality via a patchwork of variable-quality scripts on top of a 'simpler' init system.

Also, complaints about logind are funny, because nobody was apparently willing to do equivalent maintenance work, (consolekit etc.), so yeah.

Anyway, it's getting a bit ranty, but the point still stands.

Microsoft research: A fork() in the road

Posted Apr 12, 2019 14:12 UTC (Fri) by joncb (guest, #128491) [Link] (9 responses)

I feel like if you're worrying about fork() while working in Java then something has gone horribly wrong.
I could be wrong, i don't know your workload, but I feel like Java and fork are not meant to be friends.

Microsoft research: A fork() in the road

Posted Apr 12, 2019 18:18 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (8 responses)

Why is that? Java can use helper utilities just like everything else. There's also Golang that suffers from the same issues.

Microsoft research: A fork() in the road

Posted Apr 13, 2019 7:26 UTC (Sat) by joncb (guest, #128491) [Link] (7 responses)

The whole point of Java is to detach yourself from these low level concerns.

Indeed, a very quick search suggests that to create a helper process you should either use Runtime.exec or ProcessBuilder (haven't really touched Java in a good decade so that is probably misleading in the nuances). While i wouldn't be surprised if one of the implementations involves a fork under the covers there's no reason it couldn't be anything else that guarantees the expected semantics.

The difference, of course, between C/C++ and Java/C# is that the former are languages that are expected to execute (more or less) directly on top of the current system whereas the latter are expected to present a virtual facade across such. Therefore i would expect C to have access to fork() where it is available whereas i would not expect Java or C# to do so. Golang is a weird blending of the two where some things are more C-like and some things are not, low-level fork access apparently being one of the nots. Rust appears to have fork but has some hefty safety warnings on it.

Microsoft research: A fork() in the road

Posted Apr 13, 2019 9:23 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> The whole point of Java is to detach yourself from these low level concerns.
In this particular case I was running a GPU-based optimizer in a separate process. It was kinda crashy (drivers...), so isolating it was a good idea. Heck, it even used pipe-based interaction. How much more Unixy can you get?

> Indeed, a very quick search suggests that to create a helper process you should either use Runtime.exec or ProcessBuilder (haven't really touched Java in a good decade so that is probably misleading in the nuances). While i wouldn't be surprised if one of the implementations involves a fork under the covers there's no reason it couldn't be anything else that guarantees the expected semantics.
They both use fork (more precisely, clone) on Linux. There's no way to avoid it, and this is one of the problems.

Microsoft research: A fork() in the road

Posted Apr 13, 2019 23:52 UTC (Sat) by joncb (guest, #128491) [Link] (5 responses)

> They both use fork (more precisely, clone) on Linux. There's no way to avoid it, and this is one of the problems.

I assume you really don't mean "No way to avoid it" here because if there's literally "no way" then this whole exercise is just shouting into the void.

In particular, i'm thinking you (and i specify you because yours is the use case here) write a patch for openJDK that re-implements ProcessBuilder to use something other than fork when calling start(). From your comments on this story that should be very doable. You submit that patch to openJDK and make your case. Regardless of whether it is accepted or not, you can now run openJDK secure in the knowledge that your application is using this faster/safer/cleaner/whatever alternative.

In my travails doing an informal survey of how languages fork i came across an interesting python issue about moving to posix_spawn. It looks like it's stalled for technical compatibility reasons ( https://bugs.python.org/issue35823 ). The part stating that libc "may be more than a decade behind in enterprise Linux distros" shows where bigger problems lie.

Microsoft research: A fork() in the road

Posted Apr 13, 2019 23:54 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

There's no "something other" on Linux.

Microsoft research: A fork() in the road

Posted Apr 14, 2019 0:45 UTC (Sun) by zlynx (guest, #2285) [Link]

Large memory processes like Java should use "vfork()" or "clone()" instead of fork.

Even with overcommit turned on, trying to fork a 10 GB Java process can fail because it exceeds the heuristic.

With overcommit disabled, which is how I run my Linux servers, it will definitely fail.
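For reference, the policy in effect can be read from `/proc/sys/vm/overcommit_memory`. The sketch below maps the three documented values to short descriptions; the wording of the descriptions and the function names are mine, not the kernel's.

```python
OVERCOMMIT_MODES = {
    0: "heuristic: refuse only obviously excessive requests",
    1: "always: never refuse an allocation",
    2: "strict: refuse once the commit limit is reached",
}


def describe_overcommit(raw):
    """Map the contents of /proc/sys/vm/overcommit_memory to a description."""
    return OVERCOMMIT_MODES.get(int(raw.strip()), "unknown mode")


def current_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Read and describe the running kernel's policy (Linux only)."""
    with open(path) as f:
        return describe_overcommit(f.read())


if __name__ == "__main__":
    print(current_overcommit())
```

Mode 0 is the "heuristic" case where a very large fork() can still fail, and mode 2 is the "overcommit disabled" case.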

Luckily we have vfork which was designed for exactly this problem. It doesn't duplicate the process memory, not even CoW. With a bit of care to not overwrite important memory in the parent process, it works very well to launch new child processes.

So "vfork()" is "something other" because it is like fork, but isn't actually fork.

Microsoft research: A fork() in the road

Posted Apr 14, 2019 23:49 UTC (Sun) by neilbrown (subscriber, #359) [Link]

> There's no "something other" on Linux.

Couldn't you open a socket and send a dbus message to systemd to ask it to run some service for you ??
Of course, if you don't like systemd, just write a dedicated server which does whatever you want done.

Microsoft research: A fork() in the road

Posted Apr 15, 2019 6:02 UTC (Mon) by joncb (guest, #128491) [Link] (1 responses)

> There's no "something other" on Linux.

Don't you think this is putting the cart before the horse just a little bit then? Surely creating a "something other" should take precedence to advocating for developers to stop using the one tool they have for this basic task?

Microsoft research: A fork() in the road

Posted Apr 15, 2019 8:41 UTC (Mon) by farnz (subscriber, #17727) [Link]

Not really - the paper says that in practical terms, fork isn't a good API, and while posix_spawn looks better in theory, it practically becomes a mess to use.

The paper is more of an academic opinion piece; it sets out why fork causes issues, why posix_spawn and friends aren't enough better to be worth the effort of a wholesale rewrite of software, and asserts that it should be possible to produce a better API given that, in theory, spawn-type APIs are easier for OS developers to implement.

Within the bounds of academia, this sort of paper serves to legitimise research into better APIs; someone has asserted with examples that existing APIs are imperfect, and now future researchers interested in process creation APIs have something they can use as a reference when they justify spending time on the "solved" problem of spawn versus fork APIs. Maybe the answer will turn out to be that posix_spawn and fork are both local maxima, and the only way to do better is a radical rethink of process design; maybe some bright spark will demonstrate that there is a better API we can use if we step away from the existing ones.

Key is that we don't have good data on better alternatives to the current "spawn with 101 flags to inherit the right bits of the world" and "fork then clean up" APIs; the paper says we need to work out what the "something other" should look like, because "fork and clean up" is easy for the user, but sets various design choices for the kernel (and requires certain hardware support to be performant - we get CoW very cheaply with modern MMUs, but at the expense of requiring MMUs for an OS kernel, not just MPUs), while "spawn" is easy for the kernel, but leads to huge complexity for the user as they have to handle 101 flags to get the "right" environment in the spawned process.
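To give a feel for the "spawn with flags" style, here is a minimal sketch using Python's `os.posix_spawn` wrapper. Everything the child should inherit or change has to be stated up front as data; here that is just a stdout redirect. The helper name is hypothetical.

```python
import os
import sys


def spawn_with_stdout_pipe(argv):
    """Launch argv with posix_spawn, capturing its stdout through a pipe."""
    read_end, write_end = os.pipe()
    # Child setup is described declaratively instead of run as code in the child.
    file_actions = [
        (os.POSIX_SPAWN_DUP2, write_end, 1),    # child's stdout -> pipe
        (os.POSIX_SPAWN_CLOSE, read_end),       # child has no use for the read end
    ]
    pid = os.posix_spawn(argv[0], argv, os.environ, file_actions=file_actions)
    os.close(write_end)
    with os.fdopen(read_end) as pipe:
        output = pipe.read()
    _, status = os.waitpid(pid, 0)
    return output, os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print(spawn_with_stdout_pipe([sys.executable, "-c", "print('hello')"]))
```

Every further tweak (process group, signal mask, scheduler and so on) is another keyword argument or action tuple, which is the complexity the paper complains about. The fork-then-exec style pays for its flexibility with the temporary copy of the parent.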

Microsoft research: A fork() in the road

Posted Apr 16, 2019 7:08 UTC (Tue) by gfernandes (subscriber, #119910) [Link]

I do, actually, work on very large Java applications with big in-memory caches. And guess what?

We're now _breaking it ALL up_ into microservices, throwing out all the large in memory caches, even moving databases to Mongo or PGSQL.

*ecree* is right.

Gigantic monoliths are no excuse for poor software design.

Microsoft research: A fork() in the road

Posted Apr 11, 2019 21:28 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> And do note that it's only the _anonymous shared_ mappings that are a problem; file-backed mappings don't require COW, and nor do private anonymous mappings. Your "large amount of cached data" could have been stored in memory allocated with mmap(MAP_PRIVATE | MAP_ANON), instead of regular malloc(), and then it wouldn't show up in the child after fork().
Typically this kind of stuff is stored in native data structures and so it doesn't have anything to do with files. You also can't typically control the allocations made by the JVM or your language runtime.

Microsoft research: A fork() in the road

Posted Apr 11, 2019 21:52 UTC (Thu) by ecree (guest, #95790) [Link] (3 responses)

> Typically this kind of stuff is stored in native data structures and so it doesn't have anything to do with files.

I know, that's why you use MAP_ANON. Do pay attention ;)

> You also can't typically control the allocations made by the JVM or your language runtime.

I very nearly said something about "the problem with most application servers is they're written in Java", but I held back. Maybe I shouldn't've.

Language runtimes ought to provide mechanisms for allocating objects in private memory, if they're intended to be used for big programs that want child processes. Indeed, if they're going to be written around a spawn()ish view of the world, then objects allocated from user code won't need to be visible post-fork(), so such objects could just be allocated private by default.

C gives you that control, through the aforementioned mmap(), and it's probably even possible (I haven't tried it) to patch your libc to make malloc default-private.

An even more fine-grained system might be tagged allocations, where the fork()-analogue (probably clone()) could specify which tags it wanted to copy into the child. But probably no-one's ever needed that, else there would have been a serious attempt to implement it.

Microsoft research: A fork() in the road

Posted Apr 11, 2019 21:54 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> Language runtimes ought to provide mechanisms for allocating objects in private memory, if they're intended to be used for big programs that want child processes. Indeed, if they're going to be written around a spawn()ish view of the world, then objects allocated from user code won't need to be visible post-fork(), so such objects could just be allocated private by default.
Well, they don't. Go, Python, Java, C# all use simple private mappings.

Why _should_ they be designed around fork()?

Microsoft research: A fork() in the road

Posted Apr 11, 2019 22:29 UTC (Thu) by ecree (guest, #95790) [Link] (1 responses)

> Why _should_ they be designed around fork()?

Because fork() is necessary to allow complex control of child environment without excessive API surface (spawn() functions with 42 arguments, etc.). So it needs to be supported.

Unless of course you'd rather have a spawn() function that takes as an argument a BPF program that sets up the child environment before the new process image is executed ;)

Microsoft research: A fork() in the road

Posted Apr 11, 2019 22:33 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> Because fork() is necessary to allow complex control of child environment without excessive API surface (spawn() functions with 42 arguments, etc.). So it needs to be supported.
As demonstrated by Zircon in Fuchsia, it's not necessary. You can download the Fuchsia SDK yourself and check it, it's available right now as a counter-example to your point.

> Unless of course you'd rather have a spawn() function that takes as an argument a BPF program that sets up the child environment before the new process image is executed ;)
No, I would have a family of process-management functions that accept the target process handle as a parameter, plus the ability to create suspended processes.

Microsoft research: A fork() in the road

Posted May 31, 2021 17:35 UTC (Mon) by immibis (subscriber, #105511) [Link] (1 responses)

Java has perfectly functional language-level isolation primitives, and although not everything in the standard library is well-behaved, most things are - no different from the C library, really.

There is generally no good reason you should split your Java app into multiple processes just because the OS demands it. Half the point of Java is to shield you from such things, is it not? If you want to split up your app into multiple cooperating modules - as you should - you can do that within the one process.

Microsoft research: A fork() in the road

Posted Jun 1, 2021 1:42 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Java doesn't really handle isolation well. Threads can leak, the heap is shared, etc.

Microsoft research: A fork() in the road

Posted Apr 12, 2019 3:46 UTC (Fri) by roc (subscriber, #30627) [Link] (2 responses)

Private file mappings need COW.

Shared anonymous mappings sometimes need COW too, but you just can't have that in Linux/POSIX.

Microsoft research: A fork() in the road

Posted Apr 12, 2019 10:35 UTC (Fri) by ecree (guest, #95790) [Link] (1 responses)

> Private file mappings need COW.

Yeah I was getting my terminology a bit confused last night.

What I was trying to say was that malloc() memory 'normally' needs COW and file mappings 'normally' don't.

The problem mappings are those which have a _separate_ mapping in the child, which is actually the private ones; shared mappings remain mapped in the child but without COW (I think?), and there's no kind of M_CLOFORK mapping that just isn't mapped in the child at all (which is what my brain late last night said private meant).

> Shared anonymous mappings sometimes need COW too

Why? If it's a shared mapping, then writes by the child should be visible in the parent and vice-versa, so both processes can map the same page and no need to COW. What am I missing?

Microsoft research: A fork() in the road

Posted Apr 12, 2019 22:12 UTC (Fri) by roc (subscriber, #30627) [Link]

> The problem mappings are those which have a _separate_ mapping in the child, which is actually the private ones; shared mappings remain mapped in the child but without COW (I think?)

That's correct.
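A small sketch that shows the difference, assuming Linux and Python's `mmap` module (the function name is made up): the child writes to one shared and one private anonymous mapping, and the parent then looks at both.

```python
import mmap
import os


def child_write_visibility():
    """Return what the parent sees after a forked child writes to each mapping."""
    shared = mmap.mmap(-1, mmap.PAGESIZE,
                       flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)
    private = mmap.mmap(-1, mmap.PAGESIZE,
                        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    shared[0:6] = b"parent"
    private[0:6] = b"parent"

    pid = os.fork()
    if pid == 0:
        shared[0:6] = b"child!"     # same physical page as the parent, no COW
        private[0:6] = b"child!"    # the child gets its own copy of the page
        os._exit(0)
    os.waitpid(pid, 0)

    seen = {"shared": bytes(shared[0:6]), "private": bytes(private[0:6])}
    shared.close()
    private.close()
    return seen


if __name__ == "__main__":
    print(child_write_visibility())
```

The parent sees the child's write through the shared mapping but still sees its own data in the private one.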

> and there's no kind of M_CLOFORK mapping that just isn't mapped in the child at all (which is what my brain late last night said private meant).

That's true. Though there is madvise(MADV_DONTFORK) which gives you similar functionality.
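A rough demonstration, assuming Linux and Python 3.8 or later (where `mmap.madvise` and `mmap.MADV_DONTFORK` are exposed); the function name is invented. The region is marked before fork(), and the child then tries to read it.

```python
import mmap
import os
import signal


def child_fate_after_dontfork():
    """Mark a region MADV_DONTFORK, fork, and report how the child ended."""
    region = mmap.mmap(-1, mmap.PAGESIZE,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    region[0:4] = b"data"
    region.madvise(mmap.MADV_DONTFORK)    # exclude the range from future children

    pid = os.fork()
    if pid == 0:
        _ = region[0]    # the range is not mapped in the child, so this should fault
        os._exit(0)      # only reached if the read unexpectedly succeeded
    _, status = os.waitpid(pid, 0)

    parent_still_reads = bytes(region[0:4]) == b"data"
    region.close()
    killed_by = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    return parent_still_reads, killed_by


if __name__ == "__main__":
    ok, sig = child_fate_after_dontfork()
    print(ok, signal.Signals(sig).name if sig else None)
```

The mapping costs nothing at fork() time, but any code path in the child that still holds a pointer into it will crash, so it has to be applied with care.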

> > Shared anonymous mappings sometimes need COW too
> Why? If it's a shared mapping, then writes by the child should be visible in the parent and vice-versa, so both processes can map the same page and no need to COW. What am I missing?

As discussed in the paper that spawned this thread, sometimes fork() is used to create checkpoints of process state (e.g. rr and Redis do this). COW makes this extremely efficient for MAP_PRIVATE pages, which is great, but it doesn't work with MAP_SHARED pages, so rr (not sure about Redis) has to eagerly copy them into the checkpoint. This is bad.
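A stripped-down sketch of the checkpoint pattern (not how rr or Redis actually implement it; the function name and the use of JSON are just for illustration): the child serialises the state as it was at fork() time while the parent carries on mutating it.

```python
import json
import os


def snapshot_while_mutating(state):
    """Fork a child that serialises `state` as of the fork; parent keeps going."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: sees a frozen copy-on-write view of the parent's memory.
        os.close(read_end)
        with os.fdopen(write_end, "w") as out:
            json.dump(state, out)
        os._exit(0)
    os.close(write_end)
    state["counter"] += 1              # the parent's change is invisible to the child
    with os.fdopen(read_end) as pipe:
        snapshot = json.load(pipe)
    os.waitpid(pid, 0)
    return snapshot


if __name__ == "__main__":
    live = {"counter": 1}
    print(snapshot_while_mutating(live), live)
```

This works because the Python heap lives in private memory. Data held in a MAP_SHARED mapping would not be frozen this way, which is the limitation described above.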

The MAP_PRIVATE/MAP_SHARED model is too inflexible. It would be better to have a model where you can create memory objects backed by files or anonymous memory, and then explicitly COW-clone them (and of course map those objects into your address space, pass them to other processes, etc). The Fuchsia documentation isn't great but it seems to have this kind of API. This would require the kernel to manage a tree of COW-clones for each memory object, but that isn't very different to today where Unix kernels have to manage a tree of COW-clones of process address spaces.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds