The fork design pattern is terribly inefficient on systems without virtual memory hardware. Even on systems with an MMU, copying all the page tables just to perform a simple task is often needlessly expensive. This encourages applications to maintain process pools, which defeats the simplicity of using fork for simple tasks in the first place.
As mentioned by another poster, Unix file descriptors unfortunately default to inheritable, which is the opposite of what is desired. In just about 100% of cases, the code doing the fork knows exactly which file handles it wants to pass into a child, yet that code knows nothing about the file descriptors opened by third-party libraries. Even if the third-party code sets CLOEXEC correctly for itself, a process wishing to spawn multiple children has no way to set the flag correctly for all of them, because close-on-exec is a single per-descriptor flag rather than a per-child choice. The problem is amplified for multithreaded programs, which can be forked while other threads hold locks, leaving file handles and mutexes in invalid states in the child and necessitating the kludge that is pthread_atfork.
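For reference, a minimal sketch of the two ways to set the flag (the file name "data.log" is just a placeholder); the O_CLOEXEC form at open time avoids the race that the after-the-fact fcntl() retrofit leaves open in threaded programs:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* Preferred: request close-on-exec atomically at open time. */
        int fd = open("data.log", O_RDONLY | O_CLOEXEC);

        /* Retrofit: set the flag on an already-open descriptor.  Note the
         * window between open() and fcntl() during which another thread
         * could fork and exec, leaking the descriptor anyway. */
        int fd2 = open("data.log", O_RDONLY);
        if (fd2 >= 0)
            fcntl(fd2, F_SETFD, FD_CLOEXEC);

        if (fd >= 0)
            close(fd);
        if (fd2 >= 0)
            close(fd2);
        return 0;
    }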
This is exactly the reason it's common for security-minded Linux applications to cycle through the first 1024 file descriptors and close them immediately before calling exec. It is the only way to be reasonably confident (but not 100%) that handles are not inadvertently leaked to children.
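That loop looks roughly like the following sketch; the 1024 cutoff, the helper name spawn_clean(), and the minimal error handling are illustrative assumptions, not code from any particular project:

    #include <unistd.h>

    /* Illustrative helper: fork, close every descriptor above stderr in
     * the child, then exec.  The 1024 cutoff mirrors the common (and
     * imperfect) assumption about the default descriptor limit. */
    static void spawn_clean(const char *path, char *const argv[])
    {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: keep stdin/stdout/stderr (0-2), close the rest. */
            for (int fd = 3; fd < 1024; fd++)
                close(fd);          /* harmless EBADF on unused slots */
            execv(path, argv);
            _exit(127);             /* reached only if exec failed */
        }
        /* Parent continues; fork-failure handling and waitpid() omitted. */
    }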
In order to be efficient, the operating system must overcommit resources to accommodate processes using fork. Consider a web browser session occupying some 100MB of RAM. Suppose it forks children to do parallel work, such as downloading files. The main browser continues to fetch new pages and media, which fit into the same 100MB of RAM; however, because the forked children still reference the original copy-on-write pages, the kernel cannot free the old, unused 100MB while any child holds on to it.
Fork only gets more problematic as the parent process grows larger.
In principle it's not unreasonable for a 1.5GB database process to spawn a 5MB job, yet the fork implies overcommitting 1.5GB of RAM for this single child, at least temporarily. In practice, overcommitting can lead to out-of-memory conditions, which is why kernel developers invented the dreaded OOM killer to kill otherwise well-behaved processes under Linux.
Consider that without fork, the fundamental need to overcommit disappears.
Combine all this with the fact that fork isn't very portable, and one must conclude that fork should generally be avoided in large-scale projects. Or, if it is used, the parent's role should be limited to forking and monitoring children, as in the sketch below. That largely precludes the benefits of the fork programming pattern in the first place.
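If fork is kept at all, the "parent only forks and monitors" shape amounts to roughly this sketch, where NWORKERS and worker_main() are placeholders for the real job:

    #include <sys/wait.h>
    #include <unistd.h>

    #define NWORKERS 4

    /* Placeholder for the real per-child job. */
    static int worker_main(int id)
    {
        (void)id;
        return 0;
    }

    int main(void)
    {
        /* Fork all workers while the parent is still small, so each
         * child's copy-on-write image (and commit charge) stays small. */
        for (int i = 0; i < NWORKERS; i++) {
            pid_t pid = fork();
            if (pid == 0)
                _exit(worker_main(i));   /* child: run the job, exit */
            /* fork-failure handling omitted in this sketch */
        }

        /* From here on, the parent only monitors: reap children as they
         * exit until none remain. */
        int status;
        while (wait(&status) > 0)
            ;
        return 0;
    }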