if the child really only needs the 5MB, it can free the rest of the allocations and you are back to the 105MB total.
if the programmer isn't sure whether the child needs 5MB of data or the entire 100MB, then they would need to keep everything around in any case.
the worst-case of COW is that you use as much memory as you would without it. In practice it has been shown empirically to save a great deal of memory. some people are paranoid about this and turn off overcommit so that the memory would be available even in this worst case, but even they benefit from the faster fork() and from the fact that almost all of the time the extra memory is never needed.
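A minimal sketch of the accounting behind the worst-case claim, as a toy model only: the function names are hypothetical, memory is counted in whole pages, and kernel details such as page tables and overcommit accounting are ignored. Every page starts out shared after fork(), and the first write to a page by either process costs one extra physical page.

```python
# Toy model (assumption): counts whole pages, ignores page tables,
# overcommit accounting, freed memory and exec().

def resident_pages_cow(n_pages, parent_writes, child_writes):
    """Physical pages used by a parent and child after fork() with COW.

    Every page starts out shared. The first write to a page by either
    process gives the two processes separate copies, so each written page
    costs exactly one extra physical page, no matter who wrote it.
    """
    written = {p for p in set(parent_writes) | set(child_writes) if 0 <= p < n_pages}
    return n_pages + len(written)


def resident_pages_eager(n_pages):
    """Physical pages used if fork() copied the whole address space up front."""
    return 2 * n_pages


# A 100-page parent forks and nobody writes: the pages stay shared.
print(resident_pages_cow(100, [], []))            # 100
# Worst case: every page gets written, which costs the same as an eager copy.
print(resident_pages_cow(100, range(100), []))    # 200
print(resident_pages_eager(100))                  # 200
```

In this model COW can never cost more than the eager copy, because the number of written pages is capped at the number of pages that exist.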
so I disagree with your conclusion that there is so much memory wasted.
Posted Jan 8, 2011 9:12 UTC (Sat) by lwn555 (guest, #72175)
"if the child really only needs the 5MB, it can free the rest of the allocations and you are back to the 105MB total."
Easily said. While it's technically possible to free all unused memory pages after a fork, it's unusual to actually do this. The code calling fork() may not be aware of, or related to, the memory allocated by the rest of the process.
Consider how difficult it would be for one library to deallocate the structures of other libraries after performing a fork.
Even if we did track every object to free after forking, malloc is not always able to return the pages to the system, particularly for memory allocated linearly via sbrk(): the heap can only shrink from the top, and the objects the child still needs are likely to be near the end.
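The sbrk() limitation can be illustrated with a small model, under simplifying assumptions: a single contiguous heap whose break can only be lowered to the end of the highest block still in use. The function name is hypothetical, and the model ignores that many allocators, glibc's included, serve large requests with mmap() instead, as well as calls such as madvise() that can release pages in the middle of a mapping.

```python
MB = 1024 * 1024


def trimmable_by_sbrk(heap_top, live_blocks):
    """Bytes a brk()/sbrk()-style heap could hand back to the kernel.

    Simplified model (assumption): the program break can only move down as
    far as the end of the highest block still in use, so free memory below
    that block stays mapped. live_blocks is a list of (offset, size) pairs
    relative to the start of the heap.
    """
    highest_live_end = max((off + size for off, size in live_blocks), default=0)
    return max(heap_top - highest_live_end, 0)


heap = 100 * MB
# The child's 5MB happens to sit at the start of the heap: 95MB can be trimmed.
print(trimmable_by_sbrk(heap, [(0, 5 * MB)]) // MB)          # 95
# The same 5MB at the end of the heap: nothing can be trimmed.
print(trimmable_by_sbrk(heap, [(95 * MB, 5 * MB)]) // MB)    # 0
```

The amount of memory the child actually uses is the same in both calls; only its position in the heap decides whether the rest can be returned.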
"the worst-case of COW is that you use as much memory as you would without it."
We can agree there are no reasons not to use copy on write to implement fork.
"so I disagree with your conclusion that there is so much memory wasted."
Then I think you misunderstood the example. No matter which way you cut it, so long as the child doesn't do anything to explicitly free unused pages, it is stuck with 95MB of unusable RAM. If the parent updates its entire working set, then the child becomes the sole owner of the original 100MB. If the parent quits and the child is allowed to continue, then the useless 95MB is still there. And this is only for one child.
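The timeline in this example can be replayed in a toy page-level model. This is a sketch under stated assumptions, not a description of how the kernel tracks pages: one page stands for 1MB, the child needs only the first five pages, the parent rewrites every page and then exits, and the names `CowMemory` and `scenario` are hypothetical.

```python
class CowMemory:
    """Toy model of fork() with copy-on-write, one 'page' per MB (assumption).

    Each process maps page numbers to physical frame ids. A frame shared by
    two processes is copied only when one of them writes to it.
    """

    def __init__(self):
        self.procs = {}          # process name -> {page number: frame id}
        self._next_frame = 0

    def _alloc(self):
        self._next_frame += 1
        return self._next_frame

    def _sharers(self, frame):
        # Number of processes that currently reference this frame.
        return sum(frame in pages.values() for pages in self.procs.values())

    def start(self, name, n_pages):
        self.procs[name] = {p: self._alloc() for p in range(n_pages)}

    def fork(self, parent, child):
        # The child gets the parent's frames; nothing is copied yet.
        self.procs[child] = dict(self.procs[parent])

    def write(self, name, page):
        # Copy-on-write: a shared frame is replaced by a private copy.
        if self._sharers(self.procs[name][page]) > 1:
            self.procs[name][page] = self._alloc()

    def free(self, name, pages):
        for p in pages:
            del self.procs[name][p]

    def exit(self, name):
        del self.procs[name]

    def resident(self):
        # Physical frames still referenced by at least one process.
        return len({f for pages in self.procs.values() for f in pages.values()})


def scenario(child_frees_unused):
    mem = CowMemory()
    mem.start("parent", 100)              # 100MB parent
    mem.fork("parent", "child")           # child needs only pages 0..4 (5MB)
    if child_frees_unused:
        mem.free("child", range(5, 100))
    for page in range(100):               # parent rewrites its working set
        mem.write("parent", page)
    after_rewrite = mem.resident()
    mem.exit("parent")                    # parent quits, child keeps running
    return after_rewrite, mem.resident()


print(scenario(child_frees_unused=False))   # (200, 100)
print(scenario(child_frees_unused=True))    # (105, 5)
```

Without the explicit free, the child ends up as the sole holder of 100 pages, 95 of which it never uses; with it, the pair peaks at the 105MB total mentioned earlier in the thread.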
You may feel this is a contrived example, but I can think of many cases where it would be desirable for a large parent process to hand work off to child processes, and where this becomes a problem.
Fork works great in academic examples and in programs where the parent is small, doesn't touch its data, or the children are short-lived. But there are applications where the fork paradigm in and of itself leads to excessive memory consumption.