Some ado about zero
In the good old days, the kernel would use the zero page in situations where it knew it needed a page full of zeroes. So, for example, if a process incurred a read fault on a page it had never used, the kernel would simply map the zero page into that address. A copy-on-write mapping would be used, of course; if the process subsequently modified the page, it would end up with its own copy. But deferring the creation of a new, zero-filled page helped to conserve zeroes, keeping the kernel from running out. Incidentally, it also saved memory, reduced cache pressure, and eliminated the need to clear the new page.
Memory management changes made back in 2007 had the effect of adding reference counting to the zero page. And that turned out to be a problem on multiprocessor machines. Since all processors shared the same zero page (per-CPU differences being unlikely), they also all manipulated the same reference count. That led to serious problems with cache line bouncing, with a measurable performance impact. In response, Nick Piggin evaluated a number of possible fixes, including special hacks to avoid reference-counting the zero page or adding per-CPU zero pages. The patch that got merged, though, simply eliminated most use of the zero page altogether. The change was justified this way:
There was some nervousness about the patch at the time; Linus grumbled about the changes which created the problem in the first place, and worried:
Despite his misgivings, Linus merged the patch for 2.6.24 to see what sort of problems might come to the surface. For the next 18 months, it appeared that such problems were scarce indeed; most people forgot about the zero page altogether. In early June, though, Julian Phillips reported a problem he had observed:
When I run this program on a system running 2.6.20.7 the process only ever seems to use enough memory to hold the data that has actually been written (well - in units of PAGE_SIZE). When I run the program on a system running 2.6.24.5 then as it reads the map the amount of memory used continues to increase until the complete map has actually been allocated (and since the total size is greater than the physically available RAM causes swapping). Basically I seem to be seeing copy-on-read instead of copy-on-write type behaviour.
What Julian was seeing, of course, was the effects from the removal of the zero page. On older kernels, all of the unwritten pages in the data structure would be mapped to the zero page, using no additional physical memory at all. As of 2.6.24, each of those pages gets an actual physical page - containing nothing but zeroes - assigned to it, increasing memory use significantly.
Hiroyuki Kamezawa reports that he has seen zero-page-dependent workloads at other sites. Many of those sites, he says, are running enterprise Linux distributions which have not, yet, shipped kernels new enough to lack zero page support. He worries that these users will encounter the same sort of unpleasant surprise Julian found when they upgrade to newer kernels. In response, he has posted a patch which restores zero page support to the kernel.
Hiroyuki's zero page support isn't quite the same as what came before, though. It avoids reference counting for the zero page, a change which should eliminate the worst of the performance problems. It does add some interesting special cases, though, where virtual memory code has to be careful to test for zero pages; the bulk of those cases are handled with the addition of a get_user_pages_nonzero() function which removes any zero pages from the indicated range. Linus dislikes the special cases, thinking that they are unnecessary. Instead, he has proposed an alternative implementation using the relatively new PTE_SPECIAL flag to mark zero pages. As of this writing, a updated version of the patch using this approach has not yet been posted.
Nick Piggin, who wrote the patch removing zero page support in the first place, would rather not see it return. With regard to the affected users, he asks:
Linus, however, would like to see this feature
restored if it can be done in a clean way. So the return of zero page
support seems fairly likely, assuming the patch can be worked into
sufficiently good shape. Whether that will bring comfort to
enterprise kernel users remains to be seen, though; the next generation of
enterprise Linux releases look set to use kernels around 2.6.27. Unless
distributors backport the zero page patch, enterprise Linux users will
still be stuck with the current, zero-wasting behavior.
| Index entries for this article | |
|---|---|
| Kernel | Memory management |
| Kernel | Zero page |
