LWN: Comments on "ZONE_DEVICE and the future of struct page"
https://lwn.net/Articles/717555/

ZONE_DEVICE struct page != ZONE_NORMAL struct page in terms of write rate
https://lwn.net/Articles/717973/
Posted Fri, 24 Mar 2017 00:15:50 +0000 by djbw

I'm not convinced that's going to be a problem in practice. Consider that the bulk of what makes struct page a frequently accessed data structure is its use by the core mm for general-purpose page allocations. The ZONE_DEVICE mechanism never releases these pages for that high-frequency usage. Another mitigation is that struct page writes are buffered by the CPU cache, which further reduces the write rate to the media.

ZONE_DEVICE and the future of struct page
https://lwn.net/Articles/717972/
Posted Thu, 23 Mar 2017 23:39:47 +0000 by Cyberax

There are problems with that. Persistent memory is fast and durable, but not as fast as regular volatile RAM, nor as durable under constant write traffic. So you have to put your page structures in main RAM, and this adds up quickly - a 2TB persistent array will require around 32GB of page structures.

ZONE_DEVICE and the future of struct page
https://lwn.net/Articles/717963/
Posted Thu, 23 Mar 2017 22:25:59 +0000 by djbw

In fact, it's not a waste; it's fundamental to many kernel paths. Enabling DAX without pages loses get_user_pages() support, which disables not only DMA and direct I/O but also fundamental operations like fork and ptrace. We're already paying this 1.5% overhead for main memory, and my argument is that we should simply pay that overhead for persistent memory as well. It's not enough to convert some paths to use pfn_t and new kmap() primitives, because that leaves us an ongoing maintenance burden of dual code paths as developers add new struct page usages. Unless we create a plan to get rid of struct page everywhere, we should not special-case persistent memory... especially when we have a mechanism to pay the overhead cost from pmem itself.

Once we mandate struct page for DAX, this appears to open up several cleanup opportunities, like reusing more of the core page cache implementation and unifying device-DAX and filesystem-DAX.
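For context, a back-of-the-envelope check (not part of the thread) of the figures quoted in the two comments above, assuming the usual 64-byte struct page and 4KiB pages; the exact descriptor size is configuration-dependent:

/* Hypothetical sanity check, not from the thread: derive the ~32GB and
 * ~1.5% figures quoted above from a 64-byte struct page and 4KiB pages. */
#include <stdio.h>

int main(void)
{
	unsigned long long pmem_bytes = 2ULL << 40;	/* 2 TiB of persistent memory */
	unsigned long long page_size  = 4096;		/* 4 KiB pages */
	unsigned long long desc_size  = 64;		/* sizeof(struct page) on x86-64 */

	unsigned long long npages   = pmem_bytes / page_size;
	unsigned long long overhead = npages * desc_size;

	printf("%llu pages -> %llu GiB of struct page (%.2f%% overhead)\n",
	       npages, overhead >> 30, 100.0 * desc_size / page_size);
	return 0;
}

Running it gives 536870912 pages and 32 GiB of page structures, which is the roughly 1.5% overhead djbw refers to.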
ZONE_DEVICE and the future of struct page
https://lwn.net/Articles/717825/
Posted Wed, 22 Mar 2017 22:04:42 +0000 by roc

It's not about being lazy, it's about tradeoffs.

Suppose for the sake of argument that the only two options are: #1, drop SPARC64 support; or #2, all architectures must waste 64 bytes per 4K page. If the "don't break things" rule forces choice #2, that means Linux performance on much more common systems is being dragged down by legacy baggage (especially after more similar decisions accumulate).

Now suppose there's another option: #3, waste no memory and do a bunch of rearchitecting of the SPARC64 port to handle it. That sounds good, but what if the SPARC64 maintainer doesn't want to do the work? If you're 100% committed to "don't break things", you can't motivate them by threatening to drop SPARC64 support. Instead the burden of reworking SPARC64 falls on whoever is implementing the core feature. That really sucks for various reasons, but in particular you're making core development that benefits many users much more difficult for the sake of a relatively small number of users.

"Lazy" is a pejorative term implying moral deficiency. There's nothing morally deficient about taking these tradeoffs seriously and being honest about them.

ZONE_DEVICE and the future of struct page
https://lwn.net/Articles/717745/
Posted Wed, 22 Mar 2017 06:33:12 +0000 by flussence

The kernel's golden rule is “don't break userspace”. Intentionally breaking an entire class of currently-working systems for the sake of being lazy is a pretty awful thing to suggest.

ZONE_DEVICE and the future of struct page
https://lwn.net/Articles/717720/
Posted Tue, 21 Mar 2017 21:49:28 +0000 by willy

I don't think it's accurate to say "marginal and dying". For one thing, I strongly suspect we will see persistent memory on SPARC64 CPUs given Oracle's focus; the architecture has a public roadmap going out to 2021.

We wouldn't let, say, FR-V or Alpha disrupt persistent-memory features, but I think SPARC64 is still relevant.

ZONE_DEVICE and the future of struct page
https://lwn.net/Articles/717715/
Posted Tue, 21 Mar 2017 19:28:14 +0000 by roc

Why would anyone let SPARC64 requirements drive a design decision like this? That architecture is marginal and dying. And in this case it sounds like it could be worked around with some effort by the SPARC64 maintainer(s).

ZONE_DEVICE and the future of struct page
https://lwn.net/Articles/717659/
Posted Tue, 21 Mar 2017 15:34:33 +0000 by willy

Hi Jon; thanks for the write-up as always!

There was a certain amount of cross-talk and mis-speaking; drivers that need to reach into the scatterlist to manipulate the data need a kernel address. What we said yesterday was "they should be using kmap_pfn()", which is actually a gross oversimplification. For the benefit of our audience: on a 32-bit machine, the physical address may not be in lowmem, so you can't just do pfn_to_virt() or page_to_virt().

What I now believe is that we need a kmap_sg(); then drivers don't need to care whether there's a PFN or a struct page in the scatterlist, because they get the virtual address they need either way. I'm not sure whether we want a kmap_sg_atomic(). A quick grep tells me we already have scsi_kmap_atomic_sg(), which looks ideal other than the "scsi_" prefix. We also have bvec_kmap_irq(), bio_kmap_irq() and __bio_kmap_atomic().
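For readers who want to picture the kmap_sg() idea from the last comment, here is a minimal, purely illustrative sketch; kmap_sg() and kunmap_sg() are not existing kernel APIs, and this version assumes the scatterlist entry still carries a struct page. Handling entries that carry only a pfn_t, which is the point of the proposal, is exactly the part a real implementation would have to add.

/* Sketch only: kmap_sg()/kunmap_sg() are hypothetical; this assumes the
 * scatterlist entry is backed by a struct page. */
#include <linux/scatterlist.h>
#include <linux/highmem.h>

static void *kmap_sg(struct scatterlist *sg)
{
	/* kmap() covers the 32-bit highmem case mentioned above; for
	 * lowmem pages it simply returns the existing direct mapping. */
	return kmap(sg_page(sg)) + sg->offset;
}

static void kunmap_sg(struct scatterlist *sg)
{
	kunmap(sg_page(sg));
}

An atomic variant along the lines of the existing scsi_kmap_atomic_sg() would presumably sit alongside these, as the comment suggests.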