LWN: Comments on "The case of the overly anonymous anon_vma" https://lwn.net/Articles/383162/ This is a special feed containing comments posted to the individual LWN article titled "The case of the overly anonymous anon_vma". Wed, 17 Sep 2025 04:50:41 +0000 lwn@lwn.net The case of the overly anonymous anon_vma https://lwn.net/Articles/385910/ efexis <div class="FormattedComment"> You're right about the page fault bit, my mistake; a reference count would be sufficient there. The reverse map would be for changing the backing for purposes of swap or migration in a NUMA or HIGHMEM system, where quick knowledge of page&lt;--&gt;active sets is required.<br> Reverse mapping for file-backed pages doesn't need anon_vma stuff, obviously, because they're not anon. What they use isn't simpler though: since, for example, a library will often be shared by more address spaces than an anon page, it makes more sense to use a search tree (at least this used to be the case, according to a slightly dated early 2.6.x book detailing it. If it's now simpler than this, that would show kernel devs in fact simplifying things, not adding complexity).<br> If you use none of these, go into your kernel conf and disable CONFIG_SWAP, CONFIG_NUMA and any CONFIG_HIGHMEM entries. If nothing else uses it, it'll either be removed or, worst case, a few greps will show you where the functions are being used elsewhere so you can throw a few #ifdefs around them (I imagine the embedded community would have an interest in already doing this). <br> <p> </div> Mon, 03 May 2010 16:51:07 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/385722/ i3839 <div class="FormattedComment"> It is. 
Reverse mapping for file-backed pages doesn't need anon_vma stuff; it's a lot simpler.<br> <p> You don't need a reverse mapping to do COW; when you get a page fault you know the virtual address which caused it.<br> <p> Reverse mapping is needed to find all virtual pages belonging to a certain physical one. That isn't needed often for anonymous memory.<br> <p> </div> Sat, 01 May 2010 12:17:27 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/385621/ efexis <div class="FormattedComment"> Hmm... I dunno, I would have thought that this isn't so much only needed for anonymous memory, but this may be the only reason why it's needed for anonymous memory, iyswim... that in other uses there are other reasons for it too, and so it can actually be simpler to just have it than to have it in some places and not in others. Anywhere you have copy-on-write pages it's going to be important, as the way you achieve that is to mark the pages read-only. When someone tries to write to one, it causes a memory protection fault, which jumps into the VM code, giving it the address of the memory the write was aimed at when the fault occurred. So where do you start if you don't have a back link to get from that address to the structures belonging to those using the page? You'd end up just doing loads of searching instead. The same goes for something like memory-mapped files: you need to know when the pages have been modified so you know they have to be written back to disk at some point. If you can't look up what's going on from the address it's going on at, things get way more complicated. Seems this is just basic accounting :-/<br> <p> </div> Fri, 30 Apr 2010 20:02:52 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/384719/ i3839 <div class="FormattedComment"> I hope you're wrong. 
:-)<br> <p> This seems more like a gradual development, with everything making sense on its own at the time, but together still going in the wrong direction. It for sure isn't a big enough problem yet.<br> <p> It would be nice to know what the reverse mapping is used for besides swapping. It seems that the reverse mapping of files is a solved problem, but that all this complexity is only for anonymous memory, which can only go away if you have swap enabled, or some obscure option like memory hotplug. (In the case of hibernation you need to scan all pages anyway, so no need for this either.) So I guess that my main complaint is that this is done even when not needed.<br> <p> For swap you only need a reverse mapping for pages that might get swapped out soon. I wonder if it's possible to create that mapping only for inactive pages instead of all of them all the time. Alternatively, swapping out can be done per-process; then there's no need for a reverse map. Shared pages are swapped out less quickly, but that's usually better anyway.<br> <p> The swapping system needs an overhaul anyway; it's very slow currently. This way both the VM and the swap systems can be improved.<br> <p> Improving the current code may not be easy, but it for sure is possible.<br> <p> </div> Mon, 26 Apr 2010 19:09:39 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/384678/ efexis <i>"I don't have time to look further into the details"</i><br> <br> Well my guess is that they did, and that code that mostly slowed the memory manager down for savings only in a corner case would have been thrown out by Linus, as is often the way. Mon, 26 Apr 2010 03:03:44 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/384650/ i3839 <div class="FormattedComment"> You're forgetting the memory overhead (including cache misses), not everything is pure CPU cycles. 
Besides that, this adds a cost to most memory operations, even if it turns out it was never necessary. Why slow down the common case to speed up special cases in rare situations?<br> <p> I don't have time to look further into the details, at least not this month. Hopefully next month.<br> <p> </div> Sat, 24 Apr 2010 22:48:50 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/384649/ ewen <div class="FormattedComment"> If you always insert at the start of the list, or always insert at the end of the list, then the order that you are maintaining is "order inserted" (either newest at start or newest at end). This isn't necessarily what one means by an ordered list (except if you actually want a stack or a queue, both of which are "ordered by time"). Inserting anywhere else in the list, to order by a data value such as page address, requires finding that location, which requires either already holding a pointer to the immediately prior entry in the list, or going and finding it (O(n)). (As you suggest, deletes could be quick if you make the process track a pointer to the relevant entry in the linked list and use a doubly linked list, something I'd overlooked.)<br> <p> From what I can see "find by page address" is the only primitive that actually matters here (so you can find the references to that page); "find by process id" seems easiest done starting with the process's own structures. And the problem that we started with was a 1,000,000-long linked list which referenced multiple pages held by multiple processes. 
So any structure which made it faster to find the entries for the appropriate page would help, provided its other overhead wasn't too high.<br> <p> Ewen<br> </div> Sat, 24 Apr 2010 22:21:05 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/384644/ efexis <div class="FormattedComment"> Inserts are also quick if you do maintain order, simply by maintaining a pointer to the end of the list. You've only two pointers to update for the list (end of list and pointer to end of list) or am I forgetting something? Deletes could be sped up by making it doubly linked, and I can't think of why you'd ever need to perform a search... even if you needed to find a particular process's own structure for a page, it seems like you'd start the search from that process's page table, which can be done in a determinate amount of time, rather than from your own and then through the linked list.<br> <p> I don't know what other property you could be looking for, other than memory address or process id, that you could use as the value for an index or hash. If you were collecting a stat (such as the process that's using the page that runs the most, which could be a useful thing to know) you'd have to iterate through the lot. Maintaining an index of that value is loads more work that I can't see paying off. What other value, other than page address and process id, would you want to search on?<br> <p> <p> </div> Sat, 24 Apr 2010 21:46:37 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/384640/ efexis <div class="FormattedComment"> Fantastic. That last diagram is like a punchline; I wasn't completely with it until that last one, which triggered what I refer to as a "cascade of realisation", which is like a nice brain massage :-) much appreciated.<br> <p> Not so appreciated though was the pointing to a 2004 article that I remember reading like it was just erm... 200err... 2007? 
Way to make me feel old for the first time ever ;-p<br> <p> <p> </div> Sat, 24 Apr 2010 21:15:17 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/384641/ ewen <div class="FormattedComment"> I had in mind replacing the linked list with, eg, a hash (with buckets, probably), which is O(1) average case, or some sort of tree structure, which is O(log n) average case. Inserts are quick on a linked list if you don't bother to maintain any order, but searching and deletes are not; for most other structures inserts are a bit slower, but you get faster searching and deletes.<br> <p> Still, if the tricky details of the new solution have been sorted out then there's no point in changing back now. I guess time will tell.<br> <p> Ewen<br> </div> Sat, 24 Apr 2010 21:08:36 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/384637/ efexis <div class="FormattedComment"> More complex doesn't necessarily mean less efficient. Having to search through all other pages belonging to all other processes, just to see who, if anyone, is sharing it with you, incurs considerably more overhead. There are probably more reasons you'd want to do this than just at page swapout time too, maybe dealing with process memory limits (knowing who to charge the memory usage to) or, with NUMA systems becoming more common, knowing whether it's worth moving a page to memory closer to the core running the majority of processes accessing it. 
Sure it's more complex to us, processing this extra information in our heads, but that little extra time can save a lot of time in the long run.<br> <p> <p> </div> Sat, 24 Apr 2010 20:59:48 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/384633/ efexis <i>"That fix makes me wonder about the "daemon startup" situation"</i><br> <br> I don't think that matters (if I've interpreted what's said above correctly), as the page is unlikely to be unlinked from all the processes between the daemon fork()s taking place, as those initial processes are going to be pretty short-lived. When the top process terminates, the next process down the list would have to take on the links, which will require some work, but no more than what would have to be done if this was done earlier in the game if you knew that the child was going to live longer. The only way it would save time would be if the memory was unlinked (eg, paged out), then paged back in and linked to the shorter-running process - picking the longest-running process would be more ideal. But yeah, for a process to be running long enough for this to happen probably rules out the ability to predict which is going to end sooner. Sat, 24 Apr 2010 20:45:29 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/384631/ efexis <i>"eg, something indexed by page address"</i><br> <br> That's what the page table is. But you can only index a fixed number of things from that; once you have a variable number of things (as with any number of processes sharing that page), that's another dimension, and thus cannot be represented using the single dimension of the page address alone. 
A linked list might be slow for searching in, but in this case it looks like most operations are going to be inserts (eg, at fork() time), deletes (at CoW or process termination time), and I guess possibly an action required on the whole group, although I couldn't suggest what. These seem like they'd all be about as cheap on a linked list as you can get. Sat, 24 Apr 2010 20:33:43 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/384535/ i3839 <div class="FormattedComment"> Not only that, if the anon_vma isn't used for file caches, but only for anonymous pages, then the whole thing seems dubious when swap is disabled.<br> <p> In addition to that, anon_vma apparently didn't solve the problem well, but instead of replacing it with something that does, more kludges are added on top of it. I got the feeling that there's some much more elegant solution waiting for someone to find it.<br> <p> </div> Fri, 23 Apr 2010 16:17:41 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/384515/ rilder <div class="FormattedComment"> Even I feel the same. The scenario "In a workload with 1000 child processes and a VMA with 1000 anonymous pages per process that get COWed," -- may not arise in a normal desktop workload. In such a case won't this introduce additional overhead, unless this feature is introduced as a CONFIG variable, which I don't think it is?<br> </div> Fri, 23 Apr 2010 13:41:45 +0000 Anyone completed the exercise? https://lwn.net/Articles/383605/ pjm <div class="FormattedComment"> I think the point is just that standard tools like to avoid nodes overlapping each other, so if you have a million nodes on screen then each node must occupy only about 1 pixel, and you wouldn't be able to get any information about the problem other than “a million sure is a big number, isn't it?”. 
But it wasn't really a serious suggestion for understanding the bug, which could be illustrated with just one or two child processes.<br> <p> (I was hoping to be able to point people to a certain impressive tool I've seen for interactively exploring large graphs, but I see that the code isn't yet publicly available. However, it's expected to be redeveloped and GPL'd in a year or two, under the name Dunnart.)<br> <p> </div> Fri, 16 Apr 2010 04:27:29 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383568/ i3839 <div class="FormattedComment"> Is it just me who's wondering whether all this complexity is worth it?<br> <p> What was it needed for again? To have a reverse mapping? Isn't that mostly useful for swapping and not much else? It seems silly to always add the overhead when it's almost never needed. Why not only have the overhead when actually swapping or whenever it's needed? Then this whole mess disappears from the common case and the complex stuff can be separated and isolated.<br> <p> I have to read more code and think more about it, preferably with a clear mind.<br> <p> </div> Thu, 15 Apr 2010 22:02:45 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383553/ iabervon <div class="FormattedComment"> Actually, that's a similar message, but not the one I was thinking of. In that one, he says "So again, I can show that..." I was looking at the first time, and this is the second time. 
The one I'm thinking of is <a href="http://groups.google.com/group/linux.kernel/msg/f9c7ca848976b5cd">http://groups.google.com/group/linux.kernel/msg/f9c7ca848...</a> and has a more complete explanation of the middle steps.<br> </div> Thu, 15 Apr 2010 20:17:47 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383552/ Felix <div class="FormattedComment"> I think you mean this mail from Linus<br> <a href="http://groups.google.co.id/group/linux.kernel/msg/130d3ea11ed7c8ff">http://groups.google.co.id/group/linux.kernel/msg/130d3ea...</a><br> <p> (it has a slightly different message id+time but seems to fit)<br> </div> Thu, 15 Apr 2010 20:09:08 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383472/ Darkmere <div class="FormattedComment"> I just resubscribed. Still somewhat of a starving hacker, but it's better than nothing. Now to register a company account once I snipe a credit card from someone ;)<br> </div> Thu, 15 Apr 2010 12:07:15 +0000 Anyone completed the exercise? https://lwn.net/Articles/383442/ nikanth <code>the diagram for the 1000-child scenario which motivated this patch will be left as an exercise for the reader. </code><br/> Has anyone completed the exercise? ;-)<br/> Probably it shouldn't be daunting using the right tool + script. Even a video may be possible. 
:) Thu, 15 Apr 2010 06:08:12 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383404/ mrfredsmoothie <div class="FormattedComment"> That's precisely the point of the Linus "don't rely on a tool" philosophy of bug-fixing.<br> <p> If your code cannot be reasoned about by a human, then it cannot be maintained by a human, either.<br> </div> Wed, 14 Apr 2010 19:00:00 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383351/ gwittenburg <div class="FormattedComment"> +1, same here. Thanks, Jonathan!<br> </div> Wed, 14 Apr 2010 15:04:07 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383328/ sorpigal <div class="FormattedComment"> I hereby register one (1) classic "Me too," reply.<br> </div> Wed, 14 Apr 2010 11:37:25 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383303/ jzbiciak <div class="FormattedComment"> Ah, ok. I hadn't seen that email. Makes sense.<br> </div> Wed, 14 Apr 2010 04:57:11 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383298/ iabervon <div class="FormattedComment"> The message I'm thinking of is &lt;alpine.LFD.2.00.1004061220270.3487@i5.linux-foundation.org&gt; from Apr 6 at 15:35; Linus looks at Steinar Gunderson's disassembly, where %rax is a kernel pointer and %rbx is not %rax+20 like it would be after running the loop, implying that "anon_vma-&gt;head.next" is NULL, not some other anon_vma_chain entry. Boris had worked out that there was some problem in the list previously, but Linus identified that the pointer from the anon_vma was bad, rather than the list further down being corrupt. 
This turned out to be important to the actual bug, which had to do with the anon_vma associated with the page being gone (and its memory reused) rather than some other anon_vma in the vma's chain being gone or something messing up the list. Boris picked up the stuff you'd get from a debugger that had registers but not core; Linus picked up an important detail that one would only normally get from a debugger by inspecting memory.<br> <p> </div> Wed, 14 Apr 2010 04:44:51 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383297/ jzbiciak <div class="FormattedComment"> Actually, wasn't it Boris who did the initial assembly dump interpretation? <br> <p> It's actually not so hard once you've done it a couple of times. I've had to chase down bugs in our C compiler at work (ah, the joys of running internal alpha builds!). <br> <p> </div> Wed, 14 Apr 2010 04:11:33 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383277/ ewen <blockquote><em>The fix is straightforward; when linking an existing page to an anon_vma structure, the kernel needs to pick the one which is highest in the process hierarchy; that guarantees that the anon_vma will not go away prematurely.</em></blockquote> <p>That fix makes me wonder about the "daemon startup" situation, where the initial process fork()s at least once (sometimes twice), and then the parent process exits. I assume that the structure in the parent of the hierarchy is being chosen with the expectation that it'd be longest lived. But in the "daemon startup" case the child ends up being longest lived. 
Hopefully the other actions involved in becoming a daemon, such as disassociating itself from its parent process (reparenting to process 1) and becoming a new process group, mean that the relevant structures get migrated appropriately down into the child.</p> <p>I'm also left wondering whether a simpler change from "linked list" to, eg, something indexed by page address, would have improved the situation dramatically without the same degree of code complexity, and hence bugs. (Eg, O(n) over 1M pages is non-trivial, O(log n) over 1M pages is almost trivial.)</p> <p>Ewen</p> Tue, 13 Apr 2010 20:22:33 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383264/ iabervon <div class="FormattedComment"> The part I found most impressive was when he identified that something was crashing in the first iteration of a loop by looking at the register contents and assembly decode from the oops report. Where other people would have needed to look at memory in a debugger, he could just look at registers and infer where the bad value was coming from.<br> <p> </div> Tue, 13 Apr 2010 18:59:08 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383262/ brouhaha "He fixes radios by <i>thinking</i>!" 
<p> -- someone whose radio was repaired by Richard Feynman Tue, 13 Apr 2010 18:36:49 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383258/ dlang <div class="FormattedComment"> It was primarily looking at the source and thinking about what was happening.<br> <p> In the process three other bugs were found and fixed, and the section of code got significant documentation and cleanup improvements.<br> <p> Linus was unable to reliably replicate the bug, so printf, kgdb, etc could not be used.<br> </div> Tue, 13 Apr 2010 17:49:49 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383256/ Trelane <div class="FormattedComment"> Awesome. *This* is why I subscribe to lwn!<br> </div> Tue, 13 Apr 2010 17:42:39 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383254/ Wummel It is even more awesome as he seems just to be looking at the source and guessing what could be wrong with it (or does Linus use printf() or even kgdb nowadays for debugging?). Tue, 13 Apr 2010 17:31:32 +0000 The case of the overly anonymous anon_vma https://lwn.net/Articles/383251/ clugstj <div class="FormattedComment"> I am in awe of Linus' debugging abilities!<br> </div> Tue, 13 Apr 2010 16:26:18 +0000
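Editor's sketch, not part of any comment: the data-structure exchange between ewen and efexis in this thread (cheap inserts and deletes on a doubly linked list versus the cost of finding entries by page address) can be illustrated roughly as below. This is a toy model in Python, not kernel code; the names `Node`, `MappingList`, `find_by_page`, and the page addresses are all hypothetical and do not correspond to the real anon_vma structures.

```python
from collections import defaultdict


class Node:
    """One list entry; 'page' and 'owner' are illustrative fields only."""

    def __init__(self, page, owner):
        self.page = page
        self.owner = owner
        self.prev = None
        self.next = None


class MappingList:
    """Doubly linked list with a tail pointer (hypothetical, not kernel code)."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, page, owner):
        # O(1): only the old tail's next pointer and the tail pointer change.
        node = Node(page, owner)
        node.prev = self.tail
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        return node  # caller keeps this handle so deletion needs no search

    def remove(self, node):
        # O(1) given the node handle.
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev
        node.prev = node.next = None

    def find_by_page(self, page):
        # O(n): without an index, every entry has to be visited.
        owners = []
        node = self.head
        while node is not None:
            if node.page == page:
                owners.append(node.owner)
            node = node.next
        return owners


# Alternative raised in the thread: an index keyed by page address,
# which gives average O(1) lookup at the cost of extra bookkeeping.
index = defaultdict(list)

mappings = MappingList()
handles = {}
for owner in ("parent", "child1", "child2"):
    handles[owner] = mappings.append(0x1000, owner)
    index[0x1000].append(owner)
handles["other"] = mappings.append(0x2000, "other")
index[0x2000].append("other")

# child1 stops sharing the page (for example, it exits or COWs it).
mappings.remove(handles["child1"])
index[0x1000].remove("child1")

list_owners = mappings.find_by_page(0x1000)   # walks the whole list
index_owners = index[0x1000]                  # direct lookup
```

Both lookups return the same owners; the difference is only in how much of the structure has to be walked to get there, which is the trade-off the two commenters were weighing.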