LWN.net Logo

The case of the overly anonymous anon_vma

The case of the overly anonymous anon_vma

Posted Apr 13, 2010 20:22 UTC (Tue) by ewen (subscriber, #4772)
Parent article: The case of the overly anonymous anon_vma

The fix is straightforward; when linking an existing page to an anon_vma structure, the kernel needs to pick the one which is highest in the process hierarchy; that guarantees that the anon_vma will not go away prematurely.

That fix makes me wonder about the "daemon startup" situation, where the initial process fork()s at least once (sometimes twice), and then the parent process exits. I assume that the structure in the parent of the hierachy is being chosen with the expectation that it'd be longest lived. But in the "daemon startup" case the child ends up being longest lived. Hopefully the other actions to becoming a daemon, such as disassociating itself with parent process (reparenting to process 1) and becoming a new process group mean that the relevant structures get migrated appropriately down into the child.

I'm also left wondering whether a simpler change from "linked list" to, eg, something indexed by page address, would have improved the situation dramatically without the same degree of code complexity, and hence bugs. (Eg, O(n) over 1M pages is non-trivial, O(log n) over 1M pages is almost trivial.)

Ewen


(Log in to post comments)

The case of the overly anonymous anon_vma

Posted Apr 24, 2010 20:33 UTC (Sat) by efexis (guest, #26355) [Link]

"eg, something indexed by page address"

That's what the page table is. But you can only index a fixed number of things from that, once you have a variable number of things (as there being any number of processes sharing that page), that's then another dimension, and can thus not be represented using a single dimension of the page address alone. A linked list might be slow for searching in, but in this case it looks like most operations are going to be either inserts (eg, at fork() time), deletes (at CoW or process termination time), and I guess possibly an action required on the whole group, although I couldn't suggest what. These seem like they'd all about as cheap on a linked list as you can get.

The case of the overly anonymous anon_vma

Posted Apr 24, 2010 21:08 UTC (Sat) by ewen (subscriber, #4772) [Link]

I had in mind replacing the linked list with, eg, a hash (with buckets, probably), which is O(n) average case, or some sort of tree structure, which is O(log n) average case. Inserts are quick on a linked list if you don't bother to maintain any order, but searching and deletes are not; for most other structures inserts are a bit slower, but you get faster searching and deletes.

Still if the tricky details of the new solution have been sorted out then there's no point in changing back now. I guess time will tell.

Ewen

The case of the overly anonymous anon_vma

Posted Apr 24, 2010 21:46 UTC (Sat) by efexis (guest, #26355) [Link]

Inserts are also quick if you do maintain order, simply by maintaining a pointer to the end of the list. You've only two pointers to update for the list (end of list and pointer to end of list) or am I forgetting something? Deletes could be sped up by making it double linked, and I can't think of why you'd ever need to perform a search... even if you needed to find a particular processes own structure for a page, seems like you'd start the search from that processes page table which can be done in a determinate amount of time, rather than from your own and then through the linked list.

I don't know what other property you could be looking for, other than memory address or process id, that you could use as the value for an index or hash. If you were collecting a stat (such as the process that's using the page that runs the most, which could be a useful thing to know) you'd have to iterate through the lot. Maintaining an index of that value is loads more work that I can't see paying off. What other value, other than page address and process id, would you want to search on?

The case of the overly anonymous anon_vma

Posted Apr 24, 2010 22:21 UTC (Sat) by ewen (subscriber, #4772) [Link]

If you always insert at the start of the list, or always insert at the end of the list, then the order that you are maintaining is "order inserted" (either newest at start or newest at end). This isn't necessarily what one means by an ordered list (except if you actually want a stack or a queue, both of which are "ordered by time"). Inserting anywhere else in the list, to order by a data value, requires finding that location, eg, page address, which requires either knowing the pointer to the immediately prior page address in the list, or going and finding same (O(n)). (As you suggest deletes could be quick if you make the process track a pointer to the relevant entry in the link list, and use a double linked list, something I'd overlooked.)

From what I can see "find by page address" is the only primitive that actually matters here (so you can find the references to that page); "find by process id" seems easiest done starting with the processes structures. And the problem that we started with was a 1,000,000-long linked list which referenced multiple pages held by multiple processes. So any structure which made it faster to find the entries for the appropriate page would help, providing its other overhead wasn't too high.

Ewen

The case of the overly anonymous anon_vma

Posted Apr 24, 2010 20:45 UTC (Sat) by efexis (guest, #26355) [Link]

"That fix makes me wonder about the "daemon startup" situation"

I don't think that matters (if I've interpreted what's said above correctly), as the page is unlikely to be unlinked from all the processes between the daemon fork()s taking place, as those initial processes are going to be pretty short lived. When the top process terminates, the next process down the list would have to take on the links, which will require some work, but no more than what would have to be done if this was done earlier in the game if you knew that the child was going to live longer. The only way it would save time would be if the memory was unlinked (eg, paged out), then paged back in and linked to the shorter running process - picking the longest running process would be more ideal. But yeah, for a process to be running long enough for this to happen probably rules out the ability to predict which is going to end sooner.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds