Memory management notifiers
One problem with this technique, as implemented in Linux currently, is that there is no easy way for the host to feed page table changes back to the guest. In particular, if the host system decides that it wants to push a given page out to swap, it can't tell the guest that the page is no longer resident. So virtualization mechanisms like KVM avoid the problem altogether by pinning pages in memory when they are mapped in shadow page tables. That solves the problem, but it makes it impossible to swap processes running KVM-based virtual machines out of main memory.
This seems like a good thing to fix. And a fix exists, in the form of the MMU notifiers patch posted by Andrea Arcangeli (from his shiny new Qumranet address). This patch allows an interested subsystem to be notified whenever specific memory management events take place. The process starts by setting up a set of callbacks:
    struct mmu_notifier_ops {
        void (*release)(struct mmu_notifier *mn,
                        struct mm_struct *mm);
        int (*age_page)(struct mmu_notifier *mn,
                        struct mm_struct *mm,
                        unsigned long address);
        void (*invalidate_page)(struct mmu_notifier *mn,
                                struct mm_struct *mm,
                                unsigned long address);
        void (*invalidate_range)(struct mmu_notifier *mn,
                                 struct mm_struct *mm,
                                 unsigned long start, unsigned long end);
    };
These callbacks are bundled into an mmu_notifier structure:
    struct mmu_notifier {
        struct hlist_node hlist;
        const struct mmu_notifier_ops *ops;
    };
The interested code then registers its notifier with:
void mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm);
Here, mm is the mm_struct structure associated with a given address space. It is not expected that anybody will be interested in all memory management events, so notifiers are associated with specific address spaces. Once the notifier is in place, the callbacks will be invoked when interesting things happen:
- release() is called when the relevant mm_struct is about to go away. So it will be the last callback made to that notifier.
- age_page() indicates that the memory management subsystem wants to clear the "referenced" flag on the page associated with the given address. This callback should return the previous value of the referenced bit, or the closest approximation available on the host architecture.
- invalidate_page() and invalidate_range() are both ways of telling the guest that the given address(es) are no longer valid - the page has been reclaimed. Upon return from this callback, the affected address range should not be referenced by the guest.
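The registration and callback flow described above can be modeled in ordinary user-space C. What follows is only a sketch, not kernel code: the list linkage is a plain `next` pointer rather than `hlist_node`, `mm_struct` is reduced to a list head, and `notify_invalidate_page()`, `notify_release()`, the `demo_*` callbacks and `run_demo()` are hypothetical names invented for this illustration. There is no locking, which real code would need.

```c
#include <assert.h>
#include <stddef.h>

struct mm_struct;
struct mmu_notifier;

/* Same callback table as in the article. */
struct mmu_notifier_ops {
    void (*release)(struct mmu_notifier *mn, struct mm_struct *mm);
    int (*age_page)(struct mmu_notifier *mn, struct mm_struct *mm,
                    unsigned long address);
    void (*invalidate_page)(struct mmu_notifier *mn, struct mm_struct *mm,
                            unsigned long address);
    void (*invalidate_range)(struct mmu_notifier *mn, struct mm_struct *mm,
                             unsigned long start, unsigned long end);
};

/* Assumption: a simple singly linked list stands in for hlist_node. */
struct mmu_notifier {
    struct mmu_notifier *next;
    const struct mmu_notifier_ops *ops;
};

/* Assumption: the address space is modeled as just its notifier list. */
struct mm_struct {
    struct mmu_notifier *notifiers;
};

/* Attach a notifier to one specific address space. */
void mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
{
    mn->next = mm->notifiers;
    mm->notifiers = mn;
}

/* Hypothetical core-side helper: a page in this mm was reclaimed. */
static void notify_invalidate_page(struct mm_struct *mm, unsigned long address)
{
    for (struct mmu_notifier *mn = mm->notifiers; mn; mn = mn->next)
        if (mn->ops->invalidate_page)
            mn->ops->invalidate_page(mn, mm, address);
}

/* Hypothetical core-side helper: the mm is going away (last callback). */
static void notify_release(struct mm_struct *mm)
{
    for (struct mmu_notifier *mn = mm->notifiers; mn; mn = mn->next)
        if (mn->ops->release)
            mn->ops->release(mn, mm);
}

/* A subscriber that only records what it was told. */
static unsigned long last_invalidated;
static int released;

static void demo_invalidate_page(struct mmu_notifier *mn, struct mm_struct *mm,
                                 unsigned long address)
{
    (void)mn;
    (void)mm;
    last_invalidated = address;
}

static void demo_release(struct mmu_notifier *mn, struct mm_struct *mm)
{
    (void)mn;
    (void)mm;
    released = 1;
}

/* Callbacks the subscriber does not care about are left NULL. */
static const struct mmu_notifier_ops demo_ops = {
    .release = demo_release,
    .invalidate_page = demo_invalidate_page,
};

/* Returns 1 if both callbacks were delivered as expected. */
int run_demo(void)
{
    struct mm_struct mm = { .notifiers = NULL };
    struct mmu_notifier mn = { .next = NULL, .ops = &demo_ops };

    mmu_notifier_register(&mn, &mm);
    notify_invalidate_page(&mm, 0x1000UL);
    notify_release(&mm);

    return last_invalidated == 0x1000UL && released == 1;
}
```

Calling `run_demo()` from a `main()` exercises the whole path. Because the notifier hangs off a single `mm_struct`, events in other address spaces never reach it, which matches the per-address-space registration described above.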
For the curious, the KVM patches (showing how these notifiers are used there) have also been posted.
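To give a feel for the subscriber's side, here is a toy secondary MMU written as user-space C. It is not taken from the KVM patches; the `shadow_*` names, the fixed-size table, the 4KB page size, and the half-open `[start, end)` range convention are all assumptions made for this sketch. It shows the two behaviors the callbacks ask for: `age_page()` as a test-and-clear of an accessed bit, and invalidation as dropping shadow entries so that the range can no longer be referenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12   /* assumption: 4KB pages */
#define DEMO_NR_PAGES   8    /* assumption: tiny fixed-size shadow table */

/* One shadow entry per page; real shadow page tables are multi-level. */
struct shadow_pte {
    bool present;
    bool accessed;
};

struct shadow_mmu {
    struct shadow_pte pte[DEMO_NR_PAGES];
};

/* age_page() analogue: return the old accessed bit and clear it. */
int shadow_age_page(struct shadow_mmu *s, unsigned long address)
{
    unsigned long idx = address >> DEMO_PAGE_SHIFT;

    if (idx >= DEMO_NR_PAGES || !s->pte[idx].present)
        return 0;

    int young = s->pte[idx].accessed;
    s->pte[idx].accessed = false;
    return young;
}

/* invalidate_range() analogue: drop every entry touching [start, end). */
void shadow_invalidate_range(struct shadow_mmu *s,
                             unsigned long start, unsigned long end)
{
    unsigned long first = start >> DEMO_PAGE_SHIFT;
    unsigned long last = (end + (1UL << DEMO_PAGE_SHIFT) - 1) >> DEMO_PAGE_SHIFT;

    if (last > DEMO_NR_PAGES)
        last = DEMO_NR_PAGES;

    for (unsigned long idx = first; idx < last; idx++) {
        s->pte[idx].present = false;
        s->pte[idx].accessed = false;
    }
}

/* invalidate_page() analogue: a one-page range. */
void shadow_invalidate_page(struct shadow_mmu *s, unsigned long address)
{
    shadow_invalidate_range(s, address, address + 1);
}

/* Returns 1 if the toy table behaves as described. */
int shadow_demo(void)
{
    struct shadow_mmu s = { 0 };

    /* Pages 1..3 are mapped; page 2 has been touched by the "guest". */
    s.pte[1].present = true;
    s.pte[2].present = true;
    s.pte[2].accessed = true;
    s.pte[3].present = true;

    int first_age = shadow_age_page(&s, 0x2000UL);   /* was referenced */
    int second_age = shadow_age_page(&s, 0x2000UL);  /* bit now clear */

    shadow_invalidate_range(&s, 0x1000UL, 0x3000UL); /* pages 1 and 2 */

    return first_age == 1 && second_age == 0 &&
           !s.pte[1].present && !s.pte[2].present && s.pte[3].present;
}
```

A real user would also have to flush any hardware TLB entries covering the range before returning from the invalidate callbacks; the toy table has nothing to flush, so that step is absent here.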
While this patch set is aimed at KVM, there has been some interest from other directions as well - virtual machines are not the only places where separate (but related) page tables are maintained. Graphics processing units on contemporary video cards are an example - they have their own memory management units and some interesting management issues of their own. Remote DMA (RDMA) engines are another possible user. So these patches have attracted comments from a few potential users, and have changed significantly since their first posting. The discussion is still ongoing, so further changes may come about before the notifiers find their way into the mainline.
Memory management notifiers
Posted Jan 24, 2008 18:39 UTC (Thu) by bronson (subscriber, #4806)
This would only benefit guest kernels that have been modified to take advantage of it, right?
Memory management notifiers
Posted Jan 25, 2008 20:11 UTC (Fri) by giraffedata (guest, #1954)
> This would only benefit guest kernels that have been modified to take advantage of it, right?
I think that's obvious in, "The interested code then registers its notifier with:"
But what the article doesn't say is why a guest kernel would be interested. It says that because the guest kernel can't know when the host has invalidated a page, the host must never invalidate a page (i.e. keep the memory pinned). I guess I don't know how KVM works, but I've worked with virtual machines that don't have this issue.
That swapped out page should still be virtually resident. The guest's page table says so, and, consistent with that, when the guest does a load from its virtual address, the instruction completes without the guest seeing any page fault (because the host takes a page fault, reads the data in, and updates the real page table).
Memory management notifiers
Posted Jan 30, 2008 0:49 UTC (Wed) by roelofs (guest, #2599)
> But what the article doesn't say is why a guest kernel would be interested.
Seems like primarily a performance issue to me. If the guest kernel doesn't know when its "RAM" is really swap, it's not going to be able to manage its memory as effectively as it might like. For example, it might be able to predict memory-usage patterns where the host kernel can't. Wasn't there a recent article(s) about a patch to do speculative read-in of swapped-out memory, specifically for the use-case where some automated overnight process pushes out OpenOffice/Firefox/etc., causing the user significant delays upon his/her return in the morning? (Perhaps even one of Con Kolivas' patches?)
Greg