Crowding out OpenBSD

Posted Nov 15, 2012 1:18 UTC (Thu) by stressinduktion (subscriber, #46452)
In reply to: Crowding out OpenBSD by cyrus
Parent article: Crowding out OpenBSD

I think they haven't adapted the VM subsystem to multicore yet. Last time I looked, they still used splay trees in FreeBSD for mmap management, which waste a lot of memory bandwidth on today's multicore machines. Projects were underway to change this but were never finished.


Crowding out OpenBSD

Posted Nov 25, 2012 23:49 UTC (Sun) by vsrinivas (subscriber, #56913)

In DragonFly, we did a fair bit of work improving the VM on SMP systems recently (in the last year or so); we still use RB-trees w/ sx (rw) locking for vm_maps (mapping from (VA, aspace) -> vm_pages) and there is still only one pagedaemon (kthread handling page queue scans), but the front-end of the VM and page fault handling are SMP-friendly. The page queues are also locked in fairly fine-grained ways.

DFly's recent 3.2 release is pretty competitive with Linux on Postgres/Pgbench workloads. Some of the lessons learned while improving Postgres/Pgbench performance would be interesting to the Linux VM team. For instance, DragonFly moved to implement a limited form of page table sharing; this reduced the overhead of managing pv_entries ((D)FBSD's version of rmap entries; see the pre-objrmap Linux VM). Page-table sharing in Linux would not bring the same sort of wins there (because of objrmap), but it is worth exploring; I suspect fork() performance would benefit.

Another neat bit of DFly VM machinery that might be interesting -- 'swapcache' -- basically allows for clean file data/metadata to be written to a hopefully-solid-state swap device. It would have a similar effect to flashcache, except it happens at the file vnode layer first, rather than at the block device layer.

Some other interesting things that we learned while scaling out DFly -- we spent a bunch of time optimizing our spinlocks on a wide AMD (4-socket, 48-core) K10 system; along the way we reinvented Linux's ticket spinlock, except with the two counters on separate cachelines rather than parts of the same word. We found them to be pretty terrible under heavy contention (see a comment on LWN's discussion: "Ticket spinlocks perform terribly for uncontested and highly contested cases.") and had a pretty incredible 175 sec -> 90-something sec reduction in concurrent buildworld time by just moving to something like while-cmpxchg.
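
For anyone who wants to see the two lock shapes side by side, here is a rough userland sketch in C11 atomics. It is not the DragonFly kernel code: the names (ticket_lock, cas_lock and their functions), the 64-byte cacheline size, and the memory orderings are all assumptions made for illustration. The ticket lock keeps its two counters on separate cachelines as described above; the second lock is a plain while-cmpxchg loop.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>

#define CACHELINE 64 /* assumed cacheline size; hypothetical constant */

/* Ticket lock with the two counters on separate cachelines. */
struct ticket_lock {
    alignas(CACHELINE) atomic_uint next;  /* next ticket to hand out */
    alignas(CACHELINE) atomic_uint owner; /* ticket currently being served */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
    /* Take a ticket, then spin until it is our turn (strict FIFO order). */
    unsigned me = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ; /* every waiter polls the same cacheline */
}

static void ticket_lock_release(struct ticket_lock *l)
{
    /* Serve the next ticket. */
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}

/* "Something like while-cmpxchg": no ordering between waiters. */
struct cas_lock {
    atomic_int locked; /* 0 = free, 1 = held */
};

static int cas_lock_try(struct cas_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void cas_lock_acquire(struct cas_lock *l)
{
    while (!cas_lock_try(l))
        ; /* retry the compare-and-swap until it succeeds */
}

static void cas_lock_release(struct cas_lock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The tradeoff in this sketch: the ticket lock is fair, but a release has to reach one specific waiter while all the others keep polling; the cmpxchg lock lets whichever CPU wins the race take the lock, giving up fairness.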

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds