I think you're discounting the effect of LRU replacement. "Working set" in my mind implies read-write, and either private to a process, or at least private to a tree of closely related processes. (I know that it also should include all the code pages involved, but typically those wouldn't be the thrashy bits.) That large working set is most likely private and not one of these shared, read-only things. A large working set will definitely thrash the cache, but will it really thrash all of the cache equally?
That said, library / shared pages will still get referenced at least somewhat regularly by all of the processors on the NUMA node, and so the LRU will prevent their hottest lines from getting evicted. If you assume non-random replacement (which, unfortunately, you can't with certain recent processors), the lines from the hot library pages will remain near the front of the LRU, so only the back of the LRU gets cycled.
(The "unfortunately you can't" comment applies to recent ARM Cortex-A series processors, which have a highly associative shared L2 ("That's good!") with random replacement in lieu of an LRU ("That's bad!").)
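To make the LRU-versus-random point concrete, here is a toy simulation of a single cache set. It is only a sketch under my own assumptions: the function name `simulate`, the 16 ways, the 4 hot lines, and the 64-line streaming working set are all made up for illustration, and it models no particular processor (real "random" replacement is usually pseudo-random, and real LRU is usually an approximation).

```python
import random
from collections import OrderedDict


def simulate(policy, ways=16, hot=4, stream=64, rounds=2000, seed=0):
    """Return the hit rate seen by the hot lines in one cache set.

    policy is "lru" or "random". All sizes are illustrative assumptions.
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # keys ordered from least to most recently used
    hot_hits = 0
    hot_refs = 0
    next_stream = 0
    for _ in range(rounds):
        # Each round touches every hot (shared library) line once, then
        # eight lines of a large working set that is cycled sequentially.
        refs = [("hot", i) for i in range(hot)]
        for _ in range(8):
            refs.append(("stream", next_stream))
            next_stream = (next_stream + 1) % stream
        for line in refs:
            is_hot = line[0] == "hot"
            if is_hot:
                hot_refs += 1
            if line in cache:
                if is_hot:
                    hot_hits += 1
                cache.move_to_end(line)  # mark as most recently used
                continue
            if len(cache) >= ways:
                if policy == "lru":
                    cache.popitem(last=False)  # evict least recently used
                else:
                    del cache[rng.choice(list(cache))]  # evict a random line
            cache[line] = True
    return hot_hits / hot_refs


if __name__ == "__main__":
    for policy in ("lru", "random"):
        print(policy, round(simulate(policy), 4))
```

With these numbers each hot line is re-referenced after only 11 other distinct lines, which is fewer than the 16 ways, so under LRU the hot lines miss only on their very first touch while the streaming lines churn through the back of the list. Under random replacement every eviction has a 1-in-16 chance of hitting any given hot line, so the hot lines get knocked out regularly even though they are referenced every round.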