LWN.net

Performance impact?

Posted Aug 9, 2006 2:07 UTC (Wed) by bluefoxicy (guest, #25366)
In reply to: Performance impact? by mingo
Parent article: Kernel Summit 2005: The ExecShield patches

"you are right about exec-shield being cheap on newest hardware - but it's more than reasonable even on 'legacy' hardware where we apply the 'dynamic segment limit' trick to approximate NX protection. Maintaining the segment limit has some cost but it's not measurable."

[Stack +RW]
----------- (-- Segment Limit
[Libraries +RX]
[AnonMaps +RWX]
[Heap +RWX]
[Program +RX]

Do you guys still try to map libraries below the 16MB mark so that their addresses contain a NULL byte? That was a valiant effort, but remember that we still consider things like buffer overflows exploitable on x86-64, where virtual addresses are 48 bits wide and pointers are 64 bits, so every user-space pointer contains at least two NULL bytes.
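To make the NULL-byte point concrete, here is a minimal Python sketch (the helper names are hypothetical, and the sample addresses are illustrative) that counts the high-order zero bytes of an address. Any 32-bit address below 16MB (0x01000000) has a zero top byte, while every canonical x86-64 user-space address already has at least two, which is why NULL bytes alone are not treated as a barrier there.

```python
def has_nul_byte(addr: int, width: int) -> bool:
    """True if the width-byte little-endian encoding of addr contains 0x00."""
    return b"\x00" in addr.to_bytes(width, "little")


def leading_zero_bytes(addr: int, width: int) -> int:
    """Count the high-order zero bytes of addr when stored in width bytes."""
    raw = addr.to_bytes(width, "big")
    return len(raw) - len(raw.lstrip(b"\x00"))


# Illustrative addresses (hypothetical values, not taken from a real process):
print(leading_zero_bytes(0x00B4C123, 4))          # 32-bit, below 16MB -> 1
print(leading_zero_bytes(0x08048000, 4))          # 32-bit, above 16MB -> 0
print(leading_zero_bytes(0x00007F3A12345678, 8))  # x86-64 user address -> 2
```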

The one thing it does buy you is the option to map libraries low and move the segment limit down to the bottom of the heap; keep anonymous mappings high, above the limit, so they are no longer executable either. Your image then looks like this:

[Stack +RW]
[AnonMaps +RW]
[Heap +RW]
----------- (-- Segment Limit
[Program +RX]
[Libraries +RX]
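A minimal Python sketch of why lowering the limit helps, under stated assumptions: there is no NX bit, so any mapped page at or below the code-segment limit can be executed; the region names follow the diagrams, but the addresses in the hypothetical `LAYOUT` are made-up i386 values, and the page granularity of a real segment limit is ignored. `lowest_limit` picks the smallest limit that still covers the regions that must run, and `exec_status` classifies each region against it.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    start: int  # first address of the region
    end: int    # one past the last address


def exec_status(region: Region, cs_limit: int) -> str:
    """Classify a region against an inclusive code-segment limit.

    Assumes no NX bit: an instruction fetch succeeds iff addr <= cs_limit.
    """
    if region.end - 1 <= cs_limit:
        return "exec"
    if region.start > cs_limit:
        return "noexec"
    return "partial"


def lowest_limit(regions, exec_names) -> int:
    """Smallest limit that still covers every region that must execute."""
    return max(r.end - 1 for r in regions if r.name in exec_names)


# Hypothetical layout matching the second diagram (bottom to top):
# libraries low (under 16MB), then program, heap, anon maps, stack.
LAYOUT = [
    Region("Libraries", 0x00110000, 0x00400000),
    Region("Program",   0x08048000, 0x08100000),
    Region("Heap",      0x08100000, 0x09000000),
    Region("AnonMaps",  0xB7000000, 0xB8000000),
    Region("Stack",     0xBF000000, 0xC0000000),
]

limit = lowest_limit(LAYOUT, {"Libraries", "Program"})
status = {r.name: exec_status(r, limit) for r in LAYOUT}
print(hex(limit), status)
```

With this hypothetical layout the limit lands at 0x080FFFFF, so Heap, AnonMaps and Stack come out non-executable. Placing the libraries above the anonymous mappings instead, as in the first diagram, drags the limit up and leaves the heap and anonymous mappings executable.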

Unfortunately, libraries are pretty much stuck with the following layout (as far as I can tell, because statically linked data is accessed relative to the code segment and can't be loaded separately, or something like that; I do not fully understand it yet):

[Library +RX]
[LibData +RWX]

So with the segment limit alone you can't truly make library data non-executable. PaX overloads the supervisor bit for this: the code segment limit covers everything above the highest executable address, and below that, supervisor-bit overloading takes effect.
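A toy Python model of the hybrid described above, assuming 4KB pages and a heavily simplified fault path; the function name, parameters and return strings are hypothetical, and the real PaX fault handling is more involved. It shows which mechanism decides each user-mode access: above the limit, segmentation denies execution; below it, pages meant to be non-executable are mapped supervisor-only, so user accesses to them trap and the kernel must tell instruction fetches from data accesses.

```python
PAGE_SIZE = 4096  # assumption: 4KB pages


def user_access(addr: int, kind: str, cs_limit: int, nx_pages: set,
                page_size: int = PAGE_SIZE) -> str:
    """Toy model of a segment-limit/SBO hybrid (names are hypothetical).

    kind is "fetch" for an instruction fetch or "data" for a load/store;
    nx_pages holds page numbers that should be non-executable.
    """
    if addr > cs_limit:
        # Above the limit, segmentation alone denies execution, so pages here
        # need no supervisor-bit trick and data accesses take no extra fault.
        return "blocked by segment limit" if kind == "fetch" else "allowed"
    if addr // page_size in nx_pages:
        # Below the limit, a non-executable page is mapped supervisor-only, so
        # any user access traps and the kernel must tell a fetch from data.
        return "killed (SBO fetch)" if kind == "fetch" else "allowed after SBO fault"
    return "allowed"


# Hypothetical setup: limit just above the program, one library data page below it.
lim = 0x080FFFFF
nx = {0x00300000 // PAGE_SIZE}
print(user_access(0x00300010, "fetch", lim, nx))
print(user_access(0x08200000, "data", lim, nx))
```

In this model, the only added cost is the trap taken on data accesses to non-executable pages below the limit; accesses above the limit are handled by the segmentation check with no extra fault.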

PaX provides 16 bits of library randomization, whereas ExecShield uses only 8. ES moves libraries around within 1MB of address space so everything fits under the first 16MB; you could well get away with using supervisor-bit overloading only on library data, with CSLT handling the heap and anonymous mappings. PaX moves them around within the first 256MB so the original VMA layout is kept; everything below the stack gets supervisor-bit overloading.
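The window sizes above follow from simple arithmetic if one assumes library bases are chosen at 4KB page granularity (an assumption of this sketch): 2^8 pages span 1MB and 2^16 pages span 256MB. A minimal Python check, with hypothetical helper names:

```python
PAGE_SIZE = 4096  # assumption: 4KB pages, bases chosen at page granularity


def window_bytes(bits: int, page_size: int = PAGE_SIZE) -> int:
    """Span of address space needed for 2**bits distinct page-aligned bases."""
    return (1 << bits) * page_size


def max_bits(window: int, page_size: int = PAGE_SIZE) -> int:
    """Most randomization bits a window can offer (ignores the mappings' own size)."""
    return (window // page_size).bit_length() - 1


print(window_bytes(8) >> 20, "MB")   # ExecShield: 1 MB
print(window_bytes(16) >> 20, "MB")  # PaX: 256 MB
print(max_bits(16 << 20))            # upper bound if the whole 16MB were used: 12
```

The last line is only an upper bound: the libraries themselves also have to fit below 16MB, which leaves less room for randomization in practice.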

I actually measured the overhead of PaX's segment-limit/SBO hybrid technique as slightly lower than that of SEGMEXEC, which splits the address space in half; however, the Pentium 4 has a flaw that makes the supervisor-bit-overloading part EXTREMELY slow. The Pentium III and earlier, and any AMD chip, work just fine.


Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.