
Fun with NULL pointers, part 2

Posted Jul 22, 2009 3:11 UTC (Wed) by bartoldeman (guest, #4205)
Parent article: Fun with NULL pointers, part 2

The null page is only really *needed* for vm86(). For 16-bit protected-mode code, an LDT in which all segments share a constant non-zero base offset can be used instead, and a pure 16-bit protected-mode program running in Wine (though I think many use vm86()!) does not even know it.
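A minimal sketch of why the LDT trick avoids page zero while vm86() cannot, assuming a hypothetical shared segment base of 1 MiB (SHARED_BASE is made up for illustration; this is not Wine's or DOSEMU's actual code). In protected mode the linear address is descriptor base plus offset, so the program's "address 0" lands wherever the base puts it; in vm86 mode the linear address is always segment * 16 + offset, so 0000:0000 is linear 0.

```c
#include <stdint.h>

/* Simplified protected-mode segment descriptor (byte granularity). */
typedef struct {
    uint32_t base;   /* linear address where the segment starts */
    uint32_t limit;  /* highest valid offset within the segment */
} segment;

/* Hypothetical base shared by every LDT entry: the 16-bit program's
   offset 0 is relocated to linear 1 MiB instead of page zero. */
#define SHARED_BASE 0x00100000u

/* Protected mode: linear = base + offset, after a limit check.
   Returns 0 where the CPU would raise #GP, 1 on success. */
static int seg_to_linear(const segment *s, uint32_t offset, uint32_t *linear)
{
    if (offset > s->limit)
        return 0;
    *linear = s->base + offset;
    return 1;
}

/* vm86/real mode: no descriptor, just segment * 16 + offset, so low
   segment values always reach the bottom of the address space. */
static uint32_t vm86_to_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```

With the shared base, a 16-bit program that dereferences offset 0 touches linear 0x00100000, while the same access from vm86 code (for example the interrupt vector table at 0000:0000) really does need page zero mapped.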

But for vm86() there is no easy and fast solution: every program that uses it (Wine, DOSEMU, X once it has been changed to run without root privileges, a few others) would have to use CPU emulation. Currently, mapping the zero page requires CAP_SYS_RAWIO, but that by itself is a much more powerful capability: CAP_SYS_RAWIO allows direct I/O to hardware, which surely gives root-like power even if the kernel is watertight. So one could argue for a narrower capability, CAP_MAY_MAP_ZERO or something like that.
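The policy tradeoff can be sketched as a simplified model of the low-address mmap check, under stated assumptions: the real kernel check sits in the security hooks and differs in detail, the mmap_min_addr value of 65536 used below is just an example, and CAP_MAY_MAP_ZERO is the hypothetical capability proposed above, not an existing one.

```c
#include <stdbool.h>

/* Simplified credentials: one real capability, one hypothetical one. */
struct creds {
    bool cap_sys_rawio;     /* real: also grants raw hardware I/O      */
    bool cap_may_map_zero;  /* hypothetical: only lifts the low limit  */
};

/* Model of "may this process map at addr?" given the sysctl value
   vm.mmap_min_addr.  Not the kernel's actual implementation. */
static bool may_mmap_addr(unsigned long addr, unsigned long mmap_min_addr,
                          const struct creds *c)
{
    if (addr >= mmap_min_addr)
        return true;              /* ordinary mapping, no privilege needed */
    if (c->cap_sys_rawio)
        return true;              /* today: the only way below the limit  */
    return c->cap_may_map_zero;   /* proposed: a narrow, single-purpose right */
}
```

In this model a DOSEMU-like process holding only the narrow capability could map page zero without also being able to poke hardware ports.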

A technical solution would be Brad's UDEREF, or the Linux 2.0 kernel method of using segment limits, where user space must be accessed via FS and the kernel's CS/DS/ES segments give no direct access to the address space below 3GB (keep in mind that vm86() is i386-only). But segment limits aren't so popular anymore, and I'm not sure how they interact with modern virtualization environments.
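The segment-limit scheme can be modelled roughly as follows. The descriptor values are assumptions chosen to match the description above (user segments covering 0 to 3GB-1, kernel segments based at 3GB); they are a sketch, not a copy of the Linux 2.0 GDT. The point is that a kernel NULL dereference through DS resolves to linear 0xC0000000, never to user page 0.

```c
#include <stdint.h>

typedef struct { uint32_t base, limit; } segment;

/* Assumed layout: user memory 0..3GB-1, reachable by the kernel only
   through FS loaded with the user segment; kernel CS/DS/ES start at 3GB. */
static const segment user_seg   = { 0x00000000u, 0xBFFFFFFFu };
static const segment kernel_seg = { 0xC0000000u, 0x3FFFFFFFu };

/* linear = base + offset, with the limit check the CPU performs. */
static int seg_to_linear(const segment *s, uint32_t offset, uint32_t *linear)
{
    if (offset > s->limit)
        return 0;                 /* #GP on real hardware */
    *linear = s->base + offset;
    return 1;
}
```

Under this layout, kernel offset 0 maps to linear 0xC0000000 (kernel memory), and user-segment offsets at or above 3GB are rejected by the limit check.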

Another way would be to sandbox vm86() completely, so that the zero page (which can be alias-mapped elsewhere) is accessible only while running vm86() code; after all, 16-bit code inside vm86() cannot make system calls. But that would mean changing the page protections on every vm86() entry and exit, and also on every interrupt (page 0 would have to be unmapped as soon as an IRQ arrives for this to be effective). Too complex, I think.
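A toy model of the sandbox idea, to show where the extra protection changes come from. The structure and function names are hypothetical bookkeeping for illustration, not kernel APIs: page 0 is mapped only while vm86 code runs, and every transition into the kernel (normal exit or IRQ) hides it first.

```c
#include <stdbool.h>

struct vm86_sandbox {
    bool zero_page_mapped;
    unsigned prot_changes;   /* page-protection updates, each implying a TLB flush */
};

static void set_zero_page(struct vm86_sandbox *s, bool mapped)
{
    if (s->zero_page_mapped != mapped) {
        s->zero_page_mapped = mapped;
        s->prot_changes++;
    }
}

static void vm86_enter(struct vm86_sandbox *s) { set_zero_page(s, true); }
static void vm86_exit(struct vm86_sandbox *s)  { set_zero_page(s, false); }

/* An IRQ must hide page 0 before any kernel code runs, then restore
   the previous state when the interrupted code resumes. */
static void vm86_irq(struct vm86_sandbox *s)
{
    bool was_mapped = s->zero_page_mapped;
    set_zero_page(s, false);
    /* ... the kernel interrupt handler would run here ... */
    set_zero_page(s, was_mapped);
}
```

One vm86() call interrupted by two IRQs already costs six protection changes in this model, which illustrates why the approach looks too expensive.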



Fun with NULL pointers, part 2

Posted Jul 22, 2009 3:26 UTC (Wed) by bartoldeman (guest, #4205)

Correction about Wine:
http://www.nabble.com/Re%3A-vm86-mode-is-not-supported-p2...

Alexandre Julliard:
Win16 applications don't need vm86, and work just fine on 64-bit.

OK, that means that with some LDT tricks Win16 apps may work in Wine. I'm not 100% sure, but I know the LDT offset trick works for DPMI applications in DOSEMU, which I tested a bit last year. Wine can also execute some DOS apps, and those won't work on i386 without mapping page zero unless Wine acquires a CPU emulator or someone thinks of another solution.

Fun with NULL pointers, part 2

Posted Jul 22, 2009 3:53 UTC (Wed) by jreiser (subscriber, #11027)

> The null page is only really *needed* for vm86()...

That is just not true [except of course for arguments regarding universality, or emulating every access to page 0 via SIGSEGV, etc.]. I have written programs where having page 0 is essential (for elegance, or ease of maintenance, or time performance, or space performance, etc.), and I object to those programs becoming non-functional.

Fun with NULL pointers, part 2

Posted Jul 22, 2009 3:54 UTC (Wed) by spender (guest, #23067) (8 responses)

UDEREF is a PaX feature, developed entirely by the PaX Team -- not by me.

You are correct, though, that the real problem is any invalid userland access in general. Many bugs, past and future, involve (or can involve) userland in ways that merely preventing mappings within the first 64KB won't stop. For instance, in the exploit I developed today that I linked to in this thread, I don't have to worry about how to inject arbitrary code into the kernel: all I need is to get it to execute my already-existing code in userland (which lets all the auditing/SELinux/AppArmor/LSM-disabling code be reused easily).

With a one-byte write of 0 (a value I don't control) to an address that my technique lets me control 100% reliably, on x86 I set the highest byte of a function pointer belonging to a module to 0, turning it into a userland address: game over.
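The arithmetic behind the one-byte write, as a sketch assuming a 32-bit x86 kernel with the default 3G/1G split (user space below 0xC0000000). On little-endian x86 the highest byte of a pointer is byte 3, so writing 0 there is the same as masking off bits 24..31. The example pointer value is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed user/kernel boundary for a 3G/1G split. */
#define TASK_SIZE_3G 0xC0000000u

/* Same effect as storing 0x00 into the most significant byte. */
static uint32_t zero_high_byte(uint32_t ptr)
{
    return ptr & 0x00FFFFFFu;
}

static bool is_user_address(uint32_t addr)
{
    return addr < TASK_SIZE_3G;
}
```

A kernel pointer such as 0xC0123456 becomes 0x00123456, which is a user address well above a 64KB mmap_min_addr, so the attacker can simply map code there; the low-address restriction does not help.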

-Brad

Fun with NULL pointers, part 2

Posted Jul 22, 2009 4:54 UTC (Wed) by bojan (subscriber, #14302) (7 responses)

> You are correct though that the real problem is any invalid userland access in general.

Given the above and the fact that solutions to this type of bug exist (if I understand UDEREF correctly), why is the kernel not being patched so we don't see this ever again? Or am I being overly naive?

Yes, you are overly naive...

Posted Jul 22, 2009 5:42 UTC (Wed) by khim (subscriber, #9252) (6 responses)

> Given the above and the fact that solutions to this type of bug exist (if I understand UDEREF correctly), why is the kernel not being patched so we don't see this ever again? Or am I being overly naive?

The sad truth is that UDEREF is history now. The only architecture where it was useful is slowly going away: most architectures never had segments, and Intel/AMD "lost" them recently, since x86-64 does not have full segments in 64-bit mode. Segments retained only one attribute, the base address, and nothing else: good for TLS, not enough for UDEREF.

Yes, you are overly naive...

Posted Jul 22, 2009 6:56 UTC (Wed) by PaXTeam (guest, #24616) (5 responses)

Note that the fundamental problem in all this NULL deref misery is the lack of userland/kernel virtual address space separation. UDEREF/i386 simulates it by using the IA-32 segmentation logic, but there are certainly other ways to do the same, say address space ID tags in the TLB. Unfortunately, AMD butchered the segmentation logic without providing an alternative (it's not only about security; virtualization vendors weren't that happy either).

Yes, you are overly naive...

Posted Jul 22, 2009 8:02 UTC (Wed) by gmaxwell (guest, #30048) (2 responses)

How terrible for performance would it be to make all userspace-accessible memory no-execute on kernel entry? (Or to achieve the same result some other way, like making it unreachable and then faulting it back in with NX set.)

Yes, you are overly naive...

Posted Jul 22, 2009 17:25 UTC (Wed) by dlang (guest, #313) (1 responses)

This would be _very_ expensive. Changing the page tables is relatively expensive (especially if you have to change a lot of them).

Yes, you are overly naive...

Posted Jul 22, 2009 18:53 UTC (Wed) by PaXTeam (guest, #24616)

Well, the actual page table manipulation would not be that expensive: with some tradeoffs you can reduce it to changing a few top-level page table entries and a single TLB flush, which would be a few hundred cycles or so.

However, there's more cost to this: the TLB repopulation that would inevitably occur after returning to userland. That is the real expense, as we're talking about up to hundreds of TLB entries on modern CPU cores, each potentially missing in the data cache and incurring hundreds of cycles.
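A back-of-the-envelope model of the two cost components described above. The inputs are illustrative placeholders picked from the ranges the comment mentions ("a few hundred cycles" for the flush, "up to hundreds" of TLB entries, "hundreds of cycles" per miss), not measurements, and the model assumes the worst case where every refill misses the data cache.

```c
/* Cost of making user memory NX on kernel entry, per kernel round trip:
   one page-table edit plus TLB flush, then one refill for each TLB entry
   the process touches again after returning to userland. */
static unsigned long nx_on_entry_cost(unsigned long flush_cycles,
                                      unsigned long tlb_entries_refilled,
                                      unsigned long cycles_per_refill)
{
    return flush_cycles + tlb_entries_refilled * cycles_per_refill;
}
```

With 300 cycles for the flush, 256 refilled entries and 200 cycles per refill, the total is 51500 cycles, of which the flush itself is under 1%; the refill term dominates, which is the point made above.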

Yes, you are overly naive...

Posted Jul 22, 2009 11:13 UTC (Wed) by ajb (subscriber, #9694) (1 responses)

According to Wikipedia, newer CPUs have an 'address space' tag that can be used to avoid flushing the TLB on VM switches. Could it also be used for kernel vs. user mode? It would require cooperating with the VM monitor, though. (I wish VMs could be nested; it would be nice to be able to run untrusted code on EC2 without using QEMU.)

That doesn't obviously help in the cases where the kernel needs to access user-mode memory, though; one would still have to change the permissions to access it and then change them back.
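A toy software model of an address-space-tagged TLB, to illustrate the question above. Giving kernel mode its own tag is the hypothetical use being asked about; the table size, replacement policy and tag values are assumptions, and no particular CPU's ASID mechanism is reproduced here. A lookup only hits when both the page number and the current tag match, so user translations are invisible to kernel-tagged lookups without any flush.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_SIZE 8   /* assumed, tiny for illustration */

struct tlb_entry {
    bool valid;
    uint16_t asid;   /* address space tag */
    uint32_t vpn;    /* virtual page number */
    uint32_t pfn;    /* physical frame number */
};

struct tlb {
    struct tlb_entry e[TLB_SIZE];
    unsigned next;   /* round-robin replacement cursor */
};

static void tlb_insert(struct tlb *t, uint16_t asid, uint32_t vpn, uint32_t pfn)
{
    t->e[t->next] = (struct tlb_entry){ true, asid, vpn, pfn };
    t->next = (t->next + 1) % TLB_SIZE;
}

/* Hit only if the tag matches too; a miss would trigger a page walk
   in whichever page tables belong to the current tag. */
static bool tlb_lookup(const struct tlb *t, uint16_t asid, uint32_t vpn,
                       uint32_t *pfn)
{
    for (unsigned i = 0; i < TLB_SIZE; i++) {
        const struct tlb_entry *x = &t->e[i];
        if (x->valid && x->asid == asid && x->vpn == vpn) {
            *pfn = x->pfn;
            return true;
        }
    }
    return false;
}
```

As the comment notes, this only separates the two views; legitimate kernel accesses to user memory would still need some way to switch tags or permissions temporarily.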

NKX-bit

Posted Jul 22, 2009 23:37 UTC (Wed) by i3839 (guest, #31386)

It seems the execute (and perhaps some other) memory permissions should be split into user and kernel versions; that would solve this particular problem, at least. That said, since CPU vendors have only just gotten used to a separate no-execute bit, it may take a really long time before they introduce a no-kernel-execute bit (or ring-0 bit, whatever).
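A sketch of the proposed split permission as an instruction-fetch check. The no-kernel-execute (NKX) bit is hypothetical, as in the comment, and the structure is a model of the idea rather than any CPU's real page-table format.

```c
#include <stdbool.h>

struct pte_perm {
    bool present;
    bool user;   /* page accessible from user mode          */
    bool nx;     /* existing no-execute bit, any mode       */
    bool nkx;    /* hypothetical: no execute while in ring 0 */
};

/* May the CPU fetch instructions from this page in the given mode? */
static bool may_fetch(const struct pte_perm *p, bool cpu_in_ring0)
{
    if (!p->present || p->nx)
        return false;
    if (!cpu_in_ring0)
        return p->user;   /* user mode can only run user pages   */
    return !p->nkx;       /* ring 0 additionally honours NKX     */
}
```

If the kernel set NKX on every user page, a corrupted function pointer redirected into userland (as in the exploit described earlier in the thread) would fault instead of running attacker code, while user mode could still execute the same page normally.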


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds