Support for Intel's LASS

Posted Jan 13, 2023 16:35 UTC (Fri) by mb (subscriber, #50428)
Parent article: Support for Intel's LASS

> Vsyscalls have long since been replaced by the vDSO, but there may be old versions of the C library
> out there that still use them. If LASS support is merged, distributors will have to decide which
> feature to enable by default.

I have a couple of very old proprietary applications that still work fine.
How can I check if these use vsyscall?

And will it be possible to disable LASS on a per-process basis?

Support for Intel's LASS

Posted Jan 13, 2023 16:52 UTC (Fri) by corbet (editor, #1) [Link] (6 responses)

For testing applications, you could try booting with vsyscall=none and see if they still work. There's probably a better way but I don't know it offhand.

LASS is system-wide, so it can't be controlled on a per-process basis, at least in the posted implementation.
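As a rough complement to the reboot test, one can look at whether the kernel exposes the legacy page to a process at all. The sketch below parses text in the usual /proc/<pid>/maps layout and looks for a mapping named [vsyscall]. The function name and regular expression are made up for illustration. Note the limits: a mapping only says the page is available, not that the program ever calls into it, and my understanding (treat it as an assumption) is that the entry is absent when booting with vsyscall=none.

```python
import re

# Assumption: lines follow the usual /proc/<pid>/maps layout:
#   start-end perms offset dev inode [pathname]
MAPS_LINE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+\S+\s+\S+\s+\S+\s*(.*)$"
)


def find_vsyscall_mapping(maps_text):
    """Return (start, end, perms) of the [vsyscall] mapping, or None.

    Hypothetical helper: it only reports whether the page is mapped,
    not whether the program actually uses it.
    """
    for line in maps_text.splitlines():
        match = MAPS_LINE.match(line.strip())
        if match and match.group(4).strip() == "[vsyscall]":
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            return start, end, match.group(3)
    return None
```

To try it on a live process, pass in the contents of /proc/<pid>/maps, for example find_vsyscall_mapping(open("/proc/self/maps").read()).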

Support for Intel's LASS

Posted Jan 13, 2023 17:07 UTC (Fri) by mb (subscriber, #50428) [Link]

Cool. Thanks, Jonathan.
I'll try that.

Support for Intel's LASS

Posted Jan 13, 2023 17:46 UTC (Fri) by dullfire (guest, #111432) [Link] (4 responses)

I took a very quick look at that bit of kernel code; it looks like the kernel cmdline option is the only way.

However, I bet you could trap those specific faults (I would imagine LASS would look like a page fault to the kernel? I haven't read its docs, but it has to raise some sort of exception), and if they point at the vsyscall address, just jump to the corresponding vDSO address.

Of course it would be slow, but old apps would still work.

Alternatively, you might be able to get userfaultfd to do something about this (though the kernel would have to forward the LASS fault correctly). I haven't had call to look into userfaultfd to know for sure, though.

Support for Intel's LASS

Posted Jan 13, 2023 17:55 UTC (Fri) by dezgeg (subscriber, #92243) [Link] (3 responses)

Patch 5 does discuss the possibility of emulating vsyscalls: https://lwn.net/ml/linux-kernel/20230110055204.3227669-5-...

Support for Intel's LASS

Posted Jan 13, 2023 18:09 UTC (Fri) by hansendc (subscriber, #7363) [Link] (2 responses)

Yeah, it's theoretically possible to emulate vsyscalls that were thwarted by LASS. But, the current emulation leverages page fault exceptions (#PF). Those are nice because page faults set a control register (CR2) to the address that faulted. That makes it dirt simple to tell if an access to the vsyscall page caused the fault: "if (is_vsyscall_vaddr(address))".

LASS produces general protection faults (#GP). Unfortunately, #GP's don't set CR2 and the CPU doesn't give great information about why the fault occurred. It's quite possible to go fetch the instruction that faulted, decode it, and figure out that it was accessing the vsyscall page. The kernel does exactly that for some #GP's. But, it's kinda icky, and is best avoided.

But, if someone *REALLY* cares deeply, please do speak up.
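To make the #PF versus #GP difference concrete, here is a small Python model of the address check, not kernel code. The vsyscall page address (0xffffffffff600000), the 4KB page size, and the vector numbers (14 for #PF, 13 for #GP) are the standard x86-64 values; classify_fault() and its return strings are hypothetical and exist only to illustrate the point that a #GP arrives without a faulting address.

```python
PAGE_SIZE = 4096
U64_MASK = (1 << 64) - 1
PAGE_MASK = ~(PAGE_SIZE - 1) & U64_MASK

# Fixed address of the legacy vsyscall page on x86-64.
VSYSCALL_ADDR = 0xFFFFFFFFFF600000

PF_VECTOR = 14  # page fault: CR2 holds the faulting address
GP_VECTOR = 13  # general protection fault: no faulting address reported


def is_vsyscall_vaddr(address):
    """True if the address lies within the vsyscall page."""
    return (address & PAGE_MASK) == VSYSCALL_ADDR


def classify_fault(vector, cr2=None):
    """Hypothetical model of what a handler can conclude from each fault."""
    if vector == PF_VECTOR and cr2 is not None:
        # With #PF, the check is a single comparison against CR2.
        return "vsyscall" if is_vsyscall_vaddr(cr2) else "other"
    if vector == GP_VECTOR:
        # With #GP there is no address; the handler would have to fetch
        # and decode the faulting instruction to learn more.
        return "unknown"
    return "other"
```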

Support for Intel's LASS

Posted Jan 13, 2023 21:59 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (1 responses)

In the case of a vsyscall, wouldn't the #GP have a saved instruction pointer in the vsyscall page (LASS documentation says "the fault is reported on the branch target, not the branch instruction")?

Such an RIP would only be reachable with a call or jmp instruction, and if it was a call then the return address would already be on the stack. All you'd have to do would be to invoke the system call, replace RIP with a word popped off the stack and go back to userspace.

Not that it's a good idea. :)
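For illustration only, the steps above can be modeled in a few lines of Python. This is a toy simulation under the assumption that the saved RIP really does point into the vsyscall page, which a later comment questions. The register dictionary, the memory dictionary, and emulate_vsyscall() are all hypothetical. The slot layout (gettimeofday at offset 0, time at 0x400, getcpu at 0x800) and the syscall numbers are the usual x86-64 ones.

```python
VSYSCALL_ADDR = 0xFFFFFFFFFF600000

# Slot index within the vsyscall page -> (name, x86-64 syscall number).
VSYSCALL_SLOTS = {
    0: ("gettimeofday", 96),
    1: ("time", 201),
    2: ("getcpu", 309),
}


def vsyscall_slot(rip):
    """Return the slot index if rip is a valid vsyscall entry, else None."""
    if (rip & ~0xC00) != VSYSCALL_ADDR:
        return None
    slot = (rip & 0xC00) >> 10
    return slot if slot in VSYSCALL_SLOTS else None


def emulate_vsyscall(regs, memory):
    """Toy model: pick the syscall, then 'return' by popping the stack.

    regs is a dict with 'rip' and 'rsp'; memory maps addresses to
    64-bit words. A real handler would invoke the system call where
    the comment below says so.
    """
    slot = vsyscall_slot(regs["rip"])
    if slot is None:
        return None
    name, nr = VSYSCALL_SLOTS[slot]
    # (the system call numbered nr would be invoked here)
    regs["rip"] = memory[regs["rsp"]]  # return address pushed by CALL
    regs["rsp"] += 8                   # pop it off the stack
    return name, nr
```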

Support for Intel's LASS

Posted Jan 17, 2023 16:31 UTC (Tue) by luto (guest, #39314) [Link]

I assume what’s going on is that the CPU will fault on any attempt to set RIP to an address in the wrong half of the address space.

Intel has an unfortunate history of designing CPUs that validate RIP when setting RIP instead of when using RIP. This results in rather unfortunate bugs^Woutcomes when doing creative things like putting a SYSCALL instruction at the very top of the lower half of the address space. The SYSCALL works fine and sets RCX (the saved pointer to the subsequent instruction) to RIP+2, which is noncanonical. This is fine (from a very narrowly focused perspective) because RCX isn’t RIP. A subsequent SYSRET will try to set RIP to the saved value and fault. This is fine because it’s how the CPU works (which is an excuse for almost anything), but it’s barely documented. The fault will cause an exception frame to be written to the user RSP, because that’s how SYSRET works (see above about excuses). The result is privilege escalation.

AMD generally seems more sensible in this regard.
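The arithmetic behind the SYSCALL example can be checked with a short Python sketch. It assumes 48-bit virtual addresses (4-level paging), so the top of the lower half is 0x00007fffffffffff; SYSCALL is a two-byte instruction (0F 05). The helper names are made up for illustration.

```python
U64_MASK = (1 << 64) - 1
SYSCALL_LEN = 2  # SYSCALL encodes as the two bytes 0F 05


def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits-1) are all zeros or all ones."""
    upper = (addr & U64_MASK) >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


def top_of_lower_half(va_bits=48):
    """Highest canonical address in the lower half (assumes va_bits)."""
    return (1 << (va_bits - 1)) - 1


# Place SYSCALL so that its last byte is the last byte of the lower half.
syscall_rip = top_of_lower_half() - (SYSCALL_LEN - 1)

# SYSCALL saves the address of the next instruction in RCX.
saved_rcx = syscall_rip + SYSCALL_LEN

print(hex(syscall_rip), is_canonical(syscall_rip))
print(hex(saved_rcx), is_canonical(saved_rcx))
```

The instruction itself sits at a canonical address, but the value saved in RCX is one past the top of the lower half and is therefore noncanonical, which is the value SYSRET later tries to load into RIP.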

Support for Intel's LASS

Posted Jan 13, 2023 18:03 UTC (Fri) by hansendc (subscriber, #7363) [Link] (7 responses)

Even without rebooting, you can also see if vsyscall emulation is being used:

echo 1 > /sys/kernel/debug/tracing/events/vsyscall/emulate_vsyscall/enable
cat /sys/kernel/debug/tracing/trace_pipe

Running tools/testing/selftests/x86/test_vsyscall_64 will let you know whether the tracing is working or not.

BTW, if you run across a real program that cares, please do let us know.

Support for Intel's LASS

Posted Jan 13, 2023 18:42 UTC (Fri) by adobriyan (subscriber, #30858) [Link] (6 responses)

> BTW, if you run across a real program that cares, please do let us know.

Building RHEL6 kernel in a container requires vsyscall=emulate.

Support for Intel's LASS

Posted Jan 13, 2023 20:07 UTC (Fri) by geofft (subscriber, #59789) [Link] (3 responses)

The manylinux project (widely-compatible base ABI for Python builds) ran into this with CentOS 6, too.

The solution we ended up going with was patching glibc to remove vsyscall support. The build scripts for that appear to be here: https://github.com/pypa/manylinux/tree/v2022.07.10-manyli...

You can probably use the pre-built quay.io/pypa/manylinux2010_x86_64_centos6_no_vsyscall:2020-12-19-cd3e980 container, which contains the result of that build. For your use case of compiling RHEL 6 kernels, that should work.

I also wrote a userspace vsyscall emulator using ptrace as an alternative: https://github.com/pypa/manylinux/pull/158/files It definitely will cause a performance hit because every syscall will trap into the ptracer, but for the commenter above who has a proprietary program, this might be what you need. (Though, really, this should only be a problem for proprietary programs that make syscalls directly, e.g. by being static binaries; if they call into the system libc to make syscalls, then using a newer libc should be enough.)

Support for Intel's LASS

Posted Jan 17, 2023 17:50 UTC (Tue) by luto (guest, #39314) [Link] (2 responses)

That hack seems unlikely to work if LASS is enabled.

Support for Intel's LASS

Posted Jan 17, 2023 18:54 UTC (Tue) by geofft (subscriber, #59789) [Link] (1 responses)

LASS generates a GPF if you access something with the high bit set, right? Wouldn't that show up to userspace as a SIGBUS or something? You'd probably have to change the == SIGSEGV check in the code, but as long as it sends a catchable signal, there should be a way to make it work.

(Of course if you can use a non-vsyscall libc, that would be better....)

Support for Intel's LASS

Posted Jan 17, 2023 21:40 UTC (Tue) by luto (guest, #39314) [Link]

I’m still hoping for clarification, but I’m suspicious that RIP will point to the CALL into the vsyscall page, not into the vsyscall page.

Support for Intel's LASS

Posted Jan 14, 2023 9:07 UTC (Sat) by dottedmag (subscriber, #18590) [Link]

Isn't RHEL6 ELS ended on Nov 30, 2022?

Support for Intel's LASS

Posted Jan 18, 2023 0:25 UTC (Wed) by judas_iscariote (guest, #47386) [Link]

Correct, it is an annoying bug on a distribution that is out of support. Burn it with fire!