Support for Intel's LASS
Speculative execution happens when the CPU is unable to complete an instruction because it needs data that is not resident in its caches. Rather than just wait for that data to be fetched from RAM, the CPU will make a guess as to its value and continue running in speculative mode. If the guess turns out to be correct — which happens surprisingly often — the CPU will have avoided a stall and will be ahead of the game; otherwise, the work that was done speculatively is thrown out and the computation restarts.
This technique is crucial for getting reasonable performance out of current CPUs, but it turns out to have a security cost: speculative execution is allowed to access data that would be denied to code running normally. A CPU will be able to speculatively read data, despite permissions denying that access in the page tables, without generating a fault. That data is never made available to the running process, but accessing it can create state changes (such as loading data into the cache) that can be detected by a hostile program and used to exfiltrate data that should not be readable. In response, kernel developers have adopted a number of techniques, including address-space isolation and preemptive cache clearing, to block these attacks, but those mitigations can have a substantial performance cost.
LASS partially addresses the speculative-execution problem by wiring some address-space-management policy into the hardware. A look at, for example, the Linux x86-64 address-space layout shows that all kernel-space addresses begin with 0xffff. More to the point, they all have the highest-order (sign) bit set, while all user-space addresses have that bit clear. Linux is not the only kernel to partition the 64-bit address space in this way. LASS uses this convention (and, indeed, requires it) to provide some hardware-based address-space isolation.
Specifically, when LASS is enabled, the CPU will intercept any user-mode reference to an address with the sign bit set, or any kernel-mode access with that bit clear. In other words, it prevents either mode from accessing addresses that, according to the sign bit, belong to the other mode. Crucially, this policy is applied early in the execution of an instruction. Normal page protections can only be read (and, thus, enforced) by traversing the page-table hierarchy, which produces timing and cache artifacts. LASS can trap a forbidden access simply by looking at the address, without any reference to the page tables, yielding constant timing and avoiding any internal state changes. And this test is easily performed during speculative execution as well.
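In pseudocode terms, the check amounts to nothing more than comparing the sign bit of the address against the privilege of the access; the following is a sketch of the policy as described above, not actual hardware logic:

#include <stdbool.h>
#include <stdint.h>

/* Sketch of the LASS check: the half of the address space implied by
 * the sign bit must match the mode of the access. No page-table walk
 * is needed, so the test has constant timing. */
static bool lass_permits(uint64_t vaddr, bool supervisor_mode)
{
	bool kernel_half = (vaddr >> 63) != 0;	/* sign bit set? */

	return kernel_half == supervisor_mode;	/* mismatch means a fault */
}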
Of course, adding a new protection mechanism like this requires adaptation in the kernel, which must disable LASS when it legitimately needs to access user-space memory. Most of the infrastructure needed to handle this is already in place, since supervisor-mode access prevention must be handled in a similar way. There is a problem, though, with the vsyscall mechanism, which is a virtual system-call implementation. The vsyscall area is hardwired to be placed between the virtual addresses 0xffffffffff600000 and 0xffffffffff601000. Since the sign bit is set in those addresses, LASS will block accesses from user mode, preventing vsyscalls from working. LASS is thus mutually exclusive with vsyscalls; if one is enabled, the other must be disabled. Vsyscalls have long since been replaced by the vDSO, but there may be old versions of the C library out there that still use them. If LASS support is merged, distributors will have to decide which feature to enable by default.
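For illustration, here is roughly what a legacy, pre-vDSO call into the vsyscall page looked like from user space. This is a sketch in C; old C libraries made this call from assembly, and actually running it requires a kernel with the vsyscall page enabled:

#include <sys/time.h>

/* The fixed vsyscall page; gettimeofday() is the first entry in it. */
#define VSYSCALL_ADDR 0xffffffffff600000UL

typedef int (*vgtod_t)(struct timeval *tv, struct timezone *tz);

int legacy_gettimeofday(struct timeval *tv)
{
	/* A user-mode jump to an address with the sign bit set: exactly
	 * the kind of access that LASS forbids. */
	return ((vgtod_t)VSYSCALL_ADDR)(tv, NULL);
}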
LASS should be able to protect against speculative attacks where user space is attempting to extract information from the kernel — Meltdown-based attacks in particular. It may not directly block most Spectre-based attacks, which generally involve speculative execution entirely in kernel space, but it may still be good enough to block the cache-based covert channels used to get information out of the kernel. The actual degree of protection isn't specified in the patches, though, leading Dave Hansen to ask for more information:
LASS seemed really cool when we were reeling from Meltdown. It would *obviously* have been a godsend five years ago. But, it's less clear what role it plays today and how important it is.
If LASS can allow some of the more expensive Meltdown and Spectre mitigations to be turned off without compromising security, it seems worth having. But, for now, nobody has said publicly which mitigations, if any, are rendered unnecessary by LASS.
In any case, it is not possible to buy a CPU that supports LASS now; it will be necessary to wait until processors from the "Sierra Forest" line become available. Once those CPUs get out to where they can be tested, the value of LASS will, hopefully, become clearer. Until then, the development community will have to do its best to decide whether a partial fix to speculative-execution problems is better than the current state of affairs.
Index entries for this article:
Kernel: Architectures/x86
Kernel: Security/Meltdown and Spectre
Posted Jan 13, 2023 16:35 UTC (Fri) by mb (subscriber, #50428) (15 responses)

> out there that still use them. If LASS support is merged, distributors will have to decide which feature to enable by default.

I have a couple of very old proprietary applications that still work fine. How can I check if these use vsyscall?
And will it be possible to disable LASS on a per-process basis?
Posted Jan 13, 2023 16:52 UTC (Fri) by corbet (editor, #1) (6 responses)

For testing applications, you could try booting with vsyscall=none and see if they still work. There's probably a better way, but I don't know it offhand.
LASS is system-wide, so it can't be controlled on a per-process basis, at least in the posted implementation.
Posted Jan 13, 2023 17:07 UTC (Fri) by mb (subscriber, #50428)

I'll try that.
Posted Jan 13, 2023 17:46 UTC (Fri) by dullfire (guest, #111432) (4 responses)
However, I bet you could trap those specific faults (I would imagine a LASS violation looks like a page fault to the kernel? I haven't read its docs, but it has to raise some sort of exception) and, if they point at the vsyscall address, just jump to the corresponding vDSO address.
Of course it would be slow, but old apps would still work.
Alternatively, you might be able to get userfaultfd to do something about this (though the kernel would have to forward the LASS fault correctly). I haven't had call to look into userfaultfd to know for sure, though.
Posted Jan 13, 2023 17:55 UTC (Fri) by dezgeg (subscriber, #92243) (3 responses)
Posted Jan 13, 2023 18:09 UTC (Fri) by hansendc (subscriber, #7363) (2 responses)
LASS produces general protection faults (#GP). Unfortunately, #GP's don't set CR2 and the CPU doesn't give great information about why the fault occurred. It's quite possible to go fetch the instruction that faulted, decode it, and figure out that it was accessing the vsyscall page. The kernel does exactly that for some #GP's. But, it's kinda icky, and is best avoided.
But, if someone *REALLY* cares deeply, please do speak up.
Posted Jan 13, 2023 21:59 UTC (Fri) by pbonzini (subscriber, #60935) (1 response)
If the faulting RIP is inside the vsyscall page, there is nothing to decode. Such an RIP would only be reachable with a call or jmp instruction, and if it was a call then the return address would already be on the stack. All you'd have to do would be to invoke the system call, replace RIP with a word popped off the stack, and go back to userspace.
Not that it's a good idea. :)
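For the curious, the RIP-based emulation described above could be sketched like this. It is purely illustrative: the regs structure, the emulate_*() functions, and copy_from_user_u64() are hypothetical stand-ins, not the kernel's actual code:

#include <stdbool.h>
#include <stdint.h>

/* Stand-in types and helpers so the sketch is self-contained; the
 * kernel's real pt_regs and user-access functions differ. */
struct regs { uint64_t ip, sp, ax; };
static uint64_t emulate_gettimeofday(struct regs *r) { (void)r; return 0; } /* stub */
static uint64_t emulate_time(struct regs *r) { (void)r; return 0; }         /* stub */
static uint64_t emulate_getcpu(struct regs *r) { (void)r; return 0; }       /* stub */
static int copy_from_user_u64(uint64_t *dst, uint64_t uaddr)
{ (void)uaddr; *dst = 0; return 0; }                                        /* stub */

#define VSYSCALL_START 0xffffffffff600000UL
#define VSYSCALL_SIZE  0x1000UL

/* Called on a #GP: if the faulting RIP lies in the vsyscall page, the
 * page offset alone says which vsyscall it was; nothing to decode. */
static bool try_emulate_vsyscall(struct regs *regs)
{
	uint64_t ret_addr;

	if (regs->ip - VSYSCALL_START >= VSYSCALL_SIZE)
		return false;			/* some other #GP */

	switch (regs->ip & 0xc00) {		/* entries are 1024 bytes apart */
	case 0x000: regs->ax = emulate_gettimeofday(regs); break;
	case 0x400: regs->ax = emulate_time(regs); break;
	case 0x800: regs->ax = emulate_getcpu(regs); break;
	default:    return false;
	}

	/* Emulate the return from the call: pop the saved return address. */
	if (copy_from_user_u64(&ret_addr, regs->sp))
		return false;
	regs->ip = ret_addr;
	regs->sp += 8;
	return true;
}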
Posted Jan 17, 2023 16:31 UTC (Tue) by luto (guest, #39314)
Intel has an unfortunate history of designing CPUs that validate RIP when setting RIP instead of when using RIP. This results in rather unfortunate bugs^Woutcomes when doing creative things like putting a SYSCALL instruction at the very top of the lower half of the address space. The SYSCALL works fine and sets RCX (the saved pointer to the subsequent instruction) to RIP+2, which is noncanonical. This is fine (from a very narrowly focused perspective) because RCX isn’t RIP. A subsequent SYSRET will try to set RIP to the saved value and fault. This is fine because it’s how the CPU works (which is an excuse for almost anything), but it’s barely documented. The fault will cause an exception frame to be written to the user RSP, because that’s how SYSRET works (see above about excuses). The result is privilege escalation.
AMD generally seems more sensible in this regard.
Posted Jan 13, 2023 18:03 UTC (Fri) by hansendc (subscriber, #7363) (7 responses)

There is a tracepoint that fires when a vsyscall is emulated. Enable it with:

echo 1 > /sys/kernel/debug/tracing/events/vsyscall/emulate_vsyscall/enable

then watch for events with:

cat /sys/kernel/debug/tracing/trace_pipe

Running tools/testing/selftests/x86/test_vsyscall_64 will let you know whether the tracing is working or not.
BTW, if you run across a real program that cares, please do let us know.
Posted Jan 13, 2023 18:42 UTC (Fri) by adobriyan (subscriber, #30858) (6 responses)
Building RHEL6 kernel in a container requires vsyscall=emulate.
Posted Jan 13, 2023 20:07 UTC (Fri) by geofft (subscriber, #59789) (3 responses)
The solution we ended up going with was patching glibc to remove vsyscall support. The build scripts for that appear to be here: https://github.com/pypa/manylinux/tree/v2022.07.10-manyli...
You can probably use the pre-built quay.io/pypa/manylinux2010_x86_64_centos6_no_vsyscall:2020-12-19-cd3e980 container, which contains the result of that build. For your use case of compiling RHEL 6 kernels, that should work.
I also wrote a userspace vsyscall emulator using ptrace as an alternative: https://github.com/pypa/manylinux/pull/158/files It definitely will cause a performance hit because every syscall will trap into the ptracer, but for the commenter above who has a proprietary program, this might be what you need. (Though, really, this should only be a problem for proprietary programs that make syscalls directly, e.g. by being static binaries; if they call into the system libc to make syscalls, then using a newer libc should be enough.)
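The heart of such a ptracer might look roughly like this. This is a simplified sketch, not the code linked above; it assumes the kernel was booted with vsyscall=none, so that calls into the vsyscall page arrive as SIGSEGV stops, and run_equivalent_syscall() is a hypothetical helper:

#include <errno.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

#define VSYS_BASE 0xffffffffff600000UL

/* Hypothetical: perform the real syscall on the child's behalf and
 * write any results into the child with PTRACE_POKEDATA. */
static long run_equivalent_syscall(pid_t pid, struct user_regs_struct *regs)
{ (void)pid; (void)regs; return 0; } /* stub */

/* Handle one SIGSEGV stop in the traced child; returns nonzero if the
 * fault was a call into the vsyscall page and has been emulated. */
static int emulate_vsyscall_stop(pid_t pid)
{
	struct user_regs_struct regs;
	long ret_addr;

	if (ptrace(PTRACE_GETREGS, pid, 0, &regs) != 0)
		return 0;
	if ((regs.rip & ~0xfffULL) != VSYS_BASE)
		return 0;			/* unrelated fault; deliver the signal */

	regs.rax = run_equivalent_syscall(pid, &regs);

	/* "Return" from the faulting call: pop the saved return address. */
	errno = 0;
	ret_addr = ptrace(PTRACE_PEEKDATA, pid, (void *)(uintptr_t)regs.rsp, 0);
	if (ret_addr == -1 && errno != 0)
		return 0;
	regs.rip = (unsigned long long)ret_addr;
	regs.rsp += 8;
	return ptrace(PTRACE_SETREGS, pid, 0, &regs) == 0;
}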
Posted Jan 17, 2023 17:50 UTC (Tue) by luto (guest, #39314) (2 responses)
Posted Jan 17, 2023 18:54 UTC (Tue) by geofft (subscriber, #59789) (1 response)
(Of course if you can use a non-vsyscall libc, that would be better....)
Posted Jan 17, 2023 21:40 UTC (Tue) by luto (guest, #39314)
Posted Jan 14, 2023 9:07 UTC (Sat) by dottedmag (subscriber, #18590)
Posted Jan 18, 2023 0:25 UTC (Wed) by judas_iscariote (guest, #47386)
Posted Jan 14, 2023 6:50 UTC (Sat) by epa (subscriber, #39769) (3 responses)
Posted Jan 14, 2023 11:00 UTC (Sat) by matthias (subscriber, #94967) (2 responses)

Yes, this is not very clear. The data in paragraph 1 is different from the data in paragraph 3. I will sketch how Meltdown works as an example:

if (a != 0) {
	if ((*b & 0x1) == 0) {
		load c;	/* brings c into the cache */
	} else {
		load d;	/* brings d into the cache */
	}
}

a is 0, but it is not in the cache, and the CPU speculates that a is not 0. As the speculation is mostly statistics, one can force this speculation. The pointer b points to non-accessible memory (e.g. kernel memory). Based on the value of *b, either c or d is loaded into the cache. Normally the access to *b would trigger a segfault, but as a is 0, the CPU detects at some point that this was all just speculation; it ignores the fault and continues as if nothing had happened.
Now one can access c and d, measure the time this takes, and conclude which of the two has been loaded into the cache. This gives away one bit of *b.
So speculative execution does affect the cache. After all, you gain the most advantage if the value (c or d) is already on its way to the cache at the point in time when the value of a finally arrives at the CPU.
The data that is not in the cache, and whose value is the subject of speculation, is a. The data that is discovered is (one bit of) *b, and the data that is loaded into the cache is either c or d. The presence of c or d in the cache is used to discover the value of *b.
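As a minimal sketch, the timing probe in the last step could look like this; it is illustrative only, since a real attack would flush the cache first and average many runs:

#include <stdint.h>
#include <x86intrin.h>

/* Time one load; a cache hit is far faster than a miss. */
static uint64_t time_access(volatile uint8_t *p)
{
	unsigned int aux;
	uint64_t start = __rdtscp(&aux);
	(void)*p;
	return __rdtscp(&aux) - start;
}

/* Whichever of c and d loads faster was brought in speculatively,
 * revealing the low bit of *b. */
static int recovered_bit(volatile uint8_t *c, volatile uint8_t *d)
{
	return time_access(c) > time_access(d);
}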
Posted Jan 15, 2023 12:11 UTC (Sun) by ballombe (subscriber, #9523) (1 response)

Then it computes *b, detects the fault, but still continues with the load?
Posted Jan 15, 2023 13:06 UTC (Sun) by matthias (subscriber, #94967)
Posted Jan 14, 2023 9:22 UTC (Sat) by josh (subscriber, #17465) (1 response)
Posted Jan 14, 2023 11:02 UTC (Sat) by matthias (subscriber, #94967)
Posted Jan 17, 2023 17:45 UTC (Tue) by wtarreau (subscriber, #51152)
Posted Jan 19, 2023 18:47 UTC (Thu) by anton (subscriber, #25547)

As described, LASS makes no sense that I can see. It only fixes a part of Meltdown (the vulnerability where programs could extract data from mapped, but PROT_NONEd, pages (typically user code reading kernel pages)).
According to Intel, they put Meltdown fixes into Coffee Lake Refresh in 2018 (AMD was never affected) and, of course, into later CPUs. Given that all CPUs that will get LASS already have the Meltdown fixes, what's the point of LASS?
LASS would also prevent the kernel from accessing user memory, which AFAIK would be a problem for the kernel; e.g., how would write(2) access the user-mode buffer that the kernel has to read in order to write it to a file?