Support for Intel's LASS
Speculative execution happens when the CPU is unable to complete an instruction because it needs data that is not resident in its caches. Rather than just wait for that data to be fetched from RAM, the CPU will make a guess as to its value and continue running in speculative mode. If the guess turns out to be correct — which happens surprisingly often — the CPU will have avoided a stall and will be ahead of the game; otherwise, the work that was done speculatively is thrown out and the computation restarts.
This technique is crucial for getting reasonable performance out of current CPUs, but it turns out to have a security cost: speculative execution is allowed to access data that would be denied to code running normally. A CPU will be able to speculatively read data, despite permissions denying that access in the page tables, without generating a fault. That data is never made available to the running process, but accessing it can create state changes (such as loading data into the cache) that can be detected by a hostile program and used to exfiltrate data that should not be readable. In response, kernel developers have adopted a number of techniques, including address-space isolation and preemptive cache clearing, to block these attacks, but those mitigations can have a substantial performance cost.
LASS partially addresses the speculative-execution problem by wiring some address-space-management policy into the hardware. A look at, for example, the Linux x86-64 address-space layout shows that all kernel-space addresses begin with 0xffff. More to the point, they all have the highest-order (sign) bit set, while all user-space addresses have that bit clear. Linux is not the only kernel to partition the 64-bit address space in this way. LASS uses this convention (and, indeed, requires it) to provide some hardware-based address-space isolation.
Specifically, when LASS is enabled, the CPU will intercept any user-mode reference to an address with the sign bit set, or any kernel-mode access with that bit clear. In other words, it prevents either mode from accessing addresses that, according to the sign bit, belong to the other mode. Crucially, this policy is applied early in the execution of an instruction. Normal page protections can only be read (and, thus, enforced) by traversing the page-table hierarchy, which produces timing and cache artifacts. LASS can trap a forbidden access simply by looking at the address, without any reference to the page tables, yielding constant timing and avoiding any internal state changes. And this test is easily performed during speculative execution as well.
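In pseudocode terms, the check amounts to nothing more than comparing the sign bit of the address against the privilege of the access; the following is a sketch of the policy as described above, not actual hardware logic:

#include <stdbool.h>
#include <stdint.h>

/* Sketch of the LASS check: the half of the address space implied by
 * the sign bit must match the mode of the access. No page-table walk
 * is needed, so the test has constant timing. */
static bool lass_permits(uint64_t vaddr, bool supervisor_mode)
{
	bool kernel_half = (vaddr >> 63) != 0;	/* sign bit set? */

	return kernel_half == supervisor_mode;	/* mismatch means a fault */
}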
Of course, adding a new protection mechanism like this requires adaptation in the kernel, which must disable LASS when it legitimately needs to access user-space memory. Most of the infrastructure needed to handle this is already in place, since supervisor-mode access prevention must be handled in a similar way. There is a problem, though, with the vsyscall mechanism, which is a virtual system-call implementation. The vsyscall area is hardwired to be placed between the virtual addresses 0xffffffffff600000 and 0xffffffffff601000. Since the sign bit is set in those addresses, LASS will block accesses from user mode, preventing vsyscalls from working. LASS is thus mutually exclusive with vsyscalls; if one is enabled, the other must be disabled. Vsyscalls have long since been replaced by the vDSO, but there may be old versions of the C library out there that still use them. If LASS support is merged, distributors will have to decide which feature to enable by default.
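For illustration, here is roughly what a legacy, pre-vDSO call into the vsyscall page looked like from user space. This is a sketch in C; old C libraries made this call from assembly, and actually running it requires a kernel with the vsyscall page enabled:

#include <sys/time.h>

/* The fixed vsyscall page; gettimeofday() is the first entry in it. */
#define VSYSCALL_ADDR 0xffffffffff600000UL

typedef int (*vgtod_t)(struct timeval *tv, struct timezone *tz);

int legacy_gettimeofday(struct timeval *tv)
{
	/* A user-mode jump to an address with the sign bit set: exactly
	 * the kind of access that LASS forbids. */
	return ((vgtod_t)VSYSCALL_ADDR)(tv, NULL);
}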
LASS should be able to protect against speculative attacks where user space is attempting to extract information from the kernel — Meltdown-based attacks in particular. It may not directly block most Spectre-based attacks, which generally involve speculative execution entirely in kernel space, but it may still be good enough to block the cache-based covert channels used to get information out of the kernel. The actual degree of protection isn't specified in the patches, though, leading Dave Hansen to ask for more information:
LASS seemed really cool when we were reeling from Meltdown. It would *obviously* have been a godsend five years ago. But, it's less clear what role it plays today and how important it is.
If LASS can allow some of the more expensive Meltdown and Spectre mitigations to be turned off without compromising security, it seems worth having. But, for now, nobody has said publicly which mitigations, if any, are rendered unnecessary by LASS.
In any case, it is not possible to buy a CPU that supports LASS now; it will be necessary to wait until processors from the "Sierra Forest" line become available. Once those CPUs get out to where they can be tested, the value of LASS will, hopefully, become clearer. Until then, the development community will have to do its best to decide whether a partial fix to speculative-execution problems is better than the current state of affairs.
Index entries for this article:
Kernel: Architectures/x86
Kernel: Security/Meltdown and Spectre
Posted Jan 13, 2023 16:35 UTC (Fri) by mb (subscriber, #50428) (15 responses)

> out there that still use them. If LASS support is merged, distributors will have to decide which feature to enable by default.

I have a couple of very old proprietary applications that still work fine. How can I check if these use vsyscall?
And will it be possible to disable LASS on a per-process basis?
Posted Jan 13, 2023 16:52 UTC (Fri) by corbet (editor, #1) (6 responses)

For testing applications, you could try booting with vsyscall=none and see if they still work. There's probably a better way, but I don't know it offhand.
LASS is system-wide, so it can't be controlled on a per-process basis, at least in the posted implementation.
Posted Jan 13, 2023 17:07 UTC (Fri) by mb (subscriber, #50428)

I'll try that.
Posted Jan 13, 2023 17:46 UTC (Fri) by dullfire (guest, #111432) (4 responses)
However, I bet you could trap those specific faults (I would imagine a LASS violation looks like a page fault to the kernel? I haven't read its docs, but it has to raise some sort of exception) and, if they point at the vsyscall address, just jump to the corresponding vDSO address.
Of course it would be slow, but old apps would still work.
Alternatively, you might be able to get userfaultfd to do something about this (though the kernel would have to forward the LASS fault correctly). I haven't had call to look into userfaultfd to know for sure, though.
Posted Jan 13, 2023 17:55 UTC (Fri) by dezgeg (subscriber, #92243) (3 responses)
Posted Jan 13, 2023 18:09 UTC (Fri) by hansendc (subscriber, #7363) (2 responses)
LASS produces general protection faults (#GP). Unfortunately, #GP's don't set CR2 and the CPU doesn't give great information about why the fault occurred. It's quite possible to go fetch the instruction that faulted, decode it, and figure out that it was accessing the vsyscall page. The kernel does exactly that for some #GP's. But, it's kinda icky, and is best avoided.
But, if someone *REALLY* cares deeply, please do speak up.
Posted Jan 13, 2023 21:59 UTC (Fri) by pbonzini (subscriber, #60935) (1 response)
If the faulting RIP is inside the vsyscall page, there is nothing to decode. Such an RIP would only be reachable with a call or jmp instruction, and if it was a call then the return address would already be on the stack. All you'd have to do would be to invoke the system call, replace RIP with a word popped off the stack, and go back to userspace.
Not that it's a good idea. :)
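For the curious, the RIP-based emulation described above could be sketched like this. It is purely illustrative: the regs structure, the emulate_*() functions, and copy_from_user_u64() are hypothetical stand-ins, not the kernel's actual code:

#include <stdbool.h>
#include <stdint.h>

/* Stand-in types and helpers so the sketch is self-contained; the
 * kernel's real pt_regs and user-access functions differ. */
struct regs { uint64_t ip, sp, ax; };
static uint64_t emulate_gettimeofday(struct regs *r) { (void)r; return 0; } /* stub */
static uint64_t emulate_time(struct regs *r) { (void)r; return 0; }         /* stub */
static uint64_t emulate_getcpu(struct regs *r) { (void)r; return 0; }       /* stub */
static int copy_from_user_u64(uint64_t *dst, uint64_t uaddr)
{ (void)uaddr; *dst = 0; return 0; }                                        /* stub */

#define VSYSCALL_START 0xffffffffff600000UL
#define VSYSCALL_SIZE  0x1000UL

/* Called on a #GP: if the faulting RIP lies in the vsyscall page, the
 * page offset alone says which vsyscall it was; nothing to decode. */
static bool try_emulate_vsyscall(struct regs *regs)
{
	uint64_t ret_addr;

	if (regs->ip - VSYSCALL_START >= VSYSCALL_SIZE)
		return false;			/* some other #GP */

	switch (regs->ip & 0xc00) {		/* entries are 1024 bytes apart */
	case 0x000: regs->ax = emulate_gettimeofday(regs); break;
	case 0x400: regs->ax = emulate_time(regs); break;
	case 0x800: regs->ax = emulate_getcpu(regs); break;
	default:    return false;
	}

	/* Emulate the return from the call: pop the saved return address. */
	if (copy_from_user_u64(&ret_addr, regs->sp))
		return false;
	regs->ip = ret_addr;
	regs->sp += 8;
	return true;
}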
Posted Jan 17, 2023 16:31 UTC (Tue) by luto (guest, #39314)
Intel has an unfortunate history of designing CPUs that validate RIP when setting RIP instead of when using RIP. This results in rather unfortunate bugs^Woutcomes when doing creative things like putting a SYSCALL instruction at the very top of the lower half of the address space. The SYSCALL works fine and sets RCX (the saved pointer to the subsequent instruction) to RIP+2, which is noncanonical. This is fine (from a very narrowly focused perspective) because RCX isn’t RIP. A subsequent SYSRET will try to set RIP to the saved value and fault. This is fine because it’s how the CPU works (which is an excuse for almost anything), but it’s barely documented. The fault will cause an exception frame to be written to the user RSP, because that’s how SYSRET works (see above about excuses). The result is privilege escalation.
AMD generally seems more sensible in this regard.
Posted Jan 13, 2023 18:03 UTC (Fri) by hansendc (subscriber, #7363) (7 responses)

There is a tracepoint that fires when a vsyscall is emulated. Enable it with:

echo 1 > /sys/kernel/debug/tracing/events/vsyscall/emulate_vsyscall/enable

then watch for events with:

cat /sys/kernel/debug/tracing/trace_pipe

Running tools/testing/selftests/x86/test_vsyscall_64 will let you know whether the tracing is working or not.
BTW, if you run across a real program that cares, please do let us know.
Posted Jan 13, 2023 18:42 UTC (Fri) by adobriyan (subscriber, #30858) (6 responses)
Building RHEL6 kernel in a container requires vsyscall=emulate.
Posted Jan 13, 2023 20:07 UTC (Fri) by geofft (subscriber, #59789) (3 responses)
The solution we ended up going with was patching glibc to remove vsyscall support. The build scripts for that appear to be here: https://github.com/pypa/manylinux/tree/v2022.07.10-manyli...
You can probably use the pre-built quay.io/pypa/manylinux2010_x86_64_centos6_no_vsyscall:2020-12-19-cd3e980 container, which contains the result of that build. For your use case of compiling RHEL 6 kernels, that should work.
I also wrote a userspace vsyscall emulator using ptrace as an alternative: https://github.com/pypa/manylinux/pull/158/files It definitely will cause a performance hit because every syscall will trap into the ptracer, but for the commenter above who has a proprietary program, this might be what you need. (Though, really, this should only be a problem for proprietary programs that make syscalls directly, e.g. by being static binaries; if they call into the system libc to make syscalls, then using a newer libc should be enough.)
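The heart of such a ptracer might look roughly like this. This is a simplified sketch, not the code linked above; it assumes the kernel was booted with vsyscall=none, so that calls into the vsyscall page arrive as SIGSEGV stops, and run_equivalent_syscall() is a hypothetical helper:

#include <errno.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

#define VSYS_BASE 0xffffffffff600000UL

/* Hypothetical: perform the real syscall on the child's behalf and
 * write any results into the child with PTRACE_POKEDATA. */
static long run_equivalent_syscall(pid_t pid, struct user_regs_struct *regs)
{ (void)pid; (void)regs; return 0; } /* stub */

/* Handle one SIGSEGV stop in the traced child; returns nonzero if the
 * fault was a call into the vsyscall page and has been emulated. */
static int emulate_vsyscall_stop(pid_t pid)
{
	struct user_regs_struct regs;
	long ret_addr;

	if (ptrace(PTRACE_GETREGS, pid, 0, &regs) != 0)
		return 0;
	if ((regs.rip & ~0xfffULL) != VSYS_BASE)
		return 0;			/* unrelated fault; deliver the signal */

	regs.rax = run_equivalent_syscall(pid, &regs);

	/* "Return" from the faulting call: pop the saved return address. */
	errno = 0;
	ret_addr = ptrace(PTRACE_PEEKDATA, pid, (void *)(uintptr_t)regs.rsp, 0);
	if (ret_addr == -1 && errno != 0)
		return 0;
	regs.rip = (unsigned long long)ret_addr;
	regs.rsp += 8;
	return ptrace(PTRACE_SETREGS, pid, 0, &regs) == 0;
}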
Posted Jan 17, 2023 17:50 UTC (Tue) by luto (guest, #39314) (2 responses)
Posted Jan 17, 2023 18:54 UTC (Tue) by geofft (subscriber, #59789) (1 response)
(Of course if you can use a non-vsyscall libc, that would be better....)
Posted Jan 17, 2023 21:40 UTC (Tue) by luto (guest, #39314)
Posted Jan 14, 2023 9:07 UTC (Sat) by dottedmag (subscriber, #18590)
Posted Jan 18, 2023 0:25 UTC (Wed) by judas_iscariote (guest, #47386)
Posted Jan 14, 2023 6:50 UTC (Sat) by epa (subscriber, #39769) (3 responses)
Posted Jan 14, 2023 11:00 UTC (Sat) by matthias (subscriber, #94967) (2 responses)

Yes, this is not very clear. The data in paragraph 1 is different from the data in paragraph 3. I will sketch how Meltdown works as an example:

if (a != 0) {
	if ((*b & 0x1) == 0) {
		load c;	/* brings c into the cache */
	} else {
		load d;	/* brings d into the cache */
	}
}

a is 0, but it is not in the cache, and the CPU speculates that a is not 0. As the speculation is mostly statistics, one can force this speculation. The pointer b points to non-accessible memory (e.g. kernel memory). Based on the value of *b, either c or d is loaded into the cache. Normally the access to *b would trigger a segfault, but as a is 0, the CPU detects at some point that this was all just speculation; it ignores the fault and continues as if nothing had happened.
Now one can access c and d, measure the time this takes, and conclude which of the two has been loaded into the cache. This gives away one bit of *b.
So speculative execution does affect the cache. After all, you gain the most advantage if the value (c or d) is already on its way to the cache at the point in time when the value of a finally arrives at the CPU.
The data that is not in the cache, and whose value is the subject of speculation, is a. The data that is discovered is (one bit of) *b, and the data that is loaded into the cache is either c or d. The presence of c or d in the cache is used to discover the value of *b.
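As a minimal sketch, the timing probe in the last step could look like this; it is illustrative only, since a real attack would flush the cache first and average many runs:

#include <stdint.h>
#include <x86intrin.h>

/* Time one load; a cache hit is far faster than a miss. */
static uint64_t time_access(volatile uint8_t *p)
{
	unsigned int aux;
	uint64_t start = __rdtscp(&aux);
	(void)*p;
	return __rdtscp(&aux) - start;
}

/* Whichever of c and d loads faster was brought in speculatively,
 * revealing the low bit of *b. */
static int recovered_bit(volatile uint8_t *c, volatile uint8_t *d)
{
	return time_access(c) > time_access(d);
}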
Posted Jan 15, 2023 12:11 UTC (Sun) by ballombe (subscriber, #9523) (1 response)

Then it computes *b, detects the fault, but still continues with the load?
Posted Jan 15, 2023 13:06 UTC (Sun) by matthias (subscriber, #94967)
Posted Jan 14, 2023 9:22 UTC (Sat) by josh (subscriber, #17465) (1 response)
Posted Jan 14, 2023 11:02 UTC (Sat) by matthias (subscriber, #94967)
Posted Jan 17, 2023 17:45 UTC (Tue) by wtarreau (subscriber, #51152)
Posted Jan 19, 2023 18:47 UTC (Thu) by anton (subscriber, #25547)

As described, LASS makes no sense that I can see. It only fixes a part of Meltdown (the vulnerability where programs could extract data from mapped, but PROT_NONEd, pages (typically user code reading kernel pages)).
According to Intel, they put Meltdown fixes into Coffee Lake Refresh in 2018 (AMD was never affected) and, of course, into later CPUs. Given that all CPUs that will get LASS already have the Meltdown fixes, what's the point of LASS?
LASS would also prevent the kernel from accessing user memory, which AFAIK would be a problem for the kernel; e.g., how would write(2) access the user-mode buffer that the kernel has to read in order to write it to a file?