LWN.net Weekly Edition for October 18, 2018
Welcome to the LWN.net Weekly Edition for October 18, 2018
This edition contains the following feature content:
- A farewell to email: email has many failings; projects are considering moving to other communications mechanisms in response.
- A new direction for i965: from XDC: a new and faster i965 graphics driver.
- Fighting Spectre with cache flushes: a novel technique for blocking potential Spectre exploits.
- OpenPGP signature spoofing using HTML: some email clients make it easy to spoof PGP signatures on incoming mail.
- I/O scheduling for single-queue devices: which is the right I/O scheduler to use by default on slower devices?
- Secure key handling using the TPM: a way to manage a large number of keys using the trusted platform module.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
A farewell to email
The free-software community was built on email, a distributed technology that allows people worldwide to communicate regardless of their particular software environment. While email remains at the core of many projects' workflow, others are increasingly trying to move away from it. A couple of recent examples show what is driving this move and where it may be headed.
Email is certainly not without its problems. For many of us, plowing through the daily email stream is an ongoing chore. Development lists like linux-kernel can easily exceed 1,000 messages per day; it is thus unsurprising that the number of kernel developers who actually follow such lists has been dropping over time. Email is based on a trust model from a simpler time; now it is overwhelmed by spam, trolls, sock puppets, and more. Dealing with the spam problem alone is a constant headache for mailing-list administrators. Interacting productively via email requires acquiring a set of habits and disciplines that many find counterintuitive and tiresome. Your editor's offspring see email as something to use to communicate with their grandparents, and not much more.
It is thus not surprising that some projects are thinking about alternative ways of communicating. Even projects like the kernel, which remains resolutely tied to email, are seeing some experimentation around the edges. Some, though, are diving in more seriously, with a couple of recent experiments being found in the Fedora and Python projects.
On the Fedora side, project leader Matthew Miller recently proposed moving the council-discuss list to Fedora's Discourse instance. The idea was proposed as a sort of trial, with the hope that it can "help increase engagement" in council discussions. Sanja Bonic added some reasons why one might expect that to happen. She also added that mailing lists provide no control over where messages are kept and make it impossible to delete material — attributes that others will, instead, see as being an advantage of email.
Meanwhile, a significant part of the Python community was surprised at the end of September when Łukasz Langa posted a missive titled "python-committers is dead, long live discuss.python.org". It stated that "we have reached the limits of what is possible with mailing lists" and fingered email as contributing to some of the project's community-management problems. He pitched the advantages of Python's Discourse instance, including the mobile application, community moderation, the ability to move discussions between categories, rich media support, code syntax highlighting, dynamic notifications, social login support, and "emojis". The email asked all members of the python-committers list to cease using it immediately and start talking on Discourse instead.
This move was not a welcome surprise to everybody involved; some thought
that the timing — while the project is trying to conclude a fundamental discussion on its leadership
model — was not ideal, and that the new system was being imposed on the
community without discussion. The result is that the conversation split, with
some posters moving over to Discourse and others remaining on the list.
Here, too, there has been discussion of the advantages and disadvantages of
each mode of communication.
Proponents of email value the ability to choose their own tools and
workflows; many of us who deal with large amounts of email have come up
with highly optimized setups that allow this work to be done efficiently.
Email can often be processed with scripts, and the ability to maintain
local archives can be highly useful. Mailing lists sometimes offer other
mechanisms, such as NNTP access, that facilitate quick reading. Many
people also appreciate the simple fact that email comes to the reader
rather than having to be explicitly looked for on a discussion site, as Máirín Duffy commented in the Fedora discussion. For that and other reasons, she worried that a switch to a system like Discourse could make bringing in new contributors harder rather than easier.
Proponents of a switch note (though not in so many words) that the requisite dopamine hits have been thoughtfully provided: there are various
email notification mechanisms for new topics and such, and users by default
get handy notifications every time somebody "likes" one of their posts.
The mobile app makes engagement from handsets easier. There is a graduated
trust model that allows proven community members to take on some of the
moderation task, taking the load off of list administrators and community
managers; Python developer Brett Cannon looks forward to having such features available. He concluded by suggesting that the project has "simply gotten too big for email".
In both the Fedora and Python cases, the move to Discourse has been put
forward as an experiment that may or may not be rolled back, but going back
can be hard. Neal Gompa suggested that OpenMandriva's shift to Discourse was "largely a failure" when it comes to developer discussions (it worked better for user support), but the project still has not gone back to using a mailing list for those discussions. Moving a project to a new communication medium is hard; declaring defeat and moving back can be even harder, especially since people will have become attached to the numerous aspects of the new system that do actually work better.
Regardless of how these specific experiments work out, one conclusion is
clear. Even the people who are most tied to email are finding it
increasingly unworkable in the world we have grown into. Administering
an email installation while blocking spam and ensuring that one's emails
actually get delivered is getting harder; fewer people and organizations
want to take it on. As a result, our formerly decentralized email system
is becoming more centralized at a tiny number of email providers. If
"email" means "Gmail", it starts to lose its advantage over other
centralized sites.
As others have often said, what we need is a modern replacement for email —
some sort of decentralized solution that preserves the advantages of email
while being suitable to the 2018 Internet and attractive to our children.
Various projects have been working in this area for years, and some,
like Mastodon, appear to be
gaining some traction. But none have made any real progress toward
replacing email as a large-scale collaboration mechanism.
Your editor's free-software experience began by installing software and
applying patches received via email over Usenet. In the far-too-many
intervening years, some things have changed significantly, but the primacy
of email for development discussions remains unchallenged for many
projects. But the writing is on the wall; many mailing lists have already
gone dark as patch discussions have moved to centralized hosting systems,
and other types of conversation are starting to shift to systems like
Discourse. These systems may not be everything that we could hope for, and
they are likely to significantly slow down work for many of us. But they
also represent important experiments that will, with luck, lead to better
communications in the future. Email is not dead, but neither is FTP; soon
we may look at them both in about the same way.
A new direction for i965
Graphical applications are always pushing the limits of what the hardware
can do, and
recent developments in the graphics world have caused Intel to rethink its
3D graphics driver. In particular, the lower CPU overhead that the Vulkan
driver on Intel hardware can
provide is becoming more attractive for OpenGL as well. At the 2018 X.Org Developers Conference (XDC), Kenneth
Graunke
talked about an experimental re-architecting of the i965 driver using Gallium3D—a
development that came as something of a surprise to many, including him.
Graunke has been working on the Mesa
project for eight years or so; most of
that time, he has focused on the Intel 3D drivers. There are some
"exciting changes" in the Intel world that he wanted to present to the
attendees, he said.
CPU overhead has become more of a problem over the last few years. Any
time that the
driver spends doing its work is time that is taken away from the
application. There has been a lot of Vulkan adoption, with its lower CPU
overhead, but there are still lots of OpenGL applications out there. So he
wondered if the CPU overhead for OpenGL could be reduced.
Another motivation is virtual reality (VR). Presenting VR content is a
race against time, so there is no time to waste on driver overhead. In
addition, Intel has integrated graphics, where the CPU and GPU
share the same power envelope; if the CPU needs more power, the GPU
cannot be clocked as high as it could be.
Using less CPU leads to more watts available
for GPU processing.
For the Intel drivers, profilers show that "draw-time
has always been [...] the volcanically hot
path" and, in particular, state upload (sending the state of the OpenGL
context to the GPU) is the major component of that.
Handling state
There are three different approaches to handling state upload in an OpenGL
driver that he wanted to compare, he said. OpenGL is often seen as a
"mutable state machine"; it has a context that has a "million different
settings that you can tweak". He likens it to an audio mixing board, which
has lots of different knobs that each do something different. At its
heart, OpenGL programs are setting these knobs, drawing, then setting them
and drawing again—over and over.
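As a much-simplified illustration of that pattern, a frame boils down to a loop of state changes and draw calls. In the sketch below, the scene_object structure and draw_scene() function are invented for the example (and a GL 2.0+ header or loader is assumed); the GL calls themselves are the standard API:
#include <GL/gl.h>

/* Hypothetical scene description; only the fields used below. */
struct scene_object {
    GLuint shader;
    GLuint texture;
    GLsizei vertex_count;
};

/* Sketch of the "set the knobs, then draw" pattern described above. */
static void draw_scene(const struct scene_object *objects, unsigned int count)
{
    for (unsigned int i = 0; i < count; i++) {
        glUseProgram(objects[i].shader);                    /* knob: which shader */
        glBindTexture(GL_TEXTURE_2D, objects[i].texture);   /* knob: bound texture */
        glEnable(GL_BLEND);                                 /* knob: blending on */
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);  /* knob: blend function */
        glDrawArrays(GL_TRIANGLES, 0, objects[i].vertex_count);
    }
}
The driver must turn the current values of all of those knobs into GPU commands for every one of those draw calls.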
The first way to handle state tracking is with "state streaming":
translate the knobs that have been changed and send those down to the GPU.
This assumes that it is not worth reusing any state from previous draws; since
state is changing all the time and every draw call could be completely new,
these drivers just translate as little as possible of the state changes, as
efficiently as possible, before sending them to the GPU.
Early on, he asked his mentor about why state was not being reused and was
told that cache lookups are too expensive. Essentially the context and
state have to be turned into some kind of hash key that gets looked up in
the cache. If there is a miss, that state needs to be recalculated and
sent to the GPU, so you may as well just have done the translation. This
is what i965 does "and it works OK", but it does make draw-time dominate,
which leads to efforts to shave microseconds off the draw-time.
But he started thinking about this idea of "not reusing state" some. He put
up an image
from a game he
has been playing lately and noted that it is drawn 60 times per second; the
scene is fairly static. So
the same objects get drawn many, many times. It is not all static, of
course, so maybe a character walks into the scene in shiny armor and you
need to figure out how to display that in the first frame, but the next 59
frames can reuse that. "This 'I've never seen your state before' idea is
kinda bunk", he said.
The second mechanism is the one that Vulkan uses, which is to have
"pre-baked pipelines". The idea is to create pipeline objects for each
kind of object displayed in the scene. That makes draw-time "dirt cheap"
because you just bind a pipeline and draw, over and over again. If the
applications are set up to do this, "it is wonderful", but OpenGL applications
are not, so it is not really an option.
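For comparison, here is a minimal sketch of the Vulkan model; the scene_object structure and record_draws() function are invented for the example, while the vkCmd* calls are the standard Vulkan API:
#include <vulkan/vulkan.h>

/* Hypothetical per-object data: a pre-baked pipeline plus a vertex count. */
struct scene_object {
    VkPipeline pipeline;
    uint32_t vertex_count;
};

/* Sketch: with pre-baked pipelines, draw time is just bind-and-draw. */
static void record_draws(VkCommandBuffer cmd,
                         const struct scene_object *objects, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++) {
        vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS,
                          objects[i].pipeline);  /* all state was baked in ahead of time */
        vkCmdDraw(cmd, objects[i].vertex_count, 1, 0, 0);
    }
}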
The third option, surprisingly, is Gallium3D (or simply "Gallium"), he
said. He has been
learning that it is basically a hybrid approach between state streaming and
pre-baked pipelines. It uses "constant state objects" (CSOs), which are
immutable objects that capture a part of the GPU state and can be cached
across multiple draw operations. CSOs are essentially a Vulkan pipeline
that has been chopped up into pieces that can be mixed and matched as
needed. Things like the blending state, rasterization mode, viewport, and
shader would each have their own CSO. The driver can associate the actual
GPU commands needed to achieve that state with the CSO.
The Gallium state tracker essentially converts the OpenGL mutable API into
the immutable CSOs that make up the Gallium world. That means the driver
really only has to deal with CSOs, while Gallium handles the messy OpenGL context.
The state tracker
looks at the OpenGL context, tracks what's dirty, and ideally rediscovers
cached CSOs for the new state. The state tracker helps "get rid of a bunch
of nonsense from the API", Graunke said. For example, handling the
different coordinate systems between the driver, GPU, and the window system
is much
simplified using Gallium. That simplifying can be done before the cache
lookup occurs, which may mean more cache hits for CSOs.
A consistent critique of Gallium is that it adds an extra layer into the
driver path. While that's true, there is a lot of work done in a Classic
(or state-streaming) driver to convert the context into GPU commands. There is
far less work needed to set things up to be looked up in the cache for a
Gallium driver, but if there is a cache miss, there will be additional work
needed to create (and cache) the new CSO. But even if the sum of those two
steps is larger for Gallium, and it generally is, the second step is
skipped much of the time,
which means that a Gallium-based driver may well be more efficient.
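The trade-off can be illustrated with a small, self-contained sketch (toy code, not Mesa's implementation): building the lookup key is cheap, and the expensive translation into GPU commands only runs on a cache miss:
#include <stdint.h>
#include <string.h>

/* Toy "blend state" and a tiny direct-mapped CSO cache, for illustration only. */
struct blend_state { uint32_t enable, src_factor, dst_factor; };
struct cso { struct blend_state key; uint32_t hw_commands[8]; int valid; };

static struct cso cache[64];

static uint32_t hash_state(const struct blend_state *s)
{
    /* Cheap key construction: just mix the fields together. */
    return (s->enable * 31 + s->src_factor) * 31 + s->dst_factor;
}

static void translate_to_hw(const struct blend_state *s, uint32_t out[8])
{
    /* Stand-in for the expensive translation to GPU commands. */
    memset(out, 0, 8 * sizeof(uint32_t));
    out[0] = s->enable;
    out[1] = s->src_factor;
    out[2] = s->dst_factor;
}

/* Return GPU commands for this state, translating only on a miss. */
static const uint32_t *lookup_or_create(const struct blend_state *s)
{
    struct cso *slot = &cache[hash_state(s) % 64];

    if (!slot->valid || memcmp(&slot->key, s, sizeof(*s)) != 0) {
        slot->key = *s;                        /* miss: do the expensive work once */
        translate_to_hw(s, slot->hw_commands);
        slot->valid = 1;
    }
    return slot->hw_commands;                  /* hit most of the time in a static scene */
}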
i965
The i965 driver is one of the only non-Gallium Mesa drivers. Graunke said
that the developers knew it could be better than it is in terms of its CPU
usage. The code itself is
pretty efficient, but the state tracking is too coarse-grained, which means
that this efficient code executes too often. Most of the workloads they see are
GPU bound, so they spent a lot of time improving the compiler, adding color
compression, improving cache utilization, and
the like to make the GPU processing more efficient.
But, CPU usage could be improved and "people loved to point that out", he
said. It was a source of criticism from various Intel teams internally,
but there was also Twitter shaming from Vulkan fans. The last straw for him was
data showing that the i965 driver was "obliterated" by the performance of
the AMD RadeonSI
driver on a microbenchmark. That started him on the path toward seeing what
could be done to fix the CPU side of the equation.
A worst case for i965 is when an application binds a new texture or
modifies one. The i965 driver does not have good tracking for the texture
state, so it has to do a bunch of retranslation for every other texture and image that is bound in any shader stage.
It is a lot of
stuff to do for a relatively simple operation. Reusing some state would
help a lot, but it is hard to do for surprising reasons.
Back in the "bad old days of Intel hardware", there was one virtual GPU
address space for all
processes. The driver told the kernel about its buffers and the kernel
allocated addresses for them. But the commands needed to refer to those
buffers using pointers when the addresses had not yet been assigned, so
the driver gave
the kernel a list
of pointers in those buffers that needed to be patched up when the buffer
was assigned to an address. Intel GPUs save the last known GPU state in a
hardware
context that could potentially be reused, but it includes pointers to unpatched
addresses, so no state that
involves pointers can be saved and reused. The texture
state has pointers, which leads to the worst case he described.
Modern hardware does not suffer from these constraints. More recent Intel GPUs
have 256TB of virtual GPU address space per process. The "softpin" kernel
feature available since Linux 4.5 allows user space to assign the
virtual addresses and never have to worry about them being
changed. That allows the state to be pre-baked or inherited even if it
has pointers.
Other changes also make state reuse much easier.
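At the uAPI level, softpin amounts to setting a flag and an address in the execbuffer object for a buffer. The sketch below gives the general idea; the helper function, buffer handle, and address are made up for the example, while the structure and flags come from the kernel's i915 uAPI header (i915_drm.h, as shipped with libdrm):
#include <i915_drm.h>   /* i915 uAPI: drm_i915_gem_exec_object2, EXEC_OBJECT_PINNED */
#include <string.h>

/* Sketch: with softpin, user space chooses the GPU virtual address itself. */
static void setup_softpinned_object(struct drm_i915_gem_exec_object2 *obj,
                                    unsigned int bo_handle)
{
    memset(obj, 0, sizeof(*obj));
    obj->handle = bo_handle;               /* a GEM buffer created earlier */
    obj->offset = 0x100000000ull;          /* the GPU address it should keep (example) */
    obj->flags  = EXEC_OBJECT_PINNED |     /* use .offset as given; no relocation */
                  EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
    /* Because the address never changes, saved hardware contexts and reused
     * state can safely contain pointers into this buffer. */
}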
So that led to the obvious conclusion that state reuse needed to be added
to the i965 driver. But in order to do that, an architectural overhaul was
needed. It required a fundamental reworking of the state upload code.
Trying to do that in the production driver was "kinda miserable".
Adding these kinds of changes incrementally is difficult. Enterprise
kernels also complicate things, since there are customers using new
hardware on way-old kernels—features like softpin are not available on those.
Beyond that, the driver needs to support old hardware that still has all of
the memory constraints, so he would need to support both old and new memory
management and state management in the same driver.
Enter Gallium
He finally realized that Gallium gave him "a way out of all of these
problems". In retrospect that all may seem kind of obvious, so why didn't
Intel do this switch earlier? In looking back, though, Gallium never
seemed to be the answer to the problems the developers were facing along
the way. It didn't magically get them from OpenGL 2.1 to 4.5;
that required lots of feature work. The shader compiler story was lacking
or not really viable (because it was based on TGSI). It
also didn't solve their driver performance
problems or enable new hardware.
Switching to Gallium would have allowed the driver to support multiple
APIs, but that was not really of interest at the time. All in all, it
looked like a
huge pile of work for no real gain. But things have changed, Graunke said.
Gallium has gotten a lot better due to lots of work by the Mesa community.
Threading support has been added. NIR
is a viable option instead of TGSI. In addition, the i965 driver has
gotten more modular due to Vulkan. It seemed possible to switch to
Gallium, but it was still a huge effort, and he wanted to be sure that it
would actually pay off to do so.
That led to his big experiment. In November 2017, he started over from
scratch using the noop driver as a template. He borrowed ideas from the
Vulkan driver and focused on just the latest hardware and kernel versions.
He wanted to be free to experiment so he did not publicize his work, though
he did put it in his Git repository in January. If it did not work out, he
wanted to be able to scrap it without any fanfare.
After ten months, he is ready to declare the experiment a success. The
Iris driver is available in his repository (this list of
commits is what he pointed to). It is primarily of interest to driver
developers at this point and is not ready for regular users.
One interesting note is that Iris uses no TGSI; most drivers use some TGSI,
rather than translating everything to NIR, but Iris did not take that
approach.
It only
supports Skylake (Gen9+) hardware and kernel 4.16+, though he could
make it work for any 4.5+ kernel if needed.
The driver passes 87% of the piglit OpenGL tests. It can run some
applications, but others run into bugs. There are several missing features
that are being worked on or at least thought about at this point. But
enough is there that he can finally get an answer to whether switching to
Gallium makes sense; the performance can be measured and the numbers will
be in the right ballpark to allow conclusions to be drawn.
Results
He put up performance numbers on the draw overhead using piglit for five
different scenarios (which can be seen in his slides
[PDF] and there is more description of the benchmark and results in the
YouTube video of the
talk).
It went from roughly two million draw calls per second to over nine million
in the
simplest case; in the worst case (from the perspective of the i965 driver)
it went from half a million to almost nine million.
On average, Iris can do 5.45x more draw calls per second than
i965. Those are good numbers, but using threaded contexts results in even
better numbers (6.48x for the simple case and 20.8x for the worst), though
he cautioned that support for threaded contexts in Iris is still
not stable, so those numbers should be taken with a grain of salt.
The question "is Iris worthwhile?" can be answered with a "yes". But
reducing the CPU overhead when most workloads are GPU bound may not truly
reflect reality. The microbenchmark he used is "kind of the ideal
case for a Gallium driver" since it uses back-to-back draws that are
hitting the CSO cache. That said, going back to his observations about
displaying a game, it may well be representative for the 59 of 60 frames per
second where little changes. There is a need to measure real programs; one demo
he ran on a low-power Atom-based system was 19% faster, but a bunch of
others showed no difference with i965 at all. "So, your mileage may vary,
but it's still pretty exciting", he said.
Graunke believes that this work has settled the Classic
versus Gallium driver debate in favor of the latter. Also, the Gallium
interface is
much nicer to work with than the Classic interface. He does not regret
the path Intel took, but he is excited about the future; Iris is a much better
architecture for that future. In addition, he believes that
RadeonSI, and now Iris, have basically debunked the myth that Mesa itself is
slow. i965 may be slow, but that is not really an indictment of Mesa.
There is a lot of work left to do and lots of bugs to fix. He needs to
finish getting the piglit tests running as well as doing so for the conformance test
suites (CTS) for OpenGL. He has started running the CTS, which is looking
good so far. He still needs to test lots of applications and there is work
to be done cleaning up some of his Gallium hacks before the driver can go
upstream.
Beyond that, he wants to look at Iris performance on real applications and
compare it to i965 to see if there are places where Iris can be made even
faster. He would like to use FrameRetrace
on application data
as part of that process.
Now that Intel has joined the rest of the community in using Gallium, that
is probably a good
opportunity for the whole community to think about
where Mesa should go from here. All of the drivers will be Gallium-based
moving forward, so the community can collaborate with a focus on further
Gallium enhancements. Gallium is not the ideal
infrastructure (nor is Classic), but by dreaming about what could come in
the future, the Mesa community can evolve it into something awesome, he said.
In the Q&A, he was asked about moving more toward a Vulkan-style
driver. Graunke noted that there are several efforts to implement OpenGL
on Vulkan and that he is interested to see where they go. There is
something of an impedance mismatch between the baked-in pipelines of Vulkan
and the wildly mutable OpenGL world and it is not clear to him whether that
can be resolved reasonably. For Iris, he chose the route that RadeonSI had
taken and proven, but if the Vulkan efforts pan out, that could be
something to look at down the road.
[I would like to thank the X.Org Foundation and LWN's travel sponsor, the
Linux Foundation, for travel assistance to A Coruña for XDC.]
Fighting Spectre with cache flushes
One of the more difficult aspects of the Spectre hardware vulnerability is finding all of the locations in the code that might be exploitable. There are many locations that look vulnerable that aren't, and others that are exploitable without being obvious. It has long been clear that finding all of the exploitable spots is a long-term task, and keeping new ones from being introduced will not be easy. But there may be a simple technique that can block a large subset of the possible exploits with a minimal cost.
Speculative-execution vulnerabilities are only exploitable if they leave a
sign somewhere else in the system. As a general rule, that "somewhere
else" is the CPU's memory cache. Speculative execution can be used to load
data into the cache (or not) depending on the value of the data the
attacker is trying to exfiltrate; timing attacks can then be employed to
query the state of the cache and complete the attack. This side channel is a
necessary part of any speculative-execution exploit.
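The cache-probing half of the side channel rests on a simple primitive: timing a single load, since a cached line returns much faster than one that must come from memory. A harmless sketch of that measurement, assuming an x86 CPU and GCC or Clang intrinsics, looks like this:
#include <stdint.h>
#include <x86intrin.h>

/* Time one load; a cached line is far faster than one fetched from DRAM. */
static uint64_t time_one_load(const volatile uint8_t *p)
{
    unsigned int aux;
    uint64_t start = __rdtscp(&aux);   /* read the timestamp counter */
    (void)*p;                          /* the probe load */
    uint64_t end = __rdtscp(&aux);
    return end - start;
}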
It has thus been clear from the beginning that one way of blocking these
attacks is to flush the memory caches at well-chosen times, clearing out
the exfiltrated information before the attacker can get to it. That is,
unfortunately, an
expensive thing to do. Flushing the cache after every system call would
likely block a wide range of speculative attacks, but it would also slow
the system to the point that users would be looking for ways to turn the
mechanism off. Security is all-important — except when you have to get
some work done.
Kristen Carlson Accardi recently posted a
patch that is based on an interesting observation. Attacks using
speculative execution involve convincing the processor
to speculate down a path that non-speculative execution will not follow.
For example, a kernel function may contain a bounds check that will prevent the
code from accessing beyond the end of an array, causing an error to be
returned instead. An attack using the Spectre vulnerability will bypass
that check speculatively, accessing data that the code was specifically
(and correctly) written not to access.
In other words, the attack is doing something speculatively that, when the
speculation is unwound, results in an error return to the calling program —
but, by then, the damage is done. The error return is a clue that
there may be something inappropriate going on. So Accardi's patch will, in
the case of certain error returns from system calls, flush the L1 processor
cache before returning to user space. In particular, the core of the
change looks like this:
__visible inline void l1_cache_flush(struct pt_regs *regs)
{
    if (IS_ENABLED(CONFIG_SYSCALL_FLUSH) &&
        static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
        if (regs->ax == 0 || regs->ax == -EAGAIN ||
            regs->ax == -EEXIST || regs->ax == -ENOENT ||
            regs->ax == -EXDEV || regs->ax == -ETIMEDOUT ||
            regs->ax == -ENOTCONN || regs->ax == -EINPROGRESS)
            return;
        wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
    }
}
The code exempts some of the most common errors from the cache-flush
policy, which makes sense. Errors like EAGAIN and ENOENT
are common in normal program execution but are not the sort of errors that
are likely to be generated by speculative attacks; one would expect
an error like EINVAL in such cases. So exempting those errors
should significantly reduce the cost of this mitigation without
significantly reducing the protection that it provides.
(Of course, the code as written above doesn't quite work right, as was pointed
out by Thomas Gleixner, but the fix is easy and the posted patch shows
the desired result.)
Alan Cox argued in favor of this patch.
Andy Lutomirski is
not convinced, though. He argued that there are a number of possible
ways around this protection. An attacker running on a hyperthreaded
sibling could attempt to get the data out of the L1 cache between the
speculative exploit and the cache flush, though Cox said that the time
window available would be difficult to hit. Fancier techniques, such as
loading the cache lines of interest onto a different CPU and watching to
see when they are "stolen" by the CPU running the attack could be
attempted. Or perhaps the data of interest is still in the L2 cache and
could be probed for there. In the end, he remained unconvinced.
Answering Lutomirski's criticisms is probably necessary to get this patch
set merged. Doing so would require providing some numbers for what the
overhead of this change really is; Cox claimed
that it is "pretty much zero" but no hard numbers have been posted. The other useful piece would be to show some current exploits that would be blocked by this change.
If that information can be provided, though (and the bug in the patch fixed), flushing the L1 cache could yet prove to be a relatively cheap and effective way to block Spectre exploits that have not yet been fixed by more direct means. As a way of hardening the system overall, it seems worthy of consideration.
OpenPGP signature spoofing using HTML
Beyond just encrypting messages, and thus providing secrecy, the OpenPGP
standard also enables digitally signing messages to authenticate
the sender. Email applications and plugins usually verify these
signatures automatically and will show whether an email contains a valid
signature. However, with a surprisingly simple attack, it's often possible
to fool
users by faking — or spoofing — the indication of a valid signature using
HTML email. For example, until version 2.0.7, the Enigmail plugin for Mozilla
Thunderbird
displayed a correct and fully trusted signature as a green bar above the
actual mail content. The problem: when HTML mail is enabled, this part of the user interface can be fully controlled by the mail sender.
[Image: a real signed message in Enigmail]
[Image: a faked signature indication]
Thus an attacker can simply fake a signature by crafting an HTML mail
that will display the green bar. This can, for example, be done by using a
screen shot of a valid signature and embedding it as an image.
The attack isn't perfect: in addition to the green bar, Enigmail shows
an icon of a sealed letter in the header part of the mail. That part can't
be faked. However, the green bar is the much more visible indicator. It's
plausible to assume that many users would fall for such a fake.
After learning
about this attack, the Enigmail developers changed the behavior in version
2.0.8, which was released in August. Enigmail now displays the security
indicator above the mail headers and thus outside of an attacker's
control.
Perfect fakes
An even better fake can be achieved in the KDE application KMail. Signed mail in KMail is framed with a green bar above and below the mail. This can easily be recreated with an HTML table. If the message is viewed in a separate window, the fake is indistinguishable from a proper signature. Only a look into the message overview can uncover the fake: an icon indicates a signed mail.
Similar attacks are possible in other mail clients, too. In GNOME's Evolution, a green bar is displayed below the mail, but a faked signature has some minor display issues due to a border drawn around the mail content. A plugin for the web-mail client, Roundcube, is similarly vulnerable. In many cases, signatures using S/MIME are also vulnerable.
The examples mentioned all rely on HTML mail to fake signature
indicators. HTML mail is often seen as a security risk, which was
particularly highlighted by the EFAIL attack that was published
by a research team earlier this year. EFAIL relied largely on HTML mail as
a way to exfiltrate decrypted message content. Security-conscious users
often disable the display of HTML mail, but all of the mail clients
mentioned have the automatic display of HTML messages enabled by
default.
Text fakes
In the GPGTools plugin for Apple
Mail, this kind of attack was not
possible using the
same technique. The indication of a valid signature is displayed as a
pseudo-header. A correctly signed mail contains a header indicating
"Security: Signed" below the "To:" field. Since the mail headers are
clearly separate from the mail content it's not
possible to use HTML mail to achieve anything here.
However, it turns out
that it's possible to inject additional lines into the "Subject:" header
that look like an additional mail header. This can be achieved in two ways:
either by encoding a newline character in the subject line or by sending
multiple subject headers within one mail.
Despite being able to inject additional fake mail headers, the attacker can't inject headers below the "To:" line. Thus, the order of the headers is not correct in a fake created with this trick. The pseudo-header also contains an icon: a check mark in a circle. While some similar characters exist in the Unicode standard, this exact icon cannot be replicated by an attacker.
Despite the shortcomings of this attack, the developers of GPGTools released an update that avoids this attack vector. Apple has not yet commented on the issue that, at its core, is a bug in Apple Mail, which should not allow multi-line subjects.
The text-mode client Mutt naturally isn't vulnerable to HTML-based
attacks. Yet the most prominent indication of a signed message in Mutt is simply
the output of the GPG verification command within the mail. One can easily
send this output as part of the mail body. However, at the bottom of the
screen, there is a message confirming the signature that cannot be faked.
[Image: a real signature verification in Mutt]
[Image: a faked signature verification]
When asked about this in email, a Mutt developer confirmed that the
developers
are aware of
the issue and that several mitigations exist. Notably, enabling colors for
signed messages makes this attack almost useless, as an attacker can't
change colors of a mail in Mutt's standard settings.
It's worth mentioning that all these attacks require the attacker to know details about the victim's system, in particular, the mail client they use. However, this information often isn't secret. Many mail clients routinely include the "X-Mailer:" header that specifies which mail client and version was used.
User-interface attacks
The attacks on signatures are a case of user-interface (UI) attacks. These
don't just
affect mail clients, they can happen anywhere that an attacker can
potentially influence the look of some UI element. In the past it was possible to fake a browser's URL-entry bar in a
popup. This
is prevented in modern browsers; all browser windows —
including popups — always contain a real URL bar. Yet similar attacks
remain possible on
mobile browsers [PDF].
Other attacks of that kind are still possible. One is the picture-in-picture attack, where a web page may simply draw a new window that looks just like a whole browser within the page. Such a trick was recently used in an attempt to phish Steam user accounts.
UI attacks are nothing new, but they haven't been properly investigated
in the email space until recently. The first defense is obvious: the
security indicators need to be moved out of the attacker-controlled
space. For users, disabling HTML mail display and promptly applying
security updates to mail-handling programs — good security practices in any
case — are the best ways to avoid these fakes.
[I presented this attack in a lightning talk at the
SEC-T conference [YouTube] and was interviewed
afterward [YouTube] for a podcast.]
I/O scheduling for single-queue devices
Block I/O performance can be one of the determining factors for the performance of a system as a whole, especially on systems with slower drives. The need to optimize I/O patterns has led to the development of a long series of I/O schedulers over the years; one of the most recent of those is BFQ, which was merged during the 4.12 development cycle. BFQ incorporates an impressive set of heuristics designed to improve interactive performance, but it has, thus far, seen relatively little uptake in deployed systems. An attempt to make BFQ the default I/O scheduler for some types of storage devices has raised some interesting questions, though, on how such decisions should be made.
A bit of review for those who haven't been following the block layer
closely may be in order. There are two generations of the internal API
used between the block layer and the underlying device drivers, which we
can call "legacy" and "multiqueue". Unsurprisingly, the legacy API is
older, while the multiqueue API was first
merged in 3.13. The conversion
of block drivers to the multiqueue API has been ongoing since then, with
the SCSI subsystem only switching
over, after a false start, in the upcoming 4.19 release. Most of the
remaining holdout legacy drivers will be
converted to multiqueue in the near future, at which point the legacy
API can be expected to go away.
Several I/O schedulers exist for the legacy interface but, in practice,
only two are in common use: cfq for slower drives and
none for fast, solid-state devices. The multiqueue interface was
aimed at fast devices from the outset; it was not able to support an I/O
scheduler at all initially. That capability was added later, along with
the mq-deadline scheduler, which was essentially a forward port of
one of
the simpler legacy schedulers (deadline). BFQ, which came later,
is also a multiqueue-API scheduler.
In early October Linus Walleij posted
a patch making BFQ the default I/O scheduler for single-queue devices
driven by way of the multiqueue API.
The idea of a single-queue multiqueue device may seem a bit contradictory
at a first encounter, but one should remember that "multiqueue" refers
to the API which, unlike the legacy API, is capable of handling block
devices that implement more
than one queue in hardware (but does not require multiple queues). As more
drivers move to this API, more
single-queue devices will be driven using it. In this particular case,
Walleij is concerned with SD cards and similar devices, the kind often
found in mobile systems. The expectation is that devices with a single
hardware queue can be expected to be relatively slow, and that BFQ will
extract better performance from those devices.
The initial response
from block subsystem maintainer Jens Axboe was not entirely positive:
"
There were a few objections made to Axboe's position. Paolo Valente, the
creator of BFQ, asserted
that almost nobody understands I/O schedulers or how to choose one, so
almost everybody will stick with whatever default the system gives them.
And mq-deadline, as a default, is far worse than BFQ for such
devices, he said. Walleij added
that there are quite a few systems out there that do not use udev
at all, so a rule-based approach will not work for them. On embedded
systems where initramfs is not in use, it's currently not possible
to mount the root filesystem using BFQ at all. As an additional
practical difficulty, the number of hardware queues provided by a device is
currently not made available to udev, so it could not effect this
particular policy choice in any case (though that would be relatively
straightforward to fix).
Oleksandr Natalenko was
not impressed by the embedded-systems argument; he said that the people
building such systems know which I/O scheduler they should use and can build
their systems accordingly. Mark Brown took issue
with that view of things, though:
Walleij echoed
that view, and added that there have been many times in kernel history
where the decision was made to try to do the right thing automatically
when possible, without requiring intervention from user space.
Bart Van Assche, instead, questioned the superiority of the BFQ scheduler.
He initially claimed that
it would slow down kernel builds (a sure way to prevent your code from
being merged), but Valente challenged
that assessment. Van Assche's other concern,
though, had to do with fast solid-state SATA drives. Once SCSI switches
over to the multiqueue API, those drives will show up with a single queue,
and will thus be affected by this change. He questioned whether BFQ could
be as fast as mq-deadline in that situation, but did not present
any test results.
One other potential problem, as pointed
out by Damien Le Moal, is shingled magnetic recording (SMR) disks,
which often require that write operations arrive in a specific order. BFQ
does not provide the same ordering guarantees that mq-deadline
does, so attempts to use it with SMR drives are unlikely to lead to a high
level of user satisfaction. Valente has
a plan for how to support those drives in BFQ, but he acknowledged that
they will not work correctly now.
The discussion wound down without reaching any sort of clear conclusion.
It would appear that, before being merged, a patch of this nature would
need to gain some additional checks to ensure, at a minimum, that BFQ is
not selected for hardware that it cannot schedule properly. No such
revision has been posted as of this writing. The proponents of BFQ seem
unlikely to give up in the near future, though, so this topic seems like
one that can be expected to arise again.
Secure key handling using the TPM
Trusted Computing has not had the best
reputation over the years — Richard Stallman dubbing it "Treacherous
Computing" probably hasn't helped — though those fears of taking away
users' control of their computers have not proven to be well founded, at least yet.
But the Trusted
Platform Module, or TPM, inside your computer can do more than just
potentially enable lockdown. In our second report from
Kernel Recipes 2018,
we look at a talk from James Bottomley about how the TPM works,
how to talk to it, and how he's using it to improve his key handling.
Everyone wants to protect their secrets and, in
a modern cryptographic context, this means protecting private keys. In the most
common use of asymmetric cryptography, private keys are used to prove
identity online, so control of a private key means control of that online
identity. How damaging this can be depends on how much trust is placed in
a particular key: in some cases those keys are used to sign contracts, in
which case someone who absconds with a private key can impersonate someone on
legal documents — this is bad.
The usual solution to this is hardware security modules, nearly all of which
are USB dongles or smart cards accessed via USB. Bottomley sees
the problem with these as capacity: most USB devices can only cope with one or
two key pairs, and smart cards tend to only hold three. His poster child
in this
regard is Ted Ts'o, whose physical keyring apparently has about eleven YubiKeys on it.
Bottomley's laptop has two VPN keys, four SSH keys, three GPG keys (because of
the way he uses subkeys) and about three other keys. Twelve keys is beyond the
capacity of any USB device that he knows of.
But nearly all modern general-purpose computing hardware has a TPM in it;
laptops have had them for over ten years. The only modern computers
that don't have one, said Bottomley, are Apple devices "and nobody
here would use one of those". The TPM is a small integrated circuit
on the motherboard that the CPU can talk to, which can perform
cryptographic operations. The advantage of
using the TPM for key management is that the TPM can scale
to thousands of keys, though the mechanism by which it does this is,
as we shall see, interesting. Most of the TPMs currently in the field
are TPM 1.2 compliant. Thanks largely to Microsoft, he added, TPM 2.0 is
the up-and-coming standard; some modern Dell laptops are already shipping
with TPM 2.0, even though the standard isn't actually finalized yet.
The main problem with TPMs under Linux is that accessing them has
been a "horrifically bad" experience; anyone who tried to use them has
ended up never wanting to deal with them again. The mandated model for
dealing with TPMs is the Trusted Computing Group (TCG)
Software
Stack (TSS),
which is "pretty complicated". The Linux implementation of TSS 1.2 is TrouSerS; it was completed
in 2012, and has "bit-rotted naturally since then", but remains
the only way to use TPM 1.2 under Linux. The architecture involves a
TrouSerS library that is linked to any relevant application, which talks
to a user-space daemon (tcsd), which in turn talks to the TPM through a
kernel driver. One of the many issues Bottomley has with this is that the
daemon is
a natural point of attack for anyone looking to run off with TPM secrets. For
his day job, which is running containers in the cloud, the design is completely
unacceptable: cloud tenants will not tolerate "anything daft in security terms",
and a single daemon through which the secrets of all the system's tenants pass
is certainly
that.
The added functionality in TPM 2.0 means we can do much better
than the TCG model. For many people in the industry, departure from the
TCG model is heresy, but fortunately, said Bottomley, there are large
free-wheeling sections of the industry (i.e. Linux people) who are happy to
use any programming model that actually works. As with 1.2, the TCG
is writing
a software stack, but it's naturally even more complex than 1.2's TSS.
IBM has
had a fully-functional
TPM 2.0 stack since May 2015 and Intel is writing its own. As
of 4.12, the Linux kernel has a TPM 2.0 resource manager.
Under TPM 1.2, the asymmetric encryption algorithm was 2048-bit RSA
(RSA2048),
which is still good enough today. The hashing function was SHA-1,
which isn't good enough any more. To avoid falling into the trap of
mandating an algorithm that ages badly, TPM 2.0 features "algorithm
agility", whereby it can support multiple hashing functions
and encryption algorithms, and will use whichever the user requests.
In practice, this usually means that in addition to SHA-1 and RSA2048,
SHA-256 and a few elliptic curves are supported, and this is good enough
for another few years. When SHA-256 starts to show its age, TPM 2 can
support a move to, for example, the SHA-3 family of algorithms.
There are problems
with elliptic curves, however; most TPMs support only BN-256 and NIST P-256,
but nobody uses the former, and the NSA had a hand in the creation of
the latter, so nobody trusts it. Attempts to get more curves added tend
to run into borders; some nations don't like crypto being exported,
and others will only allow it to be imported if it's crypto they like.
Curve25519 has not
apparently even been on the TCG's radar, although Bottomley said there's
now a request into the TCG to approve it, and since it's already been
published worldwide, there is some chance that it may be generally
approved as a standard addition to the TPM.
The TPM's functions include shielded key handling, things related to
trusted operating systems
including measurement and attestation, and data sealing; Bottomley focused on
the key-handling capabilities. The TPM organizes keys in hierarchies with a
well-known key at the top; this is a key, generated by the TPM, the private half
of which is known only to the TPM itself, and the public half of which it
will give to
anyone who asks. This enables the secure exchange of secrets with the TPM. If
the TPM is reinitialized, the random internal seed from which the TPM derives
its
well-known key is regenerated, and anything the TPM has ever encrypted
becomes unusable. This is important in terms of decommissioning devices;
reinitializing the TPM is enough to render it, and anything it has ever
encrypted, harmless.
Bottomley said that the TPM is capable of scaling to thousands
of keys. While that's not exactly false, the TPM accomplishes this by not
storing any
of your keys at all. The TPM has very little memory
and can only hold around three transient keys at any one time. Key handling
is initially a matter of feeding a secret key to the TPM, which encrypts it
into a
"key blob" that only the TPM can read, which the TPM then gives back to you.
Key storage is your responsibility, but with the safety net that
these key blobs are meaningless to any TPM save yours, so they are not the
difficult-to-store dynamite that an unprotected private key is. You can
safely pass the blob in plaintext to the TPM any time you want the TPM
to perform a key operation. The TPM will not decrypt a blob and give
you the private key inside it; it will only perform operations with such
a key. This also means that if you want to make backups of a private
key, it is quite important to do it before you feed it to the TPM.
If you ever get a new system, migrating your identity to it will not
be possible unless you have these backups stored safely.
Prior to about March 2018, it was generally thought that the physical
closeness of the CPU and the TPM made the channel between them
effectively secure, at least without physical access to the hardware.
Unfortunately, this has turned out not to be true, and a software tool
called TPM Genie,
which can attack this channel, was released. TCG's response was to
develop the Enhanced
System API (ESAPI), which allows for encrypted
parameters in communication with the TPM (and for those of a suspicious
mindset, it also allows the TPM to prove it's really a TPM).
ESAPI has
its wrinkles: encrypted commands support only one encrypted argument,
which must be the first, but some secret-handling commands, such as
the one to import a private key in the first place, pass that key as
the second argument, which was apparently overlooked by the TCG.
But these problems were solved (in that
specific case, by modifying the import command so that
a randomly generated single-use symmetric key was used to encrypt the
private key, and passed in the
secure first field so that the TPM can decrypt the private key).
Having hardware support is all very well, but if user space can't easily
come to grips with it, it's useless. Support in user space has come by
recognizing that existing cryptosystems tend to use passphrase-protected
(i.e. passphrase-encrypted) key files. With fairly small modifications,
these can be passphrase-protected key blobs; possession of the passphrase allows you to have the OS feed the blob to the TPM. This turns
key-file-based systems into full two-factor authentication: if your
laptop is stolen, each blob is passphrase-encrypted and cannot be fed to
the TPM without that passphrase, and if your passphrase is compromised
it is of no use to an attacker without your physical laptop.
OpenSSL now has a TPM 2.0 engine, though there is a problem in that many
projects that use OpenSSL don't load the engine part of the API and thus cannot use engine-based keys. However, the fix is usually just
a couple of lines of extra code in any program that uses OpenSSL, to ensure
that the engine is loaded when OpenSSL is initialized. OpenVPN, for
example, has been
converted this way and can use the TPM. OpenSSH has been converted, though the
agent requires additional patching. GnuPG uses libgcrypt, not OpenSSL,
but after discussion at last year's Kernel Recipes, Bottomley has written
a TPM module for GnuPG, which he demonstrated as part of the talk. Sbsigntools
and gnome-keyring also now have TPM support.
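The needed change is typically on the order of the following sketch, which uses the standard OpenSSL engine API; the engine name "tpm2" and the key path are assumptions that will vary from setup to setup:
#include <openssl/engine.h>
#include <openssl/evp.h>

/* Sketch: load an OpenSSL engine at startup so engine-backed (e.g. TPM) keys work. */
static EVP_PKEY *load_engine_key(const char *engine_name, const char *key_path)
{
    ENGINE_load_builtin_engines();
    ENGINE *e = ENGINE_by_id(engine_name);    /* e.g. "tpm2"; the name varies by engine */

    if (e == NULL || !ENGINE_init(e))
        return NULL;
    /* The key blob is loaded through the engine rather than from a PEM file;
     * it can then be used with the normal EVP interfaces. */
    return ENGINE_load_private_key(e, key_path, NULL, NULL);
}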
Some use cases remain unsupported: the TPM is a slow, low-power device
that cannot handle hundreds of key operations per second, so it will
likely remain unusable for things like full-disk encryption.
In response to an audience question about audit, Bottomley
accepted that TPM 1.2 was poor in this regard. This allowed
a number of issues to slip through the net, such as
weak prime generation for Infineon TPMs. For TPM 2.0, the TCG
has upped its game; there is a
certification program involving a standardized test suite to check correct
operation. There is also a reference implementation, written by Microsoft,
but available under a BSD license, though apparently it is not widely
used by
TPM vendors.
The days of practical username/password authentication have already gone. The
more we can support two-factor authentication, the more secure a future we can
reasonably expect. The TPM is by no means the only way of doing this — I
shall continue to use my YubiKey over NFC, and my GPG smartcard — but as
TPM 2.0 hardware becomes more widespread it's an increasingly practical way of
doing it, and it gets utility from a part of your computer that until now
may have largely been looked on with disdain.
[We would like to thank LWN's travel sponsor, The Linux Foundation, for assistance with travel funding for Kernel Recipes.]