The morning of day two of this year's Linux
Security Summit was filled with reports from various kernel
security subsystem maintainers. Each spoke for 20 minutes or so, generally
covering progress in the last year, as well as plans for the future.
Herbert Xu reviewed some of the changes that have come for the kernel
crypto subsystem, starting with the new user-space API. Since cryptography
can be done in user space, providing an API to do it in the kernel may seem
a bit roundabout, but it is important so that user space can access
hardware crypto accelerators. The API is targeted at crypto offload devices that were
not accessible to user space before.
The interface is socket-based, so data can be sent to devices using
write() or send(). For large amounts of data,
splice() can be used for zero copy I/O. The API is "completely
extensible". It doesn't currently handle asymmetric key cryptography, for
example, but that could be easily added.
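As a rough illustration of the socket-based interface, the sketch below asks the kernel to compute a SHA-256 digest through Python's AF_ALG socket support. The helper name kernel_sha256() is made up for this example, and it assumes a Linux kernel with the user-space crypto API enabled; it returns None when that is not the case.

```python
import hashlib
import socket


def kernel_sha256(data: bytes):
    """Return the SHA-256 digest computed by the kernel, or None if AF_ALG is unavailable.

    Hypothetical helper for illustration; intended for small messages that
    fit in a single send.
    """
    if not hasattr(socket, "AF_ALG"):
        return None  # not Linux, or Python built without AF_ALG support
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            # Choose the transform: algorithm type "hash", algorithm name "sha256".
            alg.bind(("hash", "sha256"))
            # accept() returns the operation socket that actually carries data.
            op, _ = alg.accept()
            with op:
                op.sendall(data)    # hand the message to the kernel
                return op.recv(32)  # read back the 32-byte digest
    except OSError:
        return None  # e.g. the kernel lacks the user-space crypto interface


if __name__ == "__main__":
    msg = b"Linux Security Summit"
    digest = kernel_sha256(msg)
    if digest is None:
        print("AF_ALG is not available on this system")
    else:
        # Cross-check against the user-space implementation.
        print(digest.hex(), digest == hashlib.sha256(msg).digest())
```

Whether the request is served by an offload device or by a software implementation depends on which drivers the kernel has registered for the algorithm; the caller cannot tell from this interface alone.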
There is also a new user-space control interface for configuring the kernel
algorithms. For example, there are multiple AES algorithms available that
are optimized for different processors. The performance of the optimized
versions may be 20-30 times better than the generic C implementation. The
system can often figure out the right one to use, Xu said, but some
variants are not easily chosen automatically, so there is a need for this
interface. Parallelizing the crypto algorithms using pcrypt is a case in point. In
some cases, it may make sense to spread the crypto work around on different processors,
but it can sometimes degrade performance. It was designed for the IPSec
use case, but there needs to be an administrative interface to choose. That
interface is netlink-based and allows users to select the priority of the
algorithms that are used by the kernel.
Optimizations of crypto algorithms for various CPUs have also been added.
The SHA-1 algorithm has been enhanced to use the SSSE3 instructions for x86
processors, and more AES-NI modes for x86 have been added. There is now SHA
support on the VIA Nano processor as well. The arc4 cipher has added
"block" cipher support, which means that it can be handed more than a
single byte at a time (as was required before).
Support for new hardware has also been added, including picoXcell, CAAM,
s5p-sss, and ux500. Those are all non-x86 crypto offload devices.
Finally, Xu noted that asymmetric key ciphers have finally been added to
the kernel. He had wanted them for some time, but there were no in-kernel
users. Now, "thanks to IMA and module signing", there are such users, so
that code, along with hardware acceleration and a user-space interface, has
The AppArmor access control mechanism has seen some incremental
improvements over the last year, John Johansen reported. One
focus has been on eliminating the out-of-tree patches to complete the
AppArmor system. There are some "critical pieces" missing, particularly in
the upstream version of AppArmor, he said.
Several things have landed in AppArmor, including some bug fixes and the
aafs (AppArmor filesystem) introspection interface. The latter allows
programs to examine the rules and policies that have been established in
A larger set of changes has been made on the user-space side. The project
has standardized on Python, so some tools got rewritten in that language,
while others were ported to support Python 3. In addition, the policy
language has been made more consistent, and some simple shortcuts have been
added to make it easier to use.
The policy compiler has been improved as well, both in terms of memory
usage and performance. There were some test policies that could not be
compiled even on 256GB systems, but they can now be compiled on 16GB
systems. The compiler runs two to four times faster and produces policies
that are 30-50% smaller. Lastly, some basic LXC container integration has
been added to AppArmor.
There are a number of things that are "close to landing", he said. The
AppArmor mount rules, which govern the allowable devices, filesystem types, mount
points, and so on for mounting, are being tested in Ubuntu right now. The
implementation seems solid, but it would be nice to have a Linux Security
Module (LSM) hook for pivot_root(). There are some "nasty things"
that pivot_root() does with namespaces, and the LSM hook could help
The reader-writer locks used by AppArmor have been "finally" converted to
use read-copy-update (RCU), and that will be pushed upstream.
There are also some improvements to policy introspection, including adding
a directory for each profile in a given namespace. The original
introspection interface was procfs-style, but AppArmor has moved to a
sysfs-style interface, which should be more acceptable.
The policy matching engine has been cleaned up and the performance has been
improved. Some of that work has been in minimizing the size of the
policies. A new policy templating tool has been created that will build a
base policy as a starting point for administrators. There has also been
work on a sandbox, similar to the SELinux sandbox, that can dynamically
generate policies to create a chroot() or container-based sandbox
with a nested X server to isolate processes. The last of the near-term
changes is a way to mediate D-Bus access with AppArmor rules, which has been
The final category of features that Johansen presented were those that are
being worked on, but won't be merged soon. Converting the deterministic
finite automata (DFA) used in the matching engine to an extended hybrid
finite automata (eHFA) headed that list. An eHFA provides capabilities
that DFAs don't have, including variable matching and back references. The
latter is not something AppArmor is likely to use, but eHFAs do provide
better compression and performance. Another matching engine enhancement is
sharing state machines between profiles and domains, which will improve
memory usage and performance.
Beyond that, there are plans to add a "learning mode", similar to SELinux's
audit2allow, so that policies can be created from the actions of running
programs. Adding more mediation is also being worked on, including
handling environment variable filtering, inter-process communication (IPC),
and networking. Internally labeling files and other objects, so that the
matching engine does not need to run again for objects that have
been recently accessed, is also on the horizon.
In a short presentation, David Howells gave an update on the key management
subsystem in the kernel. Over the last year, the subsystem has made better
use of RCU, which will improve the scalability when using keys. In
addition, the kernel keyrings have been "made more useful" by adding
additional keyring operations such as invalidating keys and clearing
keyrings. The latter is useful for clearing the kernel DNS resolver cache,
A logon key type has been added to support CIFS multi-user mounts. Keys of
that type cannot be read from user space, so they cannot be divulged to
attackers (e.g. when the user is away from the system). The lockdep
(kernel locking validator) support has been improved, as has the garbage
collector. There is now just one garbage collector, rather than two, and a
deadlock in garbage collection has been fixed as well.
In the future, a bug where the GNOME display manager (gdm) hangs in
certain configurations will be fixed. The problem stems from a
limitation in the kernel that does not allow session keyring manipulation
from multithreaded programs. Support for a generic "crypto" key type will
also be added to support signed kernel modules.
Eric Paris prefaced his presentation by explaining that he works
on the kernel and user-space pieces of SELinux—he is "not a policy writer"—so he would be focusing on
those parts in his talk.
There have been some interesting developments in the use of SELinux over
the past year, including Red
Hat's OpenShift project
that allows multiple users to develop web applications on a single box.
SELinux is used to isolate those users from each other. In addition, he
noted the SELinux-based secure Linux
containers work that provides a "super lightweight" sandbox using
containers. "Twiddle one bit", he said, and that container-based sandbox can be converted to use KVM
Historically, SELinux has focused on containing system daemons, but that is
changing somewhat. There are a couple of user programs that are being
contained in Fedora, including running the Nautilus thumbnailing program
in a sandbox. In addition, Firefox and its plugins now have SELinux
policies to contain them for desktop users.
RHEL 5 and 6 have also received Common Criteria
certification for the virtualization profile using QEMU/KVM. SELinux
enforcement was an important part of gaining that certification.
Paris said that systemd has become SELinux-aware in a number of ways. He
likes the new init system and would like it to have more SELinux integration
in the future. The socket activation mechanism makes it easy to launch a
container on the first connection to a web port, for example. Systemd
handles launching the service automatically, so that you don't need to run
the init script directly, nor are "run-init games" needed. It is also much
easier to deal with daemons that want to use TTYs, he said. Using SELinux
enforcement in systemd means that an Apache server running as root would
not be able
to start or stop the MySQL server, or that a particular administrator would
only be able to start and stop the web server, but not the database server.
The named file
transitions feature (filename_trans) was "a little bit contentious"
when it got added to SELinux, but it "ended up being brilliant", Paris
said. The feature took ideas from AppArmor and TOMOYO and helps avoid
mislabeling files. In addition to the standard SELinux labels for objects,
policies can now use the file name to make decisions. It is just the name
of the file, not the full path that "Al Viro says doesn't exist", but it
allows proper labeling decisions to be made.
For example, the SSH daemon will create a .ssh directory when a
user sends their keys to the system using something like
ssh-copy-id. But, without filename_trans, SELinux
would have no way to know what label to put on that directory, because it
couldn't tell if it was creating .ssh or some other directory (e.g. a
directory being copied from the remote host). There used to be a daemon
that would fix the label but that was a "hacky" solution. Similarly,
SELinux policies can now distinguish between accesses to
resolv.conf and shadow. 90% of the bugs reported for
SELinux are because the label is wrong, he said, and filename_trans will
help alleviate that.
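The labeling decision described here can be modeled as a table lookup that includes the last component of the path. The following is a toy model only: the type names and the label_for_new_object() helper are hypothetical and are not taken from any shipped policy. It assumes that, without a matching rule, a new object takes the type of its parent directory.

```python
# Toy model of a named file transition; all type names are hypothetical.
NAMED_TRANSITIONS = {
    # (creating domain, parent directory type, object class, file name) -> new type
    ("sshd_t", "user_home_dir_t", "dir", ".ssh"): "ssh_home_t",
}


def label_for_new_object(domain, parent_type, obj_class, name):
    """Pick a type for a newly created object.

    Only the final name component is consulted, never the full path.
    With no matching rule, the object inherits the parent directory's type.
    """
    return NAMED_TRANSITIONS.get((domain, parent_type, obj_class, name), parent_type)


if __name__ == "__main__":
    # The daemon creating ".ssh" gets the dedicated type...
    print(label_for_new_object("sshd_t", "user_home_dir_t", "dir", ".ssh"))
    # ...while any other directory it creates keeps the default.
    print(label_for_new_object("sshd_t", "user_home_dir_t", "dir", "projects"))
```

Without the name in the key, both calls would have to return the same type, which is the mislabeling problem the feature addresses.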
There has also been a split in the SELinux policy world. The upstream
maintainers of the core SELinux policies have been slower to adopt changes
because they are concerned with "hard security goals". That means that it
can take a lot of time to get changes upstream. So, there is now a
"contrib" set of policies that affect non-core pieces. That reduces the
amount of "messy policy" that Dan Walsh has to fix for Fedora and RHEL.
Shrinking the policies is another area that has been worked on. The RHEL 6
policy is 6.8MB after it is compiled down, but the Fedora 18 policy has
shrunk to 4.8MB. The unconfined user policies were removed, as were some
duplicate policy entries, which resulted in further space savings. There
are "no real drawbacks", he said, as the new policies can do basically the
same things as the old in 65% less space.
But there are also efforts to grow the policies. There are "hundreds of
daemons and programs" that now have a default policy, which have been
incorporated into the Fedora policies. The 65% reduction number includes
"all the new stuff we added", he said.
Paris finished his talk by joking that "by far the most interesting" development in
the SELinux world recently was the new SELinux stickers that he handed out
to interested attendees.
The work on the integrity subsystem started long ago, but a lot of it has
been merged into the mainline over the years, Mimi Zohar said to begin her
report. The integrity measurement architecture (IMA) has been merged in
several pieces, starting with IMA-measurement in 2.6.30, and there is still
more to come. For example, IMA-appraisal
should be merged soon, and the IMA-directories patches have been posted for
review. In addition, digital signature support has been added for the IMA
file data measurements as well as for the extended verification module (EVM)
file metadata measurements. Beyond that, there is a patch to audit the log
file measurements that is currently in linux-next.
The integrity subsystem is going in two directions at once, Zohar said. It
is extending Trusted Boot by adding remote attestation, while also
extending Secure Boot with local integrity measurement and appraisal.
There is still more work to be done, of course.
Support for signing files (including kernel modules) needs to be added to
distributions, she said. There is also a need to ensure that anything that
gets loaded by the kernel is signed and verified. For example, files that
are loaded via the request_firmware() interface may still need to
The kernel build process also needs some work to handle signing the kernel
image and modules. For users who may not be interested in maintaining a
key pair but still want to sign their kernel, an ephemeral key pair can be
created during the build. The private key can be used to sign the image
and modules, then it can be discarded. The public key needs to be built
into the kernel for module verification. There is also a need for a safe
mechanism to store that public key in the UEFI key database for Secure
Boot, she said.
The TOMOYO LSM was added in the 2.6.30 kernel as an alternative mandatory
access control (MAC) mechanism, maintainer Tetsuo Handa said. That was based on
version 2.2 of TOMOYO, and the 3.2 kernel has been updated to use TOMOYO
3.5. There have been no major changes to TOMOYO since the January release
Handa mostly wanted to discuss adding hooks to the LSM API to protect against
attacks. Those hooks would also allow TOMOYO to run in parallel
with other LSMs, he said. By checking the binfmt handler
permissions in those hooks, and possibly sanitizing the arguments to the
handler, one could thwart some kinds of shellcode execution. James Morris
and others seemed somewhat skeptical about that approach, noting that
attackers would just adapt to the restrictions.
Those hooks are also useful for Handa's latest project, the CaitSith
LSM. He believes that customers are finding it too difficult to configure
SELinux, so they are mostly disabling it. CaitSith is one of a number of
different approaches he has tried (including TOMOYO) to attack that problem.
In a talk entitled "Smack veers mobile", Casey Schaufler looked at the
improvements to the LSM, while pointing to the mobile device space as
one of its main users. The security models in the computing industry are
changing, he said. Distributions, users, files, and system administrators
are "out", while operating systems, user experience, apps, and resources
are "in". That shift is largely caused by the recent emphasis on mobile
devices.
For Smack, there have been "a few new things" over the last year. There is
now an interface for user space to ask Smack to do an access check, rather
than wait for a denial. One can write a query to /smack/access,
then read back the access decision. Support for the SO_PEERCRED
option to getsockopt() for Unix domain sockets has been added.
That allows programs to query the credentials of the remote end of the
socket to determine what kind of privileges to give it.
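For readers unfamiliar with the option, here is a minimal sketch of querying SO_PEERCRED on a Unix domain socket. It shows only the generic Linux socket option, which returns the peer's process ID, user ID, and group ID; nothing here is Smack-specific, and the peer_credentials() helper is a name invented for the example.

```python
import os
import socket
import struct


def peer_credentials(sock: socket.socket):
    """Return (pid, uid, gid) for the peer of a connected Unix domain socket."""
    fmt = "3i"  # layout of struct ucred on Linux: pid, uid, gid
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


if __name__ == "__main__":
    # A socketpair stands in for a client/server connection; both ends
    # belong to this process, so the credentials reported are our own.
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with a, b:
        pid, uid, gid = peer_credentials(a)
        print(pid == os.getpid(), uid, gid)
```

A server would typically make this call on the socket returned by accept() before deciding how much to trust the client.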
If a parent and child process are running with two different labels, there
could be situations where the child can't signal its death to the parent.
That can lead to zombie processes.
It's only "humane" to allow the child to notify the parent, so that has
There is also a new mechanism to revoke all of the rules for a given
subject label. Tizen was trying to do this in a library, but it required
reading all of the rules in, then removing each. Now, using
/smack/remove-subject, that can all be done in one operation.
The length of Smack labels has increased again. It started out with a
seven-character limit, but that was raised earlier to 23 characters in
support of labeled networking. It turns out that humans don't generally
create the labels, he said, so the limit has now been raised to 255
characters to support generated label names. For example, the label might
include information on the version of an app, which app store it came from,
and so on. Care must be taken, as there needs to be an explicit mapping from
Smack labels to network labels (which are still limited to 23 characters by
CIPSO).
There is now a "friendlier" rule setting interface for Smack. The original
/smack/load interface used a fixed-length buffer with an explicit
format, which caused
"complaints from time to time". The new /smack/load2 interface uses
white space as a separator.
"Transmuting" directories is now recursive. Directories can get their label
either from their parent or from the process that creates them, and when
the label changes, those changes now propagate into the children. Schaufler
originally objected to the change, but eventually "figured out that it was
better" that way, he said.
The /smack/onlycap mechanism has been extended to cover
CAP_MAC_ADMIN. That means that privileged daemons can still be
forced to follow the Smack rules even if they have the
CAP_MAC_ADMIN capability. By writing a Smack label to
/smack/onlycap, the system will be configured to only allow
processes with that label to circumvent the Smack rules. Previously, only
CAP_MAC_OVERRIDE was consulted, which would allow processes to get
around this restriction.
The Smack rules have been split into multiple lists based on the subject
label. In the past, the Smack rule list could get rather long, so it took
a long time to determine that there was no rule governing a particular
access. By splitting the list, a 30-95% performance increase was realized
on a 40,000 rule set,
depending on how evenly the rules split.
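The effect of the split can be sketched with ordinary data structures. This is not the kernel's implementation, which uses its own list types and locking; the rule tuples and helper names below are hypothetical. The point is that a lookup only has to scan the rules for one subject, so a miss costs far less than walking one long list.

```python
from collections import defaultdict

# Hypothetical rules as (subject label, object label, access) tuples.
RULES = [
    ("Browser", "Downloads", "rw"),
    ("Browser", "Fonts", "r"),
    ("Mailer", "Spool", "rwa"),
    ("Mailer", "Fonts", "r"),
]


def lookup_flat(rules, subject, obj):
    """Single global list: a miss has to scan every rule."""
    for s, o, access in rules:
        if s == subject and o == obj:
            return access
    return None


def split_by_subject(rules):
    """Build one rule list per subject label."""
    by_subject = defaultdict(list)
    for s, o, access in rules:
        by_subject[s].append((o, access))
    return by_subject


def lookup_split(by_subject, subject, obj):
    """Per-subject lists: only that subject's rules are scanned."""
    for o, access in by_subject.get(subject, ()):
        if o == obj:
            return access
    return None


if __name__ == "__main__":
    index = split_by_subject(RULES)
    print(lookup_split(index, "Browser", "Downloads"))
    print(lookup_split(index, "Unknown", "Fonts"))
```

If all of the rules happen to share one subject, the split buys nothing, which is consistent with the wide range in the reported speedup.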
Some cleanup has been done to remove unnecessary locking and bounds
checks. In addition, Al Viro had "some very interesting things to say" about the
Smack fcntl() implementation. After three months, Schaufler finally
settled down, reread the message, and agreed with Viro's assessment. Those
problems have now been fixed.
Schaufler said that he is excited by the inclusion of Smack as the MAC
solution for the Tizen distribution. He is "very much involved" in the
Tizen project and looks forward to Smack being deployed in real world
There are some other things coming for Smack, including better rule list
searching and true list entry removal. Right now, rules that are removed
are just marked, not taken out of the list, because there is a "small
matter of locking" to be resolved. Beyond that, there is probably a
surprise or two lurking out there for new Smack features. If someone can
make the case for a feature, like the often requested multiple labels
feature, it may just find its way into Smack in the future.
Kees Cook's Yama LSM was named after a
Buddhist god of the underworld who is the "ruler of
the departed". It started as an effort to get some symbolic link restrictions added to the
kernel. Patches to implement those restrictions had been floating around
since at least 1996, but had never been merged.
Those restrictions are now available in the kernel in the form of the Yama
LSM, but the path of getting them into the mainline was rather tortuous.
Cook outlined that history, noting that his original submission was
rejected for not being an LSM in May 2010. In June of that year, he added
some hardlink and ptrace() attach restrictions to the symlink
changes and submitted it as the Yama LSM. In July, a process relationship
API was added to allow the ptrace() restrictions to be relaxed for
things like crash handlers, but Yama was reverted out of the security-next
tree because it was an LSM. Meanwhile, the code was released in
Ubuntu 10.10 in October and then in ChromeOS in December 2011.
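A crash handler use case might look something like the sketch below, which assumes the prctl(PR_SET_PTRACER) interface described in the kernel's Yama documentation. The allow_ptracer() helper is a hypothetical wrapper written for this example; it simply returns False on systems where Yama is not active or prctl() is unavailable.

```python
import ctypes
import os

# Value from the kernel's prctl.h; it spells "Yama" in ASCII.
PR_SET_PTRACER = 0x59616D61


def allow_ptracer(pid: int) -> bool:
    """Ask Yama to let process `pid` ptrace-attach to the calling process.

    Hypothetical helper. Returns False if the request is refused or the
    interface is not available on this system.
    """
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        rc = libc.prctl(ctypes.c_int(PR_SET_PTRACER), ctypes.c_ulong(pid),
                        ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    except (AttributeError, OSError):
        return False
    return rc == 0


if __name__ == "__main__":
    # For example, permit the parent process (say, a crash handler that
    # launched us) to attach a debugger later.
    print(allow_ptracer(os.getppid()))
```

The declaration only widens who may attach to the calling process; it does not affect any other process on the system.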
Eventually, the LSM was "half merged" for the 3.4 kernel. The link
restrictions were not part of that, but they have subsequently been merged
into the core kernel for 3.6. Those restrictions are at least 16 years old, Cook said, which
means they "can drive in the US". He was able to get the link restrictions
into the core by working with Al Viro, but he has not been able to get the
ptrace() restrictions into the core kernel, which is where he thinks they
belong. James Morris noted that none of the core kernel developers "like
"some actively hate it", which makes it hard to get these kinds of changes
into the core—or sometimes upstream at all.
In the future, Cook would like to see some changes in the kernel module loading
path to support ChromeOS. Everyone is talking about signing modules, but
ChromeOS already has a protected root partition, he said. If
(or a new interface) could get information about where in the filesystem
a module comes from, that would solve his problem. He also mentioned the
perennial LSM stacking topic,
noting that Ubuntu and other distributions are hardcoding Yama stacking to
get the ptrace() restrictions, so maybe that will provide impetus
for a more general stacking solution—or to move the ptrace()
restrictions into the core kernel.
[ Slides for many of the subsystem reports, as well as the rest of the
presentations are available on the LSS
schedule page. ]