Brief items
The current development kernel is 3.6-rc6, released on September 16.
"Fairly normal statistics: two thirds drivers, with the remaining third
being a mix of architecture updates, filesystems (gfs2 and nfs) and random
core stuff (scheduler, workqueue, stuff like that)." Linus is hoping to
pull the final 3.6 release together before too much longer.
Stable updates: 3.0.43, 3.4.11, and 3.5.4 were released on September 14 with the
usual set of important fixes.
First of all, it's a mess. Shame on me. Shame on you. Shame on
all of us for allowing this mess. Let's all tremble in shame for
solid ten seconds before proceeding.
—
Tejun Heo
Cc: Horses <stable@vger.kernel.org>
—
Alan Cox
People who run git trees prefer fixup patches due to extensive
lameness.
—
Andrew Morton
Kernel development news
Over the last few years, the Linux kernel has added features to measure the
integrity of files on disk to protect against offline attacks. The
integrity measurement architecture (IMA) was added in the 2.6.30 kernel,
and other pieces have followed, but the job is not done. Dmitry Kasatkin
gave a presentation at the 2012 Linux Security Summit (LSS) on an extension
to the integrity subsystem to handle the contents of directories as well as
various special files.
Integrity protection is needed to keep attackers from altering the contents
of a filesystem without the kernel's knowledge, for example by removing the
disk or booting into an alternative operating system. Runtime integrity
is already handled by the existing access control mechanisms, Kasatkin
said. Those include discretionary access control (DAC) mechanisms, such as
the traditional Unix file permissions, and mandatory access control (MAC)
schemes, such as those provided by SELinux or Smack. But those mechanisms
rely on trusting the access control metadata (e.g. permission bits or
security extended attributes), which can be tampered with in an offline
attack.
IMA measures the integrity of a file by calculating a cryptographic hash
over its contents and storing that hash in the security.ima
extended attribute (xattr). IMA can also be used in conjunction with a
Trusted Platform Module (TPM) to remotely attest to the integrity of the
running system.
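As a rough illustration of the measurement step, the following Python sketch hashes a file's contents and records the result as a reference value. It is not IMA's code: SHA-1 is assumed as the hash algorithm, and the ima_xattrs dictionary is a hypothetical stand-in for the security.ima xattr, so that the sketch runs without privileges.

```python
import hashlib

# Hypothetical stand-in for the security.ima xattr: maps a path to the
# hex digest recorded for it. Real IMA stores the value on the inode itself.
ima_xattrs = {}


def ima_measure(data: bytes) -> str:
    """Hash the file contents (SHA-1 is an assumption made for this sketch)."""
    return hashlib.sha1(data).hexdigest()


def ima_store_reference(path: str, data: bytes) -> None:
    """Record a reference measurement, as an image-creation tool might."""
    ima_xattrs[path] = ima_measure(data)


# Usage: measure some file contents and record the reference value.
ima_store_reference("/sbin/init", b"example init binary contents")
```

Any later change to the contents produces a different digest, which is what the appraisal step compares against.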
The extended verification module (EVM) was added in 3.2 to protect the
inode metadata of files against offline attacks. That metadata includes the
security xattrs (including
those for SELinux and Smack along with security.ima), mode (permissions),
owner, inode number, etc. Once again, a hash of the values is used, and
EVM stores that as the security.evm xattr on the file.
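The EVM step can be sketched the same way. This sketch uses a keyed HMAC-SHA1 rather than a plain hash, on the assumption that the value should not be recomputable by someone who lacks the key. The key, the choice of fields, and their encoding are all hypothetical and do not reproduce the kernel's format.

```python
import hashlib
import hmac
import struct

# Hypothetical key. Keying the hash means an offline attacker who edits
# the metadata cannot simply recompute a matching value.
EVM_KEY = b"hypothetical-evm-key"


def evm_measure(security_xattrs: dict, mode: int, uid: int, gid: int, ino: int) -> str:
    """HMAC-SHA1 over security xattrs and selected inode fields.

    The field set, order, and encoding are illustrative only.
    """
    mac = hmac.new(EVM_KEY, digestmod=hashlib.sha1)
    # Sort xattr names so the result does not depend on dict ordering.
    for name in sorted(security_xattrs):
        value = security_xattrs[name]
        encoded_name = name.encode()
        # Length-prefix each field so different inputs cannot collide by concatenation.
        mac.update(struct.pack("<H", len(encoded_name)) + encoded_name)
        mac.update(struct.pack("<I", len(value)) + value)
    # Fixed-width inode metadata: inode number, owner, group, mode.
    mac.update(struct.pack("<QIII", ino, uid, gid, mode))
    return mac.hexdigest()


xattrs = {
    "security.ima": b"\x01example-digest",
    "security.selinux": b"system_u:object_r:bin_t:s0",
}
ref = evm_measure(xattrs, 0o100755, 0, 0, 1234)
# An offline chmod (here adding the setuid bit) changes the mode,
# so the recomputed value no longer matches the stored one.
tampered = evm_measure(xattrs, 0o104755, 0, 0, 1234)
```

Because security.ima is one of the covered xattrs, tampering with the content hash itself also invalidates the EVM value.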
The digital signature extension was added in the 3.3 kernel to allow the
IMA and EVM xattrs to be signed. In addition to storing a hash value in the
xattrs, a digital signature of the hash value can also be stored and verified.
The IMA-appraisal feature, which Kasatkin said is being targeted for 3.7,
will inhibit access to files whose IMA hash does not match the contents
(i.e. the file has been changed offline). There were some locking problems that prevented IMA-appraisal
from being merged earlier, but those have been resolved.
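The appraisal decision can be modeled as a simple gate: recompute the hash and refuse access on a mismatch. This is a hedged Python sketch, not the kernel implementation. The AppraisalError class is hypothetical, SHA-1 is again assumed, and treating a missing reference as a failure is a choice made only for this example.

```python
import hashlib
import hmac
from typing import Optional


class AppraisalError(PermissionError):
    """Raised when a file fails appraisal; access is denied."""


def ima_appraise(data: bytes, stored_digest: Optional[str]) -> bytes:
    """Return the contents only if they still match the reference digest."""
    if stored_digest is None:
        # Treating a missing reference as a failure is a choice made for this sketch.
        raise AppraisalError("no reference measurement")
    actual = hashlib.sha1(data).hexdigest()
    # compare_digest performs a constant-time comparison.
    if not hmac.compare_digest(actual, stored_digest):
        raise AppraisalError("contents changed since they were measured")
    return data


reference = hashlib.sha1(b"original contents").hexdigest()
contents = ima_appraise(b"original contents", reference)  # access allowed
```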
But all of those pieces do not add up to everything needed for real
integrity protection, Kasatkin said. While EVM protects the inode metadata
and IMA protects the contents of regular files, there is a missing piece:
file names. In Linux, a file's name is not stored in its inode; it lives in
the directory entry that points to the inode, and the association between a
file name and an inode is not protected.
The result is that files can be deleted, renamed, or moved in an offline
attack without being detected by the integrity subsystem. In addition,
symbolic links and device nodes are currently unprotected, so those files
can be added, modified, or removed offline without detection. Various
attacks are possible by changing directory entries, he said: one could
delete a file required for booting, or restore a backup version (along with
its associated security xattrs) of a program with known vulnerabilities.
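To see why per-file measurements are not enough, consider this toy Python model. Its data structures are hypothetical and only loosely mirror inodes and directory entries. Appraisal checks each inode's contents against that inode's own digest, but nothing measures the directory that maps names to inodes. As a result, a rename and a rollback to an older file with its matching digest both go unnoticed.

```python
import hashlib


def make_inode(data: bytes) -> dict:
    """Toy inode: contents plus an IMA-style digest of those contents."""
    return {"data": data, "ima": hashlib.sha1(data).hexdigest()}


# Inode table and one directory; the directory maps names to inode numbers.
inodes = {101: make_inode(b"daemon v2 (patched)")}
directory = {"daemon": 101}


def open_checked(name: str) -> bytes:
    """Open by name, appraising only the inode's contents."""
    node = inodes[directory[name]]  # the name -> inode link itself is never checked
    if hashlib.sha1(node["data"]).hexdigest() != node["ima"]:
        raise PermissionError("content mismatch")
    return node["data"]


# Offline attack 1: rename the entry. Contents and xattrs are untouched.
directory["daemon.old"] = directory.pop("daemon")

# Offline attack 2: roll back to an older, vulnerable version together with
# the digest that was recorded for it. The pair is self-consistent.
inodes[101] = make_inode(b"daemon v1 (vulnerable)")

result = open_checked("daemon.old")  # no error: both attacks go unnoticed
```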
Using two virtual machines, Kasatkin simulated an offline attack by
creating files in one VM, then mounting the disk in the other VM and
changing some of the files. With the existing integrity code (including
IMA-appraisal), he was unable to access files with changed contents in the
original VM, but had no problems accessing files that had been renamed or
moved (nor were deleted files detected).
That problem motivated the directory and special-file integrity protection
that he has proposed. For directories, two new hooks, ima_dir_check() and
ima_dir_update(), would be added. The former would be called during path
lookup (from may_lookup()) and would deny access if any directory entries in
the path had been unexpectedly altered. When directories are updated on the
running system, ima_dir_update() would be called to update the integrity
measurement to reflect those changes.
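The division of labor between the two hooks can be modeled in a few lines of Python. The functions below only borrow the hook names. Their signatures, the Node class, and the hash over sorted (name, inode number) pairs are assumptions made for the example. The model re-checks on every lookup rather than once per dentry allocation, and it ignores RCU and EVM entirely.

```python
import errno
import hashlib


class Node:
    """Toy inode: directories have a children dict, regular files have None."""

    def __init__(self, ino: int, children=None):
        self.ino = ino
        self.children = children
        self.ima = None  # stand-in for security.ima on a directory


def dir_digest(d: Node) -> str:
    """Hash the directory's (name, inode number) pairs in sorted order."""
    h = hashlib.sha1()
    for name in sorted(d.children):
        # Names cannot contain NUL, so the separator keeps fields unambiguous.
        h.update(name.encode() + b"\0" + d.children[name].ino.to_bytes(8, "little"))
    return h.hexdigest()


def ima_dir_update(d: Node) -> None:
    """Model of the update hook: re-measure after a legitimate change."""
    d.ima = dir_digest(d)


def ima_dir_check(d: Node) -> None:
    """Model of the check hook: refuse to look up through an altered directory."""
    if d.ima != dir_digest(d):
        raise PermissionError(errno.EPERM, "directory integrity check failed")


def add_entry(d: Node, name: str, child: Node) -> None:
    """A change made on the running system, followed by the update hook."""
    d.children[name] = child
    ima_dir_update(d)


def lookup(root: Node, path: str) -> Node:
    """Walk the path from the root, checking each directory before using it."""
    node = root
    for part in path.strip("/").split("/"):
        ima_dir_check(node)
        node = node.children[part]
    return node


# Build /etc/passwd on the "running system", so every change is measured.
root = Node(1, {})
ima_dir_update(root)
etc = Node(2, {})
ima_dir_update(etc)
add_entry(root, "etc", etc)
add_entry(etc, "passwd", Node(10))
```

Deleting etc.children["passwd"] directly, as an offline attacker would, skips ima_dir_update(). The next lookup through /etc then fails with EPERM.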
Verification starts from the root inode during a path lookup. Nothing
happens when the filesystem is mounted; verification is instead done lazily
during file name lookup. Whenever a dentry (directory cache entry) is
allocated for a directory, a call is made to ima_dir_check() to verify it.
This proposed callback does not break RCU path walk, so it should not cause
scalability problems on larger machines. The integrity measurement is a
hash over the list of entries in the directory, using the inode number,
name, type, and offset of each; the result is stored in security.ima on the
directory (which is then protected with EVM).
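The per-directory measurement can be pictured as follows. This sketch hashes the four fields the article lists (inode number, name, type, and offset) for every entry. The fixed-width little-endian encoding, the use of SHA-1, and the DirEntry type are assumptions, since the article does not describe the exact format. The DT_* values are the standard d_type codes from struct dirent.

```python
import hashlib
import struct
from typing import Iterable, NamedTuple

# File-type codes as used in struct dirent's d_type field.
DT_DIR, DT_REG, DT_LNK = 4, 8, 10


class DirEntry(NamedTuple):
    ino: int     # inode number
    name: str    # entry name
    d_type: int  # entry type (DT_* value)
    offset: int  # position of the entry within the directory


def dir_measure(entries: Iterable[DirEntry]) -> str:
    """Hash the directory listing; field widths and order are assumptions."""
    h = hashlib.sha1()
    for e in entries:
        name = e.name.encode()
        # Fixed-width header with the name length, then the name bytes,
        # so that no two different listings produce the same byte stream.
        h.update(struct.pack("<QqBH", e.ino, e.offset, e.d_type, len(name)))
        h.update(name)
    return h.hexdigest()


listing = [
    DirEntry(10, "passwd", DT_REG, 1),
    DirEntry(11, "shadow", DT_REG, 2),
    DirEntry(12, "init.d", DT_DIR, 3),
]
reference = dir_measure(listing)
```

The sketch hashes entries in the order given, as they would be read from the directory. Because each entry's offset is included, renaming, deleting, retyping, or moving an entry all change the result.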
For special files, like symbolic links and device nodes, a single new hook
has been added: ima_link_check(). It is called during path lookup (from
follow_link()) and for the readlink() system call. The measurement is a
hash of the target path for symbolic links, or of the major and minor
device numbers for device nodes. Once again, those values are stored in
security.ima and are verified before access.
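A comparable sketch for special files hashes either the symlink target or the device numbers. The function names are hypothetical, and the encoding of the device numbers is an assumption. The checks that the kernel would make from follow_link() and readlink() are not modeled here.

```python
import hashlib
import os
import stat
import struct


def measure_symlink(target: str) -> str:
    """Hash a symbolic link's target path."""
    return hashlib.sha1(os.fsencode(target)).hexdigest()


def measure_device(major: int, minor: int) -> str:
    """Hash a device node's major and minor numbers (encoding is assumed)."""
    return hashlib.sha1(struct.pack("<II", major, minor)).hexdigest()


def measure_special_file(path: str) -> str:
    """Dispatch on file type without following a final symlink (lstat)."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return measure_symlink(os.readlink(path))
    if stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode):
        return measure_device(os.major(st.st_rdev), os.minor(st.st_rdev))
    raise ValueError(f"{path} is neither a symbolic link nor a device node")
```

On a Linux system, measure_special_file("/dev/null") would hash the device number pair (1, 3). Repointing a symlink offline changes its target string and therefore its measurement.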
The user-space tools used to set the reference measurements when creating
filesystem images also need updating to support the new features. The
evmctl command (part of the ima-evm-utils package) has gained the ability to
set the reference hashes for directories and special files.
Kasatkin then demonstrated the integrity protections of the new code. If a
file is moved or removed, the directory holding the file can no longer be
accessed, so commands like ls or cd fail with EPERM. He also presented
performance numbers that showed relatively modest slowdowns compared to
IMA/EVM without the directory and special-file handling code, but more
substantial ones compared to having IMA/EVM disabled entirely.
Interestingly, though, both flavors of IMA/EVM performed better on a
file-copy test than a disk encrypted using dm-crypt did. Disk encryption is
another way to thwart offline attacks, of course.
It would seem that the kernel integrity subsystem is approaching
"completion". The final pieces of the puzzle are now available; Kasatkin
and others are hopeful they will be acceptable upstream soon, though he did
note that the VFS developers had not yet reviewed the most recent patch
set. For those who need this kind of protection for Linux, though, the
wait may nearly be over.
Day two (28 August) of the 2012 Kernel Summit included a
day-long minisummit entitled "memcg minisummit"
chaired by Ying Han and Johannes Weiner. Ying noted
that the original minisummit title was something of a misnomer, since
it had grown in scope to cover both memory
control groups (memcg) and memory-management (mm) topics generally.
The session began with the stated assumption that everyone in the room was
familiar with earlier discussions of the topics at hand. (Some of those
discussions took place at the April 2012 LSF/MM meeting, which LWN covered
in two earlier articles.) Given the context of the summit, this
assumption was considered reasonable by everyone, though readers without a
memory-management background may find the record of the discussion a little
hard to follow at times.
Except for one very brief topic, coverage of the various sessions is
split out into separate articles. The topics covered were as follows:
- Improving kernel-memory accounting for
memory cgroups; some users need better accounting of kernel-memory
usage inside cgroups (control groups), in order to prevent poorly
behaved cgroups from exhausting system memory.
- Kernel-memory shrinking; a
discussion stemming from Ying Han's patches to implement a per-cgroup slab
shrinker.
- Improving memory cgroups performance
for non-users; how do we resolve the problem that the
current memcg implementation has a performance impact even when memory
cgroups are not being used?
- Memory-management performance
topics; short discussions of various performance and
scalability topics.
- Hierarchical reclaim for memory
cgroups; what is the best way to reclaim memory from soft-limited
trees of memory cgroups when the system is under memory pressure?
- Reclaiming mapped pages; toward
improving reclaim of mapped pages to handle a wider variety of workloads.
- Volatile ranges; looking at
various ideas on improving the implementation of this proposed kernel
feature.
- Memory-management patches work: Michal Hocko briefly
discussed the origin of the memcg-devel tree. This tree has
evolved into a general memory-management development tree that is not
rebased like linux-next; instead, it takes a mainline release from
Linus Torvalds's tree and applies Andrew Morton's patches on top of it.
This gives memory-management developers a common, relatively stable base
to develop against. The tree already has a few users, and they seem to be
happy so far. (Since the meeting, the tree has been moved to kernel.org
and renamed from memcg-devel to mm.)
- Moving zcache toward the mainline; what
are the barriers to getting the compressed cache feature merged?
- Dirty/writeback LRU; a discussion
of Fengguang Wu's proposal to split the file LRU list into clean
and dirty lists.
- Proportional I/O controller; two
proposed solutions to improve its performance for cgroup workloads.
- Shared-memory accounting in memory
cgroups; dealing with some scenarios where memory cgroups are unfairly
charged for memory usage.
- NUMA scheduling; a discussion of
competing patch sets that implement this feature.
By and large, this was considered a successful meeting by the
memory-management developers in attendance. Ying Han kept everyone on
track and the meeting on schedule, and each of the topics was discussed in
detail; good progress was made on many issues, and the participants gained
insights into several problems that will affect an increasing number of
users in the future. Hopefully, some of the remaining issues will now be
more easily resolved on mailing lists.
[Michael Kerrisk would like to thank Fengguang Wu, Glauber Costa,
Johannes Weiner, Michal Hocko, and especially Mel Gorman for assistance
with the write-up of the minisummit.]
Page editor: Jonathan Corbet