Kernel development
Brief items
Kernel release status
The current development kernel is 4.4-rc3, released on November 29. Linus said: "I don't think there's anything particularly exciting, although that obviously depends on whether some particular issue ended up affecting you or not. Most of it is pretty tiny random fixups."
Previously, 4.4-rc2 came out on November 22.
Stable updates: none have been released in the last two weeks.
Quotes of the week
I'm still leading with three stupid mistakes over your one though.
Kernel development news
Post-init read-only memory
At the 2015 Kernel Summit, the assembled developers discussed the idea of incorporating more security-hardening patches into the kernel. As part of that effort, it was agreed that taking another look at the out-of-tree grsecurity patches made sense. The first fruit from this work would appear to be the post-init read-only memory patch set from Kees Cook. This work has been received well, but it also highlights some of the difficulties involved with hardening a general-purpose kernel.

The key to a successful exploit is often convincing the kernel to write to an unintended location. See, for example, this recent exploit, which uses a driver bug to overwrite a portion of the vDSO area; that, in turn, enables an attacker to run arbitrary code in kernel mode. One way to defend against such attacks is to minimize, to the greatest extent possible, the memory that the kernel is allowed to write to. A number of techniques, from simply marking data read-only to supervisor-mode access prevention, can be deployed toward that end. There is one class of data, identified by the grsecurity developers, that current techniques overlook, however.
When the kernel boots, it sets up a vast array of data structures describing the hardware it runs on and much more. In many cases, those data structures will never be changed again but, since they are resident in writable memory, they can still be changed by an errant write operation. The post-init read-only memory patch set, as posted by Kees, allows these data structures to be marked with a special __read_only annotation. That will cause them to be placed into a separate ELF section (".data..read_only"). Once the kernel has finished the initialization process, all data found in that section will be marked read-only, never to be changed again. At that point, exploits like the vDSO overwrite linked above will no longer work.
This change seems like an obvious win: unchanging data is marked read-only, blocking known exploits and, perhaps, minimizing the impact of simple bugs as well. As an added bonus, read-only data will be kept together, leading to better cache behavior. It would appear to be an obvious candidate for merging in the near future. That will probably come to pass, but, first, an important question has to be answered: what should happen when the hardware catches an attempt by the kernel to write, after initialization, to memory that has been marked __read_only?
When things go wrong
This question matters because there is a potential hazard whenever a data structure is marked __read_only: the developer involved may have overlooked the one case where, after a rare sequence of events on days with a waxing gibbous moon, that data structure must be changed. Or there may be a case where data structures are modified unnecessarily, perhaps storing data that is already there anyway. Such cases work in current kernels, but would break if the data being written were made read-only. Mathias Krause described one such experience, wherein the system would fail during the resume sequence. As he noted: "Debugging that kind of problem is sort of a PITA, you could imagine."
The ideal solution would be to have the compiler catch attempts to modify __read_only data outside of the initialization sequence, but that is not currently possible. Simply marking the relevant data structures const will not work; those data structures are written to during boot and, as PaX Team pointed out, making them const opens the door to all kinds of surprising, optimization-related behavior from the compiler. Where compilers are involved, surprising behavior is rarely a good thing. As an alternative, Mathias suggested the use of a special-purpose GCC module to detect inappropriate writes. There seems to be agreement that this is a good idea, but no such module exists and it will take time to create one. Holding this patch set until a checker module can be created seems undesirable.
But without such a checker, there will almost certainly be situations where the kernel tries to write to something marked __read_only, either because it was so marked in error or as the result of some other bug. There have been a number of ideas put forward on how such problems could be handled.
The most obvious thing to do is to simply oops the kernel, with the usual results for the process that was running and, perhaps, the machine as a whole. Andy Lutomirski supported this approach, saying: "We failed, we might be under attack, let's oops." The problem with this approach, of course, is that it takes the machine out of commission, possibly with an error that is less than fun to try to track down. Ingo Molnar also worried that the oops information would, in most desktop cases, never be seen by the user and, as a result, would never be reported to developers. That highlights an old problem with presenting such information on desktop systems, but that problem is unlikely to be fixed right now.
The alternative to oopsing the system would be to log the error and somehow try to continue. Ingo suggested simply skipping over the offending instruction and trying to continue, but that idea did not go far; as PaX Team pointed out, simply dropping an intended write operation could create no end of strange problems further down the line and may actually help exploit attempts. Linus suggested, instead, that the kernel could mark the relevant page writable and retry the instruction. That would, of course, remove the read-only protection from that page, but it would allow the system to continue to operate while generating diagnostic information for developers. One would probably not want things to work this way on a production system, but it could be an invaluable option for developers.
The final piece of the puzzle might be to have a kernel command-line operation to disable the read-only marking entirely. That would provide an option to users who run into a bug and need to be able to get their work done until a proper fix is available.
Kees has indicated that his current plan is to take the kill-the-machine approach by default. He has already implemented the command-line option, and said that Linus's "mark the page writable" suggestion would not be difficult to add. So the next version of the patch set should address most of the concerns expressed so far. Getting it merged may prove to be the easy part, though; the task of identifying and marking truly read-only data could be a long and error-prone affair, even when starting with the work that the grsecurity developers have already done. The good news is that this work should make the kernel more secure, provide a (perhaps imperceptible) performance improvement, and turn up a few bugs along the way.
TLS in the kernel
An RFC patch from Dave Watson at Facebook proposes moving the bulk of Transport Layer Security (TLS) processing into the kernel. There are a number of advantages he sees for doing so, but most of the commenters on the patch set seem a bit skeptical about the idea. TLS is, of course, the encryption layer that protects HTTPS and other internet protocols.
The patch set implements RFC 5288 encryption for TLS, which is based on the 128-bit advanced encryption standard (AES) using Galois counter mode (GCM)—also known as "gcm(aes)". That accounts for roughly 80% of the TLS connections that Facebook sees, Watson said. The idea is for the kernel to handle the symmetric encryption and decryption, while leaving the handshake processing to user space. The feature uses the user-space API to the kernel's crypto subsystem, which is accessed via sockets created using the AF_ALG address family.
The basic idea is that an AF_ALG socket and a regular TCP socket are both created. The TCP socket is used to do the handshake with the remote endpoint, which establishes keys and such. The keys (one each for sending and receiving) are passed to the crypto socket using setsockopt(). An operational socket is then created by making an accept() call on the crypto socket; that socket is used in further processing, including setting the initialization vectors (IVs), again one for each direction, using sendmsg() and control messages created using CMSG. In addition, the file descriptor for the TCP socket is passed to the operational socket in a control message; the application will then read and write data from the operational socket. Watson pointed to an example C program that uses the new facility.
That approach has a number of benefits, according to Watson. Using some additional code that was not part of his submission, he said the in-kernel TLS showed 2-7% better performance than the equivalent done in user space. The idea was inspired by some work [PDF] that Netflix did on FreeBSD to improve the performance of TLS. In addition, two other features could benefit from having TLS in the kernel, he said. The kernel connection multiplexer (KCM) needs access to unencrypted data in the kernel, which this would provide; offloading TLS encryption and decryption to NICs would also require TLS framing support in the kernel.
But Hannes Frederic Sowa questioned two of those advantages. He believes that the existing facilities provided by Linux already do less copying than those that FreeBSD provides, so he suggested comparing the in-kernel approach with a user-space implementation using mmap() and vmsplice() on the TCP socket. Beyond that, he noted that kernel developers have been strong opponents of TCP-offloading efforts. In order to provide TLS offloading, a NIC would also need to handle the TCP layer, so it would effectively be doing TCP offloading as well.
Crypto maintainer Herbert Xu was a bit surprised at the approach. While he can see that using AF_ALG makes sense as a way to export TLS functionality to user space, it's not the way he would have approached it.
But Watson noted that handling out-of-band (OOB) data is one reason to not just layer TLS on top of a TCP socket. TLS transfers data beyond just the data being sent by the application, for things like alerts or to change the cipher being used, but a TCP socket lacks an easy way to signal the reception of that kind of data. In Watson's patches, the crypto socket returns an error in that situation and user space can then read the OOB data from the TCP socket if it wishes.
But others also questioned the value of having TLS in the kernel at all. Modern processors provide user-space programs with access to accelerated crypto instructions directly, without a need for kernel intervention. There is some crypto-acceleration hardware out there, where there might be some benefit to having TLS in the kernel, but it has mostly fallen by the wayside because of better processor support for crypto. As Sowa put it:
"Since processors provide aesni and other crypto extensions as part of their instruction set architecture, this, of course, does not make sense any more."
Overall, it looks like it will take some more convincing arguments before putting TLS in the kernel will be seriously considered. For some specialized situations, it might make sense to do so, but even the limited version Watson posted adds more than 1200 lines of code to the kernel—for dubious gains. Over time, more and more crypto has been added to the kernel, though, so maybe TLS will eventually find its way in too.
SOCK_DESTROY: an old Android patch aims upstream
TCP is a patient protocol; if a remote peer stops responding, it will wait a long time (measured in minutes, by default) in the hope that connectivity will eventually return. Sometimes, however, that wait is undesirable; that is especially true when it is known that the connection will not be coming back, but that the establishment of a replacement connection may succeed. As it happens, mobile networking often presents such situations. The SOCK_DESTROY patch set from Lorenzo Colitti is an attempt to improve the user experience in such situations. It fills a clear need, but has run into some opposition anyway; it also shows that the rift between the Android and kernel projects has not yet been entirely closed.

Imagine, for a moment, a user streaming $SPORTING_EVENT on a phone handset over a WiFi connection. Said user walks out the door, away from the WiFi network's coverage; that will cause the stream to freeze, probably at the beginning of the bit of action that decides the entire game. The WiFi connection is gone and is not coming back, but the streaming application does not know that, so it will wait a long time, in vain, for data to show up on its network socket. After several minutes, the connection will time out. The application will then realize that it has been disconnected and will try to reconnect; that new connection, going over the phone's broadband interface, will succeed. Streaming recommences, and our poor user gets to watch the post-game sportscasters talking about the one-of-a-kind play that happened while the stream was frozen. The resulting handset-destroying rage could have been avoided if the application had not waited for the network timeout to occur.
There are other scenarios that can create similar problems; placing a system onto a virtual private network (VPN) is another example. When this kind of network change occurs, things would work better if applications knew immediately that their open connection was never going to produce another packet. There are a number of ways this information could be conveyed, but one of the more straightforward ways would be to simply close the socket, returning an error to the application. That is what the SOCK_DESTROY patch set makes possible.
In particular, it adds a SOCK_DESTROY operation to the netlink-based "socket diag" mechanism, first added to the kernel in the 3.10 development cycle. A suitably privileged process (CAP_NET_ADMIN is required) can use this operation to close an arbitrary socket owned by another process; that process will see an ETIMEDOUT error. That error is the same as the one returned when a socket times out, but the pain of actually waiting for the timeout has been taken away. Any application that is prepared for such errors (and applications running in mobile environments, at least, should be) should recover and reconnect with no changes required.
As it happens, the Android kernel has had this capability since 2008, though in a different form: Android currently supports an ioctl() command called SIOCKILLADDR. This patch set is an attempt to move this capability upstream, cleaning it up a bit along the way. The fact that this feature has been shipped with Android suggests that there is a real need for it, but a number of concerns were raised anyway.
Tom Herbert worried that this facility could be used by an administrator to close sockets for any reason and that the affected application would have no way to know that this had happened. He suggested that the error code returned could be changed to ENETRESET, so that an explicit action to close a socket would not be presented as if it were a passive timeout. A later version of the patch set changes the return code to ECONNABORTED, which was chosen to be compatible with what BSD systems do.
Hannes Frederic Sowa suggested that, in some cases, quickly closing a socket in this manner could cause old data to be delivered to the wrong socket. Networking maintainer David Miller agreed with that concern, and suggested an alternative: the closing of sockets could be handled by the operation that disconnects them from the network in the first place. So, for example, the removal of a route associated with a disappeared network could cause any sockets bound to that network to be closed. David made it clear that he wants to have the kernel, rather than user space, in charge of deciding which sockets should be closed.
The problem with that approach, according to Lorenzo, is that the kernel doesn't always have a way to know which sockets have been affected by a networking change. The VPN case, in particular, can muddy the waters considerably. Beyond that, it was pointed out that user space can also force sockets to be closed by killing applications directly or installing special firewall rules. The new operation just makes this kind of action easier to carry out. Lorenzo did, however, change the patch to send a reset (set the RST bit) to the peer when a socket is closed as a way of reducing the chances of protocol confusion.
Eric Dumazet came in with a request that the change be merged. He noted that: "Every time I make a change in linux TCP stack, this code breaks, and this a real pain because Android changes need to be carried over to vendors." Getting the SOCK_DESTROY patch merged would spare him the phone calls and allow him to get more work done on the rest of the networking code. He also noted that the commonly suggested alternative of having applications do their own keep-alive processing is not really viable in the mobile environment for a couple of reasons.
Finally, Eric pointed out that TCP is competing with the QUIC protocol in the mobile space. QUIC is based on UDP and can react quickly to changes in the networking environment; without a similar ability to react, he said, TCP is not competitive.
David then complained that the Android developers still do not really care about the upstream kernel — a complaint that your editor still occasionally hears over beer at conferences. The fact that Android has been carrying this patch for something like seven years does not, in his mind, constitute a reason to merge it quickly into the mainline. Indeed, he said, Android's developers should be prepared to wait for a while as the patch's merits are considered:
Lorenzo responded that he would like to see things change in this area, with more Android code going upstream. The posting of the SOCK_DESTROY patch set was a part of the effort to bring that about. Almost everything that the Android networking group has done in the last two years has been sent upstream, he said.
As was recently discussed at the 2015 Kernel Summit, Android-based devices run a lot of out-of-tree code; indeed, they may be running more out-of-tree code than upstream code. The portion of that code contained within the Android project's repositories is relatively low, though, and there does appear to have been an effort to reduce it in recent years. But it's clear that some resentment remains in the kernel development community. In the end, though, that resentment is unlikely to prevent the merging of needed functionality. By the time it gets upstream, this feature may or may not look like SOCK_DESTROY, but it can be expected to do something similar. Mobile devices are not going away and the kernel community, in the end, wants to support them as well as possible.
A journal for MD/RAID5
RAID5 support in the MD driver has been part of mainline Linux since 2.4.0 was released in early 2001. During this time it has been used widely by hobbyists and small installations, but there has been little evidence of any impact on the larger or "enterprise" sites. Anecdotal evidence suggests that such sites are usually happier with so-called "hardware RAID" configurations where a purpose-built computer, whether attached by PCI or fibre channel or similar, is dedicated to managing the array. This situation could begin to change with the 4.4 kernel, which brings some enhancements to the MD driver that should make it more competitive with hardware-RAID controllers.

While hardware-RAID solutions suffer from the lack of transparency and flexibility that so often come with closed devices, they have two particular advantages. First, a separate computer brings dedicated processing power and I/O-bus capacity, which take some load off the main system, freeing it for other work. At the very least, the system CPU will never have to perform the XOR calculations required to generate the parity block, and the system I/O bus will never have to carry that block from memory to a storage device. As commodity hardware has increased in capability and speed over the years, though, this advantage has been significantly eroded.
The second advantage is non-volatile memory (NVRAM). While traditional commodity hardware has not offered much NVRAM because it would hardly ever be used, dedicated RAID controllers nearly always have NVRAM as it brings real benefits in both performance and reliability. Utilizing NVRAM provides more than just the incremental benefits brought by extra processing components. It allows changes in data management that can yield better performance from existing devices.
With recent developments, non-volatile memory is becoming a reality on commodity hardware, at least on server-class machines, and it is becoming increasingly easy to attach a small solid-state storage device (SSD) to any system that manages a RAID array. So the time is ripe for MD/RAID5 to benefit from the ability to manage data in the ways that NVRAM allows. Some engineers from Facebook, particularly Shaohua Li and Song Liu, have been working toward this end; Linux 4.4 will be the first mainline release to see the fruits of that labor.
Linux 4.4 — closing the RAID5 write hole
RAID5 (and related levels such as RAID4 and RAID6) suffers from a potential problem known as the "write hole". Each "stripe" on such an array — meaning a set of related blocks, one stored on each active device — will contain data blocks and parity blocks; these must always be kept consistent. The parity must always be exactly what would be computed from the data. If this is not the case then reconstructing the data that was on a device that has failed will produce incorrect results.
In reality, stripes are often inconsistent, though only for very short intervals of time. As the drives in an array are independent (that is the "I" of RAID) they cannot all be updated atomically. When any change is made to a stripe, this independence will almost certainly result in a moment when data and parity are inconsistent. Naturally the MD driver understands this and would never try to access data during that moment of inconsistency ... unless....
Problems occur if a machine crash or power failure causes an unplanned shutdown. It is fairly easy to argue that the likelihood that an unclean shutdown would interrupt some writes but not others is extremely small. It's not easy to argue that such a circumstance could never happen, though. So when restarting from an unclean shutdown, the MD driver must assume that the failure may have happened during a moment of inconsistency and, thus, that the parity blocks cannot be trusted. If the array is still optimal (no failed devices), it will recalculate the parity on any stripe that could have been in the middle of an update. If, however, the array is degraded, the parity cannot be recalculated. If some blocks in a stripe were updated and others weren't, then the block that was on the failed device will be reconstructed based on inconsistent information, leading to data corruption. To handle this case, MD will refuse to assemble the array without the "--force" flag, which effectively acknowledges that data might be corrupted.
An obvious way to address this issue is to use the same approach that has worked so well with filesystems: write all updates to a journal before writing them to the main array. When the array is restarted, any data and parity blocks still in the journal are simply written to the array again. This ensures the array will be consistent whether it is degraded or not. This could be done with a journal on a rotating-media drive but the performance would be very poor indeed. The advent of large NVRAM and SSDs makes this a much more credible proposition.
The new journal feature
The functionality developed at Facebook does exactly this. It allows a journal device (sometimes referred to as a "cache" or "log" device) to be configured with an MD/RAID5 (or RAID4 or RAID6) array. This can be any block device and could even be a mirrored pair of SSDs (because you wouldn't want the journal device to become a single point of failure).
To try this out you would need Linux 4.4-rc1 or later, and the current mdadm from git://neil.brown.name/mdadm. Then you can create a new array with a journal using a command like:

    mdadm --create /dev/md/test --level=5 --raid-disks=4 \
          --write-journal=/dev/loop9 /dev/loop[0-3]
It is not currently possible to add a journal to an existing array, but that functionality is easy enough to add later.
With the journal in place, RAID5 handling will progress much as it normally does, gathering write requests into stripes and calculating the parity blocks. Then, instead of being written to the array, the stripe is intercepted by the journaling subsystem and queued for the journal instead. When write traffic is sufficiently heavy, multiple stripes will be grouped together into a single transaction and written to the journal with a single metadata block listing the addresses of the data and parity. Once this transaction has been written and, if necessary, flushed to stable storage, the core RAID5 engine is told to process the stripe again, and this time the write-out is not intercepted.
When the write to the main array completes, the journaling subsystem will be told; it will occasionally update its record of where the journal starts so that data that is safe on the array effectively disappears from the journal. When the array is shut down cleanly, this start-of-journal pointer is set to an empty transaction with nothing following. When the array is started, the journal is inspected and if any transactions are found (with both data and parity) they are written to the array.
The journal metadata block uses 16 bytes per data block and so can describe well over 200 blocks. Along with each block's location and size (currently always 4KB), the journal metadata records a checksum for each data block. This, together with a checksum on the metadata block itself, allows very reliable determination of which blocks were successfully written to the journal and so should be copied to the array on restart.
In general, the journal consists of an arbitrarily large sequence of metadata blocks and associated data and parity blocks. Each metadata block records how much space in the journal is used by the data and parity and so indicates where the next metadata block will be, if it has been written. The address of the first metadata block to be considered on restart is stored in the standard MD/RAID superblock.
The net result of this is that, while writes to the array might be slightly slower (depending on how fast the journal device is), a system crash never results in a full resync — only a short journal recovery — and there is no chance of data corruption due to the write hole.
Given that the write-intent bitmap already allows resynchronization after a crash to be fairly quick, and that write-hole corruption is, in practice, very rare, you may wonder whether this is all worth the cost. Undoubtedly different people will assess this tradeoff differently; now, at least, the option is available once that assessment is made. But this is not the full story. The journal can provide benefits beyond closing the write hole. That was a natural place to start, as it is conceptually relatively simple and provides a context for creating the infrastructure for managing a journal. The more interesting step comes next.
The future: writeback caching and more full-stripe writes
While RAID5 and RAID6 provide a reasonably economical way to combine multiple devices to provide large storage capacity with reduced chance of data loss, they do come at a cost. When the host system writes a full stripe worth of data to the array, the parity can be calculated from that data and all writes can be scheduled almost immediately, leading to very good throughput. When writing to less than a full stripe, though, throughput drops dramatically.
In that case, some data or parity blocks need to be read from the array before the new parity can be calculated. This read-before-write introduces significant latency to each request, so throughput suffers. The MD driver tries to delay partial-stripe writes a little bit in the hope that the rest of the stripe might be written soon. When this works, it helps a lot. When it doesn't, it just increases latency further.
It is possible for a filesystem to help to some extent, and to align data with stripes to increase the chance of a full-stripe write, but that is far from a complete solution. A journal can make a real difference here by being managed as a writeback cache. Data can be written to the journal and the application can be told that the data is safe before the RAID5 engine even starts considering whether some pre-reading might be needed to be able to update parity blocks.
This allows the application to see very short latencies no matter what data-block pattern is being written. It also allows the RAID5 core to delay writes even longer, hoping to gather full stripes, without inconveniencing the application. This is something that dedicated RAID controllers have (presumably) been doing for years, and hopefully something that MD will provide in the not-too-distant future.
There are plenty of interesting questions here, such as whether to keep all in-flight data in main memory, or to discard it after writing to the journal and to read it back when it is time to write to the RAID. There is also the question of when to give up waiting for a full stripe and to perform the necessary pre-reading. Together with all this, a great deal of care will be needed to ensure we actually get the performance improvements that theory suggests are possible.
This is just engineering, though. There is interest in this from both potential users of the technology and vendors of NVRAM, and there is little doubt that we will see the journal enhanced to provide very visible performance improvements to complement the nearly invisible reliability improvements already achieved.
Patches and updates
Kernel trees
Architecture-specific
Core kernel code
Development tools
Device drivers
Device driver infrastructure
Filesystems and block I/O
Memory management
Networking
Security-related
Virtualization and containers
Page editor: Jonathan Corbet