Kernel development [LWN.net]

Kernel release status

The current development kernel is 4.10-rc6, released on January 29. Linus is worried that the patch activity has increased this time around. "It's still not all that big by historical standards, since 4.10 has generally been pretty calm, but it's a bit distressing. I was hoping to do the usual 'rc7 is the last rc' release schedule for once (with both 4.8 and 4.9 pushing out to rc8), and I really want things to calm down for that to happen." The codename has changed again; it is now "Fearless Coyote".

Stable updates: 4.9.6 and 4.4.45 were released on January 26. The 4.9.7 and 4.4.46 updates are in the review process as of this writing; they can be expected on or after February 2.

Quotes of the week

Seeing as I still get asked "hey, why is finger kernel.org not working any more," I'm sure "hey, what happened to ftp.kernel.org?" is going to be the question that will haunt me well into my retirement.

— Konstantin Ryabitsev

Distro UAPI headers work fine in a world where the kernel is a static entity and does not update its ABIs. I.e. it only works if there's no actual kernel side extensions to the ABI. The whole UAPI distro headers approach is designed for the case where the style of sharing the headers matters the least: for a stagnant kernel or a stagnant tooling project ...

Btw., this kind of rigid, suboptimal, latency laden method of sharing information between the kernel and tooling might be one of the reasons why in general the Linux tooling landscape sucks, compared to other OSs...

— Ingo Molnar

Shutting down FTP services (kernel.org)

Kernel.org has announced that it will be shutting down FTP access to its archives in two stages: March 1 will see the end of ftp.kernel.org, while December 1 is the termination date for mirrors.kernel.org.

Let's face it -- while kinda neat and convenient, offering a public NFS/CIFS server was a Pretty Bad Idea, not only because both these protocols are pretty terrible over high latency connections, but also because of important security implications.

Well, 19 years later we're thinking it's time to terminate another service that has important protocol and security implications -- our FTP servers. Our decision is driven by the following considerations:

The protocol is inefficient and requires adding awkward kludges to firewalls and load-balancing daemons
FTP servers have no support for caching or accelerators, which has significant performance impacts
Most software implementations have stagnated and see infrequent updates

All kernel.org FTP services will be shut down by the end of this year.

Making sense of GFP_TEMPORARY

By Jonathan Corbet
February 1, 2017

This is the season when potential topics for the upcoming Linux Storage, Filesystem, and Memory Management Summit are discussed; often that discussion resolves the relevant issues before the actual event. That would appear to be the case with the mysterious GFP_TEMPORARY memory-allocation flag. The development community now knows what it does, but it now seems that the flag itself may turn out to be a temporary thing.

Matthew Wilcox started the discussion by listing no fewer than nine different topics that he would like to see addressed at the summit. One of those (#8) was "nailing down exactly what GFP_TEMPORARY means". This flag was added to the 2.6.24 kernel by Mel Gorman in 2007; since then, it has picked up a few dozen users throughout the kernel. But, it seems, nobody has ever documented what the flag's effects are or when it should be used.

What the flag actually does is relatively straightforward, though it took a while for the discussion to make it clear. Vlastimil Babka described it this way:

GFP_TEMPORARY, compared to GFP_KERNEL, adds __GFP_RECLAIMABLE, which tries to place the allocation within MIGRATE_RECLAIMABLE pageblocks - GFP_KERNEL implies MIGRATE_UNMOVABLE pageblocks, and userspace allocations are typically MIGRATE_MOVABLE.

All of this is driven by the need to defragment memory so that multiple-page allocations can be made when needed. Pages that are allocated for user-space memory are relatively easy to manage in this regard since they can be relocated elsewhere in physical memory; as long as the page-table entries are updated accordingly, the application(s) using those pages won't even be aware that they have moved. The kernel groups such pages together into regions of memory marked MIGRATE_MOVABLE in the hopes of being able to clear large contiguous areas of memory when the need arises. Keeping non-movable pages out of that area minimizes the risk of a single nailed-down page thwarting an effort to clear a range of memory.
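The relocation of a user-space page can be pictured with a small user-space model. This is only a sketch under stated assumptions: the frame array, the one-level `page_table`, the toy sizes, and the helper names `virt_to_ptr()` and `migrate_page()` are all hypothetical and bear no relation to the kernel's actual migration code. It shows only why an application that reaches memory through a page table does not notice the move.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 64   /* toy page size, not the kernel's */
#define NR_FRAMES 8    /* toy amount of "physical" memory */
#define NR_VPAGES 4    /* toy number of virtual pages */

/* Toy "physical memory": an array of page frames. */
static unsigned char frames[NR_FRAMES][PAGE_SIZE];

/* Toy page table: virtual page number -> physical frame number. */
static size_t page_table[NR_VPAGES];

/* What the "application" does: reach memory through the page table. */
static unsigned char *virt_to_ptr(size_t vpage, size_t offset)
{
    assert(vpage < NR_VPAGES && offset < PAGE_SIZE);
    return &frames[page_table[vpage]][offset];
}

/*
 * Relocate one virtual page: copy its contents to new_frame, clear the
 * old frame, and repoint the page-table entry. The caller is assumed
 * to have picked a frame that no other virtual page is using.
 */
static void migrate_page(size_t vpage, size_t new_frame)
{
    assert(vpage < NR_VPAGES && new_frame < NR_FRAMES);
    size_t old_frame = page_table[vpage];

    if (old_frame == new_frame)
        return;
    memcpy(frames[new_frame], frames[old_frame], PAGE_SIZE);
    memset(frames[old_frame], 0, PAGE_SIZE); /* old frame is free again */
    page_table[vpage] = new_frame;
}
```

In this model, a byte written through `virt_to_ptr(0, 5)` before a call to `migrate_page(0, 7)` is still found at `virt_to_ptr(0, 5)` afterward, even though it now lives in a different frame.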

Memory allocated for the kernel is not so easy to relocate, though, since the memory-management subsystem has no way to know where the references to any given page of memory might be or even how many of them exist. Thus, as a general rule, kernel-space memory allocations must be assumed to be eternally fixed in place; they cannot be put into the MIGRATE_MOVABLE regions. That said, some kernel-space memory has at least the possibility of being freed when memory gets tight. Memory allocated from a slab allocator with an associated shrinker callback falls into this category, for example. If this "reclaimable" memory is grouped together and kept separate from the completely unmovable memory, then there is at least a chance of freeing some usable blocks of pages when the shrinkers are run. The __GFP_RECLAIMABLE flag indicates memory that can (maybe) be reclaimed by the kernel in this way.
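The value of keeping reclaimable memory apart from unmovable memory can be sketched with a toy model. Everything here is an assumption made for illustration: the four-page `struct pageblock`, the `page_state` values, and a `run_shrinker()` that frees every reclaimable page (a best case; real shrinkers make no such promise).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_BLOCK 4  /* toy value, far smaller than a real pageblock */

enum page_state { PAGE_FREE, PAGE_RECLAIMABLE, PAGE_UNMOVABLE };

struct pageblock {
    enum page_state pages[PAGES_PER_BLOCK];
};

/* Best-case shrinker run: every reclaimable page in the block is freed. */
static void run_shrinker(struct pageblock *blk)
{
    for (size_t i = 0; i < PAGES_PER_BLOCK; i++)
        if (blk->pages[i] == PAGE_RECLAIMABLE)
            blk->pages[i] = PAGE_FREE;
}

/* A block can satisfy a block-sized allocation only if every page is free. */
static bool block_is_free(const struct pageblock *blk)
{
    for (size_t i = 0; i < PAGES_PER_BLOCK; i++)
        if (blk->pages[i] != PAGE_FREE)
            return false;
    return true;
}
```

A block holding only reclaimable and free pages becomes entirely free after `run_shrinker()`, while a block containing even one `PAGE_UNMOVABLE` page does not, however many of its neighbors are reclaimed.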

So GFP_TEMPORARY sets the __GFP_RECLAIMABLE flag, causing allocations to be taken from the MIGRATE_RECLAIMABLE block. That describes what the flag does, but not how it is meant to be used. After some discussion, it became evident that, in fact, nobody really seemed to know what the intended use case for GFP_TEMPORARY is.
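The relationship between the flags and the pageblock types can be modeled in a few lines of user-space C. This is a sketch rather than the kernel's definitions: the `MODEL_` macros and their bit values are hypothetical, `MODEL_GFP_BASE` stands in for everything that GFP_KERNEL sets, and `model_migratetype()` is an illustrative helper. Only the relationships described in the text are taken from the discussion.

```c
#include <assert.h>

/* Hypothetical bit values chosen for this sketch; not the kernel's. */
#define MODEL_GFP_BASE          0x01u  /* stands in for GFP_KERNEL's bits */
#define MODEL_GFP_MOVABLE       0x08u
#define MODEL_GFP_RECLAIMABLE   0x10u

#define MODEL_GFP_KERNEL        (MODEL_GFP_BASE)
#define MODEL_GFP_USER_MOVABLE  (MODEL_GFP_BASE | MODEL_GFP_MOVABLE)
/* GFP_TEMPORARY as described: GFP_KERNEL plus the reclaimable hint. */
#define MODEL_GFP_TEMPORARY     (MODEL_GFP_KERNEL | MODEL_GFP_RECLAIMABLE)

enum migratetype { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RECLAIMABLE };

/* Pick the pageblock type an allocation with these flags would target. */
static enum migratetype model_migratetype(unsigned int gfp)
{
    assert(!((gfp & MODEL_GFP_MOVABLE) && (gfp & MODEL_GFP_RECLAIMABLE)));

    if (gfp & MODEL_GFP_MOVABLE)
        return MIGRATE_MOVABLE;
    if (gfp & MODEL_GFP_RECLAIMABLE)
        return MIGRATE_RECLAIMABLE;
    return MIGRATE_UNMOVABLE;  /* the default for plain kernel allocations */
}
```

Under these assumptions, `model_migratetype(MODEL_GFP_TEMPORARY)` yields `MIGRATE_RECLAIMABLE` and `model_migratetype(MODEL_GFP_KERNEL)` yields `MIGRATE_UNMOVABLE`. Nothing in the mapping refers to how long the allocation will be held, which is the gap the discussion kept running into.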

For example, one might imagine that, from its name, GFP_TEMPORARY is intended for short-lived allocations — those that will be freed in the near future. But, Wilcox asked, what does short-lived mean in this context? Is it permissible for kernel code to block while holding a GFP_TEMPORARY allocation, for example? Or, instead, should preemption be disabled while holding that allocation? Would allocating data structures for I/O operations (which could take 30 seconds to time out) as GFP_TEMPORARY be acceptable? In other words, what are the promises that a kernel developer needs to make to perform a GFP_TEMPORARY allocation, and what benefits come from making those promises?

With regard to the acceptable time period, it turns out there is no clear answer. In the above-linked message, Babka said: "There's no simple connection to time, it depends on the larger picture". This led to complaints from developers like Neil Brown, who, understandably, thought that a name involving "temporary" would be somehow related to time. He also suggested that the whole idea is somewhat shaky, and that, if it works at all to reduce fragmentation, that is more a matter of luck. His suggestion was to look, instead, at mechanisms to render kernel-allocated objects movable so that active defragmentation could be performed. This is an interesting idea, but it is also less than trivial to implement and beyond the scope of the current discussion.

Wilcox, meanwhile, continued trying to determine the situations in which GFP_TEMPORARY should be used. It seems that it should not be used with kmalloc() calls, since the slab allocators ignore it. It is possible to hold these allocations for a considerable period of time. He suggested that there might be two possible benefits from using GFP_TEMPORARY: a higher chance of successfully allocating the memory, and making larger allocations more likely to succeed in general. Babka responded that nothing in the memory-management code makes GFP_TEMPORARY allocations more likely to succeed, but that the general benefit for larger allocations might exist.

In the end, nobody was able to come up with a simple answer to the question of when GFP_TEMPORARY should be used. So Michal Hocko concluded that perhaps it shouldn't exist at all:

From the current discussion so far it really seems that it would be really hard to define sensible semantic for GFP_TEMPORARY with the current implementation so I will send a patch to simply drop this flag. If we want to have such a flag then we should start over with defining the semantic first and think this thing over properly.

Subsystems like memory management are full of heuristics intended to improve the behavior of the system. The nature of heuristics, though, tends to make their use and benefits a bit fuzzy at times, especially in the absence of focused testing (as appears to be the case here). But even ineffective heuristics can end up wired into the system to the point where nobody questions their existence. One of the good things about free-software development is that it makes it easy for fresh eyes to come in and generate awkward questions.

LZ4: vendoring in the kernel

By Jonathan Corbet
February 1, 2017

In his 2017 linux.conf.au talk, Robert Lefkowitz talked about the process of "vendoring" — the copying of code from other projects into one's own repository rather than accepting a dependency on those projects. Vendoring is common in commercial projects; Android does a lot of it, for example, and Lefkowitz suggested that the practice should become more common elsewhere as well. Vendoring is not unknown elsewhere and is even done in the kernel, as a current patch set shows.

The LZ4 compression algorithm claims to be "extremely fast", especially on the decompression side. The project claims benchmark results showing LZ4 beating LZO decompression by a factor of four and zlib by nearly an order of magnitude. It is a lossless algorithm, so it is suitable for compressing data that must be recoverable in exactly its original form. Recent releases have added a "fast" mode that allows callers to control the trade-off between speed and the amount of compression applied.

One can imagine how this kind of fast compression would be useful to have in the kernel. And indeed, the kernel has had LZ4 capability since the 3.11 release in 2013. It was added by Chanho Min, who grabbed the r90 release from the LZ4 repository and stuffed it into the kernel under lib/lz4. A quick grep shows that it is currently used in the crypto layer, in the pstore subsystem, and in the squashfs filesystem. There are other places in the kernel that use compression, but they are not using LZ4 currently.

One of the advantages of copying the code into your own repository is that you no longer depend on an external project. Lefkowitz thought that independence was so valuable that he recommended copying for any dependency with fewer than about 35 million lines. In the kernel's case, there is an especially strong case against external dependencies: the kernel must be built as a standalone program using its complicated set of linker rules. It is probably possible to tweak the kernel's build system to allow it to link against externally supplied libraries, but one can imagine that there would be a fair amount of opposition to any such move. Kernel developers want to know exactly what is going into the end product.

The downside of vendoring, of course, is that you then lose out on all of the enhancements made in the original project. The LZ4 developers have made a number of releases since 2013; these have added numerous features, including the "LZ4 fast" mode. Some of the changes may have fixed bugs that, in the kernel, would constitute security vulnerabilities. None of those changes are in current kernels.

Toward the beginning of the year, Sven Schmidt posted a patch set updating LZ4 to the project's 1.7.2 release. The motivation was a desire to use the LZ4 fast mode in the Lustre filesystem, but he made the reasonable assumption that other parts of the kernel might want to take advantage of the fast mode as well. The patches are a wholesale replacement of the existing LZ4 code; the work initially done by Min to turn the LZ4 library into a kernel module has been replicated.

There do not appear to be any objections to upgrading the kernel's LZ4 implementation, but Greg Kroah-Hartman did note one potential problem and, in the process, highlighted one of the other hazards that go with vendoring. The existing in-kernel LZ4 implementation has not sat still since 2013; it has had a number of patches applied to it. Some of those were security fixes. When Schmidt replaced the LZ4 implementation, he discarded those fixes as well, potentially reintroducing problems that had already been fixed once.

Once his attention was called to the issue, Schmidt agreed to look at the patches and make sure that his replacement does not bring the old bugs back. With luck, he will also get any relevant changes merged back upstream, though Willy Tarreau suggested that some of the fixes, at least, were specific to the kernel. If such changes exist, they are unlikely to make it upstream and will thus be something the kernel has to carry indefinitely.

Making sure that the new LZ4 maintains the fixes applied to the old one is not a huge job; the number of patches is small. Happily, they exist as separate patches, rather than having been quietly folded into the source when LZ4 was initially added to the kernel. But it is a job that has to be remembered every time that somebody decides to update the kernel's LZ4 implementation. In this case, Kroah-Hartman noticed the problem, but the project cannot always count on his attentiveness to avoid regressions with future upgrades.

Such upgrades will almost certainly happen sooner or later. The upstream LZ4 project is already up to 1.7.6 as of this writing; it has added a new high-compression mode and fixed some bugs since 1.7.2 was released. At some point, somebody working in the kernel space will want the enhancements being made upstream.

The kernel has other copied subsystems like LZ4; they are mostly low-level compression and cryptographic code. Each one of these represents a sort of disconnect from the upstream project (in cases where there is still a functioning upstream project, at least). One could regard the highly modified kernels shipped in the mobile and embedded areas as being another example of the same thing; rather than upstream their code, these vendors simply copy it from one kernel to the next.

There are solid reasons for vendoring, but also real costs associated with it. The prevalence of vendoring throughout our community suggests that we are still struggling to find the best ways to integrate software that is created by independent groups of developers, especially as the scale of our projects continues to increase. For now, we will just have to hope that, the next time somebody decides to update a library like LZ4 in the kernel, they will remember what the old fixes are and make sure they carry over to the new version.

Patches and updates

Kernel trees

Linus Torvalds Linux 4.10-rc6 Jan 29
Greg KH Linux 4.9.6 Jan 26
Sebastian Andrzej Siewior v4.9.6-rt4 Jan 30
Greg KH Linux 4.4.45 Jan 26
Jiri Slaby Linux 3.12.70 Feb 01

Architecture-specific

Will Deacon Add support for the ARMv8.2 Statistical Profiling Extension Jan 27
Ingo Molnar x86/fpu: Simplify the FPU state machine Jan 26
Dave Hansen x86, mpx: Support larger address space (MAWA) Jan 26

Build system

Kees Cook Introduce the initify gcc plugin Jan 31

Core kernel code

Dan Williams introduce a dax_inode for dax_operations Jan 28

Device drivers

Chris Packham Support for Marvell switches with integrated CPUs Jan 27
Quentin Schulz add support for AXP20X and AXP22X power supply drivers Jan 27
Gerd Hoffmann mmc: bcm2835: Add new driver for the internal SD controller. Jan 27
eajames.ibm@gmail.com drivers: hwmon: Add On-Chip Controller drive Jan 26
Eric Anholt staging: BCM2835 MMAL V4L2 camera driver Jan 27
Chen-Yu Tsai clk: sunxi-ng: Add support for A80 CCUs Jan 28
George Cherian Add Support for Cavium Cryptographic Acceleration Unit Jan 30
Raviteja Garimella Support for USB DRD Phy driver for NS2 Jan 30
Alexander Loktionov net: ethernet: aquantia: Add AQtion 2.5/5 GB NIC driver Jan 28
Dupuis, Chad Add QLogic FastLinQ FCoE (qedf) driver Jan 25
Hugues Fruchet Add support for DELTA video decoder of STMicroelectronics STiH4xx SoC series Jan 31
Hugues Fruchet Add support for MPEG-2 in DELTA video decoder Jan 30
Mylène Josserand Add sun8i A33 audio driver Jan 31
Iyappan Subramanian drivers: net: xgene-v2: Add RGMII based 1G driver Jan 31
Christopher Bostic FSI device driver introduction Feb 01
Andrey Smirnov i.MX7 PCI support Feb 01

Device driver infrastructure

Christoph Hellwig automatic IRQ affinity for virtio V2 Jan 27
Heikki Krogerus USB Type-C Connector class Jan 30
Noralf Trønnes drm: Add support for tiny LCD displays Jan 31
Dmitry Torokhov Export APIs to copy device properties & more Feb 01

Documentation

Sakari Ailus Media object lifetime management meeting report from Oslo Jan 27

Filesystems and block I/O

Kirill A. Shutemov ext4: support of huge pages Jan 26
Ram Pai DM: dm-inplace-compress: inplace compressed DM target Jan 30

Memory management

Jérôme Glisse HMM (Heterogeneous Memory Management) v17 Jan 27
Anshuman Khandual Define coherent device memory node Jan 30
Shaohua Li mm: add new LRU list for MADV_FREE pages Jan 29
Michal Hocko kvmalloc Jan 30

Networking

Christoph Paasch MPTCP v0.91.3 maintenance release Jan 31

Security-related

Andy Lutomirski setgid hardening Jan 25
James Bottomley Add session handling to tpm spaces Jan 27
Jens Wiklander generic TEE subsystem Jan 28
Casey Schaufler LSM: Stacking for major security modules - resend Jan 25

Virtualization and containers

Jintack Lim Provide the EL1 physical timer to the VM Jan 26
Punit Agrawal Add support for monitoring guest TLB operations Jan 31

Miscellaneous

Joe Stringer Libbpf object pinning Jan 26
Pablo Neira Ayuso iptables 1.6.1 release Jan 27
Theodore Ts'o Release of e2fsprogs 1.43.4 Jan 31