LWN.net Weekly Edition for November 3, 2016
Adaptive mutexes in user space
One of the frustrations of computer programming (almost certainly shared with other engineering disciplines) is that, often, a simple, elegant, and general design doesn't work as well as an ugly hack. Such designs still have value as they are more maintainable and more extensible, so it is not uncommon to need to find a balance between simple elegance and practical efficiency. The story of futex support in Linux could be seen as a story of trying to find just this balance. The latest episode adds a new special case, but provides impressive performance improvements.
Some futex history
The original design of futexes — a kernel interface to support Fast User-space muTEXes — introduced a four-byte memory location that would always be updated atomically. A key aspect of the design was that the kernel needed only the simplest understanding of a futex's contents; a comparison with a value provided by user space was all that was ever needed. All updates were handled by user-space code and, if a program ever found that it needed to wait for the value to change, such as to wait for a lock to be released, it would use the futex() system call to ask the kernel to wait for that change. When some other thread changed the value, it would tell the kernel to wake up some number of sleeping processes, and they could examine the new value and act accordingly.
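The wait/wake protocol described above can be sketched as a toy model. This is not the real futex() system call: the `ToyFutex` class, its method names, and the `lock()`/`unlock()` helpers are all hypothetical, and a Python condition variable stands in for the kernel. What it does show is the one check the kernel performs: a waiter is put to sleep only if the word still holds the value it expected.

```python
import threading


class ToyFutex:
    """Toy model of a futex word; the condition variable stands in for the kernel."""

    def __init__(self, value=0):
        self.value = value
        self._cond = threading.Condition()
        self._sleepers = 0

    def compare_and_swap(self, old, new):
        # Stand-in for an atomic user-space instruction; no "kernel" involvement.
        with self._cond:
            if self.value != old:
                return False
            self.value = new
            return True

    def wait(self, expected):
        # Models FUTEX_WAIT: sleep only if the word still holds `expected`.
        with self._cond:
            if self.value != expected:
                return False  # value already changed; the caller should re-check
            self._sleepers += 1
            self._cond.wait()
            return True

    def wake(self, count=1):
        # Models FUTEX_WAKE: wake at most `count` sleepers and report how many.
        with self._cond:
            woken = min(count, self._sleepers)
            self._sleepers -= woken
            self._cond.notify(woken)
            return woken


def lock(futex):
    # 0 = unlocked, 1 = locked. Sleep only while the word still reads 1.
    while not futex.compare_and_swap(0, 1):
        futex.wait(1)


def unlock(futex):
    futex.compare_and_swap(1, 0)
    futex.wake(1)
```

This simple lock asks for a wakeup on every unlock, even when nobody is waiting; real implementations track waiters in the word itself so that the uncontended path never enters the kernel.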
This design is simple and elegant, but imperfect. The kernel has minimal knowledge of what user space is doing, and user space has little access to relevant information that the kernel maintains, so they are limited in the extent to which they can work together. This disconnect has required a number of extensions over the years, three that are in the mainline now, and one that is on the horizon.
The first extension was needed to optimize the implementation of pthread_cond_signal(), which sometimes needs to send a wakeup on one futex (a condition variable), unlock a mutex (represented by another futex), then possibly send a wakeup on that second futex. At the same time, another thread might be waiting on the first futex and will immediately try to lock the mutex, which could require waiting on the second futex. Having this second thread wake up and go straight back to sleep leads to measurably poor performance. Neither the documentation nor the changelogs make it clear why the wakeups cannot both be performed after the mutex is unlocked, but they do assert that this is racy and so a new futex operation was created.
FUTEX_WAKE_OP is given two futexes and instructions on how to unlock the second. It will perform that unlocking, wake up waiters on the first futex, then conditionally wake up waiters on the second as well. It does all this atomically with respect to any other operations on either futex. This was the first time that the kernel needed to modify the value of the futex itself; Jakub Jelinek came up with a fairly generic mechanism to describe the unlock operation. Five different operations are provided to combine an operand with the current value of the futex, and then one of six different comparisons can be performed against a second operand to decide if the second wakeup should happen. This seems fairly powerful and suitably general, but was probably wasted effort. Only a single operation (set to zero) is ever used by glibc, and only a single comparison (greater than one). It is unlikely that the other options will ever be used, in part because of the subsequent extensions that impose structure on the value.
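As a rough illustration, the operation/comparison pair is packed into a single 32-bit argument. The sketch below mirrors the bit layout of the FUTEX_OP() macro in <linux/futex.h>; the `model_wake_op()` helper is hypothetical and only models the effect on small non-negative values (it ignores the sign extension of the 12-bit arguments and the FUTEX_OP_OPARG_SHIFT flag). Note that the comparison is made against the value the futex held before the operation was applied.

```python
# Operation and comparison codes as defined in <linux/futex.h>.
FUTEX_OP_SET, FUTEX_OP_ADD, FUTEX_OP_OR, FUTEX_OP_ANDN, FUTEX_OP_XOR = range(5)
(FUTEX_OP_CMP_EQ, FUTEX_OP_CMP_NE, FUTEX_OP_CMP_LT,
 FUTEX_OP_CMP_LE, FUTEX_OP_CMP_GT, FUTEX_OP_CMP_GE) = range(6)

MASK32 = 0xFFFFFFFF


def futex_op(op, oparg, cmp, cmparg):
    """Pack the four fields the same way the FUTEX_OP() macro does."""
    return (((op & 0xF) << 28) | ((cmp & 0xF) << 24)
            | ((oparg & 0xFFF) << 12) | (cmparg & 0xFFF))


def model_wake_op(old, encoded):
    """Hypothetical helper: return (new_value, wake_second_futex)."""
    op, cmp = (encoded >> 28) & 0xF, (encoded >> 24) & 0xF
    oparg, cmparg = (encoded >> 12) & 0xFFF, encoded & 0xFFF
    new = {
        FUTEX_OP_SET: oparg,
        FUTEX_OP_ADD: old + oparg,
        FUTEX_OP_OR: old | oparg,
        FUTEX_OP_ANDN: old & ~oparg,
        FUTEX_OP_XOR: old ^ oparg,
    }[op] & MASK32
    # The comparison uses the OLD value, not the freshly written one.
    wake_second = {
        FUTEX_OP_CMP_EQ: old == cmparg,
        FUTEX_OP_CMP_NE: old != cmparg,
        FUTEX_OP_CMP_LT: old < cmparg,
        FUTEX_OP_CMP_LE: old <= cmparg,
        FUTEX_OP_CMP_GT: old > cmparg,
        FUTEX_OP_CMP_GE: old >= cmparg,
    }[cmp]
    return new, wake_second


# The single combination the article says glibc uses:
# set the word to zero, wake the second futex if the old value was greater than one.
CLEAR_WAKE_IF_GT_ONE = futex_op(FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1)
```

With this encoding, an old value of 2 (locked, with waiters) is cleared and triggers the second wakeup, while an old value of 1 (locked, no waiters) is cleared without it.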
It is possible for a thread to be killed before it makes an expected change to a futex that other threads are waiting for. Only the user-space code knows which thread "owns" the lock (or even what sort of locking is being used) and only the kernel knows when a process dies unexpectedly. To allow those waiting threads to discover that something is wrong, some extra communication was needed. This gave rise to "robust futexes" that allow a thread to register a linked list of futexes whose waiting threads need to be woken up if the thread ever dies.
The use of robust futexes significantly reduced the flexibility of futexes. The four-byte memory location now has a fully defined meaning: 30 bits provide the thread ID of the owner of the futex, one bit records if any other threads are waiting on the futex, and one bit indicates if the previous owner died. This means that robust futexes cannot be used to create counting semaphores, or reader/writer locks. They can only be used for binary mutual-exclusion locks. It also means that most of the operations provided for FUTEX_WAKE_OP are of no value for robust futexes.
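The layout just described can be made concrete with a small decoder. The three masks are the values defined in <linux/futex.h>; the `decode_robust_word()` function and its field names are hypothetical and exist only for illustration.

```python
# Bit masks as defined in <linux/futex.h>.
FUTEX_WAITERS = 0x80000000     # at least one thread is waiting
FUTEX_OWNER_DIED = 0x40000000  # the previous owner died while holding the lock
FUTEX_TID_MASK = 0x3FFFFFFF    # low 30 bits: thread ID of the owner (0 = unowned)


def decode_robust_word(word):
    """Split a 32-bit robust-futex word into its three fields."""
    return {
        "owner_tid": word & FUTEX_TID_MASK,
        "has_waiters": bool(word & FUTEX_WAITERS),
        "owner_died": bool(word & FUTEX_OWNER_DIED),
    }
```

Since every bit is spoken for, there is no room left for a counter or a reader count, which is why only binary mutual exclusion fits.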
The more flexible, non-robust futexes could still be used as private futexes between threads in a single process and never shared between processes. In that context, an unexpected failure will kill the whole process rather than a single thread, so no recovery handling is needed.
The third extension of interest was to support priority inheritance. If it is possible for threads of different priorities to claim a lock then, when a low-priority process holds the lock and a high-priority process is waiting for it, any medium priority process that prevents the low-priority process from running will indirectly interfere with the high-priority process, which is not desirable. This "priority inversion" is usually addressed using priority inheritance, which causes the low-priority process to run with the priority of the highest-priority process that is waiting for it. Linux has priority-inheritance mutex locks internally, so the priority-inheritance (PI) extension to futexes allocates one of those whenever a PI futex is contended, and uses it to manage priority.
This extension was the first to introduce the verbs "lock" and "unlock" (and "trylock") into the futex interface. Previously the interfaces only talked about "waking" and "waiting" with the implication that a variety of different services could be built on that. For priority-inheritance locks, at least, that pretense is now gone. It really is just a lock.
What's next for futexes
In the kernel, one of the improvements that has been made to mutexes in recent years is to add adaptive spinning. The theory is that sometimes a mutex is only held for a short period of time and, in those cases, it is more efficient to busy-wait for the lock to be free than to go to sleep and then be woken up. If the busy-wait doesn't look like it will be successful, only then is the thread put to sleep. Making a choice between spinning and sleeping is the adaption included in the name.
It should be no surprise that this optimization should be useful for user-space locking using futexes. Waiman Long has found that, for a particular micro-benchmark, standard (wait/wake) futexes can achieve a mere 35 million operations in ten seconds, while adaptive spinning can increase that to over 54 million. This technique had been tried before by Darren Hart, though his reported results weren't quite so impressive, probably because modern processors have many more cores and high core counts can tip the balance towards spinning over sleeping. While micro-benchmark results should be treated with caution, a 50% improvement deserves some attention.
The user-space code could, of course, simply spin for, say, 20 microseconds before giving up and asking the kernel to put it to sleep. While simple, this approach is far from ideal. If the process holding the lock is sleeping, busy-waiting for it is a waste of power and could possibly increase the total wait time. It only makes sense to busy-wait if the process owning the lock is itself busy.
Here again, the separation between user space and kernel space is a problem. Only the kernel knows which processes are busy or sleeping. Either we need to tell user space when the owner of a futex is sleeping, or tell the kernel that it should spin for a while before taking the lock. Both of these are probably possible, but moving the whole locking operation into the kernel is probably easiest, matches the approach that PI futexes use, and is the approach that Long is exploring.
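The decision at the heart of adaptive spinning can be sketched in a few lines. This is an illustration of the idea only, not code from the kernel or from Long's patches: `adaptive_wait()`, its two callback parameters, and the `spin_limit` bound are all hypothetical. The `owner_is_running` callback represents exactly the information that only the kernel has.

```python
def adaptive_wait(lock_is_free, owner_is_running, spin_limit=1000):
    """Spin while the owner is on a CPU; otherwise give up and sleep.

    Returns "acquired" if the lock became free while spinning, or "sleep"
    if the caller should block and wait for a wakeup instead.
    """
    for _ in range(spin_limit):
        if lock_is_free():
            return "acquired"
        if not owner_is_running():
            # The owner is not making progress, so spinning only burns power.
            return "sleep"
    # The owner is running but holding the lock for a long time; stop spinning.
    return "sleep"
```

A fixed 20-microsecond spin corresponds to dropping the `owner_is_running` check entirely, which is why it wastes time whenever the owner has been scheduled out.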
The patchset
Long's latest patchset adds the FUTEX_LOCK and FUTEX_UNLOCK operations to the futex() system call; they work in a similar fashion to FUTEX_LOCK_PI and FUTEX_UNLOCK_PI, but without the priority inheritance. They use a regular mutex to help implement the locking, but in a slightly different way than the PI extension's use of an rt_mutex.
The first time the futex() system call is made, there are two processes interested in the lock. One already holds the lock, while the other wants to acquire it. The PI extension initializes a new rt_mutex in a locked state and makes it appear that the thread that owns the futex also owns this rt_mutex. There is substantial complexity in making this work in a race-free way, but in essence, that is what happens. The second process then waits for the mutex in a fairly normal way.
The new adaptive spinning extension (called "throughput optimized", which describes the goal rather than the implementation) uses a mutex only to arbitrate between the different threads that might be waiting, not to arbitrate between them and the thread that owns the lock. Whichever thread manages to claim this mutex is the "top waiter" and gets to decide whether to busy-wait for the current owner to release the lock, or to go to sleep to be woken by the usual futex wakeup mechanism.
Long did originally try to follow the same model as PI mutexes but found that the performance wasn't close to what he wanted, as too many unlock requests still went through the kernel. Futexes only need the kernel to be involved when there is contention; the kernel only gets involved when there is one lock owner and at least one lock waiter. Additionally, if there is only one waiter, when the owner releases the lock and that waiter becomes the owner, the kernel no longer needs to be involved. When that new owner drops the lock it should be able to complete without involving the kernel. With PI mutexes, the rt_mutex stays allocated until completely unlocked (i.e. until there are no more owners). This means that last unlock goes through the kernel. Long claims that this extra kernel involvement reduces throughput significantly.
Another benefit from maintaining control of the busy-waiting separately from the kernel mutex is that the benefits of lock stealing can be realized; this sacrifices some fairness for performance. There is a small window between the moment when the owning thread unlocks a futex and when the top waiter locks that futex. If another thread tries to claim the lock during that window it can successfully steal the lock. This is seen as a good thing, presumably because that new thread has its working set of memory in cache and will likely make progress quickly. Long's patches explicitly allow this stealing, but also put a limit on it. If the top waiter is woken up after sleeping and fails to get the lock, it sets a flag asking the next thread that unlocks the lock to perform a handoff instead, explicitly giving the lock to that top waiter and thus avoiding further theft. Long provides numbers that seem to suggest that this improves throughput (the main goal) and also improves fairness.
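The interaction between stealing and handoff can be walked through with a single-threaded toy model. It is not taken from the patches; the `ToyHandoffLock` class and all of its method and attribute names are hypothetical, and it ignores atomicity entirely in order to show only the state transitions.

```python
class ToyHandoffLock:
    """Single-threaded model of lock stealing with a hand-off request."""

    def __init__(self):
        self.owner = None
        self.top_waiter = None
        self.handoff_requested = False

    def try_lock(self, tid):
        # Any thread that finds the lock free may take it, even if others wait.
        if self.owner is None:
            self.owner = tid
            return True
        return False

    def top_waiter_retry(self, tid):
        # The top waiter woke up; if it lost the race, it asks for a hand-off.
        if self.try_lock(tid):
            self.top_waiter = None
            return True
        self.top_waiter = tid
        self.handoff_requested = True
        return False

    def unlock(self):
        if self.handoff_requested and self.top_waiter is not None:
            # Give the lock straight to the top waiter; it is never seen free.
            self.owner = self.top_waiter
            self.top_waiter = None
            self.handoff_requested = False
        else:
            self.owner = None
```

In this model a thread can steal the lock once, but after the top waiter has requested a handoff the next unlock passes ownership directly, so a later arrival finds the lock held rather than free.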
Responses
The reception to the patch set so far has been cautious. The results appear encouraging, but there are some questions, including whether the code might be driven too much by that one benchmark. Two comments that reveal Thomas Gleixner's concerns are first:
and later:
I really wonder how the average programmer should pick the right flavour, not to talk about any useful decision for something like glibc to pick the proper one.
Given that adaptive spinning has made its way into the in-kernel mutexes, it would be surprising if a way cannot be found to make them work well for user space too. Of course, it would also be surprising if the first attempt at providing such a feature would have found the right balance between the various competing needs. We don't have efficient adaptive spinning for futexes yet, but if it really brings value, it shouldn't be too far away.
The Turris Omnia router: help for the IoT mess?
The Turris Omnia router is not the first FLOSS router out there, but it could well be one of the first open hardware routers to be available. As the crowdfunding campaign is coming to a close, it is worth reflecting on the place of the project in the ecosystem. Beyond that, I got my hardware recently, so I was able to give it a try.
A short introduction to the Omnia project
The Omnia router is a follow-up to CZ.NIC's original research project, the Turris. The goal of that project was to identify hostile traffic on end-user networks and develop global responses to those attacks across every monitored device. The Omnia is an extension of the original project: more features were added and data collection is now opt-in. Whereas the original Turris was simply a home router, the new Omnia router includes:
- 1.6GHz ARM CPU
- 1-2GB RAM
- 8GB flash storage
- 6 Gbit Ethernet ports
- SFP fiber port
- 2 Mini-PCI express ports
- mSATA port
- 3 MIMO 802.11ac and 2 MIMO 802.11bgn radios and antennas
- SIM card support for backup connectivity
Some models sold had a larger case to accommodate extra hard drives, turning the Omnia router into a NAS device that could actually serve as a multi-purpose home server. Indeed, it is one of the objectives of the project to make "more than just a router". The NAS model is not currently on sale anymore, but there are plans to bring it back along with LTE modem options and new accessories "to expand Omnia towards home automation".
Omnia runs a fork of the OpenWRT distribution called TurrisOS that has been customized to
support automated live updates, a simpler web interface, and other extra
features. The fork also has patches to the Linux kernel, which is based on Linux
4.4.13 (according to uname -a). It is unclear why those
patches are necessary since the ARMv7 Armada 385 CPU has been supported
in Linux since at least 4.2-rc1, but it is common for OpenWRT ports to
ship patches to the kernel, either to backport missing functionality or
perform some optimization.
There has been some pressure from backers to petition Turris to "speedup the process of upstreaming Omnia support to OpenWrt". It could be that the team is too busy with delivering the devices already ordered to complete that process at this point. The software is available on the CZ-NIC GitHub repository and the actual Linux patches can be found here and here. CZ.NIC also operates a private GitLab instance where more software is available. There is technically no reason why you wouldn't be able to run your own distribution on the Omnia router: OpenWRT development snapshots should be able to run on the Omnia hardware and some people have installed Debian on Omnia. It may require some customization (e.g. the kernel) to make sure the Omnia hardware is correctly supported. Most people seem to prefer to run TurrisOS because of the extra features.
The hardware itself is also free and open for the most part. There is a binary blob needed for the 5GHz wireless card, which seems to be the only proprietary component on the board. The schematics of the device are available through the Omnia wiki, but oddly not in the GitHub repository like the rest of the software.
Hands on
I received my own router last week, which is about six months late from the original April 2016 delivery date; it allowed me to do some hands-on testing of the device. The first thing I noticed was a known problem with the antenna connectors: I had to open up the case to screw the fittings tight, otherwise the antennas wouldn't screw in correctly.
Once that was done, I simply had to go through the usual process of setting up the router, which consisted of connecting the Omnia to my laptop with an Ethernet cable, connecting the Omnia to an uplink (I hooked it into my existing network), and going through a web wizard. I was pleasantly surprised with the interface: it was smooth and easy to use, but at the same time imposed good security practices on the user.
For example, the wizard, once connected to the network, goes through a full system upgrade and will, by default, automatically upgrade itself (including reboots) when new updates become available. Users have to opt in to the automatic updates, and can choose to automate only the downloading and installation of the updates without having the device reboot on its own. Reboots are also performed during user-specified time frames (by default, Omnia applies kernel updates during the night). I also liked the "skip" button that allowed me to completely bypass the wizard and configure the device myself, through the regular OpenWRT systems (like LuCI or SSH) if I needed to.
Notwithstanding the antenna connectors themselves, the hardware is nice. I ordered the black metal case, and I must admit I love the many LED lights in the front. It is especially useful to have color changes in the reset procedure: no more guessing what state the device is in or if I pressed the reset button long enough. The LEDs can also be dimmed to reduce the glare that our electronic devices produce.
All this comes at a price, however: at $250 USD, it is a much higher price tag than common home routers, which typically go for around $50. Furthermore, it may be difficult to actually get the device, because no orders are being accepted on the Indiegogo site after October 31. The Turris team doesn't actually want to deal with retail sales and has now delegated retail sales to other stores, which are currently limited to European deliveries.
A nice device to help fight off the IoT apocalypse
It seems there isn't a week that goes by these days without a record-breaking distributed denial-of-service (DDoS) attack. Those attacks are increasingly caused by home routers, webcams, and "Internet of Things" (IoT) devices. In that context, the Omnia sets a high bar for how devices should be built but also how they should be operated. Omnia routers are automatically upgraded on a nightly basis and, by default, do not provide telnet or SSH ports to run arbitrary code. There is the password-less wizard that starts up on install, but it forces the user to choose a password in order to complete the configuration.
Both the hardware and software of the Omnia are free and open. The automatic update's EULA explicitly states that the software provided by CZ.NIC "will be released under a free software licence" (and it has been, as mentioned earlier). This makes the machine much easier to audit by someone looking for possible flaws, say for example a customs official looking to approve the import in the eventual case where IoT devices end up being regulated. But it also makes the device itself more secure. One of the problems with these kinds of devices is "bit rot": they have known vulnerabilities that are not fixed in a timely manner, if at all. While it would be trivial for an attacker to disable the Omnia's auto-update mechanisms, the point is not to counterattack, but to prevent attacks on known vulnerabilities.
The CZ.NIC folks take it a step further and encourage users to actively participate in a monitoring effort to document such attacks. For example, the Omnia can run a honeypot to lure attackers into divulging their presence. The Omnia also runs an elaborate data collection program, where routers report malicious activity to a central server that collects information about traffic flows, blocked packets, bandwidth usage, and activity from a predefined list of malicious addresses. The exact data collected is specified in another EULA that is currently only available to users logged in at the Turris web site. That data can then be turned into tweaked firewall rules to protect the overall network, which the Turris project calls a distributed adaptive firewall. Users need to explicitly opt in to the monitoring system by registering on a portal using their email address.
Turris devices also feature the Majordomo software (not to be confused with the venerable mailing list software) that can also monitor devices in your home and identify hostile traffic, potentially leading users to take responsibility over the actions of their own devices. This, in turn, could lead users to trickle complaints back up to the manufacturers that could change their behavior. It turns out that some companies do care about their reputations and will issue recalls if their devices have significant enough issues.
It remains to be seen how effective the latter approach will be, however. In the meantime, the Omnia seems to be an excellent all-around server and router for even the most demanding home or small-office environments, and a great example for future competitors.
Security
Defending against Rowhammer in the kernel
The Rowhammer vulnerability affects hardware at the deepest levels. It has proved to be surprisingly exploitable on a number of different systems, leaving security-oriented developers at a loss. Since it is a hardware vulnerability, it would appear that solutions, too, must be placed in the hardware. Now, though, an interesting software-based mitigation mechanism is under discussion on the linux-kernel mailing list. The ultimate effectiveness of this defense is unproven, but it does show that there may be hope for a solution that doesn't require buying new computers.
Rowhammer works by repeatedly reading the same memory location a large number of times. With contemporary DRAM, reading a location is a destructive act; the memory controller must rewrite the data into that location after each read. Those rewrites can cause neighboring memory cells to discharge slightly; if an attacker causes rewriting to happen too many times before the next regular refresh cycle happens, they can corrupt data in those neighboring cells. The result is seemingly random bit flips in nearby memory.
This would appear to be a difficult vulnerability to exploit. An attacker must find memory that is known to be adjacent to data of interest, then manage to corrupt that data in a useful way. But attackers can do surprising things; a fair number of Rowhammer exploits have now been posted. That includes the "Drammer" exploit that works on many Android devices. Rowhammer is thus a serious problem. Unfortunately, the only proper solution appears to be to increase the memory refresh rate, something that cannot generally be done in deployed hardware.
An intriguing alternative turned up on the linux-kernel list, though its nature wasn't immediately clear. Pavel Machek asked a question that raised some eyebrows: "I'd like to get an interrupt every million cache misses... to do a printk() or something like that." Developers naturally wondered what he was up to. The answer turns out to be an in-kernel Rowhammer defense.
Contemporary CPUs are generally equipped with performance-monitoring units (PMUs) that can track many aspects of how the system is running. Normally the PMU is used by utilities like perf for system profiling and performance tuning. But one of the events the PMU can track is memory-cache misses. For Rowhammer to work, it must act on main memory; reads from cache will not be effective. That means forcing a cache miss for each of, generally, hundreds of thousands of reads to the same address. If the PMU can be used to detect those cache misses, it might be able to detect — and mitigate — Rowhammer attacks.
The patch is evolving rapidly as this is being written; the current version takes the form of a "nohammer" kernel module. It has a (currently hardwired) parameter called dram_max_utilization_factor, which determines the maximum cache-miss rate allowed in the system. If it is set to 8 (the default), then the nohammer module will trigger if the cache-miss rate exceeds 1/8 of the theoretical maximum. When that happens, the CPU will be forced to delay for a period long enough to allow the next DRAM refresh to run; 64ms by default. In theory, this delay should slow down a Rowhammer attack enough to make it ineffective.
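The arithmetic behind such a threshold might look like the following. This is a back-of-the-envelope sketch, not the module's code: the function names are hypothetical, and the 50ns per-access figure is an assumed number chosen only to make the calculation concrete. The 64ms refresh interval and the factor of 8 are the defaults mentioned above.

```python
def miss_budget(refresh_ms=64, access_ns=50, utilization_factor=8):
    """Cache misses tolerated per refresh window before throttling.

    access_ns is an ASSUMED minimum time per DRAM access; it determines the
    theoretical maximum number of misses that fit into one refresh window.
    """
    theoretical_max = (refresh_ms * 1_000_000) // access_ns
    return theoretical_max // utilization_factor


def should_throttle(misses_in_window, budget):
    # The PMU only supplies a count; it cannot say which addresses missed.
    return misses_in_window > budget
```

Under these assumptions the theoretical maximum is 1,280,000 misses per 64ms window, so a factor of 8 yields a budget of 160,000 misses before the CPU is made to wait out a refresh cycle.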
It's a nice theory, but it still suffers from a number of practical problems at this point. To begin with, a 64ms hard delay will add a huge latency to anything the affected CPU is supposed to be doing. If it happens with any frequency at all, it will be noticed, even on systems that are not highly latency-sensitive. Ingo Molnar has suggested making the delay shorter and more frequent; that would reduce the maximum imposed latency, but doesn't change the overall nature of the defense.
The PMU can detect a high rate of cache misses, but it cannot tell the kernel whether all of those misses involved the same address or not. So it could be triggered by an application that is, for example, reading quickly through a large array of data in memory. Thus, it seems entirely plausible that a number of legitimate workloads will generate high rates of cache misses over time that will be mistaken for Rowhammer attacks. Those workloads will be penalized severely by this patch, for no actual gain. That will quickly lead to people turning the Rowhammer defense off.
The PMU is a per-CPU mechanism, but memory is globally accessible in a multiprocessor system. The patch has some tests for an attack that is conducted by two CPUs simultaneously, but does not scale well to systems with more processors than that. It's not entirely clear how it can be made to work in a setting where, say, eight processors are all pounding the same location simultaneously.
Finally, Mark Rutland raised an important point: this mechanism depends entirely on counting cache misses. If the attacker is able to obtain an uncached memory mapping, all operations on that memory will bypass the cache entirely and will not be counted. It would appear that Drammer makes use of just such a mapping, so this module may well not be an effective defense against it. Detecting attacks against uncached memory could prove to be a much harder problem.
So it is far too soon to say that the kernel has a useful defense against Rowhammer attacks. But this work shows that, when one is willing to pay the price, a defense might just be possible, at least for some types of attacks. That is an improvement over a world where the only real defense is to buy new hardware — once the vendors get around to producing Rowhammer-resistant systems. It will be interesting to watch where this work goes and how effective it becomes.
Brief items
Security quotes of the week
New vulnerabilities
bind: denial of service
| Package(s): | bind | CVE #(s): | CVE-2016-8864 |
| Created: | November 2, 2016 | Updated: | January 11, 2017 |
Description: From the Arch Linux advisory:
A defect in BIND's handling of responses containing a DNAME answer can cause a resolver to exit after encountering an assertion failure in db.c or resolver.c.
During processing of a recursive response that contains a DNAME record in the answer section, BIND can stop execution after encountering an assertion error in resolver.c (error message: "INSIST((valoptions & 0x0002U) != 0) failed") or db.c (error message: "REQUIRE(targetp != ((void *)0) && *targetp == ((void *)0)) failed"). A server encountering either of these error conditions will stop, resulting in denial of service to clients. The risk to authoritative servers is minimal; recursive servers are chiefly at risk.
cairo: denial of service
| Package(s): | cairo | CVE #(s): | CVE-2016-9082 |
| Created: | October 31, 2016 | Updated: | November 2, 2016 |
Description: From the Debian LTS advisory:
It was discovered that there was a possible DoS attack in Cairo, a multi-platform library providing vector-based rendering. An SVG could generate invalid pointers from a _cairo_image_surface in write_png.
chromium: denial of service
| Package(s): | chromium | CVE #(s): | CVE-2016-5138 |
| Created: | October 31, 2016 | Updated: | November 2, 2016 |
Description: From the CVE entry:
Integer overflow in the kbasep_vinstr_attach_client function in midgard/mali_kbase_vinstr.c in Google Chrome before 52.0.2743.85 allows remote attackers to cause a denial of service (heap-based buffer overflow and use-after-free) by leveraging an unrestricted multiplication.
curl: multiple vulnerabilities
| Package(s): | curl | CVE #(s): | CVE-2016-8615 CVE-2016-8616 CVE-2016-8617 CVE-2016-8618 CVE-2016-8619 CVE-2016-8620 CVE-2016-8621 CVE-2016-8622 CVE-2016-8623 CVE-2016-8624 |
| Created: | November 2, 2016 | Updated: | November 18, 2016 |
Description: From the SUSE advisory:
- CVE-2016-8624: invalid URL parsing with '#' (bsc#1005646)
- CVE-2016-8623: Use-after-free via shared cookies (bsc#1005645)
- CVE-2016-8622: URL unescape heap overflow via integer truncation (bsc#1005643)
- CVE-2016-8621: curl_getdate read out of bounds (bsc#1005642)
- CVE-2016-8620: glob parser write/read out of bounds (bsc#1005640)
- CVE-2016-8619: double-free in krb5 code (bsc#1005638)
- CVE-2016-8618: double-free in curl_maprintf (bsc#1005637)
- CVE-2016-8617: OOB write via unchecked multiplication (bsc#1005635)
- CVE-2016-8616: case insensitive password comparison (bsc#1005634)
- CVE-2016-8615: cookie injection for other servers (bsc#1005633)
imagemagick: multiple vulnerabilities
| Package(s): | ImageMagick | CVE #(s): | CVE-2014-9907 CVE-2015-8959 CVE-2016-7513 CVE-2016-7514 CVE-2016-7518 CVE-2016-7520 CVE-2016-7521 CVE-2016-7523 CVE-2016-7525 CVE-2016-7530 CVE-2016-7532 CVE-2016-7534 CVE-2016-7535 CVE-2016-7536 CVE-2016-7538 CVE-2016-7539 CVE-2016-7540 CVE-2016-8677 |
| Created: | October 31, 2016 | Updated: | January 30, 2017 |
Description: From the openSUSE advisory:
- CVE-2014-9907: DOS due to corrupted DDS files (bsc#1000714)
- CVE-2015-8959: DOS due to corrupted DDS files (bsc#1000713)
- CVE-2016-7513: Off-by-one error leading to segfault (bsc#1000686)
- CVE-2016-7514: Out-of-bounds read in coders/psd.c (bsc#1000688)
- CVE-2016-7518: Out-of-bounds read in coders/sun.c (bsc#1000694)
- CVE-2016-7520: Heap overflow in hdr file handling (bsc#1000696)
- CVE-2016-7521: Heap buffer overflow in psd file handling (bsc#1000697)
- CVE-2016-7523: AddressSanitizer:heap-buffer-overflow READ of size 1 meta.c:496 (bsc#1000699)
- CVE-2016-7525: Heap buffer overflow in psd file coder (bsc#1000701)
- CVE-2016-7530: Out of bound in quantum handling (bsc#1000703)
- CVE-2016-7532: Fix handling of corrupted psd file (bsc#1000706)
- CVE-2016-7534: Out of bound access in generic decoder (bsc#1000708)
- CVE-2016-7535: Out of bound access for corrupted psd file (bsc#1000709)
- CVE-2016-7536: SEGV reported in corrupted profile handling (bsc#1000710)
- CVE-2016-7538: SIGABRT for corrupted pdb file (bsc#1000712)
- CVE-2016-7539: Potential DOS by not releasing memory (bsc#1000715)
- CVE-2016-7540: Writing to RGF format aborts (bsc#1000394)
- CVE-2016-8677: Memory allocation failure in AcquireQuantumPixels (bsc#1005328)
java: unspecified vulnerability
| Package(s): | java | CVE #(s): | CVE-2016-5556 |
| Created: | November 2, 2016 | Updated: | November 2, 2016 |
| Description: | From the CVE entry: |
Unspecified vulnerability in Oracle Java SE 6u121, 7u111, and 8u102 allows remote attackers to affect confidentiality, integrity, and availability via vectors related to 2D.
libtiff: denial of service
| Package(s): | libtiff | CVE #(s): | CVE-2016-3658 |
| Created: | November 2, 2016 | Updated: | November 2, 2016 |
| Description: | From the CVE entry: |
The TIFFWriteDirectoryTagLongLong8Array function in tif_dirwrite.c in the tiffset tool in LibTIFF 4.0.6 and earlier allows remote attackers to cause a denial of service (out-of-bounds read) via vectors involving the ma variable.
libwmf: denial of service
| Package(s): | libwmf | CVE #(s): | CVE-2016-9011 |
| Created: | November 2, 2016 | Updated: | November 14, 2016 |
| Description: | From the Debian LTS advisory: |
Agostino Sarubbo from Gentoo discovered a flaw in libwmf's Windows Metafile Format (WMF) parser which caused allocation of excessive amount of memory potentially leading to a crash.
libxml2: code execution
| Package(s): | libxml2 | CVE #(s): | CVE-2016-4658 |
| Created: | November 1, 2016 | Updated: | November 7, 2016 |
| Description: | From the Arch Linux advisory: |
A use-after-free vulnerability via namespace nodes in XPointer ranges was found in libxml2.
mailman: cross-site request forgery
| Package(s): | mailman | CVE #(s): | CVE-2016-7123 |
| Created: | November 2, 2016 | Updated: | November 2, 2016 |
| Description: | From the Ubuntu advisory: |
It was discovered that the Mailman administrative web interface did not protect against cross-site request forgery (CSRF) attacks. If an authenticated user were tricked into visiting a malicious website while logged into Mailman, a remote attacker could perform administrative actions. This issue only affected Ubuntu 12.04 LTS.
mariadb: multiple unspecified vulnerabilities
| Package(s): | mariadb mysql | CVE #(s): | CVE-2016-3492 CVE-2016-5612 CVE-2016-5616 CVE-2016-5624 CVE-2016-5626 CVE-2016-5629 CVE-2016-8283 |
| Created: | November 1, 2016 | Updated: | November 2, 2016 |
| Description: | From the CVE entries: |
Unspecified vulnerability in Oracle MySQL 5.5.51 and earlier, 5.6.32 and earlier, and 5.7.14 and earlier allows remote authenticated users to affect availability via vectors related to Server: Optimizer. (CVE-2016-3492)
Unspecified vulnerability in Oracle MySQL 5.5.50 and earlier, 5.6.31 and earlier, and 5.7.13 and earlier allows remote authenticated users to affect availability via vectors related to DML. (CVE-2016-5612)
Unspecified vulnerability in Oracle MySQL 5.5.51 and earlier, 5.6.32 and earlier, and 5.7.14 and earlier allows local users to affect confidentiality, integrity, and availability via vectors related to Server: MyISAM. (CVE-2016-5616)
Unspecified vulnerability in Oracle MySQL 5.5.51 and earlier allows remote authenticated users to affect availability via vectors related to DML. (CVE-2016-5624)
Unspecified vulnerability in Oracle MySQL 5.5.51 and earlier, 5.6.32 and earlier, and 5.7.14 and earlier allows remote authenticated users to affect availability via vectors related to GIS. (CVE-2016-5626)
Unspecified vulnerability in Oracle MySQL 5.5.51 and earlier, 5.6.32 and earlier, and 5.7.14 and earlier allows remote administrators to affect availability via vectors related to Server: Federated. (CVE-2016-5629)
Unspecified vulnerability in Oracle MySQL 5.5.51 and earlier, 5.6.32 and earlier, and 5.7.14 and earlier allows remote authenticated users to affect availability via vectors related to Server: Types. (CVE-2016-8283)
memcached: code execution
| Package(s): | memcached | CVE #(s): | CVE-2016-8704 CVE-2016-8705 CVE-2016-8706 |
| Created: | November 1, 2016 | Updated: | January 12, 2017 |
| Description: | From the Arch Linux advisory: |
- CVE-2016-8704 (arbitrary code execution): An integer overflow in the process_bin_append_prepend function which is responsible for processing multiple commands of Memcached binary protocol can be abused to cause heap overflow and lead to remote code execution.
- CVE-2016-8705 (arbitrary code execution): Multiple integer overflows in process_bin_update function which is responsible for processing multiple commands of Memcached binary protocol can be abused to cause heap overflow and lead to remote code execution.
- CVE-2016-8706 (arbitrary code execution): An integer overflow in process_bin_sasl_auth function which is responsible for authentication commands of Memcached binary protocol can be abused to cause heap overflow and lead to remote code execution.
mysql: unspecified vulnerability
| Package(s): | mysql mariadb | CVE #(s): | CVE-2016-5617 |
| Created: | November 1, 2016 | Updated: | November 2, 2016 |
| Description: | From the CVE entry: |
Unspecified vulnerability in Oracle MySQL 5.5.51 and earlier, 5.6.32 and earlier, and 5.7.14 and earlier allows local users to affect confidentiality, integrity, and availability via vectors related to Server: Error Handling.
nodejs-tough-cookie: denial of service
| Package(s): | nodejs-tough-cookie | CVE #(s): | CVE-2016-1000232 |
| Created: | October 28, 2016 | Updated: | November 2, 2016 |
| Description: | From the Red Hat advisory: |
A regular expression denial of service flaw was found in Tough-Cookie. An attacker able to make an application using Touch-Cookie to parse a sufficiently large HTTP request Cookie header could cause the application to consume an excessive amount of CPU. (CVE-2016-1000232)
openstack-manila-ui: cross-site scripting
| Package(s): | openstack-manila-ui | CVE #(s): | CVE-2016-6519 |
| Created: | October 27, 2016 | Updated: | November 2, 2016 |
| Description: | From the Red Hat advisory: |
A cross-site scripting flaw was discovered in openstack-manila-ui's Metadata field contained in its "Create Share" form. A user could inject malicious HTML/JavaScript code that would then be reflected in the "Shares" overview. Remote, authenticated, but unprivileged users could exploit this vulnerability to steal session cookies and escalate their privileges. (CVE-2016-6519)
oxide-qt: information disclosure
| Package(s): | oxide-qt | CVE #(s): | CVE-2016-1586 |
| Created: | November 2, 2016 | Updated: | November 2, 2016 |
| Description: | From the Ubuntu advisory: |
It was discovered that a long running unload handler could cause an incognito profile to be reused in some circumstances. If a user were tricked in to opening a specially crafted website, an attacker could potentially exploit this to obtain sensitive information.
python-django: two vulnerabilities
| Package(s): | python-django | CVE #(s): | CVE-2016-9013 CVE-2016-9014 |
| Created: | November 2, 2016 | Updated: | November 21, 2016 |
| Description: | From the Ubuntu advisory: |
Marti Raudsepp discovered that Django incorrectly used a hardcoded password when running tests on an Oracle database. A remote attacker could possibly connect to the database while the tests are running and prevent the test user with the hardcoded password from being removed. (CVE-2016-9013)
Aymeric Augustin discovered that Django incorrectly validated hosts when being run with the debug setting enabled. A remote attacker could possibly use this issue to perform DNS rebinding attacks. (CVE-2016-9014)
qemu-kvm: multiple vulnerabilities
| Package(s): | qemu-kvm | CVE #(s): | CVE-2016-7909 CVE-2016-8909 CVE-2016-8910 |
| Created: | October 31, 2016 | Updated: | November 3, 2016 |
| Description: | From the Debian LTS advisory: |
Multiple vulnerabilities have been discovered in qemu-kvm, a full virtualization solution on x86 hardware based on Quick Emulator(Qemu). The Common Vulnerabilities and Exposures project identifies the following problems:
CVE-2016-7909: Quick Emulator(Qemu) built with the AMD PC-Net II emulator support is vulnerable to an infinite loop issue. It could occur while receiving packets via pcnet_receive(). A privileged user/process inside guest could use this issue to crash the Qemu process on the host leading to DoS.
CVE-2016-8909: Quick Emulator(Qemu) built with the Intel HDA controller emulation support is vulnerable to an infinite loop issue. It could occur while processing the DMA buffer stream while doing data transfer in 'intel_hda_xfer'. A privileged user inside guest could use this flaw to consume excessive CPU cycles on the host, resulting in DoS.
CVE-2016-8910: Quick Emulator(Qemu) built with the RTL8139 ethernet controller emulation support is vulnerable to an infinite loop issue. It could occur while transmitting packets in C+ mode of operation. A privileged user inside guest could use this flaw to consume excessive CPU cycles on the host, resulting in DoS situation.
Further issues fixed where the CVE requests are pending:
* Quick Emulator(Qemu) built with the i8255x (PRO100) NIC emulation support is vulnerable to a memory leakage issue. It could occur while unplugging the device, and doing so repeatedly would result in leaking host memory affecting, other services on the host. A privileged user inside guest could use this flaw to cause a DoS on the host and/or potentially crash the Qemu process on the host.
* Quick Emulator(Qemu) built with the VirtFS, host directory sharing via Plan 9 File System(9pfs) support, is vulnerable to a several memory leakage issues. A privileged user inside guest could use this flaw to leak the host memory bytes resulting in DoS for other services.
* Quick Emulator(Qemu) built with the VirtFS, host directory sharing via Plan 9 File System(9pfs) support, is vulnerable to an integer overflow issue. It could occur by accessing xattributes values. A privileged user inside guest could use this flaw to crash the Qemu process instance resulting in DoS.
* Quick Emulator(Qemu) built with the VirtFS, host directory sharing via Plan 9 File System(9pfs) support, is vulnerable to memory leakage issue. It could occur while creating extended attribute via 'Txattrcreate' message. A privileged user inside guest could use this flaw to leak host memory, thus affecting other services on the host and/or potentially crash the Qemu process on the host.
tar: file overwrite
| Package(s): | tar | CVE #(s): | CVE-2016-6321 |
| Created: | November 1, 2016 | Updated: | December 5, 2016 |
| Description: | From the Debian LTS advisory: |
A vulnerability has been discovered in the tar package that could allow an attacker to overwrite arbitrary files through crafted files.
tiff: multiple vulnerabilities
| Package(s): | tiff | CVE #(s): | CVE-2016-3619 CVE-2016-3620 CVE-2016-3621 CVE-2016-3631 CVE-2016-3633 CVE-2016-3634 CVE-2016-5102 CVE-2016-5318 CVE-2016-5319 CVE-2016-5652 CVE-2016-8331 CVE-2016-3624 |
| Created: | November 2, 2016 | Updated: | February 1, 2017 |
| Description: | From the CVE entries: |
The DumpModeEncode function in tif_dumpmode.c in the bmp2tiff tool in LibTIFF 4.0.6 and earlier, when the "-c none" option is used, allows remote attackers to cause a denial of service (buffer over-read) via a crafted BMP image. (CVE-2016-3619)
The ZIPEncode function in tif_zip.c in the bmp2tiff tool in LibTIFF 4.0.6 and earlier, when the "-c zip" option is used, allows remote attackers to cause a denial of service (buffer over-read) via a crafted BMP image. (CVE-2016-3620)
The LZWEncode function in tif_lzw.c in the bmp2tiff tool in LibTIFF 4.0.6 and earlier, when the "-c lzw" option is used, allows remote attackers to cause a denial of service (buffer over-read) via a crafted BMP image. (CVE-2016-3621)
The (1) cpStrips and (2) cpTiles functions in the thumbnail tool in LibTIFF 4.0.6 and earlier allow remote attackers to cause a denial of service (out-of-bounds read) via vectors related to the bytecounts[] array variable. (CVE-2016-3631)
The setrow function in the thumbnail tool in LibTIFF 4.0.6 and earlier allows remote attackers to cause a denial of service (out-of-bounds read) via vectors related to the src variable. (CVE-2016-3633)
The tagCompare function in tif_dirinfo.c in the thumbnail tool in LibTIFF 4.0.6 and earlier allows remote attackers to cause a denial of service (out-of-bounds read) via vectors related to field_tag matching. (CVE-2016-3634)
An exploitable remote code execution vulnerability exists in the handling of TIFF images in LibTIFF version 4.0.6. A crafted TIFF document can lead to a type confusion vulnerability resulting in remote code execution. This vulnerability can be triggered via a TIFF file delivered to the application using LibTIFF's tag extension functionality. (CVE-2016-8331)
The cvtClump function in the rgb2ycbcr tool in LibTIFF 4.0.6 and earlier allows remote attackers to cause a denial of service (out-of-bounds write) by setting the "-v" option to -1. (CVE-2016-3624)
CVE-2016-5102, CVE-2016-5318, CVE-2016-5319, and CVE-2016-5652 are unspecified.
tre: code execution
| Package(s): | tre musl | CVE #(s): | CVE-2016-8859 |
| Created: | October 28, 2016 | Updated: | January 2, 2017 |
| Description: | From the Debian-LTS advisory: |
A vulnerability has been found in the tre package that could allow an attacker to perform controlled heap corruption.
Page editor: Jake Edge
Kernel development
Brief items
Kernel release status
The current development kernel is 4.9-rc3, released on October 29. "It turns out that the bug that we thought was due to the new virtually mapped stacks during the rc2 release wasn't due to that at all, but a block request queuing race condition. So people who turned off the new feature weren't actually avoiding it at all." The new feature appears to be solid, but more testing is always welcome.
The October 30 known regressions list has 14 entries.
Stable updates have had a busy week. 4.8.5 and 4.4.28 were released on October 28, 4.8.6 and 4.4.29 were released on October 31, and 4.4.30 came out later the same day.
Gregg: DTrace for Linux 2016
Brendan Gregg celebrates the capabilities of Linux kernel tracing with BPF. "With the final major capability for BPF tracing (timed sampling) merging in Linux 4.9-rc1, the Linux kernel now has raw capabilities similar to those provided by DTrace, the advanced tracer from Solaris. As a long time DTrace user and expert, this is an exciting milestone! On Linux, you can now analyze the performance of applications and the kernel using production-safe low-overhead custom tracing, with latency histograms, frequency counts, and more."
What comes after ‘iptables’? Its successor, of course: `nftables` (RH blog)
The Red Hat Developers Blog is running an introduction to the nftables packet filtering system. "nftables implements a set of instructions, called expressions, which can exchange data by storing or loading it in a number of registers. In other words, the nftables core can be seen as a virtual machine. Applications like the nftables front end-tool nft can use the expressions offered by the kernel to mimic the old iptables matches while gaining more flexibility."
Formatted kernel documentation at kernel.org
For the last couple of release cycles, the kernel's ongoing transition to the Sphinx documentation system has left kernel.org behind. Thanks to some work by Konstantin Ryabitsev, that situation has now been remedied, and kernel.org has the formatted documentation generated from the current -rc kernel. The DocBook-generated documents remain available for as long as DocBook stays in use. (For those interested in the linux-next version of the documentation, the version on LWN's server is usually up to date; it currently has the changes that are queued for 4.10.)
Preemption latency of real-time Linux systems (OSADL)
The Open Source Automation Development Lab site has an article describing the use of the cyclictest utility to track down latency problems in a realtime kernel. "When cyclictest is invoked it creates a defined number of real-time threads of given priority and affinity. Every thread starts a loop during which a timed alarm is installed and waited for. Whenever the alarm wakes up the thread, the difference between the expected and the effective time is calculated and entered into a histogram with a granularity of one microsecond."
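The measurement loop described in that quote can be approximated in a few lines of Python. This is only a rough sketch of the idea, not cyclictest itself: it runs a single ordinary thread with no real-time priority or CPU affinity, and the function name `measure_wakeup_latency` and its parameters are made up for illustration.

```python
import time
from collections import Counter


def measure_wakeup_latency(loops=200, interval_us=1000):
    """Return a histogram {latency_in_us: count} of timer wakeup delays.

    Hypothetical helper: a plain thread sleeping on the monotonic clock,
    without the real-time scheduling that cyclictest sets up.
    """
    histogram = Counter()
    interval_ns = interval_us * 1000
    # The first "alarm" is due one interval from now.
    expected_ns = time.monotonic_ns() + interval_ns
    for _ in range(loops):
        remaining_ns = expected_ns - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)  # wait for the alarm
        woke_ns = time.monotonic_ns()
        # Difference between the effective and the expected wakeup time,
        # clamped at zero and binned with one-microsecond granularity.
        latency_us = max(0, woke_ns - expected_ns) // 1000
        histogram[latency_us] += 1
        # Alarms are spaced on an absolute schedule, one interval apart.
        expected_ns += interval_ns
    return histogram


if __name__ == "__main__":
    hist = measure_wakeup_latency()
    for latency_us in sorted(hist):
        print(f"{latency_us:6d} us  {hist[latency_us]}")
```

On a general-purpose kernel the histogram from a sketch like this tends to have a long tail; the worst-case bucket, rather than the average, is the figure of interest for real-time work.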
Quote of the week
That's the number of mails related to this project sent to LKML, according to my archive. About 1/3 of those mails are the postings of the patchsets alone. I cannot tell how many offlist mails have been sent around in total on this matter, but at least in my personal mail are close to hundred.
Beers: Uncountable
This applies to both the number of beers consumed and the number of beers owed.
I'm pretty happy with the final outcome of these patches and I want to say thanks to everyone!
Kernel development news
The 2016 Kernel Summit
The 2016 Linux Kernel Summit was held October 31 to November 2 in Santa Fe, New Mexico, USA, alongside the Linux Plumbers Conference. As usual for recent years, the summit was broken into an invitation-only core day and an open "technical topics" day; the latter was planned alongside the Plumbers tracks.
LWN was present for the core-day discussions; the topics discussed there were:
- Stable kernel workflow issues. What
are the problems with the community's stable kernel releases, and how
can things be made better?
- Group maintainership models: different
ways to share the work of subsystem maintenance across a group of
people.
- Development process issues: what is
Linus unhappy about? With a significant emphasis on bug tracking.
- The future of the Kernel Summit. The
development community has changed since the first Summit in 2001; now
the event itself will be changing too.
- Kernel hardening: we have actually
made some progress on increasing the kernel's ability to protect
itself in the last year, but there is a lot to be done still.
- The kernel thread freezer is said to
be out of control. What is the problem and how can it be fixed?
- Documentation; there is a big
transition underway with the kernel's documentation, and some
questions in need of answers.
- Tracepoint challenges: the number of tracepoints is growing rapidly; are we painting ourselves into an ABI corner? With a guest appearance by Batman.
Sessions from the technical day
There were relatively few sessions in this track; much of the interesting discussion moved to a wider forum in the Linux Plumbers Conference.
- Virtual-memory topics: a short but
intense discussion on how the virtual-memory subsystem must evolve to
function properly on current and upcoming systems.
- The perils of printk(); the kernel's message-printing function has a surprising number of problems to address.
Notes posted elsewhere
- Complex dependencies (Luis Rodriguez)
- Task isolation (Chris Metcalf)
- Audio (Liam Girdwood)
Group photo
See also: Len Brown's photos from the Kernel Summit.
[Thanks to LWN subscribers for supporting our travel to the event.]
A discussion on stable kernel workflow issues
The opening session at the 2016 Kernel Summit, led by Jiri Kosina, had to do with the process of creating stable kernel updates. There is, he said, a bit of a disconnect between what the various parties involved want, and that has led to trouble for the consumers of the stable kernel releases.
Jiri's point of view was centered on his role as a distribution kernel maintainer. Consumers like him want a number of things from the stable kernel releases, including fixes for user-visible functional problems, fixes for bugs that crash the system, and fixes for severe performance regressions. What they do not want are new features or minor performance improvements; the latter have often been shown to regress performance for other workloads.
Perhaps the biggest thing in the "don't want" column, though, is something that has caused quite a bit of trouble in the past: fixes for bugs that are not actually present. There have been a number of cases of bogus "fixes" that have broken things, causing big headaches for distributors, who must spend a lot of time figuring out what has gone wrong. Just because a patch applies cleanly to an older kernel does not mean that it actually belongs there, but that distinction often seems to get lost.
Part of that, perhaps, is a result of what the producers of stable kernels want: a process that scales. Stable releases are done by a small group of developers; they don't have a lot of time to spend on each proposed fix. They want to include all of the fixes that make sense, but depend on others to tell them when fixes actually do make sense.
From his point of view as a distribution maintainer, Jiri said that one of
the biggest issues with the stable process is that it is not clear why
specific patches got into stable. The "CC: stable" tag applied to patches
is an opt-in mechanism, and the decision is often made by people who are
neither the author of the patch nor the maintainer of the relevant
subsystem. The review process is far too lenient; it is a passive approval
approach that lets stuff get in unless somebody goes out of their way to
block it. Maintainers often respond to proposed stable inclusions with the
equivalent of "oh yeah, whatever" without really thinking about the
problem. The review barrier is too low; somebody needs to be thinking
about the semantics of every stable patch.
James Bottomley noted that the typical maintainer response is to think that they already reviewed a specific patch when they first accepted it; they aren't sure what else they should do when a patch comes around in a stable release. Mark Brown added that maintainers often can't remember what a particular ancient kernel looked like, so it is hard for them to say whether a specific patch makes sense there or not.
Christoph Hellwig pointed out that not all stable kernels are equal. The current review process is reasonable for recent kernels, he said, but it does not work as well for the long-term stable releases. Stable kernel maintainer Greg Kroah-Hartman asked what "good" meant in general. There are always going to be bugs; what is the acceptable rate when five or six patches are being applied to a two-year-old kernel every day?
Jiri said that there is always room for improvement. One possible way to make things better might be to require a Fixes: tag for every patch going into a stable kernel release. That tag identifies the bug that a patch is meant to fix; without it, the stable maintainers don't know if a patch fixes a bug that is actually present in an older kernel. An alternative might be a new tag specifically related to inclusion into a stable kernel; it would mean "I have thought about this responsibly." That tag should specify the version(s) of the kernel it should be applied to. There could also be a tag for patches that should not be considered for stable releases.
Ben Herrenschmidt asked whether the stable kernels should include patches to make new hardware work. Jiri responded that those patches aren't really wanted, but he didn't care too much as long as they don't break things. Greg added that addition of new device IDs and quirks is pretty common in the stable kernels. Laura Abbott noted that there is often a fine line between a "feature" and a "bug fix." The direct rendering subsystem, for example, often comes up with large changes to make specific hardware features perform well; without them, things run slowly. DRM subsystem bug-fix patches can also be quite large.
Rafael Wysocki said that, often, a maintainer knows that things broke, but is not sure which change caused the problem. Jiri responded that, in that case, the fix should not go into the stable releases. Ben wondered how a problem could be fixed if the developer didn't know what caused it. Linus chimed in to note that requiring a Fixes: tag could lead to the addition of bogus tags when the maintainer doesn't know. Maintainers should not do that, he said, but Mark worried that it could happen as a "hoop-jumping" exercise.
Jiri said that, one way or another, somebody has to go through the exercise of thinking about which stable kernels a patch should be applied to. The result can take the form of either a Fixes: tag or a list of relevant versions, as long as that work has been done.
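As a rough illustration of what checking for that work might look like, the sketch below scans a commit message for a Fixes: tag and for a stable "Cc:" line carrying a version annotation (the `Cc: <stable@vger.kernel.org> # 4.4+` form). It is not a tool anybody at the session proposed; the function names are hypothetical, and the accepted hash length is a lenient assumption.

```python
import re

# "Fixes: <abbreviated sha1> ("subject")"; 8 to 40 hex digits is a
# lenient assumption about how long the abbreviated hash may be.
FIXES_RE = re.compile(r'^Fixes:[ \t]*([0-9a-f]{8,40})\b',
                      re.IGNORECASE | re.MULTILINE)

# "Cc: <stable@vger.kernel.org> # 4.4+"; the "# ..." part is optional.
STABLE_RE = re.compile(
    r'^Cc:[ \t]*<?stable@vger\.kernel\.org>?[ \t]*(?:#[ \t]*(.+))?$',
    re.IGNORECASE | re.MULTILINE)


def stable_annotations(message):
    """Collect the stable-related information found in a commit message."""
    stable_lines = list(STABLE_RE.finditer(message))
    return {
        "fixes": FIXES_RE.findall(message),
        "versions": [m.group(1).strip() for m in stable_lines if m.group(1)],
        "wants_stable": bool(stable_lines),
    }


def needs_closer_look(message):
    """True if a patch is marked for stable but names neither the commit
    it fixes nor the kernel versions it applies to."""
    info = stable_annotations(message)
    return info["wants_stable"] and not (info["fixes"] or info["versions"])
```

A message containing a Fixes: line or a versioned stable Cc: line would pass; one with only a bare `Cc: stable@vger.kernel.org` would be flagged for a human to think about, which is the step Jiri was asking for.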
Dan Williams suggested that developers should add more test cases to demonstrate bugs and prove their fixes. These tests could be run against older kernels to show whether they need the fix — and whether the fix works as expected. Steve Rostedt said he will often add tests in response to bugs in his code; he suggested the addition of a new tag listing a test to run for a patch. But Linus was against the addition of new tags, especially for passing information to other humans. That sort of thing should just be explained in the commit message, he said. If we have too many tags, their usage will be inconsistent at best.
Linus went on to point out a specific case of things going wrong. The stable trees recently shipped a three-line patch that made a trivial root exploit possible on older CPUs. He had suggested removing the patch from the stable series, but the stable maintainers all chose to apply a 100-line fix instead. That was, he said, the wrong decision. If a stable tree takes a patch that breaks things, the response should not be to add more patches. Greg said that the stable maintainers would, as a general rule, rather have the same changes that the mainline has so that the bugs are all the same.
Ted Ts'o said that there are a lot of device vendors out there who are not actually using the stable releases; instead, they are cherry-picking patches that look relevant. If, in this case, they got the problematic three-line patch but missed the subsequent fix, the results would be bad. In general, he said, the group had not yet talked much about how the stable kernels are actually being used. Rafael said that broken patches should simply be reverted from the stable kernels rather than fixed further; that way, it will be clear to others where the broken patch was. But Ben protested that patches often come in a series; reverting just one can break things further.
Andy Lutomirski said that the fixup patch in this case was partly his work, even though it didn't have his name on it when it was applied. That fix should never have been applied to the stable kernels, he said; stable maintainers, in general, should never apply complex changes without asking first.
James said that the failure this time around was in the review process, but Linus replied that the patch was, in fact, "fine and correct." The problem is that it depended on another change that went into the 4.6 kernel. He doesn't blame that specific patch for the trouble; the bug was not obvious, and the backport was clean. It would have taken a "superhuman" to realize that the patch would be problematic in 4.1 without the exception-handling change that was applied in 4.6. The failure was when the patch was revealed to have broken things; at that point, it should have been reverted immediately. Mistakes will happen and the stable kernels will not be perfect; but, he said, those kernels are too eager to accept patches, and to accept more rather than reverting a patch that went wrong.
As the discussion wound down, perhaps the one solid conclusion was that problematic patches should indeed be reverted rather than fixed; that policy was immediately applied in subsequent stable kernel releases (example). There will likely be more pressure for each patch destined for the stable releases to carry a Fixes: tag or some version information making it clear that the relevant bug is present. And, hopefully, stable releases will be a little more stable in the future.
Group maintainership models
Traditionally, kernel subsystem maintainership is a solitary job, but there has been a steady increase in the number of subsystems that are using some sort of group model instead. At the 2016 Kernel Summit, Darren Hart and Daniel Vetter talked about how these models work in practice and what their experiences might have to offer other subsystem maintainers.
Darren started by noting that there are a number of motivations behind
group maintainership, starting with the fact that the work, for a busy
subsystem, can often be more than one person can handle. Some sort of load
balancing can help to keep maintainers from burning out. Group models are
also more robust in the face of vacations, illness, or simply a day job
that gets busy. Dan Williams added that group maintainership can also be a
good way to develop new maintainers for the future.
There are, Darren said, two models of group maintainership seen in the kernel community. One of them is the "hands off" model, as exemplified by the arm-soc tree maintained by Arnd Bergmann and Olof Johansson. They manage a single repository, using an IRC channel to take a "lock" when they are ready to apply some changes. They maintain a log file, Olof said, so that they can always see what the other has done.
The other model is "delegation," usually seen in subsystems that use the patchwork patch-management system. Patchwork can delegate the handling of each patch to a specific maintainer; Darren would like to start making more use of it. Mauro Carvalho Chehab said that this is the approach used in the media subsystem; there are two maintainers, and patches are automatically delegated by patchwork. Rafael Wysocki added that the power-management subsystem also uses it; in this case, the power-management mailing list is shared between multiple subsystems, so the automatic delegation in patchwork helps to sort out changes as they arrive.
Daniel Vetter talked a bit about the multiple-committer model used in the i915 graphics driver subsystem; it was a shortened version of the talk that was covered in this article in October. He had been working in a two-person team (with Jani Nikula) for three years, but wanted more help. He had plenty of reviewers, but couldn't find anybody else willing to be named as a co-maintainer. Patch submitters wanted to deal with the maintainer rather than with other reviewers, so he and Jani were becoming a bottleneck in the process.
In response, they decided to try out a group model where many committers have the ability to commit changes to the repository. It is generally working well, though there has been "some fallout." The way that the tree is managed, with fixes being cherry-picked into another tree, creates trouble with linux-next; they have some ideas for how to improve that interaction. Developers are also occasionally confused when a seemingly random person accepts their patches.
James Bottomley asked what the essential difference is between a committer and a maintainer in this model; Daniel answered that committers work internally, while the maintainer deals with the rest of the world. Committers in general don't want a lot of external visibility — they don't want to be listed in the MAINTAINERS file — so the solution is to call them something other than "maintainers." Ben Herrenschmidt observed that the maintainer's real job, in this model, is to accept the blame when things go wrong.
Olof asked if Daniel had observed problems with developers shopping patches around trying to find an accommodating committer. Daniel responded that, in general, he trusts his committers to say "no." There had been a couple of cases involving managers who have tried to get patches merged that way; it seems to happen once with every new manager. His response is to set up a meeting with that manager and explain how things need to be done. When asked if arm-soc had that problem, Olof responded that their model, where they deal with submaintainers rather than taking patches from developers directly, tends to keep that from happening.
The final part of the discussion centered on the workflow issues in the i915 subsystem that can cause Git to send patches multiple times — the core of the difficulty with linux-next. Daniel said that the tooling is not up to the job, but Linus responded that the workflow the group was using sounded "really nasty." What i915 is using, he said, is the submaintainer model; he should be taking pull requests from those maintainers rather than sharing a repository with them. Daniel said he is not against the submaintainer model, but it would create some coordination issues in this case; the nature of that driver (and DRM drivers in general) has a lot of developers working on the same files simultaneously.
Linus insisted, though, that, with the right habits, the submaintainer model works. Maintainers should make use of topic branches and avoid back merges with upstream trees. Daniel agreed that the i915 model would not work well for proper subsystems, but for a "leaf" like the i915 driver, it works well. The session wound down at that point.
Development process issues
The Kernel Summit traditionally includes a session where Linus Torvalds and the assembled developers discuss how the development process is working and whether there are any issues in need of resolution; the 2016 event was no exception. The picture that emerged is one of a process that is working reasonably well and developers who are mostly content. There are always things that can be better, though, especially when it comes to bug tracking.
Linus started off by saying that there are no serious problems that he can see. There are specific subsystems that are occasionally problematic; he shouts at them, and they generally do better. In recent times, he hasn't even had to shout all that often.
One thing that does bother him is developers who send him fixes in the -rc2 or -rc3 time frame for things that never worked in the first place. If something never worked, then the fact that it doesn't work now is not a regression, so the fixes should just wait for the next merge window. Those fixes are, after all, essentially development work. When they arrive he usually accepts them, but it's annoying and adds noise to the process. They add to his "are we getting ready for a release?" stress. In general, he said, if a fix applies to a feature that is not currently being used, it should wait for the next development cycle.
Overall, though, he said, things are working smoothly. The largest kernel release ever (measured by the number of commits) is currently in progress. This cycle has been a little painful, but size has nothing to do with it; instead, he ran into a ten-year-old bug that took a lot of work to track down. It is one of those things that happens occasionally, rather than any sort of process issue.
He suggested that one reason for the size of the 4.9 development cycle is the pre-announcement by Greg Kroah-Hartman that it would be a long-term-support release.
When asked if he should be more vocal about the above-mentioned mid-cycle fixes, Linus added that they are not really a huge issue for him. Additionally, some fixes are fine, especially for really new code. For certain areas, such as cellphones that have a short shelf life, it makes sense to push (and fix) drivers aggressively. Laptop support was also mentioned; he would like non-technical users to be able to install Linux on the new laptop that they just bought, so those kinds of fixes are welcome almost anytime. But that is not what he's seeing; instead, he sees fixes for enterprise features that, due to the conservative nature of that sector, are not likely to be used for some time yet.
Bugged by Bugzilla
The bulk of the discussion, though, related to the kernel Bugzilla instance hosted on kernel.org. Laura Abbott said that bug reporting is a problem, in that users who report bugs never quite know what kind of response they are going to get. Some subsystem maintainers watch the Bugzilla and respond to issues there; others want nothing to do with it. As a result, users often have a poor experience, and are often subjected to shot-in-the-dark attempts to track down bugs from people who are not closely related to the subsystem in question.
James Bottomley said that the root of the problem is that the Bugzilla has no integration with email, which is the primary means by which kernel developers communicate. Perhaps it's time to look into using a different tool? Al Viro added that he couldn't imagine being paid enough to deal with Bugzilla. Linus said that there are groups that are more accepting and make use of it; they tend to be happy with it. But the rest of the community tends to hate it.
Darren Hart said that the Bugzilla is a good bridge between developers and users. That said, there is not much that he (as the x86 platform maintainer) can do with most of the bugs filed there. He only has five laptops, so he will be unable to reproduce most of the problems that have been reported. Some of the bug reporters can be convinced to move to email, but not all of them will do that. If the Bugzilla goes away, he said, we will lose some useful information.
There was some talk of modifying the Bugzilla to refer users to the appropriate mailing list for their problem. But kernel.org administrator Konstantin Ryabitsev said that he wants to avoid local modifications to Bugzilla if at all possible. Tweaks make the upgrade path much more complicated; he would rather remain "as vanilla as possible." He suggested, instead, that somebody should be hired to do bug triage.
Those upgrades, James said, are contributing to the current problems; the Bugzilla used to have better email integration, but that was lost in an upgrade. Konstantin said there was little alternative; kernel.org lacks the staff to backport security patches into a custom Bugzilla deployment.
Len Brown said that he likes the Bugzilla system well enough. It is not the best tool, but it can be made to work. Staying on top of things helps a lot; the power-management developers have managed to close over 3,000 bugs in recent years. The most important bugs, he said, are the most recent ones. If a developer responds immediately to a bug, there is a good chance of getting useful information back. A month later, it probably isn't going to work. Old bugs should be scrubbed aggressively; if a reporter doesn't reply to requests for information, the bug should just be closed. That keeps the list short and manageable.
The Bugzilla is a good place to collect ancillary information (such as screen shots) from bug reporters. Some developers said that email works well for that, but Linus said he would much rather deal with Bugzilla than try to fish through email archives for information. Len also said that it's generally not a good idea to put the best developers on Bugzilla duty. Instead, put a new developer there; it's a good opportunity for them to learn, and for everybody else to see how they work.
Linus said that he gets emails from the Bugzilla, and that he thinks it works OK much of the time. Many bug reports start in distribution bug trackers, though. So one ends up hopping through various links in different trackers; it is a painful process and he hates it. In the end, he would rather not see the Bugzilla be the primary bug tracking system for the kernel.
James asked if the kernel needed a dedicated person for bug management. Len replied that it would take somebody who is really good; such a person could also be a subsystem maintainer. What's really needed, he said, is a community to take on this task. Ted Ts'o said that a "bug ombudsperson team" would be a good thing to have, but it would need to be prepared to grow; as they get better at dealing with bugs, usage of the bug tracker will go up. He expressed doubts that this kind of work could be funded; it is hard to put together a business case for it. Konstantin suggested that perhaps Core Infrastructure Initiative (CII) funds could be found for kernel-bug management.
Len said that it would be nice to augment the Bugzilla to obtain other relevant information, such as the kernel version the user is running. Ben Hutchings noted that Debian's reportbug tool can run package-specific scripts to obtain the needed information. Laura added that Fedora's automatic bug reporting has useful information, but is very noisy. The best reports, she said, come directly from users who took the time to prepare them.
There was talk of eliminating the kernel's Bugzilla entirely. One approach would be to direct users to their distribution's trackers; that would not be helpful for users running mainline kernels, though. An alternative would be to replace it with a set of subsystem-specific trackers for the subsystems that are interested. The problem there, though, is that the relevant subsystem often changes as a bug is understood; moving an entry between separate trackers would be painful.
Konstantin said that kernel.org will soon be upgrading to Bugzilla 5; it is fairly different, he said, with a nicer user interface. He suggested doing that upgrade first, then perhaps seeking an intern from CII. Then, at least, we would get to a point where users who file bugs will see a response. And that was more or less how the session closed; the next step will be to see how the upgrade of the kernel's Bugzilla goes.
The future of the Kernel Summit
The first Kernel Summit was held in 2001. A lot has changed since then. In particular, said Summit organizer Ted Ts'o, the kernel development community has grown considerably over that time. Those changes are now leading to changes in how the Summit itself is run.
The growth in the kernel and its community, Ted said, means that it has become nearly impossible to discuss technical issues at the Summit. There is just no way to be sure that all of the right people are in the room. Meanwhile, Linus has been increasingly interested in the process-oriented discussions that have tended to dominate recent gatherings. But the Summit, as it is currently organized, isn't necessarily the best group for those discussions either.
So the 2017 Kernel Summit will be different. It will be held in Prague, co-located with the Open Source Summit Europe (formerly LinuxCon Europe). It will be a short half-day event, with far fewer people present. In particular, the attendees are likely to be approximately thirty top-level subsystem maintainers, chosen directly by Linus. They will generally be the maintainers he pulls directly from. And, naturally, this event will focus mostly on process-oriented issues.
Ted said that he still thinks it is important to have a broader gathering of kernel developers, though. Often the hallway discussions that result from simply having developers in the same place are the most important part of the event. He also said that, over the years, managers have been trained to think that it is important to send developers to the Kernel Summit, and that training should not go to waste. So there will be an open technical track in Prague as well; it will consist of some presentations, but also more discussion-oriented topics.
The end of the Kernel Summit as it has been run for so many years did not appear to bother many people, but there was some grumbling about one aspect of this plan: the co-location with the Open Source Summit. For many developers, the technical content of that event has fallen to the point where they are not really interested in attending. They would rather see the new kernel event attached to a more technical gathering, such as the Linux Plumbers Conference (as was done this year).
Ted responded that, for 2017, things are already locked in place. The lead time for event planning has gotten longer, so it's too late to change things for next year. The Linux Plumbers Conference will be in Los Angeles, co-located with the North American Open Source Summit, and that cannot be changed at this time. For the longer term, there has been a fair amount of discussion about joining the Kernel Summit with either Plumbers or the Linux Storage, Filesystem, and Memory-Management Summit. But that is for later; 2017 will be a transitional year. As 2016 was, in the end; the morning of the 2016 core day was organized much like future events might be.
Rik van Riel noted that co-location with other conferences helps to bring people into the kernel community. James Morris said that the Linux Security Summit has grown over the years, to the point that it now has over 120 attendees. This event wants to co-locate with Plumbers next year rather than with the Kernel Summit, on the theory that it will get a mix of attendees better suited to the security problem.
James Bottomley described the basic conflicts that arise when one tries to hold conferences together. If too many events are held in parallel, attendees have to choose between the sessions they are most interested in. If they are held serially, though, the resulting event becomes too long; few people are willing to dedicate more than a week to a set of conferences. Mark Brown, somewhat cynically, noted that there is an advantage to co-location with the Open Source Summit: attendees don't care if they miss it. The Open Source Summit is easy to attend even at the last minute; co-location with events like Plumbers, which routinely sells out quickly, makes it hard for last-minute attendees to come.
The session was more of an information-sharing exercise than one where decisions would be made. Kernel-oriented events are in a period of change; how that will play out will have to be seen over the next year or two.
The status of kernel hardening
At the 2015 Kernel Summit, Kees Cook said, he talked mostly about the things that the community could be doing to improve the security of the kernel. In 2016, instead, he was there to talk about what had actually been done. Kernel hardening, he reminded the group, is not about access control or fixing bugs. Instead, it is about the kernel protecting itself, eliminating classes of exploits, and reducing its attack surface. There is still a lot to be done in this area, but the picture is better than it was one year ago.
One area of progress is in the integration of GCC plugins into the build system. The plugins in the kernel now are mostly examples, but there will be more interesting ones coming in the future. Plugins are currently supported for the x86, arm, and arm64 architectures; he would like to see that list grow, but he needs help from the architecture maintainers to validate the changes. Plugins are also not yet used for routine kernel compile testing, since it is hard to get the relevant sites to install the needed dependencies.
Linus asked how much plugins would slow the kernel build process; linux-next maintainer Stephen Rothwell also expressed interest in that question, noting that "some of us do compiles all day." Kees responded that there hadn't been a lot of benchmarking done, but that the cost was "not negligible." It is, though, an important part of protecting the kernel.
Probabilistic protections
The kernel has adopted a number of probabilistic protections over the last year. These protections only work if the attacker doesn't know something about the system. They include kernel address-space layout randomization (KASLR) and stack protection. Probabilistic protections can be defeated if the information leaks out, but they are still effective and worth doing.
One improvement is in the randomization of the kernel text base; it was added to arm64 in the 4.6 release and MIPS in 4.7. But the text base is only the beginning; more memory areas need to be randomized. One possibility is to randomize the kernel's link order at boot time. That would be a lot of work, but it would mean that an attacker would need more than a single information leak to defeat the whole thing.
Linus said that randomization can be a pain for debugging; it is not fun to track down a problem that only happens in one boot out of every 300 or so. Al Viro worried that changing the link order would also change the order in which the kernel's initialization calls are made, with unpredictable effects. Kees responded that this particular change isn't coming anytime soon. Andi Kleen suggested just doing the link randomization and dropping KASLR altogether; the kernel's addresses tend to leak via all kinds of paths anyway. Linus responded that, while the address leaks are being plugged over time, KASLR does indeed work poorly against local attackers, but it is more useful against remote attackers.
Kees went on to say that the kernel got KASLR for its memory areas in 4.8 for the x86_64 architecture.
Work is being done on free-list randomization, which makes the layout of the heap less predictable. Perhaps more controversial is struct layout randomization. That cannot be done in a general way without causing all kinds of problems, but there is one place where it is especially useful: structs consisting of only function pointers. Such structs are one of the most prized targets for attackers, and the kernel has a lot of them. A GCC plugin can be used to detect these structures and randomize their order. In general, the kernel shouldn't care about that ordering, and changing it should not have performance effects.
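To make the free-list idea concrete, here is a minimal user-space sketch in C; it is not the kernel's implementation. It assumes a slab whose slots are identified by index, and it fills an array with a shuffled order (a Fisher-Yates shuffle) in which those slots would be linked into the free list. The names randomize_freelist() and xorshift32() are invented for this illustration, and the toy generator merely stands in for whatever entropy source a real allocator would use.

```c
#include <stddef.h>
#include <stdint.h>

/* Tiny xorshift32 generator, for illustration only; it is NOT a
 * secure source of randomness. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill order[0..n-1] with a shuffled permutation of the slot indices
 * 0..n-1 (Fisher-Yates). Hypothetical helper, not a kernel API. */
static void randomize_freelist(unsigned int *order, size_t n, uint32_t seed)
{
    uint32_t state = seed ? seed : 1;   /* xorshift state must be non-zero */

    for (size_t i = 0; i < n; i++)
        order[i] = (unsigned int)i;

    for (size_t i = n; i > 1; i--) {
        /* Modulo introduces a slight bias; acceptable for a sketch. */
        size_t j = xorshift32(&state) % i;
        unsigned int tmp = order[i - 1];

        order[i - 1] = order[j];
        order[j] = tmp;
    }
}
```

An allocator built this way would hand out slot order[0] first, then order[1], and so on, so that consecutive allocations no longer sit next to each other in a predictable pattern.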
Linus was not entirely convinced; he said that most people are running distributor kernels, so the specific ordering used will always be available to an attacker. The value, Kees responded, is forcing attackers to identify specific kernel builds; that is "excruciating" for them. It greatly expands the number of settings their exploit has to work in.
Deterministic protections
While probabilistic protections only work if some key data remains secret, Kees said, deterministic protections work all the time. These include things like read-only memory; if memory is read-only, it is always protected from being changed. Bounds checking to head off overflows is another form of deterministic protection.
One useful protection is the CONFIG_DEBUG_RODATA configuration option which, Kees said, is badly named. It ensures that executable memory is not writable anywhere in the kernel; it should be mandatory on all systems that support it. It is turned on by default on the x86 architecture as of 4.6, and will be for arm64 as of 4.9.
Another important measure is the protection of user space against access by the processor when it is running in a privileged mode. By far the most common way to exploit the kernel, he said, is to get the kernel to execute code that has been placed somewhere in user-space memory. If the kernel cannot access that memory, such exploits will not work. Processor vendors have worked to provide such protections using technologies like SMAP and SMEP (on x86) and PAN (on ARM), but there is a problem: such protections are not widely available yet. There are no Xeon processors with SMAP on the market; PAN was added to the ARMv8.1 specification, but no hardware is shipping yet.
So, he said, the kernel needs emulation of those features instead; it is, he said, a fundamental need. Linus replied, though, that he hates the emulation patches with a passion. And, he said, it is not necessary, in that the kernel's support for SMEP protects systems that lack SMEP too. That is because it forces all kernel paths that access user-space memory to be verified, preventing accidental accesses. So, he said, the emulation does not buy much. Kees disagreed, saying that the emulation can protect systems that will not have hardware protection for a few years yet.
Work is being done on hardened usercopy, which performs sanity checking on operations that copy data to and from user space. The current patch set contains about 1/3 of the PaX USERCOPY protections, which is a start. Next steps include segregating the slab caches; objects that are exposed to user space should be stored apart from those that are purely internal to the kernel. The problem here is to find a clean way to deal with exceptions. An inode object, for example, should not be copyable to or from user space, but there can be reasons to copy the file name stored within that structure. The PaX code does such copies by way of the stack, which is generally seen as being the wrong approach; Kees said that a more maintainable API for exceptions is needed. Linus added that this kind of problem is exactly why he has never seriously considered merging the grsecurity patch set; it's full of "this kind of craziness."
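The kind of sanity check involved can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions, not the hardened-usercopy code: the obj_bounds structure and the span_within() and checked_copy() helpers are hypothetical, and they assume that the allocator can report the base address and size of the object being copied from.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: the region of an object that may be copied,
 * as an allocator (or a whitelist) might describe it. */
struct obj_bounds {
    const void *base;
    size_t size;
};

/* Return true if [ptr, ptr + len) lies entirely inside the object. */
static bool span_within(const struct obj_bounds *obj, const void *ptr,
                        size_t len)
{
    uintptr_t base = (uintptr_t)obj->base;
    uintptr_t p = (uintptr_t)ptr;

    if (p < base || p - base > obj->size)
        return false;
    /* Compare against the remaining space to avoid overflow in p + len. */
    return len <= obj->size - (p - base);
}

/* Copy len bytes out of the object only if the span is in bounds.
 * Returns 0 on success, -1 if the copy was refused. */
static int checked_copy(void *dst, const struct obj_bounds *src_obj,
                        const void *src, size_t len)
{
    if (!span_within(src_obj, src, len))
        return -1;
    memcpy(dst, src, len);
    return 0;
}
```

The exceptions problem described above corresponds, in this sketch, to deciding which bounds to register: describing only the file-name field of a larger structure would allow that field to be copied while refusing copies that touch the rest of the object.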
Memory wiping is useful, in that it can block information leaks and some types of use-after-free exploits. The slab allocator can do poisoning of memory, but not zeroing, which would be nice to add. After Linus asked, Kees said that the advantage of zeroing is that the kernel often needs to allocate zeroed pages; if freed memory has already been zeroed, those allocations can be optimized. A problem with zeroing is that some objects are allocated and freed so often that the performance hit becomes prohibitive, so there needs to be a way to make exceptions. There is a GCC plugin out there to do stack clearing, which is worth looking at.
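The optimization Kees described can be illustrated with a toy fixed-size pool in user-space C; the zpool type and its functions are made up for this sketch and bear no relation to the slab allocator's real interfaces. Because every slot is zeroed when it is freed, a request for zeroed memory can be satisfied without clearing anything at allocation time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 32
#define NSLOTS    4

/* Hypothetical pool; must start out zero-initialized. */
struct zpool {
    unsigned char slots[NSLOTS][SLOT_SIZE];
    bool used[NSLOTS];
};

/* Hand out a zeroed slot. No memset() is needed here: free slots are
 * either untouched since initialization or were wiped by zpool_free(). */
static void *zpool_zalloc(struct zpool *p)
{
    for (size_t i = 0; i < NSLOTS; i++) {
        if (!p->used[i]) {
            p->used[i] = true;
            return p->slots[i];
        }
    }
    return NULL;    /* pool exhausted */
}

/* Wipe the slot on free so that stale contents cannot leak to the
 * next user, then mark it available again. */
static void zpool_free(struct zpool *p, void *obj)
{
    for (size_t i = 0; i < NSLOTS; i++) {
        if (obj == (void *)p->slots[i]) {
            memset(p->slots[i], 0, SLOT_SIZE);
            p->used[i] = false;
            return;
        }
    }
}
```

A poisoning allocator would write a recognizable non-zero pattern in zpool_free() instead; that also hides the old contents, but the allocation path would then have to clear the slot again for callers that want zeroed memory.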
"Constification" — making unchanging data constant — can protect against some types of exploits. The lowest-hanging fruit here is structs full of function pointers; the "constify" GCC plugin tries to make those const by default. As of 4.6, the kernel can make data read-only after initialization, but that feature is not yet widely used in the kernel. There would be value in identifying "write-rarely" data that would be read-only most of the time, and only made writable during explicit updates.
Kees's final topic was reference-count hardening. If an attacker is able to force a reference count to overflow, a use-after-free exploit is usually not far away. Most of these attacks can be blocked if atomic variables can be kept from overflowing. The hardening patches out there will kill the responsible process when an overflow is detected, and the counter involved is permanently blocked at a high value. In this way, an exploit is downgraded to a denial-of-service situation.
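A saturating counter of the sort described can be sketched with C11 atomics. This is a user-space illustration only; the sat_ref type and its helpers are hypothetical names, not the kernel's atomic_t interface or the hardening patches themselves. Where the patches would kill the offending process, this sketch simply returns false to the caller.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SAT_REF_MAX UINT_MAX    /* value at which the counter is pinned */

/* Hypothetical saturating reference count. */
struct sat_ref {
    atomic_uint count;
};

static void sat_ref_init(struct sat_ref *r, unsigned int n)
{
    atomic_init(&r->count, n);
}

/* Take a reference. Returns false if the counter is, or has just
 * become, saturated; it then stays at SAT_REF_MAX forever. */
static bool sat_ref_get(struct sat_ref *r)
{
    unsigned int old = atomic_load(&r->count);

    do {
        if (old == SAT_REF_MAX)
            return false;       /* already pinned: never wrap to zero */
    } while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));

    return old + 1 != SAT_REF_MAX;
}

/* Drop a reference. Returns true only when the count reaches zero and
 * the object may be freed. A saturated counter is never decremented,
 * so the object is leaked rather than freed while still in use. */
static bool sat_ref_put(struct sat_ref *r)
{
    unsigned int old = atomic_load(&r->count);

    do {
        if (old == SAT_REF_MAX || old == 0)
            return false;
    } while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));

    return old == 1;
}
```

The trade-off is the one noted above: an object whose counter saturates can never be released, which turns a potential use-after-free into a memory leak or a denial of service.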
Kees's slides are available for the curious.
The problematic kthread freezer
The kernel thread ("kthread") freezer, as its name would suggest, is charged with freezing kernel threads during a system hibernation cycle. At the 2016 Kernel Summit, Jiri Kosina took the stage (for the second time) to say that the usage of the kthread freezer is "out of control" and "broken everywhere." It is time, he said, to bring things under control, then get rid of the freezer altogether.
The first problem, he said, is that the freezer's semantics are not well defined; nobody really knows what it means for a kthread to be frozen. Most of the current uses of the freezer are superfluous. In many cases, the purpose is to have filesystems be in a consistent state during hibernation; that can be better achieved with the filesystem freeze mechanism. It doesn't make sense to freeze I/O operations in general, since they are needed to write out the hibernation image. There is a lot of freezing in drivers too, a situation which, he said, makes no sense. There is a well-defined set of power-management callbacks in place to put drivers into a suspended state during hibernation.
The kernel, he said, is the victim of a massive copy-and-paste cargo cult. Uses of the kthread freezer are spreading like a disease, a situation that has to stop.
There are two especially pathological uses that he called out. One is try_to_freeze() calls for threads that have not been marked freezable in the first place; those calls will never have any effect. The other is try_to_freeze() calls after starting I/O, but without waiting for that I/O to complete.
The solution is to eliminate use of the kthread freezer wherever possible. It is not needed in threads that will not generate disk I/O. It is also not needed — indeed, its use is a bug — in I/O helper threads. The best solution would be to move the entire hibernation subsystem to use filesystem freezing instead, and simply get rid of the kthread freezer. It might be necessary to keep it around for NFS, he said, but there's not much else that should need it. But the first step is to stop its use from spreading.
Ben Herrenschmidt spent a while talking about the history of the freezer, which, he said, was invented as "a big, fat band-aid" without which the system could not suspend properly. Now, instead, we simply need to make our drivers cope properly with I/O during a suspend operation. As the session closed, Linus agreed that the best approach was to get rid of the kthread freezer altogether and to use filesystem freezing where it is really needed. So one should expect development to go in that direction.
Kernel documentation update
The kernel's documentation "subsystem" has undergone some changes over the past few releases, as we reported in late October. The author of that report and the kernel's documentation maintainer, Jonathan Corbet, gave a presentation at this year's Kernel Summit to describe some of those changes. He was joined by Mauro Carvalho Chehab, who has done much of the work (along with Daniel Vetter, Jani Nikula, and Markus Heiser) to make it all happen.
Corbet started by noting the 4.8-rc1 release announcement, where Linus Torvalds highlighted that "over 20% of the patch is documentation updates, due to conversion of the drm and media documentation from docbook to the Sphinx doc format". Those changes were unusual in that documentation changes have never been anywhere near that large in previous merge windows.
Corbet set out on the Sphinx transition with several goals. The first was to eliminate the hand-rolled DocBook-based toolchain that was being used to generate the documentation. At a Kernel Summit a few years ago, he asked kernel developers how many had gotten the toolchain set up and fewer than half indicated that they had. In the end, it is simply "the wrong way to go"; the kernel project should not be developing its own tools for creating documentation, but should instead use something that is developed and maintained by others.
Another goal was to have integrated documentation with nice output formatted in multiple ways. But, he wanted to be able to do that without a complicated markup language and a bunch of toolchain dependencies. Beyond that, he wanted to clean up the Documentation directory so that it "doesn't look like my daughter's bedroom", he said, complete with a photo of said messy bedroom. All of that will make it easier for developers to write documentation, which should, in theory, lead to better documentation.
So, starting in the 4.7 cycle, the documentation began being switched to use the Sphinx documentation generator, which uses the reStructuredText markup language. There are, he said, LWN articles about the history of the change and how it all works. In addition, the kernel documentation now has a Linux Kernel Documentation book that describes how to build (and write) documentation for the kernel.
Open questions
There are, of course, some open questions. The organization of the documentation tree leaves a lot to be desired. It used to have around 300 files in the top-level directory, but he has slowly been moving things around. One move that he has been nervous about is the SubmittingPatches file, which is being incorporated into the development-process book. Chehab has submitted a patch to move the file and leave a three-line file pointing to the new location in its place, but Corbet is worried about dangling references to the file. He asked if there were objections to making that move.
At that point, Torvalds said with a grin: "No one in this room has ever read anything in the Documentation directory." He said it was really up to the users of the kernel and its documentation to decide if the move made sense. There were murmurs of disagreement in the room and Darren Hart said: "I do use it, read it, and cite it by section." He said that he liked what had been done so far, especially that he could now cite sections by URL.
That last piece is thanks to kernel.org maintainer Konstantin Ryabitsev, who built the documentation from the tree and put it up at kernel.org, Corbet said. Furthermore, a look at mailing list postings shows that the documentation is cited rather frequently. "So they may not read it, but they tell everyone else to read it", he said. Based on the reaction in the room, it appears that no one is "too upset" with moving SubmittingPatches, so he will leave that patch in for 4.10. The goal is to eventually have a top-level Documentation directory that looks like all of the others in the kernel tree.
Olof Johansson asked about having stub files pointing to the proper place, as was done for SubmittingPatches. Corbet replied that he has done that for some of the more important files, but not for every one. David Howells also cautioned against moving memory-barriers.txt. Corbet said that when he had broached the subject with Paul McKenney, he was told to "keep away", so he plans to work with McKenney on that down the road.
For something perhaps a bit more controversial, Corbet noted that there is only one directory in the top-level kernel directory that is capitalized: Documentation. Part of the reorganization will be adding more subdirectories, which lengthens the path names for files of interest on top of an already-long top-level name; that has led some to ask that he consider renaming the directory to doc.
That immediately led to discussion of tab-key-based auto-completion, as well as bikeshedding over a new name. But, as was also pointed out, those names are often used in places (e.g. email) where auto-completion does not work. H. Peter Anvin noted that files like README are capitalized in part to help newbies, who will often be attracted to those files because their names stand out.
After some more discussion, it was suggested that Corbet call for a vote, which he did. Roughly half of those assembled voted against, while about the same voted for the change. That made it obvious there was "no clear consensus" on the question, so things would stay the way they are. Shuah Khan was glad to hear that; she voted against changing the name because of the large number of blog posts and other types of information that she and others have written that would suddenly become outdated.
Adding complications
Moving to another topic, Corbet said he had set out to get to something simpler and the community had accomplished that, but now things have started to get more complicated again. A change that was made for 4.9 meant that LaTeX is required to build the HTML version of the documentation. He will be pushing to get that particular problem fixed.
There are a number of files that some want to pre-process to get them into the Sphinx format. There was a request to add a Sphinx directive that would run an arbitrary shell command as part of the documentation build process, but he rejected that particular mechanism. It has also been suggested that the MAINTAINERS file be processed into the Sphinx format.
Since the media subsystem is where some of the push for pre-processing is coming from, he asked Chehab to explain what he would like to do there. Chehab has a patch that takes the ABI files from the media subsystem to convert them into the Sphinx format, which allows creating documentation that is sorted and arranged in various ways. That is useful for distributions and others, he said; "it adds value" to the documentation.
For the MAINTAINERS file, it would be nice if interested users could find out where they can get the latest development tree for a subsystem, which could be added into the information already there. It makes it easier for users if it is part of the documentation and it "comes almost for free", he said.
Corbet said that a decision will need to be made about how much more complicated the documentation toolchain should be allowed to get. Nikula has suggested that any changes made for the kernel should be upstreamed into Sphinx. While that is a nice idea, Corbet said, some of the changes are pretty kernel-specific so it may be hard to convince the Sphinx developers to accept them.
Another area of disagreement is about what to do with old and obsolete documentation, much of which has not even seen typo fixes in the Git era. For example, some instructions from Larry McVoy in 1996 on how to manually bisect a problem in the 1.3 kernel seem like they are past their prime. We don't keep old code around, Corbet said, so we should do the same with documentation.
Torvalds wryly noted that pull requests that remove lines from the kernel get high priority. But he had a different complaint as well: "Can we get rid of PDF in the kernel source?" It is, he said, "binary crap" and those files are simply PDF versions of the SVG files sitting right next to them.
Chehab said that the media subsystem needed some PDF files for the DocBook version of its documentation. Those may not be needed for Sphinx, he said. There are roughly ten PDF files that showed up recently, Torvalds said. Those files are not editable and have no reason to be in the kernel source.
Image files are similar, Torvalds said after a question from Corbet. A binary file that no one can edit should not be in the tree, he said. He suggested putting them on a web site instead, adding that there is a reason the kernel tree is called a source tree. It was agreed that solutions could be found to have images with the documentation without requiring binary images in the kernel tree.
Rafael Wysocki asked that Corbet consult with him before moving any files in the power management part of the documentation tree. That is standard operating procedure, Corbet said. He will let the appropriate maintainer know what he is planning to do and won't do it over the objection of that maintainer.
Another request came from Hannes Reinecke, who would like to see the return values of kernel functions get added into the documentation. Right now, some free-form text could be added to the kerneldoc comments associated with the function, Corbet said, and something more structured could be worked out later. But, in order to get a full list of the return values, the entire set of kernel functions needs to be annotated, David Woodhouse said, so that return values from functions that are called can be incorporated into the list. But it was suggested that even just annotating the leaf functions (those that call no others) would be a good place to start. At that point, things kind of wound down; Corbet and Chehab left the stage in favor of Batman.
Tracepoint challenges
The final core-day session at the 2016 Kernel Summit, run by Steve "Batman" Rostedt and Shuah Khan, concerned the use of tracepoints in the kernel. It started with a discussion of tracepoint performance issues, but quickly came around to the perennial area of concern about tracepoints: whether they form part of the kernel's user-space ABI or not.
Steve started by noting that he is seeing an "explosion" in the number of tracepoints being added. The problem is that, while the cost of tracepoints has been made as low as possible, they are still not free. Each tracepoint hurts performance slightly. So it may eventually become necessary to limit the addition of tracepoints into the kernel.
David Howells noted that a number of maintainers have been seen to push back on the addition of printk() calls to the kernel, saying that tracepoints should be used instead. Steve responded that they should push back on tracepoints too. Each tracepoint should have its own rationale justifying its existence. Chris Mason suggested that the best way to cut down on tracepoints is to require developers to document them.
Mel Gorman reminded the group that tracepoints can be inserted dynamically into a running kernel. Mark Brown said that dynamic tracepoints require more tooling; that may be fine for a server system, but is harder on a phone. But Steve said that no special tools are required to insert tracepoints; it can all be done with echo commands.
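As a rough illustration of why echo is enough: dynamic probes are defined by writing short text lines to the kprobe_events file in tracefs. The sketch below does the same thing from Python. The event name my_open, the probed symbol do_sys_open, and the tracefs mount point are assumptions chosen for illustration; they vary by kernel and system, and writing to the real file requires root.

```python
import os

# Assumed tracefs location; older systems use /sys/kernel/debug/tracing.
TRACEFS = "/sys/kernel/tracing"


def kprobe_definition(event, symbol):
    """Build a kprobe_events line; 'p:EVENT SYMBOL' probes function entry."""
    return "p:{} {}".format(event, symbol)


def add_kprobe(event, symbol, events_file=None):
    """Append a probe definition, as 'echo ... >> kprobe_events' would."""
    if events_file is None:
        events_file = os.path.join(TRACEFS, "kprobe_events")
    with open(events_file, "a") as f:
        f.write(kprobe_definition(event, symbol) + "\n")


if __name__ == "__main__":
    # Equivalent to: echo 'p:my_open do_sys_open' >> $TRACEFS/kprobe_events
    # Only the line is printed here, so no special privileges are needed.
    print(kprobe_definition("my_open", "do_sys_open"))
```

Once a probe has been defined this way, it shows up as an event under events/kprobes/ in tracefs and can be switched on by writing 1 to its enable file, again with nothing more than echo.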
Shuah brought things around to the ABI issue by saying that tracepoints can be highly effective for debugging problems on deployed systems. But, she asked, if we add tracepoints, do we have to maintain them forever? Ted Ts'o noted that the current work with eBPF makes tracepoints far easier to use, a change with both good and bad aspects. On the good side, the kernel now has dynamic tracing capabilities approaching those of DTrace. On the other hand, that means that people are starting to use these capabilities, and system administrators are starting to depend on them. So the ABI issue is no longer theoretical.
Peter Zijlstra said that there are tracepoints in the scheduler now that he would like to remove, but fears he can't without breaking things. Linus, though, said that problematic tracepoints should simply be taken out, especially if they are hindering development. This should happen even if the removal would break the LatencyTOP tool. Greg Kroah-Hartman protested that, in the past, Linus had blocked a tracepoint change that broke the PowerTOP utility. Linus's answer was that the community was still figuring out how to work with tracepoints then, and that there was no actual need to break PowerTOP at that time.
But, he said, tracepoints are still a view into the kernel's internals. They have to be able to change over time. If the removal of a particular tracepoint proves to be painful for user space, that removal will have to be reconsidered, but only then. That, he said, has always been the ABI rule: we can change things, but, if the result is broken user space, we'll change it back. Additionally, he said, LatencyTOP users tend to be people who compile their kernels anyway, while PowerTOP users are not. So LatencyTOP users can better adjust to a tracepoint change.
And, in the end, Linus said, if a tracepoint becomes so useful that it becomes part of the ABI, there is probably a good reason for it and it likely should be kept. But the way to find out is to change things and see who screams.
Ted suggested that now would be a good time to look at Brendan Gregg's perf-tools set to see which tracepoints it depends on. If those tracepoints need adjustment to be supportable in the long run, now is the time to make those changes before the usage of those tools increases further.
Some maintainers may feel better now about allowing tracepoints in the code they are responsible for, but others have not changed their view. Al Viro made it clear that his policy would not be changing, and that he would not be allowing any tracepoints in the virtual filesystem layer. He is worried about how some developers may use those tracepoints, and does not want to see a day in the future where systems are unable to boot with newer kernels as the result of tracepoint changes.
The session concluded with Linus saying that, in the history of kernel development, nobody has ever screamed about a change to a tracepoint. He allowed that this might happen as the use of tracepoints increases. But, he said, there is no point in making a big deal about that possibility before it proves to be a problem.
Page editor: Jonathan Corbet
Distributions
Brief items
Distribution quotes of the week
Minoca OS goes open source
Minoca OS has been released under the GNU GPLv3. "Minoca OS is a general purpose operating system written completely from the ground up. It’s intended for devices looking to conserve power, memory, and storage. It aims to be lean, maintainable, modular, and compatible with existing software."
Distribution News
Debian GNU/Linux
dgit 2.9 - for everyone
Ian Jackson has announced the release of dgit 2.9, with a new set of tutorials. "dgit allows you to treat the Debian archive as if it were a git repository, and get a git view of any package. If you have the appropriate access rights you can do builds and uploads from git, and other dgit users will see your git history."
Release Architectures for Debian 9 'Stretch'
The release architectures for Debian 9 will be amd64, arm64, armel, armhf, i386, mips, mips64el, mipsel, ppc64el, and s390x. "The only change from Jessie is the removal of powerpc as a release architecture. We discussed this at length, and eventually took the view that the least disservice to users of that port is to provide reasonable notice of its discontinuation. We recognise and acknowledge that discontinuing any port is unavoidably disruptive."
Gentoo Linux
Gentoo on Android stage3
Benda Xu has released a "Gentoo on Android" stage3 tarball. "The tarball runs Gentoo natively under /data/gentoo (:=EPREFIX) on a rooted android device newer than 2011 or armv7a, alongside the Android stack."
Red Hat Enterprise Linux
Red Hat end-of-life notices
Red Hat has announced the retirement of Red Hat Enterprise Developer Toolset 3.x and Red Hat Enterprise Linux 6.6 Extended Update Support. Red Hat will no longer provide updated packages, including critical impact security patches or urgent priority bug fixes, for these products.
Newsletters and articles of interest
Distribution newsletters
- DistroWatch Weekly, Issue 685 (October 31)
- Linux Mint Monthly News (October)
- Lunar Linux weekly news (October 28)
- openSUSE Tumbleweed – Review of the Week (October 28)
- Ubuntu Kernel Team weekly newsletter (October 25)
- Ubuntu Weekly Newsletter Issue 485 (October 30)
The (updated) history of Android (Ars Technica)
Ars Technica covers the history of Android from version 0.5 to 7.0 "Nougat". "One of the most interesting additions to Nougat is a revamp of the app framework to allow for resizable apps. This allowed Google to implement split screen on phones and tablets, picture-in-picture on Android TV, and a mysterious floating windowed mode. We've been able to access the floating window mode with some software trickery, but we've yet to see Google use it in an actual product. Is it being aimed at desktop computing?"
Arch Linux: In a world of polish, DIY never felt so good (The Register)
The Register takes a look at Arch Linux. "The lack of a default set of applications and desktop system also does not make for tidy reviews - or reviews at all really, since what I install will no doubt be different to what you choose. I happened to select a very minimal setup of bare Openbox, tint2 and dmenu. You might prefer the latest release of GNOME. We'd both be running Arch, but our experiences of it would be totally different. This is of course true of any distro, but most others have a default desktop at least."
Page editor: Rebecca Sobol
Development
Ten years of KVM
We recently celebrated 25 years of the Linux project. KVM, or Kernel-based Virtual Machine, a part of the Linux kernel, celebrated its 10th anniversary in October. KVM was first announced on 19 October 2006 by its creator, Avi Kivity, in this post to the Linux kernel mailing list.
That first version of the KVM patch set had support for the VMX instructions found in Intel CPUs that were just being introduced around the time of the announcement. Support for AMD's SVM instructions followed soon after. The KVM patch set was merged in the upstream kernel in December 2006, and was released as part of the 2.6.20 kernel in February 2007.
Background
Running multiple guest operating systems on the x86 architecture was quite difficult without the new virtualization extensions: there are instructions that can only be executed from the highest privilege level, ring 0, and such access could not be given to each operating system without it also affecting the operation of the other OSes on the system. Additionally, some instructions do not cause a trap when executed at a lower privilege level, despite requiring a higher privilege level to function correctly, so running a "hypervisor" in ring 0 while running other OSes in lower-privileged rings was also not a solution.
The VMX and SVM instructions introduced a new ring, ring -1, to the x86 architecture. This is the privilege level where the virtual machine monitor (VMM), or the hypervisor, runs. This VMM arbitrates access to the hardware for the various operating systems so that they can continue running normally in the regular x86 environment.
There are several reasons to run multiple operating systems on one hardware system: deployment and management of OSes becomes easier with tools that can provision virtual machines (VMs). It also leads to lower power and cooling costs by hosting multiple OSes and their corresponding applications and services on newer, more capable hardware. Moreover, running legacy operating systems and applications on newer hardware, without any changes to adapt to it, becomes possible by emulating older hardware via the hypervisor.
The functionality of KVM itself is divided into multiple parts: the generic host kernel KVM module, which exposes the architecture-independent functionality of KVM; the architecture-specific kernel module in the host system; the user-space part that emulates the virtual machine hardware that the guest operating system runs on; and optional guest additions that make the guest perform better on virtualized systems.
At the time KVM was introduced, Xen was the de facto open source hypervisor. Since Xen was introduced before the virtualization extensions were available on x86, it had to use a different design. First, it needed to run a modified guest kernel in order to boot virtual machines. Second, Xen took over the role of the host kernel, relegating Linux to only manage I/O devices as part of Xen's special "Dom0" virtual machine. This meant that the system couldn't truly be called a Linux system; even the guest operating systems were modified Linux kernels with (at the time) non-upstream code.
Kivity started KVM development while working at Israeli startup Qumranet to fix issues with the Xen-related work the company was doing. The original Qumranet product idea was to replicate machine state across two different VMs to achieve fault tolerance. It was soon apparent to the engineers at Qumranet that Xen was too limiting and a poor model for their needs. The virtualization extensions were about to be introduced in AMD and Intel CPUs, so Kivity started a side-project, KVM, that was based on the new hardware virtualization specifications and would be used as the hypervisor for the fault-tolerance solution.
Development model
From the beginning, Kivity wrote the code with upstreaming in mind. One of the goals of the KVM model was as much reuse of existing functionality as possible: using Linux to do most of the work, with KVM just being a driver that handled the new virtualization instructions exposed by hardware. This enabled KVM to gain any new features that Linux developers added to the other parts of the system, such as improvements in the CPU scheduler, memory management, power management, and so on.
This model worked well for the rest of the Linux ecosystem as well. Features that started their life with only virtualization in mind, like transparent huge pages, proved useful and were widely adopted in general use cases too. There weren't two separate communities for the OS and for the VMM; everyone worked as part of one project.
Also, management of the VMs would be easier as each VM could be monitored as a regular process — tools like top and ps worked out of the box. These days, perf can be used to monitor guest activity from the host and identify bottlenecks, if any. Further chipset improvements will also enable guest process perf measurement from the host.
The other side of KVM was in user space, where the machine that is presented to the guest OS is built. kvm-userspace was a fork of the QEMU project. QEMU is a machine emulator: it can run unmodified OS images for a variety of architectures that it supports, and emulate those architectures' instructions for the host architecture it runs on. This is of course very slow, but the advantage of the QEMU project was that it had quite a few devices already emulated for the x86 architecture, such as the chipset, network cards, display adapters, and so on.
What kvm-userspace did was short-circuit the emulation code to only allow x86-on-x86 and use the KVM API for actually running the guest OS on the host CPU. When the guest OS performs a privileged operation, the CPU exits to the VMM code. KVM takes over; if it can service the request itself, it does so and gives control back to the guest. This is a "lightweight exit". For requests that the KVM code can't serve, like any device emulation, it defers to QEMU. This implies exiting to user space from the host Linux kernel, and hence is called a "heavyweight exit".
One of the drawbacks in this model was the maintenance of the fork of QEMU. The early focus of the developers was on stabilizing the kernel module, and getting more and more guests to work without a hitch. That meant much less developer time was spent on the device emulation code, and hence the work to redo the hacks to make them suitable for upstream remained at a lower priority.
Xen too used a fork of QEMU for its device emulation in its HVM mode (the mode where Xen used the new hardware virtualization instructions). In addition, QEMU had its own non-upstream Linux kernel accelerator module (KQEMU) for x86-on-x86 that eliminated the emulation layer, making x86 guests run faster on x86 hardware. Integrating all of this required a maintainer who would understand the various needs from all the projects. Anthony Liguori stepped up as a maintainer of the QEMU project, and he had the trust of the Xen and KVM communities. Over time, in small bits, the forks were eliminated, and now KVM as well as Xen use upstream QEMU for their device model emulation.
The "do one thing, do it right" mantra, along with "everything is a file", was exploited to the fullest. The KVM API allows one to create VMs — or, alternatively, sandboxes — on a Linux system. These can then run operating systems inside them, or just about any code that will not interfere with the running system. This also means that there are other user-space implementations that are not as heavyweight or as featureful as QEMU. Tools that can quickly boot into small applications or specialized OSes with a KVM VM started showing up — with kvmtool being the most popular one.
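To make the "everything is a file" point concrete, here is a minimal sketch of the first step a KVM user-space program typically takes: opening /dev/kvm and issuing an ioctl() on the resulting file descriptor. The function name kvm_api_version is made up for illustration, and the ioctl number is written out by hand on the assumption that it matches the KVM_GET_API_VERSION definition in <linux/kvm.h>. On a system without KVM, or without permission to open the device, the sketch simply reports that KVM is unavailable.

```python
import fcntl
import os

# KVM_GET_API_VERSION is _IO(KVMIO, 0x00) with KVMIO = 0xAE in <linux/kvm.h>;
# the numeric value is spelled out here because Python has no such header.
KVM_GET_API_VERSION = 0xAE00


def kvm_api_version(path="/dev/kvm"):
    """Return the KVM API version, or None if KVM cannot be used here."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None  # device missing, or no permission to open it
    try:
        # The KVM subsystem is driven by ioctl() calls on this descriptor.
        return fcntl.ioctl(fd, KVM_GET_API_VERSION, 0)
    except OSError:
        return None  # the file exists but does not speak the KVM API
    finally:
        os.close(fd)


if __name__ == "__main__":
    version = kvm_api_version()
    if version is None:
        print("KVM is not available on this system")
    else:
        print("KVM API version:", version)
```

The stable KVM API reports version 12. The later steps follow the same pattern: KVM_CREATE_VM returns a new file descriptor for the VM, and KVM_CREATE_VCPU on that descriptor returns one for each virtual CPU, which is what lets small tools build on the API without QEMU.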
Developer Interest
Since the original announcement of the KVM project, many hackers were interested in exploring KVM. It helped that hacking on KVM was very convenient: a system reboot wasn't required to install a new VMM. It was as simple as re-compiling the KVM modules, removing the older modules, and loading the newly-compiled ones. This helped immensely during the early stabilization and improvement phases. Debugging was a much faster process, and developers much preferred this way of working, as contrasted with compiling a new VMM, installing it, updating the boot loader, and rebooting the system. Another advantage, perhaps of lower importance on development systems but nonetheless essential for my work-and-development laptop, was that root permissions were not required to run a virtual machine.
Another handy debugging trick that was made possible by the separation of the KVM module and QEMU was that if something didn't work in KVM mode, but worked in emulated mode, the fault was very likely in the KVM module. If some guest didn't work in either of the modes, the fault was in the device model or QEMU.
The early KVM release model helped with a painless development experience as well: even though the KVM project was part of the upstream Linux kernel, Kivity maintained the KVM code on a separate release train. A new KVM release was made regularly that included the source of the KVM modules, a small compatibility layer to compile the KVM modules on any of the supported Linux kernels, and the kvm-userspace piece. This ensured that a distribution kernel, which had an older version of the KVM modules, could be used unchanged by compiling the modules from the newest KVM release for that kernel.
The compatibility layer required some effort to maintain. It needed to ensure that the new KVM code that used newer kernel APIs that were not present on older kernels continued to work, by emulating the new API. This was a one-time cost to add such API compatibility functions, but the barrier to entry for new contributors was significantly reduced. Hackers could download the latest KVM release, compile the modules against whichever kernel they were running, and see virtual machines boot. If that did not work, developers could post bug-fix patches.
Widespread adoption
Chip vendors started taking interest and porting KVM to their architectures: Intel added support for IA64 along with features and stability fixes to x86; IBM added support for s390 and POWER architectures; ARM and Linaro contributed to the ARM port; and Imagination Technologies added MIPS support. These didn't happen all at once, though. ARM support, for example, came rather late ("it's the reality that's not timely, not the prediction", quipped Kivity during a KVM Forum keynote, having predicted the previous year that an ARM port would materialize).
Developer interest could also be seen at the KVM Forum, an annual gathering of people interested in KVM virtualization. The first KVM Forum in 2007 had a handful of developers in a room where many discussions about the current state of affairs, and where to go in the future, took place. One small group, headed by Rusty Russell, took over the whiteboard and started discussions on what a paravirtualized interface for KVM would look like. This is where VIRTIO started to take shape. These days, the KVM Forum is a whole conference with parallel tracks, tens of speakers, and hundreds of attendees.
As time passed, it was evident the KVM kernel modules were not where most of the action was — the instruction emulation, when required, was more or less complete, and most distributions were shipping recent Linux kernels. The focus had then switched to the user space: adding more device emulation, making existing devices perform better, and so on. The KVM releases then focused more on the user-space part, and the maintenance of the compatibility layer was eased. At this time, even though the kvm-userspace fork existed, effort was made to ensure new features went into the QEMU project rather than the kvm-userspace project. Kivity too started feeding in small changes from the kvm-userspace repository to the QEMU project.
While all this was happening, Qumranet had changed direction, and was now pursuing desktop virtualization with KVM as the hypervisor. In September 2008, Red Hat announced it would acquire Qumranet. Red Hat had supported the Xen hypervisor as its official VMM since the Red Hat Enterprise Linux 5.0 release. With the RHEL 5.4 release, Red Hat started supporting both Xen and KVM as hypervisors. With the release of RHEL 6.0, Red Hat switched to only supporting KVM. KVM continued enjoying out-of-the-box support in other distributions as well.
Present and future
Today, there are several projects that use KVM as the default hypervisor: OpenStack and oVirt are the more popular ones. These projects concern themselves with large-scale deployments of KVM hosts and several VMs in one deployment. These come with various use cases, and hence ask different things of KVM. As guest OSes grow larger (more RAM and virtual CPUs), they become more difficult to live-migrate without incurring too much downtime; telco deployments need low-latency network packet processing, so realtime KVM is an area of interest; and faster disk and network I/O is always an area of research. Keeping everything secure and reducing the hypervisor footprint are also being worked on. The ways in which a malicious guest can break out of its VM sandbox, and how to mitigate such attacks, are also a prime area of focus.
A lot of advancement happens with new hardware updates and devices. However, a lot of effort is also spent in optimizing the current code base, writing new algorithms, and coming up with new ways to improve performance and scalability with the existing infrastructure.
For the next ten years, the main topics of discussion may well not be about the development of the hypervisor. More interesting will be to see how Linux gets used as a hypervisor, bringing better sandboxing for running untrusted code, especially on mobile phones, and running the cloud infrastructure, by being pervasive as well as invisible at the same time.
Brief items
Development quotes of the week
You might say, “That’s great, but double-buffered rendering is the textbook solution to the problem of displaying incomplete rendering to users and driving them to kill their dogs in maniacal frustration.”. That’s true, but Emacs predates those textbooks. GNU Emacs is an old-school C program emulating a 1980s Symbolics Lisp Machine emulating an old-fashioned Motif-style Xt toolkit emulating a 1970s text terminal emulating a 1960s teletype. Compiling Emacs is a challenge. Adding modern rendering features to the redisplay engine is a miracle.
Personally I think a Linux kernel tarball, without accompanying git history, is a GPL violation. But I don't expect to convince anyone...
Collabora Online Development Edition 2.0 released
Version 2.0 of the Collabora Online Development Edition online office suite has been released. "Collabora Productivity, the developers behind LibreOffice Online, announced the release of CODE 2.0, including the latest and most requested feature from customers: collaborative editing. Developers and home users are encouraged to update, try this out and get involved with the latest developments." See this blog entry for lots of details.
Mesa 13.0.0 released
The Mesa project has announced version 13.0.0 of the 3D graphics library that provides an open-source implementation of OpenGL. "This release has huge amount of features, but without a doubt the biggest ones are: Vulkan driver for hardware supported by the AMDGPU kernel driver [and] OpenGL 4.4/4.5 capability, yet the drivers may expose lower version due to pending Khronos CTS validation."
PostgreSQL 2016-10-27 Cumulative Update Release
The PostgreSQL Global Development Group has released an update to all supported versions of its database system, including 9.6.1, 9.5.5, 9.4.10, 9.3.15, 9.2.19, and 9.1.24. This is the last update for the PostgreSQL 9.1 series which is now end-of-life. "This release fixes two issues that can cause data corruption, which are described in more detail below. It also patches a number of other bugs reported over the last three months. The project urges users to apply this update at the next possible downtime."
Announcing the Tor Browser User Manual
The new Tor Browser User Manual has been released. "During the creation of this manual, community feedback was requested over various mailing lists / IRC channels. We understand that many people who read this blog are not part of these lists / channels, so we would like to request that if you find errors in the manual or have feedback about how it could be improved, please open a ticket on our bug tracker and set the component to "community"."
Twisted 16.5 Release
Twisted Matrix Laboratories has released Twisted 16.5. Highlights of this release include Deferred.addTimeout for timing out your Deferreds, yield from support for Deferreds in functions wrapped with twisted.internet.defer.ensureDeferred, twisted.internet.cfreactor is now supported on Python 2.7 and Python 3.5, and more. See the NEWS file for more information.
Waltham: a generic Wayland-style IPC over network
Pekka Paalanen has announced the Waltham project. "Waltham is (will be) the Wayland-style protocol framework built to support network communications. Just like Wayland, the protocol is object-oriented, defined in XML files, and you use a code generator to bind it in server and client programs. Practically all the Wayland protocol design principles apply also to Waltham protocols - if you can write Wayland protocols, you can write Waltham protocols."
Newsletters and articles
Development newsletters
- Emacs News (November 1)
- These Weeks in Firefox (October 31)
- What's cooking in git.git (October 26)
- What's cooking in git.git (October 28)
- What's cooking in git.git (October 31)
- This Week in GTK+ (October 31)
- Koha Community Newsletter (October)
- OCaml Weekly News (November 1)
- Perl Weekly (October 31)
- PostgreSQL Weekly News (October 30)
- Python Weekly (October 27)
- Ruby Weekly (October 27)
- This Week in Rust (November 1)
- Wikimedia Tech News (October 31)
Project for porting C to Rust gains Mozilla's backing (InfoWorld)
InfoWorld takes a look at a C-to-Rust translation project called Corrode. "What Corrode does not do (yet) is take constructs specific to C and rewrite them in memory-safe Rust equivalents. In other words, it performs the initial grunt work involved in porting a project from C to Rust, but it leaves the heavier lifting -- for example, using Rust's idioms and language features -- to the developer."
Page editor: Rebecca Sobol
Announcements
Brief items
A change of lawyers at the FSF
The Free Software Foundation has announced that Eben Moglen has stepped down as the organization's general counsel; there is no word on who his replacement will be. "The FSF looks forward to working together in other capacities with Professor Moglen and SFLC on future projects to advance the free software movement and use of the GNU General Public License (GPL)."
Heiki Lõhmus takes over FSFE vice-presidency from Alessandro Rubini
The Free Software Foundation Europe has announced that Alessandro Rubini has resigned as vice-president of the FSFE and that Heiki Lõhmus will be taking over that role.
New Directors Join Linux Foundation Board
The Linux Foundation has announced the appointment of Erica Brescia, co-founder and COO of Bitnami; Jeff Garzik, co-founder of Bloq; and Nithya A. Ruff, director of Western Digital’s Open Source Strategy Office, to its Board of Directors. Ms. Ruff and Ms. Brescia join as At-Large Directors, and Mr. Garzik comes on board as the representative of Linux Foundation Silver members.
Articles of interest
Free Software Supporter Issue 103, November 2016
This edition of the Free Software Foundation's newsletter covers nominations for Free Software Awards, LibrePlanet call for papers, a licensing resource series, DMCA anti-circumvention rules, and much more.
Eben Moglen on GPL Compliance and Building Communities: What Works (Linux.com)
Linux.com has a transcript of Eben Moglen's talk in New York on October 28. "I have some fine clients and wonderful friends in this movement who have been getting rather angry recently. There is a lot of anger in the world, in fact, in politics. Our political movement is not the only one suffering from anger at the moment. But some of my angry friends, dear friends, friends I really care for, have come to the conclusion that they’re on a jihad for free software. And I will say this after decades of work—whatever else will be the drawbacks in other areas of life—the problem in our neighborhood is that jihad does not scale." There is a video of the talk available as well.
Calls for Presentations
FOSDEM Desktops DevRoom 2017 CfP
The Desktops DevRoom at FOSDEM (in Brussels, Belgium) will take place on February 5. The call for participation closes December 5. "Talks can be very specific, such as the advantages/disadvantages of distributing a desktop application with snap vs flatpak, or as general as using HTML5 technologies to develop native applications. Topics that are of interest to the users and developers of all desktop environments are especially welcome."
FOSDEM 2017 Legal and Policy Issues DevRoom CFP
There will be a Legal and Policy Issues DevRoom at FOSDEM in Brussels, Belgium February 4-5. The call for papers closes November 27. "Hackers, contributors and lawyers alike are encouraged to submit on any project policy or legal topic. Successful proposals will cover topics of interest at a medium to advanced level."
CFP Deadlines: November 3, 2016 to January 2, 2017
The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.
| Deadline | Event Dates | Event | Location |
|---|---|---|---|
| November 11 | November 11-12 | Linux Piter | St. Petersburg, Russia |
| November 11 | January 27-29 | DevConf.cz 2017 | Brno, Czech Republic |
| November 13 | December 10 | Mini Debian Conference Japan 2016 | Tokyo, Japan |
| November 15 | March 2-5 | Southern California Linux Expo | Pasadena, CA, USA |
| November 15 | March 28-31 | PGConf US 2017 | Jersey City, NJ, USA |
| November 18 | February 18-19 | PyCaribbean | Bayamón, Puerto Rico, USA |
| November 20 | December 10-11 | SciPy India | Bombay, India |
| November 21 | January 16 | Linux.Conf.Au 2017 Sysadmin Miniconf | Hobart, Tas, Australia |
| November 21 | January 16-17 | LCA Kernel Miniconf | Hobart, Australia |
| November 28 | March 25-26 | LibrePlanet 2017 | Cambridge, MA, USA |
| December 1 | April 3-6 | ‹Programming› 2017 | Brussels, Belgium |
| December 10 | February 21-23 | Embedded Linux Conference | Portland, OR, USA |
| December 10 | February 21-23 | OpenIoT Summit | Portland, OR, USA |
| December 31 | March 2-3 | PGConf India 2017 | Bengaluru, India |
| December 31 | April 3-7 | DjangoCon Europe | Florence, Italy |
| January 1 | April 17-20 | Dockercon | Austin, TX, USA |
If the CFP deadline for your event does not appear here, please tell us about it.
Upcoming Events
EuroPython 2017 will be held in Rimini, Italy
The EuroPython Society (EPS) has announced the decision to accept the proposal from the Italian on-site team, backed by the Python Italia APS, to hold EuroPython 2017 in Rimini, Italy. The conference will be in July, although exact dates have not yet been set.
Events: November 3, 2016 to January 2, 2017
The following event listing is taken from the LWN.net Calendar.
| Date(s) | Event | Location |
|---|---|---|
| November 1-4 | PostgreSQL Conference Europe 2016 | Tallinn, Estonia |
| November 1-4 | Linux Plumbers Conference | Santa Fe, NM, USA |
| November 3 | Bristech Conference 2016 | Bristol, UK |
| November 4-6 | FUDCon Phnom Penh | Phnom Penh, Cambodia |
| November 5 | Barcelona Perl Workshop | Barcelona, Spain |
| November 5-6 | OpenFest 2016 | Sofia, Bulgaria |
| November 7-9 | Velocity Amsterdam | Amsterdam, Netherlands |
| November 9-11 | O’Reilly Security Conference EU | Amsterdam, Netherlands |
| November 11-12 | Seattle GNU/Linux Conference | Seattle, WA, USA |
| November 11-12 | Linux Piter | St. Petersburg, Russia |
| November 12-13 | T-Dose | Eindhoven, Netherlands |
| November 12-13 | Mini-DebConf | Cambridge, UK |
| November 12-13 | PyCon Canada 2016 | Toronto, Canada |
| November 13-18 | The International Conference for High Performance Computing, Networking, Storage and Analysis | Salt Lake City, UT, USA |
| November 14-18 | Tcl/Tk Conference | Houston, TX, USA |
| November 14 | The Third Workshop on the LLVM Compiler Infrastructure in HPC | Salt Lake City, UT, USA |
| November 14-16 | PGConfSV 2016 | San Francisco, CA, USA |
| November 16-18 | ApacheCon Europe | Seville, Spain |
| November 16-17 | Paris Open Source Summit | Paris, France |
| November 17 | NLUUG (Fall conference) | Bunnik, The Netherlands |
| November 18-20 | GNU Health Conference 2016 | Las Palmas, Spain |
| November 18-20 | UbuCon Europe 2016 | Essen, Germany |
| November 19 | eloop 2016 | Stuttgart, Germany |
| November 21-22 | Velocity Beijing | Beijing, China |
| November 24 | OWASP Gothenburg Day | Gothenburg, Sweden |
| November 25-27 | Pycon Argentina 2016 | Bahía Blanca, Argentina |
| November 29-30 | 5th RISC-V Workshop | Mountain View, CA, USA |
| November 29-December 2 | Open Source Monitoring Conference | Nürnberg, Germany |
| December 3 | NoSlidesConf | Bologna, Italy |
| December 3 | London Perl Workshop | London, England |
| December 6 | CHAR(16) | New York, NY, USA |
| December 10 | Mini Debian Conference Japan 2016 | Tokyo, Japan |
| December 10-11 | SciPy India | Bombay, India |
| December 27-30 | Chaos Communication Congress | Hamburg, Germany |
If your event does not appear here, please tell us about it.
Page editor: Rebecca Sobol

![[Group photo]](https://static.lwn.net/images/conf/2016/ks/ksgroup-sm.jpg)