LWN.net Weekly Edition for October 20, 2016
Automatically detecting kernel interface changes
ABI changes can be painful for anybody charged with the development and maintenance of software; that can be doubly so when the changes happen inadvertently and take people by surprise. There is tooling out there that can search for and report ABI changes. At Kernel Recipes 2016, Dodji Seketeli presented some early work he has done on a tool that would find unexpected kernel ABI changes and asked what might seem like an obvious question: would this functionality be useful to the development community?
It is worth noting that he was not talking about the sort of ABI change that kernel developers worry about the most: changes to the user-space ABI. Instead, he is focusing on changes to the loadable-module ABI. At first blush, that might seem like it could reduce the level of interest in his work. As was pointed out in the talk, kernel developers are generally unwilling to talk about the module interface as an ABI at all; at best, it's a fluid API with no stability guarantees. This interface is explicitly allowed to change, so the number of developers wanting a tool to flag those changes might be thought to be small.
Interest in this kind of tool comes mostly from distributors. The
enterprise distributors could use it to let binary-driver vendors know when
something has changed in the module interface. But Ben Hutchings, Debian's
kernel maintainer, said that it would be generally useful to avoid making
changes to the module interface when patching a stable kernel.
The abidiff tool exists to provide just this kind of information. It reads the ELF symbol information from an object file, along with any debug information found there. It uses that information to build an internal representation of the ABI, which can be saved in a special XML file. Given ABI representations from two different objects, abidiff can report on the differences between the two.
Seketeli showed some example output from abidiff; it can be seen in his slides. The tool is able to detect changes in the types of a function's parameters or its return value. Anything that changes the size of a structure or the layout of its members will also be reported on. The removal of functions is noted, and so on. There are mechanisms for reducing noise by filtering out changes that might not be of interest; for example, changes to structures that do not appear in a specific set of header files can be suppressed. Other tools built on top of abidiff can look for ABI changes in libraries stored in package files.
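To make the structure-layout case concrete, here is a minimal sketch of the kind of change such a tool reports. It is not how abidiff works internally (abidiff reads ELF symbols and debug information); Python's ctypes stands in for the compiler's layout rules, and the WidgetV1 and WidgetV2 structures, along with the layout() and diff_layout() helpers, are hypothetical names invented for this illustration.

```python
import ctypes


class WidgetV1(ctypes.Structure):
    """Hypothetical structure as it appears in the 'old' build."""
    _fields_ = [
        ("id", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("count", ctypes.c_uint64),
    ]


class WidgetV2(ctypes.Structure):
    """The same structure after a patch inserted a member in the middle."""
    _fields_ = [
        ("id", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("owner", ctypes.c_uint64),  # newly inserted member
        ("count", ctypes.c_uint64),
    ]


def layout(struct):
    """Map each member name to its (offset, size) pair."""
    return {
        name: (getattr(struct, name).offset, getattr(struct, name).size)
        for name, _ctype in struct._fields_
    }


def diff_layout(old, new):
    """Return human-readable layout changes between two structure versions."""
    changes = []
    if ctypes.sizeof(old) != ctypes.sizeof(new):
        changes.append(
            f"size changed from {ctypes.sizeof(old)} to {ctypes.sizeof(new)} bytes"
        )
    old_l, new_l = layout(old), layout(new)
    for name in old_l:
        if name not in new_l:
            changes.append(f"member '{name}' removed")
        elif old_l[name][0] != new_l[name][0]:
            changes.append(
                f"member '{name}' moved from offset {old_l[name][0]} to {new_l[name][0]}"
            )
    for name in new_l:
        if name not in old_l:
            changes.append(f"member '{name}' added at offset {new_l[name][0]}")
    return changes


report = diff_layout(WidgetV1, WidgetV2)
for line in report:
    print(line)
```

A module built against the first layout would read the wrong bytes for count when handed the second one, even though no function signature changed; that is the sort of silent breakage a layout comparison is meant to catch.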
He pointed out, though, that none of this works with the kernel now. Wouldn't it be nice to have a tool that could look at a set of kernel modules and exported interfaces, generating a report of what has changed from a previous version?
Getting there requires a bit of work. The tool would need to understand and handle the special ELF sections used in the kernel build; the __export_symbol and __export_symbol_gpl sections are particularly relevant. Kernel modules also need to be parsed properly, and an interface description generated from the result. The sheer size of the kernel presents a problem as well; it will force some memory-usage optimizations that have not been necessary thus far. These are the sort of issues he has been working on.
Thus far, he has added a kernel-specific mode to the abidw tool, which generates an XML representation of an ABI from an ELF file for use with abidiff. Some examples of the output can be found on this page. Anybody wanting to play with this work can grab a copy of the repository by running:
git clone -b dodji/kabidiff git://sourceware.org/git/libabigail.git
The discussion of this work was wide-ranging and energetic; it is hard to report on here. One topic that came up was the possibility of detecting changes in the user-space ABI instead; that is a tool that would be useful for regression testing in general. That, Seketeli allowed, is a rather harder problem. Even just looking at the system-call interface, it can be hard for a tool to understand what a system call's parameters are supposed to represent.
So a user-space ABI checker is probably not on the immediate horizon. We probably will see a tool that can find changes in the module interface, though, and that will have its own uses. Developers might be surprised to learn how often the changes they make affect the interface used by loadable modules.
Graphics world domination may be closer than it appears
The mainline kernel has support for a wide range of hardware. One place where support has traditionally been lacking, though, is graphics adapters. As a result, a great many people are still using proprietary, out-of-tree GPU drivers. Daniel Vetter went before the crowd at Kernel Recipes 2016 to say that the situation is not as bad as some think; indeed, he said, in this area as well as others, world domination is proceeding according to plan.
The current state of affairs
The first stop on Vetter's tour of the direct rendering manager (DRM) subsystem was documentation, and, in particular, the transition to Sphinx that has unfolded over the last couple of release cycles. The new formatted documentation system for the kernel is "pretty and awesome", and makes writing the documentation fun. As a result, there's now a lot more documentation than there used to be; indeed, the DRM documentation is pretty much complete. The biggest gap at this point is a top-level picture that nicely ties all the pieces together.
Moving on to rather older work (he titled this section "dungeons and dragons"), Vetter noted that there are still some DRM1 drivers around; these are at least ten years old at this point. They feature nasty user-space APIs, root holes, and other delightful things. These drivers are built around a midlayer architecture, a design which has gone out of fashion in recent years; the idea was to make it possible to build the drivers on BSD systems. In current kernels, these drivers are hidden behind the CONFIG_DRM_LEGACY option. They cannot be removed outright without breaking things, though, so they will remain for a while.
The IGT tools from Intel have proved to be a useful test suite for the validation of DRM drivers. They are Intel-specific for now, but are being modified to be more generic. At this point, a number of drivers and continuous-integration systems are using these tests to trap regressions. See the DRM documentation for information on how to validate drivers with the IGT suite.
Recently there has been an influx of DRM developers from the ARM community; that has led to a new set of problems. The DRM subsystem is special, Vetter said, in that it requires that the user-space API for any driver be open source. Much of the code for these drivers runs in user space; the 10% that runs in the kernel is "useless" without the user-space side as well. A kernel driver without the user-space code cannot be enhanced or maintained.
The ARM folks were unaware of this restriction and not used to operating in
this mode, so the DRM maintainers have had to start rejecting their
patches. The result was some screaming, but, at this point, the ARM
community understands the requirements and is starting to look at opening
up the user-space code as well.
One of the big changes in the DRM subsystem in recent years has been the switch to the atomic mode-setting API. The original DRM API featured one ioctl() call for each operation to be done; that resulted in a lot of display flickering as applications worked through a long series of changes. The atomic API allows everything to be done with a single call, leading to flicker-free changes. An atomic change is an all-or-nothing affair; if it succeeds at all, it will succeed completely.
This API also provides a separate call to check whether a set of changes would succeed without actually making those changes. It can be hard to know before trying; hardware often has weird restrictions that get in the way. He mentioned adapters with three video outputs but only two clocks as an example. Overlay support (the ability to directly display a video stream from another source, such as a camera, without going through user space) has been added to this API as well. Overlays went out of fashion for a while, but it turns out that a lot of power can be saved by outputting the video directly; it is a crucial feature for mobile systems.
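The check-then-commit pattern can be sketched with a toy model. This is not the DRM API; DisplayState, check(), and commit() are hypothetical names, and the constraint (three outputs that must share two clocks, with outputs running at the same pixel clock assumed to share one) is a simplified reading of the example from the talk.

```python
from dataclasses import dataclass, field

NUM_CLOCKS = 2  # assumed hardware limit, taken from the talk's example


@dataclass
class DisplayState:
    """Toy model: pixel clock (kHz) per output, or None if the output is off."""
    outputs: dict = field(
        default_factory=lambda: {"A": None, "B": None, "C": None}
    )


def check(state, changes):
    """Report whether `changes` could be applied, without modifying anything."""
    if not set(changes) <= set(state.outputs):
        return False  # request names an output that does not exist
    proposed = {**state.outputs, **changes}
    clocks_needed = {clk for clk in proposed.values() if clk is not None}
    return len(clocks_needed) <= NUM_CLOCKS


def commit(state, changes):
    """Apply every change or none of them."""
    if not check(state, changes):
        return False
    state.outputs = {**state.outputs, **changes}
    return True


state = DisplayState()
# Three outputs, but only two distinct clocks: accepted as a whole.
first = commit(state, {"A": 148500, "B": 148500, "C": 74250})
# Moving B to a third clock rate would need a third clock: rejected as a whole.
second = commit(state, {"B": 108000})
print(first, second, state.outputs)
```

The point of the separate check() step is that user space can probe a configuration it is unsure about and fall back to another one, without the display ever passing through a half-applied state.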
At this point, there are 20 drivers in the mainline with atomic mode-setting implementations; another two or three are added with each release. The adoption of this API far exceeds the rate of adoption of the original kernel mode-setting API. It helps that a lot of functionality is in common code now, so the drivers themselves have gotten smaller. The support library has been made more modular; using it is not an all-or-nothing affair like it used to be.
Use of the atomic API is growing; one example is the drm_hwcomposer library, written by Google for use with Android systems. The ChromeOS Ozone interface running on Wayland uses it, as do all the other Wayland implementations. We have, he said, "a driver API to rule them all" for the first time.
Looking forward
Turning to future work, Vetter mentioned that there is interest in an interface that can allocate buffers for use with multiple devices. The ION memory allocator offers this functionality, but it remains Android-specific for now.
The old framebuffer device (fbdev) interface has been deprecated for some time, but it still turns out to be useful in some settings. In particular, it can save memory bandwidth and power on some low-end displays — those that require manual uploading of display data. The generic fbdev "defio" interface can now be remapped onto kernel mode-setting operations, making it possible to write a full fbdev driver on top of the DRM subsystem.
The simple display pipeline helper also makes writing simple drivers easy. For settings where there is a simple processing pipeline and a single connector, it can provide access to the atomic API without most of the complexity. With this helper, the DRM API is "now strictly better" than fbdev.
Fences are currently an area of active development. A fence is like the kernel's completion structure, in that it can be used to wait for (and signal) the completion of an operation; it is intended to be used with DMA operations in particular. There are two models for fence usage. In the "implicit" model, the kernel attaches fences to I/O buffers and takes care of everything; user space never sees it. The "explicit" model, instead, has the kernel providing fences to user space, which must then manage them itself.
The implicit model has been implemented for some time, in the form of reservation_object structures attached to DMA buffers. The TTM memory manager (used with the AMD and Nouveau drivers) has always supported it; other drivers are picking up support over time. This is the model preferred by the Linux desktop; both X and Wayland expect implicit fencing.
On the other hand, the Android system wants to use explicit fencing. It provides more control to user space and reduces the need for complexity in (vendor-supplied) graphics drivers. That was the driving factor in Android's decision, Vetter said; no vendor proved able to implement implicit fences correctly. The DRM subsystem implements an explicit fence as a sync_file structure, which is returned to user space as a file descriptor. User-space fences will be supported in the 4.9 kernel; the MSM/freedreno driver has added support so far.
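Since an explicit fence reaches user space as a file descriptor, waiting on one looks much like waiting on any other descriptor. The sketch below illustrates that idea only: an ordinary pipe stands in for the fence, and create_fence(), signal(), and is_signaled() are hypothetical helpers. The assumption is that a real fence descriptor would be waited on with poll() in a similar way; no actual DRM or sync_file interface is used here.

```python
import os
import select


def create_fence():
    """Return (wait_fd, signal_fd); a pipe stands in for a fence descriptor."""
    return os.pipe()


def signal(signal_fd):
    """Producer side: mark the operation as complete."""
    os.write(signal_fd, b"\x01")


def is_signaled(wait_fd, timeout_ms=0):
    """Consumer side: poll the descriptor; True once completion was signaled."""
    poller = select.poll()
    poller.register(wait_fd, select.POLLIN)
    return bool(poller.poll(timeout_ms))


wait_fd, signal_fd = create_fence()
before = is_signaled(wait_fd)                  # operation still pending
signal(signal_fd)                              # e.g. rendering has finished
after = is_signaled(wait_fd, timeout_ms=1000)  # the waiter now wakes up
os.close(wait_fd)
os.close(signal_fd)
print(before, after)
```

Handing out a descriptor is what gives user space the extra control mentioned above: it can be passed between processes, waited on alongside other descriptors, or ignored until the buffer is actually needed.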
As one might imagine, there is some tricky interaction between implicit and explicit fences. The solution that has been chosen is to use implicit fences by default, but to switch to the explicit model as soon as an application calls one of the explicit-fencing extensions.
Google has created the "HWC2" composer that can make use of DRM's explicit-fencing support; it is not yet publicly released, Vetter said, but will hopefully show up in 4.10. More information will be available at the Linux Plumbers Conference. Sometime soon it will be possible to run Android on a mainline kernel with an open-source graphics stack, he said.
Along those lines, what is the status of low-level GPU drivers? At this point, there are three vendor-supported open drivers in the mainline, and three more reverse-engineered ones. Of those, the Nouveau driver runs fairly well on Tegra systems. The freedreno driver is "pretty feature-complete" and is now competitive with proprietary drivers. The etnaviv driver is coming along, but still needs work on the user-space side. But, he said, there are still no vendor-supported system-on-chip drivers; that situation is "pretty dire."
He finished up by noting that the atomic API now "rules them all." There has been a lot of progress in documentation and general cleanup; all of the major gaps for authors of display drivers have been closed. Cross-driver fencing is reaching a point of being ready for everyone, and even rendering is showing some (albeit slow) progress. Upstream graphics, he said, is finally winning.
Security
Sandboxing with the Landlock security module
Anybody working to harden a computing system is likely to look at sandboxing fairly early in the process. The prospect of vulnerabilities in running software is a bit less worrisome if the scope for exploitation of those vulnerabilities is limited, and a sandbox can limit an attacker's freedom nicely. The kernel has a number of mechanisms that can support sandboxing now, and others are under development. One of those, the Landlock security module, was the topic of Mickaël Salaün's talk at Kernel Recipes 2016.
The goal for Landlock, Salaün said, is to allow unprivileged users to restrict processes that they run. He is trying to create something that is similar to the OpenBSD pledge() (formerly tame()) system call. By restricting what a running process can do, a Landlock-based sandbox can reduce the attack surface of the kernel and, with luck, make the exploitation of vulnerabilities harder in general.
Why not use the mechanisms that the kernel already provides? The
Linux security module (LSM) subsystem offers mechanisms like SELinux or
Smack, but those are meant for administrators, not users, Salaün said.
Their configuration is complex, and setting policies is a privileged
operation, which runs counter to the goal of working for unprivileged
users. The seccomp() mechanism can be used to create sandboxes,
but it is limited; only 64 bits of information can be passed to a
seccomp() hook, and it is not possible to filter system calls
based on the paths of files they try to access. The system-call level is
also the wrong place for this kind of filtering; the security hooks used by
the LSM subsystem are better placed for making proper access-control
decisions.
Thus, a new LSM. It can be thought of as being similar to seccomp(), in that it allows the loading of BPF programs to make access-control decisions. There are two aspects to that functionality that are of interest.
The first is the ability to attach BPF programs directly to the LSM hook functions and to give them access to the arguments passed to the hooks. In the current form of the patch set, the security_file_open(), security_file_permission(), and security_mmap_file() hooks can have programs attached to them; there are plans to add more hooks in the future.
These hooks need the ability to make access-control decisions; in particular, Salaün is looking for the ability to make path-based decisions. So, for example, a program might be blocked from accessing any files outside of a dedicated, application-specific directory. To support this type of decision-making, a new type of BPF map (BPF_MAP_TYPE_LANDLOCK_ARRAY) is added. These maps can hold kernel pointers with an associated type; the actual use is to hold pointers to file structures. Then, there is a set of new BPF-callable utility functions with convenient names like bpf_landlock_cmp_fs_beneath_with_struct_file() that can tell a BPF program whether one file structure is beneath another in the filesystem hierarchy.
With that supporting structure in place, one can see how a Landlock-based sandbox would work. The control program populates a special map with the file descriptors (converted to file structures internally) of the directories that the sandboxed program is to be allowed to access. A simple BPF program, which is attached to the security hooks that are called when files are opened, can then ensure that any file-access attempt is located in or below one of the directories stored in the map. Confining a process to specific parts of the filesystem thus becomes relatively easy.
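The "in or below an allowed directory" decision can be illustrated in user space. This is a sketch of the policy only, not of the Landlock BPF program or its helper functions; is_beneath() is a hypothetical name, and it compares resolved path strings where the kernel code would compare file structures.

```python
import os


def is_beneath(path, allowed_dirs):
    """True if `path` resolves to a location in or below an allowed directory."""
    real = os.path.realpath(path)  # also collapses ".." and follows symlinks
    for directory in allowed_dirs:
        base = os.path.realpath(directory)
        if os.path.commonpath([real, base]) == base:
            return True
    return False


allowed = ["/srv/app"]
print(is_beneath("/srv/app/data/cache.db", allowed))  # inside the directory
print(is_beneath("/srv/app/../../etc/passwd", allowed))  # escapes via ".."
print(is_beneath("/srv/application/file", allowed))  # shares a prefix only
```

Comparing whole path components rather than raw string prefixes matters: "/srv/application" begins with the characters "/srv/app" but is not beneath it.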
The last remaining piece is causing the relevant BPF programs to be run for the process(es) in the sandbox. There are two different ways in which that can be done:
- There is a new seccomp() operation, called
SECCOMP_SET_LANDLOCK_HOOK, which will cause a program to be
attached to a specific LSM hook for the current process. It is
possible to request that the program be invoked every time the
equivalent LSM hook is called, but there is another possibility as
well. A normal seccomp() program can be attached to one or
more system calls as usual, and Landlock can be told to only run the
LSM-attached program if the seccomp() program returns the
special SECCOMP_RET_LANDLOCK value. The seccomp()
program can, thus, make the access-control decision by itself, or it
can decide to defer to the Landlock program(s) that will be invoked
later.
- Landlock programs can be attached to a control group, using an extension to the bpf_prog_attach() patch. In this case, every process running within that control group will be regulated by the Landlock programs.
It is worth noting that the Landlock BPF programs are stackable in either context; if multiple layers of programs are attached, each will run in order and each will have the ability to veto any given operation.
Salaün demonstrated a simple program that uses the Landlock hooks. One need simply set the environment variable LANDLOCK_ALLOWED to a list of directories that a program should be allowed to access, then use the example program to launch the program of interest. The sandboxed program will be unable to access anything outside of the given list. Attempts to access forbidden files are turned back with an EPERM error; unlike seccomp(), Landlock does not kill programs that run into access restrictions.
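The stacking and error behavior described above can be modeled in a few lines. This is a toy sketch, not Landlock code: the request dictionary and the only_under_home() and no_writes() checks are invented for illustration, and the model stops at the first veto as a simplification.

```python
import errno


def run_stack(programs, request):
    """Run each attached program in order; any one of them can veto."""
    for program in programs:
        if not program(request):
            # The caller gets an error back; the process is not killed.
            return -errno.EPERM
    return 0


def only_under_home(request):
    """Hypothetical first layer: confine access to one directory tree."""
    return request["path"].startswith("/home/user/")


def no_writes(request):
    """Hypothetical second layer, stacked on top: read-only access."""
    return not request["write"]


stack = [only_under_home, no_writes]

read_ok = run_stack(stack, {"path": "/home/user/notes.txt", "write": False})
write_denied = run_stack(stack, {"path": "/home/user/notes.txt", "write": True})
outside_denied = run_stack(stack, {"path": "/etc/passwd", "write": False})
print(read_ok, write_denied, outside_denied)
```

Because every layer must approve, adding a program to the stack can only narrow what a process may do, never widen it.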
The response to the module thus far has been mostly positive. Andy Lutomirski is concerned about the control-group mode, though, given that there are still outstanding questions about how the version-2 control-group interface is going to work in general. So he recommends leaving that piece out and just using seccomp() until that issue has been resolved. The control-group hook is a tiny piece of the whole, so, if leaving it out is the price of admission for now, it is hard to imagine that anybody will be too upset.
One other potential problem is that there is a competing proposal out there in the form of the Checmate module. From all appearances, though, Landlock is further along and more actively developed. It may make sense to take ideas from both projects, though; Checmate is more focused on networking operations at the moment, which is an area that Landlock has yet to address. So the details are yet to be determined, but it seems likely that there will be some sort of BPF-based security module in the kernel before too long. It has taken a while for the stackable security modules concept to bring about a new set of interesting security mechanisms, but that would appear to be happening at last.
[Your editor would like to thank Kernel Recipes for supporting his travel to the event.]
Brief items
Security quotes of the week
We don’t care, our payments are handled by a 3rd party payment provider
If someone can inject Javascript into your site, your database is most likely also hacked.
Thanks for your suggestion, but our shop is totally safe. There is just an annoying javascript error.
Or, even better:
Our shop is safe because we use https
As I say often, cybersecurity is perhaps the most difficult intellectual occupation on the planet. Note that I said "occupation" rather than "profession." Three Septembers ago, the U.S. National Academy of Sciences concluded that cyber security should be seen as an occupation and not a profession because the rate of change is simply too great to consider professionalization. Ray Kurzweil is beyond all doubt correct; within the career lifetime of nearly everyone in this room, algorithms will be smarter than we are, and they will therefore be called upon to do what we cannot -- to protect us from other algorithms, and to ask no permission in so doing. Do we, like Ulysses, lash ourselves to the mast or do we, as the some would say, relax and enjoy the inevitable? What would we have science do? What are the possible futures you will tolerate? What horses do you want not let out of the barn? Where do we put our intelligence budget? US CYBERCOM's budget is $500 million, JPMorganChase, alone, is spending $600 million. Is that surprising or is that as it should be?
Secure Your Containers with this One Weird Trick (RHEL Blog)
Over on the Red Hat Enterprise Linux Blog, Dan Walsh writes about using Linux capabilities to help secure Docker containers. "Let’s look at the default list of capabilities available to privileged processes in a docker container: chown, dac_override, fowner, fsetid, kill, setgid, setuid, setpcap, net_bind_service, net_raw, sys_chroot, mknod, audit_write, setfcap. In the OCI/runc spec they are even more drastic only retaining, audit_write, kill, and net_bind_service and users can use ocitools to add additional capabilities. As you can imagine, I like the approach of adding capabilities you need rather than having to remember to remove capabilities you don’t." He then goes through the capabilities listed describing what they govern and when they might need to be turned on for a container application.
New vulnerabilities
asterisk: denial of service
| Package(s): | asterisk | CVE #(s): | |
| Created: | October 19, 2016 | Updated: | October 19, 2016 |
| Description: | From the Mageia advisory: The overlap dialing feature in chan_sip allows chan_sip to report to a device that the number that has been dialed is incomplete and more digits are required. If this functionality is used with a device that has performed username/password authentication RTP resources are leaked. This occurs because the code fails to release the old RTP resources before allocating new ones in this scenario. If all resources are used then RTP port exhaustion will occur and no RTP sessions are able to be set up (AST-2016-007). |
atomic-openshift: authentication bypass
| Package(s): | atomic-openshift | CVE #(s): | CVE-2016-7075 |
| Created: | October 18, 2016 | Updated: | October 19, 2016 |
| Description: | From the Red Hat advisory: It was found that Kubernetes did not correctly validate X.509 client intermediate certificate host name fields. An attacker could use this flaw to bypass authentication requirements by using a specially crafted X.509 certificate. |
chromium-browser: multiple vulnerabilities
| Package(s): | chromium-browser | CVE #(s): | CVE-2016-5181 CVE-2016-5182 CVE-2016-5183 CVE-2016-5184 CVE-2016-5185 CVE-2016-5186 CVE-2016-5187 CVE-2016-5188 CVE-2016-5189 CVE-2016-5190 CVE-2016-5191 CVE-2016-5192 CVE-2016-5193 CVE-2016-5194 |
| Created: | October 17, 2016 | Updated: | November 2, 2016 |
| Description: | From the Red Hat advisory: Multiple flaws were found in the processing of malformed web content. A web page containing malicious content could cause Chromium to crash, execute arbitrary code, or disclose sensitive information when visited by the victim. (CVE-2016-5181, CVE-2016-5182, CVE-2016-5183, CVE-2016-5184, CVE-2016-5185, CVE-2016-5187, CVE-2016-5194, CVE-2016-5186, CVE-2016-5188, CVE-2016-5189, CVE-2016-5190, CVE-2016-5191, CVE-2016-5192, CVE-2016-5193) |
dbus: code execution
| Package(s): | dbus | CVE #(s): | |
| Created: | October 14, 2016 | Updated: | November 10, 2016 |
| Description: | From the Red Hat bugzilla: A format string vulnerability in the reference bus implementation, dbus-daemon, could potentially allow local users to cause arbitrary code execution or denial of service. |
derby: information leak
| Package(s): | derby | CVE #(s): | CVE-2015-1832 |
| Created: | October 14, 2016 | Updated: | November 18, 2016 |
| Description: | From the openSUSE bug report: Apache Derby could allow a remote attacker to obtain sensitive information, caused by a XML external entity (XXE) error when processing XML data by the XML datatype and XmlVTI. An attacker could exploit this vulnerability to read arbitrary files on the system or cause a denial of service. |
dwarfutils: three vulnerabilities
| Package(s): | dwarfutils | CVE #(s): | CVE-2015-8538 CVE-2016-2050 CVE-2016-2091 |
| Created: | October 19, 2016 | Updated: | October 19, 2016 |
| Description: | From the Debian LTS advisory: CVE-2015-8538: A specially crafted ELF file can cause a segmentation fault. CVE-2016-2050: Out-of-bounds write. CVE-2016-2091: Out-of-bounds read. |
epiphany: unspecified
| Package(s): | epiphany webkitgtk4 | CVE #(s): | |
| Created: | October 19, 2016 | Updated: | October 19, 2016 |
| Description: | From the Fedora advisory: Update WebKitGTK+ package to 2.14.1. Major changes in 2.14.0: * Threaded compositor is enabled by default in both X11 and Wayland. * Accelerated compositing is now supported in Wayland. * Clipboard works in Wayland too. * Memory pressure handler always works even when cgroups is not present or not configured. * The HTTP disk cache implements speculative revalidation of resources. * DRI3 is no longer a problem when using the modesetting intel driver. * The amount of file descriptors that are kept open has been drastically reduced. Fixes from 2.14.1: * MiniBrowser and jsc binaries are now installed in pkglibexecdir instead of bindir. * Improve performance when resizing a window with multiple web views in X11. * Check whether GDK can use GL before using gdk_cairo_draw_from_gl() in Wayland. * Updated default UserAgent string or better compatibility. * Fix a crash on github.com in IntlDateTimeFormat::resolvedOptions when using the C locale. * Fix BadDamage X errors when closing the web view in X11. * Fix UIProcess crash when using Japanese input method. * Fix build with clang due to missing header includes. * Fix the build with USE_REDIRECTED_XCOMPOSITE_WINDOW disabled. * Fix several crashes and rendering issues. * Translation updates: German. Update Epiphany to be compatible with the new WebKitGTK+ package. |
ffmpeg: multiple vulnerabilities
| Package(s): | ffmpeg | CVE #(s): | CVE-2016-7502 CVE-2016-7555 CVE-2016-7562 CVE-2016-7785 CVE-2016-7905 |
| Created: | October 18, 2016 | Updated: | January 30, 2017 |
| Description: | From the openSUSE advisory: - CVE-2016-7562: out-of-bounds array write fault via specially crafted avi files - CVE-2016-7502: out-of-bounds array write via incorrect block values - CVE-2016-7905: null-point-exception when decoding avi files with crafted 'gab2' structs - CVE-2016-7555: memory leak when decoding avi files with crafted 'strh' struct - CVE-2016-7785: assert fault via avi files with crafted 'strh' struct |
guile: two vulnerabilities
| Package(s): | guile | CVE #(s): | CVE-2016-8605 CVE-2016-8606 |
| Created: | October 17, 2016 | Updated: | February 17, 2017 |
| Description: | From the Arch Linux advisory: - CVE-2016-8605 (information disclosure): The mkdir procedure of GNU Guile, an implementation of the Scheme programming language, temporarily changed the process' umask to zero. During that time window, in a multithreaded application, other threads could end up creating files with insecure permissions. For example, mkdir without the optional mode argument would create directories as 0777. - CVE-2016-8606 (arbitrary code execution): It was reported that the REPL server is vulnerable to the HTTP inter-protocol attack. This constitutes a remote code execution vulnerability for developers running a REPL server that listens on a loopback device or private network. Applications that do not run a REPL server, as is usually the case, are unaffected. A remote attacker is able to execute arbitrary code via a HTTP inter-protocol attack if the REPL server is listening on a loopback device or private network. Running a multi-threaded guile application can cause directories or files to be created with world readable/writable/executable permissions during a small window which leads to information disclosure. |
java-1.8.0-openjdk: multiple vulnerabilities
| Package(s): | java-1.8.0-openjdk | CVE #(s): | CVE-2016-5542 CVE-2016-5554 CVE-2016-5573 CVE-2016-5582 CVE-2016-5597 |
| Created: | October 19, 2016 | Updated: | January 16, 2017 |
| Description: | From the Red Hat advisory: * It was discovered that the Hotspot component of OpenJDK did not properly check arguments of the System.arraycopy() function in certain cases. An untrusted Java application or applet could use this flaw to corrupt virtual machine's memory and completely bypass Java sandbox restrictions. (CVE-2016-5582) * It was discovered that the Hotspot component of OpenJDK did not properly check received Java Debug Wire Protocol (JDWP) packets. An attacker could possibly use this flaw to send debugging commands to a Java program running with debugging enabled if they could make victim's browser send HTTP requests to the JDWP port of the debugged application. (CVE-2016-5573) * It was discovered that the Libraries component of OpenJDK did not restrict the set of algorithms used for Jar integrity verification. This flaw could allow an attacker to modify content of the Jar file that used weak signing key or hash algorithm. (CVE-2016-5542) Note: After this update, MD2 hash algorithm and RSA keys with less than 1024 bits are no longer allowed to be used for Jar integrity verification by default. MD5 hash algorithm is expected to be disabled by default in the future updates. A newly introduced security property jdk.jar.disabledAlgorithms can be used to control the set of disabled algorithms. * A flaw was found in the way the JMX component of OpenJDK handled classloaders. An untrusted Java application or applet could use this flaw to bypass certain Java sandbox restrictions. (CVE-2016-5554) * A flaw was found in the way the Networking component of OpenJDK handled HTTP proxy authentication. A Java application could possibly expose HTTPS server authentication credentials via a plain text network connection to an HTTP proxy if proxy asked for authentication. (CVE-2016-5597) Note: After this update, Basic HTTP proxy authentication can no longer be used when tunneling HTTPS connection through an HTTP proxy. Newly introduced system properties jdk.http.auth.proxying.disabledSchemes and jdk.http.auth.tunneling.disabledSchemes can be used to control which authentication schemes can be requested by an HTTP proxy when proxying HTTP and HTTPS connections respectively. Note: If the web browser plug-in provided by the icedtea-web package was installed, the issues exposed via Java applets could have been exploited without user interaction if a user visited a malicious website. |
libarchive: three vulnerabilities
| Package(s): | libarchive | CVE #(s): | CVE-2016-8687 CVE-2016-8688 CVE-2016-8689 |
| Created: | October 18, 2016 | Updated: | December 12, 2016 |
| Description: | From the Debian LTS advisory: |
Agostino Sarubbo of Gentoo discovered several security vulnerabilities in libarchive, a multi-format archive and compression library. An attacker could take advantage of these flaws to cause a buffer overflow or an out of bounds read using a carefully crafted input file.
CVE-2016-8687: Agostino Sarubbo of Gentoo discovered a possible stack-based buffer overflow when printing a filename in bsdtar_expand_char() of util.c.
CVE-2016-8688: Agostino Sarubbo of Gentoo discovered a possible out of bounds read when parsing multiple long lines in bid_entry() and detect_form() of archive_read_support_format_mtree.c.
CVE-2016-8689: Agostino Sarubbo of Gentoo discovered a possible heap-based buffer overflow when reading corrupted 7z files in read_Header() of archive_read_support_format_7zip.c.
libass: three vulnerabilities
| Package(s): | libass | CVE #(s): | CVE-2016-7972 CVE-2016-7970 CVE-2016-7969 |
| Created: | October 13, 2016 | Updated: | February 21, 2017 |
| Description: | From the Mageia advisory: |
Amount of memory allocated during memory reallocation in the shaper wasn't tracked, possibly resulting in undefined behavior (CVE-2016-7972).
Illegal read in Gaussian blur coefficient calculations (CVE-2016-7970).
Mode 0/3 line wrapping equalization in specific cases could result in illegal reads while laying out and shaping text (CVE-2016-7969).
libgd2: two vulnerabilities
| Package(s): | libgd2 | CVE #(s): | CVE-2016-6911 CVE-2016-8670 |
| Created: | October 19, 2016 | Updated: | December 23, 2016 |
| Description: | From the Debian LTS advisory: |
CVE-2016-6911: invalid read in gdImageCreateFromTiffPtr() (most of the code is not present in the Wheezy version)
CVE-2016-8670: Stack Buffer Overflow in GD dynamicGetbuf
libgit2: two vulnerabilities
| Package(s): | libgit2 | CVE #(s): | CVE-2016-8568 CVE-2016-8569 |
| Created: | October 19, 2016 | Updated: | January 19, 2017 |
| Description: | From the Red Hat bugzilla: |
CVE-2016-8568:
* Read out-of-bounds in git_oid_nfmt:
CVE-2016-8569:
* DoS using a null pointer dereference in git_commit_message:
Proposed patch:
mpg123: denial of service
| Package(s): | mpg123 | CVE #(s): | CVE-2016-1000247 |
| Created: | October 17, 2016 | Updated: | October 26, 2016 |
| Description: | From the Debian LTS advisory: |
Jerold Hoong discovered a flaw in the id3 tag processing code of libmpg123. A specially crafted mp3 input file could be used to cause a buffer over-read, resulting in a denial of service.
qemu: three vulnerabilities
| Package(s): | qemu | CVE #(s): | CVE-2016-7466 CVE-2016-8576 CVE-2016-7995 |
| Created: | October 19, 2016 | Updated: | October 26, 2016 |
| Description: | From the Red Hat bugzilla: |
CVE-2016-7466: Quick Emulator(Qemu) built with the USB xHCI controller emulation support is vulnerable to a memory leakage issue. It could occur while doing a USB device unplug operation; Doing so repeatedly would result in leaking host memory, affecting other services on the host. A privileged user inside guest could use this flaw to cause a DoS on the host and/or potentially crash the Qemu process instance on the host.
CVE-2016-8576: Quick Emulator(Qemu) built with the USB xHCI controller emulation support is vulnerable to an infinite loop issue. It could occur while processing USB command ring in 'xhci_ring_fetch'. A privileged user/process inside guest could use this issue to crash the Qemu process on the host leading to DoS.
CVE-2016-7995: Qemu emulator(Qemu) built with the USB EHCI emulation support is vulnerable to a memory leakage flaw. It could occur while processing isochronous transfer descriptors(iTD), with buffer page select(PG) index that falls beyond buffer page array area. A privileged user inside guest could use this flaw to leak Qemu memory bytes leading to a DoS on the host.
quagga: stack overrun
| Package(s): | quagga | CVE #(s): | CVE-2016-1245 |
| Created: | October 18, 2016 | Updated: | November 14, 2016 |
| Description: | From the Debian LTS advisory: |
It was discovered that there was a stack overrun in the IPv6 RA receive code in quagga, a BGP/OSPF/RIP routing daemon.
ruby: encrypted ciphertext duplication
| Package(s): | ruby | CVE #(s): | CVE-2016-7798 |
| Created: | October 13, 2016 | Updated: | October 19, 2016 |
| Description: | From the Mageia advisory: |
A bug in the openssl module caused an all-zero IV to be used for AES-GCM ciphers in some cases (when setting a key, an IV, and then setting a key again).
tiff: denial of service
| Package(s): | tiff | CVE #(s): | CVE-2016-3622 |
| Created: | October 13, 2016 | Updated: | October 19, 2016 |
| Description: | From the CVE entry: |
The fpAcc function in tif_predict.c in the tiff2rgba tool in LibTIFF 4.0.6 and earlier allows remote attackers to cause a denial of service (divide-by-zero error) via a crafted TIFF image.
tor: denial of service
| Package(s): | tor | CVE #(s): | CVE-2016-8860 |
| Created: | October 19, 2016 | Updated: | December 26, 2016 |
| Description: | From the Debian advisory: |
It has been discovered that Tor treats the contents of some buffer chunks as if they were a NUL-terminated string. This issue could enable a remote attacker to crash a Tor client, hidden service, relay, or authority. CVE assignment email.
xen: information leak/corruption
| Package(s): | xen | CVE #(s): | CVE-2016-7777 |
| Created: | October 14, 2016 | Updated: | November 3, 2016 |
| Description: | From the CVE entry: |
Xen 4.7.x and earlier does not properly honor CR0.TS and CR0.EM, which allows local x86 HVM guest OS users to read or modify FPU, MMX, or XMM register state information belonging to arbitrary tasks on the guest by modifying an instruction while the hypervisor is preparing to emulate it.
Page editor: Jake Edge
Kernel development
Brief items
Kernel release status
The current development kernel is 4.9-rc1, released on October 15, one day earlier than some might have expected. "My own favorite 'small detail under the hood' happens to be Andy Lutomirski's new virtually mapped kernel stack allocations. They make it easier to find and recover from stack overflows, but the effort also cleaned up some code, and added a kernel stack mapping cache to avoid any performance downsides." The virtually mapped kernel stack work was covered here in June.
Stable updates: 4.8.2, 4.7.8, and 4.4.25 were released on October 16. The relatively small 4.8.3, 4.7.9, and 4.4.26 updates are in the review process as of this writing; they can be expected on or after October 21.
Quotes of the week
< XFS has gained super CoW powers! >
 ----------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Kernel development news
The end of the 4.9 merge window
By the time that Linus released 4.9-rc1 and closed the merge window for the 4.9 development cycle, 14,308 non-merge changesets had found their way into the mainline repository. As expected, this cycle has already broken the previous record for the busiest cycle ever, and it still has a while to go. 820 of those changesets were merged after last week's summary was written. Some of the more interesting changes found in this last set include:
- The XFS filesystem has gained support for shared extents — ranges of
file data that can be shared between multiple owners — and a
copy-on-write mechanism to manage modifications to those extents.
That, in turn, allows XFS to support copy_file_range() along with
other nice features like data deduplication.
- The NFS server now supports the NFSv4.2 COPY operation, allowing file
data to be copied without traveling to the client and back.
- The watchdog subsystem has a new "pretimeout" mechanism to allow the
system to respond just prior to the expiration of a timer. Two new
"governors" are provided; one simply prints a log message, while the
other will panic the system in the hope of generating more useful
information for debugging the problem.
- A set of EXPORT_SYMBOL()
improvements has been merged. It is now possible to place export
directives into assembly code, and the handling of exported symbols in
library objects has been improved. One immediate practical result is
that it is now possible to place all EXPORT_SYMBOL()
directives next to the definition of the symbol that is being
exported. At the moment, checksums (for use with
CONFIG_MODVERSIONS) for assembly symbols are not generated;
that should be fixed in the near future.
- The build system can now use "thin archives" for the creation of
intermediate objects, rather than linking them with
ld -r. A thin archive contains symbol information, but
simply points to the component object files rather than making
copies. The main purpose here seems to be to make the PowerPC build
work more smoothly; see this
commit for some more information.
- The build system can also perform dead code and data elimination.
This option is potentially hazardous, since, without some extra
effort, the linker may see some needed code as being dead, but it can
also reduce the resulting image size considerably.
- There is a new GCC plugin called "latent_entropy", which comes
from the grsecurity/PaX patch set. It will instrument
the kernel in an attempt to collect randomness, especially during the
early bootstrap process.
- New hardware support includes: Loongson 1C processors, Freescale "data path acceleration architecture" hardware buffer and queue-management subsystems, and Imagination Technologies ASCII LCD displays.
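For readers who want to try the copy_file_range() call mentioned in the XFS item above, here is a minimal Python sketch. It assumes Python 3.8 or later on Linux, where the os module wraps the system call; the copy_file() and demo() helpers are hypothetical names invented for this illustration, and the fallback to an ordinary copy is an assumption about what a cautious program might do on systems where the call is unavailable. Whether extents end up shared or the data is physically copied is the filesystem's decision.

```python
import os
import shutil
import tempfile


def copy_file(src_path, dst_path):
    """Copy src_path to dst_path and return the number of bytes copied."""
    size = os.stat(src_path).st_size
    try:
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            remaining = size
            while remaining > 0:
                # The kernel may copy fewer bytes than requested, so loop.
                copied = os.copy_file_range(src.fileno(), dst.fileno(), remaining)
                if copied == 0:  # source ended earlier than expected
                    break
                remaining -= copied
        return size - remaining
    except (AttributeError, OSError):
        # Assumed fallback: no copy_file_range() here, so copy the plain way.
        shutil.copyfile(src_path, dst_path)
        return os.stat(dst_path).st_size


def demo():
    """Copy a scratch file and report whether the copy matches the original."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        data = os.urandom(1 << 16)
        with open(src, "wb") as f:
            f.write(data)
        copied = copy_file(src, dst)
        with open(dst, "rb") as f:
            return copied == len(data) and f.read() == data
```

Calling demo() copies 64KB of random data within a temporary directory and returns True if the destination matches the source.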
At this point the feature work is done; all that remains is to stabilize all that new code for the final 4.9 release. If all goes according to the usual schedule, that release can be expected on December 4 or 11.
Rethinking device memory allocation
James Jones started his 2016 X.Org Developers Conference (XDC) talk by saying that he would like to make some real progress at the conference on creating a user-space API for allocating memory that is also accessible by various devices. His talk on day one of the conference set the stage for a meeting of interested developers on day two. By day three, he reported back in a lightning talk on the progress made.
Jones has worked at NVIDIA on window system integration over the last decade or so, which originally meant X11, but now also includes other window systems. There are some existing solutions for memory allocation, but NVIDIA noticed some drawbacks to them when it tried to make them work with its drivers. So the company proposed EGLStream as a solution, which was "not so well-received so far", but it did help identify the problems that need to be solved.
That proposed patch added EGLStream to the Weston compositor, but it launched a discussion of Generic Buffer Management (GBM), which Weston already uses for memory allocation, versus EGLStream. Many strong views were expressed in that discussion; there has already been considerable investment in the existing APIs, both by Mesa and Wayland developers as well as by NVIDIA, so it is not surprising that there were differences of opinion. But it was nice to have a civil discussion about the memory allocation issue, he said, and many areas for improvement were identified. The discussion has died down and it was suggested that XDC would be a good venue to make some progress on the issue.
The problem is how to allocate device-accessible surfaces (memory buffers for various kinds of graphics and video data) from user space. The devices are things like GPUs, scanout engines, and video encoders and decoders. The surfaces allocated are for textures, images, and such; there is a need for some kind of handle for the surfaces that can be securely passed between user-space processes. In addition, a way to manage the surface state (e.g. format, color parameters, compression) and its layout in memory needs to be part of the API. In order to use these buffers in different parts of the system, some kind of synchronization mechanism is required. The latter is not directly related to the allocation problem, but is something that needs to be kept in mind, he said.
His goal is to get a consensus-based forward-looking API for surface allocation, but he has "no idea" what that API will be, at least not yet. It should be agnostic with regard to window systems, kernels, and graphics vendors. So it will be able to be used for window systems like Wayland and others, by old and new Linux kernels, and by other kernels beyond Linux, as long as they are POSIX-like. It would have a "minimal but optimal driver interface" that would still be able to use "100% of the GPU's capabilities". While not directly related to surface allocation, the "final destination", he said, is to have "a completely optimized scene graph" for Weston and other scene-graph compositors.
Prior art
Jones then went into a review of the existing solutions to this problem, with their pros and cons—starting with GBM. At the basic level, GBM has the ability to allocate surfaces and to arbitrate the uses of a surface with a set of flags. It also provides handles to those surfaces. It is incorporated into many code bases at this point, so it is widely deployed and well tested. It has a pretty minimal API and fairly small implementation.
But GBM does have some shortcomings. The handles are process-local; there are ways to import handles from elsewhere, but not to export them to other processes using the API. It is focused on GPU operations (texturing, rendering, and display), so there is no way to specify that a surface would be used for rendering and passed to a video encoder, for example. Related to that is that the arbitration for the capabilities needed by a surface is done only in the scope of a single device, so you can't use the API to specify surfaces that will be used with multiple devices.
The Chrome OS Freon project attempted to add surface state management capabilities on top of GBM. There was a lot of discussion between vendors, but no consensus was reached on an optimal design, so something "not ideal" was settled on. The main point of contention was the level of abstraction in describing the transitions between various uses of a surface.
Android's Gralloc has a similar feature set to GBM. It has support for synchronization using fence file descriptors, but passing handles between processes requires other components from an Android system as there is no direct support for it in Gralloc. It has been widely deployed and is proven in the field. It also has an allocation-time usage specification that has support for non-graphics usage (such as video encoders and decoders).
Many of the shortcomings of Gralloc are similar to those of GBM as well. There is no explicit surface state management and the arbitration abilities are flag-based. It is open source, but the API is proprietary in some sense, since Google controls it.
EGLStream was developed to solve the problems he described, so it is not surprising that it provides allocation, arbitration, handles that can be shared by different processes, state management, and synchronization. NVIDIA has been shipping EGLStream for quite some time for a lot of different use cases, he said. It has been ported to all of the different operating systems that the company supports and has a comprehensive feature set.
While EGLStream is an open standard, in practice there is only a single vendor that has implemented it. It does not have cross-device support and it is EGL-based, which may complicate things by bringing OpenGL into the picture. It has been said that EGLStream does too much encapsulation and tries to do too much extra within the API. In addition, its behavior is loosely defined, or even undefined, in some cases.
The DMA-BUF allocation mechanism provides handles to memory allocations that can be shared between drivers; it supports non-graphics devices as well. But it does not have a centralized user-space allocation API, is Linux-only, and lacks any way to describe the content layout. It also only has a limited means to describe the planned usage of the memory at allocation time.
The Vulkan 3D graphics and compute API is one other thing to consider, Jones said. It provides an allocation mechanism as well as the most detailed allocation-time usage specification that he knows of. It has explicit state management and has a robust synchronization mechanism as well. Vulkan is both extensible and portable, but there is no support at this point for cross-process handles or arbitration. It is also focused only on graphics, compute, and display operations.
Path forward
Based on the prior art and the needs going forward, a set of features needed was identified and generally agreed upon. Whatever the new API is, it should be minimal—anything that is not needed should be eliminated. It should also be portable to multiple platforms and have support for non-graphics devices (e.g. rendering to a video encoder or texturing from a video decoder). It should also use the GPU optimally in the steady state when someone is not moving windows around on the screen; X11 already has this, so anything new should be at least as good.
To achieve that, he believes there is a need for something like what Vulkan has in terms of an allocation-time usage specification. So when the driver is asked for an allocation, all of the different use cases for the surface can be specified. That will allow the driver to negotiate the surface capabilities based on those use cases. During transitions (such as moving a window or going from a window to full screen), the performance still needs to be good. The idea is to allow multiple uses of the surface without having to do reallocations.
So, there are various existing APIs and a set of more-or-less consensus goals; what is the path forward? He suggested focusing on solving specific problems that occur with the existing APIs, rather than trying to pick a winner from those APIs. By solving the problems, it will become clear what the API should look like—what it is called at that point is not particularly important.
Specifically, he suggested that the focus should be on how to create a surface that is cross-driver, cross-engine, and cross-device. Historically, that has been where everything falls apart. If agreement can be reached on that, other simpler cases will just fall out naturally.
He presented a set of assumptions that he hoped would help simplify the initial discussions. To start with, those working on this problem should assume they are designing an ideal allocation API. That may not actually be the case, but it is a good way to think about it. Thinking in terms of the user-space API first, while keeping both API elegance and the capabilities of the hardware in mind, is also important.
There needs to be a standard way to describe the capabilities of different devices (for example, devices have different tiling formats, but other drivers won't know anything about some of those formats). It could be similar to the Khronos data format specification but cover other types of capabilities beyond pixel data formats.
Capabilities could then be queried from each driver, though the list could become quite large, so some filtering mechanism would be needed. There would also need to be a central authority of some sort to maintain the capability namespace. That could simply be a file in a Git repository or, perhaps, a group like Khronos—it simply needs to be authoritative. The surface allocation layer would collect up and intersect the capabilities of all of the different drivers.
There is a question of how to filter these capabilities. The API could provide a way to describe the desired usage of the surface, including things like its format, dimensions, and the operations that will be performed using it. The Khronos data format could again be used as a model for how to describe this information. Some types of data have obvious representations (e.g. width/height) and others can be indicated using Boolean flags like those in Gralloc. But there would also be capabilities that are driver-specific, so drivers would have to ignore ones that are targeted at other devices.
Once the capabilities that are not supported by all of the involved drivers have been eliminated, there needs to be a way to choose the optimal remaining choice. Sorting the remainder depends on the implementation and usage, so it cannot be done by the common framework. His straw-man proposal was to let the application decide once the list has been narrowed down.
After the surface has been allocated, its chosen properties must also be described. That could perhaps use the same data format as the capability information, but it must be communicated to the requester in some fashion.
He finished the presentation by noting that all of what had been discussed thus far concerned the image-level capabilities for the requested memory. But there are also some memory-level capabilities that may come into play, notably whether the memory must be physically contiguous. He thought that the image capability concept could be generalized to cover the memory-level requirements as well. Extensibility to allow for tiling layouts or hardware compression of surfaces, for example, would also be important.
Results
In Jones's lightning report of the meetings held on day two, he indicated that some good progress had been made; agreement had been reached on some key points. An allocation request will contain some basic properties like width, height, and format (others will be available via an extension mechanism) along with a list of usage descriptions (e.g. render target, video encoder input).
The arbitration of the properties is based on intersected sets of supported capabilities along with sets of constraints that get combined together (e.g. a certain stride might constrain the alignment differently between devices). The exact merging of the constraints may not exactly be the union of them, but the merging algorithm will be baked into the library, he said. There will be a set of common capabilities, but some can be vendor-specific; constraint definitions will be shared.
The capability sets will be reported back to the application, which can serialize them to pass to other processes to allow for incremental refinement. Processes could ask that the list be filtered for specific uses to help winnow down the choices. Once that is done, the sorting is handled by the drivers and the allocation takes place once a single capability set has been chosen. This API will be exposed via a library that has user-space driver/vendor back-ends.
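As a rough illustration of the arbitration step described above, the following Python sketch intersects per-driver capability sets and merges one kind of constraint. Everything in it is hypothetical: the capability names, the function names, and the use of a least common multiple to merge alignment constraints are assumptions made for the example. The group had not settled on an actual API, and Jones noted that the real merging algorithm would live inside the library.

```python
from math import gcd


def intersect_capabilities(per_driver_caps):
    """Return the capabilities that every driver reports (hypothetical model)."""
    sets = [set(caps) for caps in per_driver_caps]
    if not sets:
        return set()
    return set.intersection(*sets)


def merge_alignment(alignments):
    """Merge byte-alignment constraints by taking their least common multiple.

    This is one plausible rule, assumed here purely for illustration.
    """
    merged = 1
    for alignment in alignments:
        merged = merged * alignment // gcd(merged, alignment)
    return merged


# Hypothetical capability names reported by two drivers for the same usage.
gpu_caps = {"linear", "tiled-a", "compressed"}
encoder_caps = {"linear", "tiled-a"}

common = intersect_capabilities([gpu_caps, encoder_caps])
pitch_alignment = merge_alignment([64, 256])
```

In this toy model the "compressed" layout drops out because the encoder does not report it, and the merged alignment satisfies both devices. Choosing among the surviving capabilities is the sorting problem that, per the talk, remained unresolved.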
There are still plenty of things to be resolved, particularly how sorting the capabilities is actually done. There was a lot of discussion of how that might be handled, but no conclusion was reached. In addition, the application may need to be able to tell the hardware when the surface is only being used as one of the use cases and when it transitions to one of the others, but how to do that has not been determined.
How to specify format types is another unresolved piece and they did not discuss the type of handle that would be used for an allocated surface. There is a question whether devices will be enumerable using the API. Also, which kernel interface would be used for allocation has not been resolved. Essentially, Jones said, it has reached a point where folks need to go off and start doing some research and trying things out before further progress can be made.
For more information, Jones's PDF slides from the talk are available, as are YouTube videos of his talk and lightning-talk report. His notes from the meetings are also available. He posted an update and pointer to his GitHub repository on the dri-devel mailing list on October 4.
[I would like to thank the X.Org Foundation for sponsoring my travel to Helsinki for XDC.]
Linux drivers in user space — a survey
Writing device drivers in user space, rather than as kernel modules, is a topic that comes up from time to time for a variety of reasons. The kernel's approach to user-space drivers varies considerably depending on the type of device involved. The recent posting of a patch set aimed at allowing LED drivers to be written as user-space programs seems like a suitable opportunity to have a look at the range of options currently available.
For it to be possible to write a device driver in user space it is necessary for the kernel to export the required interfaces. The kernel can export two different sorts of interfaces, meeting different needs; I will call them "upstream" and "downstream" interfaces.
When one reflects on the tree-like nature of the driver model, as described in an earlier article, it is clear that there is a chain, or path, of drivers from the root out to the leaves, each making use of services provided by the driver closer to the root (or "upstream") and providing services to the driver closer to the leaf (or "downstream"). An upstream interface allows a user-space program to directly access services provided by the kernel that normally are only accessed by other kernel drivers. A downstream interface allows a user-space program to instantiate a new device for some specific kernel driver, and then provide services to it that would normally be provided by some other kernel driver.
Upstream interfaces
An upstream interface is one that provides access to some hardware, possibly more directly than with the standard interfaces. In several cases this is provided not with a new interface but with a slight modification to an existing interface. Opening a block device with the O_DIRECT flag allows directly reading from and writing to that device without involving the page cache or the readahead and write-behind that it supports. Similarly, direct access to a serial port is obtained by opening a TTY device and disabling certain termios settings such as ECHO and ICANON. The documentation for cfmakeraw() identifies 16 such flags that are cleared.
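The effect of raw mode on a TTY can be seen with a few lines of Python. This is only a sketch: it uses a pseudo-terminal as a stand-in for a real serial port so that it runs without hardware, and it relies on Python's tty.setraw(), which clears a set of termios flags similar to (but not guaranteed identical to) those cleared by cfmakeraw(). The local_flags() and demo() helpers are names invented for the example.

```python
import os
import termios
import tty

LFLAG = 3  # index of the local-modes field in the list tcgetattr() returns


def local_flags(fd):
    """Return the c_lflag word (ECHO, ICANON, ...) for a terminal."""
    return termios.tcgetattr(fd)[LFLAG]


def demo():
    """Put a scratch terminal into raw mode; return its lflags before and after."""
    # A pseudo-terminal stands in for a serial device such as /dev/ttyS0.
    master, slave = os.openpty()
    try:
        before = local_flags(slave)
        tty.setraw(slave)  # clears ECHO, ICANON, and other processing flags
        after = local_flags(slave)
        return before, after
    finally:
        os.close(master)
        os.close(slave)
```

After the call, neither ECHO nor ICANON is set in the second value returned by demo(), so bytes written to the device are no longer echoed or held back until a newline arrives.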
Direct access to a network device can be achieved by creating a network socket using the AF_PACKET address family and specifying the SOCK_RAW communication type. This socket can then be bound to a particular interface or a particular Ethernet protocol type. A slightly less direct interface can be had by using SOCK_RAW with AF_INET. This still provides the routing and other functionality common to all IP protocols, but gives complete control over the payload of each IP packet.
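A minimal sketch of the AF_PACKET approach in Python follows. Opening such a socket normally requires root or CAP_NET_RAW, so the example separates that step from a pure function that decodes the 14-byte Ethernet header of a received frame. The function names are hypothetical; ETH_P_ALL is the value defined in the kernel's if_ether.h header.

```python
import socket
import struct

ETH_P_ALL = 0x0003  # from <linux/if_ether.h>: deliver frames of every protocol


def open_packet_socket(ifname):
    """Open a raw packet socket bound to one interface (needs CAP_NET_RAW)."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETH_P_ALL))
    sock.bind((ifname, 0))
    return sock


def format_mac(raw):
    """Render six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_ethernet(frame):
    """Split a raw frame into its Ethernet header fields and payload."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": format_mac(dst),
        "src": format_mac(src),
        "ethertype": ethertype,
        "payload": frame[14:],
    }
```

With sufficient privilege, something like parse_ethernet(open_packet_socket("eth0").recv(65535)) would decode the next frame seen on that interface; the interface name is, of course, an assumption about the local system.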
Moving on to more purpose-built interfaces, the sg and bsg drivers (SCSI generic and block SCSI generic) both provide direct access to SCSI devices, or other devices such as SATA that use a compatible protocol. They allow SCSI command descriptor blocks (CDBs) to be sent to devices and to have results returned. The bsg interface is integrated with the block layer and supports a newer version of the sg interface that includes support for bidirectional commands. libsgutils is the recommended mechanism for making use of these interfaces, rather than working directly with /dev/sgN. Similarly, libusb provides a direct interface to USB devices, allowing arbitrary USB commands to be sent to any connected USB device.
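To give a sense of what a CDB looks like, this small Python sketch builds the six-byte block for a SCSI INQUIRY command. It is an illustration only: the layout follows the standard INQUIRY format as commonly documented, the inquiry_cdb() helper is a hypothetical name, and actually submitting the block to a device (via the SG_IO ioctl() or, as recommended above, libsgutils) is deliberately left out.

```python
import struct

INQUIRY = 0x12  # SCSI operation code for the INQUIRY command


def inquiry_cdb(alloc_len=96, evpd=False, page=0):
    """Build a 6-byte INQUIRY command descriptor block.

    Fields: opcode, EVPD bit, page code, 16-bit big-endian allocation
    length, and a control byte (left as zero).
    """
    return struct.pack(">BBBHB", INQUIRY, 1 if evpd else 0, page, alloc_len, 0)
```

The result of inquiry_cdb() is the byte string that the sg or bsg interface would pass, unmodified, to the target device.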
I2C and SPI — 2-wire and 4-wire buses for communicating between integrated circuits on the same board — can be directly accessed via special-purpose character devices. For I2C, the i2c-tools package provides a scriptable interface. For SPI there do not appear to be any packaged solutions, though the armbedded.eu web site provides some code that would be worth trying for anyone who is interested.
All the interfaces listed so far are always available, to sufficiently privileged processes, if the kernel knows about the target device at all. Other interfaces require the kernel to be explicitly instructed to export a low-level interface. In the case of GPIOs (general-purpose I/O pins) and power regulators, this is as simple as adding some directives to the device-tree description of the hardware. The devices then appear in sysfs complete with attribute files allowing relevant settings to be changed and values to be read.
Finally, and requiring even more in-kernel support, is the UIO framework, which is intended for devices that are accessed through memory-mapped device registers, as is the norm for devices attached to PCI and similar buses. A simple in-kernel device driver can be written using the UIO framework that allows a user-space program to map that register bank into its own memory, and also to respond to interrupts from the device. This does not provide generic access to any PCI device, but does make it easy to get user-space access to a particular device of interest, so that the bulk of the driver can be developed, debugged, and maintained outside of the kernel.
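The user-space half of a UIO driver can be quite small. The Python sketch below assumes the conventions described in the kernel's UIO documentation: a read() on the device node blocks until an interrupt arrives and returns a 32-bit interrupt count, and memory region N is mapped by passing an offset of N pages to mmap(). The function names are hypothetical, and demo() uses a pipe as a stand-in for /dev/uioN so that the code can be run without any hardware.

```python
import mmap
import os
import struct


def wait_for_interrupt(fd):
    """Block until the device interrupts; return the total interrupt count."""
    data = os.read(fd, 4)
    (count,) = struct.unpack("=i", data)
    return count


def map_registers(fd, length, index=0):
    """Map UIO memory region `index` into this process's address space."""
    return mmap.mmap(fd, length, mmap.MAP_SHARED,
                     mmap.PROT_READ | mmap.PROT_WRITE,
                     offset=index * mmap.PAGESIZE)


def demo():
    """Exercise wait_for_interrupt() with a pipe standing in for /dev/uioN."""
    reader, writer = os.pipe()
    try:
        os.write(writer, struct.pack("=i", 3))  # pretend three interrupts fired
        return wait_for_interrupt(reader)
    finally:
        os.close(reader)
        os.close(writer)
```

A real driver would open the device node with os.open(), call map_registers() once, and then loop on wait_for_interrupt(), reading and writing device registers through the returned mapping.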
This variety of different interface styles could be seen as a hodge-podge that is just crying out to be unified. On the other hand, different sorts of devices really are different and need different sorts of interfaces. Part of the role of an operating system like Linux is to hide as much of that difference as possible behind uniform abstractions. It should not be surprising that, if we want to bypass those abstractions and access the devices directly, we will be confronted by the variety that Linux generally tries to hide.
Downstream interfaces
Where upstream interfaces provide direct access to hardware, downstream interfaces allow a program to emulate some hardware and so provide access to other programs that expect to use a particular sort of interface. Rather than just providing a different sort of access to an already existing device, a downstream interface must make it possible to create a new device, configure it, and then provide whatever functionality is expected of that device type.
Probably the first driving force for these downstream interfaces was the introduction of networking and the consequent desire to allow a program on one computer to work with a device on another computer. With this came pseudo TTYs (PTYs), which are likely the oldest downstream interface in Unix. They allow a TTY to be created on which a user can log in and run programs that don't need to be aware that they are not attached to a physical terminal. The text entered can easily come from anywhere on the network, and the output generated can go back to the same place (or elsewhere).
The desire for network access to storage brought about such things as nbd, the network block device, and NFS, the network file system. Their design differs from that of PTYs in that they don't just provide an interface to user space that a network service could use but, instead, create the network connection themselves and define a protocol to carry the data and control over that connection. The most likely reason for this is that managing a storage service in a user-space program is prone to deadlocks. If the program ever needs to allocate memory, the kernel might choose to free up memory by writing out to a storage device, and if that device is managed by the program allocating memory it could easily deadlock. It is much safer to bypass user space and send directly to the network.
These network protocols can still serve as downstream interfaces in that they make it possible to instantiate a block device (with nbd) or a filesystem (with NFS) and provide services to it. This has been used to good effect with automounting programs such as amd (subsequently renamed to am-utils) that present as an NFS filesystem that contains only directories and symlinks (thus avoiding any deadlock issues) and transparently mounts filesystems when they are first accessed.
Though using NFS for this purpose is quite effective, it is not perfect; due to the limited possible interactions with the Linux virtual filesystem layer, filesystems must be mounted somewhere else and the NFS filesystem only contains a symbolic link to the real mount point. To address this shortcoming, Linux provides a dedicated downstream interface for creating filesystems, autofs, which supports the extra interactions required to automount filesystems directly onto directories.
Similarly there is a downstream interface for writing filesystems that is careful about how it interfaces with the page cache, and manages to avoid the writeback deadlocks described above: FUSE.
As part of FUSE there is CUSE, which allows character devices to be implemented in user space. There does not appear to be a corresponding "BUSE" for implementing block devices in user space, though some years ago there was a proposal for "ABUSE" which aimed to do just that. Block devices can be implemented in user space on a remote machine using nbd and presumably that is sufficient to meet most needs.
Networking plays a role in the next pair of examples too; the TUN and TAP drivers allow network devices to be emulated. TAP sends and receives Ethernet frames, so any networking protocol can be used with a TAP device. TUN works at the IP level, which is simpler and often sufficient provided there is no need to handle non-IP protocols such as ARP. These can most obviously be used for tunneling and creating virtual private networks (VPNs) but could also be used for user-space monitoring and filtering of network traffic.
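Creating such a device from user space comes down to opening /dev/net/tun and issuing one ioctl(). The Python sketch below shows the idea; the constants are the usual values from linux/if_tun.h, the 40-byte request layout assumes a 64-bit Linux system, and the interface name "tun0" and the function names are arbitrary choices for illustration.

```python
import fcntl
import os
import struct

TUNSETIFF = 0x400454CA   # _IOW('T', 202, int)
IFF_TUN = 0x0001         # IP-level device
IFF_TAP = 0x0002         # Ethernet-level device
IFF_NO_PI = 0x1000       # do not prepend packet-information bytes


def build_ifreq(name, flags):
    """Pack an interface name and flags in the layout of struct ifreq
    (16-byte name, 16-bit flags, padding up to 40 bytes)."""
    return struct.pack("16sH22x", name.encode("ascii"), flags)


def open_tunnel(name="tun0", tap=False):
    """Create a TUN (or TAP) device and return its file descriptor.
    Requires suitable privilege; packets are then read() and written
    on the returned descriptor."""
    flags = (IFF_TAP if tap else IFF_TUN) | IFF_NO_PI
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name, flags))
    except OSError:
        os.close(fd)
        raise
    return fd
```

With tap=True each read() returns a whole Ethernet frame; otherwise it returns a bare IP packet.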
Network devices, block devices and character devices (which include TTYs) cover all the device types that Unix supported before Linux came along. Linux has added a variety of new device types, some of which can be implemented in user space.
The input subsystem provides a standard interface for input devices such as keyboards, mice, joysticks, touch pads, and similar devices. These are exposed to user space as character devices, so it might be possible to emulate them using CUSE, but it is more convenient if they are integrated with the rest of the input subsystem, and that is what uinput allows. If a program opens /dev/uinput and issues some ioctl() commands, a new input device is created. Events will be reported on that device when they are written to the file descriptor opened on /dev/uinput.
User-space LEDs — how and why
The latest addition to the collection of downstream interfaces is conceptually similar to uinput but it allows the emulation of LED devices rather than input devices. To support this functionality, it introduces a new device called /dev/uleds. Opening this device and writing the name of a new device (zero-padded to 64 bytes) will create an LED device with the given name.
There is no option to configure any other aspects of the LED, but there is not much that could be configured anyway. An LED can generally indicate the number of brightness levels that it can support; LEDs created with uleds always support 256 brightness levels. Whenever the brightness is changed, a byte can be read which reports the new level. An LED can also indicate that it knows how to blink so that, when needed, it can be given a single "blink" request rather than periodic "on" and "off" requests. A uleds device cannot be used to experiment with this functionality, but it could undoubtedly be added later using an ioctl() if a need were found.
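Based purely on the description above (a 64-byte zero-padded name is written, then one byte is read per brightness change), a user-space LED might be driven by something like the following Python sketch. The details are assumptions: the interface could change before or after it is merged, the LED name is invented, and requiring at least one trailing zero byte is a cautious guess rather than a documented rule.

```python
import os

LED_NAME_SIZE = 64  # record size as described in the text


def uleds_name_record(name):
    """Return the LED name zero-padded to LED_NAME_SIZE bytes."""
    raw = name.encode("ascii")
    if len(raw) >= LED_NAME_SIZE:
        # Keep room for at least one terminating zero byte (an assumption).
        raise ValueError("LED name too long")
    return raw.ljust(LED_NAME_SIZE, b"\0")


def run_led(name, on_change, path="/dev/uleds"):
    """Create the LED, then call on_change(level) for each new brightness.
    Needs a kernel that provides /dev/uleds."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.write(fd, uleds_name_record(name))
        while True:
            data = os.read(fd, 1)      # blocks until the brightness changes
            if not data:
                break
            on_change(data[0])         # 0..255
    finally:
        os.close(fd)
```

A call such as run_led("demo:green:status", print) would print each new level; an emulator could just as easily redraw a widget instead.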
The particular need that is driving the development of this interface by David Lechner is the desire to make two embedded systems compatible with one another. "I would like to make a userspace program that works the same on both devices." If that program accesses an LED device directly, the device must appear to be present on both systems; where it isn't physically present it can now be emulated, possibly using a widget on a graphic display.
At much the same time Marcel Holtmann had been working on a similar interface to allow the testing of LED triggers from the Bluetooth subsystem. Various subsystems can be connected to an LED, using a trigger, to signal the current state of that subsystem. Without an LED device, it is hard to test those triggers. With the ability to emulate an LED device, that impediment to development need no longer exist.
The 4th revision of the user-space LEDs patch set was posted in mid-September and appears to have addressed all the issues that reviewers found. We can expect the code to land in mainline for Linux 4.10. It seems unlikely that this will be the last device type that someone will want to emulate. Some devices, such as power regulators, seem so intimately related to hardware that it is hard to imagine an emulator ever being wanted. Others, like maybe a GPIO, might usefully be provided with a downstream interface for emulation. Whether there turns out to be a genuine need for that is something we will have to wait to see.
Page editor: Jonathan Corbet
Distributions
Browserified JavaScript in Debian
A disagreement about the status of certain arguably non-free JavaScript components and their inclusion into the "main" Debian archives (as opposed to "contrib" or "non-free") is interesting in its own right, but it also raises some governance questions for the distribution. Debian is known for its strict adherence to the Debian Free Software Guidelines (DFSG), so it won't be a surprise that the question about the JavaScript components revolves around their fitness under those guidelines. The other piece, though, goes even deeper, perhaps, as it centers on the role and powers of the Technical Committee (TC) as laid out in the Debian Constitution.
JavaScript is increasingly used in a variety of different projects and there is often some kind of processing done on the raw JavaScript code for various purposes. "Minified" JavaScript gets obfuscated by making variable and function names as small as possible and removing extra white space, which is largely done to reduce its size for sending over the network, but has the effect of making it unusable as source code. Other projects use "browserified" JavaScript, which collects up several different modules into one file—sometimes performing other transformations along the way. And, of course, some projects use both.
Many of the projects that minify or browserify code use Grunt as a build system. But, Grunt is not packaged for Debian due to its reliance on the non-free JSHint component. That makes it problematic for Debian packages to build these files as part of the packaging procedure, so the packagers end up just including the processed files from upstream. The DFSG is quite clear, though, that Debian packages must come with source code—and it seems to be generally accepted that this processed JavaScript code does not qualify.
The issue raised its head again recently, when Pirate Praveen filed a bug requesting that the TC grant an exception for browserified JavaScript to be included in the Stretch (Debian 9) release, which is nearing its freeze date. In the bug, Praveen said that "every major web based software will have to be moved to contrib because its likely at least one of the javascript dependencies are in browserified form". He listed Diaspora, GitLab, Pagure, and Prometheus as being affected and suggested that there be an effort to get Grunt packaged for the release after Stretch so that the problem would be resolved in that release.
But, as happened with a similar bug Praveen filed back in July, the Technical Committee members are not quite sure what they are being asked to do, nor whether they actually have the power to make a ruling on the issue. In July, TC member Don Armstrong asked: "Could you clarify what the precise question is that you'd like the CTTE to answer?" In the more recent bug, TC member Tollef Fog Heen echoed that:
Joseph R. Justice, who is something of a neutral party with regard to the dispute, provided a helpful summary of some of the previous discussions of the issue. He tried to make it more clear what decision was being sought: effectively, either overriding the FTP team (which decides what can go into the main archive) with regard to Stretch or making a statement that the TC agrees with the decision and recognizes that it means users will have to enable the contrib and/or non-free archives for these types of packages.
TC member Sam Hartman responded, noting that the FTP team had not yet decided that the browserified JavaScript makes a package unqualified for the main archive, though he admitted that it was pretty clear the team would rule that way. But he also raised a constitutional question:
Beyond that, Hartman said that he wasn't comfortable making a blanket ruling about browserified JavaScript and would rather see some kind of intellectual framework that could be more widely applied, rather than make some ad hoc decision. "User convenience is something we're likely to consider, but 'source is what we need source to be so things work well for users,' is going to be a really improbable sell." But that didn't sit well with Adrian Bunk, who pointed to a bug filed against the Perl Configure script, which cannot be generated from the Debian sources. The Perl maintainer seems to have more or less routed around the DFSG question. Bunk said:
Clearly perl isn't going to be kicked out of Debian because of this, but a less important package might well be.
That is exactly the problem here - browserified javascript is not important enough, so FTP team and TC are getting away with not making a decision.
Hartman admitted that he disagreed with the Perl maintainer's decision, but that still leaves the browserified JavaScript question (and the libjs-handlebars bug that Praveen is most concerned about) unresolved. Several in the thread suggested that Praveen take the question directly to the FTP team, which he did. The response indicated that the team would not oppose the Release team granting an exception for Stretch. So Praveen, who appears to be dogged in his persistence, filed a bug requesting such an exception. The request was a bit overbroad, perhaps, but it seems likely that something will be worked out for libjs-handlebars at least.
In the thread, both Bunk and Ian Jackson expressed frustration with the seeming inability of the TC members to find a way to help resolve the situation—or at least to move it along in a positive direction. It is an issue that apparently pops up with some frequency (not necessarily with regard to browserified JavaScript, but for Perl, SQLite, and others), but never seems to get resolved. However, the TC members seem to feel that they don't really have the proper authority.
On the tech-ctte mailing list, TC member Didier Raboud asked Debian Project Secretary Kurt Roeckx for some assistance in determining what authority the TC has in overriding decisions made by teams who have been delegated their authority by the Debian Project Leader (DPL), such as the FTP and Release teams. Roeckx's answer was not entirely clear-cut, in part because the TC has broad powers, but the main takeaway is the following: "The only way I can see how to overrule a delegate of the DPL is by using a GR [General Resolution]." He did note that it is possible the TC could "overrule" a delegate by using a different power that ended up having the same effect. It is a little hard to see that ever happening without it causing some major upheaval within Debian, though.
It would seem that progress has been made here—in typical Debian style. The project has built up its processes and procedures over the years and it sometimes looks rather bureaucratic in its deliberations and decision-making, but project members mostly seem to respect and even revere how it all works. That type of organization may not be for everyone, but there are plenty of choices in the free software world for those who find Debian too constraining. As a large, and largely smoothly functioning, group, though, Debian shines in its ability to work things out without rancor for the most part.
Brief items
Distribution quotes of the week
-=-=-=-=-=- Don't Delete Anything Between These Lines =-=-=-=-=-=-=-=-
59c0e60e-94a1-11e6-8e0b-0e6d1d2d9c75
[ ] Choice 1: Repeal subsequent GR
[ ] Choice 2: Bcc all correspondence to Wikileaks
[ ] Choice 3: Run in circles, scream and shout
[ ] Choice 4: Further Discussion
-=-=-=-=-=- Don't Delete Anything Between These Lines =-=-=-=-=-=-=-=-
Ubuntu 16.10 (Yakkety Yak) released
Ubuntu 16.10 (Yakkety Yak) has been released. "Under the hood, there have been updates to many core packages, including a new 4.8-based kernel, a switch to gcc-6, and much more." The flavors Kubuntu, Lubuntu, Ubuntu GNOME, Ubuntu Kylin, Ubuntu MATE, Ubuntu Studio, and Xubuntu have also been released. Ubuntu 16.10 will be supported for 9 months.
RebeccaBlackOS Wayland Live CD release
There are new iso images available for the Wayland showcase distribution RebeccaBlackOS. "I have almost everything Wayland related on these ISOs, and also now, there are an increasing number distributions containing Wayland sessions, such as Fedora with Gnome-Shell, and KDE Neon's ISOs. My ISOs have more Wayland Desktops however, both KDE and Gnome, as well as Enlightenment, Orbital, Hawaii, Orbment and Sway. Also all Wayland enabled toolkits, Qt, GTK, EFL, SDL, glfw and FreeGLUT This has the master versions of all the Desktop Environments."
Live kernel patches for Ubuntu
Canonical has announced the availability of a live kernel patch service for the 16.04 LTS release. "It’s the best way to ensure that machines are safe at the kernel level, while guaranteeing uptime, especially for container hosts where a single machine may be running thousands of different workloads." Up to three systems can be patched for free; the service requires a fee thereafter. There is a long FAQ about the service in this blog post; it appears to be based on the mainline live-patching functionality with some Canonical add-ons.
Distribution News
Debian GNU/Linux
DebConf18 in your city! Meet end-of-year bid deadline
The Debian Project is seeking proposals for hosting DebConf18. "There are a lot of past DebConf organisers around to help, so you're not on your own. But we'll need your energy and local expertise to find a venue, accomodation, and all the other local options. Of course, we'll be able to provide guidance all along the way." Proposals are due by December 31.
Other distributions
Scientific Linux 5 Six Month Warning - SL5 End Of Life
Scientific Linux 5 will reach end-of-life on March 31, 2017. There will be no more updates, including security updates, after that date.
Newsletters and articles of interest
Distribution newsletters
- Debian Misc Developer News (October 17)
- DistroWatch Weekly, Issue 683 (October 17)
- Lunar Linux weekly news (October 14)
- openSUSE news (October 13)
- openSUSE Tumbleweed – Review of the Week (October 14)
- Ubuntu Kernel Team weekly newsletter (October 11)
- Ubuntu Weekly Newsletter, Issue 484 (October 16)
Parrot Security 3.2 “CyberSloop” Ethical Hacking OS (TechWorm)
TechWorm reviews Parrot Security 3.2. "If you are a hacker, pentester, or a security researcher, this news should interest you. The best Linux OS after Kali, Parrot Security 3.2 “CyberSloop” was released today. The developers released the second point release to the Debian-based Parrot Security 3.x GNU/Linux distribution designed for ethical hackers and security researchers." A brief release announcement is on the Parrot Blog.
Open source ResinOS adds Docker to ARM/Linux boards (HackerBoards)
HackerBoards takes a look at ResinOS. "Resin.io, the company behind the Linux/Javascript-based Resin.io IoT framework for deploying applications as Docker containers, began spinning off the Linux OS behind the framework as an open source project over a year ago. The open source ResinOS is now publicly available on its own in a stable 2.0.0-beta.1 version, letting other developers create their own Docker-based IoT networks. ResinOS can run on 20 different mostly ARM-based embedded Linux platforms including the Raspberry Pi, BeagleBone, and Odroid-C1, enabling secure rollouts of updated applications over a heterogeneous network."
Page editor: Rebecca Sobol
Development
PostgreSQL 9.6 improves synchronous replication and more
The PostgreSQL project released version 9.6 on September 29th. This new major release has an assortment of new goodies for PostgreSQL fans, including parallel query and phrase search, new options for synchronous replication, remote query execution using foreign data wrappers, "crosstab" data transformations in psql, and more. Together with version 9.6, the community released a completely rewritten version of the pgAdmin database graphical interface.
The 9.6 release is also notable because it was an on-time release, after two years of increasingly late releases by the project. Version 9.5 was released in January 2016 instead of its target of October 2015. The timeliness of the release is probably due to the project's adoption of a release management team last year.
Despite the short development window, 9.6 is full of interesting features. We'll explore multiple synchronous replicas, foreign data wrapper changes, crosstabs and the new pgAdmin here.
Multiple Synchronous Replicas and Remote Apply
PostgreSQL has had "synchronous replication" (SR) as an option since version 9.1. The "synchronous" part of that refers to the fact that transactions are committed on one or more replicas before a success message is returned to the database client. This guarantees that, if the client saw a commit message, the data is persisted to at least two nodes. Users primarily use this option when data is so valuable that they'd rather have the database reject writes than risk losing them after commit should the database cluster master fail.
Such users are also willing to put up with slower response times due to the doubled network lag. But they don't have to pay this cost with every commit: PostgreSQL allows users to choose synchronous or asynchronous replication on a per-commit basis. If only a minority of the user's data is critical, only that data needs to be copied before commit.
Synchronous replication had two major limitations that have been addressed in version 9.6. One is that SR only supported a single synchronous replica at a time; while multiple replicas could be designated as "candidate replicas," only one of them would be synchronous with the master for any individual commit. This meant that users wanting to guarantee that data was written to at least three locations — sometimes a requirement for valuable data — had no way to do so. Now they do.
In prior versions, you would designate the synchronous replicas for a master server by setting the 'synchronous_standby_names' parameter like so:
synchronous_standby_names = 'sanfran, london, singapore'
What this meant was that the master would attempt to write transactions synchronously to the replica 'sanfran'. If sanfran was offline, then it would write to london. The other replicas would maintain asynchronous replication so that they would be prepared to take over if required. For example, in that cluster you might see this:
postgres=# select application_name as server, state,
                  sync_priority as priority, sync_state
           from pg_stat_replication;
  server   |   state   | priority | sync_state
-----------+-----------+----------+------------
 sanfran   | streaming |        1 | sync
 london    | streaming |        2 | potential
 singapore | streaming |        3 | potential
In version 9.6, you can designate a number of synchronous replicas to target, like so:
synchronous_standby_names = '2 (sanfran, london, singapore)'
This means that both sanfran and london need to acknowledge transactions before they commit. We can see that there is now more than one replica in sync:
  server   |   state   | priority | sync_state
-----------+-----------+----------+------------
 sanfran   | streaming |        1 | sync
 london    | streaming |        2 | sync
 singapore | streaming |        3 | potential
If sanfran or london loses its connection to the master, then singapore becomes synchronous. If sanfran were the one to drop out, the replication view would look like this:
  server   |   state   | priority | sync_state
-----------+-----------+----------+------------
 london    | streaming |        2 | sync
 singapore | streaming |        3 | sync
If two servers in that group go offline, the master will stop accepting synchronous writes. Clearly, this is a feature intended for data valuable enough to put up with a lot of overhead to guarantee that no data will be lost.
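The priority rule behind those three views can be modeled in a few lines. The Python sketch below is only a toy model of the behavior described here, not PostgreSQL's implementation; the function names are invented, and "connected" stands for whichever replicas are currently streaming.

```python
def sync_states(num_sync, standby_names, connected):
    """Return {replica: 'sync' | 'potential'} for the connected replicas.

    standby_names is in priority order, as in synchronous_standby_names;
    the first num_sync connected replicas are treated as synchronous.
    """
    states = {}
    chosen = 0
    for name in standby_names:
        if name not in connected:
            continue  # offline replicas do not appear in the view
        if chosen < num_sync:
            states[name] = "sync"
            chosen += 1
        else:
            states[name] = "potential"
    return states


def accepts_sync_commits(num_sync, standby_names, connected):
    """True while enough replicas are connected to acknowledge a commit."""
    available = sum(1 for name in standby_names if name in connected)
    return available >= num_sync
```

For example, sync_states(2, ["sanfran", "london", "singapore"], {"london", "singapore"}) marks both remaining replicas as "sync", matching the last table above, while losing a second replica makes accepts_sync_commits() return False.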
The other limitation with SR addressed in this release has to do with the consistency of visible data. Previously, while the data was guaranteed to persist on the synchronous replica, that didn't mean that it was necessarily visible if someone queried the replica too soon. The reason that data might not be immediately visible was that the transaction could be written to the transaction log but not applied to the copy of the database in memory due to blocking by other concurrent requests on the replica. This could cause an application that load-balances queries between the master and the replica to fail to read a write it just committed on the master.
PostgreSQL 9.6 fixes this with the remote_apply option for synchronous transactions:
bench=# begin;
BEGIN
bench=# set synchronous_commit = 'remote_apply';
SET
bench=# update pgbench_accounts
set abalance = 100
where aid = 100 and bid = 1;
UPDATE 1
bench=# commit;
This mode does not return to the client until the transaction is visible on all synchronous replicas. This means that a read against a replica even a millisecond later will return consistent data. Of course, this results in longer response times, so the new mode is mostly useful for applications that have a low volume of database writes and need to support a write-then-read pattern for load-balancing among read replicas.
The next step for synchronous replication is supporting true quorum replication. This feature, which is likely to be available in PostgreSQL 10 next year, allows a user to specify a group of replicas and, for example, designate that transactions need to be synchronously committed to three out of five of them in no particular order. This will allow users to improve response times over 9.6's prioritized lists of replicas without reducing retention guarantees.
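To see why a quorum could respond faster than a priority list, consider how long a commit has to wait in each scheme. The Python sketch below is a simplified model with hypothetical acknowledgment latencies; it ignores everything except the time each replica takes to confirm.

```python
def priority_wait(latencies_ms, num_sync):
    """9.6-style priority list: the commit waits for the first num_sync
    replicas in priority order, so the slowest of those sets the pace."""
    return max(latencies_ms[:num_sync])


def quorum_wait(latencies_ms, num_sync):
    """Quorum: the commit returns once any num_sync replicas have
    acknowledged, that is, after the num_sync-th fastest reply."""
    return sorted(latencies_ms)[num_sync - 1]
```

With five replicas whose made-up latencies, in priority order, are [80, 5, 12, 9, 150] milliseconds and three acknowledgments required, priority_wait() gives 80 while quorum_wait() gives 12; in both cases the data has reached three replicas.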
Remote Query Operations
In an ongoing multi-year effort, the PostgreSQL project has been improving its database federation features. Federation is a set of features where a single database server can run query operations against multiple other servers in order to spread out workloads or integrate different applications. In PostgreSQL, federation is accomplished with Foreign Data Wrappers (FDW), a feature based on ANSI SQL standard syntax.
While Postgres's FDW extensions support a multitude of data sources as diverse as MySQL, Redis, and Twitter, the most powerful FDW is the Postgres-to-Postgres extension, postgres_fdw. While postgres_fdw was previously capable of executing searches (scans) on remote servers, sorting data and joins between tables would be executed locally. Now, these can be performed on the remote server as well, a feature known as "push-down".
For example, say that you wanted the top ten accounts from the accounts table on a remote server. You could now grab that and offload all of the work onto that server. You can also join accounts to branches on the remote server, returning only the joined data. First, let's import all of the tables in a remote PostgreSQL database as FDW tables in this database:
CREATE EXTENSION postgres_fdw;
-- option values are given as quoted strings
CREATE SERVER bank_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS ( host '192.168.1.102', dbname 'accountdb',
              use_remote_estimate 'on' );
-- postgres_fdw takes the remote role name in the "user" option
CREATE USER MAPPING FOR CURRENT_USER
    SERVER bank_server
    OPTIONS ( user 'acct_user', password 'mxm34dd7i' );
The above commands have been the same since version 9.3, except for the new use_remote_estimate option. This option, if the extension supports it, tells the database to estimate the cost of executing queries on the remote server. Next we'll import all tables in the remote database, and run ANALYZE to gather statistics on them:
IMPORT FOREIGN SCHEMA public
FROM SERVER bank_server
INTO SCHEMA public;
ANALYZE;
Now we can join tables on the remote server:
bank=# explain analyze
select branchname, aid, abalance
from accounts
join branches using (bid)
order by abalance desc limit 10;
QUERY PLAN
-----------------------
Limit
-> Sort
Sort Key: pgbench_accounts.abalance DESC
Sort Method: top-N heapsort Memory: 25kB
-> Foreign Scan
Relations: (public.accounts)
INNER JOIN (public.branches)
Planning time: 9.892 ms
Execution time: 497.107 ms
So here the database is showing you that the "Foreign Scan" is being performed on the remotely JOINed relations, and not by retrieving them and JOINing them locally. For a query like the above, this can be several orders of magnitude faster, depending on table sizes and network bandwidth.
For writing data, postgres_fdw previously had the performance issue that it would only delete one row at a time on the remote server. This was often quite slow. Now it can execute UPDATEs and DELETEs entirely on the other server:
bank=# explain delete
from accounts
where bid = 7;
QUERY PLAN
------------------
Delete on pgbench_accounts
-> Foreign Delete on pgbench_accounts
Version 9.6 also includes the ability to execute user-defined functions on remote servers. While these features are currently only available in the postgres_fdw extension, the PostgreSQL community hopes that other FDW extension authors will take advantage of the support for them in the core database engine. For version 10, the developers will be working on remote execution of aggregation and other operations.
Crosstabs in psql
PostgreSQL's command-line database client, psql, has gained another feature that helps users treat it like a complete programming environment. In 9.6, they can now use "crosstabs" in order to manipulate data for copy and paste reporting.
Crosstabs are a feature many people will be familiar with from spreadsheet programs, which sometimes call them "pivot tables." The idea is to take the results of a query and rearrange them so that some of the rows become columns. For example, say I had this list of totals for LWN subscribers (all data is fictional):
lwn=# select city, level, count(*)
lwn-# from lwn_subscribers
lwn-# group by city, level;
     city      |    level     | count
---------------+--------------+-------
 San Francisco | starving     |   203
 Brisbane      | leader       |   100
 Melbourne     | supporter    |    35
 New York      | starving     |   204
 Portland      | starving     |   212
...
Now, say I wanted to show the city on the left, and the subscriber level across the top. In version 9.6, I could do this:
lwn=# select city, level, count(*)
lwn-# from lwn_subscribers
lwn-# group by city, level
lwn-# \crosstabview city level count
     city      | starving | leader | supporter | professional
---------------+----------+--------+-----------+--------------
 San Francisco |      203 |    114 |        50 |          266
 Brisbane      |      198 |    100 |        36 |          255
 Melbourne     |       96 |     36 |        35 |          113
 New York      |      204 |    105 |        48 |          271
 Portland      |      212 |     94 |        50 |          243
 Perth         |       93 |     46 |        21 |          111
That's a result just a bit of formatting away from a finished report. The PostgreSQL command-line interface also now includes the ability to edit VIEW definitions. Given that the project's developers use psql more often than any other database client, you can expect them to keep adding features to it in every version.
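The rearrangement that \crosstabview performs is easy to picture as a small pivot over (row, column, value) triples. The Python sketch below illustrates the idea only; it is not how psql implements the command, and the function name is invented.

```python
def crosstab(rows):
    """Pivot (row_key, col_key, value) triples into a grid.

    Returns (column_keys, table), where each table row is
    [row_key, value_for_col_1, value_for_col_2, ...] and cells with
    no matching input row are None. Keys keep first-seen order.
    """
    row_keys, col_keys, cells = [], [], {}
    for row_key, col_key, value in rows:
        if row_key not in row_keys:
            row_keys.append(row_key)
        if col_key not in col_keys:
            col_keys.append(col_key)
        cells[(row_key, col_key)] = value
    table = [[r] + [cells.get((r, c)) for c in col_keys] for r in row_keys]
    return col_keys, table
```

Fed the five rows visible in the first query result, such as ("San Francisco", "starving", 203) and ("Brisbane", "leader", 100), it yields the column keys starving, leader, and supporter, with one table row per city.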
All New pgAdmin
In further database client news, the PostgreSQL project's semi-official graphical user interface, pgAdmin, has undergone three total rewrites over the years. The first two versions, pgAdmin and pgAdmin2, were written in different releases of Visual Basic. In 2002, it was rewritten in C++ and wxWindows in order to be multi-platform and released as "pgAdmin3". The new version, pgAdmin4, has been re-created again, this time in Python, Flask, JavaScript, and Qt.
This means that, for the first time, pgAdmin has both browser-based and desktop versions. Given the decline of desktop client applications among developers, and the popularity of PostgreSQL with the Python community, the current rewrite seems timely.
To use the browser-based version, a user installs the Flask web application on their database server, or on another server that can connect to the database server. Then the user can access and manage their databases using the graphical interface. For example, here's the "monitoring" view for a server:
You can also explore and modify database objects:
And run SQL queries:
It is a complete database administration tool, covering other activities such as backups, data import/export, and permissions management. While this Flask application could be packaged with each PostgreSQL server, a single pgAdmin node can connect to any PostgreSQL servers on the same network where the user has an administrative login.
The desktop version is based on the same basic Python code, but uses Python-Qt to offer a desktop application for users who prefer that to browser-based interfaces.
On to PostgreSQL 10
Now that PostgreSQL 9.6 is out, the developers have moved on to working on the next version, to be named PostgreSQL 10. This also involves a change in how the project does version numbers. As with other community open source projects, exact feature lists are unpredictable. Some features are under heavy development and look likely for the next version, including:
- The pglogical streaming logical replication tool will allow simpler, more reliable replication between PostgreSQL databases of different major versions. Among other things, this will allow "hot upgrade" via replication.
- Improved hash indexes, making this index type useful in PostgreSQL for the first time.
- Large reductions in write amplification, addressing some of the complaints of former PostgreSQL user Uber.
- A new declarative table partitioning feature will be both easier to use and more performant than the current implementation.
- SCRAM authentication support for much more secure password management and handling.
Regardless of what new features are in the next version, though, PostgreSQL 9.6 has given users plenty of reasons to upgrade.
Brief items
Development quotes of the week
In other words, everybody is building software, but ignoring the tools we need to build them.
Apache OpenOffice 4.1.3 released
The long-awaited OpenOffice 4.1.3 release is out. "Apache OpenOffice 4.1.3 is a maintenance release incorporating important bug fixes, security fixes, updated dictionaries, and build fixes. All users of Apache OpenOffice 4.1.2 or earlier are advised to upgrade."
KDE celebrates 20 years
KDE.news notes the 20th anniversary of the KDE project. "In the 20 years since then so much has happened. We released great software, fought for software freedom and empowered people all over the world to take charge of their digital life. In many ways we have achieved what we set out to do 20 years ago - 'a consistent, nice looking free desktop-environment' and more."
For those feeling nostalgic, there is a new version of the KDE 1.1.2 desktop ported to contemporary systems.
Newsletters and articles
Development newsletters
- Emacs News (October 17)
- These Weeks in Firefox (October 17)
- What's cooking in git.git (October 17)
- Git Rev News (September)
- This Week in GTK+ (October 17, 2016)
- OCaml Weekly News (October 18)
- OpenStack Developer Mailing List Digest (October 14)
- Perl Weekly (October 17)
- PostgreSQL Weekly News (October 16)
- Python Weekly (October 13)
- Ruby Weekly (October 13)
- This Week in Rust (October 11)
- This Week in Rust (October 18)
- Wikimedia Tech News (October 17)
Kügler: Plasma’s road ahead
Sebastian Kügler reports on KDE's Plasma team meeting. "We took this opportunity to also look and plan ahead a bit further into the future. In what areas are we lacking, where do we want or need to improve? Where do we want to take Plasma in the next two years?" Specific topics include release schedule changes, UI and theming improvements, feature backlog, Wayland, mobile, and more. (Thanks to Paul Wise)
Guile security vulnerability w/ listening on localhost + port
Christopher Allan Webber looks at a security vulnerability in Guile. Guile applications are generally not vulnerable, but arbitrary Scheme code may be used to attack the systems of Guile developers. "There is also a lesson here that applies beyond Guile: the presumption that "localhost" is only accessible by local users can't be guaranteed by modern operating system environments. If you are looking to provide local-execution-only, we recommend using unix domain sockets or named pipes. Don't rely on localhost plus some port."
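The recommendation in that quote can be illustrated with a minimal Python sketch. It is not taken from Guile or from Webber's post; the function names and the socket file name `repl.sock` are hypothetical. Instead of binding to a TCP port on localhost, the server binds to a Unix domain socket inside a private directory, so access is governed by filesystem permissions rather than by who can reach the loopback port.

```python
import os
import socket
import tempfile
import threading


def serve_once(path, ready):
    """Accept one connection on a Unix domain socket and echo one message."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        # Restrict the socket file to its owner; the enclosing directory
        # (created below) is already private to the current user.
        os.chmod(path, 0o600)
        server.listen(1)
        ready.set()
        conn, _ = server.accept()
        with conn:
            conn.sendall(conn.recv(1024))


def echo_via_unix_socket(message):
    """Start a throwaway echo server and send it one short message."""
    # TemporaryDirectory creates a directory only the current user can enter.
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "repl.sock")  # hypothetical name
        ready = threading.Event()
        worker = threading.Thread(target=serve_once, args=(path, ready))
        worker.start()
        ready.wait(timeout=5)
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
        return reply
```

Calling `echo_via_unix_socket(b"ping")` round-trips the bytes through the socket file. Nothing listens on a network port, so code that can only make network requests to localhost has no address to connect to.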
Page editor: Rebecca Sobol
Announcements
Brief items
The Linux Foundation launches the JS Foundation
The Linux Foundation has announced that the JS Foundation is now a Linux Foundation Project. "The JS Foundation is committed to help JavaScript application and server-side projects cultivate best practices and policies that promote high quality standards and broad, diverse contributions for long-term sustainability. Today the JS Foundation touts a new open, technical governance structure and also announces a Mentorship Program to help encourage a culture of collaboration and sustainability throughout the JavaScript community. Initial projects being welcomed into the mentorship program include: Appium, Interledger.js, JerryScript, Mocha, Moment.js, Node-RED and webpack."
Articles of interest
Celebrating open standards around the world
Opensource.com celebrates World Standards Day on October 14. "Whether in the world of software, where without standards we would have been unable to connect the world through the Internet and the World Wide Web, or the physical world, where standards make nearly everything you buy easier, more useful, and safer, the world would be a difficult place to navigate without standards. And critical to the useful of standards is making them available to all in an accessible, free format, unencumbered by legal or other hurdles."
New Books
Practical Forensic Imaging -- new from No Starch Press
No Starch Press has released "Practical Forensic Imaging" by Bruce Nikkel.
Understanding ECMAScript 6 -- new from No Starch Press
No Starch Press has released "Understanding ECMAScript 6" by Nicholas C. Zakas.
Calls for Presentations
SciPy India 2016
SciPy India will take place December 10-11 at IIT Bombay, India. The call for papers is open until November 20. "We look forward to your submissions on the use of Python for scientific computing and education. This includes pedagogy, exploration, modeling, and analysis from both applied and developmental perspectives. We welcome contributions from academia as well as industry."
<Programming> 2017: Call for papers
<Programming> 2017 will take place April 3-6 in Brussels, Belgium. "<Programming> 2017 accept scholarly papers including essays that advance the knowledge of programming. Almost anything about programming is in scope, but in each case there should be a clear relevance to the act and experience of programming." The deadline for paper submissions is December 1. There is also a call for Workshop proposals, with a deadline of November 15.
CFP Deadlines: October 20, 2016 to December 19, 2016
The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.
| Deadline | Event Dates | Event | Location |
|---|---|---|---|
| October 25 | May 8 - May 11 | O'Reilly Open Source Convention | Austin, TX, USA |
| October 26 | November 5 | Barcelona Perl Workshop | Barcelona, Spain |
| October 28 | November 25 - November 27 | Pycon Argentina 2016 | Bahía Blanca, Argentina |
| October 30 | February 17 | Swiss Python Summit | Rapperswil, Switzerland |
| October 31 | February 4 - February 5 | FOSDEM 2017 | Brussels, Belgium |
| November 11 | November 11 - November 12 | Linux Piter | St. Petersburg, Russia |
| November 11 | January 27 - January 29 | DevConf.cz 2017 | Brno, Czech Republic |
| November 13 | December 10 | Mini Debian Conference Japan 2016 | Tokyo, Japan |
| November 15 | March 2 - March 5 | Southern California Linux Expo | Pasadena, CA, USA |
| November 15 | March 28 - March 31 | PGConf US 2017 | Jersey City, NJ, USA |
| November 18 | February 18 - February 19 | PyCaribbean | Bayamón, Puerto Rico, USA |
| November 20 | December 10 - December 11 | SciPy India | Bombay, India |
| November 21 | January 16 | Linux.Conf.Au 2017 Sysadmin Miniconf | Hobart, Tas, Australia |
| November 21 | January 16 - January 17 | LCA Kernel Miniconf | Hobart, Australia |
| November 28 | March 25 - March 26 | LibrePlanet 2017 | Cambridge, MA, USA |
| December 1 | April 3 - April 6 | ‹Programming› 2017 | Brussels, Belgium |
| December 10 | February 21 - February 23 | Embedded Linux Conference | Portland, OR, USA |
| December 10 | February 21 - February 23 | OpenIoT Summit | Portland, OR, USA |
If the CFP deadline for your event does not appear here, please tell us about it.
Upcoming Events
Events: October 20, 2016 to December 19, 2016
The following event listing is taken from the LWN.net Calendar.
| Date(s) | Event | Location |
|---|---|---|
| October 18 - October 20 | Qt World Summit 2016 | San Francisco, CA, USA |
| October 21 - October 23 | Software Freedom Kosovo 2016 | Prishtina, Kosovo |
| October 22 - October 23 | Datenspuren 2016 | Dresden, Germany |
| October 22 | 2016 Columbus Code Camp | Columbus, OH, USA |
| October 25 - October 28 | OpenStack Summit | Barcelona, Spain |
| October 26 - October 27 | All Things Open | Raleigh, NC, USA |
| October 27 - October 28 | Rust Belt Rust | Pittsburgh, PA, USA |
| October 28 - October 30 | PyCon CZ 2016 | Brno, Czech Republic |
| October 29 - October 30 | PyCon HK 2016 | Hong Kong, Hong Kong |
| October 29 - October 30 | PyCon.de 2016 | Munich, Germany |
| October 31 - November 1 | Linux Kernel Summit | Santa Fe, NM, USA |
| October 31 | PyCon Finland 2016 | Helsinki, Finland |
| October 31 - November 2 | O’Reilly Security Conference | New York, NY, USA |
| November 1 - November 4 | Linux Plumbers Conference | Santa Fe, NM, USA |
| November 1 - November 4 | PostgreSQL Conference Europe 2016 | Tallinn, Estonia |
| November 3 | Bristech Conference 2016 | Bristol, UK |
| November 4 - November 6 | FUDCon Phnom Penh | Phnom Penh, Cambodia |
| November 5 - November 6 | OpenFest 2016 | Sofia, Bulgaria |
| November 5 | Barcelona Perl Workshop | Barcelona, Spain |
| November 7 - November 9 | Velocity Amsterdam | Amsterdam, Netherlands |
| November 9 - November 11 | O’Reilly Security Conference EU | Amsterdam, Netherlands |
| November 11 - November 12 | Seattle GNU/Linux Conference | Seattle, WA, USA |
| November 11 - November 12 | Linux Piter | St. Petersburg, Russia |
| November 12 - November 13 | T-Dose | Eindhoven, Netherlands |
| November 12 - November 13 | Mini-DebConf | Cambridge, UK |
| November 12 - November 13 | PyCon Canada 2016 | Toronto, Canada |
| November 13 - November 18 | The International Conference for High Performance Computing, Networking, Storage and Analysis | Salt Lake City, UT, USA |
| November 14 - November 18 | Tcl/Tk Conference | Houston, TX, USA |
| November 14 | The Third Workshop on the LLVM Compiler Infrastructure in HPC | Salt Lake City, UT, USA |
| November 14 - November 16 | PGConfSV 2016 | San Francisco, CA, USA |
| November 16 - November 18 | ApacheCon Europe | Seville, Spain |
| November 16 - November 17 | Paris Open Source Summit | Paris, France |
| November 17 | NLUUG (Fall conference) | Bunnik, The Netherlands |
| November 18 - November 20 | GNU Health Conference 2016 | Las Palmas, Spain |
| November 18 - November 20 | UbuCon Europe 2016 | Essen, Germany |
| November 19 | eloop 2016 | Stuttgart, Germany |
| November 21 - November 22 | Velocity Beijing | Beijing, China |
| November 24 | OWASP Gothenburg Day | Gothenburg, Sweden |
| November 25 - November 27 | Pycon Argentina 2016 | Bahía Blanca, Argentina |
| November 29 - November 30 | 5th RISC-V Workshop | Mountain View, CA, USA |
| November 29 - December 2 | Open Source Monitoring Conference | Nürnberg, Germany |
| December 3 | NoSlidesConf | Bologna, Italy |
| December 3 | London Perl Workshop | London, England |
| December 6 | CHAR(16) | New York, NY, USA |
| December 10 | Mini Debian Conference Japan 2016 | Tokyo, Japan |
| December 10 - December 11 | SciPy India | Bombay, India |
If your event does not appear here, please tell us about it.
Page editor: Rebecca Sobol


