
LWN.net Weekly Edition for August 27, 2015

A look at The Machine

By Jake Edge
August 26, 2015

LinuxCon North America

In what was perhaps one of the shortest keynotes on record (ten minutes), Keith Packard outlined the hardware architecture of "The Machine"—HP's ambitious new computing system. That keynote took place at LinuxCon North America in Seattle and was thankfully followed by an hour-long technical talk by Packard the following day (August 18), which looked at both the hardware and software for The Machine. It is, in many ways, a complete rethinking of the future of computers and computing, but there is a fairly long way to go between here and there.

The hardware

The basic idea of the hardware is straightforward. Many of the "usual computation units" (i.e. CPUs or systems on chip—SoCs) are connected to a "massive memory pool" using photonics for fast interconnects. That leads to something of an equation, he said: electrons (in CPUs) + photons (for communication) + ions (for memory storage) = computing. Today's computers transfer a lot of data and do so over "tiny little pipes". The Machine, instead, can address all of its "amazingly huge pile of memory" from each of its many compute elements. One of the underlying principles is to stop moving memory around to use it in computations—simply have it all available to any computer that needs it.

[Keith Packard]

Some of the ideas for The Machine came from HP's DragonHawk systems, which were traditional symmetric multiprocessing systems, but packed a "lot of compute in a small space". DragonHawk systems would have 12TB of RAM in an 18U enclosure, while the nodes being built for The Machine will have 32TB of memory in 5U. It is, he said, a lot of memory and it will scale out linearly. All of the nodes will be connected at the memory level so that "every single processor can do a load or store instruction to access memory on any system".

Nodes in this giant cluster do not have to be homogeneous, as long as they are all hooked to the same memory interconnect. The first nodes that HP is building will be homogeneous, just for pragmatic reasons. There are two circuit boards on each node, one for storage and one for the computer. Connecting the two will be the "next generation memory interconnect" (NGMI), which will also connect both parts of the node to the rest of the system using photonics.

The compute part of the node will have a 64-bit ARM SoC with 256GB of purely local RAM along with a field-programmable gate array (FPGA) to implement the NGMI protocol. The storage part will have four banks of memory (each with 1TB), each with its own NGMI FPGA. A given SoC can access memory elsewhere without involving the SoC on the node where the memory resides—the NGMI bridge FPGAs will talk to their counterpart on the other node via the photonic interface. Those FPGAs will eventually be replaced by application-specific integrated circuits (ASICs) once the bugs are worked out.

ARM was chosen because it was easy to get those vendors to talk with the project, Packard said. There is no "religion" about the instruction set architecture (ISA), so others may be used down the road.

Eight of these nodes can be collected up into a 5U enclosure, which gives eight processors and 32TB of memory. Ten of those enclosures can then be placed into a rack (80 processors, 320TB) and multiple racks can all be connected on the same "fabric" to allow addressing up to 32 zettabytes (ZB) from each processor in the system.

The storage and compute portions of each node are powered separately. The compute piece has two 25Gb network interfaces that are capable of remote DMA. The storage piece will eventually use some kind of non-volatile/persistent storage (perhaps even the fabled memristor), but is using regular DRAM today, since it is available and can be used to prove the other parts of the design before switching.

SoCs in the system may be running more than one operating system (OS) and for more than one tenant, so there are some hardware protection mechanisms built into the system. In addition, the memory-controller FPGAs will encrypt the data at rest so that pulling a board will not give access to the contents of the memory even if it is cooled (à la cold boot) or when some kind of persistent storage is used.

At one time, someone said that 640KB of memory should be enough, Packard said, but now he is wrestling with the limits of the 48-bit addresses used by the 64-bit ARM and Intel CPUs. That only allows addressing up to 256TB, so memory will be accessed in 8GB "books" (or, sometimes, 64KB "booklettes"). Beyond the SoC, the NGMI bridge FPGA (which is also called the "Z bridge") deals with two different kinds of addresses: 53-bit logical Z addresses and 75-bit Z addresses. Those allow addressing 8PB and 32ZB respectively.
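Those limits are easy to sanity-check with a little arithmetic; the sketch below simply restates the numbers from the talk (an n-bit byte address reaches 2^n bytes):

```python
# Address-space sizes implied by the address widths from the talk.
TB = 1 << 40
PB = 1 << 50
ZB = 1 << 70

def addressable(bits):
    """Bytes reachable with a flat bits-wide byte address."""
    return 1 << bits

assert addressable(48) == 256 * TB  # 48-bit ARM/Intel virtual addresses
assert addressable(53) == 8 * PB    # 53-bit logical Z addresses
assert addressable(75) == 32 * ZB   # 75-bit Z addresses

# The 75-bit space is carved into 8GB (2^33-byte) "books":
books = addressable(75) // (8 << 30)
print(books == 1 << 42)  # True: 2^42 books
```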

The logical Z addresses are used by the NGMI firewall to determine the access rights to that memory for the local node. Those access controls are managed outside of whatever OS is running on the SoC. So the mapping of memory is handled by the OS, while the access controls for the memory are part of the management of The Machine system as a whole.

NGMI is not intended to be a proprietary fabric protocol, Packard said, and the project is trying to see if others are interested. A memory transaction on the fabric looks much like a cache access. The Z address is presented and 64 bytes are transferred.

The software

Packard's group is working on GPL operating systems for the system, but others can certainly be supported. If some "proprietary Washington company" wanted to port its OS to The Machine, it certainly could. Meanwhile, though, other groups are working on other free systems, but his group is made up of "GPL bigots" who are working on Linux for the system. There will not be a single OS (or even distribution or kernel) running on a given instance of The Machine—it is intended to support multiple different environments.

Probably the biggest hurdle for the software is that there is no cache coherence within the enormous memory pool. Each SoC has its own local memory (256GB) that is cache coherent, but accesses to the "fabric-attached memory" (FAM) between two processors are completely uncoordinated by hardware. That has implications for applications and the OS that are using that memory, so OS data structures should be restricted to the local, cache-coherent memory as much as possible.

For the FAM, there is a two-level allocation scheme that is arbitrated by a "librarian". It allocates books (8GB) and collects them into "shelves". The hardware protections provided by the NGMI firewall are done on book boundaries. A shelf could be a collection of books that are scattered all over the FAM in a single load-store domain (LSD—not Packard's favorite acronym, he noted), which is defined by the firewall access rules. That shelf could then be handed to the OS to be used for a filesystem, for example. That might be ext4, some other Linux filesystem, or the new library filesystem (LFS) that the project is working on.

Talking to the memory in a shelf uses the POSIX API. A process does an open() on a shelf and then uses mmap() to map the memory into the process. Underneath, it uses the direct access (DAX) support to access the memory. For the first revision, LFS will not support sparse files. Also, locking will not be global throughout an LSD, but will be local to an OS running on a node.
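A minimal sketch of that access pattern is below; a regular temporary file stands in for a DAX-backed LFS shelf (the shelf name is hypothetical) so that the sketch runs anywhere:

```python
import mmap
import os
import tempfile

# With LFS and DAX, the same open()/mmap() calls would map
# fabric-attached memory directly into the process; here an ordinary
# file plays the role of the shelf.
shelf_path = os.path.join(tempfile.mkdtemp(), "shelf0")  # hypothetical name
with open(shelf_path, "wb") as f:
    f.write(b"\0" * 4096)  # a real shelf would be sized in 8GB books

fd = os.open(shelf_path, os.O_RDWR)
mem = mmap.mmap(fd, 4096)
mem[0:5] = b"hello"  # plain loads and stores hit the mapped memory
mem.flush()          # msync(); persistent memory needs explicit flushing
print(bytes(mem[0:5]))  # b'hello'
mem.close()
os.close(fd)
```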

For management of the FAM, each rack will have a "top of rack" management server, which is where the librarian will run. That is a fairly simple piece of code that just does bookkeeping and keeps track of the allocations in a SQLite database. The SoCs are the only parts of the system that can talk to the firewall controller, so other components communicate with a firewall proxy that runs in user space, which relays queries and updates. There are a "whole bunch of potential adventures" in getting the memory firewall pieces all working correctly, Packard said.
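The librarian's bookkeeping can be pictured with a toy model like the one below; the schema and names are invented for illustration, but the shape of the task matches the description: books are the allocation unit, a shelf is just a named collection of them, and SQLite tracks who owns what:

```python
import sqlite3

BOOK = 8 << 30  # 8GB book size

# Invented schema, for illustration only: one row per book in the FAM.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, shelf TEXT)")
db.executemany("INSERT INTO books (id, shelf) VALUES (?, NULL)",
               [(i,) for i in range(16)])  # pretend the FAM holds 16 books

def create_shelf(name, size_bytes):
    """Claim enough free books to cover size_bytes and label them."""
    nbooks = -(-size_bytes // BOOK)  # ceiling division
    free = [row[0] for row in db.execute(
        "SELECT id FROM books WHERE shelf IS NULL LIMIT ?", (nbooks,))]
    if len(free) < nbooks:
        raise MemoryError("not enough free books")
    db.executemany("UPDATE books SET shelf = ? WHERE id = ?",
                   [(name, b) for b in free])
    return free

print(create_shelf("scratch", 20 << 30))  # 20GB rounds up to 3 books
```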

The lack of cache coherence makes atomic operations on the FAM problematic, as traditional atomics rely on that feature. So the project has added some hardware to the bridges to do atomic operations at that level. There is a fam_atomic library to access the operations (fetch and add, swap, compare and store, and read), which means that each operation is done at the cost of a system call. Once again, this is just the first implementation; other mechanisms may be added later. One important caveat is that the FAM atomic operations do not interact with the SoC cache, so applications will need to flush those caches as needed to ensure consistency.

Physical addresses at the SoC level can change, so there needs to be support for remapping those addresses. But the SoC caches and DAX both assume static physical mappings. A subset of the physical address space will be used as an aperture into the full address space of the system and books can be mapped into that aperture.

Flushing the SoC cache line by line would "take forever", so a way to flush the entire cache when the physical address mappings change has been added. In order to do that, two new functions have been added to the Intel persistent memory library (libpmem): one to check for the presence of non-coherent persistent memory (pmem_needs_invalidate()) and another to invalidate the CPU cache (pmem_invalidate()).

In a system of this size, with the huge amounts of memory involved, there needs to be well-defined support for memory errors, Packard said. Read is easy—errors are simply signaled synchronously—but writes are trickier because the actual write is asynchronous. Applications need to know about the errors, though, so SIGBUS is used to signal an error. The pmem_drain() call will act as a barrier, such that errors in writes before that call will signal at or before the call. Any errors after the barrier will be signaled post-barrier.

There are various areas where the team is working on free software, he said, including persistent memory and DAX. There is also ongoing work on concurrent/distributed filesystems and non-coherent cache management. Finally, reliability, availability, and serviceability (RAS) are quite important to the project, so free software work is proceeding in that area as well.

Even with two separate sessions, it was a bit of a whirlwind tour of The Machine. As he noted, it is an environment that is far removed from the desktop world Packard had previously worked in. By the sound of it, there are plenty of challenges to overcome before The Machine becomes a working computing device—it will be an interesting process to watch.

[I would like to thank the Linux Foundation for travel assistance to Seattle for LinuxCon North America.]


Data visualizations in text

By Nathan Willis
August 26, 2015

TypeCon

Data visualization is often thought of in terms of pixels; considerable work goes into shaping large data sets into a form where spatial relationships are made clear and where colors, shapes, intensity, and point placement encode various quantities for rapid understanding. At TypeCon 2015 in Denver, though, researcher Richard Brath presented a different approach: taking advantage of readers' familiarity with the written word to encode more information into text itself.

Brath is a PhD candidate at London South Bank University where, he said, "I turn data into shapes and color and so on." Historically speaking, though, typography has not been a part of that equation. He showed a few examples of standard data visualizations, such as "heatmap" diagrams. Even when there are multiple variables under consideration, the only typography involved is plain text labels. "Word clouds" are perhaps the only well-known example of visualizations that involve altering text based on data, but even that is trivial: the most-frequent words or tags are simply bigger. More can certainly be done.

[Richard Brath]

Indeed, more has been done—at least on rare occasion; Brath has cataloged and analyzed instances where other type attributes have been exploited to encode additional information in visualizations. An oft-overlooked example, he said, is cartography: subtle changes in spacing, capitalization, and font weight are used to indicate many distinct levels of map features. The reader may not consciously recognize it, but the variations give cues as to which neighboring text labels correspond to which map features. Some maps even incorporate multiple types of underline and reverse italics in addition to regular italics (two features that are quite uncommon elsewhere).

Brath also showed several historical charts and diagrams (some dating back to the 18th Century) that used typographic features to encode information. Royal family trees, for example, would sometimes vary the weight, slant, and style of names to indicate the pedigree and status of various family members. A far more modern example of signifying information with font attributes, he said, can be seen in code editors, where developers take it for granted that syntax highlighting will distinguish between symbols, operators, and structures—hopefully without adversely impacting readability.

On the whole, though, usage of these techniques is limited to specific niches. Brath set out to catalog the typographic features that were employed, then to try to apply them to entirely new data-visualization scenarios. The set of features available for encoding information included standard properties like weight and slant, plus capitalization, x-height, width (i.e., condensed through extended), spacing, serif style, stroke contrast, underline, and the choice of typeface itself. Naturally, some of those attributes map well to quantitative data (such as weight, which can be varied continuously throughout a range), while others would only be useful for encoding categorical information (such as whether letters are slanted or upright).

He then began creating and testing a variety of visualizations in which he would encode information by varying some of the font attributes. Perhaps the most straightforward example was the "text-skimming" technique: a preprocessor varies the weight of individual words in a document based on their overall frequency in the language used. Unusual words are bolder, common words are lighter, with several gradations incorporated. Articles and pronouns can even be put into italics to further differentiate them from the more critical parts of the text. The result is a paragraph that, in user tests, readers can skim through at significantly higher speed; it is somewhat akin to overlaying a word-frequency cloud on top of the text itself.
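The preprocessing step can be sketched in a few lines; the frequency table and weight gradations below are invented for illustration (Brath's actual breakpoints are not specified in the talk):

```python
# Toy word-frequency table: 5 = very common, 1 = rare (invented data).
FREQ = {"the": 5, "a": 5, "is": 5, "memory": 2, "photonics": 1}

def css_weight(word):
    """Map a word's commonness to a CSS font weight: rarer = bolder."""
    common = FREQ.get(word.lower(), 1)  # unknown words treated as rare
    return {5: 300, 4: 400, 3: 500, 2: 600, 1: 700}[common]

def skimmable_html(text):
    return " ".join(
        f'<span style="font-weight:{css_weight(w)}">{w}</span>'
        for w in text.split())

print(skimmable_html("the memory is photonics"))
```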

[Movie review visualization]

A bit further afield was Brath's example of encoding numeric data linearly in a line of text. He took movie reviews from the Rotten Tomatoes web site and used each reviewer's numeric rating as the percentage of the text rendered in bold. The result, when all of the reviews for a particular film are placed together, effectively maps a histogram of the reviews onto the reviews themselves. In tests, he said, participants typically found it easier to extract information from this form than from Rotten Tomatoes's default layout, which places small numbers next to review quotes in a grid, intermingled with various images.

He also showed examples of visualization techniques that varied multiple font attributes to encode more than one variable. The first was a response to limitations of choropleth maps—maps where countries or other regions are colored (or shaded) to indicate a particular score on some numeric scale. While choropleths work fine for single variables, it is hard to successfully encode multiple variables using the technique, and looking back and forth between multiple single-variable choropleth maps makes it difficult for the reader to notice any correlations between them.

[Map data visualization]

Brath's technique encoded three variables (health expenditure as a percentage of GDP, life expectancy, and prevalence of HIV) into three font attributes (weight, case, and slant), using the three-letter ISO country codes as the text for each nation on the map. The result makes it easier to zero in on particular combinations of the variables (for example, countries with high health expenditures and short life expectancies) or, at least, easier than flipping back and forth between three maps.

His final example of multi-variable encoding used x-height and font width to encode musical notation into text. The use case presented was how to differentiate singing from prose within a book. Typically, the only typographic technique employed in a book is to offset the sung portion of the text and set it in italics. Brath, instead, tested varying the heights of the letters to indicate note pitch and the widths to indicate note duration.

The reaction to this technique from the audience at TypeCon was, to say the least, mixed. While it is clear that the song text encodes some form of rhythmic changes and variable intensity, it does not map easily to notes, and the rendered text is not exactly easy to look at. Brath called it a work in progress; his research is far from over.

He ended the session by encouraging audience members to visit his research blog and take the latest survey to test the impact of some of the visualization techniques firsthand. He also posed several questions to the crowd, such as why there are many font families that come in a variety of weights, but essentially none that offer multiple x-height options or italics at multiple angles of slant.

Brath's blog makes for interesting reading for anyone concerned with data visualizations or text. He often explores practical issues—for example, how overuse of color can negatively impact text legibility, which could have implications for code markup tools, or the difficulties to overcome when trying to slant text at multiple angles. Programmers, who spend much of their time staring at text, are no doubt already familiar with many ways in which typographic features can encode supplementary information (in this day and age, who does not associate a hyperlink closely with an underline, after all?). But there are certainly still many places where the attributes of text might be used to make data easier to find or understand.


Topics from the LLVM microconference

By Jake Edge
August 26, 2015

Linux Plumbers Conference

A persistent theme throughout the LLVM microconference at the 2015 Linux Plumbers Conference was that of "breaking the monopoly" of GCC, the GNU C library (glibc), and other tools that are relied upon for building Linux systems. One could quibble with the "monopoly" term, since it is self-imposed and not being forced from the outside, but the general idea is clear: using multiple tools to build our software will help us in numerous ways.

Kernel and Clang

Most of the microconference was presentation-oriented, with relatively little discussion. Jan-Simon Möller kicked things off with a status report on the efforts to build a Linux kernel using LLVM's Clang C compiler. The number of patches needed for building the kernel has dropped from around 50 to 22 "small patches", he said. Most of those are in the kernel build system or are for little quirks in driver code. Of those, roughly two-thirds can likely be merged upstream, while the others are "ugly hacks" that will probably stay in the LLVM-Linux tree.

There are currently five patches needed in order to build a kernel for the x86 architecture. Two of those are for problems building the crypto code (the AES_NI assembly code will not build with the LLVM integrated assembler and there are longstanding problems with variable-length arrays in structures). The integrated assembler also has difficulty handling some "assembly" code that is used by the kernel build system to calculate offsets; GCC sees it as a string, but the integrated assembler tries to actually assemble it.

The goal of building an "allyesconfig" kernel has not yet been realized, but a default configuration (defconfig) can be built using the most recent Git versions of LLVM and Clang. It currently requires disabling the integrated assembler for the entire build, but the goal is to disable it just for the files that need it.

Other architectures (including ARM for the Raspberry Pi 2) can be built using roughly half-a-dozen patches per architecture, Möller said. James Bottomley was concerned about the "Balkanization" of kernel builds once Linus Torvalds and others start using Clang for their builds; obsolete architectures and those not supported by LLVM may stop building altogether, he said. But microconference lead Behan Webster thought that to be an unlikely outcome. Red Hat and others will always build their kernels using GCC, he said, so that will be supported for quite a long time, if not forever.

Using multiple compilers

Kostya Serebryany is a member of the "dynamic testing tools" team at Google, which has the goal of providing tools for the C++ developers at the company to find bugs without any help from the team. He was also one of the proponents of the "monopoly" term for GCC, since it is used to build the kernel, glibc, and all of the distribution binaries. But, he said, making all of that code buildable using other compilers will allow various other tools to also be run on the code.

For example, the AddressSanitizer (ASan) can be used to detect memory errors such as stack overflow, use after free, using stack memory after a function has returned, and so on. Likewise, ThreadSanitizer (TSan), MemorySanitizer (MSan), and UndefinedBehaviorSanitizer (UBSan) can find various kinds of problems in C and C++ code. But all are based on Clang and LLVM, so only code that can be built with that compiler suite can be sanitized using these tools.

GCC already has some similar tools and the Linux kernel has added some as well (the kernel address sanitizer, for example), which have found various bugs, including quite a few security bugs. GCC's support has largely come about because of the competition with LLVM and still falls short in some areas, he said.

There are also techniques that go beyond "best effort" tools like the sanitizers. For example, fuzzing and hardening can be used to either find more bugs or eliminate certain classes of bugs. Coverage-guided fuzzing, he said, can be used to home in on problem areas in the code; LLVM's LibFuzzer can perform that kind of fuzzing. He noted that the Heartbleed bug can be "found" using LibFuzzer in roughly five seconds on his laptop.
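The basic idea behind fuzzing can be illustrated with a toy random fuzzer (this is an illustration of the concept, not LibFuzzer itself, which additionally mutates inputs based on the code paths each one exercises, finding bugs far faster than blind randomness):

```python
import random

# A "parser" with a hidden bug on one header value: the kind of crash
# a fuzzer is meant to discover.
def buggy_parse(data):
    if len(data) > 3 and data[0] == 0x42:
        raise RuntimeError("crash: unchecked header path")
    return len(data)

# Blind fuzz loop: throw random inputs at the parser until one crashes.
random.seed(0)  # deterministic for the sake of the example
for attempt in range(10000):
    sample = bytes(random.randrange(256) for _ in range(8))
    try:
        buggy_parse(sample)
    except RuntimeError:
        print(f"crasher found after {attempt + 1} attempts")
        break
```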

Two useful hardening techniques are also available with LLVM: control flow integrity (CFI) and SafeStack. CFI will abort the program when it detects certain kinds of undesired behavior—for example that the virtual function table for a program has been altered. SafeStack protects against stack overflows by placing local variables on a separate stack. That way, the return address and any variables are not contiguous in memory.

Serebryany said that it was up to the community to break the monopoly. He was not suggesting simply switching to using LLVM exclusively, but to ensuring that the kernel, glibc, and distributions all could be built with it. Furthermore, he said that continuous integration should be set up so that all of these pieces can always be built with both compilers. When other compilers arrive, they should also be added into the mix.

To that end, Webster asked if Google could help getting the kernel patches needed to build with Clang upstream. Serebryany said that he thought that, by showing some of the advantages of being able to build with Clang (such as the fuzzing support), Google might be able to help get those patches merged.

BPF and LLVM

The "Berkeley Packet Filter" (BPF) language has expanded its role greatly over the years, moving from simply being used for packet filtering to now providing the in-kernel virtual machine for security (seccomp), tracing, and more. Alexei Starovoitov has been the driving force behind extending the BPF language (into eBPF) as well as expanding its scope in the kernel. LLVM can be used to compile eBPF programs for use by the kernel, so Starovoitov presented about the language and its uses at the microconference.

He began by noting wryly that he "works for Linus Torvalds" (in the same sense that all kernel developers do). He merged his first patches into GCC some fifteen years ago, but he has "gone over to Clang" in recent years.

The eBPF language is supported by both GCC and LLVM using backends that he wrote. He noted that the GCC backend is half the size of the LLVM version, but that the latter took much less time to write. "My vote goes to LLVM for the simplicity of the compiler", he said. The LLVM-BPF backend has been used to demonstrate how to write a backend for the compiler. It is now part of LLVM stable and will be released as part of LLVM 3.7.

GCC is built for a single backend, so you have to specifically create a BPF version, but LLVM has all of its backends available using command-line arguments (--target bpf). LLVM also has an integrated assembler that can take the C code describing the BPF and turn it into in-memory BPF bytecode that can be loaded into the kernel.

BPF for tracing is currently a hot area, Starovoitov said. It is a better alternative to SystemTap and runs two to three times faster than Oracle's DTrace. Part of that speed comes from LLVM's optimizations plus the kernel's internal just-in-time compiler for BPF bytecode.

Another interesting tool is the BPF Compiler Collection (BCC). It makes it easy to write and run BPF programs by embedding them into Python (either directly as strings in the Python program or by loading them from a C file). Underneath the Python "bpf" module is LLVM, which compiles the program before the Python code loads it into the kernel. A simple printk() can easily be added into the kernel without recompiling it (or rebooting). He noted that Brendan Gregg has added a bunch of example tools to show how to use the C+Python framework.

Under the covers, the framework uses libbpfprog, which compiles a C source file into BPF bytecode using Clang/LLVM. It can also load the bytecode and any BPF maps into the kernel using the bpf() system call and attach the program(s) to various types of hooks (e.g. kprobes, tc classifiers/actions). The Python bpf module simply provides bindings for the library.

The presentation was replete with examples, which are available in the slides [PDF] as well.

Alternatives for the core

There was a fair amount of overlap between the last two sessions I was able to sit in on. Both Bernhard Rosenkraenzer and Khem Raj were interested in replacing more than just the compiler in building a Linux system. Traditionally, building a Linux system starts with GCC, glibc, and binutils, but there are now alternatives to those. How much of a Linux system can be built using those alternatives?

Some parts of binutils are still needed, Rosenkraenzer said. The binutils gold linker can be used instead of the traditional ld. (Other linker options were presented in Mark Charlebois's final session of the microconference, which I unfortunately had to miss.) The gas assembler from binutils can be replaced with Clang's integrated assembler for the most part, but there are still non-standard assembly constructs that require gas.

Tools like nm, ar, ranlib, and others will need to be made to understand three different formats: regular object files, LLVM bytecode, and the GCC intermediate representation. Rosenkraenzer showed a shell-script wrapper that could be used to add this support to various utilities.
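The dispatch logic such a wrapper needs can be sketched by peeking at a file's magic bytes; the ELF and LLVM bitcode magic values below are the real ones, while the demo file is of course contrived (GCC's intermediate representation for LTO lives inside ELF sections, so distinguishing it requires a deeper look than this):

```python
import os
import tempfile

def object_kind(path):
    """Classify an object file by its magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"\x7fELF":
        return "elf"            # plain binutils tools suffice
    if magic == b"BC\xc0\xde":
        return "llvm-bitcode"   # dispatch to llvm-nm, llvm-ar, etc.
    return "unknown"

# Demo on a fake bitcode file:
demo = os.path.join(tempfile.mkdtemp(), "demo.bc")
with open(demo, "wb") as f:
    f.write(b"BC\xc0\xde" + b"\0" * 16)
print(object_kind(demo))  # llvm-bitcode
```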

For the most part, GCC can be replaced by Clang. OpenMandriva switched to Clang as its primary compiler in 2014, and the soon-to-be-released OpenMandriva 3 is almost entirely built with Clang 3.7, though some packages are still built with gcc or g++. OpenMandriva still needed to build GCC, however, to get needed libraries such as libgcc, libatomic, and others (including, possibly, libstdc++).

The GCC compatibility claimed by Clang is too conservative, Rosenkraenzer said. The __GNUC__ macro definition in Clang is set to 4.2.1, but switching that to 4.9 produces better code. There are two likely reasons why Clang chose 4.2.1, and they are related: 4.2.1 was the last GPLv2 release of GCC, so some people may not be allowed to look at later versions; in addition, GCC 4.2.1 was the last version that was used to build the BSD portions of OS X.

There is a whole list of GCC-isms that should be avoided for compatibility with Clang. Rosenkraenzer's slides [PDF] list many of them. He noted that there have been a number of bugs found via Clang warnings or errors when building various programs—GCC did not complain about those problems.

Another "monopoly component" that one might want to replace would be glibc. The musl libc alternative is viable, but only if binary compatibility with other distributions is not required. But musl cannot be built with Clang, at least yet.

Replacing GCC's libstdc++ with LLVM's libc++ is possible but, again, binary compatibility is sacrificed. That is a bigger problem than it is for musl, though, Rosenkraenzer said. Using both is possible, but there are problems when libraries (e.g. Qt) are linked to, say, libc++ and a binary-only Qt program uses libstdc++, which leads to crashes. libc++ is roughly half the size of libstdc++, however, so environments like Android (which never used libstdc++) are making the switch.

Cross-compiling under LLVM/Clang is easier since all of the backends are present and compilers for each new target do not need to be built. There is still a need to build the cross-toolchains, though, for binutils, libatomic, and so on. Rosenkraenzer has been working on a tool to do automated bootstrapping of the toolchain and core system.

Conclusion

It seems clear that use of LLVM within Linux is growing and that growth is having a positive effect. The competition with GCC is helping both to become better compilers, while building our tools with both is finding bugs in critical components like the kernel. Whether it is called "breaking the monopoly" or "diversifying the build choices", this trend is making beneficial changes to our ecosystem.

[I would like to thank the Linux Plumbers Conference organizing committee for travel assistance to Seattle for LPC.]


Reviving the Hershey fonts

By Nathan Willis
August 26, 2015

TypeCon

At the 2015 edition of TypeCon in Denver, Adobe's Frank Grießhammer presented his work reviving the famous Hershey fonts from the Mid-Century era of computing. The original fonts were tailor-made for early vector-based output devices but, although they have retained a loyal following (often as a historical curiosity), they have never before been produced as an installable digital font.

Grießhammer started his talk by acknowledging his growing reputation for obscure topics—in 2013, he presented a tool for rapid generation of the Unicode box-drawing characters—but argued that the Hershey fonts were overdue for proper recognition. He first became interested in the fonts and their peculiar history in 2014, when he was surprised to find a well-designed commercial font that used only straight line segments for its outlines. The references indicated that this choice was inspired by the Hershey fonts, which led Grießhammer to dig into the topic further.

[Frank Grießhammer]

The fonts are named for their creator, Allen V. Hershey (1910–2004), a physicist working at the US Naval Weapons Laboratory in the 1960s. At that time, the laboratory used one of the era's most advanced computers, the IBM Naval Ordnance Research Calculator (NORC), a vacuum-tube and magnetic-tape based machine. NORC's output was provided by the General Dynamics S-C 4020, which could either plot on a CRT display or directly onto microfilm. It was groundbreaking for the time, since the S-C 4020 could plot diagrams and charts directly, rather than simply outputting tables that had to be hand-drawn by draftsmen after the fact.

By default, the S-C 4020 would output text by projecting light through a set of letter stencils, but Hershey evidently saw untapped potential in the S-C 4020's plotting capabilities. Using the plotting functions, he designed a set of high-quality Latin fonts (both upright and italics), followed by Greek, a full set of mathematical and technical symbols, blackletter and Lombardic letterforms, and an extensive set of Japanese glyphs—around 2,300 characters in total. Befitting the S-C 4020's plotting capabilities, the letters were formed entirely by straight line segments.

The format used to store the coordinates of the curves is, to say the least, unusual. Each coordinate point is stored as a pair of ASCII letters, where the numeric value of each letter is found by taking its offset from the letter R. That is, "S" has a value of +1, while "L" has a value of -6. The points are plotted with the origin in the center of the drawing area, with x increasing to the right and y increasing downward.
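The decoding rule can be sketched in a few lines of Python (an illustration of the scheme described above, not any particular project's parser):

```python
def decode_hershey_pair(pair):
    """Decode one R-relative Hershey coordinate pair, e.g. "SL" -> (1, -6).

    Each letter's value is its ASCII offset from 'R'; the first letter is
    x (increasing rightward), the second is y (increasing downward), with
    the origin at the center of the drawing area.
    """
    x, y = (ord(c) - ord('R') for c in pair)
    return x, y

print(decode_hershey_pair("SL"))  # (1, -6)
print(decode_hershey_pair("RR"))  # (0, 0) -- the origin
```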

[Hershey font sample]

Typographically, Hershey's designs were commendable; he drew his characters based on historical samples, implemented his own ligatures, and even created multiple optical sizes. Hershey then proceeded to develop four separate styles that each used different numbers of strokes (named "simplex," "duplex," "complex," and "triplex").

The project probably makes Hershey the inventor of "desktop publishing" if not "digital type" itself, Grießhammer said, but Hershey himself is all but forgotten. There is scant information about him online; Grießhammer has still not even been able to locate a photograph (although, he added, Hershey may be one of the unnamed individuals seen in group shots of the NORC room, which can be found online).

Hershey's vector font set has lived on as a subject for computing enthusiasts, however. The source files are in the public domain (a copy of the surviving documents is available from the Ghostscript project, for example) and there are a number of software projects online that can read their peculiar format and reproduce the shapes. At his GitHub page, Grießhammer has links to several of them, such as Kamal Mostafa's libhersheyfont. Inkscape users may also be familiar with the Hershey Text extension, which can generate SVG paths based on a subset of the Hershey fonts. In that form, the paths are suitable for use with various plotters, laser-cutters, or CNC mills; the extension was developed by Evil Mad Scientist Laboratories for use with such devices.

Nevertheless, there has never been an implementation of the designs in PostScript, TrueType, or OpenType format, so they cannot be used to render text in standard widgets or elements. Consequently, Grießhammer set out to create his own. He wrote a script to convert the original vector instructions into Bézier paths in UFO format, then had to associate the resulting shapes with the correct Unicode codepoints—Hershey's work having predated Unicode by decades.

The result is not quite ready for release, he said. Hershey's designs are zero-width paths, which makes sense for drawing with a CRT, but is not how modern outline fonts work. To be usable in TrueType or OpenType form, each line segment needs to be traced in outline form to make a thin rectangle. That can be done, he reported, but he is still working out what outlining options create the most useful final product. The UFO files, though, can be used to create either TrueType or OpenType fonts.
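The outlining step described above — tracing each zero-width segment as a thin rectangle — amounts to offsetting the segment's endpoints along its unit normal. A minimal geometric sketch (a hypothetical helper, not Grießhammer's actual conversion script):

```python
import math

def stroke_segment(p0, p1, width):
    """Return the four corners of a thin rectangle tracing segment p0->p1.

    Offsets both endpoints by width/2 along the segment's unit normal,
    which is the basic operation behind turning a zero-width stroke into
    a fillable outline.
    """
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    nx, ny = -dy / length, dx / length   # unit normal to the segment
    h = width / 2
    return [(x0 + nx * h, y0 + ny * h),
            (x1 + nx * h, y1 + ny * h),
            (x1 - nx * h, y1 - ny * h),
            (x0 - nx * h, y0 - ny * h)]

# A horizontal segment stroked at width 2 becomes a 10x2 rectangle.
print(stroke_segment((0, 0), (10, 0), 2.0))
```

A real converter also has to join consecutive segments and cap stroke ends cleanly, which is presumably where the "outlining options" Grießhammer mentions come in.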

When finished, Grießhammer said, he plans to release the project under an open source license at github.com/adobe-fonts/hershey. He hopes that it will not only be useful, but will also bring some more attention to Hershey himself and his contribution to modern digital publishing.

Comments (7 posted)

Page editor: Jonathan Corbet

Security

Nested NMIs lead to CVE-2015-3290

By Jake Edge
August 26, 2015

Non-maskable interrupts (or NMIs) are a hardware feature that is typically used to signal hardware errors or other unrecoverable faults. They differ from regular interrupts in that they can occur when interrupts are otherwise blocked (i.e. they are not maskable). NMIs can be caused by user-space programs, though, so their handling in the kernel needs to be bulletproof or it can lead to security holes. Since the beginning of 2014, it would seem that NMI handling has been subject to races that allow user-space programs to elevate their privileges—a bug that is known as CVE-2015-3290.

NMIs are used by profiling and debugging tools, such as perf, to determine where in the code the CPU is currently executing. NMIs also get nested, effectively, when an NMI handler causes an exception like a breakpoint or a page fault. Handling that nesting is complicated, which is what went astray and led to the bug.

The first notification about the problem came in a July 22 post to the oss-security mailing list from Andy Lutomirski about a number of NMI-handling security bugs. All are security-related, but one was embargoed to allow distributions to fix it before releasing any details. So he mentioned CVE-2015-3290 without giving any details, though he did include something of a warning: "*Patch your systems*".

The details came in a post-embargo advisory from Lutomirski on August 4. He described the problem in some detail and also provided a proof-of-concept program to tickle the bug. It requires that user space be able to do two things: arrange for nested NMIs to occur and for those NMIs to return to a 16-bit stack, which is normally set up for running 16-bit binaries using programs like DOSEMU. A 16-bit stack can be arranged via the modify_ldt() system call. One way to generate the required NMIs is to run under a heavy perf load, as the proof-of-concept exploit suggests.

The Linux nested-NMI handling relies on a small section of code that needs to be run atomically. That works fine on x86_64 when using iret to return to a 64-bit stack (which effectively does the needed steps in an atomic manner), but when the NMI is returning to a segment with a 16-bit stack, iret does not restore the register state correctly. So the kernel has a workaround (called "espfix64") that tries to handle that situation by doing a complicated stack-switching dance.

That stack switching is where the problem lies. There are approximately 19 instructions during which a second (i.e. nested) NMI will cause the "atomic" section to not be atomic. Furthermore, an attacker who can arrange (or luck into) landing in a two-instruction window within that region will be able to reliably elevate their privileges to those of root. During that window, the attacker controls the address to which the return from interrupt will go. Outside of the window, a nested NMI will cause various failures and crashes, which Lutomirski's exploit fixes up while it waits for one to hit the window:

A careful exploit (attached) can recover from all the crashy failures and can regenerate a valid *privileged* state if a nested NMI occurs during the two-instruction window. This exploit appears to work reasonably quickly across a fairly wide range of Linux versions.

The espfix64 code was added in Linux 3.13, which was released over a year and a half ago in January 2014. Given that Lutomirski's proof of concept works quickly, that means there are (or, hopefully, were) a lot of systems that could be easily affected by this flaw.

The fix uses a "sneaky trick", according to Lutomirski. Instead of checking the value of the 64-bit stack pointer register (i.e. RSP) to see if it points at the NMI stack to determine if there is a nested NMI, a different test is used. As he pointed out, malicious user-space code could point RSP there, issue a system call, then cause an NMI to happen, which would fool the kernel into believing it was processing a nested NMI when it wasn't.

Lutomirski uses the fact that the "direction flag" (DF) bit in the FLAGS register is atomically reset by the iret instruction, so he sets that bit to indicate that the kernel is processing an NMI. His fix also changes the system-call entry point so that a user-space program cannot set DF while it still controls the value of RSP.

CVE-2015-3290 and the rest of the NMI-handling problems that Lutomirski has found seem a little concerning, overall. NMIs are complex beasties and their handling even more so. It would be surprising if there are not other problems lurking there. But, for now, taking Lutomirski's advice should be high on everyone's list.

Comments (5 posted)

Brief items

Security quotes of the fortnight

Google has been ordered by the [UK] Information Commissioner’s office to remove nine links to current news stories about older reports which themselves were removed from search results under the ‘right to be forgotten’ ruling.

The search engine had previously removed links relating to a 10 year-old criminal offence by an individual after requests made under the right to be forgotten ruling. Removal of those links from Google’s search results for the claimant’s name spurred new news posts detailing the removals, which were then indexed by Google’s search engine.

Google refused to remove links to these later news posts, which included details of the original criminal offence, despite them forming part of search results for the claimant’s name, arguing that they are an essential part of a recent news story and in the public interest.

The Guardian

GOP presidential candidate Donald Trump immediately called FOX News to say that the EU's actions are a crude start but adding that, "When I'm president you're going to have a really wonderful censorship system here in the USA. It's going to make those Russian and European systems look like stupid, ugly women. You're going to forget there ever were mass arrests and deportations here. I know how to do censorship. You're going to love the Trump censorship system!"

An EU spokesperson noted that upon finalization of this global RTBF [right to be forgotten] censorship order, all search and other references to articles, stories, or other materials describing this order, including this posting, would be retroactively deleted.

Lauren Weinstein

The Snake Oil Competition (SOC) is an effort organized to identify new craptographic schemes in order to improve on the state-of-the-art, and to encourage the use of snake oil cryptography. Snake Oil cryptography is widely used in practice, but recent events show that more research is urgently needed to fill much needed gaps in the field.

The winner(s) will be invited to a special edition of the Journal of Craptology (JoC). The first prize is a bottle of premium snake oil, and 100 trillion ZWR (Third Zimbabwean Dollar), equivalent to 10^27 ZWD (First Zimbabwean Dollar). The loser will also be invited to the JoC.

snakeoil.cr.yp.to committee

Not just terrorists, but terrorists with a submarine! This is why Ft. Leavenworth, a prison from which no one has ever escaped, is unsuitable for housing Guantanamo detainees.

I've never understood the argument that terrorists are too dangerous to house in US prisons. They're just terrorists, it's not like they're Magneto.

Bruce Schneier reacts to a movie plot threat promulgated by a Kansas senator

TL;DR: doing RSA crypto with a public exponent value of "1" makes crypto very fast. Fast is not always good.
Kurt Seifried
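Seifried's point is easy to demonstrate: with a public exponent of 1, RSA "encryption" is the identity function, since c = m^1 mod n = m for any message smaller than the modulus. A toy illustration in Python (tiny, insecure numbers chosen purely for readability):

```python
p, q = 61, 53          # toy primes -- never anywhere near real key sizes
n = p * q              # modulus
e = 1                  # the broken public exponent

message = 42
ciphertext = pow(message, e, n)   # m**1 mod n
print(ciphertext)                 # 42 -- the "ciphertext" is the plaintext
```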

Comments (none posted)

Stagefright: Mission Accomplished? (Exodus Intelligence)

It would seem that reports of the demise of the Stagefright Android vulnerability may be rather premature. Exodus Intelligence is reporting that at least one of the fixes for integer overflow did not actually fully fix the problem, so MPEG4 files can still crash Android and potentially allow code execution. "Around July 31st, Exodus Intelligence security researcher Jordan Gruskovnjak noticed that there seemed to be a severe problem with the proposed patch. As the code was not yet shipped to Android devices, we had no ability to verify this authoritatively. In the following week, hackers converged in Las Vegas for the annual Black Hat conference during which the Stagefright vulnerability received much attention, both during the talk and at the various parties and events. After the festivities concluded and the supposedly patched firmware was released to the public, Jordan proceeded to investigate whether his assumptions regarding its fallibility were well founded. They were."
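The general failure mode here — an overflow check that does not actually cover the arithmetic that follows — is a common bug class. A schematic sketch (simulating 32-bit C arithmetic in Python; this is not the actual Stagefright code or patch):

```python
MASK32 = 0xFFFFFFFF  # simulate C uint32_t wraparound

def broken_size_check(chunk_size, extra):
    """Validates chunk_size alone; the later addition can still wrap to a
    small value, leading to an undersized buffer allocation."""
    assert chunk_size <= MASK32
    return (chunk_size + extra) & MASK32   # may silently wrap

def fixed_size_check(chunk_size, extra):
    """Detects wraparound by comparing the wrapped sum to an operand
    (assuming extra also fits in 32 bits)."""
    total = (chunk_size + extra) & MASK32
    if total < chunk_size:                 # the sum wrapped: overflow
        raise OverflowError("size overflow")
    return total

print(broken_size_check(0xFFFFFFF0, 0x20))  # 16 -- silently tiny
```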

Comments (37 posted)

Ruoho: Multiple Vulnerabilities in Pocket

On his blog, Clint Ruoho reports on multiple vulnerabilities he found in the Pocket service that saves articles and other web content for reading later on a variety of devices. Pocket integration has been controversially added to Firefox recently, which is what drew his attention to the service. "The full output from server-status then was synced to my Android, and was visible when I switched from web to article view. Apache’s mod_status can provide a great deal of useful information, such as internal source and destination IP address, parameters of URLs currently being requested, and query parameters. For Pocket’s app, the URLs being requested include URLs being viewed by users of the Pocket application, as some of these requests are done as HTTP GETs. These details can be omitted by disabling ExtendedStatus in Apache. Most of Pocket’s backend servers had ExtendedStatus disabled, however it remained enabled on a small subset, which would provide meaningful information to attackers." He was able to get more information, such as the contents of /etc/passwd on Pocket's Amazon EC2 servers. (Thanks to Scott Bronson and Pete Flugstad.)

Comments (30 posted)

Reports from the Linux Security Summit

The Linux Security Summit was held August 20-21 in Seattle, Washington. Unfortunately, that overlapped Linux Plumbers Conference, so LWN was unable to attend. But both James Morris and Paul Moore have nice writeups of the summit. From Morris's: "As with the previous year, we followed a two-day format, with most of the refereed presentations on the first day, with more of a developer focus on the second day. We had good attendance, and also this year had participants from a wider field than the more typical kernel security developer group. We hope to continue expanding the scope of participation next year, as it’s a good opportunity for people from different areas of security, and FOSS, to get together and learn from each other. This was the first year, for example, that we had a presentation on Incident Response, thanks to Sean Gillespie who presented on GRR, a live remote forensics tool initially developed at Google."

Comments (none posted)

New vulnerabilities

audit: unsafe escape-sequence handling

Package(s):audit CVE #(s):CVE-2015-5186
Created:August 19, 2015 Updated:August 31, 2015
Description:

From the CVE entry:

When auditing the filesystem the names of files are logged. These filenames can contain escape sequences, when viewed using the ausearch programs "-i" option for example this can result in the escape sequences being processed unsafely by the terminal program being used to view the data.

Alerts:
Fedora FEDORA-2015-13526 audit 2015-08-19
Fedora FEDORA-2015-13471 audit 2015-08-19
Mageia MGASA-2015-0333 audit 2015-08-30

Comments (none posted)

conntrack: denial of service

Package(s):conntrack CVE #(s):CVE-2015-6496
Created:August 20, 2015 Updated:January 4, 2016
Description: From the Debian advisory:

It was discovered that in certain configurations, if the relevant conntrack kernel module is not loaded, conntrackd will crash when handling DCCP, SCTP or ICMPv6 packets.

Alerts:
Fedora FEDORA-2015-1aee5e6f0b conntrack-tools 2016-01-03
Fedora FEDORA-2015-5eb2131441 conntrack-tools 2016-01-03
openSUSE openSUSE-SU-2015:1688-1 conntrack-tools 2015-10-06
Debian-LTS DLA-295-1 conntrack 2015-08-19
Debian DSA-3341-1 conntrack 2015-08-20
Mageia MGASA-2015-0363 conntrack-tools 2015-09-13

Comments (none posted)

extplorer: cross-site scripting

Package(s):extplorer CVE #(s):CVE-2015-0896
Created:August 24, 2015 Updated:May 4, 2016
Description: From the CVE entry:

Multiple cross-site scripting (XSS) vulnerabilities in eXtplorer before 2.1.7 allow remote attackers to inject arbitrary web script or HTML via unspecified vectors.

Alerts:
Debian-LTS DLA-453-1 extplorer 2016-05-03
Debian-LTS DLA-296-1 extplorer 2015-08-21

Comments (none posted)

firefox: multiple vulnerabilities

Package(s):firefox CVE #(s):CVE-2015-4473 CVE-2015-4474 CVE-2015-4475 CVE-2015-4477 CVE-2015-4478 CVE-2015-4479 CVE-2015-4480 CVE-2015-4481 CVE-2015-4482 CVE-2015-4483 CVE-2015-4484 CVE-2015-4485 CVE-2015-4486 CVE-2015-4487 CVE-2015-4488 CVE-2015-4489 CVE-2015-4490 CVE-2015-4491 CVE-2015-4492 CVE-2015-4493 CVE-2015-4495
Created:August 17, 2015 Updated:December 2, 2015
Description:
     * MFSA 2015-79/CVE-2015-4473/CVE-2015-4474 Miscellaneous memory safety
       hazards
     * MFSA 2015-80/CVE-2015-4475 (bmo#1175396) Out-of-bounds read with
       malformed MP3 file
     * MFSA 2015-81/CVE-2015-4477 (bmo#1179484) Use-after-free in MediaStream
       playback
     * MFSA 2015-82/CVE-2015-4478 (bmo#1105914) Redefinition of
       non-configurable JavaScript object properties
     * MFSA 2015-83/CVE-2015-4479/CVE-2015-4480/CVE-2015-4493 Overflow issues
       in libstagefright
     * MFSA 2015-84/CVE-2015-4481 (bmo#1171518) Arbitrary file overwriting
       through Mozilla Maintenance Service with hard links (only affected
       Windows)
     * MFSA 2015-85/CVE-2015-4482 (bmo#1184500) Out-of-bounds write with
       Updater and malicious MAR file (does not affect openSUSE RPM packages
       which do not ship the updater)
     * MFSA 2015-86/CVE-2015-4483 (bmo#1148732) Feed protocol with POST
       bypasses mixed content protections
     * MFSA 2015-87/CVE-2015-4484 (bmo#1171540) Crash when using shared
       memory in JavaScript
     * MFSA 2015-88/CVE-2015-4491 (bmo#1184009) Heap overflow in gdk-pixbuf
       when scaling bitmap images
     * MFSA 2015-89/CVE-2015-4485/CVE-2015-4486 (bmo#1177948, bmo#1178148)
       Buffer overflows on Libvpx when decoding WebM video
     * MFSA 2015-90/CVE-2015-4487/CVE-2015-4488/CVE-2015-4489 Vulnerabilities
       found through code inspection
     * MFSA 2015-91/CVE-2015-4490 (bmo#1086999) Mozilla Content Security
       Policy allows for asterisk wildcards in violation of CSP specification
     * MFSA 2015-92/CVE-2015-4492 (bmo#1185820) Use-after-free in
       XMLHttpRequest with shared workers
Alerts:
Gentoo 201512-10 firefox 2015-12-30
Gentoo 201605-06 nss 2016-05-31
openSUSE openSUSE-SU-2016:0876-1 thunderbird 2016-03-24
Mageia MGASA-2016-0105 firefox 2016-03-09
Debian DSA-3410-1 icedove 2015-12-01
SUSE SUSE-SU-2015:2081-1 firefox 2015-11-23
Fedora FEDORA-2015-13436 firefox 2015-08-18
Slackware SSA:2015-226-01 firefox 2015-08-14
openSUSE openSUSE-SU-2015:1390-1 firefox 2015-08-14
Fedora FEDORA-2015-13397 firefox 2015-08-15
openSUSE openSUSE-SU-2015:1389-1 firefox 2015-08-14
CentOS CESA-2015:1682 thunderbird 2015-08-25
SUSE SUSE-SU-2015:1528-1 MozillaFirefox, mozilla-nss 2015-09-10
CentOS CESA-2015:1682 thunderbird 2015-08-25
openSUSE openSUSE-SU-2015:1454-1 thunderbird 2015-08-28
SUSE SUSE-SU-2015:1449-1 MozillaFirefox, mozilla-nss 2015-08-28
openSUSE openSUSE-SU-2015:1453-1 thunderbird 2015-08-28
CentOS CESA-2015:1682 thunderbird 2015-08-25

Comments (none posted)

golang: HTTP request smuggling

Package(s):golang CVE #(s):CVE-2015-5739 CVE-2015-5740 CVE-2015-5741
Created:August 18, 2015 Updated:July 28, 2016
Description: From the Red Hat bugzilla entry:

There have been found potentially exploitable flaws in Golang net/http library affecting versions 1.4.2 and 1.5.

Problems:
* Double Content-length headers in a request does not generate a 400 error, the second Content-length is ignored.
* Invalid headers are parsed as valid headers (like "Content Length:" with a space in the middle)
Exploitations:
In a situation where the net/http agent HTTP communication with the final http clients is using some reverse proxy (reverse proxy cache, SSL terminators, etc), some requests can be made exploiting the net/http HTTP protocol violations.

Alerts:
openSUSE openSUSE-SU-2016:1894-1 go 2016-07-27
Fedora FEDORA-2015-15618 golang 2015-10-01
Fedora FEDORA-2015-15619 golang 2015-10-01
Fedora FEDORA-2015-13002 golang 2015-08-18
Fedora FEDORA-2015-12957 golang 2015-08-18

Comments (none posted)

jasper: denial of service

Package(s):jasper CVE #(s):CVE-2015-5203
Created:August 26, 2015 Updated:September 19, 2016
Description: From the Arch Linux advisory:

A double free issue has been discovered in the function jasper_image_stop_load. This vulnerability can be triggered by loading a specially crafted image through jasper.

A remote attacker is able to send a specially crafted image that triggers a double free leading to denial of service.

Alerts:
openSUSE openSUSE-SU-2016:2737-1 jasper 2016-11-05
openSUSE openSUSE-SU-2016:2722-1 jasper 2016-11-04
Fedora FEDORA-2016-bbecf64af4 jasper 2016-09-21
Fedora FEDORA-2016-5a7e745a56 jasper 2016-09-18
Mageia MGASA-2016-0298 jasper 2016-09-16
Fedora FEDORA-2016-7776983633 jasper 2016-08-15
Arch Linux ASA-201612-9 jasper 2016-12-09
openSUSE openSUSE-SU-2016:2833-1 jasper 2016-11-17
Arch Linux ASA-201508-10 jasper 2015-08-26

Comments (none posted)

kdepim: no attachment encryption

Package(s):kdepim CVE #(s):CVE-2014-8878
Created:August 18, 2015 Updated:August 26, 2015
Description: From the Mageia advisory:

This update fixes a security vulnerability in kdepim : kmail doesn't encrypt attachments when "automatic encryption" is selected

Alerts:
Mageia MGASA-2015-0315 kdepim 2015-08-18

Comments (none posted)

libstruts1.2-java: unclear vulnerability

Package(s):libstruts1.2-java CVE #(s):CVE-2014-0899
Created:August 18, 2015 Updated:August 26, 2015
Description: From the Debian-LTS advisory:

The Validator in Apache Struts 1.1 and later contains a function to efficiently define rules for input validation across multiple pages during screen transitions. This function contains a vulnerability where input validation may be bypassed. When the Apache Struts 1 Validator is used, the web application may be vulnerable even when this function is not used explicitly.

Alerts: (No alerts in the database for this vulnerability)

Comments (none posted)

mediawiki: multiple vulnerabilities

Package(s):mediawiki CVE #(s):
Created:August 24, 2015 Updated:August 26, 2015
Description: From the Mediawiki advisory:

I would like to announce the release of MediaWiki 1.25.2, 1.24.3, and 1.23.10.

* Internal review discovered that Special:DeletedContributions did not properly protect the IP of autoblocked users. This fix makes the functionality of Special:DeletedContributions consistent with Special:Contributions and Special:BlockList.

* Internal review discovered that watchlist anti-csrf tokens were not being compared in constant time, which could allow various timing attacks. This could allow an attacker to modify a user's watchlist via csrf.

* John Menerick reported that MediaWiki's thumb.php failed to sanitize various error messages, resulting in xss.

Additionally, several extensions have been updated to fix security issues.

Alerts: (No alerts in the database for this vulnerability)

Comments (none posted)

mysql: unspecified vulnerability

Package(s):rh-mysql56-mysql CVE #(s):CVE-2015-4756
Created:August 17, 2015 Updated:August 26, 2015
Description: From the Red Hat advisory:

CVE-2015-4756 mysql: unspecified vulnerability related to Server:InnoDB

Alerts:
Gentoo 201610-06 mysql 2016-10-11
openSUSE openSUSE-SU-2015:1629-1 mysql-community-server 2015-09-25
Red Hat RHSA-2015:1646-01 rh-mariadb100-mariadb 2015-08-20
Red Hat RHSA-2015:1630-01 rh-mysql56-mysql 2015-08-17

Comments (none posted)

nagios-plugins: three vulnerabilities

Package(s):nagios-plugins CVE #(s):CVE-2014-4702 CVE-2014-4701 CVE-2014-4703
Created:August 18, 2015 Updated:August 26, 2015
Description: From a Red Hat bugzilla entry:

CVE-2014-4702: Similar to the CVE-2014-4701 issue in the check_dhcp plug-in, the same flaw was found to affect check_icmp. A local attacker could obtain sensitive information by using this flaw to read parts of INI configuration files that belong to the root user.

From another Red Hat bugzilla entry:

CVE-2014-4701, CVE-2014-4703: It was reported that check_dhcp plugin allow local unprivileged user to read parts of INI config files belonging to root on a local system. It could allow an attacker to obtain sensitive information like passwords that should only be accessible by root user. The vulnerability is due to check_dhcp plugin having Root SUID permissions and inappropriate access control when reading user provided config file (through --extra-opts= option).

Alerts:
Fedora FEDORA-2015-12987 nagios-plugins 2015-08-18
Fedora FEDORA-2015-12972 nagios-plugins 2015-08-18

Comments (none posted)

net-snmp: code execution

Package(s):net-snmp CVE #(s):CVE-2015-5621
Created:August 18, 2015 Updated:September 8, 2015
Description: From the Red Hat advisory:

It was discovered that the snmp_pdu_parse() function could leave incompletely parsed varBind variables in the list of variables. A remote, unauthenticated attacker could use this flaw to crash snmpd or, potentially, execute arbitrary code on the system with the privileges of the user running snmpd. (CVE-2015-5621)

Alerts:
Ubuntu USN-2711-1 net-snmp 2015-08-17
Scientific Linux SLSA-2015:1636-1 net-snmp 2015-08-17
Oracle ELSA-2015-1636 net-snmp 2015-08-17
Oracle ELSA-2015-1636 net-snmp 2015-08-17
Mandriva MDVSA-2015:229 net-snmp 2015-05-06
Mageia MGASA-2015-0187 net-snmp 2015-05-05
CentOS CESA-2015:1636 net-snmp 2015-08-17
CentOS CESA-2015:1636 net-snmp 2015-08-17
Red Hat RHSA-2015:1636-01 net-snmp 2015-08-17
openSUSE openSUSE-SU-2015:1502-1 net-snmp 2015-09-07

Comments (none posted)

openshift: privilege escalation

Package(s):openshift CVE #(s):CVE-2015-5222
Created:August 21, 2015 Updated:August 26, 2015
Description: From the Red Hat advisory:

An improper permission check issue was discovered in the server admission control component in OpenShift. A user with build permissions could use this flaw to execute arbitrary shell commands on a build pod with the privileges of the root user.

Alerts:
Red Hat RHSA-2015:1650-01 openshift 2015-08-20

Comments (none posted)

openssh: multiple vulnerabilities

Package(s):openssh CVE #(s):CVE-2015-6565 CVE-2015-6563 CVE-2015-6564
Created:August 19, 2015 Updated:August 26, 2015
Description:

From the OpenSSH release notes:

sshd(8): OpenSSH 6.8 and 6.9 incorrectly set TTYs to be world- writable. Local attackers may be able to write arbitrary messages to logged-in users, including terminal escape sequences. Reported by Nikolay Edigaryev. (CVE-2015-6565)

sshd(8): Portable OpenSSH only: Fixed a privilege separation weakness related to PAM support. Attackers who could successfully compromise the pre-authentication process for remote code execution and who had valid credentials on the host could impersonate other users. Reported by Moritz Jodeit. (CVE-2015-6563)

sshd(8): Portable OpenSSH only: Fixed a use-after-free bug related to PAM support that was reachable by attackers who could compromise the pre-authentication process for remote code execution. Also reported by Moritz Jodeit. (CVE-2015-6564)

Alerts:
Scientific Linux SLSA-2015:2088-6 openssh 2015-12-21
Scientific Linux SLSA-2016:0741-1 openssh 2016-06-08
Red Hat RHSA-2016:0741-01 openssh 2016-05-10
Gentoo 201512-04 openssh 2015-12-21
Red Hat RHSA-2015:2088-06 openssh 2015-11-19
SUSE SUSE-SU-2015:1581-1 openssh 2015-09-21
Mageia MGASA-2015-0321 openssh 2015-08-21
Fedora FEDORA-2015-13520 openssh 2015-08-19
Fedora FEDORA-2015-13469 openssh 2015-08-27

Comments (none posted)

openstack-neutron: denial of service

Package(s):openstack-neutron CVE #(s):CVE-2015-3221
Created:August 25, 2015 Updated:August 26, 2015
Description: From the Red Hat advisory:

A Denial of Service flaw was found in the L2 agent when using the IPTables firewall driver. By submitting an address pair that will be rejected as invalid by the ipset tool, an attacker may cause the agent to crash.

Alerts:
Red Hat RHSA-2015:1680-01 openstack-neutron 2015-08-24

Comments (none posted)

owncloud: three vulnerabilities

Package(s):owncloud CVE #(s):CVE-2015-4715 CVE-2015-4717 CVE-2015-4718
Created:August 14, 2015 Updated:August 26, 2015
Description: From the Mageia advisory:

In ownCloud before 6.0.8 and 8.0.4, a bug in the SDK used to connect ownCloud against the Dropbox server might allow the owner of "Dropbox.com" to gain access to any files on the ownCloud server if an external Dropbox storage was mounted (CVE-2015-4715).

In ownCloud before 6.0.8 and 8.0.4, the sanitization component for filenames was vulnerable to DoS when parsing specially crafted file names passed via specific endpoints. Effectively this lead to a endless loop filling the log file until the system is not anymore responsive (CVE-2015-4717).

In ownCloud before 6.0.8 and 8.0.4, the external SMB storage of ownCloud was not properly neutralizing all special elements which allows an adversary to execute arbitrary SMB commands. This was caused by improperly sanitizing the ";" character which is interpreted as command separator by smbclient (the used software to connect to SMB shared by ownCloud). Effectively this allows an attacker to gain access to any file on the system or overwrite it, finally leading to a PHP code execution in the case of ownCloud’s config file (CVE-2015-4718).

Alerts:
Debian DSA-3373-1 owncloud 2015-10-18
Mageia MGASA-2015-0314 owncloud 2015-08-13

Comments (none posted)

pcre: code execution

Package(s):pcre CVE #(s):CVE-2015-8381
Created:August 14, 2015 Updated:December 2, 2015
Description: From the Red Hat bugzilla entry:

Latest version of PCRE is prone to a Heap Overflow vulnerability which could caused by the following regular expression.

    /(?J:(?|(:(?|(?'R')(\k'R')|((?'R')))H'Rk'Rf)|s(?'R'))))/
Alerts:
Red Hat RHSA-2016:2750-01 rh-php56 2016-11-15
Gentoo 201607-02 libpcre 2016-07-09
Red Hat RHSA-2016:1132-01 rh-mariadb100-mariadb 2016-05-26
openSUSE openSUSE-SU-2016:3099-1 pcre 2016-12-12
Ubuntu USN-2943-1 pcre3 2016-03-29
Arch Linux ASA-201508-11 pcre 2015-08-26
Fedora FEDORA-2015-12921 pcre 2015-08-13
Mageia MGASA-2015-0343 pcre 2015-09-08
Fedora FEDORA-2015-14242 pcre 2015-09-11
Fedora FEDORA-2015-14235 pcre 2015-09-11

Comments (none posted)

php: multiple vulnerabilities

Package(s):php CVE #(s):
Created:August 24, 2015 Updated:August 26, 2015
Description: The php package has been updated to version 5.6.12, fixing several bugs and security issues. See the upstream Changelog for more details.

Also 5.5.28 has been released: upstream changelog.

Alerts: (No alerts in the database for this vulnerability)

Comments (none posted)

python-django: multiple vulnerabilities

Package(s):python-django CVE #(s):CVE-2015-5963 CVE-2015-5964
Created:August 19, 2015 Updated:October 16, 2015
Description:

From the Debian advisory:

Lin Hua Cheng discovered that a session could be created when anonymously accessing the django.contrib.auth.views.logout view. This could allow remote attackers to saturate the session store or cause other users' session records to be evicted.

Additionally the contrib.sessions.backends.base.SessionBase.flush() and cache_db.SessionStore.flush() methods have been modified to avoid creating a new empty session as well.

Alerts:
Fedora FEDORA-2015-1dd5bc998f python-django 2015-11-19
Red Hat RHSA-2015:1894-01 python-django 2015-10-15
Red Hat RHSA-2015:1876-01 python-django 2015-10-08
openSUSE openSUSE-SU-2015:1598-1 python-django 2015-09-22
openSUSE openSUSE-SU-2015:1580-1 python-Django 2015-09-19
Mageia MGASA-2015-0327 python-django, python-django14 2015-08-27
Arch Linux ASA-201508-9 python-django 2015-08-25
Ubuntu USN-2720-1 python-django 2015-08-18
Debian DSA-3338-1 python-django 2015-08-18
Red Hat RHSA-2015:1766-01 python-django 2015-09-10
Debian-LTS DLA-301-1 python-django 2015-08-26
Red Hat RHSA-2015:1767-01 python-django 2015-09-10

Comments (none posted)

python-django-horizon: cross-site scripting

Package(s):python-django-horizon CVE #(s):CVE-2015-3219 CVE-2015-3988
Created:August 25, 2015 Updated:August 26, 2015
Description: From the CVE entries:

Cross-site scripting (XSS) vulnerability in the Orchestration/Stack section in OpenStack Dashboard (Horizon) 2014.2 before 2014.2.4 and 2015.1.x before 2015.1.1 allows remote attackers to inject arbitrary web script or HTML via the description parameter in a heat template, which is not properly handled in the help_text attribute in the Field class. (CVE-2015-3219)

Multiple cross-site scripting (XSS) vulnerabilities in OpenStack Dashboard (Horizon) 2015.1.0 allow remote authenticated users to inject arbitrary web script or HTML via the metadata to a (1) Glance image, (2) Nova flavor or (3) Host Aggregate. (CVE-2015-3988)

Alerts:
Debian DSA-3617-1 horizon 2016-07-06
Red Hat RHSA-2015:1679-01 python-django-horizon 2015-08-24

Comments (none posted)

qemu: two vulnerabilities

Package(s):qemu CVE #(s):CVE-2015-5166 CVE-2015-5165
Created:August 18, 2015 Updated:September 28, 2015
Description: From a Red Hat bugzilla entry:

CVE-2015-5165: Qemu emulator built with the RTL8139 emulation support is vulnerable to an information leakage flaw. It could occur while processing network packets under RTL8139 controller's C+ mode of operation.

A guest user could use this flaw to read uninitialised Qemu heap memory of up to 65K bytes.

From another Red Hat bugzilla entry:

CVE-2015-5166: Qemu emulator built with the IDE Emulation PCI PIIX3/4 support is vulnerable to a use-after-free flaw. It could occur when trying to write data to an I/O port inside guest. This issue is specific to the Xen platform.

A privileged (CAP_SYS_RAWIO) guest user on the Xen platform could use this flaw to crash the Qemu instance or potentially achieve a guest escape.

Alerts:
Oracle ELSA-2016-0997 qemu-kvm 2016-05-17
Debian-LTS DLA-479-1 xen 2016-05-18
Mageia MGASA-2016-0098 xen 2016-03-07
openSUSE openSUSE-SU-2015:2003-1 xen 2015-11-17
openSUSE openSUSE-SU-2015:1964-1 xen 2015-11-12
SUSE SUSE-SU-2015:1643-1 Xen 2015-09-25
Fedora FEDORA-2015-15946 xen 2015-09-26
Fedora FEDORA-2015-15944 xen 2015-09-27
Scientific Linux SLSA-2015:1833-1 qemu-kvm 2015-09-22
Oracle ELSA-2015-1833 qemu-kvm 2015-09-22
CentOS CESA-2015:1833 qemu-kvm 2015-09-22
Red Hat RHSA-2015:1833-01 qemu-kvm 2015-09-22
Scientific Linux SLSA-2015:1793-1 qemu-kvm 2015-09-15
Oracle ELSA-2015-1793 qemu-kvm 2015-09-15
Red Hat RHSA-2015:1793-01 qemu-kvm 2015-09-15
Mageia MGASA-2015-0368 qemu 2015-09-15
Ubuntu USN-2724-1 qemu, qemu-kvm 2015-08-27
Red Hat RHSA-2015:1683-01 qemu-kvm-rhev 2015-08-25
Red Hat RHSA-2015:1674-01 qemu-kvm-rhev 2015-08-24
SUSE SUSE-SU-2015:1421-1 xen 2015-08-21
Fedora FEDORA-2015-13402 qemu 2015-08-18
Debian DSA-3349-1 qemu-kvm 2015-09-02
Debian DSA-3348-1 qemu 2015-09-02
Mageia MGASA-2015-0369 qemu 2015-09-15
Red Hat RHSA-2015:1718-01 qemu-kvm-rhev 2015-09-03
SUSE SUSE-SU-2015:1479-2 xen 2015-09-02
SUSE SUSE-SU-2015:1479-1 xen 2015-09-02
Fedora FEDORA-2015-13404 qemu 2015-09-01

Comments (none posted)

request-tracker4: cross-site scripting

Package(s):request-tracker4 CVE #(s):CVE-2015-5475
Created:August 13, 2015 Updated:August 26, 2015
Description: From the Debian advisory:

It was discovered that Request Tracker, an extensible trouble-ticket tracking system is susceptible to a cross-site scripting attack via the user [and] group rights management pages (CVE-2015-5475) and via the cryptography interface, allowing an attacker with a carefully-crafted key to inject JavaScript into RT's user interface. Installations which use neither GnuPG nor S/MIME are unaffected by the second cross-site scripting vulnerability.

Alerts:
Debian DSA-3335-1 request-tracker4 2015-08-13
Fedora FEDORA-2015-13664 rt 2015-08-27
Fedora FEDORA-2015-13718 rt 2015-08-27

Comments (none posted)

roundup: multiple vulnerabilities

Package(s):roundup CVE #(s):CVE-2012-6130 CVE-2012-6131 CVE-2012-6132 CVE-2012-6133
Created:August 24, 2015 Updated:August 26, 2015
Description: From the CVE entries:

Cross-site scripting (XSS) vulnerability in the history display in Roundup before 1.4.20 allows remote attackers to inject arbitrary web script or HTML via a username, related to generating a link. (CVE-2012-6130)

Cross-site scripting (XSS) vulnerability in cgi/client.py in Roundup before 1.4.20 allows remote attackers to inject arbitrary web script or HTML via the @action parameter to support/issue1. (CVE-2012-6131)

Cross-site scripting (XSS) vulnerability in Roundup before 1.4.20 allows remote attackers to inject arbitrary web script or HTML via the otk parameter. (CVE-2012-6132)

From the Debian LTS advisory:

XSS flaws in ok and error messages
We solve this differently from the proposals in the bug-report by not allowing *any* html-tags in ok/error messages anymore. (CVE-2012-6133)

Alerts:
Debian-LTS DLA-298-1 roundup 2015-08-23

Comments (none posted)

ruby: information disclosure

Package(s):ruby1.8 CVE #(s):CVE-2009-5147
Created:August 26, 2015 Updated:December 17, 2015
Description: From the Debian LTS advisory:

"sheepman" fixed a vulnerability in Ruby 1.8: DL::dlopen could open a library with tainted name even if $SAFE > 0.

Alerts:
Fedora FEDORA-2015-c4409eb73a ruby 2016-01-08
Fedora FEDORA-2015-eef21b972e ruby 2015-12-29
Arch Linux ASA-201512-11 ruby 2015-12-17
Debian-LTS DLA-300-1 ruby1.9.1 2015-08-26
Debian-LTS DLA-299-1 ruby1.8 2015-08-26

Comments (none posted)

strongswan: incorrect payload processing

Package(s):strongswan CVE #(s):CVE-2015-3991
Created:August 19, 2015 Updated:August 26, 2015
Description:

From the Fedora advisory:

Incorrect payload processing for different IKE versions.

Alerts:
Fedora FEDORA-2015-5279 strongswan 2015-08-19
Fedora FEDORA-2015-5247 strongswan 2015-08-19

Comments (none posted)

twig: code execution

Package(s):twig CVE #(s):
Created:August 26, 2015 Updated:August 26, 2015
Description: From the Debian advisory:

James Kettle, Alain Tiemblo, Christophe Coevoet and Fabien Potencier discovered that twig, a templating engine for PHP, did not correctly process its input. End users allowed to submit twig templates could use specially crafted code to trigger remote code execution, even in sandboxed templates.

Alerts:
Debian DSA-3343-1 twig 2015-08-26

Comments (none posted)

uwsgi: denial of service

Package(s):uwsgi CVE #(s):
Created:August 18, 2015 Updated:August 26, 2015
Description: From the uwsgi announcement:

Hi, an emergency release fixing an HTTPS resource leak (spotted by André Cruz) is available

http://uwsgi-docs.readthedocs.org/en/latest/Changelog-2.0.11.1.html

If you use the uWSGI https router you should upgrade to avoid excessive file descriptors and memory allocation.

Alerts:
Fedora FEDORA-2015-12032 uwsgi 2015-08-18
Fedora FEDORA-2015-12020 uwsgi 2015-08-18

Comments (none posted)

virtualbox: unspecified vulnerability

Package(s):virtualbox CVE #(s):CVE-2015-2594
Created:August 18, 2015 Updated:September 14, 2015
Description: From the SUSE bug tracker:

Unspecified vulnerability in the Oracle VM VirtualBox component in Oracle Virtualization VirtualBox prior to 4.0.32, 4.1.40, 4.2.32, and 4.3.30 allows local users to affect confidentiality, integrity, and availability via unknown vectors related to Core.

Alerts:
Debian-LTS DLA-313-1 virtualbox-ose 2015-09-29
openSUSE openSUSE-SU-2015:1400-1 virtualbox 2015-08-18
Debian DSA-3359-1 virtualbox 2015-09-13

Comments (none posted)

vlc: code execution

Package(s):vlc CVE #(s):CVE-2015-5949
Created:August 20, 2015 Updated:February 17, 2016
Description: From the Debian advisory:

Loren Maggiore of Trail of Bits discovered that the 3GP parser of VLC, a multimedia player and streamer, could dereference an arbitrary pointer due to insufficient restrictions on a writable buffer. This could allow remote attackers to execute arbitrary code via crafted 3GP files.

Alerts:
Gentoo 201603-08 vlc 2016-03-12
openSUSE openSUSE-SU-2016:0476-1 vlc 2016-02-16
Debian DSA-3342-1 vlc 2015-08-20
Mageia MGASA-2015-0324 vlc 2015-08-25
Mageia MGASA-2015-0329 vlc 2015-08-27

Comments (none posted)

webkitgtk4: three unspecified vulnerabilities

Package(s):webkitgtk4 CVE #(s):
Created:August 18, 2015 Updated:August 26, 2015
Description: From the Fedora advisory:

WebKitGTK+ 2.8.5 includes fixes for 3 security issues.

Alerts:
Fedora FEDORA-2015-13001 webkitgtk4 2015-08-18

Comments (none posted)

wireshark: multiple vulnerabilities

Package(s):wireshark CVE #(s):
Created:August 24, 2015 Updated:August 26, 2015
Description: From the openSUSE advisory:

Wireshark was updated to fix several security vulnerabilities and bugs.

- Wireshark 1.12.7 [boo#941500] The following vulnerabilities have been fixed:

* Wireshark could crash when adding an item to the protocol tree. wnpa-sec-2015-21

* Wireshark could attempt to free invalid memory. wnpa-sec-2015-22

* Wireshark could crash when searching for a protocol dissector. wnpa-sec-2015-23

* The ZigBee dissector could crash. wnpa-sec-2015-24

* The GSM RLC/MAC dissector could go into an infinite loop. wnpa-sec-2015-25

* The WaveAgent dissector could crash. wnpa-sec-2015-26

* The OpenFlow dissector could go into an infinite loop. wnpa-sec-2015-27

* Wireshark could crash due to invalid ptvcursor length checking. wnpa-sec-2015-28

* The WCCP dissector could crash. wnpa-sec-2015-29

* Further bug fixes and updated protocol support as listed in: https://www.wireshark.org/docs/relnotes/wireshark-1.12.7....

Alerts: (No alerts in the database for this vulnerability)

Comments (none posted)

zendframework: XML external entity attack

Package(s):zendframework CVE #(s):CVE-2015-5161
Created:August 20, 2015 Updated:September 15, 2015
Description: From the Debian advisory:

Dawid Golunski discovered that when running under PHP-FPM in a threaded environment, Zend Framework, a PHP framework, did not properly handle XML data in multibyte encoding. This could be used by remote attackers to perform an XML External Entity attack via crafted XML data.

Alerts:
SUSE SUSE-SU-2016:1638-1 php53 2016-06-21
Debian-LTS DLA-499-1 php5 2016-05-31
Fedora FEDORA-2015-f1e18131bc php-ZendFramework 2015-11-09
Fedora FEDORA-2015-6d70a701bf php-ZendFramework 2015-11-09
Fedora FEDORA-2015-2e7c06c639 php-ZendFramework 2015-11-08
Debian DSA-3340-1 zendframework 2015-08-19
Fedora FEDORA-2015-13488 php-guzzle-Guzzle 2015-08-27
Fedora FEDORA-2015-13488 php-ZendFramework2 2015-08-27
Fedora FEDORA-2015-13529 php-guzzle-Guzzle 2015-08-27
Mageia MGASA-2015-0370 php-ZendFramework 2015-09-15
Mageia MGASA-2015-0371 php-ZendFramework 2015-09-15
Fedora FEDORA-2015-13529 php-ZendFramework2 2015-08-27
Debian-LTS DLA-302-1 zendframework 2015-08-27

Comments (none posted)

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 4.2-rc8, released on August 23. In the end, Linus decided to wait one more week before putting out the final 4.2 release. "It's not like there are any real outstanding issues, and I waffled between just doing the release and doing another -rc. But we did have another low-level x86 issue come up this week, and together with the fact that a number of people are on vacation, I decided that waiting an extra week isn't going to hurt. But it was close. It's a fairly small rc8, and I really feel like it could have gone either way."

Previously, 4.2-rc7 came out on August 16.

Stable updates: 4.1.6, 3.14.51, and 3.10.87 were released on August 17.

Comments (none posted)

The bcachefs filesystem

Kent Overstreet, author of the bcache block caching layer, has announced that bcache has metamorphosed into a fully featured copy-on-write filesystem. "Well, years ago (going back to when I was still at Google), I and the other people working on bcache realized that what we were working on was, almost by accident, a good chunk of the functionality of a full blown filesystem - and there was a really clean and elegant design to be had there if we took it and ran with it. And a fast one - the main goal of bcachefs is to match ext4 and xfs on performance and reliability, but with the features of btrfs/zfs."

Comments (94 posted)

Kernel development news

The bcachefs filesystem

By Jonathan Corbet
August 25, 2015
The Linux kernel does not lack for filesystem support; many dozens of filesystem implementations are available for one use case or another. But, after all these years, Linux arguably lacks an established "next-generation" filesystem with advanced features and a design suited to contemporary hardware. That situation holds despite the existence of a number of competitors for that title; Btrfs remains at the top of the list, but others, such as tux3 and (still!) reiser4, are out there as well. In each case, it has taken rather longer than expected for the code to reach the required level of maturity. The list of putative next-generation filesystems has just gotten longer with the recent announcement of the "bcachefs" filesystem.

Bcachefs is an extension of bcache, which first appeared in LWN in 2010. Bcache was designed as a caching layer that improves block I/O performance by using a fast solid-state drive as a cache for a (slower, larger) underlying storage device. Bcache has been steadily developed over the last five years; it was merged into the mainline kernel during the 3.10 development cycle in 2013.

Mainline bcache is not a filesystem; instead, it looks like a special kind of block device. It manages the movement of blocks of data between fast and slow storage, working to ensure that the most frequently used data is kept on the faster device. This task is complex; bcache must manage data in a way that yields high performance while ensuring that no data is ever lost, even in the face of an unclean shutdown. Even so, at its interface to the rest of the system, bcache looks like a simple block device: give it numbered blocks of data, and it will store (and retrieve) them.

Users typically want something a bit higher-level than that; they want to be able to organize blocks into files, and files into directory hierarchies. That task is handled by a filesystem like ext4 or Btrfs. Thus, on current systems, bcache will be used in conjunction with a filesystem layer to provide a complete solution.

It seems that, over time, bcache has developed the potential to provide filesystem functionality on its own. In the bcachefs announcement, Kent Overstreet said:

Well, years ago (going back to when I was still at Google), I and the other people working on bcache realized that what we were working on was, almost by accident, a good chunk of the functionality of a full blown filesystem - and there was a really clean and elegant design to be had there if we took it and ran with it.

The actual running with this idea appears to have happened relatively recently, with the first publicly visible version of the bcachefs code being committed to the bcache repository in May 2015. Since then, it has seen a steady stream of commits from Kent; it was announced on the bcache mailing list in mid-July, and on linux-kernel just over a month later.

With the bcachefs code added, bcache has gained the namespace and file-management features that, until now, had to be supplied by a separate filesystem layer. Like Btrfs, it is a copy-on-write filesystem, meaning that data is never overwritten. Instead, a block that is overwritten moves to a new location, with the older version persisting as long as any references to it remain. Copy-on-write works well on solid-state storage devices and makes a number of advanced features relatively easy to implement.

Since the original bcache was a block-device management layer, bcachefs has some strong features in this area. Naturally, it offers multi-tier hybrid caching of data, and is able to integrate multiple physical devices into a single logical volume. Bcachefs does not appear to have any sort of higher-level RAID capability at this time, though; a basic replication mechanism is "like 80% done". Features like data checksumming and compression are supported.

The plans for the future include filesystem features like snapshots — an important Btrfs feature that is not yet available in bcachefs. Kent listed erasure coding as well, presumably as an alternative to higher-level RAID support. Native support for shingled magnetic recording drives is on the list, as is support for working with raw flash storage directly.

But none of those features are present in bcachefs now; work has been focused on getting the basic filesystem working in a reliable manner. Performance tuning has not been a priority thus far, but the filesystem claims reasonable performance numbers already — though, as Kent admitted, it suffers from the common (to copy-on-write filesystems) problem of "filling up" well before the underlying storage is actually filled with data. Importantly, the on-disk filesystem format has not yet been finalized — a clear sign that a filesystem is not yet ready for real-world use.

Another important (though unlisted) missing feature is a filesystem integrity checker ("fsck") utility.

Bcachefs looks like a promising filesystem, even if many of the intended features have not yet been implemented. But those who have watched filesystem development for any period of time will know what comes next: a surprisingly long wait while the code matures to the point that it can actually be trusted for production workloads. This process, it seems, cannot be hurried beyond a certain point; that is why other next-generation filesystem efforts are seemingly never quite ready. The low-level device-management code in bcachefs is tested and production-quality, but the filesystem code lacks that pedigree. Kent says that it "won't be done in a month (or a year)", but the truth is that it may not be done for several years yet; that is how filesystem development tends to go.

How many years depends, of course, on how many people test the filesystem and how much development effort it gets. Currently it has a development community of one — Kent — and he has noted that his full-time attention is "only going to last as long as my interest and my savings account hold out". If bcachefs acquires both a commercial sponsor and a wider development community, it may yet develop into that mature next-generation filesystem that we seem to never quite get (though Btrfs is there by some accounts). Until that happens, it should probably be looked at as an interesting idea with some advanced proof-of-concept code.

Comments (7 posted)

Steps toward power-aware scheduling

By Jonathan Corbet
August 25, 2015
Power-aware scheduling appears to have become one of those perennial linux-kernel topics that never quite reach a conclusion. Nobody disputes the existence of a problem to be solved, and potential solutions are not in short supply. But somehow none of those solutions ever quite makes it to the point of being ready for incorporation into the mainline scheduler. A few new patch sets showing a different approach to the problem have made the rounds recently. They may not be ready for merging either, but they do show how the understanding of the problem is evolving.

A sticking point in recent years has been the fact that there are a few subsystems related to power management and scheduling, and they are poorly integrated with each other. The cpuidle subsystem makes guesses about how deeply an idle CPU should sleep, but it does so based on recent history and without a view into the system's current workload. The cpufreq mechanism tries to observe the load on each CPU to determine the frequency and voltage the CPU should be operating at, but it doesn't talk to the scheduler at all. The scheduler, in turn, has no view of a CPU's operating parameters and, thus, cannot make optimal scheduling decisions.

It has become clear that this scattered set of mechanisms needs to be cleaned up before meaningful progress can be made on the current problem set. The scheduler maintainers have made it clear that they won't be interested in solutions that don't bring the various control mechanisms closer together.

Improved integration

One possible part of the answer is this patch set from Michael Turquette, currently in its third revision. Michael's patch replaces the current array of cpufreq governors with a new governor that is integrated with the scheduler. In essence, the scheduler occasionally calls directly into the governor, passing it a value describing the load that, the scheduler thinks, is currently set to run on the CPU. The governor can then select a frequency/voltage pair that enables the CPU to execute that load most efficiently.

The projected load on each CPU is generated by the per-entity load tracking subsystem. Since each process has its own tracked load, the scheduler can quickly sum up the load presented by all of the runnable processes on a CPU and pass that number on to the governor. If a process changes its state or is moved to another CPU, the load values can be updated immediately. That should make the new governor much more responsive than current governors, which must observe the CPU for a while to determine that a change needs to be made.

The per-entity load tracking code was a big step forward when it was added to the scheduler, but it still has some shortcomings. In particular, its concept of load is not tied to the CPU any given process might be running on. If different CPUs are running at different frequencies, the loads computed for processes on those CPUs will not be comparable. The problem gets worse on systems (like those based on the big.LITTLE architecture) where some CPUs are inherently more powerful than others.

The solution to this problem appears to be Morten Rasmussen's compute-capacity-invariant load/utilization tracking patch set. With these patches applied, all load and utilization values calculated by the scheduler are scaled relative to the current CPU capacity. That makes these values uniform across the system, allowing the scheduler to better judge the effects of moving a process from one CPU to another. It also will clearly help the power-management problem: matching CPU capacity to the projected load will work better if the load values are well-calibrated and understood.

With those two patch sets in place, the scheduler will be better equipped to run the system in a relatively power-efficient manner (though related issues like optimal task placement have not yet been addressed here). In the real world, though, not everybody wants to run in the most efficient mode all the time. Some systems may be managed more for performance than for power efficiency; the desired policy on other systems may vary depending on what jobs are running at the time. Linux currently supports a number of CPU-frequency governors designed to implement different policies; if the scheduler-driven governor is to replace all of those, it, too, must be able to support multiple policies.

Schedtune

One possible step in that direction can be seen in this patch set from Patrick Bellasi. It adds a tuning mechanism to the scheduler-driven governor so that multiple policies become possible. At its simplest, this tuning takes the form of a single, global value, stored in /proc/sys/kernel/sched_cfs_boost. The default value for this parameter is zero, which indicates that the system should be run for power efficiency. Higher values, up to 100, bias CPU frequency selection toward performance.

The exact meaning of this knob is fairly straightforward. At any given time, the scheduler can calculate the CPU capacity that it expects the currently runnable processes to require. The space between that capacity and the maximum capacity the CPU can provide is called the "margin." A non-zero value of sched_cfs_boost describes the percentage of the margin that should be made available via a more aggressive CPU-frequency/voltage selection.

So, for example, if the current load requires a CPU running at 60% capacity, the margin is 40%. Setting sched_cfs_boost to 50 will cause 50% of that margin to be made available, so the CPU should run at 80% of its maximum capacity. If sched_cfs_boost is set to 100, the CPU will always run at its maximum speed, optimizing the system as a whole for performance.

What about situations where the desired policy varies over time? A phone handset may want to run with higher performance while a phone call is active or when the user is interacting with the screen, but in the most efficient mode possible while checking for the day's obligatory pile of app updates. One could imagine making the desired power policy a per-process attribute, but Patrick opted to use the control-group mechanism instead.

With Patrick's patch set comes a new controller called "schedtune". That controller offers a single knob, called schedtune.boost, to describe the policy that should apply to processes within the group. One possible implementation would be to change the CPU's operating parameters every time a new process starts running, but there are a couple of problems with that approach. It could lead to excessive changing of CPU frequency and voltage, which can be counterproductive. Beyond that, though, a process needing high performance could find itself waiting behind another that doesn't; if the CPU runs slowly during that wait, the high-performance process may not get the response time it needs.

To avoid such problems, the controller looks at all running processes on the CPU and finds the one with the largest boost value. That value is then used to run all processes on the CPU.

The schedtune controller as currently implemented has a couple of interesting limitations. It can only handle a two-level control group hierarchy, and it can manage a maximum of sixteen possible groups. Neither of these characteristics fits well with the new, unified-hierarchy model for control groups, so the schedtune controller is highly likely to require modification before this patch set could be considered for merging into the mainline.

But, then, experience says that eventual merging may be a distant prospect in any case. The scheduler must work well for a huge variety of workloads, and cannot be optimized for one at the expense of others. Finding a way to add power awareness to the scheduler in a way that works for all workloads was never going to be an easy task. The latest patches show that progress is being made toward a general-purpose solution that, with luck, leaves the scheduler more flexible and maintainable than before. But whether that progress is reaching the point of being a solution that can be merged remains to be seen.

Comments (14 posted)

Porting Linux to a new processor architecture, part 1: The basics

August 26, 2015

This article was contributed by Joël Porquet

Although a simple port may amount to as little as 4,000 lines of code—exactly 3,775 for the mmu-less Hitachi H8/300 recently reintroduced in Linux 4.2-rc1—getting the Linux kernel running on a new processor architecture is a difficult process. Worse still, there is not much documentation available describing the porting process. The aim of this series of three articles is to provide an overview of the procedure, or at least one possible procedure, that can be followed when porting the Linux kernel to a new processor architecture.

After spending countless hours becoming almost fluent in many of the supported architectures, I discovered that a well-defined skeleton shared by the majority of ports exists. Such a skeleton can logically be split into two parts that intersect a great deal. The first part is the boot code, meaning the architecture-specific code that is executed from the moment the kernel takes over from the bootloader until init is finally executed. The second part concerns the architecture-specific code that is regularly executed once the booting phase has been completed and the kernel is running normally. This second part includes starting new threads, dealing with hardware interrupts or software exceptions, copying data from/to user applications, serving system calls, and so on.

Is a new port necessary?

As LWN reported about another porting experience in an article published last year, there are three meanings to the word "porting".

It can be a port to a new board with an already-supported processor on it. Or it can be a new processor from an existing, supported processor family. The third alternative is to port to a completely new architecture.

Sometimes, the answer to whether one should start a new port from scratch is crystal clear—if the new processor comes with a new instruction set architecture (ISA), that is usually a good indicator. Sometimes it is less clear. In my case, it took me a couple of weeks to figure out this first question.

At the time, May 2013, I had just been hired by the French academic computer lab LIP6 to port the Linux kernel to TSAR, an academic processor architecture that the system-on-chip research group was designing. TSAR is an architecture that follows many of the current trends: lots of small, single-issue, energy-efficient processor cores around a scalable network-on-chip. It also adds some nice innovations: a full-hardware cache-coherency protocol for both data/instruction caches and translation lookaside buffers (TLBs) as well as physically distributed but logically shared memory.

My dilemma was that the processor cores were compatible with the MIPS32 ISA, which meant the port could fall into the second category: "new processor from an existing processor family". But since TSAR had a virtual-memory model radically different from those of any MIPS processors, I would have been forced to drastically modify the entire MIPS branch in order to introduce this new processor, sometimes having almost no choice but to surround entire files with #ifndef TSAR ... #endif.

Quickly enough, it came down to the most logical—and interesting—conclusion:

    mkdir linux/arch/tsar

Get to know your hardware

Really knowing the underlying hardware is definitely the fundamental, and perhaps most obvious, prerequisite to porting Linux to it.

The specifications of a processor are often—logically or physically—split into at least two parts (as were, for example, the recently published specifications for the new RISC-V processor). The first part usually details the user-level ISA, which basically means the list of user-level instructions that the processor is able to understand—and execute. The second part describes the privileged architecture, which includes the list of kernel-level-only instructions and the various system registers that control the processor status.

This second part contains the majority—if not the entirety—of the information that makes a port special and thus often prevents the developer from opportunely reusing code from other architectures.

Among the important questions that should be answered by such specifications are:

  • What are the virtual-memory model of the processor architecture, the format of the page table, and the translation mechanism?

    Many processor architectures (e.g. x86, ARM, or TSAR) define a flexible virtual-memory layout. Their virtual address space can theoretically be split any way between the user and kernel spaces—although the default layout for 32-bit processors in Linux usually allocates the lower 3GiB to user space and reserves the upper 1GiB for kernel space. In some other architectures, this layout is strongly constrained by the hardware design. For instance, on MIPS32, the virtual address space is statically split into two regions of the same size: the lower 2GiB is dedicated to user space and the upper 2GiB to kernel space; the latter even contains predefined windows into the physical address space.

    The format of the page table is intimately linked to the translation mechanism used by the processor. In the case of a hardware-managed mechanism, when the TLB—a hardware cache of limited size containing recently used translations between virtual and physical addresses—does not contain the translation for a given virtual address (referred to as a TLB miss), a hardware state machine will transparently fetch the proper translation from the page table structure in memory and fill the TLB with it. This means that the format of the page table must be fixed—and certainly defined by the processor's specifications. In a software-based mechanism, a TLB miss exception is handled by a piece of code, which theoretically leaves complete liberty as to how the page table is organized—only the format of TLB entries is specified.

  • How does one enable and disable interrupts, switch from privileged mode to user mode and vice versa, determine the cause of an exception, and so on?

    Although all these operations generally only involve reading and/or modifying certain bit fields in the set of available system registers, they are always very particular to each architecture. It is for this reason that, most of the time, they are actually performed by small chunks of dedicated assembly code.

  • What is the ABI?

    Although one might think that the Application Binary Interface (ABI) only concerns the compilation tools—it defines how the stack is formatted into stack frames, how arguments and return values are passed to and from functions, and so on—it is actually essential knowledge when porting Linux. For example, as the recipient of system calls (which are typically defined by the ABI), the kernel has to know where to find the arguments and how to return a value; on a context switch, the kernel must know what constitutes the context of a thread and thus what to save and restore; and so on.

Get to know the kernel

Learning a few kernel concepts, especially concerning the memory layout used by Linux, will definitely help. I admit it took me a while to understand exactly what the distinction was between low memory and high memory, and between the direct mapping and vmalloc regions.

For a typical and simple port (to a 32-bit processor), in which the kernel occupies the upper 1GiB of the virtual address space, it is usually fairly straightforward. Within this 1GiB, Linux defines that the lower portion of it will be directly mapped to the lower portion of the system memory (hence referred to as low memory): meaning that if the kernel accesses the address 0xC0000000, it will be redirected to the physical address 0x00000000.

In contrast, in systems with more physical memory than that which is mappable in the direct mapping region, the upper portion of the system memory (referred to as high memory) is not normally accessible to the kernel. Other mechanisms must be used, such as kmap() and kmap_atomic(), in order to gain temporary access to these high-memory pages.

Above the direct mapping region is the vmalloc region, which is managed by vmalloc(). This allocation mechanism provides pages of memory that are virtually contiguous even though the underlying physical pages are not necessarily contiguous. It is particularly useful for allocating a large amount of memory, since it may otherwise be impossible to find an equivalent number of contiguous free physical pages.

Further reading about the memory management in Linux can be found in Linux Device Drivers [PDF] and this LWN article.

How to start?

With your head full of the processor's specifications and kernel principles, it is finally time to add some files to this newly created arch directory. But wait ... where and how should we start? As with any port—or, indeed, any code that must conform to a certain API—the procedure is a two-step process.

First, a minimal set of files that define a minimal set of symbols (functions, variables, defines) is necessary for the kernel to even compile. This set of files and symbols can often be deduced from compilation failures: if compilation fails because of a missing file/symbol, it is a good indicator that it should probably be implemented (or sometimes that some configuration options should be modified). In the case of porting Linux, this approach is particularly relevant when implementing the numerous headers that define the API between the architecture-specific code and the rest of the kernel.

After the kernel finally compiles and is able to be executed on the target hardware, it is useful to know that the boot code is very sequential. That allows many functions to be left empty at first and implemented only gradually, until the system finally becomes stable and reaches the init process. This approach works for almost all of the C functions executed after the early assembly boot code. It is advisable, however, to get the early_printk() infrastructure up and working first; otherwise debugging can be difficult.

Finally getting started: the minimal set of non-code files

Porting the compilation tools to the new processor architecture is a prerequisite to porting the Linux kernel, but here we'll assume it has already been performed. All that is left to do in terms of compilation tools is to build a cross-compiler. Since at this point it is likely that porting a standard C library has not been completed (or even started), only a stage-1 cross-compiler can be created.

Such a cross-compiler is only able to compile source code for bare metal execution, which is a perfect fit for the kernel since it does not depend on any external library. In contrast, a stage-2 cross-compiler has built-in support for a standard C library.

The first step of porting Linux to a new processor is the creation of a new directory inside arch/, which is located at the root of the kernel tree (e.g. linux/arch/tsar/ in my case). Inside this new directory, the layout is quite standardized:

  • configs/: default configurations for supported systems (i.e. *_defconfig files)
  • include/asm/: headers dedicated to internal use only, i.e. by Linux source code
  • include/uapi/asm/: headers that are meant to be exported to user space (e.g. to the C library)
  • kernel/: general kernel management
  • lib/: optimized utility routines (e.g. memcpy(), memset(), etc.)
  • mm/: memory management

The great thing is that once the new arch directory exists, Linux automatically knows about it. It only complains about not finding a Makefile, not about this new architecture:

    ~/linux $ make ARCH=tsar
    Makefile: ~/linux/arch/tsar/Makefile: No such file or directory

As shown in the following example, a minimal arch Makefile only has a few variables to specify:

    KBUILD_DEFCONFIG := tsar_defconfig

    KBUILD_CFLAGS += -pipe -D__linux__ -G 0 -msoft-float
    KBUILD_AFLAGS += $(KBUILD_CFLAGS)

    head-y := arch/tsar/kernel/head.o

    core-y += arch/tsar/kernel/
    core-y += arch/tsar/mm/

    LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)
    libs-y += $(LIBGCC)
    libs-y += arch/tsar/lib/

    drivers-y += arch/tsar/drivers/

  • KBUILD_DEFCONFIG must hold the name of a valid default configuration, which is one of the defconfig files in the configs directory (e.g. configs/tsar_defconfig).
  • KBUILD_CFLAGS and KBUILD_AFLAGS define compilation flags, respectively for the compiler and the assembler.
  • {head,core,libs,...}-y list the objects (or the subdirectories containing the objects) to be compiled into the kernel image (see Documentation/kbuild/makefiles.txt for detailed information).

Another file that has its place at the root of the arch directory is Kconfig. This file mainly serves two purposes: it defines new arch-specific configuration options that describe the features of the architecture, and it selects arch-independent configuration options (i.e. options that are already defined elsewhere in Linux source code) that apply to the architecture.

As this will be the main configuration file for the newly created arch, its content also determines the layout of the menuconfig command (e.g. make ARCH=tsar menuconfig). It is difficult to give a snippet of the file as it depends very much on the targeted architecture, but looking at the same file for other (simple) architectures should definitely help.
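
As a rough, hypothetical sketch (the symbols selected below are existing generic Linux options, but the exact set depends entirely on the architecture's features, so treat this as an illustration rather than a working file), the top of such a Kconfig might begin with:

```
config TSAR
	def_bool y
	select GENERIC_ATOMIC64
	select GENERIC_IRQ_SHOW

config MMU
	def_bool y

config PAGE_OFFSET
	hex
	default 0xC0000000
```

The select statements are how the architecture opts into generic kernel infrastructure instead of providing its own implementation.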

The defconfig file (e.g. configs/tsar_defconfig) is necessary to complete the files related to the Linux kernel build system (kbuild). Its role is to define the default configuration for the architecture, which basically means specifying a set of configuration options that will be used as a seed to generate a full configuration for the Linux kernel compilation. Once again, starting from defconfig files of other architectures should help, but it is still advised to refine them, as they tend to activate many more features than a minimalistic system would ever need—support for USB, IOMMU, or even filesystems is, for example, too early at this stage of porting.

Finally, the last "not really code but still really important" file to create is a script (usually located at kernel/vmlinux.lds.S) that instructs the linker how to place the various sections of code and data in the final kernel image. For example, it is usually necessary for the early assembly boot code to be placed at the very beginning of the binary, and it is this script that allows us to do so.
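
A heavily simplified linker-script sketch may make this concrete. It is a hypothetical illustration only (the architecture name, entry symbol, and section layout are made up for this example), and a real port would assemble most of its sections from the helper macros in include/asm-generic/vmlinux.lds.h:

```
/* Hypothetical, minimal kernel/vmlinux.lds.S sketch */
OUTPUT_ARCH(tsar)
ENTRY(kernel_entry)

SECTIONS
{
	. = 0xC0000000;		/* kernel virtual base address */

	.text : {
		*(.head.text)	/* early assembly boot code comes first */
		*(.text)
	}
	.data : { *(.data) }
	.bss  : { *(.bss) }
}
```

Placing the .head.text input section first is what guarantees that the boot code ends up at the entry point of the image.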

Conclusion

At this point, the build system is ready to be used: it is now possible to generate an initial kernel configuration, customize it, and even start compiling from it. However, the compilation stops very quickly since the port still does not contain any code.

In the next article, we will dive into some code for the second portion of the port: the headers, the early assembly boot code, and all the most important arch functions that are executed until the first kernel thread is created.

Comments (none posted)

Development statistics for the 4.2 kernel

By Jonathan Corbet
August 18, 2015
As of this writing, the 4.2-rc7 prepatch is out and the final 4.2 kernel looks to be (probably) on-track to be released on August 23. Tradition says that it's time for a look at the development statistics for this cycle. 4.2, in a couple of ways, looks a bit different from recent cycles, with some older patterns reasserting themselves.

At the end of the merge window, there was some speculation as to whether 4.2 would be the busiest development cycle yet. The current record holder is 3.15, which had 13,722 non-merge changesets at the time of its final release. 4.2, which had 13,555 at the -rc7 release, looks to fall a little short of that figure. So we will not have broken the record for the most changesets in any development cycle, but it was awfully close.

One record that did fall, though, is the number of developers contributing code to the kernel. The previous record holder (4.1, at 1,539) didn't keep that position for long; 1,569 developers have contributed to 4.2. Of those developers, 279 have made their first contribution to the Linux kernel. An eye-opening 1.09 million lines of code were added this time around with 285,000 removed, for a total growth of 800,000 lines of code.

The most active developers this time around were:

Most active 4.2 developers

By changesets:

    Ingo Molnar                  304  2.2%
    Mauro Carvalho Chehab        203  1.5%
    Herbert Xu                   171  1.3%
    Krzysztof Kozlowski          161  1.2%
    Geert Uytterhoeven           149  1.1%
    Al Viro                      140  1.0%
    Lars-Peter Clausen           137  1.0%
    H Hartley Sweeten            136  1.0%
    Thomas Gleixner              127  0.9%
    Hans Verkuil                 124  0.9%
    Tejun Heo                    110  0.8%
    Alex Deucher                  95  0.7%
    Paul Gortmaker                91  0.7%
    Vineet Gupta                  88  0.7%
    Jiang Liu                     84  0.6%
    Christoph Hellwig             79  0.6%
    Hans de Goede                 78  0.6%
    Arnaldo Carvalho de Melo      77  0.6%
    Mateusz Kulikowski            74  0.5%
    Takashi Iwai                  73  0.5%

By changed lines:

    Alex Deucher              425501  35.7%
    Johnny Kim                 33726   2.8%
    Raghu Vatsavayi            14484   1.2%
    Greg Kroah-Hartman         12500   1.0%
    Stephen Boyd               11062   0.9%
    Dan Williams               10736   0.9%
    Hans Verkuil               10641   0.9%
    Narsimhulu Musini          10263   0.9%
    Ingo Molnar                 9254   0.8%
    Jakub Kicinski              8531   0.7%
    Herbert Xu                  8515   0.7%
    Yoshinori Sato              7612   0.6%
    Saeed Mahameed              7493   0.6%
    Sunil Goutham               7471   0.6%
    Christoph Hellwig           7384   0.6%
    Vineet Gupta                7171   0.6%
    Mateusz Kulikowski          6852   0.6%
    Maxime Ripard               6767   0.6%
    Sudeep Dutt                 6647   0.6%
    Mauro Carvalho Chehab       6422   0.5%

Some years ago, Ingo Molnar routinely topped the per-changesets list, but he has been busy with other pursuits recently. That changed this time around, though, with a massive rewrite of the low-level x86 floating-point-unit management code. Mauro Carvalho Chehab continues to be an active maintainer of the media subsystem, and Herbert Xu's work almost entirely reflects his role as the maintainer of the kernel's crypto subsystem. Krzysztof Kozlowski contributed cleanups throughout the driver subsystem, and Geert Uytterhoeven, despite being the m68k architecture maintainer, did most of his work within the ARM tree and related driver subsystems.

On the "lines added" side, Alex Deucher accounted for nearly half of the entire growth of the kernel this time around with the addition of the new amdgpu graphics driver. Johnny Kim added the wilc1000 network driver to the staging tree, Raghu Vatsavayi added support for Cavium Liquidio Ethernet adapters, Greg Kroah-Hartman removed the obsolete i2o subsystem, and Stephen Boyd removed a bunch of old driver code while adding driver support for QCOM SPMI regulators and more.

The top contributor statistics in recent years have often been dominated by developers generating lots of cleanup patches or reworking staging drivers. One might expect to see a lot of that activity in an especially busy development cycle, but that is not the case for 4.2. Instead, the top contributors include many familiar names and core contributors. One might be tempted to think that the cleanup work is finally approaching completion, but one would be highly likely to be disappointed in future development cycles.

The most active companies supporting development in the 4.2 cycle (of 236 total) were:

Most active 4.2 employers

By changesets:

    Intel                  1665  12.3%
    Red Hat                1639  12.1%
    (Unknown)               884   6.5%
    (None)                  884   6.5%
    Samsung                 681   5.0%
    SUSE                    496   3.7%
    Linaro                  449   3.3%
    (Consultant)            412   3.0%
    IBM                     391   2.9%
    AMD                     286   2.1%
    Google                  246   1.8%
    Renesas Electronics     203   1.5%
    Free Electrons          203   1.5%
    Texas Instruments       191   1.4%
    Facebook                176   1.3%
    Oracle                  163   1.2%
    Freescale               156   1.2%
    ARM                     145   1.1%
    Cisco                   142   1.0%
    Broadcom                138   1.0%

By lines changed:

    AMD                  438094  36.8%
    Intel                 96331   8.1%
    Red Hat               62959   5.3%
    (None)                46140   3.9%
    (Unknown)             41886   3.5%
    Atmel                 34942   2.9%
    Samsung               29326   2.5%
    Linaro                22714   1.9%
    Cisco                 21170   1.8%
    SUSE                  18891   1.6%
    Code Aurora Forum     18435   1.5%
    Mellanox              18044   1.5%
    (Consultant)          15234   1.3%
    IBM                   15095   1.3%
    Cavium Networks       14580   1.2%
    Free Electrons        13640   1.1%
    Unisys                13428   1.1%
    Linux Foundation      12617   1.1%
    MediaTek              11856   1.0%
    Google                11811   1.0%

Once again, there are few surprises here. At 6.5%, the percentage of changes coming from volunteers is at its lowest point ever. AMD, unsurprisingly, dominated the lines-changed column with the addition of the amdgpu driver. Beyond that, it is mostly the usual companies supporting kernel development in the usual way.

The kernel community depends heavily on its testers and bug reporters; at least some of the time, their contribution is recorded as Tested-by and Reported-by tags in the patches themselves. In the 4.2 development cycle, 946 Tested-by credits were placed in 729 patches, and 682 Reported-by credits were placed in 611 patches. The most active contributors in this area were:

Most active 4.2 testers and reporters

Tested-by credits:

    Joerg Roedel                  40  4.2%
    Keita Kobayashi               35  3.7%
    Krishneil Singh               31  3.3%
    Arnaldo Carvalho de Melo      30  3.2%
    Ira Weiny                     24  2.5%
    Doug Ledford                  23  2.4%
    Alex Ng                       22  2.3%
    Aaron Brown                   21  2.2%
    Javier Martinez Canillas      19  2.0%
    ZhenHua Li                    19  2.0%

Reported-by credits:

    Wu Fengguang                  76  11.1%
    Dan Carpenter                 41   6.0%
    Russell King                  23   3.4%
    Ingo Molnar                   12   1.8%
    Stephen Rothwell              10   1.5%
    Linus Torvalds                 8   1.2%
    Hartmut Knaack                 7   1.0%
    Huang Ying                     6   0.9%
    Christoph Hellwig              5   0.7%
    Sudeep Holla                   5   0.7%

The power of Wu Fengguang's zero-day build robot can be seen here; it resulted in 11% of all of the credited bug reports in this development cycle. The work of all of the kernel's testers and bug reporters leads to a more stable kernel release for everybody. The biggest concern with these numbers, perhaps, is that we might still not be doing a thorough job of documenting the contribution of all of our testers and reporters.

All told, the kernel development community continues to run like a well-tuned machine, producing stable kernel releases on a predictable (and fast) schedule. Back in 2010, your editor worried that the community might be headed toward another scalability crisis, but such worries have proved to be unfounded, for now at least. There must certainly be limits to the volume of change that can be managed by the current development model, but we do not appear to have reached them yet.

Comments (6 posted)

Patches and updates

Kernel trees

Linus Torvalds Linux 4.2-rc7
Greg KH Linux 4.1.6
Sebastian Andrzej Siewior 4.1.5-rt5
Luis Henriques Linux 3.16.7-ckt16
Greg KH Linux 3.14.51
Greg KH Linux 3.10.87
Ben Hutchings Linux 3.2.71

Architecture-specific

Core kernel code

Development tools

John Kacur rt-tests-0.93

Device drivers

Device driver infrastructure

Documentation

Filesystems and block I/O

Memory management

Networking

Security-related

Eric W. Biederman Bind mount escape fixes
Andreas Gruenbacher Inode security label invalidation

Virtualization and containers

Miscellaneous

Page editor: Jonathan Corbet

Distributions

Copyright assignment and license enforcement for Debian

By Nathan Willis
August 26, 2015

DebConf

At DebConf 2015 in Heidelberg, Germany, Bradley Kuhn from the Software Freedom Conservancy (SFC) announced a new collaboration with the Debian project through which Debian contributors can engage the SFC to act on their behalf to conduct license-compliance efforts. The Debian Copyright Aggregation Project (DCAP) is a voluntary program, but it gives interested Debian developers a means to help ensure that others do not violate the licenses under which their work is published.

Kuhn's session was held on August 15. A video recording [WebM] has since been published, and an announcement has been posted on both the Debian and SFC web sites.

In his talk, Kuhn noted that Debian is one of the few free-software projects to have a fully democratic governance model and that it has remained a "staunchly non-commercial" project since the beginning. Those factors underscore Debian's concern for doing "the morally correct" things important to the hobbyist contributor. That is, Debian is composed of volunteers who make their contribution to Debian their first priority, with any affiliation to an employer coming after that. Consequently, Debian still acts on behalf of individual developers' wishes where a commercial entity might not.

Although Debian is fundamentally about people, he said, the project's most visible assets are those people's copyrights on the software in the Debian archive. Partnering with SFC on the Debian Copyright Aggregation Project is a way for individual developers to leverage those assets "to maximize fair treatment of others"—namely, by ensuring that the copyrights of individual Debian contributors are not violated by third parties failing to adhere to the terms of the relevant software license. The DCAP arrangements were agreed to in April by SFC and then Debian Project Leader (DPL) Lucas Nussbaum, with consultation from Software in the Public Interest (SPI).

DCAP has three dimensions. First, SFC can now accept copyright-assignment agreements from any Debian contributors who choose to participate. Participants can assign any subset of their copyrights that they choose to SFC. Second, any contributors who are not interested in assigning copyrights (which is an essentially permanent arrangement) have the option of signing an enforcement agreement instead, under which the contributor authorizes SFC to act as an "authorized copyright agent" in license-enforcement actions. That agreement lets developers retain all of their copyrights, and merely allows SFC to conduct enforcement work on their behalf.

The enforcement agreement specifically does not empower SFC to pursue litigation, he added in response to an audience member's question. A contributor interested in making that arrangement could raise it with SFC, but it would require coming to a separate agreement.

Third, SFC will provide license consulting, advice, and compliance services to Debian on an ongoing basis, in coordination with the DPL. The consultation service means that SFC will provide a certain number of pro-bono hours each month to answer questions forwarded by the DPL and to provide policy-related advice.

In the long run, Kuhn said, copyright assignment is a practical tool. It is a sad fact that developers (like everyone else) inevitably pass away or drop out of the project, at which point defending their copyrights becomes arduous at best, if not impossible. Others simply forget to defend their copyrights because they get busy. For those that care deeply about protecting their contributions to free software, the aggregation project will hopefully make the process easier.

Copyright assignment can be a thorny issue, Kuhn admitted. Critics will point to copyright assignment as a tool that companies sometimes use to take decision-making power out of the hands of the developers they employ. But that strategy—which tends to involve producing a proprietary version of a software product as well as an open-source version—only works when the company gets 100% of the copyrights involved. Debian will always be a multi-copyright project, so there is no chance that anyone (the SFC included) could turn copyright assignments against it. Furthermore, he said, assigning copyright to SFC is different from assigning it to a company, because SFC is a US charity and, under US law, a charity cannot be sold.

DCAP is designed to be flexible. Participation is entirely optional, and the enforcement agreements can be canceled at any time (with 30 days notice). Kuhn noted that another former DPL, Stefano Zacchiroli, was the first to sign up for DCAP. Zacchiroli assigned all his copyrights in Debian to the SFC, "past, present, and future." Paul Tagliamonte, in contrast, assigned SFC a subset of the copyrights on his Debian contributions via DCAP.

Developers must currently contact SFC by email (at debian-services@sfconservancy.org) to sign up for the project, but the joint announcement indicates that a self-service enrollment system is under development. Kuhn ended the talk by noting that the copyright-assignment and enforcement-agreement options exist to give Debian contributors a range of choices. He predicted that the enforcement agreement would be the more popular of the two, particularly since Harald Welte's gpl-violations.org project had shut down, and added that he was happy to be able to give something back to the Debian community after years of being a happy user.

[The author would like to thank the Debian project for travel assistance to attend DebConf 2015.]

Comments (2 posted)

Debian and binary firmware blobs

By Nathan Willis
August 26, 2015

DebConf

Debian's annual DebConf event is part conference, part hackathon; various teams and ad-hoc groups meet up over the course of the week to discuss future plans, get work done, and make decisions that are best reached with face-to-face conversation. At the 2015 DebConf, one of those face-to-face conversations dealt with the thorny problem of how Debian should handle binary firmware blobs. Because Debian is a dedicated free-software project, including proprietary firmware in the installation images offered to users is out of the question to most contributors. But that stance makes Debian impossible to install for at least some small percentage of would-be users—which is far from ideal. Nevertheless, the project may have hashed out a way to move forward.

The issue was explored in depth at an August 17 round-table discussion entitled "Firmware - a hard or soft problem?" The session was packed to overflowing; more than 40 people (plus one dog) attempted to cram into the meeting room. Moderating the discussion was Debian developer Steve McIntyre, who leads the debian-cd group responsible for creating and publishing the official Debian ISO images.

The root problem with binary-only firmware, he said, is that it is now quite common for computers—particularly laptops—to include components that cannot function at all without a loadable firmware blob. This includes "almost every WiFi chipset" on the market, which in turn makes it impossible for some users to even install Debian using the default ISO images—because those images quite deliberately do not include any non-free software. Various binary firmware blobs are available through Debian; they currently live in the "non-free" archive area alongside the Adobe Flash plugin and various other proprietary programs.

Most Debian project members recognize the inconvenience (and even counter-productivity) of this situation, and Debian has historically relied on an inelegant workaround. Namely, unofficial ISO images are built that include the binary firmware blobs necessary to bootstrap a Debian installation. To avoid being seen as an endorsement of non-free software, though, those unofficial images are not advertised and project members only direct new users to them reluctantly. "It's a pain," McIntyre said in summary.

Several possible solutions have been proposed in the past. One, for instance, was providing a downloadable tar archive of all of the binary firmware. If the installer determines that a given system requires a firmware blob, it could point the user to the firmware archive URL. This approach was rejected as unworkable because fetching and loading the tar archive during installation may not be practical. Many of the target computers may not have a second USB port (the installation media occupying one, of course) and users may not be able to run off and find a second memory stick.

Putting the tarball on a second partition on the installer USB stick was discussed, but evidently Windows makes it difficult or impossible to use more than one partition on a USB stick. That would inconvenience Windows users trying to switch to Debian. Given those hurdles, the tarball approach was deemed to not make life easier for end users than the current, unofficial-ISO approach.

It had also been suggested that Debian could simply enable the non-free section by default, and count on educating users to keep the distinction between free and proprietary software clear. This, too, was rejected—even if those participating in the discussion favored it (which they did not), the change would require a General Resolution, and many similar votes have happened in the past, all reconfirming Debian's commitment to not enabling non-free by default.

The proposal that did garner support, though, was splitting the non-free section into two or more parts based on the type of content it contained. Non-free firmware would be one of those; possibly other sections (like one for non-free documentation) would be created as well. In any case, such a split would underscore that there is an important difference between non-free firmware needed to get Debian installed and other proprietary applications or libraries.

Most in the room seemed to agree that this split makes sense. After all, one member of the group said, at least hardware that needs loadable firmware offers the possibility that free firmware will be developed and can be used at a later date; firmware that is burned in does not offer that hope. Whether firmware will be the only section to be carved out into a separate archive component is a matter that is still up for debate. Several other divisions were suggested, but deemed out-of-scope for the session.

However the split is implemented (a task which will ultimately be up to the Debian FTP masters), the next question will be how the availability of the firmware archive should be communicated to the user. Enabling it by default remains out of the question, but the attendees agreed that it was best to communicate the situation to the user during the installation process and give them the opportunity to enable the firmware archive and continue. As of now, a user attempting to install Debian on a machine that requires binary firmware will see that install fail, and only find clues to how the situation can be resolved by reading through some not-well-advertised text files.

Most thought that a "friendly" explanation of the issue was needed—one that said, in essence, "Because the hardware manufacturer of this component does not provide software we can distribute, Debian cannot run on this machine unless you install this non-free add-on" and provided links to more details about the free-software issues involved. All agreed that the wording of this message was critical; several commented that they liked the message that Canonical started using several years ago in its alerts when non-free drivers were needed.

A question was raised in the session about including an "email the manufacturer to ask for free-software support" tool in the installer, similar to the "write your Congressperson" advocacy seen in politics. While most agreed that encouraging some form of free-software advocacy was a worthy goal, the consensus was that identifying the correct company to email might not be possible. In many cases, the real culprit is a chipset maker, not the device maker, and problematic devices routinely switch to new chipsets without changing their USB device IDs. Since those device IDs are all Debian has access to, it may not be possible to unambiguously decide who should get the user's advocacy email.

This is a topic with many angles and plenty of nuance; in the interest of simplifying matters, the participants even agreed to avoid the question of how non-free firmware would impact efforts to get Debian endorsed by the Free Software Foundation. The session had to be drawn to a close before any firm plans were put together. But, as of now, Debian does seem prepared to provide separate access to the non-free firmware many users need to start using Debian in the first place. Doing so without compromising the project's longstanding commitment to free software requires a delicate balancing act, but project members appear to be willing to undertake the task.

[The author would like to thank the Debian project for travel assistance to attend DebConf 2015.]

Comments (3 posted)

Brief items

Distribution quotes of the week

NOBODY expects the Debian acquisition!
-- Romain Francoise

Debian's reached the age of 22
I wish I could be there with you
In Heidelberg, fair German city
To share, in person, this my ditty

...

Free software, arguments, warmth, good cheer
Too soon all over 'til next year
All of the best are there / on 'Net
Here's hope that it's the best Debconf yet

-- Andrew Cater

Once I got over the thrill of being the “superuser,” the unspeakable power I had previously seen only behind plate glass, I became enraptured not so much by Linux itself as by the process in which it had been created—hundreds of people hacking away at their own little corner of the system and using the Internet to swap code, slowly but surely making the system better with each change—and set out to make my own contribution to the growing community, a new distribution called Debian that would be easier to use and more robust because it would be built and maintained collaboratively by its users, much like Linux.
-- Ian Murdock

We do work that is important and often unpaid. We tend to have deep technical skills but exercise them in huge communities where interpersonal issues become magnified. We are activists and artists and architects all at once. We're changing the world in ways that are often unnoticed not only by the public, but by ourselves. This is true of the entire FOSS world, but it seems especially true of Gentoo.
-- Rich Freeman

Comments (4 posted)

Distribution News

Debian GNU/Linux

Debian and Software Freedom Conservancy announce Copyright Aggregation Project

Software Freedom Conservancy's Bradley M. Kuhn has announced the Conservancy's Debian Copyright Aggregation Project. "This new project, formed at the request of Debian developers, gives Debian contributors various new options to ensure the defense of software freedom. Specifically, Debian contributors may choose to either assign their copyrights to Conservancy for permanent stewardship, or sign Conservancy's license enforcement agreement, which delegates to Conservancy authority to enforce Free Software licenses (such as the GNU General Public License). Several Debian contributors have already signed both forms of agreement."

Full Story (comments: none)

Debian Installer Stretch Alpha 2 release

The second alpha of the Debian installer for 'Stretch' (version 9) has been released. The biggest change in this version is the update of the Linux kernel from the 4.0 series to the 4.1 series.

Full Story (comments: none)

Squeeze non-LTS architectures moving to archive.debian.org

The Long Term Support effort for Debian 6.0 'squeeze' only covers i386/amd64 architectures. Non-LTS architectures will move to archive.debian.org. "This does not (yet) affect other Squeeze suites, like backports, but they will follow soonish."

Full Story (comments: none)

Bits from the Wanna Build team

Mehdi Dogguy reports on the Wanna Build team meeting at DebCamp. "We have worked on getting arch:all packages buildable on our autobuilders. We've got a few patches added to make that happen. Architecture independent packages (arch:all) are now auto-built on dedicated amd64 builders. We tested our changes as much as we were able to and enabled arch:all uploads for Sid and Experimental. If your auto-built arch:all package doesn't make it through to ftp-master's archive, please do contact us so that we can have a look and get it fixed quickly."

Full Story (comments: none)

Newsletters and articles of interest

Distribution newsletters

Comments (none posted)

The State of Fedora: 2015 Edition (Fedora Magazine)

Fedora Magazine reports on Fedora project leader Matthew Miller's keynote at Flock, which is the Fedora contributor conference. He outlined the state of the distribution using some graphs and statistics and said "we’re doing very well as a project and it’s thanks to all of you". The use of Internet Relay Chat (IRC) by the project was another topic: "Fedorans do like to work together. Last year there were 1,066 IRC meetings (official meetings, not just being in IRC talking), and 765 IRC meetings in 2015 alone. 'This shows how vibrant we are, but also is buried in IRC. There’s a lot of Fedora activity you don’t see on the Fedora Web site… I want to look at ways to make that more visible,' says Miller. There are efforts to make the activity more visible, says Miller. 'If I want to interact with the project, is somebody there? Yes, but we have millions of dead pages on the wiki… we need to make this more visible.' IRC is 'definitely a measure of engagement' but it’s also a high barrier of entry, says Miller. 'Wow that’s complicated. Wow, that’s still around?' is a common response from new contributors to IRC. The technology, and 'culture' can be confusing."

Comments (21 posted)

Sabayon Linux development in 2015

Sabayon developer Joost Ruis takes a look at recent developments in the Sabayon Linux project, including new Docker images. "These are forked directly from a Gentoo stage3 docker image. The result is a very clean chroot that is even closer to Gentoo. Our docker pulls in the stage3, adds Sabayon overlay, installs Entropy to a point where it can run. Then it checks the Portage database to list what packages are installed and replaces them with the packages from Entropy. (Ain’t that cool?). Now we can keep our minimal chroot current and easy make changes whenever we want. The docker base image is then being “squashed” so we can feed it as an image to our Molecule™ script that will build our iso images for us. With this move we also made the creation of spins more accessible to developers! Go fork us!"

Comments (none posted)

Page editor: Rebecca Sobol

Development

New features and new widgets in GTK+

By Nathan Willis
August 26, 2015

GUADEC
At the 2015 edition of GUADEC in Gothenburg, Sweden, a series of talks addressed the most recent work on the GTK+ widget toolkit. Matthias Clasen covered the most ground, describing updates to eight existing GTK+ widgets, while Timm Bäder and Matthew Waters presented new GTK+ widgets for image handling and GStreamer media pipelines, respectively.

Improved widgets and controls

Clasen styled his talk ("GTK+ can do this?") as a walkthrough of little-known options and tricks available in the toolkit. Many of the examples involve recent additions to the toolkit, though some of them predate the current GTK+ development cycle. There were enough tips and secrets, he said, that he hoped everyone in the audience would be able to say that they had learned something new.

First, he addressed scrollbar widgets. Recent changes to GTK+ scrollbars include support for kinetic scrolling and GTK+'s frame-synchronization protocol (which ensures that scrolling appears smooth). But there are little-known features available as well. Some users and developers have grumbled that GTK3 scrollbars lacked the up/down "stepper" buttons of earlier versions; Clasen explained that these steppers can now be added with a simple CSS rule:

    .scrollbar {
      -GtkScrollbar-has-forward-stepper: true;
      -GtkScrollbar-has-secondary-backward-stepper: true;
    }

Similarly, GTK3 scrollbars now support moving through a window in page-length increments with shift-clicks, as well as smooth scrolling via a right click.

[Matthias Clasen]

Clasen's second topic was output-only windows. Previously, input events that happened to land in a GTK+ overlay (say, a tooltip or transient popover) would get passed to the application's top-level parent window, which is not always what the developer desired. In many instances, the correct behavior is to pass the event to a window (say, a toolbar) somewhere in the middle of the hierarchy, and GTK3 now supports this with a GtkOverlay::pass-through property. He also noted that the pass-through functionality can be used to draw decorative overlays.

Clasen then showed how developers can add their own content to the popovers that appear when the user triggers a touch-selection event. The signal is called GtkTextView::populate-popup and, in addition to adding custom options for touch-selection popovers, it can be exploited to customize the right-click context menu of any GTK+ widget. That even includes scrollbar widgets, he noted. "I don't know why you would and I'm not sure it's a good idea," he added, "but if you need to do it, you can."

Arguably more practical than context menus on a scrollbar were two enhancements to GTK+ controls that Clasen demonstrated. One is that spinbuttons, which traditionally allow the user to enter a numeric value by clicking + or - buttons, can now be customized. The range of acceptable values has always been configurable, but now the labels can be, too. He showed an example where the underlying values on a "month" spinbutton were restricted to the interval 1–12, but where each value was presented as the corresponding month name. This was followed by examples that displayed the underlying numeric values as clock times and as hexadecimal digits.

The other enhanced control is the slider, where the user moves a handle across a scale to set the value. But a slider may not correspond to a variable that can accept continuous input. In previous versions of GTK+, it was possible to add tick marks to the scale so that the user could see discrete values along the slider, but it was still possible for the user to leave the slider in between the markers. This has now been fixed; developers can mark a slider as accepting a set of discrete values, and the handle will "stick" to the nearest acceptable value as the user moves it along the scale. To make the feature work, developers will also need to set the round-digits property on the scale, so that only discrete steps are returned:

    <object class="GtkScale">
      <property name="round-digits">0</property>
    </object>

There are also two new text-related features in GTK+, Clasen said. First, text-view widgets now support Pango markup, which allows the text to be styled or colored and allows various font features (like spacing or character variants) to be activated. Second, any Pango text can be turned into a Cairo path using pango_cairo_layout_path(), which allows it to then be manipulated with a wide variety of Cairo tools and transformations. This should be done with great care, he said, particularly since the Pango-to-Cairo conversion is not very efficient.

For each feature he discussed, Clasen showed example code as well as a live demo using the gtk3-demo application. For those who missed the talk (and until a video of the session is published) his slides [PDF] are available online, as is a blog post showing screenshots of many of the demonstrated features.

New widgets

There were, as one might expect, several other talks that addressed new or ongoing work within GTK+. Emmanuele Bassi gave an update on his work creating the spiritual successor to the Clutter toolkit, GTK+ Scene Graph Kit (GSK), although he said the code was not yet ready to be released for mass consumption. Several of the LibreOffice talks referenced Pranav Kant's Google Summer of Code (GSoC) project with GNOME Documents, in which he made progress toward a new GTK+ widget for accessing a LibreOffice document from any GTK+ program.

Another GSoC intern, Timm Bäder, presented a lightning talk about his project: a GTK+ widget named GtkImageView. The widget's purpose, he said, is to provide developers with a convenient way to show images to users—in particular, large images that are too big for GtkImage. That existing widget is optimized for icons, button images, and similar small content. It starts to break down when loading large images, however.

The new widget can load any image type supported by gdk-pixbuf, and it loads content asynchronously. It also implements scaling and rotation functions, and it supports GTK+'s internal scale-factor setting, so it works on high-DPI displays. Bäder is still at work on the widget, he said, and may add more features in the future, such as touch-gesture support.

[Matthew Waters]

The last new GTK+ widget discussed at the event was Matthew Waters's gtkgst, a widget for displaying the output of a GStreamer pipeline in a GTK+ application. Obviously both GTK+ and GStreamer are mature projects at this point, so Waters started off his talk by explaining the difficulties of working with both of them in a single application.

The main difficulty is that GStreamer pipelines are inherently complex beasts: they have to handle a wide range of video codecs, color spaces, scaling factors, and effects when showing a video—and even more variables when generating or editing video. Historically, embedding a video in a GTK+ window has added even more complications, requiring the application to push key and mouse events into GStreamer, to notify the GStreamer video sink of resize events, and to perform careful setup that is highly dependent on the details of the windowing system.

Waters's new widget is an attempt to wrap such details into a more convenient package. The code lives in GStreamer's "plugins-bad" package for now (although, when stable, it will likely move to "plugins-good"). An application developer only needs to set up the GStreamer video pipeline they require and connect it to the gtkgst sink; that will provide a GTK+ widget that can be placed anywhere in the widget hierarchy. That allows for clean separation between the GStreamer and GTK+ sides of the code, which should simplify development and troubleshooting. In response to an audience question, he said that gtkgst renders video far more smoothly than Clutter.

The current implementation renders the video into a GtkDrawingArea, but Waters is in the process of implementing it using OpenGL. That would enable hardware acceleration and multithreading, he said, although there are a number of challenges to overcome before it is ready for general usage. Both GTK+ and GStreamer have supported OpenGL for some time, but hooking them up to one another is not quite trivial. His code works on X11 and Wayland so far, and he hopes to add Mac OS X and Windows support in the future.

GTK+ is approaching 20 years of age and, while there are certainly longer continuously running projects in free software, it can be easy for a project of that age to stop adapting to ever-changing circumstances and developer expectations. Nevertheless, the toolkit seems resilient and continues to take on new uses with each passing release.

[The author would like to thank the GNOME Foundation for travel assistance to attend GUADEC 2015.]

Comments (2 posted)

Glibc wrappers for (nearly all) Linux system calls

By Jonathan Corbet
August 20, 2015
The GNU C Library (glibc) is a famously conservative project. In the past, that conservatism created a situation where there was no way to directly call a number of Linux system calls from a glibc-using program. As glibc has relaxed a bit in recent years, its developers have started to reconsider adding wrapper functions for previously inaccessible system calls. But, as the discussion shows, adding these wrappers is still not as straightforward as one might think.

A C programmer working with glibc now would look in vain for a straightforward way to invoke a number of Linux system calls, including futex(), gettid(), getrandom(), renameat2(), execveat(), bpf(), kcmp(), seccomp(), and a number of others. The only way to get at these system calls is via the syscall() function. Over the years, there have been requests to add wrappers for a number of these system calls; in some cases, such as gettid() and futex(), the requests were summarily rejected by the (at-the-time) glibc maintainer in fairly typical style. More recently these requests have been reopened and others have been entertained, but there have been no system-call wrappers added since glibc 2.15, corresponding roughly to the 3.2 kernel.

On the face of it, adding a new system-call wrapper should be a simple exercise. The kernel has already defined an API for the system call, so it is just a matter of writing a simple function that passes the caller's arguments through to the kernel implementation. Things quickly get more complicated than that, though, for a number of reasons, but they all come down to one root cause: glibc is not just a wrapper interface for kernel-supplied functionality. Instead, it provides a (somewhat standard-defined) API that is meant to be consistent and independent of any specific operating system.

There are provisions for adding kernel-specific functions to glibc now; those functions will typically fail (with errno set to ENOSYS) when called on a kernel that does not support them. Examples of such functions include the Linux-specific epoll_wait() and related system calls. As a general rule, though, the glibc developers, as part of their role maintaining the low-level API for the GNU system, would like to avoid kernel-specific additions.

This concern has had the effect of keeping a lot of Linux system-call wrappers out of the GNU C Library. It is not necessarily that the glibc developers do not want that functionality, but figuring out how a new function would fit into the overall GNU API is not a straightforward task. The ideal interface may not (from the glibc point of view) be the one exposed by the Linux kernel, so another may need to be designed. Issues like error handling, thread safety, support on non-Linux systems, and POSIX-thread cancellation points can complicate things considerably. In many cases, it seems that few developers have wanted to run the gauntlet of getting new system-call wrappers into the library, even if the overall attitude toward such wrappers has become markedly more friendly in recent years.

Back in May 2015, Joseph Myers proposed relaxing the rules just a little bit, at least in cases when the functionality provided by a wrapper might be of general interest. In such cases, Joseph suggested, there would be no immediate need to provide support for other operating-system kernels unless somebody found the desire and the time to do the work.

Roland McGrath is, by his own admission, the hardest glibc developer to convince about the value of adding Linux-specific system calls to the library. He still does not see a clear case for adding many Linux system-call wrappers to the core library; it is only clear, he said, when the system call is to be a part of the GNU API:

My top concern is adding cruft to the core libc ABIs. That means specifically symbols in the shared objects for libc, libpthread, librt, libdl, libm, and libutil.

I propose that we rule out adding any symbols to the core libc ABIs that are not entering the OS-independent GNU API.

Roland does not seem to believe that glibc should entirely refuse to support system calls that don't meet the above criterion, though. Instead, he suggested creating another library specifically for them. It would be called something like "libinux-syscalls" (so that one would link with "-linux-syscalls"). Functions relegated to this library should be simple wrappers, without internal state, with the idea that supporting multiple versions of the library would be possible.

There was some discussion on the details of this idea, but the core of it seems to be relatively uncontroversial. Also uncontroversial is the idea that glibc need not provide wrappers for system calls that are obsolete, that cannot be used without interfering with glibc (set_thread_area() is an example), or those that are expected to have a single caller (such as create_module()). So Carlos O'Donell has proposed a set of rules that would clear the way for the immediate addition of operating-system-independent system calls into the core and the addition of a system-dependent library for the rest.

Of course, "immediate" is a relative term. Any system-call wrappers will still need to be properly implemented and documented, with test cases and more. There is also, in some cases, the more fundamental question of what the API should look like. Consider the case of the futex() system call, which provides access to a fast mutual-exclusion mechanism. As defined by the kernel, futex() is a multiplexer interface, with a single entry point providing access to a range of different operations.

Torvald Riegel made the case that exposing this multiplexer interface would do a disservice to glibc users:

Keeping the multiplexing is bad for users. Can you tell me off-hand what goes in "uaddr2", "val", or "val3" for all the ops? Is it easy to remember based on the function signature? Can you remember in which cases "timeout" is actually "val2" and not a pointer but cast to uint32_t? So are we going to expect users to cast uint32_t's to a pointer to call one of the operations and consider that a useful API design? It's a nice way to potentially trigger compiler warnings though.

He proposed exposing a different API based around several functions with names like futex_wake() and futex_wait(); he also posted a patch implementing this interface. Joseph, while not disagreeing with that interface, insisted that the C library should provide direct access to the raw system call, saying: "The fact that, with hindsight, we might not have designed an API the way it was in fact designed does not mean we should embed that viewpoint in the choice of APIs provided to users". In the end, the two seemed to agree that both types of interface should, in some cases, be provided. If the C library can provide a useful higher-level interface, that may be appropriate to add, but more direct access to the system call as provided by the kernel should be there too.

The end result of all this is that we are likely to see a break in the logjam that has kept new system-call wrappers out of glibc. Some new wrappers could even conceivably show up in the 2.23 release, which can be expected sometime around February 2016. Even if the attitude and rules have changed, though, this is still glibc we are talking about, so catching up with the kernel may take a while yet. But one can take comfort in the fact that a path is now visible, even if it may yet be a slow one.

Comments (36 posted)

Brief items

Quotes of the week

"Software as a service" is a competitor to "software."
— Asheesh Laroia at DebConf

You're trying, awesome! But when your customer service video for Linux support starts with "Apply kernel patches", you have already failed.
— Sarah Sharp

Comments (5 posted)

Glibc 2.22 released

Version 2.22 of the GNU C Library is out. The biggest user-visible changes are an update to Unicode 7.0.0 and the addition of a vectorized math library for the x86_64 architecture. Beyond that, of course, there is a pile of bug fixes, a few of which address security-related problems.

Full Story (comments: 22)

Rkt 0.8 released

Version 0.8 of the rkt container specification has been released. The changelog notes that this version adds support for running under the LKVM hypervisor and adds experimental support for user namespaces. Other features include improved integration with systemd and additional functional tests. An accompanying blog post goes into further detail for many of these new features.

Comments (1 posted)

WordPress 4.3 released

Version 4.3 of the WordPress blogging platform has been released. New features include keyboard shortcuts for formatting text while editing posts, a site-icon creator, and support for sending password-reset links to users (rather than emailing users their lost passwords).

Comments (none posted)

Go 1.5 released

Version 1.5 of the Go language has been released. "This release includes significant changes to the implementation. The compiler tool chain was translated from C to Go, removing the last vestiges of C code from the Go code base. The garbage collector was completely redesigned, yielding a dramatic reduction [PDF] in garbage collection pause times. Related improvements to the scheduler allowed us to change the default GOMAXPROCS value (the number of concurrently executing goroutines) from 1 to the number of available CPUs. Changes to the linker enable distributing Go packages as shared libraries to link into Go programs, and building Go packages into archives or shared libraries that may be linked into or loaded by C programs (design doc)."

Comments (162 posted)

KDE Ships Plasma 5.4.0, Feature Release for August

KDE has released Plasma 5.4 with some new features. "This release of Plasma brings many nice touches for our users such as much improved high DPI support, KRunner auto-completion and many new beautiful Breeze icons. It also lays the ground for the future with a tech preview of Wayland session available. We're shipping a few new components such as an Audio Volume Plasma Widget, monitor calibration tool and the User Manager tool comes out beta."

Comments (16 posted)

ArgyllCMS 1.8.0 released with support for SwatchMate Cube colorimeter (Libre Graphics World)

Libre Graphics World has posted a look at the latest release of the ArgyllCMS color-management system. New in the release is support for several new hardware colorimeters, from a low-cost Kickstarter-funded device to a € 2,800 professional tool.

Comments (none posted)

ownCloud Desktop Client 2.0 is available

Version 2.0 of the desktop client for ownCloud has been released. This update adds support for working with multiple ownCloud accounts and a setting to let users synchronize only files underneath a specified file size.

Comments (none posted)

Newsletters and articles

Development newsletters from the past two weeks

Comments (none posted)

Schaller: An Open Letter to Apache Foundation and Apache OpenOffice team

Christian Schaller has posted an open letter to the Apache Software Foundation with a non-trivial request: "So dear Apache developers, for the sake of open source and free software, please recommend people to go and download LibreOffice, the free office suite that is being actively maintained and developed and which has the best chance of giving them a great experience using free software. OpenOffice is an important part of open source history, but that is also what it is at this point in time."

In this context, it's interesting to note that OpenOffice project chair Jan Iversen recently stepped down, listing resistance to an effort to cooperate with LibreOffice as one of the main reasons. The project currently looks set to name Dennis Hamilton (who is running unopposed) as its new chair.

Comments (146 posted)

Mozilla: The Future of Developing Firefox Add-ons

Mozilla has announced a significant set of changes for authors of Firefox add-ons. These include a new API (and the deprecation of XUL and XPCOM), a process-based sandboxing mechanism, mandatory signing of extensions, and more. "For our add-on development community, these changes will bring benefits, like greater cross-browser add-on compatibility, but will also require redevelopment of a number of existing add-ons. We’re making a big investment by expanding the team of engineers, add-on reviewers, and evangelists who work on add-ons and support the community that develops them. They will work with the community to improve and finalize the WebExtensions API, and will help developers of unsupported add-ons make the transition to newer APIs and multi-process support."

Comments (90 posted)

Page editor: Nathan Willis

Announcements

Brief items

The Open Mainframe Project

The Linux Foundation has announced the launch of the Open Mainframe Project. "In just the last few years, demand for mainframe capabilities have drastically increased due to Big Data, mobile processing, cloud computing and virtualization. Linux excels in all these areas, often being recognized as the operating system of the cloud and for advancing the most complex technologies across data, mobile and virtualized environments. Linux on the mainframe today has reached a critical mass such that vendors, users and academia need a neutral forum to work together to advance Linux tools and technologies and increase enterprise innovation."

Comments (15 posted)

GUADEC videos released

GUADEC was held in Gothenburg, Sweden on August 7–9. Videos of the presentations are available.

Comments (none posted)

FSF30: Get in on the party and User Freedom Summit

The Free Software Foundation will have a 30th birthday party in Boston, Massachusetts on October 3. There will be a User Freedom Summit in the daytime, before the party. "We know that not every free software fan can join us in person in Boston -- so we're hosting a party network where you can promote your own party (we'll even offer some ideas for making your event lots of fun!) We'll have a livestream of the Boston party, and welcome photos and reports from your own parties, too!"

Full Story (comments: none)

Articles of interest

Happy 24th birthday, Linux kernel (Opensource.com)

Opensource.com wishes Linux a happy 24th birthday, with a brief timeline of Linux history. "There's some debate in the Linux community as to whether we should be celebrating Linux's birthday today or on October 5 when the first public release was made, but Linus says he is O.K. with you celebrating either one, or both! So as we say happy birthday, let's take a quick look back at the years that have passed and how far we have come."

Comments (2 posted)

Ubuntu on the Mainframe: Interview with Canonical's Dustin Kirkland (Linux.com)

Linux.com has an interview with Dustin Kirkland of Canonical's Ubuntu Product and Strategy team, about Ubuntu on the mainframe and more. "Canonical is doing a lot of different things in the enterprise space, to solve different problems. One of the interesting works going on at Canonical is Fan networking. We all know that the world is running out of IPv4 addresses (or already has). The obvious solution to this problem is IPv6, but it’s not universally available. Kirkland said, "There are still places where IPv6 doesn't exist -- little places like Amazon web services where you end up finding lots of containers." The problem multiplies as many instances in cloud need IP addresses. "Each of those instances can run hundreds of containers, each of those containers then needs to be addressable," said Kirkland."

Comments (none posted)

Calls for Presentations

Speaking Opportunities: O'Reilly Fluent 2016 - Developing the Web

The O'Reilly Fluent Conference will take place March 8-10, 2016 in San Francisco, CA. "Fluent covers the full scope of the Web Platform and associated technologies, including WebGL, CSS3, mobile APIs, Node.js, AngularJS, ECMAScript 6, and more." The call for papers closes September 21.

Full Story (comments: none)

CFP Deadlines: August 27, 2015 to October 26, 2015

The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.

Deadline | Event Date(s) | Event | Location
August 31 | November 21–November 22 | PyCon Spain 2015 | Valencia, Spain
August 31 | October 19–October 22 | Perl Dancer Conference 2015 | Vienna, Austria
August 31 | November 5–November 7 | systemd.conf 2015 | Berlin, Germany
August 31 | October 9 | Innovation in the Cloud Conference | San Antonio, TX, USA
August 31 | November 10–November 11 | Open Compliance Summit | Yokohama, Japan
September 1 | October 1–October 2 | PyConZA 2015 | Johannesburg, South Africa
September 6 | October 10 | Programistok | Białystok, Poland
September 12 | October 10 | Poznańska Impreza Wolnego Oprogramowania | Poznań, Poland
September 15 | November 9–November 11 | PyData NYC 2015 | New York, NY, USA
September 15 | November 14–November 15 | NixOS Conference 2015 | Berlin, Germany
September 20 | October 26–October 28 | Samsung Open Source Conference | Seoul, South Korea
September 21 | March 8–March 10 | Fluent 2016 | San Francisco, CA, USA
September 25 | December 5–December 6 | openSUSE.Asia Summit | Taipei, Taiwan
September 27 | November 9–November 11 | KubeCon | San Francisco, CA, USA
September 28 | November 14–November 15 | PyCon Czech 2015 | Brno, Czech Republic
September 30 | November 28 | Technical Dutch Open Source Event | Eindhoven, The Netherlands
September 30 | November 7–November 8 | OpenFest 2015 | Sofia, Bulgaria
September 30 | December 27–December 30 | 32. Chaos Communication Congress | Hamburg, Germany
October 1 | April 4–April 6 | Web Audio Conference | Atlanta, GA, USA
October 2 | October 29 | FOSS4G Belgium 2015 | Brussels, Belgium
October 2 | December 8–December 9 | Node.js Interactive | Portland, OR, USA
October 15 | November 21 | LinuxPiter Conference | Saint-Petersburg, Russia

If the CFP deadline for your event does not appear here, please tell us about it.

Upcoming Events

Events: August 27, 2015 to October 26, 2015

The following event listing is taken from the LWN.net Calendar.

Date(s)                  Event                                                            Location
August 28-September 3    ownCloud Contributor Conference                                  Berlin, Germany
August 29                EmacsConf 2015                                                   San Francisco, CA, USA
September 2-6            End Summer Camp                                                  Forte Bazzera (VE), Italy
September 10-12          FUDcon Cordoba                                                   Córdoba, Argentina
September 10-13          International Conference on Open Source Software Computing 2015  Amman, Jordan
September 11-13          vBSDCon 2015                                                     Reston, VA, USA
September 15-16          verinice.XP                                                      Berlin, Germany
September 16-18          PostgresOpen 2015                                                Dallas, TX, USA
September 16-18          X.org Developer Conference 2015                                  Toronto, Canada
September 19-20          WineConf 2015                                                    Vienna, Austria
September 21-23          Octave Conference 2015                                           Darmstadt, Germany
September 21-25          Linaro Connect San Francisco 2015                                San Francisco, CA, USA
September 22-24          NGINX Conference                                                 San Francisco, CA, USA
September 22-23          Lustre Administrator and Developer Workshop 2015                 Paris, France
September 23-25          LibreOffice Conference                                           Aarhus, Denmark
September 23-25          Surge 2015                                                       National Harbor, MD, USA
September 24             PostgreSQL Session 7                                             Paris, France
September 25-27          PyTexas 2015                                                     College Station, TX, USA
September 28-30          Nagios World Conference 2015                                     Saint Paul, MN, USA
September 28-30          OpenMP Conference                                                Aachen, Germany
September 29-30          Open Source Backup Conference 2015                               Cologne, Germany
September 30-October 2   Kernel Recipes 2015                                              Paris, France
October 1-2              PyConZA 2015                                                     Johannesburg, South Africa
October 2-4              PyCon India 2015                                                 Bangalore, India
October 2-3              Ohio LinuxFest 2015                                              Columbus, OH, USA
October 5-7              LinuxCon Europe                                                  Dublin, Ireland
October 5-7              Qt World Summit 2015                                             Berlin, Germany
October 5-7              Embedded Linux Conference Europe                                 Dublin, Ireland
October 8                OpenWrt Summit                                                   Dublin, Ireland
October 8-9              CloudStack Collaboration Conference Europe                       Dublin, Ireland
October 8-9              GStreamer Conference 2015                                        Dublin, Ireland
October 9                Innovation in the Cloud Conference                               San Antonio, TX, USA
October 10               Programistok                                                     Białystok, Poland
October 10               Poznańska Impreza Wolnego Oprogramowania                         Poznań, Poland
October 10-11            OpenRISC Conference 2015                                         Geneva, Switzerland
October 14-16            XII Latin American Free Software                                 Foz do Iguacu, Brazil
October 17               Central Pennsylvania Open Source Conference                      Lancaster, PA, USA
October 18-20            2nd Check_MK Conference                                          Munich, Germany
October 19-23            Tcl/Tk Conference                                                Manassas, VA, USA
October 19-22            ZendCon 2015                                                     Las Vegas, NV, USA
October 19-22            Perl Dancer Conference 2015                                      Vienna, Austria
October 21-22            Real Time Linux Workshop                                         Graz, Austria
October 23-24            Seattle GNU/Linux Conference                                     Seattle, WA, USA
October 24-25            PyCon Ireland 2015                                               Dublin, Ireland

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol


Copyright © 2015, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds